Geographic Information Research
Geographic Information Research: Trans-Atlantic Perspectives EDITED BY
MASSIMO CRAGLIA University of Sheffield, UK
HARLAN ONSRUD University of Maine, USA
ORGANISING COMMITTEE
HANS-PETER BÄHR, KEITH CLARKE, HELEN COUCLELIS, MASSIMO CRAGLIA, HARLAN ONSRUD, FRANÇOIS SALGÉ, GEIR-HARALD STRAND
UK: Taylor & Francis Ltd, 1 Gunpowder Square, London, EC4A 3DE
USA: Taylor & Francis Inc., 1900 Frost Road, Suite 101, Bristol, PA 19007
This edition published in the Taylor & Francis e-Library, 2005.
“To purchase your own copy of this or any of Taylor & Francis or Routledge’s collection of thousands of eBooks please go to www.eBookstore.tandf.co.uk.”
Copyright © Taylor & Francis 1999
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, electrostatic, magnetic tape, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library.
ISBN 0-203-21139-1 Master e-book ISBN
ISBN 0-203-26900-4 (Adobe eReader Format)
ISBN 0-7484-0801-0 (paper)
Library of Congress Cataloguing-in-Publication Data are available
Cover design by Hybert Design and Type, Waltham St Lawrence, Berkshire
Cover printed by Flexiprint, Lancing, West Sussex
Contents

Preface
European Science Foundation and National Science Foundation
Contributors
1 Introduction and Overview (Massimo Craglia and Harlan Onsrud)

PART ONE GI and Society: Infrastructural, Ethical and Social Issues
2 Spatial Information Technologies and Societal Problems (Helen Couclelis)
3 Information Ethics, Law and Policy for Spatial Databases: Roles for the Research Community (Harlan Onsrud)
4 Building the European Geographic Information Resource Base: Towards a Policy-Driven Research Agenda (Massimo Craglia and Ian Masser)
5 GIS, Environmental Equity Analysis and the Modifiable Areal Unit Problem (MAUP) (Daniel Sui)
6 National Cultural Influences on GIS Design: A Study of County GIS in King County, WA, USA and Kreis Osnabrück, Germany (Francis Harvey)
7 The Commodification of Geographic Information: Some Findings from British Local Government (Steve Capes)
8 Nurturing Community Empowerment: Participatory Decision-Making and Community-Based Problem Solving Using GIS (Laxmi Romasubramaian)
9 Climbing Out of the Trenches: Issues in Successful Implementation of Spatial Decision-Support System (Paul Patterson)

PART TWO GI for Analysis and Modelling
10 Spatial Models and GIS (Michael Wegener)
11 Progress in Spatial Decision Making Using Geographic Information Systems (Tim Nyerges)
12 GIS and Health: From Spatial Analysis to Spatial Decision Support (Anthony Gatrell)
13 The Use of Neural Nets in Modelling Health Variations—The Case of Västerbotten, Sweden (Örjan Pettersson)
14 Interpolation of Severely Non-Linear Spatial Systems with Missing Data: Using Kriging and Neural Networks to Model Precipitation in Upland Areas (Joanne Cheesman and James Petch)
15 Time and Space in Network Data Structures for Hydrological Modelling (Vit Vozenilek)
16 Approach to Dynamic Wetland Modelling in GIS (Carsten Bjornsson)
17 Use of GIS for Earthquake Hazard and Loss Estimation (Stephanie King and Anne Kiremidjian)
18 An Evaluation of the Effects of Changes in Field Size and Land Use on Soil Erosion Using a GIS-Based USLE Approach (Philippe Desmet, W. Ketsman, G. Gowers)
19 GIS for the Analysis of Structure and Change in Mountain Environments (Anna Kurnatowska)

PART THREE GIS and Remote Sensing
21 Multiple Roles for GIS in Global Change Research (Mike Goodchild)
22 Remote Sensing and Urban Analysis (Hans-Peter Bähr)
23 From Measurement to Analysis: A GIS/RS Approach to Monitoring Changes in Urban Density (Victor Mesev)
24 Landscape Zones Based on Satellite Data for the Analysis of Agrarian Systems in Fouta Djallon (Guinea) using GIS (Eléonore Wolff)
25 Spatial-Time Data Analysis: The Case of Desertification (Julia Maria Seixas)
26 The Potential Role of GIS in Integrated Assessment of Global Change (Milind Kandlikar)

PART FOUR Methodological Issues
27 Spatial and Temporal Change in Spatial Socio-Economic Units (Jonathan Raper)
28 Spatial-Temporal Geostatistical Kriging (Eric Miller)
29 Using Extended Exploratory Data Analysis for the Selection of an Appropriate Interpolation Model (Felix Bucher)
30 Semantic Modelling for Oceanographic Data (Dawn Wright)
31 Hierarchical Wayfinding—A Model and its Formalisation (Adrijana Car)
32 Integrated Topological and Directional Reasoning in Geographic Information Systems (Jayant Sharma)
33 Distances for Uncertain Topological Relations (Stephan Winter)

PART FIVE Data Quality
34 Spatial Data Quality for GIS (Henri Aalders and Joel Morrison)
35 Assessing the Impact of Data Quality on Forest Management Decisions using Geographical Sensitivity Analysis (Susanna McMaster)
36 Probability Assessment for the use of Geometrical Metadata (François Vauglin)
37 Assessing the Quality of Geodata by Testing Consistency with respect to the Conceptual Data Schema (Gerhard Joos)
38 Data Quality Assessment and Perspectives for Future Spatial-Temporal Analysis Exemplified through Erosion Studies in Ghana (Anita Folly)

PART SIX Visualisation and Interfaces
39 Visual Reasoning: The Link Between User Interfaces and Map Visualisation for Geographic Information Systems (Keith Clarke)
40 Visualisation of Multivariate Geographic Data for Exploration (Aileen Buckley)
41 Metaphors and User Interface Design (David Howard)
42 Universal Analytical GIS Operations—A Task-Oriented Systematisation of Data Structure-Independent GIS Functionality (Jochen Albrecht)

Postscript (Ian Masser)
Index
Preface Massimo Craglia and Harlan Onsrud
This volume is the outcome of the Second International Summer Institute in Geographic Information held in Berlin in the summer of 1996. The meeting was sponsored jointly by the European Science Foundation (ESF) GISDATA programme and the US National Science Foundation (NSF) through the National Center for Geographic Information and Analysis (NCGIA). Like the Summer Institute held the year before in the US, this event was extraordinary in a number of ways. First, the participants came in equal numbers from both sides of the Atlantic, a very unusual feature compared to most international geographic information meetings, which tend to be dominated by either European or American participants. Second, the duration of the Institute, which included six full days of meetings and one day for technical excursions, allowed for considerable breadth and depth of interaction among the participants. Third, the majority of participants were at the early stages of their scientific career, as they were completing or had recently completed their doctoral research, and were all selected on the basis of high quality and originality of work in open competitions, one in Europe and one in the USA. Fresh from research experience in their own fields, the early career scientists could capitalise on the extensive feedback and close interaction with colleagues from other countries doing research in the same area, as well as with senior instructors recognised as leaders in their field. This volume includes most of the papers presented at the Institute. The papers were revised following peer review and direct feedback from colleagues both during and after the meeting. In many instances a revised paper reflects not only the comments of the specialists selected to review the papers but also the knowledge gained from many hours of face-to-face discussions with participants coming from very different disciplinary perspectives.
The two Summer Institutes were the flagships of the collaboration between the GISDATA scientific programme of the European Science Foundation (Arnaud et al. 1993) and the NCGIA (Abler, 1987). The topics of the Institutes reflected the twelve GISDATA specialist meetings held in 1993–96 and current or recent NCGIA research initiatives in closely related areas. In many respects, the similarities between the areas identified by GISDATA and the NCGIA as being of highest research priority are indicative of the current concerns of the field as a whole. Intense collaboration between the two programmes has resulted in a critical mass of researchers across the Atlantic addressing the issues with input from a wider range of disciplines and perspectives and less duplication of effort. The goals of the two Summer Institutes were to:
• promote basic research in geographic information
• develop human resources, particularly among young researchers, and
• develop international co-operation between US and European scientists.
This volume, and the one that preceded it (Craglia and Couclelis, 1997) are the tangible evidence that the first goal was met. While the papers speak for themselves, their significance as contributions to the development of the field as a whole is further discussed in the Introduction. The achievement of the other two goals is better evaluated with hindsight. Here we can only outline the means by which the organising committee strove to maximise the value of the Summer Institute for the select international group of young (and not-so-young) scientists who participated. A critical aspect of the success of both Institutes was in bringing together early-career scientists with a substantial subset of the best known senior researchers in geographic information science. The First Summer Institute in the US included 52 participants, of whom 31 were young scholars selected on the basis of the two parallel open competitions and 21 were internationally known researchers. The Second Summer Institute in Berlin in 1996 included 46 participants, of whom 32 were young scholars and 14 were senior scientists. While there was some overlap among the senior scientists between the two Institutes, this was not the case among the young researchers. Therefore, a total of 63 early career scientists had the opportunity to benefit from this unique programme. In their evaluations of the Institutes, the young researchers gave their most glowing ratings to the active presence and day-long availability of several of the “living legends” in the field. These senior scientists gave keynote presentations, taught mini-workshops, led or participated in the ad hoc research project teams that were formed, gave constructive comments to the junior paper presenters, argued vigorously among themselves, co-judged the team projects submitted, guided the field trips, and were enthusiastic participants in the jolly “happy hours” that closed each hard day’s work. 
The 1996 Institute in Berlin was organised by a scientific panel including Dr. Harlan Onsrud, Professor Helen Couclelis, and Dr. Keith Clarke from the US and Professor Hans-Peter Bähr, Dr. Massimo Craglia, Mr. François Salgé, and Dr. Geir-Harald Strand from Europe. The programme was set up so as to balance plenary and small-group sessions, lectures and hands-on challenges, academic papers and practical workshops, structured and unstructured events. Above all, it sought to encourage a thorough mix of nationalities, research perspectives, and levels of academic experience, for the purpose of helping prepare tomorrow’s experts for the increasingly international and multidisciplinary working environments they are likely to function in. The topics covered by the keynotes and paper presentations included:
• GIS and societal issues including aspects of ethics and law;
• spatial models and decision-support systems;
• methodological issues in the representation of space and time,
• data quality,
• visualisation and interfaces,
• GIS and remote sensing for urban analysis, and
• application areas spanning from global change to health.
The key device used at the Institute to foster collaborative work was the formation of six research teams. Each team worked on a hypothetical research proposal in response to a mock solicitation for proposals similar in form to those used by NSF. The enthusiasm with which the groups worked until late hours to develop their proposals was one of the hallmarks of the Institute, and a prize for the best proposal was awarded by an international panel of experts following the written submissions and oral presentations. What was particularly noteworthy was not only the extent to which young researchers developed valuable skills by drawing on the experience of seasoned academics in writing these proposals, but also the way in which people coming from very different backgrounds were able to join together and work effectively as a team.
The Institute was held in the superb Villa Borsig on the shores of Lake Tegel, and we are indebted to Professor Hans-Peter Bähr for organising it, to Dr. Sigrid Roessner for her excellent tour of Potsdam, and to all those individuals who made this Institute such a resounding success. A special thanks to Michael Goodchild and Ian Masser who spearheaded the NCGIA-GISDATA collaboration of which the two Summer Institutes are the most visible products, and to the two sponsoring organisations, the European Science Foundation and the National Science Foundation, which were represented at Villa Borsig by Professor Max Kaase of the ESF Standing Committee for the Social Sciences, and Professor Mike Goodchild, Director of the NCGIA, respectively. The importance of this initiative for future researchers and teachers in geographic information science cannot be overemphasised. Finally, a particular debt of gratitude is owed to all the manuscript referees who have at times been put under substantial pressure to provide feedback within a very short time; and to Christine Goacher at Sheffield who has undertaken the gruelling task of standardising the format of all the chapters while never losing her good sense of humour.
REFERENCES
ABLER, R. 1987. The National Science Foundation National Center for Geographic Information and Spatial Analysis, International Journal of GIS, 1(4), pp. 303–326.
ARNAUD, A., CRAGLIA, M., MASSER, I., SALGÉ, F. and SCHOLTEN, H. 1993. The research agenda of the European Science Foundation’s GISDATA scientific programme, International Journal of GIS, 7(5), pp. 463–470.
CRAGLIA, M. and COUCLELIS, H. (Eds) 1997. Geographic Information Research Bridging the Atlantic. London: Taylor & Francis.
The European Science Foundation is an association of more than 60 major national funding agencies devoted to basic scientific research in over 20 countries. The ESF assists its Member Organisations in two main ways: by bringing scientists together in its Scientific Programmes, Networks and European Research Conferences, to work on topics of common concern; and through the joint study of issues of strategic importance in European science policy. The scientific work sponsored by ESF includes basic research in the natural and technical sciences, the medical and biosciences, the humanities and social sciences. The ESF maintains close relations with other scientific institutions within and outside Europe. By its activities, ESF adds value by co-operation and coordination across national frontiers and endeavours, offers expert scientific advice on strategic issues, and provides the European forum for fundamental science.
GISDATA is one of the ESF Social Science scientific programmes and focuses on issues relating to Data Integration, Database Design, and Socio-Economic and Environmental applications of GIS technology. This four year programme was launched in January 1993 and is sponsored by ESF member councils in 14 countries. Through its activities the programme has stimulated a number of successful collaborations among GIS researchers across Europe.
The US National Science Foundation (NSF) is an independent federal agency of the US Government. Its aim is to promote and advance scientific progress in the United States. In contrast to other federal agencies which support research focused on specific missions (such as health or defense), the NSF supports basic research in science and engineering across all disciplines. The Foundation accounts for about 25 percent of Federal support to academic institutions for basic research. The agency operates no laboratories itself but does support National Research Centers and certain other activities. The Foundation also supports cooperative research between universities and industry and US participation in international scientific efforts.
The National Center for Geographic Information and Analysis (NCGIA) is a consortium comprised of the University of California at Santa Barbara, the State University of New York at Buffalo, and the University of Maine. The Center serves as a focus for basic research activities relating to geographic information science and technology. It is a shared resource fostering collaborative and multidisciplinary research with scientists across the United States and the world. Support for this collaborative work is currently funded by NSF through the Varenius Project.
Contributors
Henri Aalders, Delft University of Technology, Faculty of Geodetic Engineering, Thijsseweg 11, POB 5030, NL-2600 GA Delft, NETHERLANDS
Jochen Albrecht, Department of Geography, University of Auckland, Private Bag 92019, Auckland, NEW ZEALAND
Hans-Peter Bähr, Universität Karlsruhe (TH), Englerstrasse 7, Postfach 69 80(W), 76128 Karlsruhe 1, GERMANY
Carsten Bjornsson, GISLAB, GISPLAN, Unit of Landscape, Royal Agricultural & Veterinary University, Rolighedsvej 23, 2, 1958 Frederiksberg C, DENMARK
Felix Bucher, University of Zurich, Department of Geography, Winterthurerstrasse 190, 8057 Zurich, SWITZERLAND
Aileen M. Buckley, Department of Geosciences, Oregon State University, Corvallis, OR 97331, USA
Stephen A. Capes, 8 Duncan Road, Sheffield S10 1SN, UK
Adrijana Car, Department of Geomatics, University of Newcastle-upon-Tyne, Newcastle-upon-Tyne NE1 7RU, UK
Joanne E. Cheesman, Department of Geography, University of Manchester, Mansfield Cooper Building, Oxford Road, Manchester M13 9PL, UK
Keith C. Clarke, Department of Geography, University of California, Santa Barbara, CA 93106, USA
Helen Couclelis, NCGIA, University of California, 350 Phelps Hall, Santa Barbara, CA 93106, USA
Massimo Craglia, Department of Town and Regional Planning, University of Sheffield, Western Bank, Sheffield S10 2TN, UK
Philippe Desmet, Laboratory for Experimental Geomorphology, Catholic University of Leuven, Redingenstraat 16, B-3000 Leuven, BELGIUM
Anita Folly, School of Agriculture, Food and Environment, Cranfield University, Silsoe, Bedfordshire MK45 4DT, UK
Anthony C. Gatrell, Institute for Health Research, Lancaster University, Lancaster LA1 4YB, UK
Michael F. Goodchild, National Center for Geographic Information and Analysis, and Department of Geography, University of California, Santa Barbara, CA 93106–4060, USA
G. Gowers, Laboratory for Experimental Geomorphology, Catholic University of Leuven, Redingenstraat 16, B-3000 Leuven, BELGIUM
Francis Harvey, Department of Geography, University of Kentucky, Lexington, KY 40506–0027, USA
David Howard, The Pennsylvania State University, 710 S. Atherton Street Apt. 300, State College, PA 16801, USA
Gerhard Joos, Institute for Geodesy, University of the Federal Armed Forces Munich, D-85577 Neubiberg, GERMANY
Milind Kandlikar, National Center for Human Dimensions of Global Change, Carnegie Mellon University, Pittsburgh, PA 15213, USA
W. Ketsman, Laboratory for Experimental Geomorphology, Catholic University of Leuven, Redingenstraat 16, B-3000 Leuven, BELGIUM
Stephanie A. King, John A. Blume Earthquake Engineering Center, Department of Civil Engineering, Stanford University, California 94305–4020, USA
Anne S. Kiremidjian, John A. Blume Earthquake Engineering Center, Department of Civil Engineering, Stanford University, California 94305–4020, USA
Anna Kurnatowska, University of Warsaw, Department of Geography and Regional Studies, ul. Krakowskie Przedmiescie 30, 00–927 Warsaw, POLAND
Ian Masser, Division of Urban Planning and Management, ITC, P.O. Box 6, 7500AA Enschede, NETHERLANDS
Susan McMaster, ACSM Department of Geography, University of Minnesota, 414 Social Sciences Building, Minneapolis, MN 55455, USA
Victor Mesev, ESRC Research Fellow, Department of Geography, University of Bristol, University Road, Bristol BS8 1SS, UK
Joel Morrison, Chief, Geography Division, US Bureau of the Census, Washington, D.C. 20233–7400, USA
Eric J. Miller, Office of Research, OCLC Online Computer Library Center, Dublin, Ohio, USA
Tim Nyerges, Department of Geography, University of Washington, Seattle, WA 98195, USA
Yelena Ogneva-Himmelberger, Graduate School of Geography, Clark University, Worcester, MA, USA
Harlan Onsrud, Department of Spatial Information Science and Engineering and National Center for Geographic Information & Analysis, University of Maine, Orono, Maine 04469, USA
Paul Patterson, Onkel-Tom St. #112, 14169 Berlin, GERMANY
James Petch, Manchester Metropolitan University, Department of Environmental and Geographical Sciences, John Dalton Building, Chester Street, Manchester M1 5GD, UK
Örjan Pettersson, Department of Social and Economic Geography, Umeå University, S-901 87 Umeå, SWEDEN
Jonathan Raper, Department of Geography, Birkbeck College, 7–15 Gresse Street, London W1P 1PA, UK
Laxmi Romasubramaian, Department of Geography, University of Auckland, Private Bag 92019, Auckland, NEW ZEALAND
Julia Maria Seixas, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Quinta de Torre, 2825 Monte de Caparica, PORTUGAL
Jayant Sharma, Oracle Corporation, One Oracle Drive, Nashua, NH 03062, USA
Daniel Z. Sui, Department of Geography, Texas A&M University, College Station, TX 77845–3147, USA
François Vauglin, IGN/COGNIT, 2 Av. Pasteur, 94160 Saint Mande, FRANCE
Vit Vozenilek, Department of Geography, Palacky University, tr. Svobody 26, 771 46 Olomouc, CZECH REPUBLIC
Michael Wegener, Institut für Raumplanung, University of Dortmund, Postfach 500500, D-44221 Dortmund, GERMANY
Stephan Winter, Dept. of Geoinformation, Technical University of Vienna, Gusshausstrasse 27–29, 1040 Vienna, AUSTRIA
Eléonore Wolff, Institut de Gestion de l’Environnement et d’Aménagement du Territoire, CP246, Bd. du Triomphe, B1050 Bruxelles, BELGIUM
Dawn Wright, Department of Geosciences, Oregon State University, Corvallis, OR 97331–5506, USA
Chapter One
Introduction and Overview
Massimo Craglia and Harlan Onsrud
This book brings together some of the latest research in areas of strategic importance for the development of geographic information science across both sides of the Atlantic. With its predecessor, Geographic Information Research Bridging the Atlantic, also published by Taylor & Francis (Craglia and Couclelis, 1997), it spans the entire range of research topics identified by the European Science Foundation’s GISDATA programme (Arnaud et al., 1993) and by the NCGIA (Abler, 1987). These two programmes have been crucial in Europe and the USA in shaping and developing the geographic information (GI) research agenda during the 1990s. As we move towards the year 2000 we see new topics emerging such as a greater focus on the societal implications of geographic information science advancement, alongside some of the traditional ones such as spatial analysis and modelling. We are also starting to see geographic information research moving away from the core disciplines of geography, surveying, and computer science to a wider set of disciplines in the environmental and social sciences. This creates new challenges but also offers opportunities to address together some of the long standing problems identified in this area. Hence we see a greater contribution of philosophers and cognitive scientists into fundamental issues related to the perception and representation of space and the integration of time, the emergence of new specialities such as digital information law and economics, and of new areas of application such as health. These broad trends are well represented in this book which is divided into six sections. The first on “GI and Society” addresses important topics like societal impacts, empowerment, equity, commodification, and ethics. These provide a powerful signal of one of the new directions for GI research which is set to grow even more in importance over the next few years. 
The second section on “GI for Spatial Analysis and Modelling” could be considered as a more traditional concern of GI research, and yet we see not only an increased sophistication in the approaches being developed but also new important areas of application such as health. Section Three on “GIS and Remote Sensing” indicates the increasing importance that new data sources will have for many disciplines. Whilst remote sensing has been mainly the domain of the environmental sciences in the past, the future increased availability of high resolution imagery is opening up new opportunities for the social sciences including urban planning and analysis. Section Four on “Methodological Issues” gives evidence of the multidisciplinary effort under way to address key outstanding issues such as the inclusion of time into spatial analysis and the formalisation of human cognition of space. The chapters included under this heading illustrate the extent to which the boundaries of GI science are being extended to encompass other disciplines and address more effectively the complexity of the real world. Section Five addresses another topic of growing importance, that of “Data Quality”. Of course data quality has always been important, but the much increased availability of digital data from a variety of sources including the Internet brings it much more to the fore and requires further research to develop models that are not only comprehensive but also operational for most users. Finally, the last section of the book addresses “Visualisation and Interfaces”. These are areas where current geographic information systems are still relatively primitive and the need for research is pressing if the opportunities of data rich environments are to be exploited for scientific inquiry. Many challenges and opportunities are therefore presented in these six sections as detailed below.

1.1 PART ONE: GI AND SOCIETY: INFRASTRUCTURAL, ETHICAL AND SOCIAL ISSUES

The opening chapter of this first section of the book by Helen Couclelis aptly sets the scene for what has been a thorny issue for GI researchers over the last few years, particularly since the publication of Ground Truth (Pickles, 1995). Helen captures the essence of the debate in typical Greek philosophical fashion by way of a thesis (does GIS cause societal problems?) and its antithesis (does GIS help alleviate societal problems?). The chapter builds on the wide ranging discussions held in the US between social theorists and techno-enthusiasts in the framework of a special initiative on this topic organised by the NCGIA (I-19). The synthesis is of course that both propositions may be true as GIS, like information itself, is thoroughly embedded in the social context in which it operates. Hence, geographic information scientists have a special responsibility to ensure that all the aspects of their discipline are carefully investigated, and the results are widely disseminated. Ignorance is the worst enemy of society. The role of the geographic information science academic sector in helping define an ethical framework for GI use is explored by Harlan Onsrud in Chapter 3, thus directly building on the discussion by Couclelis. Harlan makes the important point that rapid technological progress and increased availability of digital data related to individuals is moving the whole field into an ethically grey zone with fuzzy boundaries. What to some is “smart business practice” giving a competitive advantage, for others is plain unethical behaviour.
This is largely due to the inertia with which recognised legal and deontological frameworks respond to the new challenges posed by technology and societal change. In this grey zone, there is often the tendency either to do nothing or to do too much on the basis of emotional responses. Hence what is needed is close scrutiny of current practices to gather the evidence necessary for an informed debate on information policy and professional conduct, a role well suited to socially-aware GI researchers. A European perspective on some of these same issues is provided by Massimo Craglia and Ian Masser in Chapter 4. The authors give an overview of the recent developments towards a European Geographic Information Infrastructure (EGII) by the European Commission. This infrastructure is similar in concept to the American National Spatial Data Infrastructure, and is all the more necessary in Europe given the enormous variations that exist across national boundaries which inhibit social and economic analysis. The key difference between the European and USA experience is however in the extent of high level political support that a strategic initiative of this kind receives. Hence, the authors argue that in spite of the recent flurry of activity, the future of an EGII is still unclear and they put forward a policy-driven research agenda which gives greater emphasis to social and methodological issues than technical ones. As many of the topics identified lie outside the traditional strengths of the European GI research community and require increased inter-disciplinary efforts, they represent a formidable challenge and opportunity for European GI research. A practical case-study of the social and ethical issues involved in the use of GIS-based analyses is presented in Chapter 5 by Daniel Sui who investigates effects of the Modifiable Areal Unit Problem on the results of GIS-based environmental equity analysis.
As he clearly demonstrates in his study of the relationship between extent of toxic waste and the socio-economic characteristics of neighbourhoods in Houston, Texas, by varying the scale of analysis and areal framework it is possible to arrive at any desired results, with the added bonus of having a techno-gloss on it via GIS. On this basis Daniel convincingly argues that GIS, like any other information handling technology, can shape our understanding of social reality so that the effects are due not to the phenomena being measured but to the tools measuring them. This adds to the mounting evidence that technology is not value neutral but a social construct to be critically challenged and evaluated. Given the social dimension of technology, and GIS, are there any cultural differences in the way GIS is implemented and used? Francis Harvey addresses this issue in Chapter 6 through two case studies of GIS implementation in public agencies in the USA and Germany. Building on the conceptual framework developed by Hofstede (1980) in his cross-national study of IBM employees, Francis considers the apparently very different approaches adopted in the two case-studies which reflect the administrative and organisational cultures and traditions as much as they speak about national traits. In this study, the German approach appears strongly encased in hierarchy and adherence to procedures as much as the American appears flexible to the point of becoming chaotic. Yet under these first impressions, it is clear that both case-studies share the need for all the actors involved to negotiate, whether openly or covertly, thus reinforcing the perspective on the social dimension of GIS, a dimension mediated by national, professional and other forms of cultural identity. Stephen Capes in Chapter 7 appears to break from this “social” stream to address the issue of the increasing treatment of digital (geographic) information as a commodity.
The break is only apparent though because much of recent discussion on access to data has ended up focusing on the price of data, contrasting the US (federal) model of free data with the British one of highly charged data. As Stephen argues, this is a very simplistic and not very helpful view. In the first place, being a commodity is more than just commercialisation, i.e. vigorous charging. It also includes dissemination, exchange, and creating value-added information services. In the second place, he shows that the well publicised practice of the British Ordnance Survey to recover a high proportion of its costs from the sale of data is not shared by other public agencies such as local government. Using evidence from local government in Britain, Stephen demonstrates that all four facets of commodification are present, largely based on organisational mandates and cultures. Finally, he argues that on this evidence the position in Britain and the US is not dissimilar, if one cares to look to the practices of USA state and local governments as well. Given the increasing pressures on all governments to reduce their spending and look for new forms of revenue, this chapter is a valuable contribution to a debate often based on cliché rather than facts. In Chapter 8, Laxmi Ramasubramanian addresses some of the points raised in Chapter 2 by Helen Couclelis and provides evidence that GIS can also help empower local communities in making choices about their future. On the basis of a broad review of the literature on GIS adoption and implementation, the social dimension of GIS and documented case-studies of GIS use in local communities, Laxmi acknowledges that GIS offers opportunities but is also affected by numerous constraints. These include technical, organisational, data-related, and skill issues, particularly for local communities.
The most important point the author makes though is that the critics of GIS in local communities, who argue that GIS centralises decision-making and alienates non-technical users, are not so much making a point about GIS as giving an indictment of the decision-making process. Therefore, what is required is a much wider use of participatory approaches in GIS implementation which build upon the experiences of participatory urban planning. The final chapter of this section, by Paul Patterson, looks at spatial decision support systems (SDSS) in an organisational context. Using the example of routing software that has been implemented across many different organisational settings, Paul draws lessons for implementation which centre on the appropriateness
of data resolution, user feedback and interaction, ability to include extensions and customisation, and broader organisational issues. As he argues, if SDSS are ever to be usefully implemented it is important that implementation experiences are disseminated and discussed, trying to extract generally valid approaches from the specific experiences.

1.2 PART TWO: GI FOR ANALYSIS AND MODELLING

Part 2 of the book focuses on GIS and spatial analysis and modelling, topics that continue to be at the core of the GI research community. The opening chapter by Michael Wegener focuses on GIS and spatial models, as these are critical to any forecasting activity in both social and environmental sciences. Michael provides a very useful classification of models used in both these scientific fields based on their comprehensiveness, structure, theoretical foundations, and techniques, and investigates the extent to which GIS offers real opportunities for the development of new models which were not previously possible or thought of. On the basis of his thorough review, he concludes that the potential for GIS to offer new data structures for spatial models represents the most promising challenge of GIS for spatial modelling. He also makes the important point that the increasing complexity of environmental and social problems requires the use of integrated spatial information systems cutting across disciplines which have traditionally developed separately. In the following Chapter 11, Timothy Nyerges provides a comprehensive review of the state of development of GIS for spatial decision support. Using a theoretical framework called Enhanced Adaptive Structuration Theory, Timothy explores the inputs necessary to decision-making, the process of interaction involved in spatial decision making, and the outcomes of this process.
The progress made to date in the use of GIS for spatial decision-making is explored by way of seven research questions, which should be of extreme interest to any PhD student looking for a significant thesis. Timothy also identifies appropriate methodologies to investigate them, and urges researchers to take up the challenge to move the field of GIS for spatial decision-making forward. Anthony Gatrell, in Chapter 12, continues on the theme of spatial analysis and decision support from the specific angle of health research, one of the fastest growing fields for GIS application. Anthony makes the useful distinction between two streams of research: epidemiological studies (the geography of disease), which build on the natural sciences traditions, and healthcare planning research (the geography of health), which typically takes a more social science approach. These streams have tended in the past to operate in parallel, and if they used GIS at all they required different functionalities and methodologies. Now, as argued by Anthony, we are really starting to see the two streams coming together and a real opportunity emerging to link their requirements into dedicated spatial decision support systems based on GIS and spatial statistical tools. This building of bridges across disciplines and scientific traditions echoes the views expressed in Chapter 10 by Michael Wegener and indicates that whichever the field of application, the sheer complexity of our society requires wider awareness and combined approaches. The health theme is developed further by Orjan Pettersson in Chapter 13, with a very interesting case-study of public health variations in Västerbotten county, Sweden, using a neural networks approach. Orjan uses an index of ill-health based on the number of sick-leave days and explores the geographical variations in relation to socio-economic and labour market characteristics of the county.
The comparison between neural nets and linear regression analysis indicates that neural nets are good at identifying non-linear relationships and at providing a predictive model at the micro-regional level. However, there are also a number of important constraints which make this approach far from perfect and which require further work.
Joanne Cheesman and James Petch address a subset of these issues in Chapter 14 by comparing the use of neural networks and kriging to develop areal precipitation models for upland catchment areas where precipitation gauges are few, unevenly distributed, and often located in lowlands. Joanne and James provide a good review of both neural networks and kriging, highlighting the advantages and disadvantages of each. In their specific study in the North of England, they conclude that kriging provides the more accurate results where data is more abundant and neural networks perform better in the higher altitude areas where data points are more scarce. This chapter nicely complements the previous one in building the empirical evidence on the suitability of different methodologies to solve different types of problems, including data density and distribution. In Chapter 15, Vit Vozenilek gives an overview of the many problems to be faced in trying to model hydrological systems with off-the-shelf GIS. He first addresses the general concepts for including space and time in network analysis and then develops a practical approach to modelling river systems with PC Arc/Info. His approach is put to the test through three case-studies of increasing complexity. He demonstrates that useful results for a great many applications are possible, but that many approximations are needed to obtain them. A number of methodological issues relating to data structures, data requirements and scale effects are raised by Vit, whose approach is essentially suited to mono-directional flows in a network. The limitation identified in the previous chapter of mono-directional flows in a network is addressed in Chapter 16 by Carsten Bjorsson in the context of modelling wetland areas. These areas are very sensitive to pollution and need very careful handling for their remediation.
The model proposed tries to handle both space and time in a GIS raster structure using focal analysis, which calculates for each cell a value as a function of its immediate neighbours and can therefore handle movement of flows in more than one direction. This individual cell addressing enables Carsten to calculate for each cell in the grid the rate of change of the stream of water as a function of precipitation, flow from adjacent cells, groundwater contribution, flow out of the cell and evaporation. The model is at the early stages of development but by comparison to the traditional network models it promises to allow back flows, partial area flows and flooding, which are crucial in wetlands and which network models have difficulty in handling. Chapter 17 by Stephanie King and Anne Kiremidjian presents a useful application of GIS for earthquake hazard and loss estimation. There has been a growing interest in this field over the last few years, which have witnessed large-scale disasters such as those in Kobe and Los Angeles and a major international conference on “Earthquakes in Mega-Cities”, held in September 1997 in Seeheim, Germany. Against this background, Stephanie and Anne show how GIS can usefully integrate the geological, structural, and economic data to produce micro-zones of risk. Moreover, with an intelligent use of buffering, look-up tables, and ad-hoc routines it is possible to model the effects of an earthquake and start to put in place the necessary mitigation measures. Philippe Desmet and his colleagues discuss in Chapter 18 an algorithm implemented in GIS to utilise the Universal Soil Loss Equation in a two-dimensional terrain. The authors clearly demonstrate the validity of their approach in an IDRISI environment by investigating the effect through four test-sites near Leuven, Belgium.
The comparison of their method with manual calculation shows that the latter underestimates erosion risk because the effect of flow convergence cannot be taken into account, particularly in concave areas where overland flow tends to concentrate. Hence the proposed method is able to deal with more complex land units, extending its applicability in land resource management. Anna Kurnatowska in Chapter 19 combines GIS functionalities with statistical methods to analyse and compare the environmental structure of two mountain areas placed in different climatic zones: one in Scotland, the other in Poland. GIS is used to delineate homogeneous typological units called geocomplexes and describe their morphology. The area, perimeter and number of distinctive land units in a particular type
of geocomplex are then statistically analysed to identify spatial patterns, and spatial relationships between different units. This analysis enables Anna to develop sensitivity maps which are then an important input to environmental impact assessment and conservation strategies. It is a feature of this chapter to integrate well established methodologies in Central Europe for the analysis of geocomplexes with Western approaches using GIS to arrive at the desired results. The final chapter of this section addresses an important issue already raised by Michael Wegener and Anthony Gatrell in their respective chapters: the need for cross-disciplinary efforts to deal with the increasing complexity of our society. In this respect, Yelena Ogneva-Himmelberger argues that Markov chain analysis is a useful approach to integrate in GIS socio-economic and ecological processes in modelling land-cover change. Following an overview of Markov chain models, Yelena applies them to a study area in Mexico to link land-cover change maps to maps identifying socio-economic and ecological factors responsible for such change. The results indicate the opportunities opened up by Markov chain probability models coupled with logistic regression models for an improved understanding of land-cover change processes, provided that the relevant variables and decision rules are carefully formulated.

1.3 PART THREE: GIS AND REMOTE SENSING

This section of the book looks at the contribution that the increasing integration of remote sensing with GIS can make for the analysis of both environmental and socio-economic processes. The opening chapter by Michael Goodchild sets the scene in the context of global change research. This broad ranging review and analysis of the opportunities and limitations of GIS for the understanding and modelling of local-global processes touches on many themes.
They include data issues, such as the difficulty of collecting the requisite data at a global scale; conceptual issues such as understanding and modelling the effects of human activity across a range of scales, local, regional, and global; and methodological issues in integrating different types of models with complex feedback mechanisms over time and three dimensions. As the chapter points out, the areas where further research is needed are many, yet there is little doubt that the convergence of different media and analytical tools offers real opportunities for new insights. Hans-Peter Bähr in Chapter 22 focuses on a social-science application of remote sensing and GIS, urban analysis. Hans-Peter acknowledges at the very beginning that the combination of remote sensing and GIS has considerable merits for urban analysis, particularly with the arrival of high resolution satellite data but also by way of the more traditional airborne imagery. In particular remote sensing offers detailed up-to-date information which overcomes a traditional problem of urban analysis, obsolete data. Other advantages are that remote sensing data may be taken upon request according to the need of the user who can specify parameters and variables, and that it is possible to handle large amounts of data. The opportunities for exploiting these data sources and techniques are therefore very significant. Hans-Peter also identifies a number of research challenges and gives some examples to illustrate both opportunities and constraints. Victor Mesev in Chapter 23 reinforces the message of the previous chapter by pointing out that according to the Official Journal of the European Communities, nearly 90 percent of the European population will soon be living in urban areas. Hence being able to analyse and model urban areas accurately is of critical importance.
To do so, Victor proposes a cohesive two-part strategy that links the measurement of urban structures with the analysis of urban growth and density change. To obtain reliable measurements of urban structures Victor links conventional image classifications with contextual GIS-based urban land use and socio-economic information. The spatial analysis part of the strategy builds on the work by Batty and Longley (1994) who showed that fractals are a convenient way to represent the shape and growth of
urban settlements across space and time. The examples provided demonstrate the value of developing integrated urban models using RS and GIS, an area in which we will certainly see much more work in the future. Moving from the urban to the rural environment, Eléonore Wolff uses a combination of remote sensing techniques and GIS to delineate landscape zones and analyse agrarian systems in developing countries. Agrarian systems characterise space through the interplay of production factors (labour and investment) and products such as crops and livestock. Because of their complexity they tend to be studied through extensive ground surveys which then makes it difficult to generalise patterns at the regional scale. To overcome these limitations, Eléonore utilises raw high resolution remotely sensed data to delineate numerically homogeneous landscape zones which can then be used to generalise the results of local household surveys. Her method involves standard remote sensing techniques which are however applied to complete high resolution scenes to achieve useful analytical results. The starting point for Julia Seixas in Chapter 25 is that we are still a long way from being able to identify and assess the processes that take place in the environment which we attempt to capture by remote sensing. This is due to the lack of knowledge of environmental processes at the spatial resolution of the sensors (30 units or less), and the huge quantity of data associated with temporal series. To address these issues, Julia proposes a methodology inspired by exploratory data analysis to deal with spectral data and identifies in her study the spatial-temporal patterns of a desertification process. The methodology proposed characterises the statistical landscape of remotely sensed images using a kernel-based approach to measure the spatial association of values in the image and their variability over space (variance). 
From here time is integrated in the analysis by measuring the temporal average process and the temporal variability, hence developing an integrated spatial-temporal assessment of the association and variances of the spectral values. Although further extensions to this method are needed, the good results achieved in the presented case-study indicate that this is a highly promising route to follow. Chapter 26 by Milind Kandlikar returns to the role of GIS in the integrated assessment of global change discussed by Goodchild in the opening chapter of this section. His particular interest is in the opportunity that GIS offers to handle multiple scales and sectoral aggregations, thus enabling different stakeholders to have a say in integrated assessment models of climatic change. These models, he argues, are a useful framework to synthesise knowledge from various disciplines. However, because they traditionally have been used at the global scale, they have tended to ignore local concerns and different viewpoints. To help overcome these limitations, Milind argues that GIS can contribute to an integrated assessment by providing geographical specificity, improving data quality and model validation, and coupling data capture and visualisation with the integrated models.

1.4 PART FOUR: METHODOLOGICAL ISSUES

This section is opened by Jonathan Raper who gives a thoughtful overview of the many research challenges still to be faced in the handling of space and time within GIS. The chapter focuses on the efforts needed to develop richer representations for the analysis of spatial socio-economic units, and the discussion provides an opportunity to address some very fundamental philosophical, cognitive, social, and analytical issues which are central to making progress in this field.
Spatial units are many and functional to many uses, but they can be broadly identified as being along a continuum with symbolic, transient and diffuse units at one end (such as the concept of neighbourhood or vista), and instrumental units at the other end which are largely permanent and concrete (such as property parcels). Most GIS tend to handle a sub-set of the latter
type, i.e. spatial units which are non-overlapping and filling all the available space. They also focus only on their geometry rather than the social and political processes that govern their existence and behaviour. Other types of units have yet to be properly addressed both in terms of their conceptualisation and formalisation in a computer environment. This is a limitation which can no longer be sustained if GIS is to contribute further to our understanding of real life phenomena in space and time. Eric Miller, in Chapter 28, echoes these views and suggests that an additional difficulty in understanding real life phenomena lies in the collection of a sufficient number of spatial and temporal observations at different scales. Whilst simple assumptions about isotropic space enable the use of variograms to handle sparse data points, the reality of natural processes is characterised by multi-dimensional data, sparse data sampling, statistically biased sampled data and irregular or anisotropic locations. To handle these difficulties, Eric suggests extending geostatistical kriging to the spatio-temporal domain to estimate unsampled values and variances in both space and time, and clearly explains how this approach may operate. This is a very useful contribution which definitely deserves further focused treatment. In Chapter 29 Felix Bucher reports on research that he and his colleagues at the University of Zurich have been carrying out on the extension of exploratory data analysis to assist in the identification of an appropriate interpolation model when sampling phenomena are described by continuous fields. The proposed strategy aims at formalising and structuring the whole decision-making process to overcome the traditional problems encountered in the selection of an interpolation model.
In particular, attention is given to the magnitude and manner of first-order and second-order spatial variations in the data being interpolated, the suitability of secondary data to assist the modelling of either form of spatial variation, and the assessment of data properties. The integrated nature of the approach proposed is its most significant characteristic which also provides a basis for possible implementation. Thus far this strategy has focused on spatial data, and the extent to which it can also be extended in the spatio-temporal domain is an open research question. Dawn Wright in Chapter 30 illustrates an application of GIS to support research on the impacts of submarine hydrothermal venting. The Vents Programme discussed in this chapter collects a very wide range of data from a variety of sources spanning temporal and spatial scales from decades to milliseconds and from millimetres to hundreds of kilometres. Hence a key issue is the design of a semantic model able to capture and link the information content of these different data sources independent of their internal data structures. Moreover, the model must offer simplicity of use and communication capabilities among many scientists of diverse backgrounds. GIS offers the core implementation platform for the proposed model which is clearly described by Dawn. As she acknowledges, the existing framework has already provided a useful base but now needs to be extended from an essentially 2-D environment to 3- and 4-D, hence addressing in part many of the challenges identified in the introductory chapters of this section. In Chapter 31, Adrijana Car focuses on spatial information theory and specifically on the use of hierarchies in the cognitive process to structure space. The understanding of how spatial hierarchies are formed and used, she claims, is one of the most important questions in spatial reasoning research today. The research undertaken by Adrijana uses wayfinding on a network as the case-study. 
Two examples are described, a hierarchical network and a flat one, and the appropriate ontologies made explicit. The comparison of the results indicates the value of the approach as the hierarchical algorithm often produces longer paths but is still preferred by the driver as it is perceived to be “faster”. The strength of this chapter lies in its development of a conceptual model for the hierarchy of space, an efficient hierarchical algorithm and an understanding of the underlying heuristic. All are necessary to formulate executable specifications in a GIS and expand the range of GIS applications.
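The contrast between a flat and a hierarchical search can be illustrated with a small sketch. It is not the algorithm developed in Chapter 31: the two-level road classification, the toy network, and the function names `flat_length` and `hierarchical_length` are all assumptions made for illustration. The flat search considers every edge, whereas the hierarchical one reaches the nearest higher-level junction on local roads, travels only on the higher level, and then descends to the destination.

```python
import heapq

INF = float("inf")


def build_graph(edges):
    """Build an undirected adjacency list from (a, b, length, level) tuples."""
    graph = {}
    for a, b, length, level in edges:
        graph.setdefault(a, []).append((b, length, level))
        graph.setdefault(b, []).append((a, length, level))
    return graph


def dijkstra(graph, source, allowed_levels):
    """Shortest distances from source using only edges of the allowed levels."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, INF):
            continue  # stale queue entry
        for neighbour, length, level in graph.get(node, []):
            if level not in allowed_levels:
                continue
            candidate = d + length
            if candidate < dist.get(neighbour, INF):
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist


def flat_length(graph, origin, destination):
    """Flat search: every edge is eligible, whatever its level."""
    return dijkstra(graph, origin, {"local", "highway"})[destination]


def hierarchical_length(graph, origin, destination, junctions):
    """Hierarchical search: local roads to the nearest junction, highway only
    between junctions, local roads from the last junction to the destination."""
    from_origin = dijkstra(graph, origin, {"local"})
    from_destination = dijkstra(graph, destination, {"local"})
    entry = min(junctions, key=lambda j: from_origin.get(j, INF))
    leave = min(junctions, key=lambda j: from_destination.get(j, INF))
    on_highway = dijkstra(graph, entry, {"highway"})
    return from_origin[entry] + on_highway[leave] + from_destination[leave]


# Hypothetical toy network: a chain of local roads A-B-C-D plus a highway H1-H2.
EDGES = [
    ("A", "B", 2.0, "local"), ("B", "C", 2.0, "local"), ("C", "D", 2.0, "local"),
    ("A", "H1", 1.0, "local"), ("D", "H2", 1.0, "local"),
    ("H1", "H2", 5.0, "highway"),
]
GRAPH = build_graph(EDGES)
```

On this toy network the hierarchical route is longer than the flat one, which mirrors the observation above; whether it would be perceived as faster depends on travel speeds, which the sketch deliberately leaves out.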
In Chapter 32, Jayant Sharma continues the research theme of spatial reasoning and focuses on the problem of formalising and implementing spatial inferences based on qualitative information. As the author explains, inference is the process of combining facts and rules to deduce new facts. Whilst this process may appear trivial to humans it is difficult to formalise the process in automated systems and yet may be extremely useful for searching large spatial databases. Jayant reviews the key concepts underlying spatial cognition and the key role of qualitative spatial information and then develops formalisms for this type of information distinguishing between heterogeneous spatial reasoning in which single spatial relations of different types are considered at each step of the process, and integrated spatial reasoning in which all the relations, such as distance or orientation of objects, are considered simultaneously. The inference of spatial relations is also addressed in Chapter 33 by Stephan Winter. He considers the additional complexity deriving from the uncertain position of objects in space. Rather than handling uncertainty using bands or fuzzy sets, Stephan builds on the 9-intersection model for expressing topological relations and adds metric information of the distance between regions, distance being defined as a function which can be statistically modelled, taking into account both the uncertainty in position and that of relation between objects. The work presented in this chapter is extremely promising and its relevance to GIS data input, managing, analysis and visualisation is clearly identified.

1.5 PART FIVE: DATA QUALITY

The section on Data Quality opens with an authoritative overview by Henri Aalders and Joel Morrison who have both been directly involved in various international efforts to define quality standards for digital data.
Their opening remarks set the scene in respect to the challenge being faced by traditional map producers as well as by users in the new world of digital spatial data. Whilst in the past users knew what they were getting from their established suppliers, and the latter had a certain degree of control and confidence over the quality of their product, this model is no longer working in the age of Internet surfing and spatial data downloading and integrating. Producers themselves are losing track of what is out there and the extent to which their products are fit for the purpose for which they are used. With this in mind, the authors report on efforts on both sides of the Atlantic to develop a comprehensive and yet workable data quality model based on seven aspects: lineage, accuracy, ability for abstraction, completeness, logical consistency, currency and reliability. As the authors argue, whilst progress has been made in conceptualising the data quality model, its operationalisation still poses many questions in terms of database complexity and requirements for visualisation. Hence, we are likely to see a major effort in this area of increasingly important research over the coming years. Given the importance of “fitness for purpose”, Susanna McMaster in Chapter 35 presents exploratory research on the impact of attribute data quality on the results of spatial modelling for a specific application, forest management. Sensitivity analysis is the underlying method used by Susanna who reviews the few studies which have used this technique and then details the methodology adopted for her research. Once the spatial model was developed to assess the appropriate management practice in the forest area under study, errors of ±5 percent and ±25 percent were deliberately introduced in three key attribute variables to assess their impact on the resulting suitability maps.
The chapter describes in detail the sequential approach taken and the differential impact of the errors introduced in each variable, thus providing a useful benchmark for other researchers wishing to extend this important area of research into other application domains.
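The logic of such a sensitivity analysis can be sketched in a few lines. What follows is only an illustration under stated assumptions, not the model used in Chapter 35: the three attribute names, the threshold rule in `suitability`, and the sample cells are all hypothetical. One attribute at a time is scaled by a chosen error fraction, and the number of cells whose suitability class changes is counted.

```python
def suitability(slope, soil_depth, stand_age):
    """Hypothetical rule: a cell is suitable only if all three thresholds hold."""
    return slope < 20.0 and soil_depth > 50.0 and stand_age > 60.0


def changed_cells(cells, attribute_index, fraction):
    """Count cells whose suitability flips when one attribute is scaled.

    cells           -- list of (slope, soil_depth, stand_age) tuples
    attribute_index -- which attribute receives the error (0, 1 or 2)
    fraction        -- relative error, e.g. 0.05 for +5 percent, -0.25 for -25 percent
    """
    changed = 0
    for cell in cells:
        perturbed = list(cell)
        perturbed[attribute_index] *= 1.0 + fraction
        if suitability(*cell) != suitability(*perturbed):
            changed += 1
    return changed


# Hypothetical sample of five cells: (slope, soil_depth, stand_age).
CELLS = [
    (10.0, 80.0, 70.0),
    (19.5, 60.0, 100.0),
    (25.0, 55.0, 90.0),
    (15.0, 52.0, 61.0),
    (5.0, 40.0, 65.0),
]

# Errors of the magnitudes quoted in the text, applied to each attribute in turn.
ERROR_LEVELS = (0.05, -0.05, 0.25, -0.25)
IMPACT = {
    (index, level): changed_cells(CELLS, index, level)
    for index in range(3)
    for level in ERROR_LEVELS
}
```

Tabulating `IMPACT` by attribute shows which variable the output is most sensitive to; in a real study the cells would come from the GIS layers and the rule from the management model.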
The experience developed at the French Institut Géographique National in producing digital cartographic products is the context from which François Vauglin in Chapter 36 describes his research on the probability assessment of geometric metadata. Metadata on positional accuracy are created by associating to each coordinate some probabilistic information related to its positional uncertainty. The chapter clearly shows the potential of assessing the probability density function of positional accuracy in GIS and its handling using probabilistic reasoning to have a complete description of the statistical behaviour of positional uncertainty. Although the procedure illustrated is as yet limited to points and lines and needs extending to surfaces, it is a clear contribution to this particular field of research and practice. In Chapter 37, Gerhard Joos focuses on another important aspect of data quality, the consistency of a given data set with its conceptual data schema. As Gerhard argues, it is essential that the rules of the conceptual data schema are understood and tested. This may be done within a GIS if the system is powerful enough to formulate the necessary queries and the rules of the conceptual schema are expressed in a formal language. As this may not be the case in many instances, Gerhard introduces a system-independent language which any GIS user can utilise and edit. Having described the general features of the proposed language, Gerhard also gives specific examples of how to handle overlapping objects, or create new objects from old ones, as evidence of the flexibility this new set of tools offers GIS users. The final chapter of this section discusses data quality in the context of assembling different data sets in developing countries, the result of which may be a pretty map but with highly debatable meaning.
The focus of this chapter is on attribute data quality as information on the other aspects of data quality is often not available in developing countries. Using as a case-study a project on soil erosion in Ghana, Anita Folly clearly highlights the many difficulties faced in operating with data for which the currency and quality are either unknown or very poor. This applies to physical and topographical data but even more so to socio-economic data. One of the key features of the discussion in this chapter is the extent to which the use of GIS with multiple sources of data enabled Anita to improve the quality of the data used as well as documenting it, thus providing an additional service over and above the analytical output of the GIS operations.

1.6 PART SIX: VISUALISATION AND INTERFACES

Keith Clarke in Chapter 39 opens the last section of the book on visualisation and user interface issues. Appropriately he reviews recent developments in these two converging fields and argues that much remains to be done to enhance the capabilities of current GIS which by all accounts are still very primitive both in respect to visualisation and user interface. Whilst having different origins, computer graphics and software engineering respectively, visualisation and user interfaces also share some common characteristics as both depend on visual reasoning. In this chapter, Keith identifies some of the key research issues relating to visual reasoning and suggests that binary visual processing may be a fruitful route to explore them by focusing on the existence of information flows rather than attempting the quantification of individual and collective human cognition. In the concluding section, Keith argues that in the future age of Internet Multimedia there will be many opportunities to use more than just vision to analyse and interpret data and that maybe it is time that GI science focused more on the technical barriers to these futures than the deep issues of spatial cognition.
In Chapter 40, Aileen Buckley gives an excellent overview of the field of computer visualisation and the opportunities that it increasingly affords for “new ways” of looking at the world. As she argues, visualisation might be the foundation for a second computer revolution by providing scientists with new methods of inquiry and new insights into data rich environments. This chapter stands out for the clarity with
which it introduces the key terminology and concepts in the field of computer visualisation drawing on the essential literature. What makes it even more topical is the extent to which the geographic metaphor is increasingly becoming a key searching, retrieving, and viewing mechanism for non geographic data, opening new fields of application in the era of digital libraries. The use of metaphors for interface design is discussed in depth in Chapter 41 by David Howard who has been working on a hierarchical approach to user interfaces for geographic visualisation. Metaphors can be extremely helpful devices for both users and system developers to interact with computer based systems, and the example of the Macintosh and Windows95/NT desktop metaphor is often referred to as a model most users have become familiar with. Whilst offering many opportunities, metaphors have certain problems as well, in particular for their lack of precision. They are by necessity incomplete representations of the system they explain and therefore exact semantics are difficult to convey. Moreover, it is difficult to find metaphors that can be used for multiple types of applications. Spatial metaphors have additional problems of their own as it may occur that users confuse the metaphor with the spatial data itself. In spite of these issues, David argues that the combination of a hierarchical approach to interface design that clearly distinguishes between the conceptual, operational, and implementation levels, together with the judicious use of metaphors, is the way forward to simplifying the relationship between users and increasingly sophisticated systems. A much more radical approach is advocated by Jochen Albrecht in the concluding Chapter 42. Rather than incremental improvements, Jochen argues that GIS should be rethought from the bottom up focusing on what distinguishes them from other software such as automated mapping, namely their ability to perform data analysis. 
By contrast, Jochen argues that most GIS are still overwhelming users with functions that deal with map making, thus obscuring the core analytical functions. With this in mind, Jochen surveyed over 100 GIS users to identify the key “universal” analytical functions that should be at the basis of a new form of analytical GIS independent of data structures. Twenty sets of key operations emerged from this user requirement analysis, which Jochen clusters into six groups: search, locational analysis, terrain analysis, distributions/neighbourhood, spatial analysis, and measurement. Of course, some of these groupings could be debated as the author acknowledges, but the strength of this approach is that it is then possible to build a user interface, directly addressing the core needs of the user, thus facilitating their analytical tasks. The progress made in this direction is well documented in this chapter which may represent one of the most promising research focuses in this field.

1.7 SUMMARY

As the brief overview above indicates, this book covers a wide range of core research issues across many disciplines. In some instances, such as the relationships between GIS and society, the issues raised are not specific to geographical information but are more deeply embedded into the current power structures in society or relate to the broader transition from the industrial society to one increasingly based on information processing. This blurring of distinction between GI and information in general is however not necessarily negative as it demonstrates that GI researchers are also moving out of their niche and addressing broader concerns in society bringing their specific expertise to the fore.
As argued by Ian Masser in his Postscript to this volume, GI research is coming into mainstream social and environmental science research in the same way as GI and related technologies are becoming more pervasive in society, moving from dedicated research machines to the desk-top, and increasingly being diffused among individuals rather than being confined to organisations.
At the same time as GI loses its special status, we are also starting to see an increasing use of the geographic metaphor to search, retrieve, and visualise non-spatial data. Digital libraries were not specifically addressed in this volume but there is little doubt that they will dominate future developments and that many of the research topics addressed in this volume, which are still within the realm of specialists in the field, will underpin the applications that may become commonplace to non-specialists in the next century. Internet based spatial querying and analysis may be one of the ways in which individuals will discover geography in the future. To them issues of data quality, visualisation and interfaces based on a closer understanding of human cognition or the ability to search through space and time and integrate different types of data sources and models, may be something to be taken for granted. To get there however, there is still much to be done, not only from a technical perspective but also from methodological and data-related perspectives. Moreover, the social and spatial impacts of these developments will need close evaluation to ensure that not only the opportunities but also the costs are in full view of public scrutiny. For the time being, we can take heart as editors of this volume that so many bright researchers from so many different backgrounds are applying their minds and talent to address the core questions identified in this book and are contributing to the further development of this important field.

REFERENCES

ABLER, R. 1987. The National Science Foundation National Center for Geographic Information and Spatial Analysis, International Journal of GIS 1(4), pp. 303–326.
ARNAUD, A., CRAGLIA, M., MASSER, I., SALGÉ, F. and SCHOLTEN, H. 1993. The research agenda of the European Science Foundation’s GISDATA scientific programme, International Journal of GIS 7(5), pp. 463–470.
BATTY, M. and LONGLEY, P.A. 1994. Fractal Cities: A Geometry of Form and Function. London: Taylor & Francis.
CRAGLIA, M. and COUCLELIS, H. (Eds.) 1997. Geographic Information Research: Bridging the Atlantic. London: Taylor & Francis.
HOFSTEDE, G. 1980. Culture’s Consequences: International Differences in Work-Related Values. Beverly Hills: Sage Publications.
PICKLES, J. (Ed.) 1995. Ground Truth: the Social Implications of Geographic Information Systems. New York: Guilford Press.
Part One GI AND SOCIETY: INFRASTRUCTURAL, ETHICAL AND SOCIAL ISSUES
Chapter Two Spatial Information Technologies and Societal Problems Helen Couclelis
2.1 INTRODUCTION

The “and” in the title is tantalising: it establishes a connection between the phrases “spatial information technologies” and “societal problems” on either side of it, but what kind of connection this might be is anybody’s guess. Is it a complementary one, as in “bread and butter”? Does it indicate co-occurrence, as in “wet and cold”, or necessary succession, as in “night and day”? Is it a causal relationship, as in “fall and injury”, a normative one, as in “crime and punishment”, a confrontational one, as in “David and Goliath”, or is it more like “doctor and patient”, “question and answer”? Does it matter which of the two phrases comes before “and”? Linguists can have fun digging into the semantics of this simple-looking title. Maybe all these possible meanings make sense. For the purposes of our discussion here I will single out just two contrasting interpretations, as these are at the centre of most debates on the issue of the societal dimensions of spatial information technology:

• Thesis 1: Spatial information technology causes societal problems.
• Thesis 2 (Antithesis): Spatial information technology helps alleviate societal problems.

When a thesis and its antithesis can both be rationally defended we know that we have a highly complex issue on our hands. In cases like this any stance that does not give credit to its opposite is bound to be simplistic. Here, it is clearly equally naïve to claim that spatial information technology is plain evil, as it is to see it as the panacea that will help lay all kinds of societal problems to rest. This elaborate debate was recently taken up by Initiative 19 of the US National Center for Geographic Information and Analysis (NCGIA), entitled “GIS and Society: The Social Implications of How People, Space, and the Environment Are Represented in GIS”. Many of the ideas in this chapter originated through my involvement with I19 and my discussions with participants at the specialist meeting in March 1996.
After giving a quick overview of the goals of I19, I will present my own typology of issues arising at the interface of spatial information technologies and society. I will then propose a framework to help see these very diverse issues against the background of a small number of domains inviting further research and reflection. What I am aiming at is another interpretation of the “and” in the above title, one that views spatial information technologies and society, despite the many problems and tensions, as inextricably linked.
2.2 I19: GIS AND SOCIETY

The idea for I19 was conceived in late 1993 during an NCGIA-sponsored meeting held in Friday Harbor, WA, USA. Stringent critiques of GIS by geographers working from the social theory perspective had started appearing in the literature, and there were fears that the two fields would become increasingly alienated (Taylor, 1990; Pickles, 1995). The purpose of the meeting was to bring together a number of GIS researchers and critical theorists, and let them sort out their differences and begin talking with each other. To almost everyone’s surprise, the meeting was a great success, thoroughly constructive, and there was widespread agreement that substantial follow-up efforts needed to be undertaken that would allow the two sides to continue working together. The proposal for I19 was the most tangible outcome of the Friday Harbor meeting, and it has already produced a meeting of its own. There was also a special issue of the journal Cartography and GIS, devoted to a number of expanded papers from the workshop (Sheppard and Poiker, 1995). As described in the proposal, the focus of I19 is on the following conceptual issues (see Harris and Weiner, 1996):

1. In what ways have particular logic and visualisation techniques, value systems, forms of reasoning, and ways of understanding the world been incorporated into existing GIS techniques, and in what ways have alternative forms of representation been filtered out?
2. How has the proliferation and dissemination of databases associated with GIS, as well as differential access to spatial databases, influenced the ability of different social groups to utilise information for their own empowerment?
3. How can the knowledge, needs, desires, and hopes of marginalised social groups be adequately represented in GIS-based decision-making processes?
4. What possibilities and limitations are associated with using GIS as a participatory tool for more democratic resolution of social and environmental conflicts?
5. What ethical and regulatory issues are raised in the context of GIS and Society research and debate?

These conceptual issues are addressed in the context of three research themes:

• the administration and control of populations;
• location conflict involving disadvantaged populations;
• the political ecology of natural resource access and use.

As was to be expected, the discussion at the I19 specialist meeting ranged well beyond the themes outlined in the proposal. By the end of the three days, four new research proposal outlines had emerged that were subsequently expanded, approved and funded:

1. The social history of GIS.
2. The ethics of spatio-visual representation: towards a new mode.
3. A regional and community GIS-based risk analysis.
4. Local knowledge, multiple realities and the production of geographic information: a case-study of the Kanawha Valley, West Virginia.
2.3 GIS: FROM NUMBER-CRUNCHING TO ONTOLOGY

The success of I19 thus far promises that interest in research at the interface of GIS and society will continue to grow and that the technical and societal perspectives on spatial information will continue to enrich each other. Clearly however I19 cannot account for all the work already going on in that general area, nor for all the theoretical and applied questions that could be investigated. In the following I will present a piece of my own perspective on these issues. Going back to the thesis and antithesis put forward at the beginning of this chapter, I will argue that the positive and negative effects of spatial information technologies on society are two sides of the same coin, and that both promoters and users of these technologies need to be constantly alert to the ethical issues always lurking just below the surface. This is not the same as saying that spatial information technologies are value-neutral tools that may be used for either good or bad purposes: rather, it is a recognition that, being thoroughly embedded in the values and practices of society, they share in the ethical dimensions of corresponding debates. To focus the discussion somewhat I will reduce “spatial information technologies” to “geographic information systems”. GIS broadly understood can encompass technologies such as remote sensing and image interpretation, global positioning systems (GPS), and the spatial databases available over the Internet, so that there is no significant loss of generality. Still, we have hardly broached the complexity of the issue, since GIS from a societal viewpoint can mean any or all of the following:

• A technology for storing, combining, analysing, and retrieving large amounts of spatial data.
• A set of technical tools for the representation and solution of diverse spatial problems.
• A commodity that is produced, sold and bought.
• A man-machine system.
• A community of researchers and workers forming around that technology.
• A set of institutional and social practices.
• A set of conventions for representing the geographic world.
• A way of defining the geographic world and the entities within it.
Two things are notable about this list. First, ranging from the technically mundane to the philosophical, it is probably more involved than most such lists would be for other technologies. Second, most of the items on it, up to the last two, would also appear on a corresponding list for information technology in general. I personally believe that the societal issues that distinctly characterise GIS are to be found primarily in the last two aspects, the representation of the geographic world, and the associated ontology. This is also why representation was the key concept in the title of I19, even though the discussions ranged over a much broader spectrum of questions. The other items on the list do of course also have significant societal dimensions that are specific to GIS and spatial information, but these tend to be variants of similar issues (access, copyright, skills, power, empowerment, democracy, privacy, surveillance, etc.) arising from the broader problematic of modern information technology. I thus personally consider these last two aspects the most challenging—not necessarily because the associated problems are the most significant, but because unless we geographers and GIS researchers and practitioners grapple with them, no-one else will.
2.4 ISSUES IN THE RELATION OF GIS AND SOCIETY

I will now go through the list of eight aspects identified above, trying to highlight for each one of them both the positive and the problematic dimensions. My goal is to shake any complacent belief that GIS is either pure blessing or unmitigated calamity for society.

1. The simplest, most straightforward aspect of GIS from a societal viewpoint is its data handling capability. In both research and application the possibility to manipulate and analyse vast amounts of spatial data relatively easily, quickly, and cheaply, has not only greatly facilitated traditional activities such as map-making and cadastral maintenance, but has permitted scores of new useful data-intensive applications to be developed. On the other hand, running a GIS requires considerable skills that can only be acquired through substantial specialised training: is the necessary training really open to anyone who might have been employable under the old modes of operation, or does it exclude otherwise competent people who may not have social or geographic access to such training? Questions also arise regarding the displacement of workers lacking these skills by new, usually younger ones, which may create both human problems of employment and problems of lost institutional memory and experience. Further: are those performing the skilled data manipulations also the ones who best understand what the manipulations are for? What are the risks of separating the technical from the substantive expertise on a subject? A last question is whether the strong new emphasis on spatial data brought about by the introduction of GIS may not obscure other valuable ways of looking at problems under some circumstances. As with other information technologies (perhaps more so, because it is colourful and eye-catching), GIS is often seen as “signal and symbol” of efficiency in organisations (Feldman and March, 1981). To what extent that perception corresponds to reality is a question that can only be answered case by case.

2. Perhaps the most widely promoted view of GIS is that of a problem-solving technology applicable to a wide variety of spatial problems in both real-world situations and in research. Unquestionably GIS is that, judging from the myriad of applications world-wide in areas as diverse as planning, transportation, environmental conservation, forestry, medical geography, marketing, utilities management, the military, and so on. Still, there are questions: Whose problems are being solved? Who defines these problems? Who is involved in the generation of solutions? Who evaluates these solutions, and by what criteria? For whom are these solutions valid and good? How do alternative approaches to problem solution (or resolution) fit in? It is difficult to accept that, in a pluralistic and unequal society, a single problem-solving perspective based on the electronic manipulation and display of spatial data may give substantively good results in all the problem situations to which it is technically applicable. From a societal viewpoint, knowing when to use GIS and when to leave it alone may be the basis of good problem-solving.

3. There are several heart-warming examples of GIS being used by native peoples in the middle of the tundra or in disadvantaged neighbourhoods in the middle of inner-city jungles. These examples speak for the wide accessibility of GIS resulting from the phenomenal growth and diffusion of the technology in the past 15 years or so, and the intelligent, responsible, and dedicated work by GIS researchers and practitioners alike. This does not change the fact that GIS is a commodity made up of costly software, hardware, and databases, produced for the most part by a lucrative private industry operating under market constraints. As with any other commodity, however wide-spread, there are those who have it, and those who do not. Even free access to data, a thorny, disputed issue, would not make GIS a free good, even without taking into account the substantial investments needed to train skilled operators.
Access to GIS is unlikely ever to be evenly distributed across the population, or the globe, and just as the rich tend to get richer, the knowledgeable tend to get more knowledgeable, leaving in their wake a growing proletariat of ignorance. These problems of access are compounded by issues of intellectual property, copyright, questions of data quality and appropriate system selection, so that even those who can afford GIS cannot be sure that they got the product they think they are paying for. The cacophony of vendors promoting their own wares in a competitive market has landed many a good system on the wrong desk, resulting in a waste of scarce resources and a substantial opportunity cost for agencies or businesses hoping to jump on the GIS bandwagon.

4. Perhaps the greatest advantage of GIS over traditional methods of spatial data manipulation and analysis lies in the interactive coupling of a human intelligence with the computational power of a machine, and in the intuition-boosting cognitive appeal of the visual representations produced through that interaction. On the surface this efficient man-machine system seems devoid of societal implications, until we notice that this whole interactive process is channelled through a user interface replete with metaphor and convention. Does the desktop metaphor of files and folders mean much to those who have never used a desk? Are the colours conventionally used for land, water, and vegetation just as intuitively obvious to the Tutsi as to the Inuit? Is a world made up of static polygons as comprehensible to the nomadic herder as to the urban dweller? There are also things that a man-man (sorry! person-person) system can do that a man-machine system cannot. GIS interfaces are typically designed with the (isolated) user in mind, whereas answers to spatial problems are most often arrived at by large numbers of different people interacting over time within relationships that may be in turn collaborative or adversarial. Attempts to develop GIS for collaborative decision support are on the right track, provided they take into account that there is much more to real-world decision making than amicable collaboration among technically-minded peers.

5. I mentioned earlier the potential for exclusion inherent in the GIS requirement for specially trained, technically proficient personnel. The other side of the exclusion issue is the formation of a closed subculture of GIS practitioners and researchers, with its own journals, meetings, e-mail lists, social networks, associations, and agendas. The existence of such a distinct GIS subculture is not in itself a bad thing, and is probably necessary for enhancing the professional identity and profile of the speciality as well as for allowing its members to remain on the cutting edge of intellectual and technical developments in the area. However, all such professional subcultures run a risk of internal homogenisation that can stifle true innovation, all the more so when they are relatively young and have not had the time to branch out towards diverse directions. (Just think of the number of GIS practitioners around the globe whose education has been based on the NCGIA core curriculum!) The premature domination of an orthodoxy within the GIS community would be particularly damaging considering the unusually wide range of actual and possible applications of the technology, and the ensuing need for a healthy variety of substantially different perspectives and practices. Other aspects of the GIS subculture that have caused some concern among critical theorists are its purported male-dominated character, and what some people have perceived as the arrogant messages of global oversight and control inherent in the vendors’ advertising slogans and imagery (Roberts and Schein, 1995).

6. The societal aspect of GIS that has attracted the most attention to date is the fusion of that technology with contemporary social practices and institutions, to the point of becoming a set of social practices itself. Within very few years GIS has infiltrated government and business, industry and academia, the environmental movement as well as the military, has been adopted by grassroots movements and native population groups, and has profoundly affected both the practice and self-definition of geography and the perception of the discipline by the public at large. “Geography’s piece of the information
revolution” raises a host of important issues similar to those raised by the information revolution in general, but distinguished by “the spatial twist”: issues of access, power, empowerment, democracy, political decision-making, consumer sovereignty, social justice, equity, privacy, surveillance, control; questions about who gains and who loses, what is the opportunity cost, how can we avoid confusing what is desirable with what is possible with the new technology, what should the role of research be in fostering an enlightened practice. These issues dominated the discussions at the I19 meeting as well as the literature on GIS and society that has appeared to date, and will clearly continue to do so for some time. They involve truly “wicked” problems that can never be solved once and for all, even though specific instances are always amenable to thoughtful, creative handling.

7. How the geographic world is represented in GIS is a question that has no counterpart in other forms of information technology. It is an issue specific to GIS and one that concerns all those working in the field, whether their interest is mainly technical or applied to either environmental or social-science problems. The issue is that GIS represents the world in a particular way, and that this is by no means the only way possible. Questions thus arise regarding likely errors of both commission and omission: What are the biases in this abstract digital model of the world underlying GIS, based on information technology, geocoded measurement, and the cartographic tradition? Whose view of reality is reflected in the resulting computer images and possible modes of their manipulation? What interpretations and solutions are supported by these representations? And conversely: what alternative forms of knowledge and ways of understanding the world are excluded from this view, what questions cannot be asked within it, what realities cannot be seen, whose voices are being silenced? Critics have stressed in particular the strong GIS bias towards the visual, leading, in Gregory’s (1994) words, to a view of the “world as exhibition”: a world reduced to what can be displayed, in this case in the form of either fields or objects.

8. The step from representation to ontology is a small but crucial one. It is the step from seeing the world as if it were as represented, to thinking it actually is as represented. It is the map becoming the territory, the representation determining what exists, how it all works, and what is important. In the case of GIS, it is not just the world represented in a particular kind of interactive visualisation: it is the world where what is visualisable in these terms is all that exists: the ultimate WYSIWYG (What You See Is What You Get: if you don’t see it, it’s not there). I have commented elsewhere on the power of GIS, along with other electronic media, to generate what Mitroff and Bennis (1989) call “unrealities”: Unreality One, where the unreal is made to look so much like the real that you can no longer tell the difference; and Unreality Two, where the unreal becomes so seductive that you no longer care about the difference (Couclelis, 1996). While the choice of any particular mode of representation necessarily constrains what questions can be possibly asked and answered, unwitting or conscious adoption of the corresponding ontology moulds permanently restrictive habits of mind. It would be a great loss, for our speciality as well as for society, if “the world as exhibition” actually became the only GIS world there is.

2.5 THE CONTRIBUTION OF GEOGRAPHIC INFORMATION SCIENCE

As GIS researchers and practitioners working on specific technical and scientific problems, we may sometimes wonder how our efforts might relate to the above very general concerns.
As responsible human beings and citizens we would like our work to be socially useful or at the very least not to harm people, even though we know that what we do as conscientious professionals often takes on a life of its own once it leaves our hands. The bravest among us have taken their GIS to the streets and had it applied to whatever
Figure 2.1. The geographic triangle
societal cause they considered most deserving. Others take comfort from the fact that the technical problems they are solving are sufficiently removed from the social arena so as not to pose a threat to anyone. Yet it should have become clear from the preceding discussion that the societal implications of GIS, whether positive or negative, do not derive from any particular kind of work, any particular individual choices, but from the whole nexus of technological, historical, commercial, disciplinary, institutional, intellectual and ideological conditions surrounding the development of the field. The effects of GIS on society are thus emergent effects of the aggregate, not of any of its individual parts, and as such we all share modestly but equally in the responsibility for whatever good or bad results. We thus cannot say that we will leave the worrying to just those interested in the social-science and policy applications of GIS, and to the critics. But how can we relate the particular data model, spatial query language, or generalisation technique we may be working on to a laundry-list of societal issues such as the ones discussed earlier? In my view, if such a thing as geographic information science exists, it would not be worthy of the name unless it can help us make that connection. I recently proposed a conceptual framework for geographic information science that would pass that test (Couclelis, 1998). Here is the idea in a nutshell: As members of the human species living on the earth we all share a fundamental knowledge of the geographic world, a knowledge that has, at the very least, empirical, experiential, and formal components, the difference between the first two being roughly that between explicit and implicit or intuitive. These three perspectives on geographical knowledge form the vertices of what I call the geographic triangle (Figure 2.1).
Connecting the empirical and experiential are geographic concepts; connecting the empirical and formal are geographic measurements; and the fusion of the experiential and formal perspectives gives rise to spatial formalisms, that is, geometry and topology. The geographic triangle itself is best represented by the quintessential geographic instrument, the map. GIS, by continuing the cartographic tradition, is thus well grounded in the geographic triangle. In recent decades alternative approaches to geography have emerged out of the wider critical theory and political economy movements that have been flourishing in the social sciences. These have strongly contested the value of the formal perspective for human geography in particular and have developed their
Figure 2.2. The ‘social’ vertex: a competing geographic triangle?
own non-quantitative discourses heavily critical of the limitations of, and even threats to society posed by, formalist, quantitative, technology-based methodologies. The critique of GIS as reflected in Pickles (1995) and similar writings, which prompted the development of I19, derives almost entirely from these alternative perspectives on geography. The resulting tension between the mainstream and the “new” geographies may be illustrated in Figure 2.2, where a “social” vertex now defines a very different kind of geographic triangle. Confirming what we know from decades’ worth of literature, there seems to be no common ground between the two kinds of geography. But free that scheme from the flatness of the plane, and see what happens when the two kinds of approaches are allowed to enrich each other! (Figure 2.3). A new dimension is added to both, and a number of critical connections suddenly become evident. The resulting solid gives rise to three more edges and three more triangles. Here is what I suggested the new edges mean. Connecting the social and the empirical are the geographic constructs overlaid on the surface of the earth: the boundaries and the territories, the functional regions and protected areas, the private and public spaces, the neighbourhoods and natural ecosystems. Connecting the social and the experiential are the alternative perspectives on the geographic world born out of the diverse kinds of social and cultural experiences. And I propose to you that what connects the social and the formal in this context is none other than geographic information: the formal end makes possible the GIS outputs (the maps, the TINs, the animations), while the social provides the intentional stance which alone can give meaning to what otherwise would have been nothing but a bunch of fleeting colour pictures.
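
The structure just described can be checked with a small sketch. The Python below is illustrative only: the vertex names and edge meanings are taken from the text, while the identifiers (VERTICES, EDGE_LABELS, edges, faces) are assumptions introduced here for the example and are not part of the framework itself.

```python
from itertools import combinations

# The four perspectives named in the chapter; "social" is the added vertex.
VERTICES = ("empirical", "experiential", "formal", "social")

# Meaning assigned in the text to each connection between two perspectives.
EDGE_LABELS = {
    frozenset({"empirical", "experiential"}): "geographic concepts",
    frozenset({"empirical", "formal"}): "geographic measurements",
    frozenset({"experiential", "formal"}): "spatial formalisms",
    frozenset({"social", "empirical"}): "geographic constructs",
    frozenset({"social", "experiential"}): "alternative perspectives",
    frozenset({"social", "formal"}): "geographic information",
}


def edges(vertices):
    """Every unordered pair of vertices, i.e. every edge of the figure."""
    return [frozenset(pair) for pair in combinations(vertices, 2)]


def faces(vertices):
    """Every unordered triple of vertices, i.e. every triangular face."""
    return [frozenset(triple) for triple in combinations(vertices, 3)]


# Edges and faces that exist only because the social vertex was added.
new_edges = [e for e in edges(VERTICES) if "social" in e]
new_faces = [f for f in faces(VERTICES) if "social" in f]

# Every edge of the solid has a meaning in the text, and none is left over.
assert set(edges(VERTICES)) == set(EDGE_LABELS)

for edge in new_edges:
    print(" + ".join(sorted(edge)), "->", EDGE_LABELS[edge])
```

Counting the pairs and triples that contain the social vertex gives three of each, which matches the statement that the solid adds three edges and three triangles to the original geographic triangle.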
Even those of us working on natural-science applications of GIS should recognise the extent to which the questions we find worth asking, the answers we deem acceptable, and the interpretation of what we do are socially conditioned. Science too is a societal enterprise: the social vertex is part of what we all do! I am thus arguing for a thorough integration of the societal perspective into geographic information science. This certainly does not mean that every one of us should or could start directly exploring societal
Figure 2.3. The tetrahedron of geographic information science
questions in their research. But it does mean that all of us should be aware that these multiple connections between GIS and society exist, and be prepared to occasionally engage in an earnest dialogue with those systematically working to explore them. It is the notion of information in GIS that makes the social dimension inescapable. Information presupposes an intelligence that is attuned to what it may signify, and that intelligence is always socially conditioned. There is no such thing as a lone mind dealing with the world outside of a social context. Robinson Crusoe was a single white Anglo-Saxon male!

REFERENCES

COUCLELIS, H. 1996. Geographic Illusion Systems: towards a (very partial) research agenda for GIS in the information age, in Harris, T. and Weiner, D. (Eds.), Report for the Initiative 19 Specialist Meeting on GIS and Society, NCGIA Technical Report 96–8. Santa Barbara, CA: National Center for Geographic Information and Analysis.
COUCLELIS, H. 1998. GIS without computers: building geographic information science from the ground up, in Kemp, Z. (Ed.), Innovations in GIS 4: Selected Papers from the Fourth National Conference GIS Research UK. London: Taylor & Francis, pp. 219–226.
FELDMAN, M.S. and MARCH, J.G. 1981. Information in organizations as sign and symbol, Administrative Science Quarterly, 26, pp. 171–186.
GREGORY, D. 1994. Geographical Imaginations. Cambridge: Blackwell.
HARRIS, T. and WEINER, D. (Eds.) 1996. GIS and Society: the Social Implications of How People, Space, and Environment Are Represented in GIS, Report for the NCGIA Initiative 19 Specialist Meeting, NCGIA Technical Report. Santa Barbara, CA: NCGIA.
MITROFF, I. and BENNIS, W. 1989. The Unreality Industry: the Deliberate Manufacturing of Falsehood and What It Is Doing to Our Lives. New York: Oxford University Press.
PICKLES, J. (Ed.) 1995. Ground Truth: The Social Implications of Geographic Information Systems. New York: Guilford Press.
ROBERTS, S.M. and SCHEIN, R.H. 1995. Earth shattering: global imagery and GIS, in Pickles, J. (Ed.), Ground Truth: The Social Implications of Geographic Information Systems. New York: Guilford Press, pp. 171–195.
SHEPPARD, E. and POIKER, T. (Eds.) 1995. Special issue on “GIS & Society”, Cartography and Geographic Information Systems, 22(1), pp. 3–103.
TAYLOR, P.J. 1990. GKS, Political Geography Quarterly, 9(3), pp. 211–212.
Chapter Three
Information Ethics, Law, and Policy for Spatial Databases: Roles for the Research Community
Harlan Onsrud
3.1 INTRODUCTION

Ethical conduct may be defined from a practical perspective as conduct that we wish all members in society to aspire to but which is unenforceable by law or undesirable to enforce by law. Legal conduct is typically defined by the documented and recorded findings of our legislatures and courts. Ethical conduct is involved in the choices users make in applying geographic information system technologies on a day-to-day basis. However, ethical conduct is also involved in the choices scientists and researchers make in determining which aspects of the knowledge base they help advance. For instance, should researchers put their time and effort into expanding the knowledge base that will help advance systems for allowing stricter control over digital information or should we put our efforts into expanding the knowledge base for systems that will allow greater access to information by larger segments of society? Moral stances may be taken in support of either of these as well as many other propositions. The science of ethics helps us sort out which moral arguments have greater validity than others. No person may reliably predict how basic and even applied research advancements will ultimately affect society. However, as developers of new tools and techniques we should at least be aware of the potential social ramifications of our work so we can make informed and, hopefully, ethically supportable choices when the opportunity to make choices arises. The research outlined in this chapter is concerned with discovering the effects of geographic information technologies on society, observing effects when different choices are made for institutionalising or controlling location technologies and datasets, and developing information policy and legal models not yet tried in practice that might lead to greater beneficial results for society. Let us assume that the large rounded rectangle in Figure 3.1 encloses all societal conduct.
Conduct to a philosopher is typically defined as behaviour that involves choices on the part of the actor. For most conduct, such as whether you choose to wear black or green socks, society makes no public judgement as to your choice of behaviour. For the most part we can assume that the behavioural choices we make in daily living are legal unless specifically defined as illegal by law. The subset of conduct enclosed by the circle in Figure 3.1 represents conduct that society has deemed to be illegal. If one makes a choice defined by society’s laws as being illegal, one is subject to the sanctions prescribed by society for that conduct. From a practical perspective, we may also classify a subset of conduct as being unethical and another set as being ethical. For these two classes of conduct, as with illegal conduct, society does make a judgement as to the rightness or wrongness of your choice of behaviour. Ethical conduct to the non-philosopher is generally
Figure 3.1: Societal conduct (Onsrud, 1995b).
understood to be “positive” or “laudatory” conduct while unethical conduct is understood to involve “wrong” or “bad” choices. Most ethical conduct is also legal. However, certain conduct that might be considered laudatory and ethical by an individual or even a large proportion of society, such as speeding to prevent a murder or helping a person with a terminal and extremely painful illness commit suicide, may be defined as illegal by society. Actions falling in this class are represented by the light grey area in Figure 3.1. Much unethical conduct is also deemed to be illegal by society. However, a significant body of conduct exists that is unethical yet legal. Even though the vast majority of society may agree that certain volitional actions of individuals are unethical, society may also agree that such actions should not be banned. For instance, disallowing certain conduct might overly restrict other conduct that we highly value, or punishing certain non-desired conduct might be too burdensome on society’s resources. It is this body of unethical conduct as represented by the dark shaded area in Figure 3.1 that the remainder of this chapter is primarily concerned with.

3.2 UNETHICAL CONDUCT

In determining whether a proposed action in the use of a geographic information system is considered unethical, one might first resort to philosophical theories as set forth in the philosophy literature for guidance. However, the bounds defining unethical behaviour vary substantially depending on the philosophical arguments one accepts. Even though the rules developed by the great philosophers have many areas of agreement, no one philosophical line of reasoning or single rule seems to have stood the test of time in determining the rightness or wrongness of actions (Johnson, 1984). As such, it is this author’s contention that the bounds between unethical conduct and other conduct should be defined first through real-world practical experiences and methods.
These results then should be checked against the major philosophical lines of reasoning for their extent of conformance or non-conformance and adapted if necessary. In a practical world, in determining whether a proposed action in the use of geographic information systems might be considered unethical, individuals often try to anticipate whether the consensus of a group
or a majority of a group would consider the action to be “inappropriate” or “bad” if the group was made aware of the action. For a specific proposed action, it is seldom possible to take an opinion poll of the general public or of the group that may be affected prior to taking the action. For this reason, many professional groups develop explicit codes of conduct to which they hope all of their members will aspire and against which members of their community may assess proposed actions. Because of its recent emergence, the geographic information science community has no well established professional codes of conduct. However, even if one borrows principles from the codes of similar professional groups, many of the professional codes of conduct constructed in the past have been developed by consulting members of the specific discipline without consulting consumers or, for instance, consulting those who may be subjects in the data sets bought and sold within the discipline. Most codes of professional conduct focus on fair dealings among members of the discipline. This results in biases towards those involved in constructing the codes. What is agreed to be “smart business practices” by a large majority of practising professionals may be considered wholly unethical by data subjects or by the consumers of professional products and services. A highly appropriate role for the geographic information science research community would be to determine those common situations in day to day practice in the discipline that give rise to ethical dilemmas and to assess the responses of those affected by the various alternatives proposed for resolving the dilemmas. Parties such as producers, users, vendors, and data subjects all should be consulted. Any consensus lines arrived at from the responses should be assessed in the light of the leading philosophical theories addressing the appropriateness of human conduct. 
Practical ethical principles and guidelines such as those drawn from broad experiences across many segments of society might also be considered in this process (Kidder, 1995). In this way the geographic information science research community might arrive at recommended actions, if not recommended codes of conduct, that could better benefit those broad segments of society involved in the use of geographic information.

3.3 LEGAL CONDUCT

As stated earlier, legal conduct is typically defined in most modern societies by assuming that all behaviour choices are legal unless specifically banned by law. Thus it is primarily the defining of illegal behaviour that allows us to define legal behaviour. As technologies such as geographic information systems, global positioning systems, other location technologies, and computer networks have advanced and are embedded in everyday life, new actions and choices in behaviour arise that were never envisioned or contemplated by past lawmakers. Opportunities arise for some parties to take advantage of others. Much of the opportunist behaviour promotes competition and economic activity and thus the behaviour is considered positive. Other opportunist behaviour is considered so unfair that past laws are construed to cover the changed circumstances and remedies may be had under principles of equity in the courts. In yet other instances, legislators react by passing new laws to cover those unfair opportunist behaviours that new technologies have given rise to but that cannot be dealt with through the application of current laws. Thus, the law adapts over time. We are at a time in history when rapid changes in technology are causing rapid changes in societal relationships. Because of this, the parties to information policy debates often have very little evidence of the actual effects that proposed information policies or laws will have on various groups or individuals in society.
Arguments that proposed information policies or laws will or will not create fair, just, and equitable balances among members of society are often highly speculative since little or no evidence typically has been gathered to support the truth or falsity of the competing claims. The need for information on the
effects of alternative information policies and laws has created an important role for academics and researchers within the geographic information science community.

3.3.1 Roles for the Geographic Information Science Academic Sector

For important social disputes in which geographic data is involved, the academic sector should help articulate and constructively criticise the arguments presented by or for the various stakeholders in the social disputes. If the private sector information industry is in a pitched battle with local governments over how spatial data should be distributed, the academic sector can fill an important role by listening to all stakeholders in the active dispute, identifying the strongest arguments for each set of voices in the debate, and helping to better construct the logic and persuasiveness of those arguments. The academic sector also needs to expose and articulate arguments in support of disenfranchised groups in society. The academic sector should purposely broaden debates to include voices not actively being heard but that also have a stake in the outcome of information policy debates. Government agencies and the private business sector have greater resources at hand to promote their interests than many others in society. For this reason the academic sector has often taken on the role of citizen advocacy or minority advocacy as a social obligation when those voices otherwise would not be heard. For instance, it is largely the academic sector rather than government agencies or the private business sector that is raising concerns over the adverse impacts on personal information privacy caused by the pervasiveness of geographic data sets and the use of those datasets as tools for the massive integration of data about individuals. The academic community has an extremely important role to play in continually questioning the logic and validity of all arguments presented in information policy debates.
This community needs to establish the truth or falsity of claims by collecting evidence on the ramifications of following one information policy or legal choice over another. The research community can inform important social debates by going out and observing information policy and law in action. If one party claims that the sale of GIS data by government increases the overall economic well-being of a community, and another claims that it harms that well-being, which of the claims actually holds up in practice, and under what circumstances? The academic sector is particularly well suited to test the truth or falsity of claims and to observe information policies and laws in action. This is because its work is typically reviewed by peers with an expectation of full disclosure of study and survey methods. Considered reflection and analysis are the expected norm. If study methods are biased, they are expected to be exposed as such through the peer review process. In many social problem areas, scholars have little personal economic interest or other vested interests in how a social issue might be resolved. Therefore they are often able to judge the evidence, draw conclusions, and make recommendations more dispassionately than those to whom the outcomes matter. In the social arena of information policy debates, however, the academic community as a whole often does have a stake in the outcomes. Thus peer review and scrutiny for bias need to be more vigilant and scrupulous when vested interests of the academic community are evident. Finally, one of the most important roles for the academic community is to advance new knowledge. The research community should purposely set out to construct and recommend new or unexplored models of action or regulation that may better achieve the goals of all stakeholders in important social disputes.
This of course is a long-term and iterative process since any new information policy models or approaches that may be suggested must also be tested and challenged in practice over time.
3.3.2 Needed Legal and Information Policy Knowledge Advancements

There are many unresolved legal and information policy issues germane to geographic data that could be illuminated by the process of aiding articulation of arguments, gathering data to test the veracity of arguments, and developing new models to guide policy making and law making. Some of the more pressing areas in which long-term iterative research should be accomplished within the geographic information policy and legal domains include intellectual property rights in geographic information (copyright, trademark, unfair competition, trade secret, patent), incursions on personal information privacy as a result of the availability or use of geographic datasets and processing, access to geographic data housed by government, liability for harmful geographic data and communication of such data, electronic commerce in geographic data (authentication, electronic contracting, admissibility of evidence), anti-trust implications of monopoly-like control over some geographic datasets, free trade in geographic data, and the implications of differences in law and information policies among nations on trade in geographic data, software, and services. Specific detailed research questions in each of these areas of need are readily available in the literature (Onsrud, 1995a). Similarly, a wide range of important unanswered information policy research questions have been raised relating to the sharing of geographic information (Onsrud and Rushton, 1995) and to the development of geographic information technologies that might better allow such technologies to be used for positive change in society (Harris and Weiner, 1996).

3.4 CONCLUSION

For as long as modern societies continue to experience rapidly changing technological environments, legal and government policy arrangements for managing and protecting geographic data will remain unclear.
Geographic data has been and will continue to be a test bed for developing new information practices and theory. The resolution of legal and information policy conflicts in this arena will have ramifications for other information technologies just as resolution of information policy issues in other arenas is affecting the handling of geographic data. The need for policy makers to reconcile competing social, economic, and political interests in geographic data will become more pressing over time, not less. To respond to these social needs, the geographic information science research community will need to increase fundamental research efforts to address ethical, information policy and legal issues in the context of geographic data.

ACKNOWLEDGEMENTS

This chapter is based upon work partially supported by the National Center for Geographic Information and Analysis (NCGIA) under NSF grant No. SBR 88–10917. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

REFERENCES

HARRIS, T. and WEINER, D. (Eds.) 1996. GIS and Society: The Social Implications of How People, Space and Environment are Represented in GIS, Technical Report 96–7. Santa Barbara: NCGIA, UCSB.
JOHNSON, O.A. 1984. Ethics: Selections from Classical and Contemporary Writers, 5th ed. New York: Holt, Rinehart and Winston.
KIDDER, R.M. 1995. How Good People Make Tough Choices: Resolving the Dilemmas of Ethical Living. New York: Simon & Schuster.
ONSRUD, H.J. (Ed.) 1995a. Proceedings of the Conference on Law and Information Policy for Spatial Databases. Orono, Maine: NCGIA, University of Maine.
ONSRUD, H.J. 1995b. Identifying Unethical Conduct in the Use of GIS, Cartography and Geographic Information Systems, 22(1), pp. 90–97.
ONSRUD, H.J. and RUSHTON, G. (Eds.) 1995. Sharing Geographic Information. Rutgers: CUPR Press.
Chapter Four
Building the European Geographic Information Resource Base: Towards a Policy-Driven Research Agenda
Massimo Craglia and Ian Masser
4.1 INTRODUCTION

The chapter reviews recent developments relating to the formulation of a policy framework for geographic information in Europe. It is divided into two parts which respectively examine the terms of the European debate with particular reference to the initiatives taken by the European Commission, and discuss some of the most urgent issues on which to focus future research efforts. What clearly emerges is that considerable efforts are being made to address the many and complex issues related to the development of a geographic information resource base in Europe, and that the geographic information research community needs to engage in the debate and shape the research agenda. The nature of this agenda poses considerable challenges as it lies outside many of the traditional strengths of the European GIS research community and requires a much stronger inter-disciplinary dialogue than has hitherto been the case.

4.2 THE EMERGENCE OF A EUROPEAN POLICY FRAMEWORK FOR GEOGRAPHIC INFORMATION

4.2.1 Introduction

This section considers some factors which have led to the emergence of a European policy framework for geographic information. The first part discusses the political context of these developments and the second summarises the main proposals contained in the discussion document entitled GI 2000: Towards a European Policy Framework for Geographic Information, published by DGXIII/E in May 1996 (DGXIII/E, 1996a). To complete the picture the last part of this section describes a number of related developments that have been taking place in Europe over the last few years. Most of the projects discussed relate to the work of DGXIII/E. It should be noted however that DGXIII/E is not the only Directorate within the European Commission to express an interest in geographic information and GIS technology. These interests are shared by many other Directorates and also by Eurostat which has set up its own GIS (GISCO) to meet the needs of Community administrators.
A number of these Directorates have also commissioned GIS related projects under the Fourth Framework for Research and Development which runs from 1994–98. To give an indication of the wide range of GI related activities within the Commission, a search on the ECHO database of the Commission found over 100 “hits” for
projects dealing with or using GI(S). Most of these are primarily concerned with the application of GIS technology in the field of the environment, transport and spatial planning rather than geographic information policy itself. Consequently DGXIII/E, whose general responsibilities include telecommunications, the information market and the exploitation of the findings of research, has played a vital role in the development of European-wide geographic information policy.

4.2.2 The Political Context

The starting point for much of the current discussion is the vision for Europe that was presented to the European Council in Brussels in December 1993 by the then President Jacques Delors in the White Paper entitled Growth, Competitiveness and Employment: The Challenges and Ways Forward into the 21st Century (CEC, 1993). An important component of this vision is the development of the information society essentially within the triad of the European Union, the United States and Japan. One result of this initiative was the formation of a high level group of senior representatives from the industries involved under the chairmanship of Commissioner Martin Bangemann. This group prepared an action plan for “Europe and the global information society” which was presented to the European Council at the Corfu Summit in June 1994 (Bangemann, 1994). In this plan the group argued that recent developments in information and communications technology represent a new industrial revolution that is likely to have profound implications for European society. In order to take advantage of these developments it will be necessary to complete the liberalisation of the telecommunications sector and create the information superhighways that are needed for this purpose. With this in mind the group proposed ten specific actions.
These included far-reaching proposals for the application of new technology in fields such as road traffic management, trans-European public administration networks and city information highways. These proposals were subsequently largely incorporated into the Commission’s own plan, Europe’s Way to the Information Society, which was published in July 1994 (CEC, 1994).

4.2.3 GI 2000

Parallel to these developments a number of important steps have been taken towards the creation of a European policy framework for geographic information. In April 1994 a meeting of the heads of national geographical institutes was held in Luxembourg which concluded that “it is clear from the strong interest at this meeting that the time is right to begin discussions on the creation and supply of harmonised topographic data across Europe” (DGXIII/E, 1994). This view was reinforced by a letter sent to President Delors by the French minister M. Bosson which urged the Commission to set up a coordinated approach to geographic information in Europe, and the correspondence on this topic between the German and Spanish ministers and Commissioner Bangemann during the summer of 1994. As a result of these developments, a meeting of key people representing geographic information interests in each of the Member States was held in Luxembourg in February 1995. The basic objective of this meeting was to discuss a draft document entitled GI2000: Towards the European Geographic Information Infrastructure (DGXIII/E, 1995a) and identify what actions were needed in this respect. The main conclusion of this meeting was that “it is clear from the debate that the Commission has succeeded in identifying and bringing together the necessary national representative departments that can play a role in developing a Community Action Plan in Geographic Information” (DGXIII/E, 1995b, p. 12). As a result it
was agreed that DGXIII/E should initiate wide-ranging consultations within the European geographic information community with a view to the preparation of a policy document for the Council of Ministers (Masser and Salgé, 1996). Since 1995 the original document has been redrafted at least seven times to take account of the views expressed during different rounds of consultations. In the process its status has changed from a “discussion document” to a “policy document” and back to a “discussion document” again. In the most recent of these drafts, which is dated 15 May 1996 (DGXIII/E, 1996a), its title has also been changed from GI2000: Towards the European Geographic Information Infrastructure to GI2000: Towards a European Policy Framework for Geographic Information, and the term “infrastructure” has largely disappeared from the document to make it more palatable to policy makers who, it is argued, identify “infrastructure” with physical artefacts like pipes and roads and may get confused by the broader connotation of this term. Despite these changes in emphasis, the basic reasoning behind the argument that is presented remains essentially unchanged.

There is a discernible trend towards ad hoc harmonisation of geographic information in Europe… However, progress is being hampered by political and institutional considerations that need to be addressed at the highest levels if the opportunities provided by geographic information technology are to be fully exploited. To remove bottlenecks, reduce unnecessary costs and provide new market opportunities, a coherent European policy framework is needed in which the industry and market can prosper (DGXIII/E, 1996a, p. 11).

Consequently, what is required is a “policy framework to set up and maintain a stable, European wide set of agreed rules, standards, procedures, guidelines and incentives for creating, collecting, exchanging and using geographic information” (DGXIII/E, 1996a, p. 11).
The main practical objectives for the policy framework are:

1. To provide a permanent and formal, but open and flexible, framework for organising the provision, distribution and standardisation of geographic information for the benefit of all suppliers and users, both public and private.
2. To achieve a European wide meta data system for information exchanged that conforms to accepted world-wide practices.
3. As far as possible, to harmonise the objectives of national geographic information policies and to learn from experience at national level to ensure that EU-wide objectives can be met as well, at little additional cost and without further delay or waste of prior work already completed.
4. To lay the foundations for rapid growth in the market place by supporting the initiatives and structures needed to guarantee ready access to the wealth of geographic information that already exists in Europe, and to ensure that major tasks in data capture are cost effective, resulting in products and services usable at national and pan European scales.
5. To develop policies which aid European businesses in effective and efficient development of their home markets in a wide range of sectors by encouraging informed and innovative use of geographic information in all its many forms, including new tools and applications which can be used by non experts.
6. To facilitate the development of European policies in a wide range of fields by encouraging the promotion of new and sophisticated analysis, visualisation and presentation tools (including the relevant datasets) and the ability to monitor the efficacy of such policies.
7. To help realise the business opportunities for the intrinsic European geographic information industry in a global and competitive market place (DGXIII/E, 1996a, p. 13).

The current draft also contains a list of practical actions that are required in connection with this framework. These include:

• Actions to stimulate the creation of base data given that the lack of European base data is seen as “the single most important barrier to the development of the market for geographic information”. In this respect the role of the European Union is essentially to stimulate closer cooperation between national organisations and encourage the participation of the private sector in this task.
• Actions to stimulate the creation of meta data services to make it easier to locate existing information and promote data sharing across different applications.
• Actions to overcome legal barriers to the use of information while at the same time reducing the potential risks to society from the unrestricted application of modern information technology.

It is recognised that the involvement of many organisations and institutions within Europe will be required to create such a policy framework and that strong leadership and political support will be needed to carry the process forward. As no organisation exists with the political mandate to create geographic information policy at the European level, it is intended that the European Commission will seek such a mandate from the European Council. Once this is obtained it is envisaged that a high level task force will be set up to implement the policy outlined in the document.

4.2.4 Related R&D Developments

A number of related R&D projects have been commissioned by DGXIII/E within the context of the European policy framework, including the IMPACT-2 projects, the three GI studies and INFO2000.
The projects developed as part of the IMPACT-2 programme (1993–95) addressed a number of GI-based applications including tourism, education, and socio-economic analysis (see www2.echo.lu/gi/projects/en/impact2/impact.html). They were particularly useful in building operational experience with respect to the institutional, cultural and legal barriers that need to be resolved for the creation of trans national databases within Europe (Longhorn 1998).

The three GI studies commissioned by DG XIII in 1995 specifically related to issues arising out of the discussions regarding the European policy framework. Their basic objectives are reflected in their titles (see also www2.echo.lu/gi/docarchive/ref_doc.html):
• Study on policy issues relating to geographic information in Europe (GI-POLICY)
• Study on demand and supply for geographic information in Europe, including base data (GI-BASE)
• Feasibility study for establishing European wide metadata services for geographic information in Europe (GI-META)

Work on these projects began at the start of 1996 and was completed in 1997.
4.2.4.1 INFO2000

In 1997 DGXIII/E launched its new four-year research programme, which has been allocated a provisional budget of 65 million ECU. The basic objectives of this programme are to stimulate the multimedia content industry and to encourage the use of multimedia content in the emerging information society. Nearly half the budget for this programme is allocated to action line three for projects that will accelerate the development of the European multimedia industry in four key areas: cultural heritage; business services; geographic information; and science, technology and medical information. To receive support from the geographic information component of this programme projects will have to satisfy at least one of the following objectives:
- to demonstrate through innovative pilot applications the advances made in integrated base data and thematic information content…
- to provide pan-European information about GI, what is held, in what format and how it is accessed (metadata services and their linking)
- to demonstrate integration or inter linking of base data of a pan-European, or trans-border nature that may form the building block of future commercial applications, especially where such projects will help to establish common specifications for pan-European GI datasets
- to demonstrate methodologies for collecting, exchanging and using pan-European or trans-border GI, including provision for networked access to other services (DGXIII/E, 1996b, p. 13).

4.2.5 Evaluation

From the discussion above it can be seen that geographic information occupies a prominent position in European policy debates at the present time. The initial stimulus for these debates arose out of the growing concerns among leading politicians and policy makers about the need to maintain Europe’s economic competitiveness in an emerging information society.
One result of these debates is a wide ranging round of discussions that is currently taking place on the subject of the European policy framework for geographic information which aims to stimulate action by the European Commission itself. Parallel to these discussions DGXIII/E has commissioned a number of R&D projects whose findings are likely to inform this debate further.

4.3 TOWARDS A POLICY-DRIVEN RESEARCH AGENDA FOR GI

4.3.1 Introduction

Having reviewed some of the main developments in the debate on the European Policy Framework, this section discusses some of the activities that are currently taking place to define a research agenda for GI in the context of the GI2000 initiative. Before discussing the research agenda, it is useful to outline the current thinking at EU Commission level in relation to the Fifth R&D Framework (1999–2002), and the role that GI may play in it, as well as some of the research topics being canvassed at the present time within the GI
research community. Reference is therefore made to the discussion documents circulated at an R&D meeting hosted by DG XIII/E in Luxembourg on 20 June 1996. This meeting was part of a broad consultation process on GI2000 which had already involved representatives of the GI user and GI producer communities. Its objectives were primarily to gather feedback on GI2000, but also to canvass ideas on the key issues relating to GI that ought to be included in the forthcoming Fifth R&D Framework. Although the ideas presented in the following sections were preliminary in nature, it is felt, nevertheless, that they provide the starting point for the development of the future policy-driven research agenda.

4.3.2 The Fifth Framework for R&D

The general aims of the Fifth Framework are likely to broaden those of the Fourth Framework (1994–1998), i.e. “to strengthen the scientific and technical base of the EU”, to include issues related to growth, competitiveness, employment and the information society (see CEC 1993, 1994). Hence there is an even stronger connection to industry and job creation as well as developing the opportunities for greater communication and participation within the European Union. Nevertheless there is a growing recognition that long term research is also a strategic area for the EU, and that there is a need to support European scientists in partnership with European industry. This need, however, must be balanced against the overall aims of the Fifth R&D programme. Underpinning the themes of the Fifth Framework are some broad expected trends, namely:
• The continuing reduction in cost and increased performance of Information and Communication Technologies (ICT) which will spread usage more and more to the non-specialists.
• The pervasiveness of computer networking, which requires measures to encourage European companies to move to networked applications.
• The growth of electronic commerce and the development of interactive mass media.
• The increased competition of generic ICT services in areas of traditional media with significant impacts as e-mail becomes the dominant form of written communication and conventional printing becomes completely digital (DG XIII/E, 1996c).

Against this background the main areas for GI research in the Fifth Framework being canvassed at present are:
• GI data generalisation: i.e. the need to develop further the technology so that high resolution data collected for one application, such as land management, can be used for another application with lower resolution needs such as tourism, fleet management or environmental monitoring. For this purpose further research on scale-less digital data is needed.
• GI data visualisation including virtual reality to satisfy the presentation needs of both existing and new GI users.
• Geospatial image and text integration for presentation and analysis. Here the key issues relate to indexing very large datasets, data quality, accuracy and precision, the combined effects of which have yet to be fully addressed by the industry.
• GI in model building: this includes new algorithms and methodologies to promote greater and better use of GI for citizens, government, and industry and further research on handling temporal data and on error propagation.
Attention is also being given to Geospatial Publishing (collection, creation, packaging, labelling, certification, marketing, distribution), Intelligent Maps (transforming spatial data into knowledge, automated feature recognition from images, virtual reality for visualisation), and Geospatial Data on the Net (new methodologies, standards for data storing, indexing, accessing, and delivering, seamless data integration, and data clustering using object-oriented techniques).

From the research topics outlined above it can be seen that the focus is primarily on the technical aspects of GI handling and the opportunities opened up by technological developments. This may be due to the partnership with industry which is one of the hallmarks of the EU research and development programmes. However, there is also considerable evidence which suggests that the barriers to exploiting these opportunities to the full are less technical than conceptual, legal, institutional, organisational, and related to education and awareness (Burrough et al., 1996; Masser et al., 1996; Nansen et al., 1996). With this in mind, the following section puts forward a policy driven research agenda to support the development of the GI Policy Framework based on the experience of the European Science Foundation GISDATA scientific programme.

4.3.3 A Policy-Driven Research Agenda

The European Science Foundation GISDATA scientific programme was built around a research agenda which identified three main clusters of topics (Arnaud et al., 1993). These were geographic databases, data integration and socio-economic applications (see www.shef.ac.uk/uni/academic/D-H/gis/gisdata.html). The programme has involved over 300 GI researchers from 20 European countries and the USA during the period 1993–97, becoming a recognised voice of the European GI research community as a whole.
In the light of the experience of this programme the following topics for a policy driven research agenda can be identified with respect to the needs of the GI 2000 initiative. These can also be grouped in the three main clusters of the original GISDATA research agenda.

4.3.3.1 Geographic Databases

Model generalisation: this is a key area which continues to be critical for the development of seamless scale-less databases. Both GISDATA and the NCGIA have essentially addressed issues of cartographic generalisation and have not gone very far on the much thornier issue of how to generalise attribute data, and hence the conceptual data model, when moving from one level of resolution to another or integrating data from different sources having different resolution levels.

Spatial data handling on the Web: it is clear that GI handling technologies will be based less and less on proprietary software on single installations like a workstation and more and more on distributed network environments which are platform-independent and use a whole suite of software tools and applets. Similarly the development of large digital libraries and metadata services needs further research on spatial agents and appropriate methodologies for data storage, indexing, retrieval and integration. Data quality issues will also increase in importance with the use of third party data rather than data internal to the organisation.

Handling qualitative GI: this is a topic that has consistently emerged as requiring special attention to broaden the involvement of the public and non-GI expert users in using GI and related handling technologies. The geography and concerns of individuals and of disciplines in the social sciences outside geography are
often expressed in qualitative terms rather than cartesian/quantitative ones. How to represent and incorporate these other geographies into our current technologies? How to develop databases and conceptual models that can integrate qualitative and quantitative data types? How to develop the social dimension of GIS? (see also Helen Couclelis, in Chapter 2 of this Volume).

4.3.3.2 Data Integration

Liability in the digital age: the liability of data producers and vendors for erroneous digital GI is an important issue that may currently be hampering the development of the market. Yet real evidence of cases is patchy and often not disseminated. There is an urgent need to collect the evidence that is emerging worldwide and develop standard quality assurance mechanisms involving technologists, legal expertise, and data quality experts so that costly mistakes may be avoided in the future whilst providing a framework for the development of the GI market.

Protecting Confidentiality: the opportunity to integrate different data sets is at the core of what is special about GI. Nevertheless there is a need to ensure that exploiting this opportunity is not at the expense of individuals’ rights to privacy. The statistical community has addressed this issue over the years and identified ways to reduce the risks of disclosure, typically by aggregating data at census tract level and/or anonymising detailed records. The recent developments in GI handling technologies offer both opportunities and threats. On the one hand it is now possible to design census areas that are more homogeneous, and hence more valuable for analysis, whilst still protecting confidentiality. On the other, it is equally possible to combine different data sets and arrive at almost individual profiles, by-passing to a large extent the restrictions put in place in the past.
This is a topic that cannot be left to the market to develop as the solutions are likely to be very unsatisfactory from a civil rights point of view. Hence the involvement of the research community and government is needed.

The economics of the digital information market: research is needed in this area as there is a general agreement that the market for digital GI is still immature and that its characteristics are poorly understood. This results in a great many assumptions being made about the potential value of integrating very many data sets for market development, which are reminiscent of the many claims made ten years ago about the potential created by GIS for increasing efficiency and reducing costs. The latter have not materialised and there is also a strong case for a realistic assessment of the market for GI. Recent studies on the economics of GI (Coopers and Lybrand, 1996) provide a useful starting point for research in this field which needs also to consider both the multiplier effects of digital GI and the implications for job creation, loss and displacement. For the latter, evidence from other industries such as telecommunications and banking should also be taken into account.

4.3.3.3 Applications

Risk Management in Europe: the experience of the last 18 months of discussion on the EGII clearly indicates the lack of awareness among policy makers at EU, national, and local level of the specific nature and importance of GI. For this reason there is a need to bring home forcefully the message that GI is essential for effective service delivery and governance. Given potentially catastrophic events such as the floods in The Netherlands in 1995 and Italy in 1996, and trans-border issues like pollution and contamination, research on emergency services and the role of GI in their planning and delivery will highlight the issues in relation to
analytical and modelling requirements, technical infrastructure for data exchange, minimum data and common definitions needs as well as contributing to raising awareness at senior level.

Geographic monitoring systems: an increasing number of applications require real-time monitoring of events such as traffic control, air and water quality, emissions, and temperatures. Yet current GI technology is relatively poor at handling dynamic and temporal data. A number of research-led initiatives in this field have been developed recently but further progress is needed, building on the experiences of different industrial sectors and disciplines, particularly those more used to handling time/flows (e.g. traffic control, utilities) and those traditionally focusing on spatial analysis (geography). Research in this area needs to identify what can be done better with current technology and methods, and the priorities of future applied projects in addressing the needs of industry and governance.

Integrating spatial models: policy at local, national and European level tends to be formulated by sector (e.g. agriculture, transport, environment, regional development, health). Yet each set of policies has profound spatial implications, the cumulative effects of which are little understood. Efforts in the late 1960s to develop comprehensive planning largely failed due to lack of political commitment but also because the technology of the time could not cope with the complexity of the systems and their interrelationships. Are the combined pressures for reduced government involvement, more effective targeting of resources, and enhanced technology changing the picture? Is it possible to evaluate the combined effects of different policies and their underpinning theories and models? What is the impact on resource allocation and service provision? Research is clearly needed to look at both technological and methodological issues in this area.

4.4 CONCLUSIONS

This chapter reviewed recent developments in Europe with respect to the creation of the European geographic information resource base and put forward a policy-driven research agenda to support these developments. This agenda does not focus on those topics on which progress in one form or another is to be expected relatively soon, such as the development of metadata services, and definition of core base data. Almost all parties agree that these are of primary importance and at least some funding on these issues is already in place in the INFO2000 programme. The agenda therefore addresses specifically those issues where research is needed to inform policy formulation and where important ethical issues are at stake, such as in the area of confidentiality.

What clearly emerges from the discussion is that there has been a sustained effort over the last two years to make progress in developing the European GI resource base. However, a great deal of work remains to be done as there are still significant barriers to overcome. They largely stem from the limited awareness of policy makers at all levels (European, national, and local) as to the strategic value of developing the GI resource base, the difficulty of achieving coordination among so many stakeholders in this field, and the immaturity of the market. It should also be noted that many of the topics defined above lie outside the traditional strengths of the European GIS research community and will require a much increased interdisciplinary effort. For this reason in particular the agenda set above represents both a formidable challenge and an opportunity for European GIS research.

ACKNOWLEDGEMENT

The views expressed in this chapter are those of the authors alone and do not necessarily reflect those of the European agencies referred to in the text. Prof. Masser’s research for this chapter is part of that undertaken
for an ESRC Senior Research Fellowship award H51427501895 on Building the European Information Resource Base. REFERENCES ARNAUD, A., CRAGLIA, M., MASSER, I., SALGÉ, F. and SCHOLTEN, H. 1993. The research agenda of the European Science Foundation’s GISDATA scientific programme, International Journal of GIS, 7(5), pp. 463–470 BANGEMANN, M. 1994. Europe and the Global Information Society: Recommendations to the European Council Brussels: Commission of the European Communities. BURROUGH, P., CRAGLIA, M, MASSER, I., and SALGÉ, F. 1996 Geographic Information: the European Dimension http://www.shef.ac.uk/uni/academic/D-H/gis/policy_l.html. CEC 1993. Growth, Competitiveness and Employment: the Challenges and Ways Forward into the 21st Century. Brussels: Commission of the European Communities. CEC 1994. Europe’s Way to the Information Society: An Action Plan, COM(94) 347 Final. Brussels: Commission of the European Communities. COOPERS AND LYBRAND 1996. Economic Aspects of the Collection, Dissemination and Integration of Government’s Geospatial Information. Southampton: Ordnance Survey. DGXIII/E 1994. Heads of National Geographic Institutes: Report on Meeting Held on 8 April 1994. Luxembourg: Commission of the European Communities, DGXIII/E. DGXIII/E 1995a. GI 2000: Towards a European Geographic Information Infrastructure. Luxembourg: Commission of the European Communities DGXIII/E. DGXIII/E 1995b. Minutes of the GI 2000 meeting, 8 February 1995. Luxembourg, Commission of the European Communities, DGXIII/E. DGXIII/E 1996a. GI2000—Towards a European Policy Framework for Geographic Information: A Discussion Document. Luxembourg: Commission of the European Communities, DGXIII/E. DGXIII/E 1996b. INFO 2000: Stimulating the Development and Use of Multimedia Information Content. Luxembourg: Commission of the European Communities, DGXIII/E. DGXIII/E 1996c. Fifth Framework Programme: DG XIII/E-E Information Content: Geographic Information. 
Paper for discussion at the R&D meeting, 20 June 1996, Luxembourg. Luxembourg: Commission of the European Communities, DGXIII/E.
LONGHORN, R. 1998. An evaluation of the experience of the IMPACT-2 programme, in Burrough, P., and Masser, I. (Eds.), European Geographic Information Infrastructures: Opportunities and Pitfalls. London: Taylor & Francis.
MASSER, I. and SALGÉ, F. 1996. The European geographic information infrastructure debate, in Craglia, M., and Couclelis, H., (Eds.), Geographic Information Research: Bridging the Atlantic. London: Taylor & Francis, pp. 28–36.
MASSER, I., CAMPBELL, H. and CRAGLIA, M. (Eds.) 1996. GIS Diffusion: the Adoption and Use of Geographical Information Systems in Local Government in Europe. London: Taylor & Francis.
NANSEN, B., SMITH, N. and DAVEY, A. 1996. A British national geospatial data base, Mapping Awareness 10(3), pp. 18–20 and 10(4), pp. 38–40.
Chapter Five
GIS, Environmental Equity Analysis, and the Modifiable Areal Unit Problem (MAUP)
Daniel Sui
When the search for truth is confused with political advocacy, the pursuit of knowledge is reduced to the quest for power (Chase, 1995). 5.1 INTRODUCTION The issue of environmental equity—whether minorities and low income communities across the United States share a disproportionate burden of environmental hazards—has attracted intensive interdisciplinary research efforts in recent years (Bowen et al., 1995; Cutter, 1995). Because of the increasing availability and easy access to several national spatial databases, such as the US EPA’s toxic release inventory (TRI) database and Census Bureau’s TIGER files, GIS technology has been widely used in environmental equity analysis during the past five years (Burke, 1993; Chakraborty and Armstrong, 1994; Lowry et al., 1995). However, most previous studies were based upon only a single set of analytical units, such as census tracts, zip code areas, or counties. To date, environmental equity analysis has been conducted using a variety of areal unit boundaries at different geographical scales without considering the effects of the modifiable areal unit problem (MAUP). The MAUP issue refers to the fact that conclusions in geographic studies are highly sensitive to the scale and the zoning scheme (areal boundaries) used in the analysis (Openshaw, 1983). Numerous empirical studies have revealed that the inclusion of scale and areal boundary factors can alter the conclusions of a theory dramatically (Openshaw, 1984; Fotheringham and Wong, 1991; Amrhein, 1995). Despite, or perhaps because of, their critical importance, the scale and areal unit boundaries chosen in previous environmental equity analysis were often dictated more by expediency than by rational justification. Not surprisingly, diametrically opposing conclusions have been reported in the literature even though basically the same data set was used in the analysis (Bullard, 1990; Goldman and Fitton, 1994). 
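The sensitivity of results to scale and zoning described above can be illustrated with a small synthetic experiment. What follows is a minimal sketch under stated assumptions, not the procedure used in this chapter: the data are simulated rather than drawn from the TRI or census files, and the grid layout, the variables `x` and `y`, and the helper names (`make_grid`, `zonal_correlation`, `square_zones`, `row_strips`) are all hypothetical. Two noisy variables share a weak spatial trend; the same cell-level records are then averaged over different zone systems and the correlation between the zone means is compared.

```python
import random
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def make_grid(n=32, noise=0.5, seed=0):
    """Simulated (hypothetical) data: one record (row, col, x, y) per cell."""
    rng = random.Random(seed)
    cells = []
    for row in range(n):
        for col in range(n):
            trend = (row + col) / (2 * n)    # weak shared spatial gradient
            x = trend + rng.gauss(0, noise)  # stand-in for a demographic indicator
            y = trend + rng.gauss(0, noise)  # stand-in for an exposure indicator
            cells.append((row, col, x, y))
    return cells


def zonal_correlation(cells, zone_of):
    """Average x and y within each zone, then correlate the zone means."""
    groups = defaultdict(list)
    for row, col, x, y in cells:
        groups[zone_of(row, col)].append((x, y))
    mean_x = [sum(x for x, _ in g) / len(g) for g in groups.values()]
    mean_y = [sum(y for _, y in g) / len(g) for g in groups.values()]
    return len(groups), pearson(mean_x, mean_y)


def square_zones(size):
    """Zoning scheme: square blocks of size x size cells (scale effect)."""
    return lambda row, col: (row // size, col // size)


def row_strips(height):
    """Zoning scheme: full-width horizontal strips (zoning effect)."""
    return lambda row, col: row // height


cells = make_grid()
schemes = {
    "individual cells": square_zones(1),
    "4 x 4 blocks": square_zones(4),
    "8 x 8 blocks": square_zones(8),      # 16 zones of 64 cells each
    "row strips (2 x 32)": row_strips(2), # also 16 zones of 64 cells each
}
for name, zone_of in schemes.items():
    n_zones, r = zonal_correlation(cells, zone_of)
    print(f"{name:22s} zones={n_zones:5d}  r={r:.3f}")
```

In this toy setting, averaging within larger zones cancels much of the cell-level noise, so the correlation between zone means is considerably stronger than the correlation between individual cells (the scale effect). The last two schemes use the same number of equally sized zones but draw the boundaries differently, and they need not yield the same coefficient (the zoning effect).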
Among the conflicting evidence provided in the literature, we still do not know to what extent the scale and unit of analysis may have over- or under-estimated the relationship between the distribution of toxic facilities and the characteristics of affected population. The goal of this research is to take a systematic approach to addressing the MAUP in environmental equity analysis using GIS and discuss the ethical ramifications of GIS as a social technology.

This chapter is organised into seven sections. After a brief introduction, the research background and a literature review are presented in the second section. The third section describes specific research objectives and hypotheses. The fourth section introduces the methodology, followed by empirical results in the fifth
GIS, ENVIRONMENTAL EQUITY AND THE MAUP PROBLEM
section and further discussions from a critical social theoretic perspective in the sixth section. The last section contains concluding remarks and future research plans.

5.2 RESEARCH BACKGROUND AND REVIEW OF LITERATURE

This research is framed by three sets of extensive literature: recent debates on environmental equity analysis and environmental racism; previous studies on ecological fallacies and the MAUP; and current efforts to explore the social implications of GIS technology via the NCGIA Initiative 19 (I-19).

5.2.1 The Social Problem: Environmental Equity Analysis and the Debate on Environmental Racism

The current literature on the existence and extent of environmental inequity or racism has taken two general approaches: ecological studies that examine the geographical collocation of racial and ethnic minorities and potentially hazardous facilities (Bryant and Mohai, 1992); and case studies of specific instances of environmental injustice in particular areas associated with specific facilities (Edelstein, 1988). Whereas the ecological studies attempt to determine the extent of environmental injustice, they are susceptible to constraints brought about by the fundamental ecological units of analysis. Case studies have the advantage over the ecological approach of being better able to trace the processes involved and can often conduct detailed and specific analysis aimed at establishing relationships and causes associated with the circumstances of the case. Unfortunately, case studies alone are ineffective in determining the spatial extent of the problem. The inherent difficulties in both the ecological and case study approaches demand that the macro-level ecological study be integrated with micro-level case studies. Unfortunately, such an integration is rare in the literature.
Most previous studies have used only one set of units for analysis, such as census block groups (von Braun, 1993), census tracts (Burke, 1993), and zip codes (United Church of Christ, 1987). Only a few authors have mentioned the potential impacts of geographic scales (Bowen et al., 1995; Marr and Morris, 1995). Depending on the type and volume of toxic materials released and where (air, water, or soil), these toxic materials will affect populations at different distances (Hallman and Wanderman, 1989). A clear understanding of the impact of geographic scales and units will be the first step towards identifying the most appropriate unit and scale for environmental equity analysis and policy implementation.

5.2.2 The Conceptual Problem: Ecological Fallacies and the Modifiable Areal Unit Problem

The effects of scales and areal unit boundaries on results of geographical studies have been reported extensively in the literature. Comprehensive reviews on ecological fallacies and the MAUP have been provided by Langbein and Lichtman (1978), Openshaw (1983), and Fotheringham and Wong (1991). Ecological fallacy and MAUP studies can be traced at least to Gehlke and Biehl (1934) and Neprash (1934). Robinson (1950) showed empirical evidence on how effects of scales and aggregation methods can dramatically change results in illiteracy studies. These pioneering works have stimulated a wide range of studies on the impact of scale and unit of analysis in environment/ecological modelling and socio-economic
studies. Although the results vary according to the problems being examined, general conclusions from this wide range of studies indicate that parameters and processes important at one scale or unit are frequently not important or predictive at another scale or unit. Information is often lost as spatial data are aggregated to coarser scales or resolutions. Significant changes may occur when we move from one scale to another or from one zoning system of analysis to another. Each level has its own unique properties that cannot be derived by mere summation of the disaggregated parts. So far no ideal solutions have been developed to solve this stubborn problem in spatial analysis. Depending on the substantive issues being examined, the following methods for tackling the MAUP issue have been reported (Wong, 1996):
1. To identify the basic units and derive optimal scales and zonal configurations for the phenomena being studied (Openshaw, 1983).
2. To conduct sensitivity analysis and shift the emphasis of spatial analysis towards the rates of change across different scales and areal unit boundaries (Fotheringham and Wong, 1991).
3. To abandon traditional statistical analysis and develop scale-independent or frame-independent analytical techniques (Tobler, 1989).

5.2.3 The Philosophical Problem: GIS and Society

The massive proliferation of GIS in society has sparked a new research effort focusing on the dialectical relationship between GIS and society, namely how social contexts have shaped the production and use of GIS and how GIS technology has shaped the outcomes and solutions of societal problems (Sheppard, 1995). Current research has transcended the initial polarising debate (Sui, 1994) and an ongoing collaboration between GISers and social theorists is being developed via NCGIA Initiative 19 (I19).
Specifically, I19 is designed to address five philosophical issues with regard to GIS and society (Curry et al., 1995):
1) In what ways have particular ontologies and epistemologies been incorporated into existing GIS techniques, and in what ways have alternative forms of representation and reasoning been filtered out?
2) How have the commodification, the proliferation, and dissemination of databases associated with GIS, as well as differential access to spatial databases, influenced the ability of different social groups to utilise information for their own empowerment?
3) How can the local knowledge, needs, desires, and hopes of marginalised social groups be adequately represented in GIS-based decision-making processes?
4) What are the possibilities and limitations of using GIS as a participatory tool for more democratic resolution of social and environmental conflicts?
5) What kind of ethical codes and regulatory frameworks should be developed for GIS applications in society?

I19 addresses these conceptual issues in the context of three research themes:
1) the administration and control of populations by both public and private institutions;
2) the locational conflict involving disadvantaged populations;
3) the political ecology of natural resource access and use.
This research is situated in these broader questions concerning the social implications of how people, space, and environment are represented in GIS. The empirical results will shed light on many of the issues posed in I19 (see also Couclelis, Chapter 2 in this volume).

5.3 RESEARCH OBJECTIVES AND HYPOTHESES

The primary objective of this research is to investigate empirically the effects of the MAUP issue on the results of GIS-based environmental equity analysis. By tying this research to previous studies on the MAUP and ecological fallacies, two sets of testable hypotheses are proposed: the scale-dependency hypothesis and the areal unit-dependency hypothesis.

The scale-dependency hypothesis posits that in environmental equity analysis, the results concerning the relationship between the racial (or socio-economic) status of particular neighbourhoods and the distribution of noxious facilities depend on the geographical scales used in the analysis. Specifically, it was expected that a strong correlation of race (or income) with environmental inequity at one geographical scale is not necessarily significant at either higher or lower geographic levels of analysis. The number of important variables generally decreases towards broader scales.

The areal unit boundary-dependency hypothesis contends that different areal unit configurations (zoning schemes) at the same scale, which usually result in different aggregation (grouping) of the demographic and environmental data, will produce different results in environmental equity analysis. Similar to the scale hypothesis, it was expected that a strong correlation of race (or income) with environmental inequity at a particular areal unit configuration does not necessarily imply that the relationship will be significant for other areal unit boundaries. Likewise, changes in areal unit boundaries may dramatically alter the results of environmental equity analysis.
Unlike the effects of scales, the impacts of areal unit boundaries may be less predictable. Haphazard selection of areal unit boundaries may create haphazard results.

A secondary objective of this research is to contextualise GIS conceptually in society via the empirical environmental equity analysis. By tying the empirical results to Heidegger’s enframing theory of technology (Heidegger, 1972) and Habermas’s communicative theory of society (Habermas, 1984, 1987), this chapter calls for a shift from an instrumental rationality to a critical rationality for GIS applications in the social arena.

5.4 DATA AND METHODOLOGY

5.4.1 Study Area and Data

In this project, the city of Houston, Texas, serves as the study area because of its diverse ethnic groups, noted environmental problems related to its petrochemical industry, and lack of zoning laws. Because the city of Houston proper lies predominantly within Harris County, we use the Harris County boundary as a substitute for Houston’s city limit. The primary data source for this study is the US Environmental Protection Agency’s (EPA) National Toxic Release Inventory (TRI) database from 1987–90. The TRI database contains a complete inventory of toxic release sites in all major US cities. For each toxic release site, this database provides detailed information about the type of chemicals released and precise locational information in the
latitude/longitude format. The demographic and socio-economic data come from the 1990 Census Summary Tape Files (STF), which were subsequently merged with the census block group, census tract, and zip code boundaries in the 1992 TIGER files.

5.4.2 Analytical Procedures

This project was conducted in the following three stages:

STAGE ONE: DATA COLLECTION.
We first downloaded Houston’s TRI site data from CD-ROM. Based upon the latitude/longitude information for each site, the TRI data were converted into ARC/INFO coverage format and merged with the 1990 TIGER files and 1990 census demographic and socio-economic data.

STAGE TWO: GIS ANALYSIS.
During the second stage of this project, GIS analyses were conducted to derive all the data needed for testing the two hypotheses. Two different methods were used to aggregate data using different geographical scales and areal unit boundaries: the deterministic approach using predefined scales and boundaries, and the stochastic approach using a double random procedure.

The deterministic approach: for the scale-dependency hypothesis, all the TRI sites and the demographic, social, and economic data were aggregated at three geographical scales: census block group, census tract, and zip code (Figure 5.1). For the areal unit boundary-dependency hypothesis, the census tract level data were re-aggregated to three zoning schemes. ARC/INFO was used to create three new sets of spatial units: buffer zones along major highways (1.5-mile); concentric rings from major population centres (1.5-mile, 3-mile, and 4.5-mile); and sectoral radiating patterns from Houston’s three major ethnic enclaves (45-degree sectoral patterns on four concentric rings with 1.5-mile intervals) (Figure 5.2). The TRI sites and the demographic and socio-economic data were also re-aggregated according to these three new sets of spatial units.

The stochastic approach: this approach aggregates data to different scales and areal unit boundaries using a double random procedure with contiguity constraints. The algorithm was originally developed by Openshaw (1977) and essentially works in the following way: if a coverage consisting of N zones is required from the initial M zone system (M>N), N seed polygons are randomly selected. One of the remaining (M-N) polygons is then randomly selected and tested for adjacency to each of the N seed polygons. Depending on whether it is adjacent to a seed polygon, it is either added to that seed polygon or replaced by another randomly selected polygon. The process iterates until all the (M-N) polygons have been allocated.
The centroid of an aggregate is usually defined as the centroid of its constituent zones. An important feature of this aggregation procedure is that it preserves the basic structure of the underlying zones. However, because of the double randomness inherent in the planting of seed polygons and the allocation of remaining polygons, each iteration produces a different set of areal unit boundaries. The SAM (Spatial Aggregation Machine) program developed by Yichun Xie at Eastern Michigan University was used to carry out the random aggregation. Sixteen hundred census block groups were used as the initial zones, which were successively aggregated to 1000 units, 800 units, 600 units, 400 units, and 200 units to test the scale hypothesis. For each scale, ten different areal boundary configurations were randomly formed to test the areal unit boundary-dependency hypothesis. The attribute data were also aggregated according to each scale and areal unit boundary configuration.
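The double random procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SAM program or Openshaw's published routine: the function names `grid_adjacency` and `random_aggregate`, the regular grid of base zones, and the rook-adjacency rule are all hypothetical. Drawing uniformly among the unallocated zones that already touch a region is used here as a shortcut for "draw a polygon, test adjacency, and redraw if it fails".

```python
import random


def grid_adjacency(rows, cols):
    """Rook adjacency for a rows x cols grid of hypothetical base zones."""
    adjacency = {}
    for r in range(rows):
        for c in range(cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            adjacency[(r, c)] = {
                (i, j) for i, j in candidates if 0 <= i < rows and 0 <= j < cols
            }
    return adjacency


def random_aggregate(adjacency, n_regions, rng):
    """Group base zones into n_regions contiguous regions (assumed procedure).

    adjacency maps each zone to the set of its neighbours; the zone graph is
    assumed to be connected. Returns a dict mapping zone -> region index.
    """
    zones = sorted(adjacency)
    # First source of randomness: the seed polygons.
    seeds = rng.sample(zones, n_regions)
    region_of = {zone: k for k, zone in enumerate(seeds)}
    unassigned = [z for z in zones if z not in region_of]
    while unassigned:
        # Zones that currently touch at least one growing region.
        eligible = [
            z for z in unassigned if any(nb in region_of for nb in adjacency[z])
        ]
        if not eligible:
            raise ValueError("zone graph is not connected to any seed")
        # Second source of randomness: which zone is allocated next, and to
        # which of the regions it touches.
        zone = rng.choice(eligible)
        touching = sorted({region_of[nb] for nb in adjacency[zone] if nb in region_of})
        region_of[zone] = rng.choice(touching)
        unassigned.remove(zone)
    return region_of


if __name__ == "__main__":
    rng = random.Random(42)
    regions = random_aggregate(grid_adjacency(8, 8), 6, rng)
    for r in range(8):
        print(" ".join(str(regions[(r, c)]) for c in range(8)))
```

Because a zone is only ever attached to a region it touches, every region stays contiguous, and rerunning with a different random seed yields a different zoning system at the same scale.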
Figure 5.1: Aggregation of TRI site data to three pre-defined scales.
After these aggregations using the deterministic and stochastic approaches, we obtained the following 56 derived data sets:

1. Six data sets derived from the deterministic approach: three geographical scales (block groups, census tracts, and zip code areas) and three areal unit configurations (buffer zones, concentric rings, and sectoral radiations).
2. Fifty data sets from the stochastic approach: five geographical scales (200-unit, 400-unit, 600-unit, 800-unit, and 1000-unit) and ten random areal unit boundary configurations for each scale.

Statistical analyses were conducted to examine the changes of relationship between race, class, and the distribution of toxic materials.

STAGE THREE: STATISTICAL ANALYSIS.
For each of the above derived data sets, statistical analyses were conducted to examine how the relationship between environmental hazards and the characteristics of the surrounding population will change under different geographical scales and zoning schemes. The following two models were estimated using SAS 6.02 for Windows:
where YTRI# is the total number of TRI facilities; PMinority is the percentage of minority population; IPerIncome is per capita income; PDensity is population density; PBlack is the percentage of black population; PHispanic is the percentage of Hispanic population; PAsian is the percentage of Asian population.
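Assuming both models are ordinary linear regressions with an intercept a and an error term (an assumption made here, as is the pairing of each variable with b1, b2, and b3, which follows the order in which the coefficients are discussed in Section 5.5), they can be written as:

```latex
\begin{align}
Y_{\mathrm{TRI\#}} &= a + b_1 P_{\mathrm{Minority}} + b_2 I_{\mathrm{PerIncome}} + b_3 P_{\mathrm{Density}} + \varepsilon \tag{Model 1}\\
Y_{\mathrm{TRI\#}} &= a + b_1 P_{\mathrm{Black}} + b_2 P_{\mathrm{Hispanic}} + b_3 P_{\mathrm{Asian}} + \varepsilon \tag{Model 2}
\end{align}
```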
Figure 5.2: Aggregation of TRI site data to three pre-defined areal zoning schemes.
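The concentric-ring and sectoral schemes lend themselves to a short worked example. The sketch below assigns a point (for instance a TRI site) to a ring and a sector around a centre, using the 1.5-mile ring interval and 45-degree sectors given in Section 5.4.2. It assumes planar coordinates already expressed in miles; the function name `ring_and_sector` and the counting helper are hypothetical, and the original analysis was carried out in ARC/INFO rather than in code like this.

```python
import math
from collections import Counter


def ring_and_sector(x, y, cx, cy, ring_width=1.5, n_rings=4, sector_deg=45.0):
    """Return (ring index, sector index) for a point, or None if it lies
    beyond the outermost ring. Ring 0 is the innermost ring; sector 0 starts
    at the positive x axis and sectors run counter-clockwise."""
    dx, dy = x - cx, y - cy
    ring = int(math.hypot(dx, dy) // ring_width)
    if ring >= n_rings:
        return None
    n_sectors = round(360.0 / sector_deg)
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    # The extra modulo guards against an angle that rounds up to 360.0.
    sector = int(angle // sector_deg) % n_sectors
    return ring, sector


def count_sites_by_zone(sites, centre):
    """Count hypothetical site coordinates per (ring, sector) zone."""
    cx, cy = centre
    zones = (ring_and_sector(x, y, cx, cy) for x, y in sites)
    return Counter(z for z in zones if z is not None)


if __name__ == "__main__":
    sites = [(1.0, 0.2), (0.5, 2.0), (-1.0, -3.0), (10.0, 0.0)]
    print(count_sites_by_zone(sites, centre=(0.0, 0.0)))
```

Setting `sector_deg=360.0` collapses the sectors and leaves a pure concentric-ring scheme, which shows how closely the two zoning schemes are related.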
5.5 EMPIRICAL RESULTS

5.5.1 Results for the deterministic aggregation

Results of these two models for data sets derived from the deterministic approach are shown in Tables 5.1 and 5.2. Table 5.1 contains the results of Models 1 and 2 using three geographic scales: block groups, census tracts, and zip code areas. Using Model 1, the relationship between the number of TRI sites and the characteristics of the surrounding population in terms of racial and socio-economic status was investigated. As shown in Table 5.1, the R² of the same model increased significantly as the analysis scale moves from block groups to census tracts to zip code areas. The same independent variables explained 69 percent of the variance of the dependent variable at the zip code level, but only 63 percent and 41 percent at the census tract and the block group level respectively. This indicates that at a more disaggregate level, more independent variables need to be incorporated into the model to explain better the variance of the dependent variable. Of greater interest are the dramatic changes in the coefficients of the three independent variables from one scale to the next. At the block group level, per capita income is the most important independent variable in explaining the changes of the total number of TRI facilities whereas the minority population and population density played a subordinate role. As we move the scale from block groups to census tracts to zip code areas, we observe clearly that the minority population becomes more important for explaining the changes of the total number of TRI facilities whereas the per capita income and population density become less
significant. In Model 2, the minority population is separated into three subgroups: Blacks, Hispanics, and Asians, to examine further which minority group shares disproportionate environmental burdens. A similar change is observed for the R² as in Model 1. The Black population appears to be the most significant at the block group level, with the Hispanic population being the most significant at the census tract level. Asians are inversely related to TRI sites at the block group and census tract level, and positively related to TRI sites at the zip code level, although far less significantly than Blacks and Hispanics.

Table 5.1: Results of GIS-based environmental equity analysis using different geographical scales.

Variables     Block groups     Census tracts     Zip codes
Model 1:
N             2015             583               140
R²            0.41             0.63              0.69
b1            2.29             3.27              4.45
b2            −3.19            −2.22             −1.39
b3            −2.31            −1.91             −0.92
Model 2:
N             2015             583               140
R²            0.11             0.18              0.24
b1            0.071            0.062             0.27
b2            0.035            0.049             0.11
b3            −0.038           −0.067            −0.033
Results significant at 0.95 confidence level
Table 5.2 contains the results of Models 1 and 2 using three zoning schemes: buffer zones along major highways, concentric rings from major population centres, and sectoral radiations from three ethnic enclaves. As shown in Table 5.2, the R² of the same model decreased significantly as the analysis zoning scheme changes from buffer zones to concentric rings to sectoral areas. This suggests that a model with quite different predictive powers may be produced if the data are aggregated according to different areal unit boundaries.

Table 5.2: Results of GIS-based environmental equity analysis using different areal unit boundaries.

Variables     Buffer zones     Concentric rings     Sectoral radiations
Model 1:
N             291              398                  343
R²            0.54             0.49                 0.31
b1            7.31             4.58                 3.91
b2            −1.31            −1.36                −4.95
b3            −1.53            0.69                 1.43
Model 2:
N             291              398                  343
R²            0.24             0.21                 0.37
b1            0.029            0.013                0.094
b2            0.025            −0.045               0.041
b3            −0.14            0.026                −0.29
Results significant at 0.95 confidence level
The importance of minority population dropped substantially from buffer zones to concentric rings to sectoral radiations. The importance of per capita income is similar for buffer zones and concentric rings, with a dramatic increase for sectoral aggregation. The population density factor is slightly more complicated than the minority population and per capita income: it is negatively related to the number of TRI sites for the buffer aggregation and positively related, though less important than the other two variables, for both the concentric ring and sectoral aggregations. As for Model 2, using data from the buffer zone aggregation, Asian population is inversely related to the number of TRI facilities, with Black and Hispanic population being almost equally important. Hispanic population is inversely related to the dependent variable under the concentric ring scheme. Sectoral aggregation produced the highest R², with Black population being the most important independent variable. Overall, the changes of results according to different zoning schemes are less predictable than those of different scales.

5.5.2 Results from the stochastic aggregation

Results from random scale and areal boundary changes are shown in Figure 5.3(a)–(c) for Model 1 and Figure 5.3(d)–(f) for Model 2. All results are significant at the 95 percent confidence level. If the MAUP issue did not exist, there would be little variation in each of the parameter estimates for Models 1 and 2. However, it is quite evident from Figure 5.3(a)–(c) that the change of scale and the modification of the areal boundary units from which the data are aggregated do create substantial variation for parameter estimation and reliability. Both of the models display a great degree of sensitivity to scales and zoning scheme variations.
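The logic of this stochastic experiment can be illustrated with a small, self-contained sketch on synthetic data. Everything in it is an assumption made for illustration: the base units are arranged along a line so that contiguous zones are simply runs of neighbouring units, the regression has a single predictor rather than the three used in Models 1 and 2, and the names `simple_slope`, `random_contiguous_zoning`, and `slope_spread` are hypothetical. It mirrors only the design of 1600 base units aggregated to five scales with ten random zonings each; it does not reproduce the Houston results.

```python
import random
import statistics


def simple_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def random_contiguous_zoning(n_units, n_zones, rng):
    """Split units 0..n_units-1, laid out on a line, into contiguous zones."""
    cuts = sorted(rng.sample(range(1, n_units), n_zones - 1))
    bounds = [0] + cuts + [n_units]
    return [range(bounds[i], bounds[i + 1]) for i in range(n_zones)]


def aggregate(values, zones, how):
    """Aggregate unit-level values to zones with the function `how`."""
    return [how([values[i] for i in zone]) for zone in zones]


def slope_spread(x, y, n_zones, n_configs, rng):
    """Range of slope estimates over n_configs random zonings at one scale."""
    slopes = []
    for _ in range(n_configs):
        zones = random_contiguous_zoning(len(x), n_zones, rng)
        zone_x = aggregate(x, zones, statistics.fmean)  # e.g. mean minority share
        zone_y = aggregate(y, zones, sum)               # e.g. facility count
        slopes.append(simple_slope(zone_x, zone_y))
    return min(slopes), max(slopes)


if __name__ == "__main__":
    rng = random.Random(7)
    # Synthetic unit-level data: a share in [0, 1] and a 0/1 facility flag.
    share = [rng.random() for _ in range(1600)]
    facility = [1 if rng.random() < 0.1 + 0.3 * s else 0 for s in share]
    for n_zones in (1000, 800, 600, 400, 200):
        low, high = slope_spread(share, facility, n_zones, 10, rng)
        print(f"{n_zones:5d} zones: slope between {low:.2f} and {high:.2f}")
```

If aggregation had no effect, the printed ranges would be narrow and similar across scales; any widening or drift in them is the kind of sensitivity that Figure 5.3 reports for the real data.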
For Model 1, the variations in the estimates of b1 and b2 seem to be very systematic: b1 increases systematically and the absolute value of b2 decreases (becomes less negative) as the scale shifts from a more disaggregated (1000 units) to a more aggregated (200 units) one. This result is consistent with the results of the deterministic aggregation. These findings indicate that income (class) tends to become the most important variable in explaining the distribution of TRI sites if more disaggregate data are used in environmental equity analysis and percentage of minorities (race) becomes the most important variable when more aggregate data are employed. Compared to b1 and b2, the estimates for b3 are stable at different aggregations and all remain negative, indicating the consistent inverse relationship between the distribution of TRI sites and population density. For each scale, the random variation in areal unit boundaries (zoning system) has created substantial fluctuations for all three parameters in the model. However, it seems that the variations are greater at the more disaggregated level than at the disaggregate level. For Model 2, as shown in Figure 5.3(d)–(f), similar variations are observed, although they are a little less systematic than the estimates of Model 1. The importance of both PBlack (percentage of black population) and PHispanic (percentage of Hispanic population), especially PHispanic, has increased as the data have been incrementally aggregated. The magnitude of the increases is smaller than in Model 1. The estimates of b3 fluctuate between positive and negative values, which means that PAsian could be either positively or negatively related to the distribution of TRI sites, depending on how data have been aggregated. With
Figure 5.3: Variations in parameter estimates with random scale and areal boundary changes.
regard to the changes of parameter estimates according to areal unit boundaries at each different scale, variations are still discernible but the magnitude is far smaller than that of Model 1. These results clearly indicate that the results of environmental equity analysis as currently conducted using GIS are highly sensitive to scales and areal units. What is more troubling is the fact that it is possible to find almost any desired results simply by re-aggregating the data to different scales and areal-unit boundaries. Although we should consider the technical solutions for the MAUP issue, we must go beyond the technicalities to view this issue from a broader social and intellectual perspective.

5.6 FURTHER DISCUSSIONS: GIS AS A SOCIAL TECHNOLOGY

These findings have profound implications for applying GIS to address controversial social issues. This research has demonstrated that, on the one hand, GIS has greatly facilitated the integration of a variety of spatial and non-spatial information at different scales and areal unit boundary configurations. It would be extremely difficult, if not entirely impractical, to conduct a multiple-scale/multiple zoning scheme environmental equity analysis without the aid of GIS. On the other hand, if cautionary steps are not taken to address the MAUP issue, GIS technology can easily be abused to generate whatever results, presumably with unquestionable hi-tech-based objectivity, advance the political/social agendas of various interest groups. These uncertainties in GIS-based environmental equity analysis have perpetuated the ethical dilemmas facing researchers in this controversial area. I believe that mere technical solutions will not suffice for these dilemmas. We must reconceive and redefine the nature of GIS technology, from viewing it as a value-free tool to viewing it as a socially constructed technology.
To achieve this shift of our philosophy, two critical theories are extremely useful in illuminating the implications of this study:
Heidegger’s enframing theory of technology (Heidegger, 1972) and Habermas’s communicative theory of society (Habermas, 1984, 1987). According to Heidegger (1972), whenever we are applying a piece of technology to solve a problem, we are enframed by the implicit assumptions of the technology. Technology is a mode of revealing by concealing. Similar to what quantum physicists have told us, whenever an instrument is applied to measure the phenomena being studied, the instrument inadvertently alters the physical conditions of the system being measured, which usually leads to unavoidable measurement errors. Heidegger further argues that the enframing nature of technology does not come from the potentially lethal machines or apparatus of technology. The real danger of the enframing nature of technology is that we are increasingly becoming blind to alternative ways of looking at things when we turn to technology for solutions of social problems. To Heidegger, technologies are not mere exterior aids but interior transformations of consciousness. Heidegger (1972) prescribed that we must strive to let things reveal their “thingness” instead of relying on a particular technology to do that for us. In the case of GIS-based environmental equity analysis, I believe that the research reported in the literature so far is enframed by the use of secondary data (both TRI facilities and characteristics of the population) and by the scale and areal unit boundaries to which these data are aggregated. It is beyond the scope of this chapter to discuss the data problems, not just in the technical sense of data-error but in the political sense of data-appropriateness. Even for the MAUP issue alone, as shown above, a single scale and set of areal unit boundaries do not warrant reliable conclusions. These findings reveal as much about the systems we are conducting our research in as about the environmental problems they are supposed to be about.
If the public were not being informed about the effects of scales and zoning systems used in environmental equity analysis, as Zimmerman (1994) described in so many court decisions, they would be easily led to believe in a haphazard conclusion drawn at a particular scale or zoning system. To contextualise further the enframing nature of technology, Habermas’ communicative theory of society is also enlightening (Habermas, 1984, 1987). Central to Habermas’ theory is the analysis of “how individuals and organisations systematically manipulate communications to conceal possible problems and solutions, manipulate consent and trust, and misrepresent facts and expectations” (Aitken and Michel, 1995). Habermas (1984) argued that any form of knowledge is a product of human wishes, including the will to power, as well as the human practices of negotiation and communication. To Habermas and many others, technology not only enframes us into a particular mode of thinking, but also, perhaps more troublesome, manufactures fictions that can capture and trap public opinion into illusions. Because all things, including space, people, and environment, have become digital in GIS, they can be more easily manipulated in environmental equity analysis than before. From the results presented above, it can be seen that GIS technology, like all other communication tools, can be (ab)used by individuals and organisations to manufacture results to legitimate and impose political, economic, and social agendas. Far from being a neutral, value-free scientific tool, GIS is actually being used more as a communication and persuasion tool in the studies of many controversial social issues. In order to make GIS fulfil our democratic ideals in society, a shift of our philosophy from viewing GIS as an instrument for problem-solving to viewing it as a socially embedded process for communication is long overdue. 
Such a critical perspective of GIS entails an ontological as well as an epistemological position that views the subjects of research and representation as situated in complex webs of power relations that construct and shape those very subjects. This philosophical shift demands us to be both critically objective and objectively critical about applications of GIS in society. To be critically objective means to limit one’s conclusions as essentially partial and selective among all the possible conclusions rather than making radical claims about their universal applicability. To be objectively critical means to make one’s position vis-à-vis assumptions and limitations of research methodology explicitly known rather than invisible, because, to a
great extent, how we see determines what we see. GIS-based environmental equity analysis can serve as an engaging example to apply such a critical perspective. As indicated by the empirical results of this study, computer systems can shape our understanding of social reality so that effects are due, not to the phenomena measured, but to the systems measuring them. The social studies of GIS are a journey upstream towards the sources of everyday facts. This shift from instrumental to critical rationality will enable us to examine more vigorously how space, people, and environment have been represented, manipulated, and visualised in GIS, and thus promote more ethical GIS practice in the social arena.

5.7 CONCLUSIONS

How big is your backyard, or what is the appropriate geographic scale or zoning system for environmental equity analysis, has always been a contentious issue in environmental justice research. Most previous studies on environmental equity analysis were based upon an ad hoc selection of geographic scales and areal unit boundaries without a rational justification. The perplexing MAUP and ecological fallacies in environmental equity analysis have not been adequately addressed in the literature. The primary purpose of this chapter was to develop a GIS approach to conduct environmental equity analysis using multiple scales and zoning schemes in an attempt to examine the effects of scale and areal unit boundaries on the results of environmental equity analysis, and the implications of GIS technology in addressing controversial social issues. The preliminary results clearly indicate that the findings of environmental equity analysis are highly sensitive to the geographical scales and areal-unit boundaries used. Environmental equity analyses based upon a single scale or zoning scheme cannot warrant a reliable conclusion about the actual processes of environmental equity.
If the effects of geographic scales and zoning schemes are not considered, it has been shown that it is possible to find almost any desired results simply by re-aggregating the data to different scales and areal-unit boundaries. The empirical results have confirmed both the scale-dependency and the areal unit boundary-dependency hypotheses. The findings of this research provide some engaging examples for GIS as a social technology. In order to overcome the enframing nature of GIS technology, GIS practices must be contextualised into their social dimensions as essentially a communication process. Viewed from a critical social theory perspective, GIS discloses the multifarious practices of various social groups with conflicting political agendas, which must be interrogated critically. Otherwise we might be deceived into thinking that the model in the database corresponds primarily to the essence of reality.

ACKNOWLEDGEMENTS

Part of this research was financially supported by the Creative and Scholarly Research Program, sponsored by the Office of Vice President for Research at Texas A&M University. The author would like to thank Yichun Xie for providing the SAM program; Carl G. Amrhein and David W.S. Wong for the latest literature on the MAUP; and Michael Kullman, Daniel Overton, Thomas H. Meyer, and Ran Tao for their assistance in data preparation and aggregation.

REFERENCES

AITKEN, S.C. and MICHEL, S.M. 1995. Who contrives the ‘real’ in GIS? Geographic information, planning and critical theory, Cartography and Geographic Information Systems, 22(1), pp. 17–29.
AMRHEIN, C.G. 1995. Searching for the elusive aggregation effect: Evidence from statistical simulations, Environment and Planning A, 27(1), pp. 105–119.
BOWEN, W.M., SALLING, M.J., HAYNES, K.E., and CYRAN, E.J. 1995. Towards environmental justice: Spatial equity in Ohio and Cleveland, Annals of the Association of American Geographers, 85(4), pp. 641–663.
BRYANT, B. and MOHAI, P. (Eds.) 1992. Race and the Incidence of Environmental Hazards: A Time for Discourse. Boulder, CO: Westview Press.
BULLARD, R.D. 1990. Dumping in Dixie: Race, Class, and Environmental Quality. Boulder, CO: Westview Press.
BURKE, L.M. 1993. Race and environment equity: A geographic analysis in Los Angeles, Geo Info Systems, 4(6), pp. 44–50.
CHAKRABORTY, J. and ARMSTRONG, M.P. 1994. Estimating the population characteristics of areas affected by hazardous materials accidents, in Proceedings of GIS/LIS’94, Phoenix, 25–27 October. Bethesda, MD: ASPRS, pp. 154–163.
CHASE, A. 1995. In a Dark Wood: The Fight over Forests and the Rising Tyranny of Ecology. Boston: Houghton Mifflin Co.
CURRY, M., HARRIS, T., MARK, D. and WEINER, D. 1995. Social implications of how people, environment and society are represented in GIS, NCGIA New Initiative Proposal (Available on-line at: http://www.geo.wvu.edu/www/i19/proposal).
CUTTER, S.L. 1995. Race, class, and environmental justice, Progress in Human Geography, 19(1), pp. 111–122.
EDELSTEIN, M.R. 1988. Contaminated Communities: The Social and Psychological Impacts of Residential Toxic Exposure. Boulder, CO: Westview Press.
FOTHERINGHAM, A.S. and WONG, D.W.S. 1991. The modifiable areal unit problem in multivariate statistical analysis, Environment and Planning A, 23(6), pp. 1025–1044.
GEHLKE, C.E. and BIEHL, K.K. 1934. Certain effects of grouping upon the size of the correlation coefficient in census tract material, Journal of the American Statistical Association Supplement, 29(1), pp. 169–170.
GOLDMAN, B.A. and FITTON, L. 1994. Toxic Wastes and Race Revisited: An Update of the 1987 Report on the Racial and Socio-economic Characteristics of Communities with Hazardous Waste Sites. Washington, D.C.: Center for Policy Alternatives.
HABERMAS, J. 1984. The Theory of Communicative Action, Vol. 1. Boston: Beacon Press.
HABERMAS, J. 1987. The Theory of Communicative Action, Vol. 2. Boston: Beacon Press.
HALLMAN, W. and WANDERMAN, A. 1989. Perception of risk and waste hazards, in Peck, D.L. (Ed.), Psychological Effects of Hazardous Waste Disposal on Communities. Springfield, IL: Charles C. Thomas, pp. 31–56.
HEIDEGGER, M. 1972. The question concerning technology, in Lovitt, W. (Ed.), The Question Concerning Technology and Other Essays. New York: Harper Colophon, pp. 3–35.
LANGBEIN, L.I. and LICHTMAN, A.J. 1978. Ecological Inference. Beverly Hills, CA: SAGE Publications.
LOWRY, J.H., MILLER, H.J., and HEPNER, G.F. 1995. A GIS-based sensitivity analysis of community vulnerability to hazardous contaminants on the Mexico/US border, Photogrammetric Engineering and Remote Sensing, 61(11), pp. 1347–1359.
MARR, P. and MORRIS, Q. 1995. People, poisons, and pathways: a case study of ecological fallacy. Paper presented at the International Conference on Applications of Computer Mapping in Epidemiological Studies, Tampa, FL, 14–19 February.
NEPRASH, J.A. 1934. Some problems in the correlation of spatially distributed variables, Journal of the American Statistical Association, 29(supplement), pp. 167–168.
OPENSHAW, S. 1977. Algorithm 3: a procedure to generate pseudo-random aggregations of N zones into M zones, where M is less than N, Environment and Planning A, 9(6), pp. 1423–1428.
OPENSHAW, S. 1983. The Modifiable Areal Unit Problem. CATMOG Series, No. 38. London: Institute of British Geographers.
OPENSHAW, S. 1984. Ecological fallacies and the analysis of areal census data, Environment and Planning A, 15(1), pp. 74–92.
ROBINSON, W.S. 1950. Ecological correlation and the behavior of individuals, American Sociological Review, 15(2), pp. 351–357.
SHEPPARD, E. 1995. GIS and society: an overview, Cartography and Geographical Information Systems, 22(1), pp. 5–16.
SUI, D.Z. 1994. GIS and urban studies: positivism, post-positivism, and beyond, Urban Geography, 15(3), pp. 258–78.
TOBLER, W. 1989. Frame independent spatial analysis, in Goodchild, M.F. and Gopal, S. (Eds.), Accuracy of Spatial Database. New York: Taylor & Francis, pp. 115–122.
UNITED CHURCH OF CHRIST 1987. Toxic Wastes and Race in the United States: A National Report on The Racial and Socio-Economic Characteristics of Communities with Hazardous Waste Sites. New York: Commission of Racial Justice, United Church of Christ.
VON BRAUN, M. 1993. The use of GIS in assessing exposure and remedial alternatives at Superfund sites, in Goodchild, M.F., Parks, B.O., and Steyaert, L.T. (Eds.), Environmental Modeling with GIS. New York: Oxford University Press, pp. 339–347.
WONG, D.W.S. 1996. Aggregation effects in geo-referenced data, in Arlinghaus, S.L. (Ed.), Practical Handbook of Spatial Statistics. Boca Raton, FL: CRC Press, pp. 83–106.
ZIMMERMAN, R. 1994. Issues of classification in environmental equity: how we manage is how we measure, Fordham Urban Law Journal, 29(3), pp. 633–669.
Chapter Six
National Cultural Influences on GIS Design: a Study of County GIS in King County, WA, USA and Kreis Osnabrück, Germany
Francis Harvey
6.1 NATIONAL CULTURE AND GEOGRAPHIC INFORMATION SYSTEMS

This chapter sets out to discern the influence national culture can have on GIS design. Culture plays a crucial role in all human activity, but is entangled with institutions, disciplines, and our daily lives in perplexing ways. Studies of cultural influence on GIS technology stand to benefit the GIS research and practitioner communities through insights into this frequently down-played area. This research focuses specifically on national culture influences in GIS design, building on prior work in geography, cartography, GIS, information science, and sociology. It employs ethnographic research methods to examine the embedded relationships between national culture and GIS design, comparing the GIS designs of a county in the USA with a county in Germany.

In sociology the importance of culture is perfectly obvious, but geography (Hettner, 1927; Pickles, 1986), cartography (Wood and Fels, 1986; Harley, 1989), and information systems (Hofstede, 1980; Boisot, 1987; Jordan, 1994) also emphasise the importance of culture. GIS researchers also consider cultural aspects. Some GIS research focuses on differences in the cultural mediation of GIS operations and corresponding cultural concepts (Campari and Frank, 1993; Campari, 1994). Other GIS researchers have studied cultural differences in spatial cognition (Mark and Egenhofer 1994a, 1994b). This research specifically utilises frameworks for examining the influence of national culture on information systems, particularly Hofstede’s cultural dimensions (Hofstede, 1980; Jordan, 1994). Like these works this research builds on the sociological work of Max Weber. Culture is commonly understood in this sociology to be the shared set of beliefs that influence what we consider to be meaningful and valuable. Disciplines, professions, and institutions in modern bureaucratic society nurture and transmit cultural values and meanings (Weber, 1946).
In this vein, Obermeyer and Pinto recently discussed the role of professions in GIS in Weber’s framework (Obermeyer and Pinto 1994). Chrisman, writing earlier about the involvement of different disciplines and guilds in spatial data handling, also identifies disciplines as carriers and transmitters of cultural values (Chrisman, 1987). This research focuses solely on national culture and employs the national culture dimensions described by Hofstede (1980). His framework describes four dimensions of national culture (uncertainty avoidance, power distance, individuality and masculinity) with their influence on thinking and social action. This research examines the two pertinent dimensions (uncertainty avoidance, power distance). Following the presentation of the theoretical background for this work in the next section, the methodology employed is described in the third section. The two research questions are:
NATIONAL CULTURAL INFLUENCES ON GIS DESIGN
1. How well do Hofstede’s national cultural dimensions reflect differences in the GIS designs as indicated through the analysis of the design documents?
2. Do these dimensions also help explain the actual practice of GIS design?

The fourth section presents the GIS of the two counties, Kreis Osnabrück in Germany and King County in the United States, and the fifth section evaluates their differences in terms of Hofstede’s national cultural dimensions. The final section turns to a general review of the research findings and presents an explanation for the differences found between GIS design practice and Hofstede’s formal framework.

6.2 THEORETICAL BACKGROUND

Because of its ubiquity, studying culture, even just national culture, is an extremely complex and introspective activity (Clifford and Marcus, 1986; Emerson et al., 1995). Moving beyond the limits of our own cultural understanding and comprehending another easily becomes a very subjective undertaking. Fortunately, Hofstede, in a study of the influences of national culture on information systems, evaluated 117,000 questionnaires from 84,000 people in 66 countries. Out of this mass of empirical data he developed four dimensions of national cultural influence on information system design. Although vast in scale, the focus on information systems was limited enough to provide an empirically validated framework that can be employed in evaluating GIS design. Hofstede specifically examined the role of national culture in work-related values and information system design (Hofstede, 1980). Applying theories of culture and organisational structure from Weber (1946) to the research findings, Hofstede (1980) establishes four dimensions of national culture:
• uncertainty avoidance: the extent to which future possibilities are defended against or accepted
• power distance: the degree of inequality of power between a person at a higher level and a person at a lower level
• individualism: the relative importance of individual goals compared with group or collective goals
• masculinity: the extent to which the goals of men dominate those of women.

Uncertainty avoidance is the focus of information systems, decision support systems and so on (Jordan, 1994). It is considered together here with power distance because of interaction effects (Hofstede, 1980). The other two dimensions, individualism and masculinity, having little importance and relevance to German and US cultures, lie outside the focus of this research. Germanic and Anglo-American cultures are strongly differentiated in terms of uncertainty avoidance; the power distance dimension is quite similar. It is important to note that Hofstede’s findings ascribe ideal typical qualities to each culture in a Weberian sense: they are the striven-for forms, not individual characteristics. In other words, research can only find distinctions between social group behaviour in terms of these dimensions.

Uncertainty avoidance and power distance form critical interactions affecting organisations. In Germany and the USA, characterised by low power distance, there are two possible ways to keep organisations together and reduce uncertainty. In Germanic cultures, with high uncertainty avoidance, “people have an inner need for living up to rules,…the leading principle which keeps the organisations together can be formal rules” (Hofstede, 1980, p. 319). With low uncertainty avoidance (Anglo-American cultures), “…the organisation has to be kept together by more ad hoc negotiation, a situation that calls for a larger tolerance
Figure 6.1: Dimensions of National Culture for low power distance and different uncertainty avoidances (After Hofstede, 1980 p. 319)
for uncertainty from everyone” (Hofstede, 1980, p. 319). Figure 6.1 shows important organisational characteristics based on uncertainty avoidance and power distance dimensions.

Hofstede makes detailed comments about these differences. The “Anglo” cultures “would tend more toward creating ‘implicitly structured’ organisations” (Hofstede, 1980, p. 319). In contrast, German speaking cultures establish “workflow” bureaucracies that prescribe the work process in much greater detail (Hofstede, 1980, p. 319). Hofstede argues that problem solving strategies and implicit organisation forms follow: Germans establish regulations, Anglo-Americans have negotiations. Germans conceive of the ideal organisation as a “well-oiled machine”, whereas Anglo-Americans conceive of a “village market” (Hofstede, 1980).

Information transaction cost theory (Williamson, 1975) provides additional insight into cultural influence on organisational structure and approaches to problem solving. In this theory, all business activity consists of transactions between individuals and groups. Information serves as the controlling resource (Jordan, 1994). In this form the theory is overly reductionist and simplistic. Boisot (1987) extended this transaction cost theory to include cultural issues, distinguishing two characteristics of information that affect transactions:

• codification: the degree of formal representation
• diffusion: the degree of spread throughout the population (Jordan, 1994).

Internalising the transaction in the organisation reduces the diffusion of information (Jordan, 1994). Centralised information requires a bureaucracy, whereas diffuse information is distributed in a market. These differences correspond to Hofstede’s national cultural characteristics (Jordan, 1994). How GIS design codifies or diffuses information will depend on the importance of uncertainty avoidance and ideal organisation type.
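The pairing of uncertainty avoidance with an ideal organisation type, as characterised above for low power distance cultures, can be restated as a small lookup table. The following is only an illustrative sketch: the class name `OrganisationProfile`, the function `ideal_organisation`, and the two-level "high"/"low" coding are assumptions made for this example and are not part of Hofstede’s framework, which works with continuous index scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrganisationProfile:
    """Ideal-typical organisational traits for one uncertainty avoidance level."""
    ideal_type: str      # organisational form named in the text
    metaphor: str        # metaphor quoted from Hofstede (1980)
    coordination: str    # dominant problem-solving strategy


# Hypothetical two-level coding; only low power distance cultures
# (Germanic and Anglo-American) are covered, as in the chapter.
PROFILES = {
    "high": OrganisationProfile(
        ideal_type="workflow bureaucracy",
        metaphor="well-oiled machine",
        coordination="regulations",
    ),
    "low": OrganisationProfile(
        ideal_type="implicitly structured organisation",
        metaphor="village market",
        coordination="negotiations",
    ),
}


def ideal_organisation(uncertainty_avoidance: str) -> OrganisationProfile:
    """Return the ideal-typical profile for 'high' or 'low' uncertainty avoidance."""
    return PROFILES[uncertainty_avoidance]
```

Under this coding, `ideal_organisation("high")` corresponds to the Germanic ideal type and `ideal_organisation("low")` to the Anglo-American one. These are ideal types in the Weberian sense, not descriptions of any individual organisation.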
Multi-disciplinary and multiple goal orientations (Radford, 1988) will create additional hurdles to face in information system design. Nominally, highly integrated industries and commerce apply the information transaction approach. GIS design approaches often begin with a similar structured systems approach (Gould, 1994). When considering heterogeneous public administrations, a different, highly diversified organisational structure is possible. In county governments the multi-disciplinary interests, missions, goals, and perspectives require special consideration of the cultural values.
6.3 METHODOLOGY AND RESEARCH DESIGN

This research compares two ethnographic case studies of the GIS designs and implementations in King County, Washington, USA and Kreis (County) Osnabrück, Lower Saxony, Germany. The research design is conceptually divided into two phases. In the first phase design documents were examined and compared (see Harvey, 1995, for the first report of these results). During the second phase, I participated as an observer in the actual design process to validate my findings from the first phase and test Hofstede’s framework.

A case study methodology was chosen for the detailed insight it provides into the distinct cultural and institutional context of each GIS (Onsrud and Pinto, 1992). In King County I followed a strategy of contextual inquiry, whereas naturalistic observation was used during the shorter visit to Kreis Osnabrück (Wixon and Ramey, 1996). A framework for the case studies was prepared following Hofstede’s framework with a focus on uncertainty avoidance and the role of regulations and negotiations. Ethnographic approaches to differences in scientific practice (Hayek, 1952, 1979; Latour and Woolgar, 1979; Hirschhorn, 1984; Anderson, 1994; Nelson, 1994) influenced the choice of participant observation to collect data. The actual issues raised during document evaluation, open-ended interviews, written correspondence, and telephone communications focused on GIS design and the construction of organisational, institutional, and physical components.

The case study in King County extended over a longer period (six months), during which I participated in the system conceptualisation. This was followed by several phone interviews and written communication. Due to the distance to Kreis Osnabrück, key questions were posed in written format several months before the site visit. During an intensive one-week visit, open-ended interviews were held with six project participants and analysed.
My training in German planning and administrative law plus experiences with GIS applications in Germany enabled me to get to the key research questions rapidly. The preliminary evaluation of documents and an ongoing exchange of discussions and/or e-mail allowed a gradual entry into the design and implementation practices of each county. The design documents for each county were examined and evaluated in terms of Hofstede’s framework. Flood protection planning was chosen as a case study to examine in more detail because of the fundamental similarity of this mandate and the availability of digital data in both counties.

The preparation of the visit to Kreis Osnabrück involved formulating specific questions and issues about design practice, uncertainty avoidance, and the role of regulations and negotiations. Questions focused on filling gaps in the recent history of the county GIS, understanding the role of different administrative agencies in the design process, and examining the practice of GIS design and implementation. During the visit, I went to several agencies and had discussions with county staff.

Because of the far longer duration of observation in King County and my more direct involvement with the project, the case study in King County followed a different plan. The comparison was formulated parallel to my work there, so this case study involved clear retrospective and inquiry phases. After my six months of project participation at King County, I had several meetings, telephone calls, and written correspondence with project staff to discuss specific questions related to project history, design, and implementation.

6.4 DESIGNS

Both King County and Kreis Osnabrück started the GIS projects examined here in 1989. King County’s design occurred after several failed attempts, and was characterised by ongoing negotiations. Kreis
Osnabrück’s design involved a detailed examination of two departments’ tasks (regional planning and environmental protection) that focused on the identification of data objects and tasks required to fulfil legislated mandates. The essential difference in the GIS design approaches goes back to Kreis Osnabrück’s reliance on standards and regulations (ATKIS, ALK, MERKIS), whereas King County developed its GIS from the ground up.

ATKIS, the Automatisierte Topographisch-Kartographische Informationssystem (automated topographic and cartographic information system), is the most important standard for Kreis Osnabrück. It is the object orientated data model for provision of vectorised topographic data at three scales: 1:25,000, 1:200,000 and 1:1,000,000. ALK, the Automatisierte Liegenschaftskataster (automated property cadastre), is the automation of the Grundbuch (Property Book), the registry of property ownership. MERKIS, the Maßstabsorientierte Einheitliche Raumbezugsbasis für Kommunale Informationssysteme (map scale orientated uniform spatial coordination basis for communal information systems), describes GIS at the communal level as a “…geographic data base for agency specific, spatial communal information systems based on the national coordinate system, a unified data model for all topographic and agency specific spatial data” (Der Oberkreisdirektor Landkreis Osnabrück, 1990).

Kreis Osnabrück’s GIS design approach involves essentially three phases. In the first phase, questions regarding administrative functions (following the respective legal mandates) and problems with the available cartographic products were raised. The results were the basis for the detailed breakdown of administrative functions into tasks and objects. These tasks and objects are finally implemented during the last stage of design, when all issues and conflicts are to be worked out.

King County’s (KC) GIS design process is far more complicated.
Although it followed the accepted procedure (needs assessment, conceptual design, pilot study), the autonomy of participating agencies and county politics led to a very convoluted development. The final design involves a project that constructs the core data layers and infrastructure, but then finishes. This leaves many issues open to further negotiation. The central group in King County is basically a steering committee. There is no regulation or standardisation of what the county GIS is based on or should provide.

The design of KC-GIS was, not surprisingly, difficult. After an internal proposal for a GIS fell apart due to internal strife, PlanGraphics was called in to carry out the design. This began with a needs assessment. The basic tenet of the PlanGraphics needs assessment report points to the requirement for coordination and a centralised organisation. They are the presumed basis for effectively using GIS technology that provides information and services to fulfil county administrative and governmental functions. The design paradigm follows the line that because departmental functions and information are dependent on and related to other departments, a centralisation of the functions and information in a county GIS would improve the effectiveness of King County’s administration. The needs assessment report (PlanGraphics, 1992d), adopting a strategy of limited centralisation, focused mostly on elaborating county needs for a GIS in terms of common, shared, and agency specific applications. The intent was to determine which elements of a single department’s applications are common with other departments’ elements.

The PlanGraphics GIS design proposal left a great many issues unresolved. These gaps required an exhaustive study of the conceptual design document and discussions with the various agencies to design a project that would fulfil objectives: in other words, establishing the playing field and negotiation.
Starting with the PlanGraphics documents, a special group in the Information Systems Department of Metro prepared a scoping report (Municipality of Seattle, 1993) with a more exhaustive overview of design, but left the implementation to inter-agency negotiation, and maintenance for even later negotiation.
Many GIS applications identified in the PlanGraphics reports were eliminated, because the budget for the project was reduced from US$ 20 million to US$ 6.8 million. The project’s focus was limited to creating the infrastructure and essential layers for a county GIS. Afterwards, responsibility for the layers would return to the “stakeholders”. From the PlanGraphics proposal only the names of the essential layers remained. The contents of the layers were left open to negotiation. The reduction in funding without a corresponding redefinition of mission and vague descriptions of mandates meant the design stage carried on into implementation, accompanied by ongoing negotiations. Based on examinations of design documents, Table 6.1 summarises key design features of the two counties.

Table 6.1: Comparison of Kreis Osnabrück and King County GIS design documents

| | Kreis Osnabrück | King County |
| --- | --- | --- |
| Organisation | Lead agency is the information system department of the county government. Various working groups are coordinated by a newly created position. GIS data base design is carried out in the responsible agency together with a central coordinating group following ATKIS. | The information system department of the county transit agency (recently merged into the county government) is the lead agency. Two committees accompany the project. GIS data base design is coordinated with other agencies, municipalities, and corporations. |
| Purpose | Provision of data and information for more efficient administration and planning at the communal level. | The core project aims to provide capabilities that are vaguely defined, i.e. “better management”. The basic project goal is the development of a county GIS database. |
| Budget overview | DM 2.89 million (approx. US$ 1.94 million) | US$ 6.8 million |
| Data model (base layers) | Provided and defined largely by the national standards ATKIS, ALK, and MERKIS. Extensions are for county purposes and already listed in the object catalogue. Agencies can extend the data model when needed in a given scheme. | No explicit data modeling in the conceptual design documents. In all there are 72 layers. The most important are: Survey Control, Public Land Survey System, Street Network, Property, Political. |
| Information collection, analysis, and display | Documents describe administrative procedures and source maps in detail, but not which GIS operations are required. | Documents sometimes identify rough costs (Municipality of Seattle, 1993), but no detailed requirements, sources, or procedures of any kind are identified. |

Sources: Municipality of Seattle, 1993; PlanGraphics, 1992b; Der Oberkreisdirektor Landkreis Osnabrück, 1990, 1992a, 1993b.
6.5 DIFFERENCES

Hofstede’s national cultural dimensions of information system design are clearly recognisable in each county’s GIS design documents. Kreis Osnabrück describes its GIS in terms of a clear and concise framework of laws, regulations, and accepted standard operating procedures. Before any product or GIS function is implemented in Germany, it is first formalised and codified. This usually involves negotiations (for example the modified ATKIS used in Kreis Osnabrück), but these negotiations are completed before the design stage and accepted as a regulation by other bodies. This reliance on regulations slows down the development of the county GIS to the rate at which regulations can be put in place. Several agencies are not satisfied with the slow rate of progress the county GIS is making. However, so far the county director has stayed with a step-by-step process of formalisation followed by implementation.

King County, on the other hand, continually negotiates the design and implementation of the county GIS. The loose ends in the design documents reflect the “village market” approach. Piece by piece, portions of the county GIS are agreed to and implemented. This leaves design issues and, in particular, maintenance issues open or simply unresolved until implementation, reflecting the national cultural characteristics that lean towards negotiation as a design strategy. Design documents are the basis for negotiations between actors. Agreement is only established for a particular portion of KC-GIS with no guarantee of how long it will be maintained. Additionally, the design documents also leave many courses of action open, requiring extensive negotiations before any work is done. Due to this complexity, the agencies involved in the process still operate independently with limited synchronisation.
6.5.1 Designing and Implementing Regulations

Each county’s design stops short of identifying specific GIS operations or functions required to prepare a layer or carry out parts of a task. It was clear that in King County the preparation of design documents and the negotiation of implementation are inextricable. However, there was no exact indication before the naturalistic observation in Kreis Osnabrück of how design documents were actually utilised during implementation. The evidence from the design documents supported Hofstede’s work, but the process of getting the design to work remained obscure.

The practice of GIS design in Kreis Osnabrück differed considerably from Hofstede’s characterisation of Germanic national culture and the suggested procedures described in the design documents. The case study research indicates that the transformation of regulations into design and implementation occurs through negotiation. This was related to me using several examples. A good example is the case of database software. Part of the design, it turned out, would not be developed (for various reasons). This was a crucial component of the software system. Lacking it, the entire software design had to be reworked around an off-the-shelf product. This change was worked out through negotiations between participating agencies over the new course of action and between technical staff over design and implementation details. Problems arose nearly every day during implementation; some were large and others were small, requiring quick action and alterations.

In contrast to Kreis Osnabrück, King County has only an implicit framework of regulations and guidelines for GIS design and implementation, and the design of its GIS project relies heavily on negotiations between departments. Since design work concludes only by pointing out the many loose ends to be dealt with by the respective departments (PlanGraphics, 1992b), negotiation will always be the crucial step in project design.
Figure 6.2: Design for KC-GIS (from Municipality of Seattle, 1993)
In Figure 6.2, the puzzle pieces, illustrating how different parts of the county GIS should “fall into place”, graphically suggest the importance negotiations have even at the end of formal design.

6.5.2 Flood Protection Planning

The detailed examination of the use of GIS for flood protection planning, a mandate similar in both counties, illustrates the influence of national culture on GIS design. In Kreis Osnabrück flood protection planning is based on the Prussian Water Legislation which states that land use permission must be granted by the County Office of Hydrology before the permit application can be further processed. This permission is the first step in acquiring the permit to build in a flood zone. The county first establishes whether the project lies in a flood zone. If it does, the county is required by law to establish whether the project is capable of approval. Currently this is done by overlaying a transparent plastic map of the flood zones on a map of the area in question.
The GIS implementation foreseen to support this mandate does not alter the application procedure. Flood protection zones are a legally defined regulation that the GIS implementation must support. GIS overlay will be used in the same way as the overlay of map transparencies is now.

Flood protection planning is administratively different in King County, but relies equally on overlay. As in the rest of the USA, flood zones in King County are defined by the Federal Emergency Management Agency (FEMA). The establishment of flood zones and the partial regulation of land use is set out in the Federal Flood Insurance Protection Act of 1968. It establishes that maps indicating the flood zones are to be prepared for every community in the USA. King County’s flood protection planning is implemented through the permit application process, following the same overlay approach as in Kreis Osnabrück. It is not possible to obtain a building permit in a FEMA flood zone. A county ordinance lays this out and identifies the executing agency responsible for verification. The ordinance identifies the environmental agency as the clear “stakeholder” who is autonomous (in the context of the county ordinance) in fulfilling this function. However, the autonomy of county agencies leads to a very loosely defined bundle of technology, methods, and procedures for each individual agency. Every agency is distinctly separate and the county GIS design skirts these issues.

The organisation of flood protection planning in each county fits Hofstede’s national cultural characteristics. Kreis Osnabrück develops the GIS operations around established regulations, and King County employs GIS overlay in a manner consistent with the agency’s established practices following agreements negotiated with the other county agencies.
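The overlay check that both counties rely on, establishing whether a project site lies in a flood zone, can be sketched in a few lines. This is a minimal illustration only, not either county’s implementation: it assumes flood zones are simple polygons given as lists of planar (x, y) vertices and that a project is reduced to a single point. The function names and the sample coordinates are hypothetical. The test used is the standard ray-casting (even-odd) point-in-polygon rule.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Even-odd ray casting: count how many polygon edges a ray running
    in the +x direction from the point crosses; an odd count means inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Consider only edges that straddle the horizontal line through the point.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def lies_in_flood_zone(site: Point, flood_zones: List[Polygon]) -> bool:
    """Overlay step: True if the site falls inside any flood zone polygon."""
    return any(point_in_polygon(site, zone) for zone in flood_zones)


# Hypothetical sample data: one rectangular flood zone.
flood_zones = [[(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]]
```

With this sample zone, a site at (1.0, 1.0) is reported as inside and a site at (5.0, 1.0) as outside. What happens after the check (assessing whether the project is capable of approval in Kreis Osnabrück, refusing the building permit in King County) is the administrative part that the text describes and that the sketch deliberately leaves out.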
6.6 REGULATIONS AND NEGOTIATIONS

Regardless of national culture, the diversity of perspectives and purposes in any public administration means the social construction of a GIS will always require negotiation. Regulations shift the focal points and lend a strong structure, but even regulations are negotiated. In King County negotiations and renegotiations of the GIS are ongoing. Compared to Kreis Osnabrück, the county GIS is not as stable, but agencies are extremely flexible in their response to institutional, legislative, and political contingencies.

In Germany issues are negotiated and then codified as regulations or laws. The results are robust institutional solutions that offer an explicit framework, but bind the agencies involved to already established approaches, leading to possible idiosyncratic solutions. New applications, consequences, and new actors’ roles must be addressed and formalised in existing institutional structures before action is taken. This takes up many institutional resources and delays the response of institutions to new technological opportunities. Much time is spent making the technology fit the institution.

In King County, mandates are fulfilled through the flexibility left to individuals in their respective agencies. In Kreis Osnabrück, individuals also propel the future developments in the county. These may take years, or never come about, but it is this resource that gives the institutionalised, bureaucratic government some flexibility. Though these are different frameworks for individual agencies, in each case they provide the necessary work and creativity to develop and provide spatial information technologies that assist in decision making.

These findings point to an actual practice of GIS design that differs significantly from what Hofstede ascribes.
On one hand, his assessment of negotiation as dominant fits the actual approach to design in King County, but the reliance on regulations he asserts for Germanic cultures only fits the design documents, not the actual practice of design in Kreis Osnabrück. Hofstede’s cultural dimensions only seem to apply at the abstract organisational level to the influence of national culture in both counties. In other words, the actual
Figure 6.3: Example of task analysis used in the design of KRIS (from Der Oberkreisdirektor Kreis Osnabrück, 1993a)
practice of design involves negotiations in both cultures. Behind the formal design of the Kreis Osnabrück GIS that explicitly relies on standards and regulations (see for example Figure 6.3), the practice of design is the result of negotiations. Hofstede’s finding that uncertainty avoidance is so high in Germanic culture explains why negotiations are codified as regulations and standards in a very hierarchical, institutionalised system. However, in both counties, the detailed design and implementation of the GIS (getting the design to work) is left completely in the hands of the people creating the system.

Key differences between King County and Kreis Osnabrück lie in the organisational orientation of the design work. In Kreis Osnabrück the GIS is implemented by fulfilling standards. In King County work on the design aims to fulfil negotiated requirements and retain institutional and disciplinary positions. The work practice of the individuals on the Kreis Osnabrück project was in fact strongly compartmentalised according to the regulations they needed to implement. However, in spite of this compartmentalisation, much work was carried out resolving discrepancies between different regulations and constructing working
GIS software. Even in the compartmentalised world of German public administration, individuals rely on a web of contacts with co-workers and knowledgeable outsiders in the practice of design. This non-formalised part of GIS design is a tacit component of their work lives and is scarcely mentioned in discussions. Regulations remain important in the strong hierarchy. Informal meetings and arrangements with co-workers and outsiders are only the backdrop for design activities. The dominant view is that if these practices do not culminate in regulations, they are not important to the project, nor worth reporting.

The project manager was aware of these issues and apparent conundrums. He indicated the difficulties of implementing broad standards and resolving all the problems of implementation. It is necessary to find a pragmatic solution. In his words, “Kreis Osnabrück strives for an 80 percent solution” (Remme, 1995). For all the regulations and detailed task descriptions, they are learning by doing. Some of the large problems in the past proved unsolvable (for multiple reasons) and the project design was altered. For instance, the unavailability of the database software required a new concept that led to the acquisition of commercial database software and a fundamental change in the project design.

In the strong hierarchy of German public administration, emphasis on regulations and fulfilling legal mandates dominates the participants’ representation of design activities. This attests to the high uncertainty avoidance in Germanic cultures that Hofstede identified. In a culture so engrossed with regulations, it is no wonder that an outside observer, employing quantitative research techniques, only turns up the aspects emphasised by the culture. Going beneath the veneer of regulations and standards to the practice of design reveals a complex practice of negotiation and ad hoc problem solving in both counties.
In spite of the emphasis on regulations and standards, the actual work constructing the GIS in Kreis Osnabrück involves negotiations as much as regulations.

6.7 CONCLUSIONS

In terms of methodology and further studies, it is apparent that ethnographic research is crucial to unravelling national culture influences on GIS design. Case studies offer substantial advantages for examining the intricacies of GIS work in their cultural, organisational, and disciplinary context. Participant observation and other ethnographic research techniques can aid examinations of other national cultures’ influences, disciplinary and institutional roles, and the actual practice of GIS design and work. Qualitative research can lead to valuable insights that move beyond the tacit recognition that culture is entangled with GIS technology.

The ethnographic case studies of King County and Kreis Osnabrück show that, in spite of similarities, national cultural factors help explain substantial differences in the design, practice, and organisation of GIS. The GIS techniques used (overlay of flood protection zones) may well be similar, but national cultural values lead to completely different constructions of GIS technology. Furthermore, in both counties, it is plain that the practice of GIS design involves negotiation. This finding suggests Hofstede’s characterisations of national culture are limited to an abstract organisational level, not necessarily the actual practices of design and implementation.

ACKNOWLEDGEMENTS

I would like to thank the following individuals for their assistance in carrying out this research: Martin Balikov, Karl Johanson, and Thomas Remme. The University of Washington Graduate School provided financial support for completing the case study in Germany.
REFERENCES

ANDERSON, R.J. 1994. Representations and requirements: the value of ethnography in system design, Human-Computer Interaction, 9, pp. 151–182.
BOISOT, M. 1987. Information and Organizations: The Manager as Anthropologist. London: Fontana.
CAMPARI, I. 1994. GIS commands as small scale space terms: cross-cultural conflict of their spatial content, in Waugh T. and Healey R. (Eds.) Advances in GIS Research, The Sixth International Symposium on Spatial Data Handling, Vol. 1. London: Taylor & Francis, pp. 554–571.
CAMPARI, I. and FRANK, A. 1993. Cultural differences in GIS: a basic approach. Proceedings of Fourth European Conference and Exhibition on Geographical Information Systems (EGIS ‘93), Genoa, Italy, Vol. 1. Utrecht: EGIS Foundation, pp. 10–16.
CHRISMAN, N.R. 1987. Design of geographic information systems based on social and cultural goals, Photogrammetric Engineering and Remote Sensing, 53(10), pp. 1367–1370.
CLIFFORD, J. and MARCUS, G.E. (Eds.) 1986. Writing Culture. The Poetics and Politics of Ethnography. Berkeley, CA: University of California Press.
DER OBERKREISDIREKTOR LANDKREIS OSNABRÜCK. 1990. Das Kommunale Raumbezogene Informationssystem (KRIS) Eine Arbeitspapier zur Realisierung. Osnabrück: Referat A.
DER OBERKREISDIREKTOR LANDKREIS OSNABRÜCK. 1992a. Situationsbericht 12/2/92. Osnabrück: Der Oberkreisdirektor.
DER OBERKREISDIREKTOR LANDKREIS OSNABRÜCK. 1992. Das Kommunale Raumbezogene Informationssystem Osnabrück (KRIS) Gemeinsamer Abschlußbericht der Projekt- und Entwicklergruppe (Final Report) 20 May 1992. Osnabrück: Der Oberkreisdirektor.
DER OBERKREISDIREKTOR LANDKREIS OSNABRÜCK. 1993a. Lösungsvorschlag. Osnabrück: Der Oberkreisdirektor.
DER OBERKREISDIREKTOR LANDKREIS OSNABRÜCK. 1993b. Systemkonzept. Osnabrück: Landkreis Osnabrück.
EMERSON, R.M., FRETZ, R.I. and SHAW, L.L. 1995. Writing Ethnographic Fieldnotes. Chicago: University of Chicago Press.
GOULD, M. 1994. GIS design: a hermeneutic view, Photogrammetric Engineering and Remote Sensing, 60(9), pp. 1105–1115.
HARLEY, J.B. 1989. Deconstructing the map, Cartographica, 26(2), pp. 1–29.
HARVEY, F. 1995. National and organizational cultures in geographic information system design: a tale of two counties, in Peuquet D. (Ed.) Proceedings of the Twelfth International Symposium on Computer-Assisted Cartography (AutoCarto 12). Charlotte, NC: ACSM/ASPRS, pp. 197–206.
HAYEK, F. 1952, 1979. The Counter-Revolution of Science. Indianapolis: Liberty Press.
HETTNER, A. 1927. Die Geographie. Ihre Geschichte, Ihr Wesen und Ihre Methoden. Breslau: Ferdinand Hirt.
HIRSCHHORN, L. 1984. Beyond Mechanization. Work and Technology in a Postindustrial Age. Cambridge, MA: The MIT Press.
HOFSTEDE, G. 1980. Culture’s Consequences. International Differences in Work-Related Values. Beverly Hills: Sage Publications.
JORDAN, E. 1994. National and Organisational Culture: Their Use in Information Systems Design, Faculty of Business, Report. Hong Kong: City Polytechnic of Hong Kong.
LATOUR, B. and WOOLGAR, S. 1979. Laboratory Life: The Social Construction of Scientific Facts. Beverly Hills: Sage.
MARK, D.M. and EGENHOFER, M.J. 1994a. Calibrating the meanings of spatial predicates from natural language: line-region relations, in Waugh T. and Healey R. (Eds.) Advances in GIS Research, The Sixth International Symposium on Spatial Data Handling, Edinburgh, Scotland, Vol. 1. London: Taylor & Francis, pp. 538–553.
MARK, D.M. and EGENHOFER, M.J. 1994b. Modeling spatial relationships between lines and regions: combining formal mathematical models and human subjects testing, Cartography and Geographic Information Systems, 21(4), pp. 195–212.
MUNICIPALITY OF SEATTLE. 1993. King County GIS Scoping Project. Seattle: Municipality of Metropolitan Seattle.
NELSON, A. 1994. How could scientific facts be socially constructed? Studies in History and Philosophy of Science, 25(4), pp. 535–547.
OBERMEYER, N.J. and PINTO, J.K. 1994. Managing Geographic Information Systems. New York: Guilford Press.
ONSRUD, H.J. and PINTO, J.K. 1992. Case study research methods for geographic information systems, URISA Journal 4(1), pp. 32–44.
PICKLES, J. 1986. Geography and Humanism. Norwich: Geo Books.
PLANGRAPHICS. 1992a. King County GIS Benefit/Cost Report. Frankfort, KY: PlanGraphics.
PLANGRAPHICS. 1992b. King County GIS Conceptual Design. Frankfort, KY: PlanGraphics.
PLANGRAPHICS. 1992c. King County GIS Implementation and Funding Plan. Frankfort, KY: PlanGraphics.
PLANGRAPHICS. 1992d. King County GIS Needs Assessment/Applications, Working Paper. Frankfort, KY: PlanGraphics.
RADFORD, K.J. 1988. Strategic and Tactical Decisions, 2nd edition. New York: Springer-Verlag.
REMME, T. 1995. The GIS of Kreis Osnabrück (Interview) 17–19 August 1995.
SCHULZE, T. and REMME, T. 1995. Kommunale Anwendungen beim Landkreis Osnabrück, in Kopfstahl, E. and Sellge, H. (Eds.), Das Geoinformationssystem ATKIS und seine Nutzung in Wirtschaft und Verwaltung. Hannover: Niedersächsisches Landesvermessungsamt, pp. 193–198.
WEBER, M. 1946. Bureaucracy, in Gerth, H.H. and Mills, C.W. (Eds.), From Max Weber. Essays in Sociology. New York: Oxford University Press, pp. 196–244.
WILLIAMSON, O.E. 1975. Markets and Hierarchies: Analysis and Antitrust Implications. New York: Free Press.
WIXON, D. and RAMEY, J. 1996. Field Methods for Software and Systems Design. New York: John Wiley & Sons.
WOOD, D. and FELS, J. 1986. Designs on signs/myth and meaning in maps, Cartographica 23(3), pp. 54–103.
Chapter Seven
The Commodification of Geographic Information: Some Findings from British Local Government
Stephen Capes
7.1 INTRODUCTION

The commodification of geographic information (Openshaw and Goddard, 1987) is of rapidly increasing importance with implications for government, data users and producers, academics, the GIS industry and society at large. This chapter considers the exploitation of the geographic information commodity in British local government with reference to a four-fold model of commodification. In comparing commodification locally with that at the national and international levels, the discussion draws a crucial distinction between commercialisation and commodification. Whereas the former merely involves selling information, commodification encompasses a somewhat broader conceptualisation of the use and value of geographic information, involving its exploitation for strategic as well as commercial purposes.

The geographic information market is a growing industry; government spending on geographic information in the European Union has been estimated at some six billion ECU, or 0.1 percent of EU gross national product (Masser and Craglia, 1996). Users of geographic information systems (GIS) require access to data to put onto computers in order that they might store it, integrate it spatially, perform analyses upon it, and present it graphically. An ever-increasing number of studies (for example Blakemore and Singh, 1992; Capes, 1995; Gartner, 1995; Johnson and Onsrud, 1995; Sherwood, 1992) highlight the practice of charging for the geographic information products and services being produced in both the private and public sectors, and the implications this poses for the information society. Rhind (1992a) reviews charging practices in a selection of countries and highlights some of the problems they raise, whilst Beaumont (1992) discusses issues relating to the availability and pricing of government data. Barr (1993) paints a vivid, if imaginary, portrait of the dangers associated with information monopolies, and Maffini (1990) calls for geographic information to remain cheap or free in order that the GIS industry might grow. Both Shepherd (1993) and Onsrud (1992a, 1992b) present balanced reviews of the pros and cons of charging for information. These papers cover views which range from those in favour of greater data sales to those arguing that geographic information should be free to users.

In addition, GIS users who have already collected or bought data, analysed it and produced useful outputs, have themselves developed an interest in disseminating and selling these products and services. Bamberger and Sherwood (1993) have published case studies of practice and discussions of issues associated with marketing government geographic information in the USA. Rhind (1992b) explains that the Ordnance Survey, Britain’s national mapping agency, is required to sell its geographic information products to raise the income it needs to meet government targets and pay for an expensive digitisation programme.
Governments at national and international levels have recently begun to take a close interest in making geographic information available to a wider audience using information and communication technology. They are coming to believe that geographic information is of key strategic importance, since geography is a common base which enables the linking of many other data sets. Digital data and GIS are crucial tools in this information integration: in the UK, for instance, it has been suggested that between 60 percent and 80 percent of government information is in some way geographic, a figure used by the Ordnance Survey to support the building of a National Geospatial Database (Nanson et al., 1996). The ongoing creation of a European Geographic Information Infrastructure (Masser and Salgé, 1995) and, in America, the National Spatial Data Infrastructure (Clinton, 1994; Federal Geographic Data Committee, 1993), testify to the importance currently placed on exploiting geographic information as fully as possible.

It is therefore evident that, partly as a result of the advent of GIS technology, and partly owing to strategic policy initiatives based around the integrative properties of geographic information, the commodification of geographic information is currently an important focus of research interest.

This chapter examines the development of geographic information commodification in Britain, using the example of metropolitan information bureaux and the information products and services they provide, to amplify a model of commodification. The next section gives background information on the British local government sector, particularly its metropolitan information bureaux. The third section introduces a model of geographic information commodification in local government, based on a four-fold typology. Evidence from British information bureaux is presented in the fourth section to support this model, and the fifth section contains an evaluation and conclusions.
7.2 BACKGROUND

Local government is an important locus of change in geographic information exploitation. Local authorities are major users of information and information technology (Hepworth et al., 1989). Much of this information is geographic in nature, and increasingly geographic information systems are used to manage it (Campbell and Masser, 1992). In addition, local government is a significant socio-economic entity. In Britain, local government represents a substantial segment of the economy. It accounts for the employment of nearly three million people (around 10 percent of the economically active population), spends around £46 billion each year (approximately 10 percent of gross domestic product), and is a major supplier of and influence on essential public services such as education, policing and public transport (Stoker, 1991). As well as being a provider of products and services, local government is an important consumer in the economy, purchasing construction, electricity, labour, stationery, geographic information, information technology, and a wide variety of consulting and other contracts. At the same time, local government performs a vital social role, being an important element of both the welfare state (through its provision of housing and social services) and representative democracy (Wilson and Game, 1994).

A number of processes might be considered to be interacting to promote the commodification of geographic information in British local government. Changes to the nature, culture and structure of local government are affecting how and for what purposes geographic information is exploited. However, technological change is particularly influential. Applying GIS often has the effect of focusing attention on the usefulness of geographic information. Since so much time and effort is spent getting data into and out of the system there is a natural tendency to wish to exploit the value of this data. The mapping, analytical and presentational capabilities of a GIS are tools by which this can be achieved.
Other developments in information technology have had an impact on the exploitation of geographic information. The almost universal spread of the PC, floppy disk and CD-ROM has enhanced data storage and access capabilities, at low cost and with easy access for the user. Remote access to information over networks has improved data availability and distribution. Local networks have enhanced information access within local government organisations. Recent global scale developments with the Internet, notably the World Wide Web, have progressed at a remarkable pace. Local authorities have been influenced by all these developments. As well as being at the forefront of GIS adoption, they have invested widely in other computer technologies such as PCs, local area networks and Internet connections. Moreover, British local authorities have since 1993 made extensive purchases of digital map data from the Ordnance Survey, following the conclusion of an agreement designed to make digital mapping more accessible to the local government sector.

In five metropolitan areas in Britain, there are information units charged with providing geographic information services to all the municipal authorities in the locality. These metropolitan information bureaux were established in the mid 1980s, and are jointly funded by the municipal authorities in five of the major British conurbations (see Figure 7.1). However, they are not local authorities themselves: they are independent units within the public sector. The bureaux each serve local authorities in conurbations covering populations between one and six million. Comparative data on the five British metropolitan information bureaux is contained in Table 7.1.

Table 7.1: Metropolitan Information Bureaux

Bureau Name                                   Major City Served    Population of Area  Gross Expenditure  Funding As % Of
                                                                   Served (Millions)   (£ Million)        Expenditure
London Research Centre                        London               6.2                 5.8                56
West Midlands Joint Data Team                 Birmingham           2.5                 1.3                98
Merseyside Information Service                Liverpool            1.4                 0.8                95
Greater Manchester Research                   Manchester           2.5                 0.4                73
Tyne and Wear Research and Intelligence Unit  Newcastle-upon-Tyne  1.0                 0.3                88

Source: Capes (1995)
Each metropolitan information bureau has the specific purpose of providing research and intelligence on social, economic, land use and, in some cases, transport issues to the consortium of municipalities it serves; but it also provides information services to a variety of other, mainly public sector, organisations. Since the sponsoring municipal authorities provide only part of an information bureau’s running costs (see Table 7.1), the provision of these services to other organisations for a fee is often an important part of a bureau’s business strategy. Their distinct focus on geographic information provision, their leading position with regard to GIS, and their status as jointly funded bodies with limited public subsidy, make these metropolitan information bureaux ideal laboratories in which to study the commodification of geographic information at the present time.
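
As a rough illustration of why fee income matters to the bureaux, the figures in Table 7.1 can be turned into an implied funding remainder for each bureau. The Python sketch below assumes that the final column of the table gives municipal funding as a percentage of gross expenditure, and that the remainder has to be met from other sources such as information sales. The function name and this reading of the table are assumptions made for illustration, not statements drawn from Capes (1995).

```python
# Figures copied from Table 7.1: (gross expenditure in £ million,
# funding as % of expenditure). Reading the percentage as the municipal
# contribution is an assumption made for this sketch.
BUREAUX = {
    "London Research Centre": (5.8, 56),
    "West Midlands Joint Data Team": (1.3, 98),
    "Merseyside Information Service": (0.8, 95),
    "Greater Manchester Research": (0.4, 73),
    "Tyne and Wear Research and Intelligence Unit": (0.3, 88),
}


def funding_gap(expenditure_m, funding_pct):
    """Return (funded, remainder) in £ million for one bureau."""
    funded = expenditure_m * funding_pct / 100
    return funded, expenditure_m - funded


# Report the funded amount and the remainder for every bureau in the table.
for name, (expenditure_m, funding_pct) in BUREAUX.items():
    funded, remainder = funding_gap(expenditure_m, funding_pct)
    print(f"{name}: funded £{funded:.2f}m, remainder £{remainder:.2f}m")
```

On this reading, the bureau with the lowest funding percentage is also the one with the largest absolute remainder to find elsewhere, which is consistent with the emphasis the text places on fee-earning services.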
Figure 7.1: Metropolitan Information Bureaux Locations
7.3 A MODEL OF COMMODIFICATION

The context for geographic information commodification in local government is the changing geographic data availability environment at the national and international levels. The cost recovery versus free access to information debate (Onsrud, 1992a, 1992b) is especially visible in the USA. There, a tradition of free of charge, if not always highly precise, federal government information is being challenged by the development of sophisticated GIS products and services by government agencies and the need to recover at least part of the costs associated with creating, maintaining and supplying these information services (Bamberger and Sherwood, 1993). In Britain, too, government policy in recent years has been to require its national mapping agency to recoup as much as possible of its costs, particularly in relation to digitising large scale maps for use in GIS applications (Rhind, 1992b).

With these and other developments in mind, a conceptual framework for the commodification of geographic information has been developed (Capes, 1995). In the context of local government, four
characteristics of exploiting the information commodity were identified. Commercialisation is the selling of information, often to the business sector but also to the general public. Dissemination is providing or publishing information, often for the general public but also for the commercial sector. Information exchange is the trading or sharing of information amongst public departments and agencies, including metropolitan information bureaux. Value-added information services are information products which involve some extra work on the part of the data provider, such as data analysis, interpretation or repackaging. This fourfold typology of geographic information commodification is detailed below.

7.3.1 Commercialisation

Much work has focused on the commercial elements of commodification, concerning the sale of geographic information for a fee or charge by governments and other agencies (for example Antenucci et al., 1991; Aronoff, 1985; Bamberger and Sherwood, 1993; Blakemore and Singh, 1992). A fundamental charging issue is the level at which the charge is pitched. The literature points out a need to balance, through policy decisions, the benefits to society from widespread and cheap access to government information against the benefits to society from high fees offsetting information costs and lowering taxation (Blakemore, 1993). Dale and McLaughlin (1988) feel that there is a continuum of information products from those that should be wholly subsidised by the state to those that should be charged for on a completely commercial basis. There are thus a variety of levels for information charges which can be drawn together to develop a “spectrum” of charging for public information.

At one end of the spectrum is the zero charge position: information should be free since the enquirer will already have paid for it through taxation. The next position is marginal cost recovery. Here, the information holder makes a charge to cover all or part of the cost of reproducing and distributing the information to the enquirer. This charge might cover photocopying, printing, paper and postage, but it would not account for the costs of collecting, storing and managing the information itself. Such is the charging regime applied by the USA federal government when releasing its information to enquirers (Blakemore and Singh, 1992). A charging component for the cost of staff and computer time may be incorporated.

The third point on the charging spectrum is full cost recovery (Aronoff, 1985). Here the charge reflects not only reproduction and distribution costs but also the cost of collecting and managing the information. Cost recovery is a common aim where expensive computer systems (such as GIS) have been procured and used to provide information services (Archer and Croswell, 1989).

At the upper extremity of the public information charging spectrum is the practice of charging as much as possible, that is, what the market will bear. This strategy aims to charge a price greater than the full cost of providing information, in order to generate fresh income for the government or department in question. Proponents of this approach argue that a public body best serves the public by making a profit on information products, since this reduces the fiscal burden on the national coffers and provides funds for investment in new information and the technology and staff to manage it.

By charging for geographic information in these ways, government (including local government) is exploiting the potential of the information commodity to make money. It is the pecuniary value of information which underlies these activities. However, commercial dealings are just one way by which the information commodity can be exploited. Although putting a monetary price on information quantifies its value in a manner which is easily understood by everyone, the value of geographic information can also be expressed in alternative, non-commercial and non-quantitative ways, such as information dissemination and exchange.
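
The four positions on the charging spectrum can be made concrete with a small sketch. The Python below is illustrative only: the class and function names, the split of costs into a reproduction component and an upkeep share, and every figure are assumptions introduced here rather than values taken from the studies cited. It simply returns the charge implied by each position for a single information request.

```python
from dataclasses import dataclass
from enum import Enum


class ChargingRegime(Enum):
    """The four positions on the charging spectrum (names are hypothetical)."""
    ZERO = "zero charge"
    MARGINAL = "marginal cost recovery"
    FULL = "full cost recovery"
    MARKET = "what the market will bear"


@dataclass
class InformationCosts:
    reproduction: float  # copying, printing, paper and postage for one request
    upkeep_share: float  # this request's share of collecting, storing, managing


def charge(costs, regime, market_price=0.0):
    """Return the charge for one request under the given regime."""
    if regime is ChargingRegime.ZERO:
        return 0.0
    if regime is ChargingRegime.MARGINAL:
        # Only reproduction and distribution costs are recovered.
        return costs.reproduction
    full_cost = costs.reproduction + costs.upkeep_share
    if regime is ChargingRegime.FULL:
        return full_cost
    # Market pricing: assumed here never to fall below full cost.
    return max(market_price, full_cost)


# Hypothetical request: 5.0 to reproduce, 20.0 share of upkeep, market price 40.0.
example = InformationCosts(reproduction=5.0, upkeep_share=20.0)
charges = {r.value: charge(example, r, market_price=40.0) for r in ChargingRegime}
```

Treating the market price as a floor at full cost is a simplification; in practice a public body might set any price the market accepts, and marginal cost recovery may also include an element for staff and computer time.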
7.3.2 Dissemination

Non-quantitative value is not so tangible or immediate as monetary gain, and can appear in many forms, so is harder to pin down. One instance is the power of geographic information to inform. Information may be used in research and education at schools and universities (Blakemore and Singh, 1992; Lloyd, 1990), or to inform and advise local residents and communities (Craig, 1994). Distributing information can also be undertaken for tactical or political ends, such as attempting subtly to shift opinions round to a certain point of view (Campbell, 1990; Roszak, 1986). Hepworth (1989) and Sherwood (1992) see disseminating local government information as helping to promote an area or a local council and enhance service delivery: providing information to citizens can assist their understanding of available services, whilst giving information to businesses is commonly used by local authorities as an advertising ploy to attract investment and employment to their areas.

If public sector information does have a value for people outside government agencies then its availability is an important issue. Availability concerns both the release of internal information and access as mediated by the price charged for it. A potential access problem is posed by the sometimes conflicting goals of commercialisation and data availability. This is highlighted in general terms by Openshaw and Goddard (1987) and Mosco (1988), and is addressed in the context of local government by Hepworth (1989) and Barr (1993). The conflict is between a desire for a government to impose cost recovery targets on its information agencies, and a wish to disseminate information to as wide an audience as possible. Charging for information may price potential members of this audience out of the information market. One remedy is differential charging: the general public and students may be charged at a lower rate than, say, the business sector in order to keep information within an appropriate price band. Solutions may not be this simple. Nevertheless, it is now recognised that access to geographic information for members of the public and voluntary groups is “among the most important issues now before the GIS community” (Dangermond, 1995).

Activities providing, circulating and publishing geographic information, often for the general public but also for the business sector, with the primary objectives of promulgating knowledge and raising awareness, can be termed information dissemination. The promotion or marketing of a locality may be aims of information dissemination, along with the stimulation of the local or national economy and labour market. Information dissemination may also be associated with tactical or political aims if the provision of information has less altruistic motives. Various methods may be employed to ensure that geographic information reaches the target groups, with free information and differential charging being possibilities.

7.3.3 Information exchange

In addition to selling and disseminating information to external groups, local authorities also exchange information between, and within, themselves. Information which might otherwise be sold is sometimes provided free of charge to participants in data exchange agreements (Taupier, 1995). As a quid pro quo, all involved parties will provide some information in return for receiving new information from the other participants. In Britain, the metropolitan information bureaux exchange information of mutual interest with municipal councils, the population census being a notable case (Capes, 1995). Within councils, departments and sections within departments may share or swap information with one another (Calkins and Weatherbe, 1995).
Such information exchange arrangements are of benefit to all parties and often operate on a non-commercial basis: all actors aim to minimise their net costs by sharing the total overhead between them (Taupier, 1995). However, there is a growing tendency for information exchanges between and within local authorities to have a commercial component. As Bamberger (1995) relates, there are significant factors hindering information exchange exercises, one being the wish of all parties involved to see the accrual of sufficient benefits to offset the resources they have contributed to the joint venture. If just one participant pulls out of an information exchange, the whole scheme might collapse. Nevertheless, where the incentive for participation is sufficiently great, information exchange activities can and do take place.

7.3.4 Value-added information services

Apart from cost minimising, a further stimulus for information exchange might be the possibility of adding value to information. By combining data sets from different agencies, the value of the whole becomes greater than the sum of the parts. A geographic information system provides an ideal opportunity for such an activity. If the topographic data from a mapping agency are combined with the settlement information from a planning department and then added to traffic information from a transportation authority, the resulting information system is a powerful tool for analysing and perhaps modelling journeys, their purpose and duration. This analysis would not be possible if one of the data sets were missing. In this way, the power of GIS to add value to information by integrating a number of databases is demonstrated. GIS can also add value to data through the process of verifying and cleaning data which is performed when information is input to the system.

But value-added information services do not necessarily require computer technology to be created. The skills and expertise of trained research officers are often vital components in adding value to information. Whilst a computer can rapidly perform statistical work like census tabulation (Rhind, 1983), only a trained human being can interpret this information. Given that answers are what many information clients want, such interpretation is clearly a vital part of many value-added information services. Presenting results in an accessible manner (perhaps in the form of graphics and maps, as opposed to abstruse tabulations) is a further value-adding service (Garnsworthy, 1990).

These value-adding activities all have in common the fact that they make information more useful and hence more valuable to users. In performing value-adding work, staff and computer time are expended, along with other resources. It is therefore often the case that this added value attracts a charge to compensate for the extra resources put in to preparing the information in such a form.

7.3.5 Discussion

Geographic information commodification in local government has four main components, each of which recognises that information has a value beyond its statutory uses and can therefore be further exploited. These four components are commercialisation, dissemination, exchange and value-added information services. Commercialisation is the selling of information, mainly to businesses, with earning income to recover costs as an objective. Dissemination is providing, circulating and publishing information, often for the general public but also for the business sector, with the primary objectives of promulgating knowledge and raising awareness. Promotion or marketing may be associated aims. Information exchange is the trading or sharing of information between departments, councils and related agencies, with the primary aim being maximised mutual benefits and minimised mutual costs (rather than making money or educating the
public). Value-added information services involve additional work to enhance an information product by combining, analysing, interpreting or presenting information in a form more useful to potential clients.

All four commodification activities embody the recognition that public information is an externally valuable resource as well as an internally useful tool, but each rests on a different concept of value. In the case of commercialisation that value is simple to pinpoint: it is purely pecuniary. For dissemination the value of information is educational, social and economic. With information exchange, value is harder to express because it is jointly accrued, and may take the form of losses foregone as well as benefits gained. Value-added information services result from making information more useful, and hence more valuable, to potential clients.

It should be appreciated that these four types of commodification are not mutually exclusive. Information exchange or dissemination might have a monetary element, if a charge (perhaps at marginal cost recovery rates) is made as part of the transaction. Commercialisation may include an educational component where, for instance, an information pack is sold to a school or member of the public: here income is gained but information is still disseminated. Added value might come into play in commercialisation or dissemination, since the information products and services involved may incorporate additional human or technological resource expenditure. The boundaries between the four elements of geographic information commodification are therefore blurred or overlapping. Few information transactions are likely to involve just one facet of commodification. The four strands of geographic information commodification (commercialisation, dissemination, exchange, and value-added information services) can occur together or separately. Nonetheless, as a working model describing commodification in local government this provides a useful apparatus for investigating the nature of commodification in practice.

7.4 GEOGRAPHIC INFORMATION COMMODIFICATION IN METROPOLITAN INFORMATION BUREAUX

With the adoption of geographic information systems and under the influence of changes to the ‘municipal information economy’ in Britain (Hepworth et al., 1989), information commodification has begun to develop to a more sophisticated degree than ever before. The most advanced sector of British local government with respect to geographic information commodification is the set of metropolitan information bureaux in major conurbations. These bureaux have the express purpose of providing geographic information and research on behalf of the municipal authorities in their area. They are therefore strong centres of information activity, and are important examples of public sector use of geographic information technologies.

In-depth studies of all five metropolitan information bureaux, together with detailed case studies of specific geographic information services they offer, were undertaken to provide evidence on the nature of advanced commodification in local government (Capes, 1995). Commercialisation, or vigorous charging for information, was found to be widespread. Although information products and services provided by the metropolitan information bureaux earn significant amounts of income, none of the services meet full cost recovery: income generated by selling information is used to subsidise the costs of providing services to the municipal authorities or to fund new technology purchases rather than to make profits. A particularly prominent activity in the British metropolitan information bureaux is the exchange of information. By sharing their geographic information processing and storage in one location, municipal authorities can reduce their costs. Little information dissemination is evident in the metropolitan information bureaux (this being far stronger elsewhere in the local government sector), but the bureaux do
display advanced value-added information services. They collate data from municipal authorities and package it as a single information set. It is then repackaged to suit the needs of customers, for example being made available in a variety of formats such as bound directories, paper lists, sticky mailing labels or on floppy disk. Occurrences of all four types of information commodification in the metropolitan information bureaux are examined in more detail below. 7.4.1 Commercialisation Commercialisation, or vigorous charging for information, is found in all metropolitan information bureaux. The business information service run by the Tyne and Wear Research and Intelligence Unit, although primarily run to promote local economic development, has a commercial component. Sales of these business information products and services gross around £20,000 each year, this money supporting the other activities of the Unit and helping in the purchase of new computers. Moreover, since the costs of this service have been estimated at £28,000, achieving full cost recovery through information sales is technically feasible (income generated presently covers some 70 percent of the estimated costs) although it is ruled out on other grounds. Some local marketing of business information services is carried out, further highlighting the commercial nature of this service. In addition, business information is currently being pushed into new markets via a joint venture with an economic development organisation. The Tyne and Wear Unit also uses economic data to make contacts in the business world and provide it with a large and secure client base beyond the district councils. By these means, the Unit is strengthening its position as a key player in the local economic information market. SASPAC (small-area statistics package) is a strongly commercialised geographic information product, provided by the London Research Centre. 
SASPAC dominates the census analysis market in larger British local authorities and central government. This census analysis software product generates income for the Centre, helping to subsidise its municipal work programme. An indication of the commercial importance of SASPAC is given by the tight copyright protection applied to it by the London Research Centre, and the strong marketing the Centre has undertaken in order to sell its product as widely as possible. It is in the nature of the project that SASPAC is most interesting. The London Research Centre has been involved in a close partnership with a number of external, private sector companies to develop and market the software. As well as working in partnership with the private sector, the London Research Centre has developed contacts with a wide range of public sector organisations outside the London boroughs (such as central government departments and health authorities). These links have led into spin-offs, with new geographic information products since being developed. The Merseyside Address Referencing System (MARS) is a spatial land and property database of great commercial importance for Merseyside Information Service. MARS is central to its GIS work, forming the spatial “hook” on which all its other data is “hung”. Most crucially, it is provided to the local police department as the basis of the police incident response system in Liverpool. Income generation does not explicitly arise in this case because most payments for services based on the MARS database are contained within the funding subscriptions received from municipal authorities and other local bodies. Nevertheless, the level of these subscriptions (over £150,000 per annum in the case of the Merseyside police) reflects the value of the database and the cost of maintaining it. 
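The cost recovery arithmetic quoted earlier in this section for the Tyne and Wear business information service can be checked with a few lines of Python. This is only an illustrative sketch: the two figures are the approximate values given in the text, and the function name `cost_recovery_ratio` is a hypothetical helper introduced here, not part of any system used by the bureaux.

```python
def cost_recovery_ratio(income: float, cost: float) -> float:
    """Return the share of a service's cost that is covered by sales income."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return income / cost


# Approximate figures quoted in the text for the Tyne and Wear service.
annual_income = 20_000  # pounds, gross sales per year
annual_cost = 28_000    # pounds, estimated cost of running the service

ratio = cost_recovery_ratio(annual_income, annual_cost)
shortfall = annual_cost - annual_income  # income still needed for full recovery

print(f"Cost recovery: {ratio:.0%}")
print(f"Shortfall to full cost recovery: £{shortfall:,}")
```

On these figures the ratio comes to about 71 percent, consistent with the "some 70 percent" reported, and leaves a shortfall of roughly £8,000 a year.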
A commercial approach to geographic information can therefore present itself either as high pricing in order to recover costs, raise money or fund new developments; or as a strategic effort to widen the client
base for information to ensure a stable future for the service. Both these techniques indicate a businesslike approach to treating information as a commodity. Raising income is a feature of all metropolitan information bureaux. In each case, the money generated eventually helps subsidise the core services of the bureau, reducing the amount demanded from the subscribing municipal councils. The Tyne and Wear business information service and the London Research Centre’s SASPAC product have both seen efforts to widen the market of users in order to secure the products against competition (largely from elsewhere in the public sector).

7.4.2 Dissemination

Because the metropolitan information bureaux are primarily concerned with providing information to municipal authorities and other public sector organisations as opposed to the general public or businesses, wide dissemination of information would not be expected. There is, however, dissemination evident in some cases. Greater Manchester Research is one example, with its policy of publishing all its work as a report or bulletin available for sale to any firm or individual. The business information at Tyne and Wear Research and Intelligence Unit displays disseminatory characteristics, being available at half price to organisations concerned with economic development, or for consultation free of charge in libraries by anyone. A slightly different form of information dissemination is displayed by the Research Library at the London Research Centre, which circulates alerting bulletins to London borough councillors to keep them updated on local government and London affairs. Other research has shown geographic information dissemination to be widespread in local authorities, with particular concentration in London borough councils (Capes, 1995). Dissemination can involve the wide provision of information, either free of charge or at reduced rates, to various client groups.
These may be businesses, councillors or the general public. Information to businesses is usually geared towards economic development (such as business information in Tyne and Wear), and information to the public may have the aim of spreading knowledge or promoting the organisation that is disseminating the information. Evidence for dissemination may be indicated by differential charging structures, where some groups get information at cheaper rates than others.

7.4.3 Information exchange

Given that the purpose of metropolitan information bureaux is to share information and information processing amongst municipal authorities, a great deal of information exchange is visible within them. The aim of information exchange is either to reduce the costs of information storage or processing, or to enhance the value of information by combining it with other data sets (see Capes, 1995). An example of the former is the GIS activities at Merseyside Information Service (MIS). In this case, a centrally-run wide-area network results in shared ‘information overheads’ for the subscribing municipal councils and other bodies, since information, software, hardware and support are all provided by MIS from the hub of the network. An example of the latter is the Tyne and Wear business information service. Business databases from the five Tyne and Wear districts are combined to create a more valuable countywide database, of use for strategic monitoring and enquiry purposes. Outputs based on this information are
provided back to the district councils. In addition they benefit from the commercial side of the service, taking a share of the income it generates. Shared information systems are justified either on grounds of cost or because they provide useful information that individual local authorities could not obtain operating independently. An example lies in the census information services provided by every metropolitan information bureau. These are justified on cost grounds (the metropolitan information bureaux make one purchase of the census data and are able to distribute it amongst the municipal authorities in their area free of census office copyright restrictions, thus avoiding the need for each authority to buy its own set of census data) and on shared information grounds (by holding the census data for the whole area, the metropolitan information bureaux can provide strategic information which the districts working on their own could not obtain).

7.4.4 Value-added information services

Examples of value-added information services are common in the metropolitan information bureaux. The Tyne and Wear business information service involves collating then repackaging business information, making it available in various formats (bound directories, paper lists, sticky mailing labels or on floppy disk) to suit the needs of customers. For each user, the form in which they take the information has more value than the alternatives. SASPAC is an information technology product which adds value to other geographic information by enabling census data to be analysed and printed.
The Merseyside Address Referencing System (MARS) and its associated GIS activities add considerable value to local and national data sets by pulling them together on a single wide-access system; integrating other data; adding a comprehensive land and property gazetteer to reference and analyse the data; supplying appropriate software and hardware to do all this; and making training and advice available to users of these systems. Few local government examples of value-added information services are as sophisticated as the MARS example, and few are so dependent on high technology. Most value-added information services (and many instances of geographic information commodification) involve information technology, and most involve some degree of innovation.

7.4.5 Discussion

Commercialisation can be characterised as the practice of charging clients both inside and outside the organisation for geographic information. The motivation behind charging varies: it may be to earn income to reduce an organisation’s call on taxation revenue; income may be used to subsidise the purchase of new equipment or software, or to part-fund the development of new information services; charging can be used to deter frivolous requests for information. The vigour applied to charging also varies, but only rarely in local government are attempts made to exceed marginal cost recovery (Capes, 1995). Indeed, zero charging may be applied when information is provided in small quantities; to certain individuals or groups; or as a “loss-leader” to promote the organisation, its other products, and its locality. Marginal costs might include photocopying, printing, materials (paper, toner and so on), computer time, and a component to cover office overheads (such as heating and lighting).
In some cases, staff time involved in preparing information to meet a request is recorded as a marginal cost and is charged for; in these cases, value has been added to information and commercial clients might reasonably be expected to pay for these enhancements. The highest charging rate found by Capes (1995) for staff time in local
government was £75 per hour. Not usually included in charges for local government geographic information are the costs of acquiring and maintaining the data upon which information products and services are based. This is because collecting this data is done at public expense to help fulfil the statutory functions of local government; charges which attempt to recover these costs are rarely seen as desirable at present in local government. Data charges are generally only made for data bought from another body—fees for copying Ordnance Survey maps, and the payment of census office royalties are two cases in point. Geographic information dissemination in local government is characterised by the provision of free or reduced-price information to certain client groups, particularly the business sector and members of the public. Local authorities appear to disseminate information for any of three main reasons: to promote local economic development, to promulgate knowledge in the community, and to promote the information bureau or its parent bodies. The purpose of geographic information exchange in local government may be to reduce the costs of information management or purchase to the participating organisations; or to enhance the value of individual data sets by combining them with other information. Geographic information exchange can therefore be characterised as the trading or sharing of information with other public bodies, particularly other departments and local authorities, which brings benefits to all parties involved. Information is itself the main currency of such exchanges, although where a difference is thought to exist between the values of the information items exchanged the balance may be made up with money. Value-added information services entail making geographic information more useful or accessible to clients. 
This may be achieved through the use of hardware, software, skills or expert knowledge to repackage, collate, integrate, analyse, publish or present information. Adding value does not include reselling existing data products. Instead, it involves repackaging information by adding something new, such as analysis (often by computer), skills (or expert knowledge), presentation (in graphical or map form, or on different media such as floppy disk), or just publication (making information available to a wider audience, such as publishing business information as a directory).

7.5 EVALUATION AND CONCLUSIONS

The typology of geographic information commodification in local government encompassed commercialisation, dissemination, information exchange, and value-added information services. This model is sufficiently robust to describe and analyse those instances of geographic information commodification found by research in British local government. All four types of commodification have been found to be present in metropolitan information bureaux, and these findings parallel developments visible elsewhere in local government. The metropolitan information bureaux have a highly-developed commercial profile, depending on generating income, or the promise of income, to maintain their staff and services. Income is used to subsidise existing services and fund new developments. Both these activities are increasingly in evidence elsewhere in local government, where budgets sometimes have income generation targets written in, and the purchase of new facilities (such as geographic information systems) may require income guarantees to gain approval. Information products of metropolitan information bureaux are sometimes vigorously marketed, perhaps in conjunction with private sector partners.
Such partnerships to exploit commercial opportunities are beginning to be visible in local authorities: the joint venture between Powys County Council in Wales and private companies to develop and market the C91 census package is a key example. Local authorities
are also starting to address marketing and market research, as illustrated by the appointment of a marketing manager at Manchester City Council. Information dissemination is presently more common in local councils than in the metropolitan information bureaux, which are primarily concerned with providing information to local authorities rather than to the community at large. Dissemination for organisational defence and promotion is replicated frequently throughout the local authority community. The metropolitan information bureaux have information exchange arrangements similar to those found in the larger local authorities in Britain. In both cases, exchange enhances the value of one agency’s information by combining it with that of others, and at the same time reduces data processing costs for all parties involved. Some sophisticated examples of value-added information services are found in the metropolitan information bureaux. The SASPAC census handling product at the London Research Centre, together with the Merseyside Address Referencing System and its associated GIS activities at Merseyside Information Service, are probably the most exciting instances of value-added information services currently available in local government. This is a function both of their high technological sophistication and of the inter-agency collaboration they involve. Contrasts can be made between geographic information commodification in local government and that taking place in central government. Geographic information provision by some British central government agencies is more strongly commercialised than that of local government in Britain. In part, this reflects the government’s Tradeable Information Initiative, which aims to sell data at full market prices. In the case of the Ordnance Survey, however, it is a change in the status of the organisation that has augmented the commercial role of geographic information. 
In the context of geographic information in Britain, the Ordnance Survey is in the peculiar position of being a government executive agency (it has been separated from its parent department and given its own budget and cost recovery targets) whilst at the same time retaining copyright of its uniquely valuable geographic data sets. As a geographic information publisher, the Ordnance Survey needs to protect this copyright since it guarantees its revenue, a growing proportion of which it is required to raise from data sales. Local authorities, on the other hand, have many functions and income sources and are not dependent on data sales for their continued existence. Even local government metropolitan information bureaux deal with a range of information items; they are not reliant on any single data set for survival. Local government geographic information providers are not, on the whole, required to meet tough cost recovery targets such as those set by central government for the Ordnance Survey. It cannot, therefore, be regarded as surprising that commercialisation is often stronger in central than local government in Britain. Those cases where the geographic information commercialisation displayed in local government matches or exceeds that found in central government are, at present, exceptions rather than the rule. An interesting comparison can be made between information commercialisation in British government and that in the USA where the pattern suggested above is, to some extent, reversed. In America, the federal (central) government has adopted a non-commercial approach to its data, but some state and local governments exhibit more of a commercial orientation. Under Freedom of Information legislation the federal government releases data at low cost, charging no more than the marginal costs of reproducing and disseminating each information request. This mimics the de facto situation in most local authorities in Britain (Capes, 1995).
Much federal data is in raw form, requiring analysis, integration and presentation to make it useful to outside clients. But key geographic products provided cheaply by the federal government have greatly assisted such analyses. For example, census information management products have enabled computer users to map and analyse population census data with a common spatial referencing system (Klosterman and
Lew, 1992). The US census bureau has made these computer files available without copyright restrictions and at low cost. State and local governments in the USA are not always bound by the same federal Freedom of Information laws. These authorities are therefore able to price geographic information commercially. Some state governments, for example, have passed their own laws enabling them to charge for GIS products and services, and recover some or all of the costs associated with providing them. Local governments and their agencies also show evidence of commercialisation. Sherwood (1992) made seven case studies of local government agencies selling geographic information in the USA. As well as the research units of multi-purpose local authorities (such as are still the norm in Britain), Sherwood looked at a regional planning commission, a metropolitan transport and planning agency, a local authority consortium created to develop a GIS, and an arm’s-length company created by another local authority consortium specifically to sell data on a non-profit basis. These agencies tend to exploit their information commercially in concert with private sector firms, who sell and distribute public information, databases and information products in return for a share of the revenue. The heterogeneity of USA local government makes for a different institutional context to that found in Britain at present; however these differences would appear to be decreasing in their extent as British local authorities change their structure, nature and culture. The limited comparison available from Sherwood’s (1992) study suggests that there is considerable parity in geographic information commodification between Britain and America. Sherwood’s findings, along with those reported in Johnson and Onsrud (1995), suggest that the recovery of a significant proportion of costs in American GIS agencies occurs only rarely. 
Following on from this, a crucial observation as to the nature of commodification in practice can be made. This is to emphasise that commodification is not simply about commercialisation. Full cost recovery is a rarity: it is the strategic spin-offs from exploiting information that tend to be of greater importance. These include:

• positioning an organisation in a competitive environment by gaining new clients and partners;
• using income (or the promise of income) to secure funding for new equipment or staff;
• promoting an organisation by disseminating its information;
• boosting economic development by providing information to businesses;
• saving money and gaining new information by exchanging data sets.
Another point to note is the importance of fully utilising the skills and knowledge of staff members: in many cases, their experience can add more value than a computer possibly could. Finally, the integrative properties of geographic information (via the map) can be used to combine and display other data. This chapter has begun to outline the nature of the commodification of geographic information at the local level, and to reveal how this reflects trends visible at the national and international levels in Europe and North America. These are, on the one hand, to exploit the commercial value of geographic information by charging for data products and services; and, on the other hand, to disseminate and exchange information strategically by developing data networks and shared infrastructure. Both these trends involve the provision of raw and value-added information services. Findings from British local government are therefore a valuable addition to our knowledge of how data is being exploited by its owners, and the implications this holds for other data users in a developing information society.
REFERENCES

ANTENUCCI, J.C., BROWN, K., CROSWELL, P.L., KEVANY, M.J., and ARCHER, H. 1991. Geographic Information Systems: a Guide to the Technology. New York: Van Nostrand Reinhold.
ARCHER, H. and CROSWELL, P.L. 1989. Public access to geographic information systems: an emerging legal issue, Photogrammetric Engineering and Remote Sensing, 55(11), pp. 1575–1581.
ARONOFF, S. 1985. Political implications of full cost recovery for land remote sensing systems, Photogrammetric Engineering and Remote Sensing, 51(1), pp. 41–45.
BAMBERGER, W.J. 1995. Sharing geographic information among local government agencies in the San Diego region, in Onsrud, H.J. and Rushton, G. (Eds.) Sharing geographic information. New Jersey: Center for Urban Policy Research, pp. 119–137.
BAMBERGER, W.J. and SHERWOOD, N. (Eds.) 1993. Marketing government geographic information: issues and guidelines. Washington DC: Urban and Regional Information Systems Association (URISA).
BARR, R. 1993. Signs of the times, GIS Europe, 2(3), pp. 18–20.
BEAUMONT, J.R. 1992. The value of information: a personal commentary with regard to government databases, Environment and Planning A, 24(2), pp. 171–177.
BLAKEMORE, M. 1993. Geographic information: provision, availability, costs, and legal challenges. Issues for Europe in the 1990s, Proceedings MARI 93 Paris, 7–9 April 1993, pp. 33–39.
BLAKEMORE, M. and SINGH, G. 1992. Cost-recovery Charging for Government Information. A False Economy? Gurmukh Singh and Associates Ltd., 200 Alaska Works, 61 Grange Road, London SE1 8BH.
CALKINS, H.W. and WEATHERBE, R. 1995. Taxonomy of spatial data sharing, in Onsrud, H.J. and Rushton, G. (Eds.) Sharing geographic information. New Jersey: Center for Urban Policy Research, pp. 65–75.
CAMPBELL, H.J. 1990. The use of geographical information in local authority planning departments, Unpublished PhD thesis, University of Sheffield, UK.
CAMPBELL, H. and MASSER, I. 1992. GIS in local government: some findings from Great Britain, International Journal of Geographical Information Systems, 6(6), pp. 529–546.
CAPES, S.A. 1995. The commodification of geographic information in local government, Unpublished PhD thesis, University of Sheffield, UK.
CLINTON, W.J. 1994. Coordinating Geographic Data Acquisition and Access: the National Spatial Data Infrastructure, Executive Order, 11 April 1994. Office of the Press Secretary, the White House, Washington, DC.
CRAIG, W.J. 1994. Data to the people: North American efforts to empower communities with data and information, Proceedings of the Association for Geographic Information Conference (AGI ’94), pp. 1.1.1–1.1.5.
DALE, P.F. and McLAUGHLIN, J.D. 1988. Land information management. Oxford: Oxford University Press.
DANGERMOND, J. 1995. Public data access: another side of GIS data sharing, in Onsrud, H.J. and Rushton, G. (Eds.) Sharing geographic information. New Jersey: Center for Urban Policy Research, pp. 331–339.
FEDERAL GEOGRAPHIC DATA COMMITTEE 1993. A Strategic Plan for the National Spatial Data Infrastructure: Building the Foundation of an Information Based Society. Federal Geographic Data Committee, 590 National Center, Reston, Virginia 22092, USA.
GARNSWORTHY, J. 1990. The Tradeable Information Initiative, in Foster, M.J. and Shand, P.J. (Eds.) The Association for Geographic Information Yearbook 1990. London and Oxford: Taylor & Francis and Miles Arnold, pp. 106–108.
GARTNER, C. 1995. Commercialisation and the public sector—the New Zealand experience. Paper presented at National Mapping Organisations Conference, 25 July to 1 August 1995, Cambridge, UK (not in proceedings).
HEPWORTH, M.E. 1989. Geography of the information economy. London: Belhaven Press.
HEPWORTH, M., DOMINY, G. and GRAHAM, S. 1989. Local Authorities and the Information Economy in Great Britain. Newcastle Studies of the Information Economy, Working Paper no. 11. Centre for Urban and Regional Development Studies, University of Newcastle-upon-Tyne, UK.
JOHNSON, J.P. and ONSRUD, H.J. 1995. Is cost recovery worthwhile? Proceedings of the Annual Conference of the Urban and Regional Information Systems Association (URISA), July 1995, San Antonio, Texas, http://www.spatial.maine.edu/cost-recovery-worth.
KLOSTERMAN, R.E. and LEW, A.A. 1992. TIGER products for planning, Journal of the American Planning Association, 58(3), pp. 379–385.
LLOYD, P.E. 1990. Organisational Databases in the Context of the Regional Research Laboratory Initiative, Regional Research Laboratory Discussion Paper 4, Department of Town and Regional Planning, University of Sheffield, UK.
MAFFENI, G. 1990. The role of public domain databases in the growth and development of GIS, Mapping Awareness, 4(1), pp. 49–54.
MASSER, I. and CRAGLIA, M. 1996. The diffusion of GIS in local government in Europe, Ch. 7 in Craglia, M. and Couclelis, H. (Eds.) Geographic Information Research: Bridging the Atlantic. London: Taylor & Francis.
MASSER, I. and SALGÉ, F. 1995. The European geographic information infrastructure debate, in Masser, I., Campbell, H.J. and Craglia, M. (Eds.) GIS Diffusion: the Adoption and Use of Geographical Information Systems in Local Government in Europe. London: Taylor & Francis, pp. 28–36.
MOSCO, V. 1988. Introduction, in Mosco, V. and Wasko, J. (Eds.) The Political Economy of Information. Madison: University of Wisconsin Press, pp. 8–13.
NANSON, B., SMITH, N. and DAVEY, A. 1996. A British national geospatial database? Mapping Awareness, 10(3), pp. 38–40.
ONSRUD, H.J. 1992a. In support of open access for publicly held geographic information, GIS Law, 1(1), pp. 3–6.
ONSRUD, H.J. 1992b. In support of cost recovery for publicly held geographic information, GIS Law, 1(2), pp. 1–6.
OPENSHAW, S. and GODDARD, J. 1987. Some implications of the commodification of information and the emerging information economy for applied geographical analysis in the United Kingdom, Environment and Planning A, 19(11), pp. 1423–1439.
RHIND, D. (Ed.) 1983. A Census User’s Handbook. London: Methuen.
RHIND, D. 1992a. Data access, charging and copyright and their implications for GIS, International Journal of Geographical Information Systems, 6(1), pp. 13–30.
RHIND, D. 1992b. War and peace: GIS data as a commodity, GIS Europe, 1(8), pp. 24–26.
ROSZAK, T. 1986. The cult of information. Cambridge: Lutterworth Press.
SHEPHERD, J. 1993. The cost of geographic information: opening the great debate, GIS Europe, 2(1), pp. 13, 56 & 57.
SHERWOOD, N. 1992. A review of pricing and distribution strategies: local government case studies, Proceedings URISA ’92, volume 4. Washington: Urban and Regional Information Systems Association, pp. 13–25.
STOKER, G. 1991. The politics of local government. London: Macmillan.
TAUPIER, R.P. 1995. Comments on the economics of geographic information and data access in the Commonwealth of Massachusetts, in Onsrud, H.J. and Rushton, G. (Eds.) Sharing geographic information. New Jersey: Center for Urban Policy Research, pp. 277–291.
WILSON, D. and GAME, C. 1994. Local government in the United Kingdom. London: Macmillan.
Chapter Eight
Nurturing Community Empowerment: Participatory Decision Making and Community Based Problem Solving Using GIS
Laxmi Ramasubramanian
8.1 INTRODUCTION

In cities and communities across the world, architects, planners, decision makers, and individuals are using Geographic Information Systems (GIS) and related information technologies to understand and evaluate specific problems occurring in their physical and social environment. In the United States for example, the city of Milwaukee monitors housing stock (Ramasubramanian, 1996), while Oakland’s Healthy Start program has analysed the high incidence of infant mortality (King, 1994a). GIS are being used to describe and explain diverse phenomena such as school drop-out rates and provide decision support for a wide range of tasks, for example, the coordination of emergency service delivery in rural areas. These advances, as well as many other innovative uses of information technology and spatial analyses, are reported in technical journals such as the Urban and Regional Information Systems Association (URISA) Journal, in the popular press through magazines like Time and US News and World Report, and in trade journals such as GIS World. The use of information technologies and spatial analysis concepts promises many benefits to individuals and communities in our society, a society in which data, information, and knowledge have become commodities, seen as assets much like land, labour, and capital (Gaventa, 1993). At the same time, several thinkers, researchers, and analysts observe that information technologies can become inaccessible, thereby making the promise they offer disappear (e.g. Pickles, 1995). For example, William Mitchell, the Dean of MIT’s School of Architecture and Planning, observes, “While these technologies have the potential to become powerful tools for social change, create opportunities, and broaden access to educational opportunities, jobs, and services, it must be recognised that these benefits will not come automatically” (Mitchell, 1994, p. 4).
There is considerable evidence which suggests that individuals and community groups have difficulty in acquiring and using information technologies. In particular, citizens and groups from low income communities and communities of colour are disproportionately affected by lack of access to information technologies (e.g., Sparrow and Vedantham, 1995).

8.2 SCOPE AND SIGNIFICANCE

In this context, this chapter explores the appropriateness and the limitations of GIS to facilitate participatory, community-based planning. It is anticipated that spatial mapping and analysis will be valuable to community organisations and groups because it can be used skilfully to identify issues, make comparisons,
analyse trends, and support socio-political arguments thereby facilitating policy analysis, service delivery, and community participation. There is a push and a pull which is making GIS use increasingly popular at the level of small groups. The push comes from the manufacturers of hardware and software and professional decision makers who are presenting GIS as a panacea for problem solving to a new market segment, while the pull comes from these small groups themselves who are seeking more control over decision-making processes about issues that affect them and see GIS as a useful tool for this purpose. These groups can be characterized as having immediate and local problems, special interests, limited but well defined decision making powers, and limited technical knowledge. To facilitate real participation of community residents in planning and decision-making processes, it is vital that the community has control of, and access to public information. While planners and decision-makers routinely use data and information from any community in order to make decisions for that community, the same information is seldom available or accessible to community residents. Traditionally this disparity in information access has been attributed to the unavailability of both processes and technologies which could involve community residents in traditional planning and decision-making processes. This chapter argues that GIS can be effectively used to facilitate decentralised, community-based decision-making, planning, and research thereby contributing to the creation of an empowered citizenry. While conventional thinking believes that GIS and related technologies tend to centralise decision-making, this author argues that end-users, equipped with a critical world view, will be able to think creatively about ways they can use data, information, and GIS technology in their day-to-day problem solving, thereby making decentralised decision-making a reality. 
The chapter also argues that Participatory Research (PR) is a viable conceptual and methodological approach to develop critical thinking skills among end-users.

8.3 LITERATURE REVIEW

In keeping with the scope of the chapter, this literature review addresses two themes: the nature of the technology and the context within which it will be used. The literature review discusses the organisational and institutional issues surrounding GIS adoption and use. In addition, this section presents a brief discussion about the development of communities, their institutions, and the decision-making processes that are typically encountered in this context. The review concludes by exploring the role that information plays in community-based decision-making.

8.3.1 Organisational and Institutional Issues Affecting GIS Adoption

While research in the area of GIS has tended to centre around the technological capabilities of the system, there has been a steady stream of studies which have shown that the adoption and use of GIS depend on factors other than those related to the technical capacity (hardware and software) of the system. The GIS literature addressing organisational issues focuses on the diffusion processes of GIS in a variety of organisational contexts. Masser (1993) and Campbell and Masser (1996) have studied GIS diffusion in British local governments while Wiggins (1993) has examined the same issue within public sector agencies in the United States. In addition, Huxhold (1991) has studied GIS diffusion in city government by looking at the development of GIS in the City of Milwaukee over a period of 15 years. Croswell (1991), studying organisations that had acquired a GIS, developed a matrix of common problems associated with GIS implementation in the United States. The main problems identified in a hierarchical
COMMUNITY EMPOWERMENT USING GIS
order of high to low incidence are: lack of organisational coordination and conflicts; problems with data and data integration; lack of planning and management support; and the lack of awareness of technology and training. Looking at GIS adoption in planning agencies in developing countries through a case study of a “successful” planning agency in India, this researcher found that developing country agencies follow a model of GIS implementation which is similar to that of developed countries. As a result, they tend to have the same problems. This research identifies seven factors that facilitate GIS implementation: achieving clarity in problem definition; conducting a user-needs assessment; establishing inter-agency coordination; training of personnel; organising the collection and management of data; designing an incremental system for development; and the important role of advocates within the organisation (Ramasubramanian, 1991). Obermeyer and Pinto (1994) argue that the GIS literature has typically considered the implementation process as a bridge between the system developer and the user. When the system crossed over the bridge, it was regarded as successful. They recommend that implementation success must look at three criteria: technical validity (whether the system works); organisational validity (whether the system meets the organisation’s and users’ needs); and organisational effectiveness (whether the system improves the present working of an organisation). Masser and Onsrud (1993) propose that the central question in the area of GIS diffusion is to ascertain whether there is any difference between the diffusion process for GIS and for information technology products of similar kinds. The research issue is not what promotes adoption of GIS but what promotes the effective use of GIS. Organisational strategies are considered to be a critical element to enhance GIS use. Research to understand effective use centres around the question of measurement of effectiveness.
Looking at institutional issues, the authors argue that research needs to focus on isolating generalisable principles germane to the acquisition, implementation, and particularly the utilisation of a GIS. For example, some of the questions that could be asked are: “What are the organisational and institutional structures that enhance effective implementation and use?”; “What are the strategies that best facilitate the implementation of a GIS within and among organisations?”; “What factors influence implementation and optimal use?”; and “Under what arrangements and circumstances can information sharing more easily occur?” (Masser and Onsrud, 1993). Huxhold and Levinsohn (1995) propose that organisations adopt GIS technology because they anticipate that it will provide new capabilities that will yield benefits to the organisation. Building on this assumption, they outline a conceptual framework to look at GIS adoption and diffusion. This framework consists of four elements: the GIS paradigm, data management principles, technology, and the organisational setting. According to them, a GIS paradigm is the conceptual foundation for using geographic information that provides a common base of reference or focus for the other three elements; data management principles govern the logical structuring and management of large databases that contain map and other data that can be related to the geography of interest to the organisation; technology comprises the effective combination of various hardware and software components that enables the automation of numerous geographic data handling functions; and the organisational setting implies the management environment that provides resources and enables changes to be made for incorporating GIS utilisation throughout the organisation.
8.3.2 The Context of GIS Application

8.3.2.1 Development of Community-Based Decision-Making

King (1981), commenting on community development in Boston over three decades in his book Chain of Change, suggests that a community develops in stages. Initially, community residents rely on the good will of others to receive services from city, state, federal, and non-governmental agencies. He calls this the service stage. At this stage, they are not part of any decision-making process. Next, the residents organise themselves into interest-groups to demand, seek and receive services that are appropriate to their needs. They get involved with the decisions that are made about the immediate community. This is the organising stage. Finally, the residents begin to develop and sustain community-based institutions which act as representative voices for them. The community-based organisations so created then get involved in decision-making and attempt to influence issues that affect the immediate community and the surrounding geopolitical region. This is the institution-building stage. This model is useful to understand community-based decision-making because it is a study within spatially defined neighbourhoods over a period of time. Though it does not provide direct empirical evidence, it is written from the community’s perspective and is reliable in understanding community-based decision-making. King’s study compares favourably with another series of case studies that investigate the development of community participation in decision-making in several European countries. Susskind (1983) suggests that communities first experience paternalism in decision-making where the community has no say in the decisions that are made for it. This leads to discontentment, a period of conflict when the decision makers such as city and state agencies struggle with the community to maintain control over the process.
The conflict gives way to co-production, a phase where both opposing groups resolve the conflict and create a shared decision-making model. It is important to note that the period of conflict can continue for an extended period of time.

8.3.2.2 Role of Information in Community-Based Decision-Making

Since a large part of GIS is related to data and its efficient display and management, it is useful to note that information is seen as a complex source of power in the planning and decision-making process (Forester, 1989). At the same time, several people have argued that we have more data but less information; in short, we know more about less (e.g. Friedmann, 1992; Naisbitt, 1994). Forester also suggests that decision makers often make decisions with incomplete information while Alexander (1984) points out that the rational paradigm of decision making is giving way to a host of other paradigms of decision making. King (1981) has presented a geo-political organising model for involving community residents in planning and decision making. One aspect of this model was the development of a computerised directory called the Rainbow Pages. The directory was designed to enable residents to get information about issues and activities that were affecting their neighbourhood. It also provided a way for residents to contact each other and solve problems collectively. This model emphasises self reliance, mutual self help, information availability, and information access. Kretzman and McKnight (1993) have developed a community development strategy which begins “with a clear commitment to discovering the community’s capacities and assets”. While information is critical for
any community development strategy, Kretzman and McKnight argue that cities and urban communities are often defined solely in terms of their negative images. They observe that both academic research and proactive problem solving initiatives begin with “negative” information about neighbourhood “deficits” or “needs”. They present an alternative approach which begins to map community “assets”—an approach that emphasises linkages between different resources or strengths that are present within any neighbourhood or community. Chen (1990), working with the Boston Redevelopment Authority on the South-End Master Plan (a neighbourhood of Boston), found that GIS was a useful way to begin to communicate spatial concepts to non-technical users. His work has been supported by other smaller studies in which non-technical users like high school students and community residents have used GIS to address problems concerning their neighbourhood (Ramasubramanian, 1995). Huxhold and Martin (1996) observe that federal and local funding agencies often require community organisations and groups seeking financial assistance to use data and information. They argue that the use of data and information is beneficial to the funding agency and the community organisations seeking financial support. The funding agency can use the data to determine the relative merit of a grant application and to ensure that the organisation is following funding guidelines. The community organisation seeking financial assistance can plan a more strategic campaign by interpreting the data to demonstrate financial need and the appropriateness of its intervention. In the 1990s, GIS applications have expanded to serve a wide range of users including those users who have typically not used them before. To illustrate with an example, let us take a look at the National Association for the Advancement of Coloured People (NAACP) v. 
American Family Mutual Insurance Company redlining lawsuit, considered “one of the most important federal cases in Wisconsin history” (Norman, 1995:12). In early 1995, the American Family Mutual Insurance Company agreed to settle a discrimination suit brought against them by NAACP and made a commitment to invest US$ 14.5 million in central Milwaukee. The NAACP had argued that the insurance company was discriminating against African-American residents in Milwaukee’s Northwest side by allowing the practice of “redlining”, that is a policy of systematic disinvestment in the area. While the case never actually went to trial, both parties had gathered a significant volume of data and analyses to support their claims. For their part, the insurance company pointed out that they had a fairly even distribution of insurance policies in Milwaukee using postal zip-codes as the unit of analysis. Their argument was countered by the NAACP who used their own maps to demonstrate that the company’s policies were distributed unevenly, clustered in mostly white census tracts. The “zip code defence” fell apart since mapping the data by census tracts revealed information that was not previously evident in analyses based on zip codes. This is because in the US, Zip codes tend to be so large as to mask differences between white and black neighborhoods, while census tracts are smaller aggregations (Norman, 1995). The NAACP v. American Family case is a high-profile example which vividly portrays the usefulness of GIS. First, it demonstrates that having access to relevant information plays a vital role in identifying issues and placing them within a problem-solving framework. Second, it demonstrates that information plays a significant role in making comparisons, and analysing trends which were required to establish the case for discriminatory behaviour against the insurance company. 
Third, it demonstrates the power and the potential of spatial analysis and maps to force all parties involved in the debate to address the reality and the gravity of the situation. Finally, this example demonstrates that GIS and related technologies are integral for mapping and analysis, storing large volumes of data, and for looking at different types of data such as demographic information and financial information simultaneously. Thus, if we stop to think about it, we
begin to recognise that the implementation of information technology and spatial analysis concepts has profound implications for people and communities everywhere, particularly those individuals who are unlikely to have used them before.

8.3.3 Synthesis of Literature Review

The literature review covered a broad range of issues: on the one hand, it looked at GIS adoption and diffusion while on the other hand, it looked at community development and decision making. The development of GIS technology which began in the era of mainframe computers has kept pace with other innovations in computer technology. The mapping capacity has been enhanced, systems have become more user-friendly, and most of all, GIS is now affordable to the individual user. At present, GIS software is available with data packages for use on personal computers and can be customised to suit individual needs. GIS has the capacity to analyse several issues that concern community residents, community planners, and decision makers within community-based organisations. However, GIS researchers have not studied GIS adoption and use by small groups such as community-based organisations. Recognising this gap, the National Center for Geographic Information and Analysis has begun to grapple with the social and philosophical issues surrounding GIS adoption by a broader spectrum of society through its Initiative 19 (I-19) (see Daniel Sui, Chapter 5 of this volume). In the absence of research-based evidence to the contrary, it is safe to hypothesise that GIS adoption in community-based organisations will imitate processes of adoption and use in other contexts such as local governments. Information is invaluable to any community group that intends to work proactively with the local, state, or federal government because administrations tend to rely on empirical evidence and hard data to determine the relative merits of an organisation’s request for funds.
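How strongly such empirical evidence depends on the unit of analysis, the point on which the “zip code defence” in the NAACP v. American Family case turned, can be illustrated with a minimal sketch. All figures below are invented for illustration and are not data from the case; the zone labels (Z1, T1 and so on), the record layout, and the function name `rate_by` are likewise hypothetical. The sketch assumes that each zip code contains one mostly white and one mostly black census tract, and computes insurance policies per household at both levels of aggregation.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration only:
# (zip_code, census_tract, majority_group, households, policies)
TRACTS = [
    ("Z1", "T1", "white", 1000, 600),
    ("Z1", "T2", "black", 1000, 100),
    ("Z2", "T3", "white", 1000, 580),
    ("Z2", "T4", "black", 1000, 120),
]


def rate_by(key_index):
    """Return policies per household for each areal unit.

    key_index selects the aggregation level: 0 for zip code,
    1 for census tract.
    """
    households = defaultdict(int)
    policies = defaultdict(int)
    for record in TRACTS:
        unit = record[key_index]
        households[unit] += record[3]
        policies[unit] += record[4]
    return {unit: policies[unit] / households[unit] for unit in households}


zip_rates = rate_by(0)
tract_rates = rate_by(1)

for unit, rate in sorted(zip_rates.items()):
    print(f"zip {unit}: {rate:.2f} policies per household")
for unit, rate in sorted(tract_rates.items()):
    print(f"tract {unit}: {rate:.2f} policies per household")
```

Under these assumed figures both zip codes show the same rate (0.35), while the tract-level rates range from 0.10 to 0.60. This is the kind of difference that aggregation to larger areal units can conceal.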
However, it should be obvious that, while a planner working for the local government and a community organiser may both use GIS (for example, to map the number of vacant parcels in the neighbourhood), the conclusions they draw and the policy options they recommend to their clients or constituencies will be very different. Both organising models discussed in the literature review use empirical data and qualitative information about the state of the community as a basis for their community development strategies. These models assume that rational, logical, arguments work effectively in influencing decision making around social issues. At the same time, these models use objective information to redefine issues and change the nature of the policy debate. Learning from situations like the NAACP v. American Family case, community activists, informal community groups, and community-based organisations are beginning to view GIS as a useful package of tools and techniques. They are using GIS to understand and evaluate specific problems occurring in their physical environments in areas such as housing, health care, education, economic development, neighbourhood planning, and environmental management (Ramasubramanian, 1995). GIS provides an efficient way to document, update, and manage spatial information, thereby making it possible to conduct analyses over time. GIS has the capacity to level the playing field by assisting small groups to analyse, present, and substantiate their arguments effectively. It also allows them to raise new questions and issues. Environmentalists, for example, have successfully used technologies such as GIS to negotiate their claims and resolve disputes (e.g. Sieber, 1997). Urban communities and their advocates can do the same. The next section will present a conceptual and methodological approach called participatory research and explain why it is an appropriate framework to introduce GIS use to a community group or organisation
composed largely of non-technical users. It is anticipated that this approach will overcome some of the common problems associated with GIS adoption and its use, particularly issues such as fear of technology and resistance to change.

8.4 PARTICIPATORY RESEARCH CONCEPTS

In order to understand participatory research, one first has to understand the basic concept of participatory planning. Friedmann’s (1987) classification of planning traditions is a useful framework to look at participatory planning. The four traditions he identifies are: policy analysis, social reform, social learning, and social mobilisation. Friedmann sees these traditions on a continuum with the policy analysis tradition working to maintain the status quo and social mobilisation working to create change. Positivistic research never discusses the role or value of participation seriously. Participatory planning, on the other hand, accepts ‘participation’ as an implicit condition. Discovering that most of the literature advocating participatory planning comes from the social learning and the social mobilisation traditions should come as no surprise. Why should one look at participatory planning at the present time? Friedmann (1992) and Sassen (1991) argue that it is not possible to plan effectively on behalf of people and states given the changing nature of the economy, the political landscape, and most of all, given the speed at which these changes are occurring, especially in the cities and urban areas of the world. They also suggest that the specialisation of knowledge makes it impossible for one group to plan and determine optimal solutions on behalf of the world community. Brown and Tandon (1991) state that the problems of poverty and social development are complex and require multiparty collaboration. Korten (1986) sees participatory planning as a fundamental construct of a larger strategy supporting people-centred development.
Additionally, participatory planning is a model that is working well in situations where it has been undertaken with a genuine commitment to implement change. Thus, it deserves to be studied and seen as a viable social methodology. Finally, participatory planning invokes a basic principle of empowerment: building the capacity of the community to speak for itself and address issues with the skill and confidence to create change (King, 1992). In his book, Man and Development, Nyerere argues that: “…For the truth is that development means the development of people. Roads, buildings, the increases of crop output, and other things of this nature are not development, they are only tools of development” (Nyerere, 1974, p. 26). Placing people at the centre of the planning and decision making process changes the nature of the debate dramatically. There are several models of participatory planning including Action Science (Argyris et al., 1985), People-Centred Planning (Daley and Angulo, 1990), Transactive Planning (Friedmann, 1992), Community-Based Resource Management (Korten, 1983), Participatory Action Research (Whyte, 1991) and Participatory Research (Hall, 1993). The concept of participatory research (PR) stems from larger values of democracy and citizen participation and from within the participatory planning traditions mentioned above. Individuals and communities have come to see participation as an essential approach to effect change. For the purposes of this chapter, PR can be defined as an approach that:

1. develops the capacity of the participants to organise, analyse, and discuss concepts to the level required by the particular endeavour they are involved in;
2. develops a process to incorporate the participants in the research and decision-making process which includes the formulation of the hypotheses, selection of the research design, and methods of evaluation; and,
3. returns research data and results to the participants.

The long term goal of a PR project is to empower people psychologically and politically to effect social change. In the short term, a PR project engages people affected by a particular problem in the analysis of their own situation and emphasises self-reliance, self-assertiveness, and self-determination. According to Gaventa (1993), there are three ways of conducting research within the general framework of the participatory paradigm. The first approach re-appropriates dominant research knowledge. While it is effective in the process of empowerment, it is still based on gaining access to and control of knowledge that has already been codified by others. The second approach which evolves from the first approach aspires to create new knowledge based on people’s experience and makes it possible for people to produce and define their own reality. He recommends a third way where the people are involved in all stages of the research process including problem definition, setting the research agenda, and determining where the results or findings would be used. He argues that once people see themselves as researchers, they can investigate reality for themselves. The role of the outside researcher is changed radically when the research paradigm sees popular knowledge as having equal value to scientific knowledge. The PR perspective presented in Figure 8.1 sees the community as “insiders” who have information and knowledge gained through practical experience. Their knowledge has not been analysed to recognise patterns nor has it been synthesised within any larger societal framework. Individuals in communities live in isolation and their experiences are not connected.
Researchers, advocates and consultants are “outsiders” who have seen similar situations and therefore are able to understand patterns and have theories and strategies because of their understanding of existing theoretical frameworks. PR sees these two groups coming together to participate in a mutual learning experience. This phase is dependent on developing effective communication strategies and attitudinal changes in the researcher. The local theory provides context specific cause and effect relationships and it can be shared with insiders to test through action. At the same time, outsiders can take this knowledge and use it to generate theory. This author advocates the use of this participatory perspective in any efforts to use GIS for community-based planning and decision making. GIS advocates taking the role of the “outside” researcher will be able to communicate more effectively with the community in its attempt to use GIS. The technology will become less of a black box and more of an interactive tool which can be manipulated according to the needs of the users. This framework facilitates decision-making which is based on social learning. It is most effective in context bound situations and in work with small groups. Having briefly discussed participatory research concepts and their usefulness as an approach and a method to introduce GIS to community groups, this chapter will now look at one example in which this approach was attempted.

8.5 REPAIRERS OF THE BREACH: ADVOCATES FOR THE HOMELESS

The Repairers of the Breach is a non-profit advocacy organisation in Milwaukee that works with the homeless and those at risk of becoming homeless. The organisation is typical of many community-based organisations in that it runs on a very small budget and relies for the most part on the kindness of strangers and the dedicated service of volunteers.
Figure 8.1 Using GIS from within a Participatory Research Paradigm
Since 1992, the organisation has been concerned about the displacement of low income people, and people of colour in central Milwaukee. In order to confirm what they had documented through anecdotal evidence, key members in the organisation sought to use GIS to monitor displacement. These members were actively involved in defining the research agenda, designing the research questions, and determining the scope and nature of the analysis. While the actual data manipulation and the computer mapping tasks were conducted by university students, the research was directed by the needs and interests of the organisation. Building on this preliminary work, the organisation has decided to use GIS to facilitate community-based research and analysis in order to:

• create a comprehensive computer-based socio-economic and environmental profile of the areas they serve in Milwaukee;
• customise this profile to include qualitative data and information of particular relevance to people who are homeless or at risk of becoming homeless; and,
• develop the skills of neighbourhood residents to gather, analyse, and use data and information about their neighbourhood.
This model provides one example of how a community can effectively use computer-based technologies like GIS. Presently, key members in the organisation have a very sophisticated conceptual understanding of GIS. Initially, the relationship between the community-based organisation and the university which provided the technical assistance was similar to what King (1981) describes as happening in the service stage. The organisation did not intend to take an active part in the research and analysis but anticipated that the university would provide assistance by addressing their concerns. Over time, leaders in the organisation decided that it is invaluable for those directly affected by the displacement and gentrification trends to be involved in the research and analysis. As the Director, MacCanon Brown, points out, “…one concept inspiring this project is a desire to move a system of knowledge into the hands of the people who are often subordinated by the exploitative use of systems of knowledge” (Repairers of the Breach, 1994, p. 9). The experience with this organisation demonstrates that there are several benefits to using a participatory process to work with community-based organisations, particularly if the work involves the introduction of GIS concepts and analysis techniques. Playing the role of outside researcher, the university was able to assist the project without dominating the research process. The university students who worked on the project perfected their technical skills while learning about the policy and pragmatic implications of their research from organisation members who were dealing with the issues surrounding homelessness on a day-to-day basis. The organisation members on the other hand were able to address their concerns from a position of strength, playing a leadership role in the research and analysis.
This organisation is currently developing a research proposal to study homelessness in central Milwaukee and has raised money to sustain a drop-in centre with computing facilities to serve the needs of its constituency. The organisation’s actions are one indicator that the participatory approach used for the preliminary research was a catalyst in transforming those individuals who are typically seen as research subjects into researchers. With their enhanced critical thinking skills, the members of this organisation are demonstrating that they can participate with power in decision-making processes. The next section of this chapter will discuss some of the benefits that can accrue and constraints that community-based organisations face as they attempt to acquire a conceptual and technical understanding of GIS.

8.6 BENEFITS AND CONSTRAINTS OF USING GIS

The use of GIS within a participatory paradigm bridges the gap between research and practice by creating divergent problem solving perspectives. For example, questions explored through a GIS gain special meaning because demographic, economic, and environmental data can be visually linked with actual features on a map like a house, a tract of land, a stand of trees, or a river (Audet et al., 1993). In addition, end-users tend to ascribe meaning to the data they are looking at because it is familiar and concerns them directly. For example, it is not uncommon to find community residents browsing through the database searching for their street, or their home and using address matching features to spatially locate familiar landmarks. As stated earlier, GIS facilitates analysis of spatial patterns and trends over time. A community can analyse the growth and decline of their neighbourhood using common indicators like the number of vacant parcels and the number of building code violations. At the same time, the technology makes it possible for non-technical users to come together to discuss context specific issues.
By talking and sharing information, users learn from each other. For example, a community organiser with access to a GIS system can present
information about drug arrests in the neighbourhood over time and use the spatial patterns to talk about the effectiveness of block club organising and neighbourhood watches in preventing drug-related crime. There are some limitations in using GIS to assist community-based planning. Will Craig, Assistant Director at the University of Minnesota’s Center for Urban and Regional Affairs, says that “while the 1990 Census set a new precedent for sharply defined demographic information, community organisers must still depend on cities and counties for much of the other information they need. For example, the city assessor’s office maintains property records. The police department tracks crime. Most public information lands in city computers. More often than not, cities do not distribute this information”. Craig surveyed 31 major US cities and six Canadian cities and found that while most had broad city data available, the results for ‘subcity’ or neighbourhood data were “dismal” (Nauer, 1994 cf. Ramasubramanian, 1995). During interviews conducted with ten community-based organisations in Milwaukee, this author learned that many of them were experiencing difficulties in their efforts to access and use computer-based technologies. Several barriers to access were noted. They included lack of access to appropriate hardware and software, lack of access to appropriate data and information, lack of appropriate computer skills and research as well as analysis skills, and finally financial constraints. Eight of the ten organisations interviewed were affected by financial constraints. However, only one of the organisations insisted that the lack of financial resources was the primary barrier to access. Lack of access to appropriate hardware and software seems to be a major problem affecting most of the organisations interviewed. “We don’t have powerful computers” and “No modems” seemed to be a constant refrain.
“We are using seventies technology here”, said one interviewee, sounding frustrated about the quality of the computer equipment in her workplace. Another interviewee clarified this point further. She said that even when there are financial resources available to invest in new hardware and software, the end users are not able to get sound advice about what type of system to invest in. Other interviewees appear to agree. Most organisations appear to have adopted a “let’s wait and see” attitude because they feel that the technology is changing too rapidly to be of any use to their organisation.
All the organisations surveyed agreed that another major barrier to using computer-based technologies such as GIS was the lack of computer skills. Some organisations clarified that the lack of access to hardware and software was linked to the issue of training. “Training doesn’t help if we do not have the appropriate systems”, says a community leader working with Asian-Americans in Milwaukee. He says that any training programs that his staff attended without actually having the appropriate technology in the workplace were useless. Some organisations said that it took considerable time before individuals could become competent users of the technology. Others indicated that the learning process took a lot of energy.
The organisations actually attempting to use computer-based technologies in some way raised the issue that is usually a preoccupation of researchers and analysts: the lack of data and the varying quality of available data. For example, an activist working on issues that affect the American-Indian community pointed out that the census often undercounts American-Indians. In addition, there is very little data that is geared to the needs and interests of this sub-population. MacCanon Brown, the director of the homeless advocacy organisation discussed earlier in the chapter, agrees.
She works with the homeless and those at risk of becoming homeless—an invisible population often undercounted and underrepresented in official statistics and analyses. In addition, two or three of the organisations interviewed clamoured for more data. The organisations working on long range planning said that they would like to use all the information pertaining to their neighbourhood that they could get. However, they are beginning to realise that the data are sometimes not available in a form that they can use. For example, data about crime can be aggregated at census tract level, zip code level, or block group level. One organisation may find it useful to look at crimes occurring within a
block group to mobilise block watches and organise the neighbourhood while another organisation may prefer to look at larger patterns of criminal activity to design intervention programs.
One or two interviewees commented on the fact that community-based organisations are so busy maintaining routine operations that they do not have the time or energy to step back and look at the nature of computer-based technologies and the ways in which these could affect their organisation. “Most community-based organisations cannot look at the global picture; we are different”, said one interviewee. This author observed that many organisations were not able to conceptually integrate computer-based technology use within their current activities. For example, in one neighbourhood organisation, the computer education and resource centre is designed to educate and entertain young people. It is likely to serve adults who want to gain some basic computing and word processing skills. However, the organisation does not appear to have explored the possibility of linking and enhancing their regular programs with the use of information technology. Several organisations are still making maps with plastic overlays instead of using computer-generated overlays, which increase efficiency and accuracy and can be updated and maintained over time.
At the same time, one or two of the organisations were very clear about how they would solve problems using technologies such as GIS. They talked about using the results of the spatial analysis to achieve certain tangible organisational goals such as generating increased awareness of a problem among neighbourhood residents and generating increased resident participation. These organisations were also aware of the value of looking at trends over time, something that is relatively easy to do using GIS.
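The point about crime data being useful at different levels of aggregation, and about trends over time, can be made concrete with a small sketch. It assumes incident records that carry a 12-digit census block-group identifier, whose first 11 digits identify the enclosing tract; the records, the helper name `count_by` and all counts are invented for illustration and do not come from the Milwaukee interviews.

```python
from collections import Counter

# Hypothetical incident records: (12-digit block-group GEOID, year).
# Every identifier and count here is invented for illustration only.
incidents = [
    ("550790001001", 1993),
    ("550790001001", 1994),
    ("550790001002", 1994),
    ("550790002001", 1994),
    ("550790002001", 1995),
    ("550790002001", 1995),
]


def count_by(records, key):
    """Count records after mapping each one to a grouping key."""
    return Counter(key(record) for record in records)


# Block-group level: fine enough to organise an individual block watch.
by_block_group = count_by(incidents, lambda r: r[0])

# Tract level: drop the last digit of the block-group GEOID to get the tract.
by_tract = count_by(incidents, lambda r: r[0][:11])

# Trend over time: the same tract-level counts, split by year.
by_tract_year = count_by(incidents, lambda r: (r[0][:11], r[1]))
```

The same records answer different questions depending on the grouping key, which is why one organisation may want block-group counts while another prefers the tract-level pattern.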
The interviews point out that community-based organisations have to spend a lot of time, energy and resources on gathering fragmented data from different sources and transforming them into a usable form before they can use GIS for the neighbourhood scale of analysis. Small groups such as community-based organisations are easily deterred by these start-up problems. However, this author believes that use of the participatory framework discussed earlier will assist community organisations in overcoming these difficulties.
GIS technologies also exhibit certain unique characteristics that affect their diffusion and adoption. One cannot learn information technology concepts as one learns to use a standard computer package. A user of standard computer software begins by learning to manipulate the software. The data that become the focus of the manipulation are created by him or her prior to and/or during the process of working with the software. On the other hand, a GIS user works with several sets of data simultaneously. These data sets are usually created by other entities, for different purposes and with different goals in mind. Therefore, the end user is constantly confronted with uncertainty about the accuracy and availability of the data as well as issues which relate to the capacity of the system and its accessibility, and to the content, ownership, and privacy of the information. For example, in order to understand the spatial mapping and analysis capabilities available through a GIS, a user has to understand the basic principles of computing and cartography and how data are stored on the system.
8.7 SUMMARY
This chapter has argued that GIS can be used effectively at the level of small groups such as community-based organisations to assist them in problem solving and decision-making. Accordingly, the literature review has discussed the nature of GIS adoption and its use and the context of that use.
The literature review has also pointed out that information is used in community-based decision-making because of a push and pull effect—community organisations are beginning to believe that rational arguments supported by data
can support their demands while funding agencies are seeking data and sophisticated analyses to determine the relative merits of the requests for funding they receive. Next, the chapter proposed that Participatory Research is a viable conceptual and methodological approach to introduce GIS use to a community group or organisation composed largely of non-technical users. This approach has been described and explained and further exemplified through the example of the homeless advocacy group’s efforts to use GIS. The benefits of and constraints to using GIS have also been discussed.
The critics who argue against GIS use in community-based decision-making often argue that it tends to centralise decision-making and separate it from the realm of understanding of non-technical users. While acknowledging this criticism, this author would like to emphasise that it is more an indictment of decision-making processes than of GIS and its use. To counter this critique, this chapter has presented a model that approaches decision making through a process of mutual learning between “expert planners and decision makers outside the community” and “community residents who are novice users of technology and information”. This model is very appropriate in looking at issues that affect small groups and communities.
There are several barriers to the access and use of information technologies in general, and GIS in particular. Most initiatives to increase access provide the technology; some work on developing data standardisation measures and data sharing mechanisms. Fewer still provide access to technology and data while putting some rudimentary skills in the hands of end users.
However, very few initiatives address what this author believes to be the most important barrier to access: the lack of a critical world view which enables end users to think about ways they can use information technology and GIS in day-to-day problem solving and decision-making. The author is hopeful that the participatory strategies recommended in this chapter will go a long way in encouraging critical thinking among end users.
8.8 CONCLUSION
It seems obvious that computer-based technologies like GIS and the decisions made using them are going to affect the lives of ordinary people in communities, even those who are not directly involved in using these technologies (Sclove, 1995). As exemplified in the NAACP v. American Family Insurance case example, it is likely that information will become the centrepiece of the “Civil Rights” debate in this decade as corporations continue to use racial and economic demographics to locate and provide services (King, 1994b). Gaventa (1993) argued that the production and control of knowledge maintains the balance of power between powerful corporate interests and powerless individual citizens in a society that is becoming increasingly technocratic, relying on the expertise of scientists to transcend politics. According to him, a knowledge system that subordinates common sense also subordinates common people. This author hopes that the use of GIS within a participatory framework will counter this trend and contribute to the self development and empowerment of community groups by placing information and sophisticated technologies in their hands.
Community development is fundamentally concerned with individual and community empowerment. This domain approaches problems through a systems approach in that it addresses more than one problem at a time and makes connections between issues. It believes that community members should be involved in and guide decision-making regarding the development of the community.
If research can be defined as a particular process of learning following some codified guidelines, then the question is “who learns?”. In
non-participatory research, only the researcher learns; in participatory research as discussed earlier, all relevant stakeholders (those who choose to participate in an endeavour) will learn. This learning will empower participants in at least three ways: it will provide specific insights and new understanding of problems; the participants will learn to ask questions and therefore discover how to learn; and the participants will have an opportunity to act using their new knowledge and create new opportunities for their community. Placing GIS as a communication tool within a participatory framework will enhance the quality of decision-making and contribute significantly to individual and community empowerment.
REFERENCES
ALEXANDER, E.R. 1984. After Rationality, what? Journal of the American Planning Association, 50(1), pp. 62–69.
ARGYRIS, C., PUTNAM, R., and SMITH, D. 1985. Action Science: Concepts, Methods, and Skills for Research and Intervention. San Francisco, CA: Jossey-Bass.
AUDET, R., HUXHOLD, W., and RAMASUBRAMANIAN, L. 1993. Electronic exploration: an introduction to geographic information systems, The Science Teacher, 60(7), pp. 34–38.
BROWN, D. and TANDON, R. 1991. Multiparty Collaboration for Development in Asia. Working paper from Institute for Development Research, Boston, Massachusetts and Society of Participatory Research, New Delhi, India.
CAMPBELL, H. and MASSER, I. 1996. Great Britain: The dynamics of GIS diffusion, in Craglia, M., Campbell, H., and Masser, I. (Eds.) GIS Diffusion: the Adoption and Use of Geographical Information Systems in Local Government in Europe. London: Taylor & Francis, pp. 49–66.
CHEN, W. 1990. Visual Display of Spatial Information: a Case study of the South End Development Policy Plan, unpublished Masters Thesis, Department of Urban Studies and Planning, Massachusetts Institute of Technology, USA.
CROSWELL, P. 1991. Obstacles to GIS implementation and guidelines to increase the opportunities for success, URISA Journal, 3(1), pp. 43–56.
DALEY, J., and ANGULO, J. 1990. People-centered community planning, Journal of the Community Development Society, 21(2), pp. 88–103.
ELDEN, M., and LEVIN, M. 1991. Cogenerative learning: bringing participation into action research, in Whyte, W. (Ed.) Participatory Action Research. Newbury Park, CA: Sage, pp. 127–142.
FORESTER, J. 1989. Planning in the Face of Power. Berkeley, CA: University of California Press.
FRIEDMANN, J. 1987. Planning in the Public Domain: From Knowledge to Action. Princeton, NJ: Princeton University Press.
FRIEDMANN, J. 1992. Educating the Next Generation of Planners, unpublished working paper, University of California, Los Angeles.
GAVENTA, J. 1993. The powerful, the powerless, and the experts: knowledge struggles in an information age, in Park, P., Brydon-Miller, M., Hall, B. and Jackson, T. (Eds.) Voices of Change: Participatory Research in the United States and Canada. Westport, CT: Bergin and Garvey, pp. 21–40.
HALL, B. 1993. Introduction, in Park, P., Brydon-Miller, M., Hall, B. and Jackson, T. (Eds.) Voices of Change: Participatory Research in the United States and Canada. Westport, CT: Bergin and Garvey, pp. xiii–xxii.
HUXHOLD, W. 1991. An Introduction to Urban Geographic Information Systems. New York, NY: Oxford University Press.
HUXHOLD, W., and LEVINSOHN, A. 1995. Managing Geographic Information Systems Projects. New York, NY: Oxford University Press.
HUXHOLD, W. and MARTIN, M. 1996. GIS Assists Neighborhood Strategic Planning in Milwaukee, working paper, available from the authors.
KING, M.H. 1981. Chain of Change: Struggles for Black Community Development. Boston, MA: South End Press.
KING, M.H. 1992. Community Development, unpublished working paper, Massachusetts Institute of Technology, Cambridge, MA, available from the author.
KING, M.H. 1994a. Personal communication with the author.
KING, M.H. 1994b. Opening Remarks, Proceedings of The New Technologies Workshop, Massachusetts Institute of Technology. Cambridge, MA: MIT Community Fellows Program.
KORTEN, D. (Ed.) 1986. Community Management: Asian Experience and Perspectives. West Hartford, CT: Kumarian Press.
KRETZMAN, J. and McKNIGHT, J. 1993. Building Communities from the Inside Out: A Path Toward Finding and Mobilizing a Community’s Assets. Evanston, IL: Center for Urban Affairs and Policy Research, Northwestern University.
MASSER, I. 1993. The diffusion of GIS in British local government, in Masser, I. and Onsrud, H. (Eds.) Diffusion and Use of Geographic Information Systems Technologies. Dordrecht: Kluwer Academic Publishers, pp. 99–116.
MASSER, I., and ONSRUD, H. 1993. Extending the research agenda, in Masser, I. and Onsrud, H. (Eds.) Diffusion and Use of Geographic Information Systems Technologies. Dordrecht: Kluwer Academic Publishers, pp. 339–344.
MITCHELL, W. 1994. Opening Remarks, Proceedings of the New Technologies Workshop, Massachusetts Institute of Technology. Cambridge, MA: MIT Community Fellows Program.
NAISBITT, J. 1994. Global Paradox: The Bigger the World Economy, the More Powerful its Smallest Player. New York: Avon Books Inc.
NORMAN, J. 1995. Homeowners ensure company does the right thinking, Milwaukee Journal Sentinel Magazine, 26 November 1995, pp. 12–15.
NYERERE, J. 1974. Man and Development. London: Oxford University Press.
OBERMEYER, N., and PINTO, J. 1994. Managing Geographic Information Systems. New York: The Guilford Press.
PICKLES, J. (Ed.) 1995. Ground Truth: The Social Implications of Geographic Information Systems. New York: The Guilford Press.
RAMASUBRAMANIAN, L. 1991. Mapping Madras: Geographic Information Systems Applications for Metropolitan Management in Developing Countries, unpublished Masters Thesis, Department of Urban Studies and Planning, Massachusetts Institute of Technology.
RAMASUBRAMANIAN, L. 1995. Building communities: GIS and participatory decision making, Journal of Urban Technology, 3(1), pp. 67–79.
RAMASUBRAMANIAN, L. 1996. Neighborhood Strategic Planning in Milwaukee, a Documentation Project. Report researched and written for the NonProfit Center of Milwaukee. Available from the author.
REPAIRERS OF THE BREACH. 1994. Proposal for support of a research project submitted to the Poverty and Race Research Action Council. Milwaukee, WI: Available from the author.
SASSEN, S. 1991. The Global City: New York, London, Tokyo. Princeton, NJ: Princeton University Press.
SCLOVE, R. 1995. Democracy and Technology. New York: Guilford Press.
SIEBER, R.E. 1997. Computers in the Grassroots: Environmentalists, Geographic Information Systems, and Public Policy. PhD dissertation, Rutgers University.
SPARROW, J. and VEDANTHAM, A. 1995. Inner-city networking: models and opportunities, Journal of Urban Technology, 3(1), pp. 19–28.
SUSSKIND, L. 1983. Paternalism, Conflict, and Coproduction: Learning from Citizen Action and Citizen Participation in Western Europe. New York: Plenum.
WHYTE, W. (Ed.) 1991. Participatory Action Research. Newbury Park, CA: Sage.
WIGGINS, L. 1993. Diffusion and use of geographic information systems in public sector agencies in the United States, in Masser, I. and Onsrud, H. (Eds.) Diffusion and Use of Geographic Information Systems Technologies. Dordrecht: Kluwer Academic Publishers, pp. 147–164.
Chapter Nine
Climbing Out of the Trenches: Issues in Successful Implementations of a Spatial Decision Support System
Paul Patterson
9.1 INTRODUCTION
This chapter describes the development of a particular spatial decision support system (SDSS) and analyses the experience gleaned from the implementation of this system within multiple organisations. This chapter intends to fill a void between GIS conference proceedings, which often describe individual user accounts of SDSS implementation within single organisations, and academic journals, which describe technical model development and simulations but rarely describe the actual implementation of SDSSs within real organisations. By focusing on a system that has been implemented many times across many types of organisations, patterns of important issues emerge. The goal is to step out of the day to day implementation trenches to offer SDSS developers insight into these design and organisational issues which may improve the chances for successful implementation of their systems.
The SDSS is RouteSmart™, a mature software system for routing and scheduling of vehicles. RouteSmart™ has been implemented in over 40 organisations, both public and private, in more than seven different industrial sectors. RouteSmart™ implementation therefore offers the opportunity to analyse implementation issues across a wide spectrum of organisations. The author has been involved in the design, coding, and redesign of RouteSmart™ and has personally consulted in its implementation in over 20 organisations since 1990.
The first section of this chapter gives a background and description of RouteSmart™. In order to generalise the lessons learned in RouteSmart™ implementations to other similar-type systems, the second section places RouteSmart™ within a classification framework with other SDSSs. Generalising implementation issues to other types of SDSSs can be a risky task. Therefore SDSS typologies are presented to show how RouteSmart™ is related to other types of systems so that other SDSS developers can take the lessons presented in this chapter with the proper level of caution.
The third section of this chapter explores the lessons learned from implementation. These lessons fall into the following categories: appropriate data resolution, user feedback and interaction, extensions and customisations, and organisational issues. The fourth and final section explores additional ideas which have not been implemented in RouteSmart™ but which may further improve the chances of successful implementation for other SDSSs.
ISSUES IN THE IMPLEMENTATION OF DECISION-SUPPORT SYSTEMS
9.2 BACKGROUND AND DESCRIPTION OF ROUTESMART™
RouteSmart™ is a SDSS for solving routing and scheduling problems over street networks. RouteSmart™ has been implemented in a variety of organisations since 1988 for routing meter readers, sanitation and recycling trucks, mail carriers, express package, telephone book, and newspaper deliverers, and field service personnel (Bodin, et al., 1995). The chances are that if you live in the United States you are currently or will soon be serviced by someone using a RouteSmart™ route.
There are two versions of RouteSmart™. The point-to-point version develops routes to individual customers scattered in low densities throughout the service territory. The neighbourhood version develops routes to individual customers clustered on nearly every street in a service territory, such as for garbage collection, mail delivery, or meter reading. The benefits of RouteSmart™ can be summarised in five areas:
1. balanced workload partitioning between routes,
2. near-optimal travel path generation,
3. automated route mapping and report generation,
4. interface to customer information systems, and
5. user control and override of computer solutions.
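The first benefit in the list, balanced workload partitioning, can be illustrated with a deliberately simple sketch. The chapter does not specify RouteSmart™'s partitioning method, so the code below is not that method; it is a generic greedy "longest first" heuristic that hands each unit of work to the route with the least time so far. The function name `balance_routes`, the blockface labels and the service minutes are all hypothetical, and the sketch ignores geography entirely, whereas a real partition would also have to keep each route spatially compact.

```python
import heapq


def balance_routes(service_minutes, n_routes):
    """Assign work units to routes, longest first, always to the lightest route."""
    # Min-heap of (total_minutes, route_index): the lightest route pops first.
    heap = [(0, index) for index in range(n_routes)]
    heapq.heapify(heap)
    routes = [[] for _ in range(n_routes)]

    # Sort by descending service time; ties are broken by label for repeatability.
    ordered = sorted(service_minutes.items(), key=lambda item: (-item[1], item[0]))
    for unit, minutes in ordered:
        total, index = heapq.heappop(heap)
        routes[index].append(unit)
        heapq.heappush(heap, (total + minutes, index))

    totals = [sum(service_minutes[unit] for unit in route) for route in routes]
    return routes, totals


# Hypothetical blockfaces with estimated service time in minutes.
blockfaces = {"A": 50, "B": 40, "C": 30, "D": 30, "E": 20, "F": 10}
routes, totals = balance_routes(blockfaces, 2)
print(routes, totals)
```

With these invented figures the two routes come out at 90 minutes each. In general a greedy heuristic of this kind only approximates a balanced split.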
By commercial standards RouteSmart™ is a successful example of a SDSS. RouteSmart™ typically reduces the number of routes needed to service a territory compared with manual methods. Reductions of 8 to 19 percent are common, with reductions as high as 22 percent documented in the press (McCoy, 1995). RouteSmart™ also reduces the time and personnel needed to generate routes, in some cases by as much as six person weeks per rerouting. RouteSmart™ has a short payback period and is occasionally used to justify the entire expense of a general purpose GIS implementation.
RouteSmart™ consists of a set of Fortran and C routines that have been integrated within four different GIS systems (Arc/Info, ArcView, GisPlus, and Synercom) in a variety of PC, workstation, and minicomputer environments. Through a user interface the GIS handles the spatial data, topological editing, address matching, selection and extraction of street and customer data for network creation, routing parameter input and formatting, display of routes, manual street swapping between routes, solution management, and report and map generation. The external executables handle network building and topological connectivity testing, the partitioning of customers into balanced routes, and the generation of travel paths.
In generating routes and travel paths RouteSmart™ considers one-way streets, single, multiple, and back alley street traversal requirements, mixed modal traversal (driving and walking), turn penalties at each intersection, time windows for customer servicing, demand or supply at each customer location, vehicle capacity, travel times to and from depots and transfer sites when vehicles are full or empty, and office and break times. Extensions have been developed to choose the optimal vehicle mix when vehicle types are constrained by street width. Routes are always balanced on time but can be created based on either number of routes desired (e.g.
vehicle availability is the primary constraint), on length of the workday (e.g. overtime reduction is the primary objective), or both. When an integer number of routes cannot be generated within an area a remnant (or partial) route can be created and geographically located by the user. RouteSmart™ has its theoretical roots in the network analysis work of early eighteenth century mathematician Leonhard Euler and in the contemporary work of Larry Bodin, Bruce Golden, Arjan Assad, and Michael Ball of the University of Maryland (Bodin, et al., 1983; Golden and Assad, 1988). The practical application of network analysis was largely limited to very expensive, labour intensive studies before the widespread availability and distribution of GISs and GIS databases (Bodin and Rappoport,
1992). GIS in this case is an enabling technology, one that fosters the development of a SDSS from theory that has a long history but which could not be practically implemented.
9.3 TYPOLOGIES OF SDSSs
There is a large variety of SDSSs. For example:
• Retail site location models. For the classic review article see Craig, et al. (1984).
• Location-allocation models, such as the Locational Analysis Decision Support System (LADSS) (Densham, 1992).
• Watershed analysis models, such as the Geographic Watershed Analysis Modelling System (GEOWAMS) (DePinto, et al., 1994).
• Urban simulation models, such as the Integrated Transportation and Land Use Package (ITLUP) (Putman, 1991).
There are many others. For an annotated bibliography of papers relating to spatial decision support systems refer to the closing report of the National Center for Geographic Information and Analysis Research Initiative Six (Densham and Goodchild, 1994). For a discussion on general progress in the field of Spatial Decision Support Systems refer to Chapter 11 in this volume.
Spatial decision support systems can be categorised in a variety of ways. Working groups for the NCGIA Specialist Meeting for Research Initiative Six on Spatial Decision Support Systems outline two typologies for classifying spatial models (NCGIA, 1990):
1. type of model (algorithmic, heuristic, or data manipulation), and
2. type of decision being modelled (decision situation, frequency, stakes, number of decision makers, tools needed, and tool availability).
Concerning the first typology, RouteSmart™ is a heuristic model for solving route partitioning and travel path problems. In this regard the lessons discussed in the next section will apply most to algorithmic and heuristic models and to a lesser extent to data manipulation models. The difference between algorithmic and heuristic models for this discussion is minimal since both require the use of parameters and multiple solutions to compare scenarios.
Data manipulation models on the other hand are generally more direct and require less interaction on the part of the user. The second typology is where the important distinctions are made in terms of applicability of these lessons to other SDSSs. Under this second typology RouteSmart™’s decision situation is the operations domain, with a frequency ranging from daily to yearly, for relatively minor stakes beyond the implementing organisation, with relatively few decision makers, using RouteSmart™ and often a customer information system as the primary tools needed. Of these categories it is the decision situation which has the most bearing on the generalisation of these lessons to other SDSSs. Types of decision situations are further explored in section 9.4.4. However it is left to the reader to determine how their SDSS of interest fits into this typology and the corresponding applicability of the lessons learned from RouteSmart™.
9.4 LESSONS LEARNED
Many of the lessons learned from RouteSmart™ for successful SDSS implementation can be classified into the following four categories: appropriate data resolution, user feedback and interaction, extensions and customisations, and organisational issues. These categories are all represented as subcomponents of the four major research areas outlined by the National Center for Geographic Information and Analysis Research Initiative Six on Spatial Decision Support Systems (Densham and Goodchild, 1994).
9.4.1 Appropriate Data Resolution
The ideal data resolution for a particular SDSS often cannot be used due to a lack of data availability, long computer processing times, statistical considerations, or privacy issues. The first obstacle, data availability, will be overcome for many models as remote collection methods improve and electronic data collection systems and databases are linked together. The second obstacle, long computer run times, is usually due to calculations on large matrices. This will be overcome as computer speeds increase, algorithmic shortcuts are invented, and processing methodologies improve, such as with the use of parallel processing (Densham, 1993). However, the latter two obstacles, statistical validity of aggregated data and privacy concerns, will remain an issue for certain types of models.
For instance, in transportation demand models aggregated residential and employment areas called traffic analysis zones (TAZs) are used to indicate where trips originate and terminate. TAZs are used for two reasons. First, large scale personal trip behaviour data are traditionally impractical to collect, although this may change with the adoption of Intelligent Vehicle Highway Systems (IVHS). Second, and perhaps more importantly, freedom of movement considerations are an important issue in modelling people’s individual travel behaviour.
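As a rough illustration of what aggregating trips into zones involves, the sketch below rolls a handful of segment-level trips up into zone-to-zone counts under two zoning schemes. It is a toy example rather than a transportation demand model: the trips, segment names, zone labels and the helpers `od_matrix` and `intrazonal_share` are all hypothetical.

```python
from collections import Counter

# Hypothetical trips between street segments: (origin, destination).
trips = [("s1", "s4"), ("s1", "s2"), ("s2", "s3"), ("s3", "s4"), ("s4", "s1")]

# Two invented zoning schemes assigning each segment to a zone.
fine_zones = {"s1": "Z1", "s2": "Z2", "s3": "Z3", "s4": "Z4"}
coarse_zones = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}


def od_matrix(trip_list, zone_of):
    """Aggregate segment-level trips into zone-to-zone trip counts."""
    return Counter((zone_of[o], zone_of[d]) for o, d in trip_list)


def intrazonal_share(trip_list, zone_of):
    """Fraction of trips that start and end inside the same zone."""
    same = sum(1 for o, d in trip_list if zone_of[o] == zone_of[d])
    return same / len(trip_list)


coarse_od = od_matrix(trips, coarse_zones)
print(coarse_od)
print(intrazonal_share(trips, fine_zones), intrazonal_share(trips, coarse_zones))
```

With the coarse scheme each zone pair collects more observations, but two of the five trips now start and end in the same zone, so nothing in the aggregated counts says which streets they used.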
Unfortunately, TAZs must be large and relatively few in number in order to distribute trips between zones with any sort of statistical accuracy. The trade-off is that the larger these zones are, the less accurately trips can be assigned to individual links in the transportation network. This results in traffic assignments to links consisting of generalised aggregations of multiple streets and street segments instead of to individual street segments (Patterson, 1990). The statistical and privacy trade-offs of zone size make the data resolution of transportation models coarser than desired, thus making the models themselves less useful and less likely to be implemented. This question of appropriate zone size for areas is often referred to as the modifiable areal unit problem (MAUP) and the reader is directed to the classic review article of Openshaw (1984).
RouteSmart™ has evolved from an arc segment level data resolution and structure down to the detailed customer level. For SDSSs dealing in application domains which directly impact individuals this is an important goal. Initial RouteSmart™ designs for the neighbourhood version used the centreline street segment as the fundamental data element. Later revisions reduced this to the blockface level. More recent revisions have reduced the resolution of map and report output down to the customer level, although the blockface still remains the basic data structure of the partitioning and travel path optimisation routines in the neighbourhood version. By its very nature the resolution of the point-to-point version has always been at the customer level.
The push for this reduction in data resolution corresponds to the general trend of both private and public organisations to become more customer service oriented. RouteSmart™ helps organisations be more responsive by differentiating customers by service type and special requirements on maps and reports.
This push has also been extended into two-way interfaces between RouteSmart™ and various customer information, billing, and support systems, such as automated generation of change of
service day postcards. This push has also created the ability to update routes automatically as customers are added or removed.
One unfortunate aspect of RouteSmart™’s data structure is the redundancy between its internal network structure and the street centrelines stored in the GIS. A tighter integration between the geographic elements and the analytical elements (network), as suggested at the NCGIA Specialist Meeting on Spatial Decision Support Systems, would create a smaller, faster, and easier to use SDSS (NCGIA, 1990).
9.4.2 User Feedback and Interaction
SDSSs must allow user feedback and interaction. Reality (geographic or any other facet) cannot be modelled perfectly in any decision support system, no matter how fine the data resolution and sophisticated the algorithms. There must be opportunities for humans to insert knowledge of local conditions in order for the model to have a chance at being implemented successfully. The model should be able to adapt to these new inputs. If not, the model results will be simply a first cut at the spatial decision and the decision maker will be left to adjust the results to fit reality (or not)! In this case the model has limited utility. Perhaps the model designer has a better understanding of the underlying geographic processes at work but the ultimate decision maker is little better off than before.
User feedback and interaction have different meanings and take different forms depending on the type of SDSS. Take for example a sophisticated retail site location model with data resolution at the parcel level that is sensitive to sales cannibalisation from outlets within the same franchise. With this model, two potential sites that are located across a divided road from each other would appear as essentially the same site. One or the other might be selected as among the best sites but not both.
In this case the user would want to interact with the model to separate these two sites in geographic space (perhaps by taking into account drive time distances and turn impedances) and rerun the model to see if perhaps both of these sites are suitable. They may very well both be suitable, accounting for the phenomena we see in the real world such as two gas stations of the same franchise located across a divided road from each other.
RouteSmart™ allows user feedback and interaction in two ways: through its modular design and its solution approach to modelling. First, RouteSmart™ is a collection of algorithms and processes that are run in sequence but designed to be modular to allow user feedback, if desired, between each discrete step. These user feedback steps include the selection of areas and customers of interest, the input of routing parameters, selection of the seed location for remnant routes, the ability to swap streets and customers between routes, the adjustment of turn penalties, the creation of user-defined routes, and control over travel path map plotting scales and number of sequenced maps generated.
The second component of RouteSmart™’s user feedback and interaction is its orientation toward solution generation and comparison instead of the generation of a single answer. This is accomplished by allowing solutions to be saved at various stages of completeness and to spawn multiple child solutions. A considerable amount of the system deals with solution management. This is similar to the scenario manager approach developed for a watershed analysis model of the Buffalo River (DePinto et al., 1994). The disadvantage of this approach is that solutions become obsolete as data changes. Object-oriented data structures are being considered in RouteSmart™ as an antidote to this dilemma of solution concurrency. There would be a collection of route objects and their associated child travel path objects.
Each of these objects would contain relevant solution history and creation information as well as embedded code to handle notification events such as changed customers or network barriers.
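The object-oriented idea described above might look roughly like the following Python sketch. It is only an illustration: the class names (`Route`, `TravelPath`, `SolutionRegistry`), the event names and the rule for deciding which objects become stale are all assumptions made for this example, not a description of RouteSmart™'s actual design.

```python
from dataclasses import dataclass, field


@dataclass
class TravelPath:
    """Child object of a route: the ordered street segments to be driven."""
    segment_ids: list
    stale: bool = False


@dataclass
class Route:
    """A saved route solution that remembers which data it was built from."""
    name: str
    customer_ids: set
    segment_ids: set
    travel_paths: list = field(default_factory=list)
    history: list = field(default_factory=list)  # log of events that hit this route
    stale: bool = False

    def notify(self, event_kind, object_id):
        """Flag the route and its children as obsolete if the event touches them."""
        affected = (
            (event_kind == "customer_changed" and object_id in self.customer_ids)
            or (event_kind == "barrier_added" and object_id in self.segment_ids)
        )
        if not affected:
            return False
        self.stale = True
        self.history.append((event_kind, object_id))
        for path in self.travel_paths:
            # Assumption: a customer change may alter any path of the route,
            # while a barrier only invalidates paths that use that segment.
            if event_kind == "customer_changed" or object_id in path.segment_ids:
                path.stale = True
        return True


class SolutionRegistry:
    """Hypothetical solution manager that forwards data changes to routes."""

    def __init__(self):
        self.routes = []

    def add(self, route):
        self.routes.append(route)

    def publish(self, event_kind, object_id):
        """Send one event to every route; return the names of affected routes."""
        return [r.name for r in self.routes if r.notify(event_kind, object_id)]
```

With such a structure, a changed customer or a new network barrier would flag only the solutions that depend on it, rather than silently leaving every saved solution out of date.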
One area where RouteSmart™ could do a better job in allowing user feedback is in the determination of travel paths to and from offices, depots, and transfer sites. Currently the minimum time path is found which may traverse residential roads which drivers may prefer to avoid. Combining a hierarchical path generation algorithm such as the one proposed by Carr in Chapter 31 of this volume with the ability of the analyst to define dynamically and alter these hierarchies would be one solution to this shortcoming. Ultimately it is the drivers on the street who have final control. They will change the route solutions to fit reality as necessary. They will encounter static problems unforeseen by the routing analyst and random problems unforeseen by anyone, including themselves. Reality is complicated. In the end SDSS output of this type is simply a guide to better decisions, not the final word. 9.4.3 Extensions and Customisations In order for SDSSs to move from academic tools for research to practical decision making tools for organisations they must be developed in a manner that allows easy extensions and customisations. These characteristics are necessary for the widespread implementation of any type of SDSS. SDSSs should assist people in decision making. They should not restrict or constrain decisions and they should not change the organisation’s operating procedures or protocols simply because they cannot be customised in any way. Change for the better is good but change for software design limitations is intrusive and will meet with resistance. Even among the fairly homogenous sanitation collection and utility meter reading industries there is wide variety in operating procedures, terminology, and services rendered. Barriers to implementation might include the need to have a particular output format. For example one county in the USA wants to have only the outer boundary streets on its route maps to reduce map clutter and driver confusion. 
However a different county wants very detailed maps showing house numbers, travel sequence numbers, comments, and special symbology for each customer on each street. The trick in SDSS development, as in software development in general, is to generalise these customisations so that other organisations can benefit from extended functionality. There is a need for user-controlled terminology on menus, buttons, and report column headings. There is also a need to allow special add-ins and extensions that a client may build on top of the SDSS. The SDSS needs to be open ended so it is extensible and can grow in the future. This is more difficult to do than to state. Perhaps in the long run object oriented data structures may help in this area. 9.4.4 Organisational Issues There are many sensitive human and management issues surrounding the implementation of a SDSS in an organisation. Participants in NCGIA’s Collaborative Spatial Decision Making Specialist Meeting (Research Initiative 17) defined Joe’s Cube as having three axes representing the physical, environmental, and procedural settings of a decision making context or situation as referred to in the section on SDSS typologies. The second axis of Joe’s Cube, the environmental axis, measures the organisational impact of a decision making process “in the context of a coupling index that ranges from “tightly coupled”, representing a small group of people with similar goals working on a clearly defined project, to “loosely coupled”, where there is a large group with dissimilar goals working on a problem which is multi-faceted” (NCGIA, 1995, p. 11).
A simple and practical adaptation of this “environmental axis” would be to classify SDSSs into those that are either strategic, planning, or operations oriented. SDSSs that are strategic in nature, such as site location and marketing models, have less impact on the day to day running of an organisation because the size of the decision making group is small. They also have less impact on society as a whole. SDSSs that are planning oriented, such as transportation demand and land use allocation models, have a wide impact on society as a whole but less of an impact within the particular implementing organisation (although a general purpose GIS system on the other hand will have a large impact). In this planning context decisions will be made one way or another, with or without the use of a SDSS. SDSSs that are operations oriented in nature, such as RouteSmart™, can have a tremendous impact on an organisation. The more central a SDSS is to an organisation’s operations and mission the more serious will be the organisational issues. Knowing where a particular SDSS falls within this framework of strategic, planning, or operational domains will help indicate the extent to which organisational issues should be heeded. Nonetheless, organisational impacts, large or small, should be considered to increase the chance for a successful implementation. In terms of RouteSmart™ and other SDSSs in the domain of operations, the major issues in implementation include the following: 1. Goals of management. Does management wish to implement a SDSS for a desire to modernise, improve efficiency, placate or impress stockholders, council members, or other constituent groups? Understanding the goals of management is key to a successful implementation. 2. Labour relations. RouteSmart™’s objective is to decrease the amount of labour required to service a territory. When people’s jobs are at stake implementation can be very difficult. 
A strategy for overcoming this obstacle while still benefiting from efficiency is to accommodate future growth without having to hire new people. 3. Old school methods. Implementation of a SDSS to replace manual methods is seen as a loss of control, status, and power by those who used to perform the manual methods. Indeed it is a loss of influence. These people should be incorporated into the new implementation, both for their acceptance of the new system and for the valuable knowledge they possess which cannot be replaced by a model. If they are not a component of the new decision process they will likely attempt to sabotage the SDSS either directly or covertly by undermining the implementation of the model results. 4. Computer scepticism. Computer illiteracy and general mistrust of computer programs that attempt to model reality can hinder implementation. While scepticism may be well founded when it comes to complex models like SDSSs, training classes and on-site consulting can help alleviate unnecessary fear. Training classes using real client data can further alleviate this fear by keeping the content familiar and less abstracted from the user’s reality. 9.5 IMPROVEMENTS FOR SUCCESSFUL IMPLEMENTATION Granted that RouteSmart™ is a SDSS with a rather well defined problem and a history of theory underlying it, what other improvements can be suggested that would apply to other SDSSs? Two suggestions are an explicit attempt to model uncertainty involved with any SDSS solution, and the incorporation of multiparticipant decision making methods for assigning weights to various factors in certain classes of models. Uncertainty is a problem not only for the model builder but for the implementing organisation as well. If uncertainty can be incorporated into the model directly, perhaps in the form of an endogenous variable that can be manipulated by users in an iterative interactive manner, better sensitivity analysis could be
conducted. As an example, in the IVHS model of real time traffic monitoring and route advising, the system being modelled is composed of thousands of individual decision makers, all obtaining the same information and game playing with each other to beat the system. Should the SDSS recommend different “best” alternative routes for each vehicle? How does the model handle the fact that each decision maker is free to ignore the model’s advice to choose his or her own route? Clearly this is a chaotic system of individual actors in which uncertainty must be modelled directly. Only then can sensitivity analysis be performed by the implementing organisation. Similar uncertainties exist with integrated land use and transportation models. These SDSSs attempt to model the simultaneous decisions of an entire city population’s choice of residential and work locations along with transportation modes and routes, using exogenous employment and population projections (Putman, 1991). RouteSmart™ could also benefit from incorporating uncertainty in the service time and demand at each customer location and the drive time uncertainties at various time periods of the day, to mention just a few variables. The form this uncertainty variable takes in any particular SDSS depends on the characteristics and functional form of that SDSS but attempts to incorporate variables of this type are sound development principles. The second suggestion involves incorporating multi-participant decision making models directly within SDSSs—particularly where the weighting of various factors is open to debate, such as land allocation and siting models. Having stakeholders participate directly and in conjunction (rather than disjunction) to derive appropriate weights and values improves the quality of the decisions being made and increases the chance for implementation. A formal approach to modelling qualitative values between multiple participants is the analytical hierarchy process (Saaty, 1990). 
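As a rough illustration of how stakeholder weights could be derived with the analytic hierarchy process, the Python sketch below approximates the principal eigenvector of a pairwise comparison matrix by power iteration. The function names are hypothetical, and combining the participants' judgments with an element-wise geometric mean is one common convention assumed here, not something prescribed by the chapter.

```python
import math


def combine_judgments(matrices):
    """Merge several participants' pairwise matrices (element-wise geometric mean)."""
    n = len(matrices[0])
    k = len(matrices)
    return [
        [math.prod(m[i][j] for m in matrices) ** (1.0 / k) for j in range(n)]
        for i in range(n)
    ]


def ahp_weights(pairwise, iterations=100):
    """Approximate the principal eigenvector, normalised to sum to 1."""
    n = len(pairwise)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(pairwise[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        weights = [value / total for value in product]
    return weights


def consistency_index(pairwise, weights):
    """Saaty's consistency index (lambda_max - n) / (n - 1); 0 means fully consistent."""
    n = len(pairwise)
    product = [sum(pairwise[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n
    return (lambda_max - n) / (n - 1)
```

Entry `pairwise[i][j]` states how many times more important factor `i` is judged to be than factor `j`, so `pairwise[j][i]` should be its reciprocal. Because the calculation takes a negligible amount of time, weights can be recomputed interactively while participants revise their judgments.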
These models exist today and should be linked either formally or informally to SDSSs. A tightly-coupled formal linkage is preferred as it increases the ability to do direct sensitivity analysis under various weighting scenarios without having the participants wait in a latency period while the models trade data and solutions back and forth. When multiple people with conflicting objectives are using a SDSS, solution speed becomes even more important because in many ways the model is acting as mediator and needs to be perceived as available and responsive. 9.6 CONCLUSION The goal of this chapter was to provide design guidelines and illuminate organisational issues which should be addressed by SDSS developers to improve their chances for successful implementation. The major topics covered include appropriate data resolution, user feedback and interaction, extensions and customisations, and organisational issues. In making such generalisations it is necessary to provide a framework against which other SDSSs may be compared for similarity to determine the level of transferability of these suggestions. It is left to the reader to determine the suitability of these suggestions to their systems. Regardless of the suitability of specific suggestions, this analysis of implementation issues is intended to keep SDSS builders focused on the end result of their work. If support systems are built but cannot be successfully implemented, of what value are they beyond scientific curiosity? True advancement in this field will be when we can point to many models in use—and success will be when they become ubiquitous within organisations. The hope is that by digging deep enough into the trenches of how a particular SDSS has been implemented across multiple organisations the resulting mound of experience will provide a vantage point from which to help route future SDSSs to success.
ACKNOWLEDGEMENTS
RouteSmart is a registered trademark of Bowne Distinct Ltd.
REFERENCES
BODIN, L. and RAPPOPORT, H. 1992. Vehicle routing and scheduling problems over street networks, Paper presented at Waste Management, Equipment, and Recycling Conference, Rosemont, IL, 6–8 October (Not in proceedings).
BODIN, L., PAGAN, G., PATTERSON, P., JANKOWSKI, B., LEVY, L., and DAHL, R. 1995. The RouteSmart™ system – functionality, uses, and a look to the future, Paper presented at the 1995 Environmental Systems Research Institute User Conference, Palm Springs, CA, 21–26 May (Not in proceedings).
BODIN, L., GOLDEN, B.L., ASSAD, A., and BALL, M. 1983. Routing and scheduling of vehicles and crews: the state of the art, Computers and Operations Research, 10(2), pp. 63–211.
CRAIG, S.C., GHOSH, A., and McLAFFERTY, S. 1984. Models of the retail location process: a review, Journal of Retailing, 60(1), pp. 5–36.
DENSHAM, P.J. 1992. The Locational Analysis Decision Support System (LADSS) Software Series S-92–3. Santa Barbara, CA: National Center for Geographic Information and Analysis.
DENSHAM, P.J. 1993. Integrating GIS and parallel processing to provide decision support for hierarchical location selection problems, Proceedings of GIS/LIS’93, Minneapolis, November, Baltimore: ACSM-ASPRS-URISA-AM/FM, vol. 1, pp. 170–179.
DENSHAM, P.J. and GOODCHILD, M.F. 1994. Research Initiative Six: Spatial Decision Support Systems Closing Report. Santa Barbara, CA: National Center for Geographic Information and Analysis.
DePINTO, J.V., CALKINS, H.W., DENSHAM, P.J., ATKINSON, J., GUAN, W., and LIN, H. 1994. Development of GEOWAMS: an approach to the integration of GIS and watershed analysis models, Microcomputers in Civil Engineering, 9, pp. 251–262.
GOLDEN, B.L. and ASSAD, A. 1988. Vehicle Routing: Methods and Studies. Amsterdam: North-Holland.
McCOY, C.R. 1995. High tech helps haul the trash, The Philadelphia Inquirer, 10 July. Philadelphia, PA.
NCGIA. 1990. Research Initiative Six – Spatial Decision Support Systems: Scientific Report for the Specialist Meeting, Technical Paper 90–5. Santa Barbara, CA: National Center for Geographic Information and Analysis.
NCGIA. 1995. Collaborative Spatial Decision Making: Scientific Report for the Initiative 17 Specialist Meeting, Technical Paper 95–14. Santa Barbara, CA: National Center for Geographic Information and Analysis.
OPENSHAW, S. 1984. The Modifiable Areal Unit Problem. Norwich: Geo Abstracts.
PATTERSON, P. 1990. An evaluation of the capabilities and integration of aggregate transportation demand models with GIS technologies, Proceedings of the Urban and Regional Information Systems Association (URISA) 1990, Washington: URISA, vol. 4, pp. 330–341.
PUTMAN, S.H. 1991. Integrated Urban Models 2. London: Pion.
SAATY, T. 1990. The Analytic Hierarchy Process: Planning, Priority Setting, Resource Allocation. New York: McGraw-Hill.
Part Two GI FOR ANALYSIS AND MODELLING
Chapter Ten Spatial Models and GIS Michael Wegener
10.1 INTRODUCTION Spatial models have become an important branch of scientific endeavour. In the environmental sciences they include weather forecasting models, climate models, air dispersion models, chemical reaction models, rainfall-runoff models, groundwater models, soil erosion models, biological ecosystems models, energy system models and noise propagation models. In the social sciences they include regional economic development models, land and housing market models, plant and facility location models, spatial diffusion models, migration models, travel and goods transport models and urban land-use models. The representation of space in the first generations of spatial computer models was primitive. It essentially followed the organisation of statistical tables where each line is associated with one spatial unit such as a statistical district, region or “zone” and the columns represent attributes of the areal unit. Networks were coded as lattices, but because nodes were not associated with coordinates, the geometry of networks was only vaguely represented by the lengths (travel times) of their arcs. All this has changed with the advent of geographic information systems. GIS have vastly increased the range of possibilities of organising spatial data. Together with improvements in data availability and increases in computer memory and speed they promise to give rise to new types of spatial models, make better use of existing data, stimulate the collection of new data or even depart for new horizons that could not have been approached before. It was the purpose of the GISDATA Specialist Meeting “Spatial Models and GIS” to find out whether GIS have had or will have an impact on spatial models in the environmental and social sciences (Heywood et al., 1995). 
The underlying hypothesis was that the new possibilities of storing and handling spatial information provided by GIS might contribute to making existing spatial models easier to use and give rise to new spatial models that were not feasible or not even imagined before. The meeting at Friiberghs Herrgård near Stockholm in June 1995 brought together researchers involved in environmental and socioeconomic modelling to assess the potential and limitations of the convergence of spatial modelling and GIS, to formulate a research agenda for making the best use of this potential and to explore avenues towards more integrated spatial models incorporating both environmental and socio-economic elements in response to the urgent social and environmental problems facing cities and regions today.
10.2 SPATIAL MODELS
A model is a simplified representation of an object of investigation for purposes of description, explanation, forecasting or planning. A spatial model is a model of an object of investigation in bispace (space, attribute). A space-time model is a model of an object of investigation in trispace (space, time, attribute).
There are three categories of spatial models with respect to their degree of formalisation: scale, conceptual and mathematical models (Steyaert, 1993). Scale models are representations of real-world physical features such as digital terrain models (DTM) or network models of hydrological systems. Conceptual models use quasi-natural language or flow charts to outline the components of the system under investigation and highlight the linkages between them. Mathematical models operationalise conceptual models by representing their components and interactions with mathematical constructs. Mathematical models may use scale models for organising their data. In the following discussion the emphasis is on mathematical models.
Another important classification of spatial models is how they deal with the indeterminism of real-world phenomena (Berry, 1995). Deterministic models generate repeatable solutions based on the direct evaluation of defined relationships, i.e. they do not allow for the presence of random variables. Probabilistic models are based on probability distributions of statistically independent events and generate a range of possible solutions. Stochastic models are probabilistic models with conditional probability distributions taking into account temporal and spatial persistence.
A third basic classification refers to statics/dynamics. In a static model all stocks have the same time label, i.e. only one point in time is considered. Static models are usually associated with the notion of a steady state or equilibrium.
In a dynamic model stocks have two (comparative statics) or more time labels, hence change processes are modelled. Dynamic models may treat time as continuous or discrete. Models with discrete time intervals are called simulation models; with fixed time intervals (periods) they are called recursive, with variable time intervals event-driven. Spatial models can also be classified according to their resolution in space, time and attributes, ranging from the microscopic to the macroscopic. The space dimension can be represented by objects with zero dimension (points), one dimension (lines), two dimensions (areas) or three dimensions (volumes). The size of objects may range from a few metres to thousands of kilometres. In similar terms the time dimension can be represented with zero dimension (event) or one dimension (process); the resolution may range between a few seconds and hundreds of years. It is misleading to talk about time as the “fourth” dimension as there are dynamic spatial models without three-dimensionality. The attribute dimension may be single- or multiattribute. The resolution may range from individual objects (molecules, neurons, travellers) described by a list of attributes to large collectives (gases, species, national economies) described by averages of attributes, with all stages in between. Simulation models of individual objects are called microsimulation models; microsimulation models do not need to simulate all objects of the system of investigation but may work with a sufficiently large sample. There are many more ways of classifying spatial models that can only be indicated here. Beyond the above criteria, spatial models can be classified by: • comprehensiveness: some models deal only with one spatial subsystem, whereas others deal with interactions between different spatial subsystems. 
• model structure: one group of models applies one single unifying principle for modelling and linking all subsystems; other models consist of loosely coupled submodels, each of which has its own independent internal structure.
• theoretical foundations: environmental models rely on physical laws, whereas socioeconomic models apply conceptualisations of human behaviour such as random utility or economic equilibrium.
• modelling techniques: models may differ by modelling technique such as input-output models, spatial interaction models, neural network models, Markov models or microsimulation models.
In the following section spatial models in the environmental and social sciences will be classified by application field.
10.3 APPLICATION FIELDS
The range of applications of spatial models in the environmental and social sciences is large and rapidly expanding. The most important application fields at present are described in the following sections.
10.3.1 Environmental Sciences
Goodchild et al. (1993b) and Fedra (1993) use the following classification of spatial models in the environmental sciences:
• Atmospheric modelling includes general circulation models used for short- or medium-term weather forecasts or global, regional or micro climate forecasts and atmospheric diffusion models used for modelling the dispersion of pollutants originating from point, line or areal sources and carried by atmospheric processes including chemical reactions (Lee et al., 1993).
• Hydrological modelling includes surface water models, such as rainfall-runoff models, streamflow simulation models and flood hydraulics, and groundwater models, such as groundwater flow models, groundwater contamination transport models and variably saturated flow and transport models (Maidment, 1993).
• Land-surface-subsurface process modelling includes plant growth, erosion or salinization models, geological models and models of subsurface contamination at hazardous disposal sites or nuclear waste depositories and is typically combined with surface water and groundwater models (Moore et al., 1993).
• Biological/ecological systems modelling comprises terrestrial models such as single-species and multispecies vegetation and/or wildlife models, e.g. forest growth models, freshwater models such as fish yield models and nutrient/plankton models for lakes and streams, and marine models such as models of migrations of fish and other sea animals and models of the effect of fishing on fish stocks (Haines-Young et al., 1993; Hunsaker et al., 1993; Johnston, 1993).
• Integrated modelling includes combinations of one or more of the above groups of models, such as atmospheric and ecosystem models (Schimel and Burke, 1993) or climate, vegetation and hydrology models (Nemani et al., 1993).
From the point of view of environmental planning, energy system models and noise propagation models at the urban scale might also be included under the heading of environmental modelling.
10.3.2 Social Sciences
Spatial models in the social sciences originated from several disciplines such as economics, geography, sociology and transport engineering and have only since the 1960s been integrated by “synthetic” disciplines such as regional science or multidisciplinary research institutes and planning schools. However, as a point of departure, the classification by discipline is still useful:
• Economic modelling with a spatial dimension includes international or multiregional trade models and regional economic development models based on production functions, various definitions of economic potential or multiregional input-output analysis, and on the metropolitan scale models of urban land and housing markets based on the concept of bid rent. Normative economic models based on location theory (minimising location and transport cost) are used to determine optimum locations for manufacturing plants, wholesale and retail outlets or public facilities.
• Geographic modelling includes models of spatial diffusion of innovations similar to epidemiological models, migration models based on notions of distance and dissimilarity between origin and destination regions frequently coupled with probabilistic models of population dynamics, spatial interaction and location models based on entropy or random utility concepts and models of activity-based mobility behaviour of individuals subject to constraints (“space-time geography”).
• Sociological modelling has contributed spatial models of invasion of urban territories by population groups based on analogies from plant and animal ecology (“social ecology”) and models of urban ‘action spaces’ related to concepts of space-time geography.
• Transport engineering modelling includes travel and goods transport models based on entropy or random utility theory with submodels for flow generation, destination choice, modal choice, network search and flow assignment with capacity restraint resulting in user-optimum network equilibrium, and normative models for route planning, transport fleet management and navigation in large transport networks. In more recent developments, concepts of activity-based mobility have been taken up by transport modellers to take account of non-vehicle trips, trip chains, multimodal trips, car sharing and new forms of demand-responsive collective transport.
• Integrated modelling includes approaches in which two or more of the above specialised models are combined, such as integrated models of spatial development at the regional or metropolitan scale. Typically such models consist of models of activity location, land use and travel; more recently environmental aspects such as energy consumption, CO2 emissions, air pollution, open space and traffic noise are also addressed.
Spatial modelling in the social sciences seems to be more fragmented than in the environmental sciences. At the same time the need for integrative solutions is becoming more urgent because of the interconnectedness of economic, social and environmental problems.
10.4 SPATIAL MODELS AND GIS
Geographic information systems are both specialised database management systems for handling spatial information and toolboxes of methods to manipulate spatial information. Because of the limited analytical or modelling capabilities of present GIS, the toolbox side of current GIS seems to be of little interest. Instead it seems to be much more relevant to examine whether the organisation of spatial information in GIS
is appropriate for spatial models and, more importantly, whether it might facilitate new ways of applying existing models or stimulate the development of new ones.
10.4.1 Data Organisation of Spatial Models
Pre-GIS spatial models have used predominantly the following five types of data organisation:
Stock matrix. In aggregate spatial models space is subdivided into spatial units usually called zones. In general the area of each zone, but not its shape or its neighbourhood relations, is known to the model. All attributes of a zone are stored as a (sometimes very long) vector. So the study region is represented as a two-dimensional matrix where the rows are the zones and the columns are the attributes. To keep the number of columns of the matrix manageable, the attributes are classified; sometimes, however, an element of the attribute vector is a pointer to more complex information such as a univariate or bivariate distribution, for instance of households or dwellings. It is implicitly assumed that all attributes of a zone are uniformly spatially distributed throughout the zone, so the size of the zone determines the spatial resolution of the model. If the model is dynamic, there is one matrix for each point in time. The problem with this data organisation is that the zones (usually administrative subdivisions) are rarely homogeneous with respect to the classified attributes and that the attributes are rarely uniformly distributed across the area of the zone.
Interaction matrix. The spatial dimension of the model is introduced only indirectly by one or more interaction matrices and/or through the explicit representation of interzonal networks (see below). The interaction matrices are usually square matrices where both rows and columns represent zones. They contain either indicators of the spatial impedance between the zones such as distances, travel times or travel costs, or spatial interactions such as personal trips or flows of commodities or information.
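The stock matrix and the interaction matrices can be pictured with a small Python sketch. The zone names, attribute names and all numbers below are invented for illustration; the point is only that space enters the model through the zone area and the zone-to-zone matrices, not through any geometry.

```python
# Hypothetical three-zone study region (all values are made up).
ZONES = ["A", "B", "C"]
ATTRIBUTES = ["area_km2", "population", "jobs"]

# Stock matrix: one row per zone, one column per classified attribute.
stock = [
    [10.0, 5000.0, 1000.0],
    [4.0, 8000.0, 6000.0],
    [25.0, 2000.0, 500.0],
]

# Interaction matrices: square, rows are origin zones, columns destinations.
travel_time = [  # spatial impedance in minutes
    [5.0, 15.0, 30.0],
    [15.0, 4.0, 20.0],
    [30.0, 20.0, 8.0],
]
trips = [  # spatial interaction: person trips per day
    [100.0, 300.0, 20.0],
    [150.0, 900.0, 50.0],
    [30.0, 120.0, 40.0],
]


def attribute(zone, name):
    """Look up one cell of the stock matrix."""
    return stock[ZONES.index(zone)][ATTRIBUTES.index(name)]


def density(zone, name):
    """Density under the implicit assumption of a uniform spread over the zone."""
    return attribute(zone, name) / attribute(zone, "area_km2")


def mean_trip_time(origin):
    """Trip-weighted mean travel time of all trips leaving one zone."""
    i = ZONES.index(origin)
    return sum(t * c for t, c in zip(trips[i], travel_time[i])) / sum(trips[i])
```

Nothing in these structures records where within a zone people live or which zones are neighbours, which is exactly the limitation of this data organisation.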
In spatial input-output analysis, the matrix of flows is actually four-dimensional, comprising both inter-zonal and inter-industry flows, but for practical reasons this is rarely implemented. If the model is dynamic, there is the same set of interaction matrices for each time period.
Network. Pre-GIS network coding is a vector representation of the links of the network. The network topology is introduced by coding a from-node and to-node for each link. However, the coordinates of the nodes are not normally coded because the impedance of the links is determined only by their attributes such as length, mean travel time, capacity, etc. The association between networks and zones is established by pseudo links connecting arbitrarily localised points in the zones (“centroids”) to one or more network nodes. There is no other spatial relationship between network and zones, so spatial impacts of flows within the zones such as traffic noise cannot be modelled.
List. Some disaggregate (microsimulation) models seek to avoid the disadvantages of the aggregate matrix representation of classified stocks by using a list representation of individual persons or objects. The list may contain all or a sample of the stock of a kind in the zone, for instance all individuals, households and dwellings or a representative sample of individuals, households and dwellings. Each list item is associated with a vector of attributes referring to it, so no averages are needed. Attributes can contain spatial information (such as address) or temporal information (such as year of retirement). One problem of list organisation is that matching operations (e.g. marriage) are not straightforward. However, by using more sophisticated forms of lists such as inverted lists, search in lists can be made more efficient.
Raster. Even before raster-based GIS, a raster organisation of spatial data has been popular in environmental and to a lesser degree in social-science modelling.
Raster organisation has the advantage that the topology is implicit in the data model and that some operations such as buffering and density
calculations are greatly simplified, but in every other respect raster-based models share the problems of zone-based models, unless the raster cells are very small.
10.4.2 Data Organisation of GIS for Spatial Models
This section examines how the data structures offered by GIS correspond to the data structures used in spatial models.
Spatially aggregate zone-based models can be well represented by the polygon data object of vector-based GIS. Zones are represented by polygons and zonal attributes are stored in polygon attribute tables. However, this one-to-one correspondence underutilises the potential of vector-based GIS to superimpose different layers or coverages and to perform set operations on them. There is no advantage in using a GIS for this kind of model. In addition there are no facilities to store multiple sets of attributes with different time labels for one topology without duplicating the whole coverage.
There is no data structure in current GIS which corresponds to interaction matrices storing attributes relating to interactions between all pairs of polygons.
Networks can be conveniently represented as line coverages with the link attributes stored in arc attribute tables and the node attributes stored in associated point attribute tables. However, it is hardly possible to represent the temporal evolution of networks, i.e. the addition, modification or deletion of links at certain points in time. In pre-GIS times this was done by entering add, change or delete operations with a time label. This, however, is precluded in GIS in which there can be only one arc with the same from-node and to-node. In addition it is relatively difficult to associate two link-coded networks with each other (such as a road network with a public-transport network) or a link-coded network with additional route information where a route is a sequence of links (such as a public transport line).
Also the network operations built into some GIS or GIS add-ons, e.g. operations for minimum-path search or location/allocation, are no incentive for using a GIS for network representation as they tend to be too simplistic, inflexible and slow to compete with state-of-the-art network modelling algorithms. Nevertheless the ease of digitising, data entry and error checking makes it attractive to use GIS for network coding, even if all further processing takes place outside the GIS.
There is a strong affinity between micro data coded in list form and the way point data are stored in point attribute tables in GIS. Therefore it is here where the potential of GIS for supporting spatial models seems to be most obvious. At the same time point attribute tables and point operations are the least complex in GIS and therefore might most easily be reproduced outside a GIS. In addition there remains the difficulty of specifying multiple events with different time labels for one point at one location or of specifying the movement of one point from one location to another due to the requirement of most GIS that one point identifier can only be associated with one location or coordinate pair.
Finally there are raster-based GIS. Their data organisation corresponds directly to that of raster-based spatial models and so shares their advantages and weaknesses plus the added difficulty of introducing time into the model. However, if a very small raster size is chosen, raster-based GIS and raster-based spatial models take on a new quality. If the raster cell is reduced to the size of a pixel, raster-based spatial models allow the generation and manipulation of quasi-continuous surfaces. Moreover, in conjunction with appropriate spatial interpolation techniques, it is possible to co-process polygon-based, network-based and list-based spatial models in one common spatial framework.
For instance, in a travel simulation one might use a list to sample trip origins from a population density surface created by spatial interpolation from zonal data, access the nearest network node pixel, perform destination,
mode and route choice on the link-coded network and return to pixel representation at the destination. The results of such a simulation may be used as link-by-link information to drive a capacity-restraint or network-equilibrium model, as pixel-by-pixel input to environmental impact submodels such as air dispersion or noise propagation models, or to drive output routines generating 2D or 3D surface representations. It would be worthwhile to explore the potential of raster-based add-ons to vector-based GIS to support such applications.
10.4.3 Coupling Spatial Models and GIS
Many of the more sophisticated algorithms to process spatial data in spatial models are currently not available in commercial GIS. This brings up the question of how spatial models should be integrated with the GIS. Four levels of integration of spatial analysis routines with GIS with increasing intensity of coupling can be distinguished (Nyerges, 1993):
• Isolated applications. The GIS and the spatial analysis programme are run in different hardware environments. Data transfer between the possibly different data models is performed by ASCII files offline; the user is the interface. The additional programming expenditure is low, but the efficiency of the coupling is limited.
• Loose coupling. Here the coupling is carried out by means of ASCII or binary files; the user is responsible for formatting the files according to format specifications of the GIS. This kind of coupling is carried out on-line on the same computer or on different computers in a local network; with relatively little extra programming the efficiency is greater than with isolated applications.
• Tight coupling. In this case the data models may still be different, but automated exchange of data between the GIS and the spatial analysis is possible through a standardised interface without user intervention. This increases the effectiveness of data exchange but requires more programming effort (e.g. macro language programming). The user remains responsible for the integrity of the data.
• Full integration. This linkage operates like a homogeneous system from the user’s point of view; data exchange is based on a common data model and database management system. Interaction between GIS and spatial analysis is very efficient. The initial development effort is large, but may be justified by the ease with which later model functions can be added.
Maguire (1995) lists examples of various levels of integration available in current GIS software. An external model offers the advantage of independent and flexible development and testing of the model, but is only suitable for loose coupling. Embedding the spatial model into the GIS has the advantage that all functions and data resources of the GIS can be used. However, present GIS fail to provide interfaces for standard computer languages such as C, Pascal or Fortran necessary for internal modelling. In the long run graphical user interfaces from which the user can call up both GIS tools and modelling functions will become available.
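As a rough illustration of the loose-coupling level described above, the following Python sketch passes zonal data from a "GIS side" to a "model side" and back through plain-text exchange files. The function names (export_zones, run_model, import_results), the CSV layout and the density calculation are assumptions made for this example only and do not correspond to any particular GIS product; in-memory text streams stand in for the files that would normally be written to disk.

```python
import csv
import io


def export_zones(zones, stream):
    """GIS side (hypothetical): dump zone attributes to a text exchange file."""
    writer = csv.DictWriter(stream, fieldnames=["zone_id", "area_km2", "population"])
    writer.writeheader()
    writer.writerows(zones)


def run_model(in_stream, out_stream):
    """Model side (hypothetical): read the exchange file, write a result file.

    The "model" here is only a population density calculation; any external
    spatial model could take its place as long as it honours the file format.
    """
    reader = csv.DictReader(in_stream)
    writer = csv.DictWriter(out_stream, fieldnames=["zone_id", "density"])
    writer.writeheader()
    for row in reader:
        density = float(row["population"]) / float(row["area_km2"])
        writer.writerow({"zone_id": row["zone_id"], "density": f"{density:.1f}"})


def import_results(stream):
    """GIS side (hypothetical): read model results back, keyed by zone."""
    return {row["zone_id"]: float(row["density"]) for row in csv.DictReader(stream)}


# Example run with illustrative data; StringIO objects stand in for files.
zones = [
    {"zone_id": "1", "area_km2": 2.0, "population": 1000},
    {"zone_id": "2", "area_km2": 4.0, "population": 1000},
]
exchange_in, exchange_out = io.StringIO(), io.StringIO()
export_zones(zones, exchange_in)
exchange_in.seek(0)
run_model(exchange_in, exchange_out)
exchange_out.seek(0)
densities = import_results(exchange_out)
print(densities)
```

The only contract between the two sides in this sketch is the file format, which is why the model can be developed and tested independently, and also why the user remains responsible for keeping the formats consistent.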
10.5 MICROSIMULATION AND GIS
10.5.1 More Opportunities
One development that is likely to have a profound impact on spatial modelling is the capability of GIS to organise and process spatially disaggregate data efficiently. Pre-GIS spatial models received their spatial dimension through a zonal system. This implied the assumption that all attributes of a zone are uniformly spatially distributed throughout the zone. Spatial interaction between zones was established via networks that are linked to centroids of the zones. Zone-based spatial models do not take account of topological relationships and ignore the fact that socio-economic activities and their impacts, e.g. environmental impacts, are continuous in space.
The limitations of zonal systems have led to serious methodological difficulties such as the ‘modifiable areal unit problem’ (Openshaw, 1984; Fotheringham and Wong, 1991) and problems of spatial interpolation between incompatible zone systems (Flowerdew and Openshaw, 1987; Goodchild et al., 1993a; Fisher and Langford, 1995). For instance, most existing land use models lack the spatial resolution necessary to represent other environmental phenomena such as energy consumption or CO2 emissions. In particular emission-immission algorithms such as air dispersion, noise propagation and surface and ground water flows, but also micro climate analysis, require a much higher spatial resolution than large zones in which the internal distribution of activities and land uses is not known.
Air distribution models typically work with raster data of emission sources and topographic features such as elevation and surface characteristics such as green space, built-up area, high-rise buildings and the like. Noise propagation models require spatially disaggregate data on emission sources, topography and sound barriers such as dams, walls or buildings as well as the three-dimensional location of population.
Surface and ground water flow models require spatially disaggregate data on river systems and geological information on ground water conditions. Micro climate analysis depends on small-scale mapping of green spaces and built-up areas and their features. In all four cases the information needed is configurational. This implies that not only the attributes of the components of the modelled system such as quantity or cost are of interest but also their physical micro location. This suggests a fundamentally new organisation of urban models based on a microscopic view of urban change processes (Wegener, 1998). This is where GIS come into play. A combination of raster and vector representations of spatial elements, as it is possible in GIS, might lead to spatially disaggregate models that are able to overcome the disadvantages of zonal models. Using spatial interpolation techniques, zonal data can be disaggregated from polygons to pixels to allow the calculation of micro-scale indicators such as population or employment density or air pollution (Bracken and Martin, 1988; Martin and Bracken, 1990; Bracken and Martin, 1994). The vector representation of transport networks allows the application of efficient network algorithms from aggregate transport models such as minimum path search, mode and route choice and equilibrium assignment. The combination of raster and vector representations facilitates activity-based microsimulation of both location and mobility in an integrated and consistent way.
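The minimum path search on a vector network mentioned above can be sketched on a link-coded network in which each link is given by a from-node, a to-node and an impedance. The sketch uses Dijkstra's algorithm, which is one common choice and not one prescribed by the text; the function names, the tuple layout of the links and the example network are assumptions made for illustration. A small helper also shows how a raster cell or micro location might be attached to the nearest network node.

```python
import heapq
import math


def build_adjacency(links):
    """Turn (from_node, to_node, impedance) link records into adjacency lists."""
    adjacency = {}
    for from_node, to_node, impedance in links:
        adjacency.setdefault(from_node, []).append((to_node, impedance))
        adjacency.setdefault(to_node, [])
    return adjacency


def minimum_path(links, origin, destination):
    """Dijkstra search; returns (total impedance, node sequence) or (None, [])."""
    adjacency = build_adjacency(links)
    best = {origin: 0.0}          # lowest impedance found so far per node
    previous = {}                 # predecessor of each node on its best path
    queue = [(0.0, origin)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == destination:
            break
        if cost > best.get(node, math.inf):
            continue              # outdated queue entry
        for neighbour, impedance in adjacency.get(node, []):
            new_cost = cost + impedance
            if new_cost < best.get(neighbour, math.inf):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    if destination not in best:
        return None, []
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    return best[destination], path[::-1]


def nearest_node(point, node_coords):
    """Attach a micro location (x, y) to the closest network node."""
    return min(node_coords, key=lambda node: math.dist(point, node_coords[node]))


# Illustrative one-way links with travel times in minutes (made-up values).
links = [("A", "B", 4.0), ("A", "C", 1.0), ("C", "B", 2.0), ("B", "D", 5.0)]
print(minimum_path(links, "A", "D"))
```

Note that, as in the pre-GIS coding described in Section 10.4.1, the search itself needs no node coordinates; they are used only to link micro locations to the network.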
Figure 10.1: Linking microsimulation and GIS
10.5.2 Linking Microsimulation and GIS
Microsimulation models require disaggregate spatial data. Geographic information systems (GIS) offer data structures which efficiently link coordinate and attribute data. There is an implicit affinity between microanalytic methods of spatial research and the spatial representation of point data in GIS. Even where no micro data are available, GIS can be used to generate a probabilistic disaggregate spatial database. There are four fields in which GIS can support micro techniques of analysis and modelling (see Figure 10.1):
• Storage of spatial data. There is a strong similarity between the storage of individual data required for microsimulation and the structure of point coverages of GIS. In an integrated system of microsimulation modules a GIS data base may therefore be efficient for analysis and modelling.
• Generation of new data. GIS may be used to create new data for microsimulation that were not available before. These data can be derived using analytical tools of GIS such as overlay or buffering.
• Disaggregation of data. Most data available for urban planning are aggregate zonal data. Microsimulation requires individual, spatially disaggregate data. If micro data are not available, GIS with appropriate microsimulation algorithms can generate a probabilistic disaggregate spatial database. A method for generating synthetic micro data is presented in the next section.
• Visualisation. Microsimulation and GIS can be combined to display graphically input data and intermediate and final results as well as to visualise the spatial evolution of the urban system over time through animation.
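The first two fields in the list above can be illustrated together. In the Python sketch below a list of household records plays the role of a point attribute table, and a new attribute is derived by a simple circular buffer around a set of points. The record layout, the attribute name near_stop, the coordinates and the buffer radius are all assumptions chosen for this example; a GIS would normally offer buffering as a built-in operation.

```python
import math


def within_buffer(point, centres, radius):
    """True if the point lies within `radius` of at least one centre."""
    return any(math.dist(point, centre) <= radius for centre in centres)


def add_buffer_attribute(records, centres, radius, name):
    """Return copies of the point records with a derived boolean attribute."""
    return [
        dict(record, **{name: within_buffer((record["x"], record["y"]), centres, radius)})
        for record in records
    ]


# Micro data in list form: one record per household (illustrative values).
households = [
    {"id": 1, "x": 0.0, "y": 0.0, "persons": 3},
    {"id": 2, "x": 5.0, "y": 5.0, "persons": 1},
]
stops = [(0.0, 0.3)]  # hypothetical public transport stops, same units as x and y

for record in add_buffer_attribute(households, stops, 0.5, "near_stop"):
    print(record)
```

The derived attribute is stored with each individual record, so a later microsimulation step could use it directly, for instance in a mode choice submodel.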
10.5.3 Spatial Disaggregation of Zonal Data Spatial microsimulation models require the exact spatial location of the modelled activities, i.e. point addresses as input. However, most available data are spatially aggregate. To overcome this, raster cells or pixels are used as addresses for microsimulation. To disaggregate aggregate data spatially within a spatial unit such as an urban district or a census tract, the land use distribution within that zone is taken into consideration, i.e. it is assumed that there are areas of different density within the zone. The spatial disaggregation of zonal data therefore consists of two steps, the generation of a raster representation of land use and the allocation of the data to raster cells. Vector-based GIS record land use data as attributes of polygons. If the GIS software has no option for converting a polygon coverage into a raster representation, the following steps are performed (see Spiekermann and Wegener, 1995; Wegener and Spiekermann, 1996). First, the land use coverage and the coverage containing the zone borders are intersected to get land use polygons for each zone. Then the polygons are converted to raster representation by using a point-in-polygon algorithm for the centroid of each raster cell. As a result each cell has two attributes, the land use category and the zone number of its centroid. These cells represent the addresses for the disaggregation of zonal data and the subsequent microsimulation. The cell size to be selected depends on the required spatial resolution of the microsimulation and is limited by the memory and speed of the computer. The next step merges the land use data and zonal activity data such as population or employment. First for each activity to be disaggregated density-specific weights are assigned to each land use category. Then all cells are attributed with the weights of their land use category. 
Dividing the weight of a cell by the total of the weights of all cells of the zone gives the probability for that cell to be the address of one element of the zonal activity. Cumulating the weights over the cells of a zone yields a range of numbers associated with each cell. Using a random number generator, for each element of zonal activity, e.g. each household or work place, one cell is selected as its address. The result is a raster representation of the distribution of the activity within the zone that can be used as the spatial basis for a microsimulation. The combination of the raster representation of activities and the vector representation of the transport network provides a powerful data organisation for the joint microsimulation of land use, transport and environment. The raster representation of activities allows the calculation of micro-scale equity and sustainability indicators such as accessibility, air pollution, water quality, noise, micro climate and natural habitats, both for exogenous evaluation and for endogenous feedback into the residential construction and housing market submodels. The vector representation of the network allows the application of efficient network algorithms from aggregate transport models such as minimum path search, mode and route choice and equilibrium assignment. The link between the micro locations of activities in space and the transport network is established by automatic search routines finding the nearest access point to the network or nearest public transport stop. The combination of raster and vector representation in one model allows the application of the activity-based modelling philosophy to modelling both location and mobility in an integrated and consistent way. This vastly expands the range of policies that can be examined. 
For instance, it is possible to study the impacts of public-transport oriented land-use policies promoting low-rise, high-density mixed-use areas with short distances and a large proportion of cycling and walking trips as well as new forms of collective travel such as bike-and-ride, kiss-and-ride, park-and-ride or various forms of vehicle-sharing.
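The two steps of the spatial disaggregation described in this section, rasterising land use polygons by a point-in-polygon test on cell centroids and then allocating zonal activities to cells by weighted random selection, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the polygons are assumed to be already intersected with the zone borders, the ray-casting test is one standard point-in-polygon method, and all names, weights and coordinates are hypothetical.

```python
import bisect
import itertools
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterise(polygons, n_cols, n_rows, cell_size):
    """Give each cell the zone and land use of the polygon containing its centroid.

    polygons: list of (zone, land_use, vertices), assumed pre-intersected
    with the zone borders.
    """
    cells = []
    for row in range(n_rows):
        for col in range(n_cols):
            cx, cy = (col + 0.5) * cell_size, (row + 0.5) * cell_size
            for zone, land_use, vertices in polygons:
                if point_in_polygon(cx, cy, vertices):
                    cells.append({"col": col, "row": row, "zone": zone, "land_use": land_use})
                    break
    return cells


def allocate(cells, zone, count, weights, rng):
    """Assign `count` elements of a zonal activity to cells of the zone.

    Each cell is drawn with probability weight / sum of weights, using the
    cumulated weights and a random number, as described in the text.
    """
    zone_cells = [c for c in cells if c["zone"] == zone]
    cumulative = list(itertools.accumulate(weights.get(c["land_use"], 0.0) for c in zone_cells))
    if not cumulative or cumulative[-1] <= 0.0:
        raise ValueError("zone has no cells with positive weight")
    counts = {}
    for _ in range(count):
        draw = rng.random() * cumulative[-1]
        cell = zone_cells[bisect.bisect_right(cumulative, draw)]
        key = (cell["col"], cell["row"])
        counts[key] = counts.get(key, 0) + 1
    return counts


# Illustrative zone "Z1": residential land in the west, green space in the east.
polygons = [
    ("Z1", "residential", [(0, 0), (2, 0), (2, 2), (0, 2)]),
    ("Z1", "green", [(2, 0), (4, 0), (4, 2), (2, 2)]),
]
cells = rasterise(polygons, n_cols=4, n_rows=2, cell_size=1.0)
household_weights = {"residential": 3.0, "green": 0.0}  # assumed density weights
print(allocate(cells, "Z1", 100, household_weights, random.Random(1)))
```

With a weight of zero for green space, all households in this example fall into the residential cells; in practice the weights would be chosen separately for each activity and land use category.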
10.6 CONCLUSIONS
The main conclusion of this chapter is that the potential of GIS to offer new data organisations for spatial models represents the most promising challenge of GIS for spatial modelling. It should be the primary goal to identify approaches which explore the potential of GIS to facilitate new ways of applying existing models or to stimulate the development of new models. Under this perspective, issues of data transformation from one data organisation to another (e.g. from polygon to raster), the generation of synthetic micro data from aggregate data, and integrated approaches combining different data models such as raster and vector deserve particular attention.
In comparison, the question of what constitutes the right set of tools for spatial modelling within GIS seems to be of secondary priority. The new potential of GIS for spatial modelling needs to be explored much more thoroughly before a “canon” of typical spatial modelling operations can be defined. In addition there is always the chance that such a canon will become outdated as new problems come up and require new modelling approaches; modelling is not a routine activity that can be standardised.
Similarly, the question of how GIS and spatial models should be connected seems to be premature as long as there is no canon of typical operations for spatial modelling. It is likely that for some time to come “loose coupling” will be the appropriate mode for research-oriented spatial modelling environments. From a more fundamental perspective the question may be criticised as being captive to a too restrictive concept of a GIS as a particular software package. After all, is not every spatial model a GIS in as much as it processes spatial information? From this point of view it does not make a difference whether the spatial model is embedded in the GIS or the GIS into the spatial model.
There may be a time when GIS are no longer monolithic fortresses with tightly controlled import-export drawbridges but modular, open, interactive systems of functional routines and well documented file formats.
Another convergence, which is also related to GIS, seems to be much more important. The growing complexity of environmental problems requires the use of integrated spatial information systems and models cutting across application fields and across the gap between the environmental and social sciences. Separate modelling efforts, with or without a GIS, are no longer sufficient. Joint efforts of computer scientists, landscape ecologists, hydrologists, planners and transport engineers should aim at the development of intelligent, highly integrated spatial information and modelling systems. These systems could play an important role not only in answering the questions of experts but also in educating and informing politicians, administrators and the general public.
ACKNOWLEDGEMENT
The author is indebted to the other members of the GISDATA task force “Spatial Models and GIS”, Ian Heywood, Ulrich Streit and Josef Strobl, for material from the position paper for the Specialist Meeting at Friiberghs Herrgård and to Klaus Spiekermann for reference to joint work on microsimulation and GIS.
REFERENCES
BERRY, J.K. 1995. What’s in a model? GIS World, 8(1), pp. 26–28.
BRACKEN, I. and MARTIN, D. 1988. The generation of spatial population distributions from census centroid data, Environment and Planning A, 21, pp. 537–543.
BRACKEN, I. and MARTIN, D. 1994. Linkage of the 1981 and 1991 UK Censuses using surface modelling concepts, Environment and Planning A, 27, pp. 379–390.
FEDRA, K. 1993. GIS and environmental modelling, in Goodchild, M.F., Parks, B.O. and Steyaert, L.T. (Eds.), Environmental Modelling with GIS. New York: Oxford University Press, pp. 35–50.
FISHER, P.P. and LANGFORD, M. 1995. Modelling the errors in areal interpolation between zonal systems by Monte Carlo simulation, Environment and Planning A, 27, pp. 211–224.
FLOWERDEW, R. and OPENSHAW, S. 1987. A Review of the Problem of Transferring Data from One Set of Areal Units to Another Incompatible Set, NE.RRL Research Report 87/0. Newcastle: Centre for Urban and Regional Development Studies, University of Newcastle.
FOTHERINGHAM, A.S. and WONG, D.W.S. 1991. The modifiable areal unit problem in multivariate statistical analysis, Environment and Planning A, 23, pp. 1025–1044.
GOODCHILD, M.F., ANSELIN, L. and DEICHMANN, U. 1993a. A framework for the areal interpolation of socioeconomic data, Environment and Planning A, 25, pp. 383–397.
GOODCHILD, M.F., PARKS, B.O. and STEYAERT, L.T. (Eds.) 1993b. Environmental Modelling with GIS. New York: Oxford University Press.
HAINES-YOUNG, R., GREEN, D.R. and COUSINS, S.H. 1993. Landscape Ecology and GIS. London: Taylor & Francis.
HEYWOOD, I., STREIT, U., STROBL, J. and WEGENER, M. 1995. Spatial models and GIS: new potential for new models? Presentation at the GISDATA Specialist Meeting on “GIS and Spatial Models: New Potential for New Models?”, Friiberghs Herrgård, 14–18 June 1995.
HUNSAKER, C.T., NISBET, R.A., LAM, D.C., BROWDER, J.A., BAKER, W.L., TURNER, M.G. and BOTKIN, D.B. 1993. Spatial models of ecological systems and processes: the role of GIS, in Goodchild, M.F., Parks, B.O. and Steyaert, L.T. (Eds.), Environmental Modelling with GIS. New York: Oxford University Press, pp. 248–264.
JOHNSTON, C.A. 1993.
Introduction to quantitative methods and modeling in community, population and landscape ecology, in Goodchild, M.F., Parks, B.O. and Steyaert, L.T. (Eds.), Environmental Modelling with GIS. New York: Oxford University Press, pp. 281–283.
LEE, T.J., PIELKE, R., KITTEL, T. and WEAVER, J. 1993. Atmospheric modeling and its spatial representation of land surface characteristics, in Goodchild, M.F., Parks, B.O. and Steyaert, L.T. (Eds.), Environmental Modelling with GIS. New York: Oxford University Press, pp. 108–122.
MAGUIRE, D. 1995. Implementing spatial analysis and GIS applications for business and service planning, in Longley, P. and Clarke, C. (Eds.), GIS for Business and Service Planning. Cambridge: GeoInformation International, pp. 171–191.
MAIDMENT, D.R. 1993. GIS and hydrological modeling, in Goodchild, M.F., Parks, B.O. and Steyaert, L.T. (Eds.), Environmental Modelling with GIS. New York: Oxford University Press, pp. 147–167.
MARTIN, D. and BRACKEN, I. 1990. Techniques for modelling population-related raster databases, Environment and Planning A, 23, pp. 1069–1075.
MOORE, I.S., TURNER, A.K., WILSON, J.P., JENSON, S.K. and BAND, L.E. 1993. GIS and land-surface-subsurface process modeling, in Goodchild, M.F., Parks, B.O. and Steyaert, L.T. (Eds.), Environmental Modelling with GIS. New York: Oxford University Press, pp. 196–230.
NEMANI, R., RUNNING, S.W., BAND, L.E. and PETERSON, D.L. 1993. Regional hydroecological simulation system: an illustration of the integration of ecosystem models in GIS, in Goodchild, M.F., Parks, B.O. and Steyaert, L.T. (Eds.), Environmental Modelling with GIS. New York: Oxford University Press, pp. 296–304.
NYERGES, T.L. 1993. Understanding the scope of GIS: its relationship to environmental modeling, in Goodchild, M.F., Parks, B.O. and Steyaert, L.T. (Eds.), Environmental Modelling with GIS. New York: Oxford University Press, pp. 75–93.
OPENSHAW, S. 1984.
The Modifiable Areal Unit Problem, Concepts and Techniques in Modern Geography 38. Norwich: Geo Books.
SCHIMEL, D.S. and BURKE, I.C. 1993. Spatial interactive atmosphere-ecosystem coupling, in Goodchild, M.F., Parks, B.O. and Steyaert, L.T. (Eds.), Environmental Modelling with GIS. New York: Oxford University Press, pp. 284–289.
STEYAERT, L.T. 1993. A perspective on the state of environmental simulation modeling, in Goodchild, M.F., Parks, B.O. and Steyaert, L.T. (Eds.), Environmental Modelling with GIS. New York: Oxford University Press, pp. 16–30.
SPIEKERMANN, K. and WEGENER, M. 1995. Freedom from the tyranny of zones: toward new GIS-based spatial models, Presentation at the GISDATA Specialist Meeting “GIS and Spatial Models: New Potential for New Models?”, Friiberghs Herrgård, 14–18 June 1995.
WEGENER, M. 1998. Applied models of urban transport, land use and environment, in Batten, D., Kim, T.J., Lundqvist, L. and Mattson, L.G. (Eds.), Network Infrastructure and the Urban Environment: Recent Advances in Land-use/Transportation Modelling. Berlin/Heidelberg/New York: Springer Verlag.
WEGENER, M. and SPIEKERMANN, K. 1996. The potential of microsimulation for urban models, in Clarke, G. (Ed.), Microsimulation for Urban and Regional Policy Analysis, European Research in Regional Science, Vol. 6. London: Pion, pp. 146–163.
Chapter Eleven Progress in Spatial Decision Making Using Geographic Information Systems Timothy Nyerges
11.1 INTRODUCTION Where are we with progress in spatial decision making using geographic information systems (GIS)? If you agree with David Cowen who wrote “I conclude that a GIS is best defined as a decision support system involving the integration of spatially referenced data in a problem solving environment” (Cowen, 1988, p. 1554), then perhaps we have amassed a wealth of experience using GIS in a spatial decision making context. If you agree more with Paul Densham who later wrote “current GIS fall short of providing GIA [geographic information analysis] capabilities [for decision making support]” (Densham, 1991, p. 405), then perhaps you believe that GIS has not evolved enough to truly support decision making. However, if you agree with Robert Lake who warns “ultimately at issue is whether the integrative capacity of GIS technology proves robust enough to encompass not simply more data but fundamentally different categories that extend considerably beyond the ethical, political, and epistemological limitations of positivism” (Lake, 1993, p. 141); then perhaps GIS might be a decision support disbenefit. Whether we consider GIS to be a spatial decision support system (SDSS) and/or a disbenefit, if we assume that the core of a SDSS relies on GIS technology, then we can assert that progress has been made with spatial decision making if we measure progress in terms of technology (tool) development. From that statement we can safely conclude that research on tool development has received much more attention over the years than has study of the tool use. We need to focus more energies on studying the use of the spatial information technology to determine if progress has occurred. Good principles directed at design come from a good understanding of the principles of tool use. It is important to understand that a balance of research on tool development and tool use is probably the best approach to ensure progress. 
Such a position is advocated by Zachary (1988), DeSanctis and Gallupe (1987) and Benbasat and Nault (1990) working in a management information context on decision support. Tool development and use can be studied together effectively using a (recon)structurationist perspective (DeSanctis and Poole, 1994; Orlikowski, 1992) regardless of the empirical research strategies employed. Tools beget new uses and new uses beget new tools. Understanding the impacts is fundamental to measuring progress. Most industry watchers would agree that GIS (and perhaps some SDSS) are used every day throughout the world by individuals, groups, and organisations. Some progress in research about GIS use is being made from an individual perspective (Grassland et al., 1995; Davies and Medyckyj-Scott, 1995), a group perspective (Nyerges, 1995a; Nyerges and Jankowski, 1994), and an organisational perspective (de Man, 1988; Dickinson, 1990; Onsrud et al., 1992). However, the research literature still lacks theoretical and empirical contributions about GIS use in general (Nyerges, 1993), let alone on the narrower topic of “spatial decision
making”. One of the reasons for the lack of research is that almost all of the GIS/SDSS and collaborative spatial decision making research has focused on software development. As in the management and decision sciences where interest in decision support system (DSS) development in the 1970s preceded studies of use in the 1980s, perhaps the same ten year lag is expected for GIS/SDSS development in the 1990s and studies of use after 2000. An additional reason might be that empirical studies about GIS/SDSS use have few guidelines from which to draw. From the outset, our focus will be on “use” of GIS, and its offspring SDSS, for spatial decision making. However, it should be recognised that GIS development is an integral part of the progress. To address the progress with GIS tool development and use, the chapter proceeds as follows. In the next section a framework for spatial decision making described in terms of input, process and outcome, helps us uncover the complexity about tool use. The framework establishes a scope for many variables/issues. In the third section we assess the progress in spatial decision making using GIS/SDSS by examining relationships between issues/variables. The fourth section contains an overview of research strategies that might be useful for exploring the impacts of tool development on tool use. Finally, the fifth section presents generalisations about progress and directions for research. 11.2 THE SCOPE OF SPATIAL DECISION MAKING USING GIS A scope of relevant issues about “spatial decision making using GIS” is established by drawing from a theoretical framework called enhanced adaptive structuration theory (EAST). Original AST characterises the influence of advanced information technology use on organisational change (DeSanctis and Poole, 1994). Nyerges and Jankowski (1997) created EAST by expanding on the number and nature of issues treated in AST, while doing this in the context of decision aiding techniques for GIS/SDSS (Figure 11.1). 
Of course, an iterative process in decision making is very much possible, so input, process and outcome interaction in an actual decision context can become complicated rather quickly. To sort through this complexity, specific issues/variables are described in the following subsections. 11.2.1 Scoping Spatial Decision Making Inputs Three major constructs acting as inputs in a decision making process are the information technology, the decision context broadly defined, and the decision actors (Constructs A, B and C in Figure 11.1, respectively). Although each is treated in that order below, one should remember that they simultaneously influence the decision making process. 11.2.1.1 Spatial Information Technology for Decision Making One of the major inputs to spatial decision making using GIS is the nature of the GIS technology (Construct A in Figure 11.1). Just as DSS evolved from management information systems because of unmet needs for decision making (Sprague, 1980), SDSS evolved from GIS (Densham, 1991). At this point it is useful to compare and contrast GIS and SDSS. A SDSS has as its core the basic decision aids of a GIS, i.e., data management as an aid to extend human memory, graphics display as an aid to enhance visualisation, and basic spatial analysis to extend human computing performance. However, a SDSS also integrates other aids
SPATIAL DECISION MAKING USING GIS
Figure 11.1: EAST-based framework for characterising “spatial decision making using GIS”
such as simulation, optimisation, and/or multiple criteria decision models that support exploration of alternatives. For example, a bushfire simulation model has been linked to GIS to provide decision makers with predictive power (Kessell, 1996). Facility optimisation models have been linked to GIS to site health care facilities (Armstrong et al., 1991). Multiple criteria decision (MCD) models have been linked to GIS, creating SDSS for land planning (Carver, 1991; Eastman et al., 1993; Faber et al., 1996; Heywood et al., 1995; Jankowski, 1995; Janssen and van Herwijnen, 1991).

In the early 1990s some GIS/SDSS researchers (Armstrong and Densham, 1990) interested in systematic approaches to tool development looked towards frameworks developed outside of the spatial sciences, mostly in the decision and management sciences, where management information systems were not addressing decision needs. For example, Sprague (1980) provided a taxonomy of DSS, classifying them based on their capabilities. Alter (1983) followed, broadening the classification based on what designers had to offer in relation to what kinds of problems needed to be solved (systems applied to business problems). A little later, Zachary (1986) deepened the question of what kinds of decision aid techniques were useful by examining the issue from a cognitive (human decision making) perspective. He articulated six classes of decision aid techniques: process models, choice models, information control (data management) techniques, analysis and reasoning methods, representation aids, and human judgement amplifying/refining techniques. Shortly after, Zachary (1988) elaborated on the need for such aids, persuasively arguing that decision aiding techniques extend the limits of human cognition, which is why they are important. Building upon those studies in differentiating capabilities, hence DSS, came Silver’s work (1990) on directed and non-directed developments in DSS.
He felt that DSS should provide capabilities suggested by Zachary (1988), in a way
that provided systematic access to capabilities through the following: breadth of capabilities to address a broad base of tasks; depth of sophistication of capabilities to meet the needs of a specific task in detail; and restrictiveness of capabilities to be used in some preset manner to protect users against themselves, for example as given in Construct A in Figure 11.1.

Currently, GIS is evolving into group-based GIS (Armstrong, 1994; Faber et al., 1996), and SDSS into SDSS for groups (SDSS-G) (Nyerges, 1995a), both being used for collaborative spatial decision making (Densham et al., 1995). Just as the lineage of SDSS can be traced to DSS, there is a similarity between group-based GIS (including SDSS-G) and group DSS (GDSS). GDSS originally emphasised multicriteria decision making techniques (Thiriez and Zionts, 1976). Jarke (1986) provided a four-dimension framework for GDSS that included:

1. spatial separation of decision makers: local or remote;
2. temporal separation of decision makers: face-to-face meeting or mailing;
3. commonality of goals: consensus or negotiation; and
4. control of interaction: democratic or hierarchical.
Shortly thereafter, DeSanctis and Gallupe (1987) extended the cognitive aid work of Zachary (1986) and incorporated Jarke’s (1986) space-time framework to organise techniques into a GDSS framework. In recent developments, spatial understanding support systems (SUSS) have been proposed to address the lack of free-form dialogue exploration capabilities in SDSS (Couclelis and Monmonier, 1995). It is only logical that SUSS be combined with SDSS to form spatial understanding and decision support systems (SUDSS), and SUDSS for groups (SUDSS-G).

11.2.1.2 Structural Issues in Decision Making as Inputs

A second major concern as part of input involves decision making situations, or in the terms of DeSanctis and Poole (1994), “other structures” (Construct B). There are several issues other than technology that structure a decision situation. Decision motivation is one issue that continually influences the decision process. Although much of the early work on decision making was couched in terms of a rational process stemming from economic motivation (Simon, 1960), there are other considerations such as social wellbeing and sustainability of an organisation (Zey, 1992). Sorting through these motivations and values is rather important to understanding why decision results turn out differently than expected based on surface (face-value) information (Keeney, 1992).

Another important concern is the character of the problem task. Included as characteristics of a task are the goal, content, and complexity of the task. With regard to differentiating task structure, Simon (1960, 1982) recognised three basic tasks in individual decision making: intelligence, evaluation, and choice. In a planning context, Rittel and Webber (1973) described “wicked problems” as among the most complex problem tasks because goals and content are not clear.
Mason and Mitroff (1981) went on to label “wicked” problems as “ill-structured” or “semi-structured” problems, such that tasks are more akin to “problem finding” than to “solution finding” (Crossland et al., 1995). In contrast, “well-structured” problems are those that can be defined well enough such that a right answer is computable and knowable. Focusing on small groups, McGrath (1984) performed a comprehensive review of group activity literature to synthesise a task circumplex composed of eight types of tasks—generating ideas, generating plans, problem solving with correct answers, deciding issues with preference, resolving conflicts of viewpoint, resolving conflicts of
interest, resolving conflicts of power, and executing performance tasks. Each of these differs in terms of goal, content, and/or complexity in specific decision making situations.

Another aspect of decision situation is organisational control. If democratic rather than hierarchical (autocratic) control underlies decision making, the process and outcome could be different. Such has been the case in the small-group research literature (Counselman, 1991), thus one might expect this effect in spatial decision making.

Last to be treated here, but certainly not least, is the fundamental issue of “who benefits” from the decision making. In the management information systems literature (Money et al., 1988), benefits of decision support systems accrue to one or more of three groups: those at the managerial level, operational level, and personal level. In that study most benefits were gained at the personal level, rather than the other two. However, the system design was focused on individuals as a unit, rather than other units such as partnerships, groups, organisations, or community, as taken up in the next section.

11.2.1.3 Decision Actor Unit

A third construct (Construct C in Figure 11.1) used as input to the decision making process deals with who is using the spatial information technology. The decision actor unit could be an individual, partnership, group, organisation and/or community. In small group research, Mantei (1989) recognises the influence that all of these levels might have on a small group, as well as the reverse. She called the small group the “standard level”. What is likely to be the best level of research investigation for GIS/SDSS? The opportunistic answer is probably all of them, so that insight is gained from multiple perspectives (Nyerges and Jankowski, 1997). We may well find that decision support capabilities at one level of decision actor unit do not work at all for another level. Thus, size of the decision unit is a fundamental concern.
Another aspect of this input is the knowledge and belief experience of a decision actor unit (i.e., individual, partnership, group, organisation, or community) in terms of what is known about the topic and/or what aspects of the topic are valued more dearly than others. Knowledge about the problem and knowledge about techniques used to solve problems and make decisions are important aspects of what constitutes the “personality” of the decision actor unit.

11.2.2 Spatial Decision Making as a Process of Interaction

The three input Constructs (A, B and C) influence spatial decision making processes (Constructs D and E) in various ways. Decision making processes in a human context and information technology have often been described in terms of “interaction”. For individuals it is human-computer interaction. For groups it is human-computer-human interaction. For organisations it might be human-system interaction. A goal in GIS is to make the computer as transparent as possible, focusing on human-problem interaction (Mark and Gould, 1991).

11.2.2.1 Structure Appropriation

Individuals and/or groups appropriate structures into the decision process (Construct D). Appropriating a structure is the act of invoking the structure, not necessarily the act of using it all the time. The major
structures are the technology (Construct A) and organisation guidelines (Construct B) that get the process started. Appropriation of structures is only a first part of various stages in the process; appropriations may in fact indicate the transition from stage to stage. When using a GIS during decision making, people appropriate various decision aids provided to them from the technology menu. Certain decision aids might be appropriated because the decision makers are familiar with how information is treated. In other cases some decision aids might get appropriated as a novelty, and not be used again. Still others, such as maps, might be appropriated in ways that are creative but different from what designers had intended, called ironic appropriations (Contractor and Seibold, 1993).

11.2.2.2 Dynamics of Decision Processes

Task analyses are commonly performed to better understand the flow of decision phases (Construct E), the result of which is a task model (Nyerges, 1993). Armstrong and Densham (1995) present a task analysis of a facility location decision making process, whereby location-allocation is the underlying analytical mechanism to establish best sites. Nyerges and Jankowski (1994) performed a task analysis to create a task model of habitat site selection based on technical/political preferences, and generalised the process into six phases:

1. develop a list of objectives as part of problem scoping;
2. develop feasible alternatives as a problem definition;
3. identify criteria for measuring the degree to which alternatives meet objectives;
4. specify criteria weights as a starting preference for the public-private choice process;
5. apply criteria weights to database values, aggregating scores to compute alternative rank;
6. negotiate selection of the best alternatives as part of a community-political process.
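Phases 4 and 5 above (specifying criteria weights, then applying them to database values and aggregating scores into a rank) can be illustrated with a minimal sketch. It assumes a simple weighted-sum aggregation, which is only one of several multicriteria techniques and is not necessarily the one used in the studies cited; the site names, criteria, scores and weights are all hypothetical.

```python
from typing import Dict, List, Tuple


def rank_alternatives(
    scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Rank alternatives by a weighted sum of their criterion scores.

    scores  -- alternative name -> {criterion name -> score in [0, 1]}
    weights -- criterion name -> non-negative preference weight
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("criteria weights must sum to a positive value")
    # Normalise the weights so that they sum to 1 (phase 4).
    normalised = {c: w / total for c, w in weights.items()}
    # Apply the weights to each alternative's scores and aggregate (phase 5).
    totals = {
        alt: sum(normalised[c] * values[c] for c in normalised)
        for alt, values in scores.items()
    }
    # Highest aggregate score first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical habitat sites with made-up, already standardised scores.
sites = {
    "site_a": {"habitat_quality": 0.8, "cost": 0.4, "access": 0.6},
    "site_b": {"habitat_quality": 0.5, "cost": 0.9, "access": 0.7},
    "site_c": {"habitat_quality": 0.6, "cost": 0.6, "access": 0.2},
}
# Hypothetical starting preferences of one decision actor.
weights = {"habitat_quality": 5, "cost": 3, "access": 2}

ranking = rank_alternatives(sites, weights)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In a group setting, each participant could supply a different weight vector, and the resulting rankings would then feed the negotiation in phase 6.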
Each of these phases is part of McGrath’s (1984) circumplex of task activities for groups, and the phases are thought to be general enough to apply to all preference-based site selection decision processes. This assumption is being investigated as phases in the dynamics of SDSS decision processes (Nyerges and Jankowski, 1994).

11.2.2.3 Emergent Structures

When decision aids are applied continually for specific purposes, they are considered to be “emergent structures” of information (Construct F), for example, information structures based on diagrams, models or maps. Emergent structures are likely to direct the thinking patterns of individuals and/or groups.

11.2.3 Decision Making Outcome

A third part of the framework concerns outcome. Outcomes from decision making consist of decision outcomes and social structure outcomes.

Decision Outcomes: decision outcomes (Construct G) consist of the amount of energy, or cognitive effort, it takes to make the decision, the accuracy of the decision as a measure of decision quality, the
satisfaction with the decision in terms of comfort, the commitment to the decision, and/or the consensus opinion if in a group setting.

Social Structure Outcomes: when in group decision settings, new social structures may develop from decision interaction processes: individual to organisation, individual to group, group to group, and/or group to organisation (Construct H). Social structure outcomes consist of adopting new rules for using information, and new interpersonal contacts and ways to support decision making.

11.3 ASSESSING PROGRESS: SPATIAL DECISION MAKING DURING GIS/SDSS USE

The issues/variables in the EAST framework (Figure 11.1) are each important in themselves, but to assess progress it is important to know how the variables affect each other. Each of the construct boxes in Figure 11.1 depicts a major issue describing spatial decision making with regard to input, process and outcomes. Premises (P1-P7) connecting boxes motivate potential research questions. The research questions incorporate variables from the two constructs at opposite ends of each premise. In this section, we address the progress on investigating these relationships, each premise being taken in turn with emphasis on “spatial aspects” of decision making.

Premise 1. Decision aid technology has an influence on decision aid moves. A decision aid move is the initial process of invoking the aid, not the entire process of using it. One fundamental question is: “Are the counts of aid moves for different types of maps likely to differ because of the advantages (or disadvantages) of the information associated with each?” Crossland et al. (1995) describe several types of displays for individual decision making, but the number of times each was used was not reported. With regard to group decision making, Armstrong et al. (1992) identify several kinds of map displays for facility location problems, and Jankowski et al. (1997) identify several for habitat site selection. Armstrong et al. (1992) note that certain kinds of cartographic displays are more appropriate for certain stages of the decision process, and describe them, but counts of moves were not collected in the study to determine relative frequency of moves. Ramasubramanian (Chapter 8, this volume) and Nyerges et al. (1997) describe the influence that geographic information technologies could have on group participation in community decision making. Developments of GIS in this context lead to what has been called public participation GIS (PPGIS). Ramasubramanian (Chapter 8, this volume) describes the use of PPGIS for social service decision making in Milwaukee. Nyerges et al. (1997) describe the use of GIS in three scenarios (urban land rehabilitation, neighbourhood crime watch, and forest conservation planning) to synthesise a preliminary, general set of requirements for PPGIS.

Premise 2. Appropriation of decision aids varies with alternative sources of structuring. How does task complexity influence spatial decision making? Crossland et al. (1995) report on the relationship between task complexity (low and high) and decision outcome, but did not examine the decision process that links complexity and outcome. Nyerges and Jankowski (1994) are coding data about low and high task complexity as given by the number of sites to be addressed (eight and 20 respectively), but have yet to complete the analysis. McMaster (Chapter 35, this volume) describes how GIS data quality, expressed in terms of locational resolution and attribute accuracy, can impact forest management decision making.

Premise 3. Decision aid appropriations will vary depending on the actor’s character. An individual’s knowledge about tools and problems has an influence on use of decision aid tools (Barfield and Robless, 1989; Nyerges, 1993), and we should expect that a group’s technical knowledge and experience will have
an effect on the appropriation of decision aids. No statistical evidence exists on users’ background in relation to counts of spatial decision aid moves.

Premise 4. Decision aid moves have an influence on decision processes. DeSanctis and Poole (1994) view decision making as a process of social interaction, and we suspect the same. What kinds of decision aids influence problem exploration, evaluation, and consensus? Do problem exploration aids facilitate learning, evaluation aids facilitate idea differentiation, and consensus aids facilitate idea integration? What aspects of tables, diagrams, maps, and multicriteria decision making models bring a group to a consensus? No research reports provide statistical evidence for the kind of decision aid moves that occur, but Armstrong et al. (1991) report that different maps are used as per the design of the system.

Premise 5. New sources of structure emerge during the technology, task, and decision process mix. How diagrams, maps, and/or models emerge during human-computer interaction has not been studied. Are maps emerging more than tables as a decision process evolves? Are there differences in the emergent sources of structure between face-to-face and space-time distributed meetings? Armstrong et al. (1992) describe the advantages of spider maps, demand maps, and supply maps for facility location decision problems. Crossland et al. (1995) describe several kinds of maps for site selection.

Premise 6. Decision processes influence decision outcomes. Decision outcomes have been shown to be influenced by several characteristics (Contractor and Seibold, 1993). When characteristics such as the decision aids available, other sources of social structure (e.g. task guidance), faithful moves/use of decision aids, and a decision venue that fits the task assignment are positive, is it likely that GIS use will result in more satisfactory outcomes? Crossland et al. (1995) report unequivocal evidence in favour of adding GIS technology to a site selection decision task, in terms of reduced time and increased accuracy for individual decision makers. Further, their findings about use of GIS indicated favourable results for both the low complexity and high complexity tasks. However, Todd and Benbasat (1994) suggest that cognitive effort may in fact be a more important variable to study than decision quality (accuracy) and efficiency (time).

Premise 7. New social structures emerge during the decision process. Interpersonal relationships between and among individuals evolve as they work through problem solving and decision making tasks. Since people come together in dyads, groups or organisations as part of the decision process, they establish new working relationships or reconstruct old ones. New rules for technology use develop as a result of information differentiation and/or integration. To date, social structure adoption has received only scant investigation through GIS implementation studies (Pinto and Azad, 1994).

In only a very few cases do we have evidence about issue/variable relationships to help us better understand the impacts of GIS/SDSS use on spatial decision making. Without knowing about the relationships it is rather difficult to comment on progress. The above relationships are only a sample of those that are encouraged by examination of the premises in the context of an EAST-based conceptual framework (Nyerges and Jankowski, 1997).

11.4 RESEARCH STRATEGIES FOR EMPIRICAL STUDIES

Several research strategies are available as a guideline for empirical studies that can address the premises in the previous section. Among these are:

1. usability tests with emphasis on evolving the tools (Davies and Medyckyj-Scott, 1994);
2. laboratory experiments with emphasis on controlled treatments for observation of individuals (Crossland et al., 1995; Gould, 1995), and small groups (Nyerges and Jankowski, 1994);
3. quasi-experiment (natural experiment) with natural control of treatments (Blackburn, 1987);
4. field studies using questionnaires (Davies and Medyckyj-Scott, 1995), interview (Onsrud et al., 1992), and videotape (Davies and Medyckyj-Scott, 1995); and
5. field experiments as a cross between a lab experiment and field study (Zmud et al., 1989).

Usability studies examine GIS users’ impressions and performance on tasks identified by system designers (Davies and Medyckyj-Scott, 1994). The goal of usability studies is usually to evolve the tool to a next state of usefulness. Questionnaires can be administered to find out users’ recollections of their concerns with system capabilities that are difficult to use. Video cameras can collect “talk aloud” data about users’ concerns with invoking capabilities.

Laboratory experiments make use of controlled treatments for observation. They provide a high degree of internal validity, i.e. consistency between the variables that are being manipulated. The validity is accomplished through setting tasks by the research design such that the treatments (as variables to be collected) can be compared with each other as independent and dependent variables. Crossland et al. (1995) made use of GIS-noGIS and low-high task complexity treatments in their study of decision making. They monitored the time duration and accuracy of the decision results for a site selection problem using keystroke logging. Gould (1995) made use of different user interface options in his experiments, also collecting data through keystroke logging. Nyerges (1995b) reports on interaction coding systems prepared for coding data from videotapes recorded in SDSS group experiments. Coding keystroke and video data so as to summarise the character of use is a challenge (Sanderson and Fisher, 1994).
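As a rough illustration of what summarising coded interaction data might involve, the sketch below counts decision aid moves by decision phase and aid type, the kind of tally that Premise 1 notes has not been reported. The event format, phase labels, aid types and the events themselves are hypothetical and are not drawn from any of the coding systems cited.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# One coded event: (participant, decision phase, aid type invoked).
Event = Tuple[str, str, str]

# Hypothetical events, as might be coded from keystroke logs or videotape.
events = [
    ("p1", "exploration", "map"),
    ("p1", "exploration", "table"),
    ("p2", "exploration", "map"),
    ("p1", "evaluation", "mcd_model"),
    ("p2", "evaluation", "map"),
    ("p2", "evaluation", "mcd_model"),
    ("p1", "choice", "map"),
]


def count_aid_moves(coded: Iterable[Event]) -> Counter:
    """Count aid moves (invocations) for each (phase, aid type) pair."""
    return Counter((phase, aid) for _, phase, aid in coded)


def relative_frequency(counts: Counter, phase: str) -> Dict[str, float]:
    """Return each aid type's share of all moves within one phase."""
    in_phase = {aid: n for (ph, aid), n in counts.items() if ph == phase}
    total = sum(in_phase.values())
    return {aid: n / total for aid, n in in_phase.items()} if total else {}


counts = count_aid_moves(events)
exploration_shares = relative_frequency(counts, "exploration")
print(counts[("exploration", "map")])
print(exploration_shares)
```

The same tally could be broken down by treatment (for example GIS versus no GIS, or low versus high task complexity) by adding a treatment field to each event.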
Quasi-experiments, also called natural experiments (Blackburn, 1987), set up treatments through natural task assignments. A natural task is one that the researcher has no control over, undertaken as a normal process of work activity. Data can be collected using questionnaires, interviews, videotaping, and/or keystroke logging. The less intrusive the data collection technique (e.g. videotaping), the more natural the data collected.

Field studies make use of natural decision making environments, employing participant observation to collect data. Field studies establish a high degree of external validity, i.e., the results are likely to apply to similar realistic situations (Zmud et al., 1989). Onsrud et al. (1992) describe case studies drawing heavily from Lee (1989). Dickinson (1990) and Davies and Medyckyj-Scott (1994) describe survey instruments to collect data about GIS use. Davies and Medyckyj-Scott (1995) describe videotape data capture about GIS use.

Field experiments are a cross between a field study and a laboratory experiment. They are among the most difficult research designs to implement because of the conflict between natural setting and controlling for treatments. Given the emphasis in laboratory experiments on establishing internal validity, and in field studies on establishing external validity, it is no wonder that field experiments are among the most fruitful research designs (Zmud et al., 1989). Keystroke logging, videotapes, questionnaires and interviews are useful techniques, but again the less intrusive the technique the better the data collection.

11.5 CONCLUSIONS: ARE WE THERE YET?

Like a young child on an extended journey, we ask “are we there yet?” If not, the next question is usually “how much farther (in this case further) do we have to go?” In comparison to tool development, we have
just started with investigations of tool use. Establishing an agenda to study both tool development and tool use is an ambitious task. The EAST framework (Nyerges and Jankowski, 1997) motivates many relationships to examine. Which ones are the most important to pursue? To answer this we could construct a matrix in which all variables are listed down the left side and again across the top. The left side list could be interpreted as independent variables, and the list along the top as dependent variables. The cells would represent relationships to be examined. Naturally, not all possible relationships would be meaningful, at least directly. The diagonal would not be considered, although there are probably some variables, such as knowledge experience, that affect themselves, perhaps through metacognition. As variables are listed in sequence in the matrix, the eight construct boxes in Figure 11.1 could be made evident. Which cells have and have not been treated indicates the status of progress.

Of course some of the relationships are more valuable than others. Capturing the appropriation of decision aids in relation to decision phases might be the most important endeavour. We would then know what aids relate to what phases. Determining whether cognitive effort or decision quality is more important under various treatments would be another significant pursuit, relating to research in the decision sciences. Technologies to support various tasks in a space-time distributed setting are also among the significant issues. Such technologies expand the time and reduce the space that have been the underlying constraints in making use of decision aids. With fully functional desktop GIS now in existence, it is likely that use of new problem understanding and decision support aids will foster new social structures in public meetings.
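The relationship matrix described above can be sketched as a small data structure. For brevity the sketch works at the level of the eight constructs (A to H) in Figure 11.1 rather than individual variables, and the two cells marked as treated are placeholders for illustration only, not an assessment of the literature.

```python
from typing import Dict, List, Tuple

# The eight construct boxes of Figure 11.1, identified by letter only.
constructs = list("ABCDEFGH")

Cell = Tuple[str, str]  # (independent construct, dependent construct)


def empty_matrix(names: List[str]) -> Dict[Cell, bool]:
    """Create every off-diagonal cell, initially marked as untreated."""
    return {(ind, dep): False for ind in names for dep in names if ind != dep}


def mark_treated(matrix: Dict[Cell, bool], ind: str, dep: str) -> None:
    """Record that the relationship ind -> dep has been studied empirically."""
    if (ind, dep) not in matrix:
        raise KeyError(f"no such cell: {ind} -> {dep}")
    matrix[(ind, dep)] = True


def untreated(matrix: Dict[Cell, bool]) -> List[Cell]:
    """List the relationships that still lack empirical study."""
    return sorted(cell for cell, done in matrix.items() if not done)


matrix = empty_matrix(constructs)
mark_treated(matrix, "A", "D")  # placeholder entry, for illustration only
mark_treated(matrix, "E", "G")  # placeholder entry, for illustration only

print(len(matrix), "cells;", len(untreated(matrix)), "still untreated")
```

Excluding the diagonal follows the text; a variable-level version would simply use variable names in place of construct letters.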
Characterising the variables involved in the above relationships will be a challenge to those who are more comfortable with tool building or tool use, but not both. Some relationships may require a different research strategy to articulate than will others, making some studies more difficult, but they would still be useful entries in the cells. Sorting these issues out is the challenge. Only then can we determine “are we there yet?”, and if not, “how far do we need to go with spatial decision making using GIS?”

REFERENCES

ALTER, S.L. 1983. A taxonomy of decision support systems, in House, W.C. (Ed.), Decision Support Systems. New York: Petrocelli, pp. 33–56.
ARMSTRONG, M.P. 1994. Requirements for the development of GIS-based group decision support systems, Journal of the American Society of Information Science, 45(9), pp. 669–677.
ARMSTRONG, M.P. and DENSHAM, P.J. 1990. Database organisation alternatives for spatial decision support systems, International Journal of Geographical Information Systems, 4, pp. 3–20.
ARMSTRONG, M.P. and DENSHAM, P.J. 1995. A conceptual framework for improving human-computer interaction in locational decision-making, in Nyerges, T., Mark, D.M., Laurini, R. and Egenhofer, M. (Eds.), Cognitive Aspects of Human-Computer Interaction for Geographic Information Systems, Proceedings of the NATO ARW, Mallorca, Spain, 21–25 March 1994. Dordrecht: Kluwer, pp. 343–354.
ARMSTRONG, M.P., RUSHTON, G., HONEY, R., DALZIEL, B., LOLONIS, P. and DENSHAM, P.J. 1991. Decision support for regionalization: a spatial decision support system for regionalizing service delivery systems, Computers, Environment and Urban Systems, 15, pp. 37–53.
ARMSTRONG, M.P., DENSHAM, P.J., LOLONIS, P. and RUSHTON, G. 1992. Cartographic displays to support location decision making, Cartography and Geographic Information Systems, 19, pp. 154–164.
BARFIELD, W. and ROBLESS, R. 1989. The effects of two- and three-dimensional graphics on problem solving performance of experienced and novice decision makers, Behaviour and Information Technology, 8(5), pp. 369–85.
BENBASAT, I. and NAULT, B.R. 1990. An evaluation of empirical research in managerial support systems, Decision Support Systems, 6, pp. 203–226.
BLACKBURN, R.S. 1987. Experimental design in organizational settings, in Lorsch, J. (Ed.), Handbook of Organizational Behavior. Englewood Cliffs, NJ: Prentice Hall, pp. 126–39.
CARVER, S. 1991. Integrating multi-criteria evaluation with geographical information systems, International Journal of Geographical Information Systems, 5(3), pp. 321–339.
CONTRACTOR, N.S. and SEIBOLD, D.R. 1993. Theoretical frameworks for the study of structuring processes in group decision support systems, Human Communication Research, 19(4), pp. 528–563.
COUCLELIS, H. and MONMONIER, M. 1995. Using SUSS to resolve NIMBY: how spatial understanding support systems can help with “Not in My Backyard” Syndrome, Geographical Systems, 2, pp. 83–101.
COUNSELMAN, E.F. 1991. Leadership in a long-term leaderless women’s group, Small Group Research, 22(2), pp. 240–257.
COWEN, D.J. 1988. GIS versus CAD versus DBMS: what are the differences? Photogrammetric Engineering & Remote Sensing, 54(11), pp. 1551–1555.
CROSSLAND, M.D., WYNNE, B.E. and PERKINS, W.C. 1995. Spatial decision support systems: an overview of the technology and a test of efficacy, Decision Support Systems, 14, pp. 219–235.
DAVIES, C. and MEDYCKYJ-SCOTT, D. 1994. GIS usability: recommendations based on the user’s view, International Journal of Geographical Information Systems, 8(2), pp. 175–189.
DAVIES, C. and MEDYCKYJ-SCOTT, D. 1995. Feet on the ground, studying user-GIS interaction in the workplace, in Nyerges, T., Mark, D.M., Laurini, R. and Egenhofer, M. (Eds.), Cognitive Aspects of Human-Computer Interaction for Geographic Information Systems, Proceedings of the NATO ARW, Mallorca, Spain, 21–25 March 1994. Dordrecht: Kluwer, pp. 123–141.
DE MAN, W.H.E. 1988. Establishing a geographic information system in relation to its use: a process of strategic choices, International Journal of Geographical Information Systems, 2(3), pp. 245–261.
DENSHAM, P.J. 1991. Spatial decision support systems, in Maguire, D.J., Goodchild, M.F. and Rhind, D.W. (Eds.), Geographical Information Systems: Principles and Applications. New York: John Wiley & Sons.
DENSHAM, P., ARMSTRONG, M. and KEMP, K. 1995. Collaborative Spatial Decision Making: Scientific Report for the I-17 Specialist Meeting, National Center for Geographic Information and Analysis, TR 95–14. Santa Barbara, CA: NCGIA.
DeSANCTIS, G. and GALLUPE, R.B. 1987. A foundation for the study of group decision support systems, Management Science, 33, pp. 589–609.
DeSANCTIS, G. and POOLE, M.S. 1994. Capturing the complexity in advanced technology use: adaptive structuration theory, Organization Science, 5(2), pp. 121–147.
DICKINSON, H. 1990. Deriving a method for evaluating the use of geographic information in decision making, Ph.D. dissertation, State University of New York at Buffalo, National Center for Geographic Information and Analysis (NCGIA) Technical Report 90–3.
EASTMAN, J.R., KYEM, P.A.K., TOLEDANO, J. and JIN, W. 1993. GIS and Decision Making, Explorations in Geographic Information Systems Technology, Volume 4. Geneva: UNITAR.
EASTMAN, J.R., WEIGEN, J., KYEM, P.A.K. and TOLEDANO, J. 1995. Raster procedures for multicriteria/multiobjective decisions, Photogrammetric Engineering and Remote Sensing, 61(5), pp. 539–547.
FABER, B.G., WALLACE, W.W. and MILLER, R.M.P. 1996. Collaborative modeling for environmental decision making, Proceedings of GIS’96 Symposium, Vancouver, British Columbia. Fort Collins: GIS World Books, pp. 187–198.
GOULD, M.D. 1995. Protocol analysis for cross-cultural GIS design: the importance of encoding resolution, in Nyerges, T., Mark, D.M., Laurini, R. and Egenhofer, M. (Eds.), Cognitive Aspects of Human-Computer Interaction for Geographic Information Systems, Proceedings of the NATO ARW, Mallorca, Spain, 21–25 March 1994. Dordrecht: Kluwer, pp. 267–284.
HEYWOOD, D.I., OLIVER, J. and TOMLINSON, S. 1995. Building an exploratory multi-criteria modelling environment for spatial decision support, in Fisher, P. (Ed.), Innovations in GIS 2. London: Taylor & Francis, pp. 127–136.
JANKOWSKI, P. 1995. Integrating geographical information systems and multicriteria decision-making methods, International Journal of Geographical Information Systems, 9(3), pp. 251–273.
JANKOWSKI, P., NYERGES, T.L., SMITH, A., MOORE, T.I. and HORVATH, E. 1997. Spatial Group Choice: A SDSS tool for collaborative spatial decision making, International Journal of Geographical Information Science, 11(6), pp. 577–602.
JANSSEN, R. and VAN HERWIJNEN, M. 1991. Graphical decision support applied to decisions changing the use of agricultural land, in Korhonen, P., Lewandowski, A. and Wallenius, J. (Eds.), Multiple Criteria Decision Support, Proceedings of the International Workshop, Helsinki, Finland, 7–11 August 1989. Berlin: Springer-Verlag, pp. 78–87.
JARKE, M. 1986. Knowledge sharing and negotiation support in multiperson decision support systems, Decision Support Systems, 2, pp. 93–102.
KEENEY, R.L. 1992. Value-Focused Thinking: A Path to Creative Decision-making. Cambridge, MA: Harvard University Press.
KESSELL, S.R. 1996. The integration of empirical modeling, dynamic process modeling, visualization, and GIS for bushfire decision support in Australia, in GIS and Environmental Modeling: Progress and Research Issues. Fort Collins: GIS World Books, pp. 367–371.
LAKE, R.W. 1993. Planning and Applied Geography: Positivism, Ethics and Geographic Information Systems, Progress in Human Geography, 17(3), pp. 404–413.
LEE, A.S. 1989. A scientific methodology for MIS case studies, MIS Quarterly, March, pp. 33–50.
MANTEI, M.M. 1989. A discussion of small group research in information systems: theory and method, in Benbasat, I. (Ed.), The Information Systems Research Challenge: Experimental Research Methods. Boston: Harvard Business School, Volume 2, pp. 89–94.
MARK, D.M. and GOULD, M.D. 1991. Interacting with geographic information: a commentary, Photogrammetric Engineering and Remote Sensing, 57(11), pp. 1427–1430.
MASON, R.O. and MITROFF, I. 1981. Challenging Strategic Planning Assumptions. New York: John Wiley & Sons.
McGRATH, J.E. 1984. Groups: Interaction and Performance. Englewood Cliffs, NJ: Prentice-Hall.
MONEY, A., TROMP, D. and WEGNER, T. 1988. The quantification of decision support within the context of value analysis, MIS Quarterly, 12(2), pp. 12–20.
NYERGES, T.L. 1993. How do people use geographical information systems?, in Medyckyj-Scott, D. and Hearnshaw, H. (Eds.), Human Factors for Geographical Information Systems. New York: John Wiley & Sons, pp. 37–50.
NYERGES, T.L. 1995a. Cognitive task performance using a spatial decision support system for groups, in Nyerges, T., Mark, D.M., Laurini, R. and Egenhofer, M. (Eds.), Cognitive Aspects of Human-Computer Interaction for Geographic Information Systems, Proceedings of the NATO ARW, Mallorca, Spain, 21–25 March 1994. Dordrecht: Kluwer, pp. 311–323.
NYERGES, T.L. 1995b. Interaction coding systems for studying collaborative spatial decision making, in Densham, P., Armstrong, M. and Kemp, K. (Eds.), Collaborative Spatial Decision Making, Technical Report 95–14. Santa Barbara, CA: NCGIA.
NYERGES, T.L. and JANKOWSKI, P. 1994. Collaborative Spatial Decision Making Using Geographic Information Technologies and Multicriteria Decision Models, research funded by the National Science Foundation, Geography and Regional Program, SBR-9411021.
NYERGES, T.L. and JANKOWSKI, P. 1997. Enhanced adaptive structuration theory: a theory of GIS-supported collaborative decision making, Geographical Systems, 4(3), pp. 225–259.
NYERGES, T.L., BARNDT, M. and BROOKS, K. 1997. Public participation geographic information systems, Proceedings, AutoCarto 13, Seattle, WA, 7–10 April. Bethesda, MD: American Society for Photogrammetry and Remote Sensing, pp. 224–233.
ONSRUD, H.J., PINTO, J.K. and AZAD, B. 1992. Case study research methods for geographic information systems, URISA Journal, 4(1), pp. 32–44.
ORLIKOWSKI, W.J. 1992. The duality of technology: rethinking the concept of technology in organizations, Organization Science, 3(3), pp. 398–427.
SPATIAL DECISION MAKING USING GIS
133
PINTO, J.K. and AZAD, B. 1994. The role of organizational politics in GIS implementation. Journal of Urban and Regional Information Systems Association, 6(2), pp. 35–61. RITTEL, H.W J. and WEBBER, M.M. 1973. Dilemmas in a general theory of planning. Policy Sciences, 4, pp. 155–169. SANDERSON, P.M. and FISHER, C. 1994. Exploratory sequential data analysis: foundations, Human-Computer Interaction, 9, pp. 251–317. SILVER, M.S. 1990. Decision support systems: directed and non-directed change, Information Systems Research, 1(1), pp. 47–70. SIMON, H.A. 1960. The New Science of Management Decision, New York: Harper & Row. SIMON, H.A. 1982. Models of Bounded Rationality, Cambridge, MA: MIT Press. SPRAGUE, R.H. 1980. A framework for the development of decision support systems. Management Information Systems Quarterly, 4, pp. 1–26. THIRIEZ, H. and ZIONTS, S. (Ed.) 1976. Multiple Criteria Decision Making. Berlin: Springer-Verlag. TODD, P. and BENBASAT, I. 1994. The influence of decision aids on choice strategies: an experimental analysis of the role of cognitive effort, Organizational Behavior and Human Decision Processes, 60, pp. 36–74. ZACHARY, W.W. 1986. A cognitively based functional taxonomy of decision support techniques, Human-Computer Interaction, 2, pp. 25–63. ZACHARY, W.W. 1988. Decision support systems: designing to extend the cognitive limits, in Helander, M. (Ed.), Handbook of Human-Computer Interaction. Amsterdam: Elsevier Science Publishers, pp. 997–1030. ZEY, M. (Ed.) 1992. Decision Making: Alternatives to Rational Choice Models. Newbury Park, CA: Sage. ZMUD, R.W., OLSON, M.H. and HAUSER, R 1989. Field experimentation in MIS research, Harvard Business School Research Colloquium. Boston, Harvard Business School, volume 2, pp. 97–111.
Chapter Twelve
GIS and Health: from Spatial Analysis to Spatial Decision Support
Anthony Gatrell
12.1 INTRODUCTION
There can be few areas of human inquiry that require more of a multidisciplinary perspective than that of health. Understanding the nature and incidence of disease and ill-health, the demands made upon health care systems, and how best to shape a configuration of health services, necessitates insights, approaches and tools drawn from disciplines that straddle the natural, social and management sciences. Given that ill-health is suffered by people living in particular localities, that an understanding of this requires knowledge of global and local environmental and social contexts, and that healthcare resources have to be located somewhere, it is not surprising that a spatial perspective on all these issues usually proves fruitful. As a consequence, GIS as both science and technology can inform our understanding of health problems, policy and practice, as I hope to show. My contribution emerges from a long-standing interest in the geography of health, and more specifically from having convened jointly a GISDATA workshop in this field (Gatrell and Löytönen, 1998) that has yielded insights from several researchers working in different disciplines and countries.
“Health” is, of course, notoriously difficult to define! As a result, attention tends to be focused on the incidence of illness and disease in the community. Traditionally, and as we see below, emphasis has been placed more within the GIS community on the use of easily-quantified health data; so we find studies using mortality data, or perhaps morbidity data such as that from cancer registries. Many of these data sets are address-based, carrying with them a postal code that permits fine-grained spatial analysis. But the existence of such data sets should not blind us to serious issues of data quality and uncertainty, both for spatial and attribute data. These issues are touched upon later, but clearly they are a prime example of concerns dealt with in other GISDATA meetings.
Notwithstanding the difficulties of defining attributes such as health, illness and disease, there are further difficulties in “entitation”, concerning the spatial objects to which such attributes may be attached. In some areas of health research, notably healthcare planning, this may be relatively unproblematic. For example, in defining a set of facilities that can deal with accidents and emergencies there may be clear criteria for inclusion, though the set of such facilities will of course change over time, requiring the spatial database to be updated. In epidemiology, the study of disease incidence, entitation might be thought to be straightforward, assuming we have postcode addresses of, say, adults suffering from throat cancer. But the existence of such data, and their physical representation as a spatial point pattern (Gatrell and Bailey, 1996, p. 850), should not blind us to the fact that the individual point “events” are a very crude and imperfect representation of the lived worlds of the victims. Again, this point is developed below.
It should be clear that I follow others (for example, Jones and Moon, 1987; Thomas, 1992) in making a distinction in the geography of health between a concern with epidemiology and a concern with healthcare planning. As a result, in what follows I shall seek first to outline the ways in which GIS can contribute to an understanding of disease and ill-health. In so doing I shall identify those areas in which more research is needed. I follow this with a discussion of GIS in healthcare planning. But I wish to make the point that the two traditions of the geography of health come together in at least one important way; quite simply, if we can identify areas with significant health “needs” then it behoves healthcare planners to set in motion projects designed to address those needs and improve the status quo. In reviewing a range of possible scenarios (where to invest or disinvest in healthcare, for example) planners are led to consider the usefulness of spatial decision support systems (SDSS). The prospects for SDSS in health research are therefore discussed, before reaching some general conclusions about the status of GIS and health.
12.2 GIS AND EPIDEMIOLOGY
Given that epidemiology is concerned with describing and explaining the incidence of disease, it follows that spatial or geographical epidemiology requires methods that will provide good descriptions of the spatial incidence of disease, together with methods that offer the prospect of modelling such incidence. I shall, therefore, consider briefly methods for visualising, exploring, and modelling the incidence of disease. This threefold division of analytical labour follows the classification adopted in Bailey and Gatrell (1995), where extended discussion of these and other methods may be found. Such a stance suggests immediately that I am adopting a “spatial analysis” view of GIS, one which many leading figures in GIS research (notably Goodchild, Burrough, Openshaw and Unwin) have endorsed.
These, and other authors (for example, Anselin and Getis, 1993; Bailey, 1994; Goodchild et al., 1992) have bemoaned the lack of a spatial analysis functionality in GIS, a shortcoming which raises particularly serious issues in epidemiology. Can we interrogate the spatial database for meaningful information (as opposed to simple queries), asking, for example, not simply how many cases of childhood asthma lie within buffer zones placed around main roads, but whether this is “unusual” or statistically significant? Such important questions demand a spatial analysis functionality, something which is now appearing in commercial and research-based products such as SAS-GIS, S-Plus for ARC/INFO, REGARD, and LISPSTAT. Pedagogic material (e.g. INFO-MAP; Bailey and Gatrell, 1995) is also available. A key requirement is the ability to query the data in one window and to see the results of such queries appear in other windows. For example, we might want to see a choropleth map of incidence rates in one window, the results of spatial smoothing of this map in another, a graph relating rates to data on air quality in a third, and a tabulation of data in the form of a spreadsheet in a fourth. These windows need to be linked, so that selection of objects in one causes them to be highlighted in others (Brunsdon and Charlton, 1995).
12.2.1 Visualisation of epidemiological data
Assuming we have a set of point objects representing disease incidence among a set of individuals we can map these as a point pattern, though this is singularly uninformative unless an attempt is made to control for underlying population distribution. Often we do not have access to point data and instead are provided only with data for a system of areal units, such as census tracts or other administrative units. Such data include disease incidence rates, age-standardised to control for variations in age structure. There are several issues
concerned with visualisation of such choropleth maps and how the reader extracts meaning from these. For example, researchers such as Brewer (1994) have looked at the use of colour on such maps. One special class of problem arises because of the variable size of spatial units. This has led some authors (e.g. Dorling, 1994 and Selvin et al., 1988) to explore the use of cartograms or “density-equalised” maps in disease mapping (an idea dating back many years; see Forster, 1963). Here, the size of an areal unit is made proportional to population at risk, and disease rates are shaded on this new base map; alternatively, individual disease events may be mapped instead. The ability to construct cartograms is not, to my knowledge, currently available within any proprietary GIS; it should be. An alternative method is to use proportional symbols to represent population at risk, with symbol shading representing disease rate.
12.2.2 Exploratory spatial data analysis
The production of cartograms (effectively, transformations of geographic space) means that these are as much exploratory as visualisation tools, suggesting that the distinction between the two sets of methods is fuzzy. The uneven distribution of population within any given spatial unit (see Wegener, Chapter 10 in this volume) also calls for solutions from exploratory spatial data analysis, though again the results produce new visualisations of epidemiological data. Martin and Bracken (1991) have introduced these methods into the analysis and mapping of census data. Their use in the analysis of health data is set to expand rapidly (Gatrell et al., 1996; Rushton et al., 1996). Such methods are known as kernel or density estimation. In essence, they reveal how the density, or “intensity” of a spatial point pattern varies across space (see Bailey and Gatrell, 1995, pp. 84–88 for a pedagogic treatment).
A moving window, or kernel, is superimposed over a fine grid of locations, and the density estimated at each location; research shows that the choice of kernel function is less critical than the bandwidth or spatial extent of the kernel function. Too small a bandwidth simply duplicates the original map of health events, while too large a bandwidth over-smooths the map, obscuring any useful local detail. Methods are available for selecting an optimal bandwidth. Visualisation of the results shows regions where there is a high incidence of disease, and therefore possible clusters. On its own, this is uninformative, given the natural variation in population at risk, but others (Bithell, 1990; Kelsall and Diggle, 1995) have shown how the ratio of two density estimates (one for disease cases, the other for healthy controls) provides a powerful exploratory tool for cluster detection. For example, we may map the incidence of children born between 1985 and 1994 with serious heart malformations in north Lancashire and south Cumbria (Figure 12.1a). The dot map appears to do little more than mirror population distribution. Mapping controls (healthy infants, those born immediately before and after the cases) allows us to assess this visually (Figure 12.1b) but the ratio of two kernel estimates (Figure 12.1c) gives us a more rigorous indication of whether or not there are significant clusters, the density of shading providing clues as to the location of clusters. In this instance there seems to be nothing unusual in the distribution of heart malformations; a possible cluster in the north-east of the study area is influenced very much by the presence of a single case, while a test of the hypothesis that the two kernel estimates are identical, using randomisation tests, gives a p value of 0.587. Rushton et al.
(1996) have embedded these kinds of ideas into software that is now widely available to health professionals in the USA and they are implemented in some of the interactive software environments mentioned earlier. Although derived from a different pedigree, and with an underlying theory that suggests it belongs more in a section on modelling than exploration, the work of Oliver and others on the kriging of disease data (and discussed in the spatial analysis GISDATA meeting) is closely related to kernel estimation (see Oliver et al., 1992).
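The ratio of two kernel estimates described above can be illustrated with a minimal sketch. It assumes a Gaussian kernel and a single fixed bandwidth `h` for both cases and controls; the function names and the toy coordinates are hypothetical and are not taken from the studies cited, and no bandwidth selection or randomisation test is attempted.

```python
import math

def kernel_density(x, y, points, h):
    """Gaussian kernel estimate of point intensity at (x, y), bandwidth h."""
    total = 0.0
    for px, py in points:
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-d2 / (2.0 * h * h))
    return total / (2.0 * math.pi * h * h)

def density_ratio(x, y, cases, controls, h):
    """Ratio of case density to control density, each scaled by sample size."""
    f_cases = kernel_density(x, y, cases, h) / len(cases)
    f_controls = kernel_density(x, y, controls, h) / len(controls)
    return f_cases / f_controls

# Hypothetical coordinates (arbitrary units), not real health data.
cases = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (3.0, 3.0)]
controls = [(1.0, 1.0), (3.0, 3.0), (3.1, 2.9), (2.9, 3.2), (2.0, 2.0), (0.5, 2.5)]

# Evaluate the ratio over a coarse grid covering the toy study area.
grid = [(gx * 0.5, gy * 0.5) for gx in range(9) for gy in range(9)]
surface = {g: density_ratio(g[0], g[1], cases, controls, h=0.5) for g in grid}
```

Values above 1 mark locations where cases are over-represented relative to controls. In practice the control density can approach zero far from any control point, so the ratio is only meaningful inside the populated part of the study region.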
Figure 12.1 Geographical analysis of congenital malformations in north Lancashire and south Cumbria, UK, 1985–1994: (a) case incidence; (b) healthy controls; (c) ratio of kernel estimates (h = bandwidth)
Such methods allow us to pinpoint possible clusters, in much the same way as Openshaw’s groundbreaking Geographical Analysis Machine sought to do (Openshaw et al., 1987). Authors such as Openshaw and Rushton are keen to stress the potential usefulness of such methods in disease surveillance, suggesting that they could be put to use in routine interrogations of spatial databases; public health specialists would “instruct” software to search such databases for “hotspots” and report the results for investigation and possible action. The feasibility and merit of this proposal demand further consideration. The issue of whether there exist “clusters” of health events needs to be separated conceptually from whether or not there is generalised “clustering” of such events across the study region as a whole. In some applications it is important to know whether or not there is a tendency for cases of disease to aggregate more than one might expect on a chance basis. Again, there are statistical tools available to do this. For example, one approach (Cuzick and Edwards, 1990) looks at each case of disease in turn and asks whether nearest neighbours are themselves more likely to be cases than controls. Other approaches use Ripley’s K-function, which gives an estimate of the expected number of point events within a given distance of an arbitrarily chosen event; again, pedagogic treatments are available (Bailey and Gatrell, 1995). The K-function allows us to assess whether a spatial distribution is random, clustered, or dispersed, at a variety of spatial scales. As with kernel estimation, knowledge of the K-function for the spatial distribution of health events is of limited value, but if it is estimated for both cases and controls we can assess whether cases display more, or less, tendency for aggregation or clustering than we would expect, given background variation in population at risk.
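The comparison of K-functions for cases and controls can be sketched as follows. This is a deliberately naive estimator that assumes a known study-region area and applies no edge correction; the function name `k_function`, the region size and the coordinates are all hypothetical.

```python
import math

def k_function(points, d, area):
    """Naive estimate of Ripley's K at distance d (no edge correction)."""
    n = len(points)
    pairs = 0
    for i in range(n):
        for j in range(n):
            if i != j:
                dx = points[i][0] - points[j][0]
                dy = points[i][1] - points[j][1]
                if math.hypot(dx, dy) <= d:
                    pairs += 1  # ordered pair (i, j) lying within distance d
    return area * pairs / (n * (n - 1))

# Hypothetical 3 x 3 study region: controls on a regular lattice,
# cases with a tight group of three near one corner.
area = 9.0
controls = [(x + 0.5, y + 0.5) for x in range(3) for y in range(3)]
cases = [(0.5, 0.5), (0.6, 0.5), (0.5, 0.6), (2.5, 2.5)]

d = 0.5
k_cases = k_function(cases, d, area)
k_controls = k_function(controls, d, area)
difference = k_cases - k_controls  # positive: cases more aggregated than controls
```

A positive difference at a given distance suggests that cases aggregate more than the background population at that scale. Judging whether the difference is larger than chance would allow needs a further step, such as randomly relabelling cases and controls, which is omitted here.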
Statistical details are given in Diggle and Chetwynd (1991), with applications in Gatrell et al. (1996). If our database includes as an attribute date of infection, or date of disease notification, then the analytical possibilities are widened, and the space-time incidence of disease may be explored. Do cases that cluster in space also cluster in time? If so, this may give clues to a possible infective mechanism in disease
aetiology (causation). There is a variety of techniques available here, including those which extend the K-function (Bailey and Gatrell, 1995, pp. 122–125). Applications include research on Legionnaires’ disease (Bhopal et al., 1992) and on cancer (Gatrell et al., 1996). But this area of research raises interesting issues, of visualisation and analysis, and those interested in spatio-temporal GIS (as in a parallel GISDATA initiative) are well-placed to make a contribution. For example, we might have very detailed longitudinal information available on residential histories, and this can be used to examine space-time clustering not in a contemporary setting but in an historical one. We could, for instance, record locations at a certain age, and use as a measure of temporal separation the difference in birth year of cases. A test of space-time interaction might then reveal whether there was a tendency for clustering at particular ages, as demonstrated very convincingly for multiple sclerosis in part of Norway (Riise et al., 1991).
If we wish, as we should, to give time a significant role in our analyses we need to recognise the scale dimension, just as we do in a spatial setting. In other words, we need to acknowledge that some exposures to infectious agents (such as viruses) might have taken place many years ago, and any disease have taken years to develop; on the other hand, some environmental insults (the Chernobyl and Bhopal disasters spring immediately to mind) can lead to immediate health consequences. And at very local scales, it is quite crucial to realise that simply recording current address (and using the battery of spatial point process tools outlined above) is a grossly imperfect measure of “exposure”. People do not remain rooted to the spot, waiting to be exposed to airborne pollutants, for example.
They have individual, and possibly very complex, daily and weekly activity spaces, a set of other locations at which they may be exposed to a variety of environmental contaminants. This implies that we should draw upon the rich literature of “time geography”, first developed by the Swedish geographer Torsten Hägerstrand, in order to give due weight to these influences. A start has been made by some in bringing these ideas to the attention of the GIS community (Miller, 1991), while Schaerstrom (1996) has shown how they can be employed in an epidemiological setting (see Figure 12.2). This is a potentially fruitful and important area of research, in which much remains to be done.
In exploring health data that are collected for areal units there is a variety of analytical methods available. An important issue here concerns the low frequency of counts or incidence in small areas, or even in quite large areas if the disease is rare. Techniques such as probability mapping, and in particular Bayes estimation (Bailey and Gatrell, 1995, pp. 303–308; Langford, 1994) are now commonplace in the epidemiology literature. Essentially, the latter allows us to “shrink” rates in areas where disease incidence is low, towards the average value for the study area as a whole, as a way of acknowledging that our estimates are uncertain; if the rate is based on large numbers of cases it is not shrunk or smoothed so much. The smoothing can be either “global” or “local”; in the latter context the estimate is adjusted to a local or neighbourhood mean rather than that for the entire study area. Although such methods are not standard in proprietary GIS, they are widely used in modern atlases of mortality and morbidity (see, for example, the Spanish cancer atlas: Lopez-Abente et al., 1995).
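The “global” shrinkage idea can be made concrete with a small sketch. It assumes one common method-of-moments form of empirical Bayes smoothing, in which each raw rate is pulled towards the overall rate by a weight that grows with the population at risk; this particular estimator, the function name `eb_smooth` and the counts are illustrative assumptions rather than the procedure used in the works cited.

```python
def eb_smooth(cases, pops):
    """Global empirical Bayes smoothing of area rates (method-of-moments sketch)."""
    total_pop = sum(pops)
    m = sum(cases) / total_pop                    # overall rate for the study area
    raw = [c / n for c, n in zip(cases, pops)]    # raw area rates
    # Population-weighted variance of raw rates about the overall rate.
    s2 = sum(n * (r - m) ** 2 for r, n in zip(raw, pops)) / total_pop
    mean_pop = total_pop / len(pops)
    phi = max(0.0, s2 - m / mean_pop)             # estimated between-area variance
    weights = [phi / (phi + m / n) if phi > 0 else 0.0 for n in pops]
    smoothed = [m + w * (r - m) for w, r in zip(weights, raw)]
    return smoothed, weights, m

# Hypothetical counts and populations at risk for four areas.
cases = [4, 10, 300, 20]
pops = [100, 1000, 10000, 10000]
raw = [c / n for c, n in zip(cases, pops)]
est, weights, m = eb_smooth(cases, pops)
```

With these toy figures the area with only 100 people at risk receives a much smaller weight than the areas with 10,000, so its rate moves a long way towards the overall rate while the large areas are left almost unchanged. A “local” variant would replace the overall rate with a neighbourhood mean.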
Note that it is also possible to adapt the kernel estimation ideas discussed earlier for use in exploratory analyses of area data; this has been exploited to great effect in the electronic atlas of mortality in Italy (Cislaghi et al., 1995). Several researchers have made use of measures of spatial (auto)correlation in describing the patterning of disease incidence among a set of areal units (see, for example, Glick, 1982; and Lam, 1986). Various researchers have added such tools to GIS. But one critique of spatial autocorrelation statistics is that they describe properties of the map as a whole; they are global rather than local statistics. Researchers such as Anselin (1995) and Ord and Getis (1995) have encouraged the incorporation of LISA (local indicators of spatial association) into GIS.
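The contrast between a global autocorrelation statistic and its local counterparts can be sketched with Moran's I. The code assumes binary contiguity weights held as neighbour lists and uses one common form of the local statistic; the four-area “map” and its rates are hypothetical.

```python
def global_morans_i(values, neighbours):
    """Global Moran's I with binary contiguity weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(len(js) for js in neighbours.values())   # sum of all weights
    cross = sum(z[i] * z[j] for i, js in neighbours.items() for j in js)
    return (n / s0) * cross / sum(zi * zi for zi in z)

def local_morans_i(values, neighbours):
    """Local Moran statistic for each area (one common LISA form)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(zi * zi for zi in z) / n
    return [z[i] / m2 * sum(z[j] for j in neighbours[i]) for i in range(n)]

# Hypothetical rates for four areas arranged in a row (0-1-2-3).
rates = [1.0, 2.0, 3.0, 4.0]
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

global_i = global_morans_i(rates, neighbours)
local_i = local_morans_i(rates, neighbours)
```

The global value summarises the whole map in one number, whereas the local values show which areas contribute most to it. With this definition the local values sum to the global statistic multiplied by the total weight, which is the sense in which they decompose it.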
Figure 12.2 The time geography, and risk factors of an imaginary family (after Schaerstrom, 1996)
12.2.3 Modelling spatial data in epidemiology
In using point data in geographical epidemiology a common emphasis has been either on detecting clustering of disease, or on identifying “clusters”. But a key problem for geographical or environmental epidemiology is to conduct more “focused” studies (Besag and Newell, 1991), where the aim is to test hypotheses about possible raised incidence of disease around suspected point or linear sources of pollution (such as incinerators and nuclear power plants, or high voltage power lines and busy main roads). While much of the exploratory research can be conducted without a proprietary GIS (rather, with software for interactive spatial data analysis) it is surely in the field of modelling that GIS can make a real contribution. This is because we frequently have to link epidemiological data to other databases, concerned with air or water quality, for example. Dramatic examples of the possibilities are provided by those researchers engaged in trying to predict and control the incidence of malaria in parts of Africa, such as the Gambia (Thomson et al., 1996) and KwaZulu-Natal (Sharp and Le Sueur, 1996). The first of these studies demonstrates the potential for using coarse-resolution satellite imagery, and derived NDVI measurements, in modelling malaria transmission; the second shows how global positioning systems can be used to record the locations of 35,000 homesteads at risk from malaria, in relation to clinic catchments; in so doing, this work anticipates later discussions of links between traditional epidemiology and health care planning.
Much of this modelling work proceeds under the assumption that proximity to, or distance from, such putative sources acts as a reasonable marker of exposure. For example, Diggle et al.
(1990) demonstrated raised incidence of larynx cancer around the site of a former industrial waste incinerator, by fitting a spatial statistical model to data on cases and controls (see Gatrell and Rowlingson, 1994 for comments on linking
this to a proprietary GIS, and Elliott et al., 1992b for tests of the hypothesis around other incinerators). One important issue, however, is to control for “confounders”, or other variables that may themselves be associated with proximity to the source(s) being examined. For example, we should where possible control for smoking behaviour, or for socio-economic status, since it may be that any demonstrable elevated risk near such sources is due to these factors, and not to any emissions from the plant (Diggle and Rowlingson, 1994). For areal data there is a substantial literature on modelling disease incidence, with regression-type models used to explain incidence in terms of available covariates. The important point to note here is the need to recognise spatial dependence among the set of areal units; they are not “n” independent observations. The statistical issues are discussed in Haining (1990) and in Bailey and Gatrell (1995), among other texts, while applications are presented by Elliott et al. (1992). More specifically, several authors have built upon the earlier, exploratory Bayesian analysis to build generalised linear models with fixed and random effects. For example, Lopez-Abente (1998) has included covariates such as the application of insecticides, in an ecological analysis of cancers among Spanish provinces. As in exploratory analyses, issues of exposure come to the fore in modelling disease incidence. Suppose we wish to model the incidence of respiratory disease in the vicinity of main roads, or to model the distribution of odour complaints around a hazardous waste site (Lomas et al., 1990). We can use a GIS to define areas of risk, by placing buffer zones around such roads, maybe varying the width of these zones to reflect estimated traffic densities. 
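Defining areas of risk by buffering roads can be sketched without a GIS. The code assumes roads are straight segments in planar coordinates, each with its own buffer width standing in for traffic density; the function names, segment data and home locations are hypothetical.

```python
import math

def distance_to_segment(p, a, b):
    """Shortest planar distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length2 = dx * dx + dy * dy
    if length2 == 0.0:                       # degenerate segment: a == b
        return math.hypot(px - ax, py - ay)
    # Project p onto the line and clamp to the ends of the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / length2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def in_buffer(p, roads):
    """True if p lies within the buffer of at least one road."""
    return any(distance_to_segment(p, a, b) <= width for a, b, width in roads)

# Hypothetical roads as (start, end, buffer width); the wider buffer
# stands in for a road with heavier estimated traffic.
roads = [((0.0, 0.0), (10.0, 0.0), 0.5),
         ((0.0, 5.0), (10.0, 5.0), 0.2)]
homes = [(5.0, 0.3), (5.0, 4.7), (5.0, 2.5), (10.4, 0.0), (11.0, 0.0)]

exposed = [h for h in homes if in_buffer(h, roads)]
```

This only classifies addresses as inside or outside a zone. Whether the number of cases inside the zone is unusual, and whether confounders explain it, remain separate questions for the statistical models discussed in this section.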
There are important epidemiological issues to address in any analysis, such as the need to allow for confounders (is incidence high along busy roads because the damp housing is found there too, for example?) and to recognise that indoor exposure to pollutants may be equally serious, if not more so. But the complexity of individuals’ activity spaces is an issue again. And in the absence of detailed measurements of air quality we are forced to rely on modelling likely exposure, for example by using kriging or other interpolation techniques to provide estimates of exposure over space (Collins et al., 1995). A further question is the extent to which there are other adequate surrogates of exposure. Is traffic density, or even the density of the road network, adequate in studies of respiratory morbidity; do the explanatory gains of using monitored or modelled air pollution outweigh the costs and complexity involved in collecting the data?
12.3 GIS AND HEALTHCARE DELIVERY
So far, we have considered the role that analytic GIS can play in an understanding of the geographical incidence of disease. We turn now to its possible role in planning the configuration and delivery of health services, following this by considering how GIS can help an examination of variations in accessibility and uptake of services. The two are closely linked, though one can be thought of more in terms of the provider’s perspective, the other more as a consumer perspective.
12.3.1 Planning health services
Comparing healthcare systems and delivery among many countries, both in the developed and developing worlds, reveals that a primary healthcare focus is being increasingly adopted, with a concomitant acknowledgement that planning has to be on a local scale. In the developing world this means that healthcare delivery has moved away from investment in “prestige”, hospital-based facilities and more
Figure 12.3 Dominant flows of patients to General Practitioners in West Sussex (after Bullen et al., 1996)
towards small-scale, community-based clinics that meet better the needs of the population. This local focus is mirrored in parts of the developed world. For example, in Britain, where the “purchase” of health care (by general practitioners and health authorities, on behalf of their populations) is separated from those who provide it (hospital and community services) there have been moves towards “locality commissioning”. This requires purchasers to define localities, about which detailed information on demography and morbidity is required in order to identify likely needs and demands for health care. Bullen et al. (1996) have demonstrated the usefulness of GIS in defining such localities in West Sussex, on the south coast of England. One particularly novel idea was to incorporate individuals’ own definitions of neighbourhood into the planning process; 500 such neighbourhoods were digitised, then rasterised in order to form a count of the number of times a cell formed part of a local neighbourhood. Dominant patient flows (to GP surgeries) were also used in the planning process (Figure 12.3).
Having defined small areas for which health care purchasing is required, how are we to assess the needs of people living there? There is a long tradition, certainly in Britain, of using census data, and socio-economic classifications derived from such data, in order to characterise small areas. “Geodemographics”, the origin of which lies in target marketing, has been used to attach lifestyle descriptions (such as “affluent achievers”, “thriving greys” or “hard-pressed families” in the CDMS Super Profile system; see Brown et al., 1991) to postcodes (representing on average 15 properties). In this way, the proportions of locality populations in each lifestyle class, or the association of lifestyle with mortality or morbidity data, can be obtained (Brown et al., 1995).
Other research (Hennell et al., 1994) has shown how a standardised morbidity measure can be estimated for such lifestyle classes and attached to individuals, yielding a synthetic illness score for the practice with which patients are registered. Such a score can be used, for example, to predict expenditure on prescriptions.
12.3.2 Accessibility, utilisation and outcome
An assessment of the accessibility of populations to public facilities has long been a subject for geographical enquiry, with Godlund’s classic work on Swedish hospitals being one of the earliest studies (Godlund, 1961). Studies are now appearing that are similar in spirit to what Godlund did without the benefit of GIS technology. For example, in Illinois, Love and Lindquist (1994) have taken 11,000 census block centroids in the state, together with census data on the distribution of the elderly, and linked this to the coordinates of 214 hospitals in order to determine the accessibility of that population to such hospitals. Using simple Euclidean distance they show that 80 percent of the elderly are within about 8 km of a hospital, and 60 percent within 8 km of two hospitals; however, those living outside major urban areas differ substantially in their accessibility from urban residents. Whether more sophisticated measures of accessibility than straight-line distance are valuable is a question that does not appear to have been addressed, though with the advent of digitised road network databases it is, as Love and Lindquist observe, feasible to use network distance or estimated drive times to compute accessibility scores.
In a pilot study in south-east England, Gatrell and Naumann (1992) took road network data provided by the cartographic company Bartholomew, used assumed speeds of travel during peak and off-peak hours and thereby assigned journey times to arcs of the network. With data on the locations of accident and emergency facilities, linked to census data, they were able to assess which areas were more than a fixed travel time from A&E sites. But the real value here is the ability to use the GIS as a spatial decision support tool, evaluating accessibility under a range of scenarios; what happens, for example, if a particular site closes?
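A straight-line accessibility measure of the kind described, and the “what if a site closes?” question, can be sketched as follows. The code assumes population is concentrated at area centroids and uses Euclidean distance; the function name `share_within`, the coordinates and the populations are hypothetical and do not reproduce any of the studies cited.

```python
import math

def share_within(centroids, hospitals, radius, k=1):
    """Share of population with at least k hospitals within radius of its centroid."""
    total = sum(pop for _, _, pop in centroids)
    covered = 0
    for x, y, pop in centroids:
        nearby = sum(1 for hx, hy in hospitals
                     if math.hypot(x - hx, y - hy) <= radius)
        if nearby >= k:
            covered += pop
    return covered / total

# Hypothetical centroids as (x, y, population) and hospital sites, in km.
centroids = [(0.0, 0.0, 100), (3.0, 4.0, 100), (10.0, 0.0, 100), (20.0, 20.0, 100)]
hospitals = [(0.0, 0.0), (6.0, 0.0)]

baseline_one = share_within(centroids, hospitals, radius=8.0, k=1)
baseline_two = share_within(centroids, hospitals, radius=8.0, k=2)

# Scenario: the second site closes.
after_closure = share_within(centroids, hospitals[:1], radius=8.0, k=1)
```

Re-running the same function on an edited list of sites is the simplest form of scenario evaluation. Swapping the straight-line distance for network distance or drive time would change only the distance calculation, not the structure of the comparison.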
Similar research, investigating the need for additional cancer units in north-west England, has been reported by Forbes and Todd (1995). For the GIS specialist this kind of work raises interesting questions. Gatrell and Naumann (1992) make the point that results are sensitive to the resolution of the road database used. Ball and Fisher (1994) observe that we cannot legitimately speak of a single catchment around a hospital or clinic; such catchments can only be probabilistic rather than deterministic. And from a substantive, rather than technical, viewpoint, we need to ask the question: accessibility for whom? The work described here assumes a population that drives to hospital. I am unaware of published research that uses data on public transport availability to assess access to healthcare services among non-car users; surely it is not too much to ask to incorporate, for example, bus timetables into a GIS? The research reported above considers potential accessibility; but what of the utilisation or uptake of care? There is a substantial literature on variations or inequalities in uptake, though the use of GIS here has been negligible. As an example of what is possible, consider the problem of assessing the uptake of screening for breast cancer in south Lancashire (in north-west England). Suppose we wish to explain variations in uptake of screening among the set of general practices (physician clinics); why do some practices achieve uptake rates of perhaps 90 percent, while others only 50 percent? What role do catchment characteristics play? Given that patients have some choice in selecting their GP it is no easy matter to define such catchments (Haynes et al., 1995). But given the patient’s postcode we can assign her to an enumeration district (ED) and attach to her record data on the social deprivation of that ED in which she resides. 
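As a rough sketch of this linkage, the fragment below assigns each patient to an ED through her postcode, averages the ED deprivation scores over each practice's patients, and fits a simple least-squares line of uptake on the resulting practice score. Every lookup table, value and function name here is hypothetical, and the sketch ignores the difficulties of catchment definition noted above.

```python
from collections import defaultdict
from statistics import mean

# All lookup tables and values below are invented for illustration.
postcode_to_ed = {"PC1": "ED01", "PC2": "ED02", "PC3": "ED03"}
ed_deprivation = {"ED01": 1.0, "ED02": 3.0, "ED03": 5.0}
patients = [  # (practice, patient postcode)
    ("P1", "PC1"), ("P1", "PC1"),
    ("P2", "PC1"), ("P2", "PC2"), ("P2", "PC3"),
    ("P3", "PC3"), ("P3", "PC3"),
]
uptake = {"P1": 90.0, "P2": 70.0, "P3": 50.0}  # percent screened

def practice_deprivation(patients, postcode_to_ed, ed_deprivation):
    """Crude practice score: mean ED deprivation over registered patients."""
    scores = defaultdict(list)
    for practice, postcode in patients:
        scores[practice].append(ed_deprivation[postcode_to_ed[postcode]])
    return {practice: mean(values) for practice, values in scores.items()}

def least_squares(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

deprivation = practice_deprivation(patients, postcode_to_ed, ed_deprivation)
practices = sorted(deprivation)
slope, intercept = least_squares([deprivation[p] for p in practices],
                                 [uptake[p] for p in practices])
print(deprivation, slope, intercept)
```

A negative slope in such a fit would correspond to lower uptake in practices serving more deprived areas; practice characteristics could be added as further explanatory variables in a multiple regression.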
Collecting together all patients registered with a particular practice we can obtain a crude average deprivation score for that practice. When we regress uptake against deprivation we find a clear relationship (Figure 12.4); moreover, when we add practice characteristics (such as whether or not it has at least one female partner), the level of explained variation increases significantly. Jones and Bentham (1995) have demonstrated the use of GIS in understanding the links between health outcomes and accessibility. In an examination of road traffic accidents in Norfolk, England between 1987 and 1991 they estimated the time taken for an ambulance to reach the accident and convey victims to A&E
Figure 12.4 Uptake of screening for breast cancer in South Lancashire, UK (solid dots are general practices without a female GP; crosses are those with at least one female GP)
departments. They modelled the likelihood of the victim being a fatality, as opposed to a serious injury, using the estimated travel time but also controlling for other factors, such as type of road, nature of accident, weather conditions, and age of victim. No relationship was found between outcome and journey time, so that survival did not appear to be affected by accessibility. Whether this finding extends to other parts of the world, where distances in rural areas are much greater, has yet to be demonstrated after adequate control for confounding variables. Such research is important, since it feeds into policy debates about the concentration of health services. For many reasons it may be sensible to plan services so that they are located in one large regional centre; but the impacts on those who are some distance from such services have yet to be fully evaluated, and there is much research using GIS (and linked to spatial interaction models) to be done here. For example, after suitable control for confounders, is cancer survival affected by relative location?

12.4 LINKING EPIDEMIOLOGY AND HEALTHCARE PLANNING: SPATIAL DECISION SUPPORT SYSTEMS
As noted earlier, within medical geography two major research areas are usually distinguished: one on geographical epidemiology, the other on health care planning. But we need to build bridges between these, and (to continue the metaphor) the structure required to do this can and should include a spatial decision support system. If analysis suggests there are serious health variations, and in particular localised health problems, then a need is identified for resources to be devoted to tackling such problems. A spatial decision support system provides the tools to do this, as the previous section implied. A striking example of this comes from the work referred to above on malaria incidence and control. Research by Sharp and le Sueur (1996) shows that small-scale maps that portray broad regional trends can
mask substantial, and epidemiologically significant, variations in incidence in small districts. More detailed maps allow authorities seeking to control malaria to focus strategies in areas where they are most needed; this spatial targeting of resources also contributes to cost-effectiveness. The research also highlights the need to recognise that many public health problems do not stop at national borders; a high percentage of the malaria cases in South Africa are imported from Mozambique, for example. With the rapid expansion of global travel, we do well to remember that new “spaces” created by flows of people serve as modern backcloths on which disease transmission is mapped (Gould, 1993). And we need to bear in mind that, having identified areas of high incidence of disease and illness, the goal of public health doctors should be to “change the map”, by seeing that health and other policies are implemented accordingly.

12.5 CONCLUSIONS
Much of this review has emphasised the importance of adding statistical analysis to GIS, though the previous section stressed the parallel theme of ensuring that a decision support component is available too. Having made these points, we do need to ask the question: who will use this extended GIS? Links to public health doctors and others over recent years have taught me not to overestimate their analytical requirements. Some of the tools discussed above are quite sophisticated, and are hardly likely to put in an appearance in health reports that have to satisfy the requirements of lay audiences; while such audiences may themselves be sophisticated, few will have a subtle feel for Bayesian estimation, kernel estimation and K-functions! When coupling such analytical tools to a GIS we need to provide plenty of guidance concerning their use. We also need to recognise, and emphasise, issues of data quality. It is no use putting such tools to work on poor data.
When dealing with clinical databases we need high standards of diagnostic accuracy. And the importance of this needs to be communicated to a lay public concerned about “cancer clusters”, for example, where different cancers may well have very different aetiologies and where awareness of perhaps two or three cases of “brain cancer” may mask the fact that one or more may be secondary tumours resulting from spread of the cancer from another, primary site. Data quality issues also arise in a primary care setting, where general practitioner databases may be out of date or inflated by patients who have left the area but who have yet to be removed from GP registers. In a European setting, we see vast differences in the availability of high resolution health data. For instance, Scandinavian countries have detailed, geocoded, individual level data that allow the researcher to track movements of individuals between residential and occupational settings. Yet the researcher in France is restricted to aggregated data at quite coarse levels of spatial resolution. Even where data are of high quality—for instance where we have accurate residential locations and histories of patients stretching back many years—we should acknowledge that such locations provide far from perfect measures of “exposure”. This issue is, for me, one of the most critical in GIS-based epidemiology. Finally, I think that those of us interested in GIS and health need to engage in more dialogue with those who approach health research from an alternative epistemological viewpoint. What can we do to answer those who criticise much applied GIS for its “surveillant eye” (Pickles, 1995), for distancing itself from those whose health it seeks to map, explore and model? I am much struck by the dedication in Anders Schaerstrom’s (1996) thesis: “To all those unfortunate people whose lives, sufferings and deaths are transformed to trajectories, dots and figures in scientific studies”. 
(For “transformed”, we might read “reduced”). Put simply, much GIS-based health research takes place within the context of a biomedical model, in which social, cultural and biographical settings are ignored. Can we do more to acknowledge lay perspectives, perhaps? A start could be made by creating and analysing spatial databases that have more to
do with the perception of ill-health rather than health data with “hard” end-points. Or, drawing on new concerns over “environmental equity” we could construct databases that deal with access to health-promoting resources (such as good, reasonably priced food, recreational facilities, traffic-free zones) rather than only access to secondary or tertiary health care. These are some of the challenges for GIS-based health research in the first few years of the twenty-first century.

REFERENCES
ANSELIN, L. 1995. Local indicators of spatial association-LISA, Geographical Analysis, 27, pp. 93–115.
ANSELIN, L. and GETIS, A. 1993. Spatial statistical analysis and geographic information systems, in Fischer, M.M. and Nijkamp, P. (Eds.) Geographic Information Systems, Spatial Modelling, and Policy Evaluation. Berlin: Springer-Verlag.
BAILEY, T.C. 1994. A review of statistical spatial analysis in geographical information systems, in Fotheringham, A.S. and Rogerson, P. (Eds.) Spatial Analysis and GIS. London: Taylor & Francis.
BAILEY, T.C. and GATRELL, A.C. 1995. Interactive Spatial Data Analysis. Harlow: Addison Wesley Longman.
BALL, J. and FISHER, P.F. 1994. Visualising stochastic catchments in geographical networks, The Cartographic Journal, 31, pp. 27–32.
BESAG, J.E. and NEWELL, J. 1991. The detection of clusters in rare diseases, Journal of the Royal Statistical Society, Series A, 154, pp. 143–155.
BHOPAL, R., DIGGLE, P.J. and ROWLINGSON, B.S. 1992. Pinpointing clusters of apparently sporadic Legionnaire’s disease, British Medical Journal, 304, pp. 1022–27.
BITHELL, J. 1990. An application of density estimation to geographical epidemiology, Statistics in Medicine, 9, pp. 691–701.
BREWER, C. 1994. Color use guidelines for mapping and visualisation, in MacEachren, A. and Taylor, D.R.F. (Eds.) Visualisation in Modern Cartography. Amsterdam: Elsevier.
BROWN, P.J.B., HIRSCHFIELD, A.F.G. and BATEY, P.W.J. 1991. Applications of geodemographic methods in the analysis of health condition incidence data, Papers in Regional Science, 70, pp. 329–44.
BROWN, P.J.B., TODD, P. and BUNDRED, P. 1995. Geodemographics and GIS in Small Area Demographic Analysis: Applying Super Profiles in District Health Care Planning, URPERRL, Department of Civic Design. Liverpool: University of Liverpool.
BRUNSDON, C. and CHARLTON, M. 1995. Developing an exploratory spatial analysis system in XLisp-Stat, Proceedings of GISRUK ’95. London: Taylor & Francis.
BULLEN, N., MOON, G. and JONES, K. 1996. Defining localities for health planning: a GIS approach, Social Science and Medicine, 42, pp. 801–816.
CISLAGHI, C., BIGGERI, A., BRAGA, M., LAGAZIO, C. and MARCH, M. 1995. Exploratory tools for disease mapping in geographical epidemiology, Statistics in Medicine, 14, pp. 2663–2682.
COLLINS, S., SMALLBONE, K. and BRIGGS, D. 1995. A GIS approach to modelling small area variations in air pollution within a complex urban environment, in Fisher, P. (Ed.) Innovations in GIS 2. London: Taylor & Francis.
DIGGLE, P.J. and ROWLINGSON, B.S. 1994. A conditional approach to point process modelling of elevated risk, Journal of the Royal Statistical Society, Series A, pp. 433–40.
DIGGLE, P.J. and CHETWYND, A.D. 1991. Second-order analysis of spatial clustering for inhomogeneous populations, Biometrics, 47, pp. 1155–1163.
DIGGLE, P.J., GATRELL, A.C. and LOVETT, A.A. 1990. Modelling the prevalence of cancer of the larynx in part of Lancashire: a new methodology for spatial epidemiology, in Thomas, R. (Ed.) Spatial Epidemiology. London: Pion.
DORLING, D. 1994. Cartograms for visualising human geography, in Hearnshaw, H.J. and Unwin, D.J. (Eds.) Visualization in Geographical Information Systems. Chichester: John Wiley.
ELLIOTT, P., CUZICK, J., STERN, R. and ENGLISH, R. 1992a. Geographical and Environmental Epidemiology. Oxford: Oxford University Press.
ELLIOTT, P., HILLS, M., BERESFORD, J., KLEINSCHMIDT, I., JOLLEY, D., PATTENDEN, S., RODRIGUES, L., WESTLAKE, A. and ROSE, G. 1992b. Incidence of cancer of the larynx and lung near incinerators of waste solvents and oils, Lancet, 339, pp. 854–858.
FORBES, H. and TODD, P. 1995. Review of Cancer Services: North West Regional Health Authority, URPERRL, Department of Civic Design. Liverpool: University of Liverpool.
FORSTER, F. 1963. Use of a demographic base map for the presentation of areal data in epidemiology, British Journal of Preventive and Social Medicine, 20, pp. 165–171.
GATRELL, A.C. and NAUMANN, I. 1992. Hospital Location Planning: A Pilot GIS Study, North West Regional Research Laboratory. Lancaster: Lancaster University.
GATRELL, A.C. and ROWLINGSON, B.S. 1994. Spatial point process modelling in a GIS environment, in Fotheringham, A.S. and Rogerson, P. (Eds.) Spatial Analysis and GIS. London: Taylor & Francis.
GATRELL, A.C. and BAILEY, T.C. 1996. Interactive spatial data analysis in medical geography, Social Science and Medicine, 42(6), pp. 843–855.
GATRELL, A.C., BAILEY, T.C., DIGGLE, P.J. and ROWLINGSON, B.S. 1996. Spatial point pattern analysis and its application in geographical epidemiology, Transactions, Institute of British Geographers, 21, pp. 256–274.
GATRELL, A.C. and LÖYTÖNEN, M. (Eds.) 1998. GIS and Health. London: Taylor & Francis.
GLICK, B.J. 1982. The spatial organisation of cancer mortality, Annals of the Association of American Geographers, 72, pp. 471–481.
GODLUND, S. 1961. Population, Regional Hospitals, Transport Facilities and Regions: Planning the Location of Regional Hospitals in Sweden, Lund Studies in Geography, Series B, No. 21, University of Lund, Sweden.
GOODCHILD, M., HAINING, R., WISE, S. et al. 1992. Integrating GIS and spatial data analysis: problems and possibilities, International Journal of Geographical Information Systems, 6, pp. 407–423.
GOULD, P. 1993. The Slow Plague: A Geography of the AIDS Pandemic. Cambridge, MA: Blackwell.
HAINING, R. 1990. Spatial Data Analysis in the Social and Environmental Sciences. Cambridge: Cambridge University Press.
HAYNES, R.M., LOVETT, A.A., GALE, S.H., BRAINARD, J.S. and BENTHAM, C.G. 1995. Evaluation of methods for calculating census health indicators for GP practices, Public Health, 109, pp. 369–374.
HENNELL, T., KNIGHT, D. and ROWE, P. 1994. A Pilot Study into Budget-Setting Using Synthetic Practice Illness Ratios (SPIRO Scores) Calculated from “Super Profiles” Area Types, URPERRL Working Paper 43, Department of Civic Design. Liverpool: University of Liverpool.
JONES, A.P. and BENTHAM, G. 1995. Emergency medical service accessibility and outcome from road traffic accidents, Public Health, 109, pp. 169–177.
JONES, K. and MOON, G. 1987. Health, Disease and Society. London: Routledge.
KELSALL, J. and DIGGLE, P.J. 1995. Nonparametric estimation of spatial variation in relative risk, Statistics in Medicine, 14, pp. 2335–2342.
LAM, N. 1986. Geographical patterns of cancer mortality in China, Social Science and Medicine, 23, pp. 241–247.
LANGFORD, I. 1994. Using empirical Bayes estimates in the geographical analysis of disease risk, Area, 26, pp. 142–9.
LOMAS, T., KHARRAZI, M., BROADWIN, R., DEANE, M., SMITH, M. and ARMSTRONG, M. 1990. GIS in public health: an application of GIS technology in an epidemiological study near a toxic waste site, Proceedings, Thirteenth Annual ESRI User Conference. Redlands, CA: ESRI.
LOPEZ-ABENTE, G. 1998. Bayesian analysis of emerging neoplasms in Spain, in Gatrell, A.C. and Löytönen, M. (Eds.) GIS and Health. London: Taylor & Francis.
LOPEZ-ABENTE, G. et al. 1995. Atlas of Cancer Mortality and Causes in Spain. http://www.ucaa.es/hospital/atlas/introdui.html
LOVE, D. and LINDQUIST, P. 1994. The geographical accessibility of hospitals to the aged: a geographic information systems analysis within Illinois, Health Services Research, 29, pp. 627–651.
MARTIN, D. and BRACKEN, I. 1991. Techniques for modelling population-related raster databases, Environment and Planning A, 23, pp. 1069–1075.
MILLER, H.J. 1991. Modelling accessibility using space-time prism concepts within geographical information systems, International Journal of Geographical Information Systems, 5, pp. 287–301.
OLIVER, M.A., MUIR, K.R., WEBSTER, R., PARKES, S.E., CAMERON, A.H., STEVENS, M. and MANN, J.R. 1992. A geostatistical approach to the analysis of pattern in rare disease, Journal of Public Health Medicine, 14, pp. 280–289.
OPENSHAW, S., CHARLTON, M., WYMER, C. and CRAFT, A. 1987. A Mark 1 geographical analysis machine for the automated analysis of point data sets, International Journal of Geographical Information Systems, 1, pp. 335–358.
ORD, J.K. and GETIS, A. 1995. Local spatial autocorrelation statistics: distributional issues and an application, Geographical Analysis, 27, pp. 286–306.
PICKLES, J. (Ed.) 1995. Ground Truth: The Social Implications of Geographical Information Systems. New York: Guilford Press.
RIISE, T. et al. 1991. Clustering of residence of multiple sclerosis patients at age 13 to 20 years in Hordaland, Norway, American Journal of Epidemiology, 133, pp. 932–939.
RUSHTON, G., ARMSTRONG, M.P., LYNCH, C. and ROHRER, J. 1996. Improving public health through Geographical Information Systems: an instructional guide to major concepts and their implementation, Department of Geography, University of Iowa, CD-ROM.
SCHAERSTROM, A. 1996. Pathogenic Paths? A Time Geographical Approach in Medical Geography. Lund: Lund University Press.
SELVIN, S., MERRILL, D.W. and SACKS, S. 1988. Transformations of maps to investigate clusters of disease, Social Science and Medicine, 26, pp. 215–221.
SHARP, B.L. and LE SUEUR, D. 1996. Malaria in South Africa: the past, the present and selected implications for the future, South African Medical Journal, 86, pp. 83–89.
THOMAS, R. 1992. Geomedical Systems: Intervention and Control. London: Routledge.
THOMSON, M.C., CONNOR, S.J., MILLIGAN, P. and FLASSE, S. 1996. The ecology of malaria as seen from earth observation satellites, Annals of Tropical Medicine and Parasitology, 90, pp. 243–264.
Chapter Thirteen
The Use of Neural Nets in Modelling Health Variations: The Case of Västerbotten, Sweden
Örjan Pettersson
13.1 INTRODUCTION
Regional change and uneven development are terms familiar to most geographers. These research areas have received and are still receiving attention at global, regional and local scales (for a short review, see Schoenberger, 1989; Smith, 1989). Even though most research has focused on economic change, often measured as growth/decline in GDP, there have also been studies concerned with other aspects of regional change, such as demography, environment and welfare in a broader perspective (Dorling, 1995; Morrill, 1995; Pacione, 1995). Furthermore, there is an extensive literature specialising in medical geography and spatial epidemiology (Gould, 1993; Kearns, 1996; Mayer, 1990). Attempts have been made to identify underprivileged or deprived areas (Jarman, 1983, 1990; Pacione, 1995) and in recent years there has been a renewed interest in geographical literature concerning issues of spatial equity, unfair distribution and justice (Hay, 1995; Smith, 1994). This chapter deals with the substantial differences in the status of public health among the populations living in 500 residential areas in the county of Västerbotten, Sweden. A neural net approach is applied in order to explore these health variations.

13.1.1 Contemporary Sweden
By international standards welfare in Sweden is high and relatively evenly distributed. The Swedish welfare model has reduced social and spatial differences in living conditions within the country. However, the trend changed in the late 1980s. Within a few years national unemployment rates rose to levels that were unprecedented in post-war Sweden. Although there are some signs of recovery in the economy, there is still great uncertainty as to whether or not there will be any substantial reduction in unemployment rates (SOU, 1995).
This new labour market situation has major implications for individuals and households, but no clear picture has yet emerged regarding how these changes affect welfare and health distribution between social groups and regions. Although Sweden is characterised by relatively small regional imbalances, there are certain differences in living conditions between different parts of the country (Svenska Kommunforbundet, 1994). Little attention has been paid, however, to the obvious differences among the populations living in different parts of cities and municipalities.
Figure 13.1: The county of Västerbotten; fifteen municipalities and 500 microregions. The areas with high ill-health rates (65 days or more) are shaded.
Uneven distribution has usually been seen as a regional problem, mainly concentrated in the interior parts of northern Sweden or in specific localities hit by a crisis when a large plant has been shut down, and the traditional Swedish regional policy has almost exclusively been aimed at such areas and localities. Recently, there has also been some concern about metropolitan suburbs with a large proportion of immigrants (National Board of Health and Welfare, 1995). Persson and Wiberg (1995) maintain that the last few years have seen a shift towards increasing spatial differences in Swedish society and they also anticipate that such growing inequalities are first to be observed at the micro-regional level, i.e. within counties and municipalities. A recent empirical study has shown that differences in both living conditions and public health can be substantial within an ordinary Swedish county (Pettersson et al., 1996).

13.1.2 Aims of the study
This chapter will focus on the observed differences in health status among the populations living in different parts of the county of Västerbotten in northern Sweden (Figure 13.1). In this chapter a measure called “ill-health rate” (ohälsotal) will be used as an indicator of the population’s health status. The ill-health rate consists of the population’s average number of days absent from work. The measure will be further specified and discussed later in the chapter. Hypothetically, the analysis has to face the presence of non-linear interaction between indicators at different spatial scales. Such “local” pockets of interaction are difficult to pin-point with explicit a priori hypotheses. As an alternative to regression analysis the methodology of supervised artificial neural nets will be applied to the problem of identifying a relationship between sick-leave and the morphology of
demographic, social and economic indicators at different spatial scales. This method is supposed to reveal the relevant explanatory patterns in social and physical space. First it is necessary to give a description of the studied county and to summarise some of the findings from earlier stages in the research project (Pettersson et al., 1996).

13.2 THE STUDY AREA
The county of Västerbotten consists of 15 municipalities in northern Sweden, covering approximately 55,000 km² and extending from the shores of the Gulf of Bothnia to the mountainous border with Norway. Most of the 250,000 inhabitants are concentrated in the coastal areas, especially in the towns of Umeå and Skellefteå. With its 100,000 citizens Umeå is the biggest and fastest growing municipality in northern Sweden and it serves as the regional centre with a university and a regional hospital. Skellefteå is more of an industrial centre and dependent upon small-scale manufacturing industry. During the last few years the differences between these two major localities have increased. While Umeå is growing rapidly, Skellefteå exhibits clear signs of decline. The interior parts of Västerbotten are very sparsely populated (1–3 persons per km²) and are considered to be among the traditional problem areas in Swedish regional policy.

13.2.1 The microregional approach
A pilot study (with reference to what has previously been done in Sweden) was conducted to analyse emerging new patterns of inequality in an exploratory way (Pettersson et al., 1996). In order to shed light upon the substantial spatial differences in living conditions and the hypothesis about increasing inequalities (Persson & Wiberg, 1995) it was thought necessary to employ data with a higher degree of spatial resolution than municipalities (Can, 1992).
Even though the investigation was conducted for a single county there are good reasons to believe that most of the spatio-temporal changes that could be observed in Västerbotten are, to a large extent, also valid for the rest of Sweden. The microregional approach left us with two main choices: either to use electoral wards or to use NYKOs (nyckelkodsområden), which are the municipalities’ own statistical subdivisions (usually more detailed than electoral wards). Both divisions are considered to represent relatively homogeneous housing environments. However, the NYKO-system involves a number of practical problems as there are no national rules regarding how the municipalities are supposed to make these subdivisions and only a few, mostly big, municipalities have actually produced digital maps according to NYKO-boundaries. The final decision was to employ electoral wards for the rural municipalities and NYKO for the three largest municipalities. By making use of NYKO for some of the municipalities, it would be easier to trace local pockets of deprivation within the major localities. The final base map (Figure 13.1) was made up of approximately 500 geographical entities. Some area units are very large while others are very small, and most microregions contain between 50 and 1700 inhabitants. Since “Statistics Sweden” adopts a principle of suppressing information on areas with few inhabitants there was some loss of information in microregions with fewer than 50 residents. With conventional choropleth maps there is an obvious risk that physically large areas dominate the visual impression. Furthermore, small and often densely populated areas are obscured. In this case the latter problem is of less importance since the urban areas in the county of Västerbotten are usually characterised by low ill-health rates.
A set of census indicators was obtained from “Statistics Sweden” and when running univariate analyses, many indicators revealed dramatic differences within the county. Some variables showed expected spatial patterns with manifested urban-rural continuum characteristics. Other indicators resulted in a kind of microregional mosaic with patterns that were far more complex and difficult to interpret. The ill-health rate was one of them (Figure 13.1). A simple index showed that many microregions were exposed to multiple deprivation. Most of these were found in rural areas, but there were also underprivileged areas unexpectedly close to the coast and within the towns. During the period between 1985 and 1992 there has been a substantial decline in employment intensity, but this change appeared to affect most microregions to the same extent. In terms of disposable income there has been a tendency towards convergence, whereby the populations in the relatively poor areas have experienced a significant rise in purchasing power. Contrary to this, the households in affluent areas have suffered a decline in disposable income. This development contradicts the hypothesis suggesting increased spatial inequalities at the microregional level.

13.2.2 Clusters of microregions
A cluster analysis was performed in order to reduce the 500 microregions into a manageable number of groupings with similarities regarding certain variables, such as residential characteristics and indicators of material living conditions and health (Pettersson, 1996). The cluster analysis was performed with Ward’s method and six indicators. A seven-cluster solution provided groupings with well-defined characters, ranging from densely populated residential areas within the major localities to very remote rural wards with many elderly inhabitants. The clustering procedure resulted in a spatial mosaic with marked urban-rural tendencies.
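For readers unfamiliar with Ward's method, the following is a minimal sketch of the general technique, not the procedure actually run in the study. At each step it merges the pair of clusters whose union gives the smallest increase in the within-cluster sum of squares. The toy data (six areas, two indicators, two clusters) are invented, whereas the study used six indicators and a seven-cluster solution; the indicators are assumed to be standardised already, and the function names are hypothetical.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2

def ward_clusters(points, k):
    """Agglomerate until k clusters remain; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([points[m] for m in clusters[i]],
                                 [points[m] for m in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the cheapest pair
        del clusters[j]
    return clusters

# Hypothetical standardised indicator values for six areas
areas = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
         (5.0, 5.1), (5.2, 5.0), (5.1, 4.9)]
groups = ward_clusters(areas, 2)
print(groups)
```

This brute-force search is cubic in the number of areas, which is tolerable for a few hundred microregions; a statistical package would normally be used in practice.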
The highest ill-health rates were found in the remotest margins and in the rural areas, while the lowest values were found among the suburban areas. However, the variations in public health were still considerable within some of these clusters. There also seemed to be a positive relationship between general living conditions and public health in the residential areas.

13.3 THE ILL-HEALTH RATE
In this chapter the ill-health rate is used as an indicator of public health. The ill-health rate is defined as the average number of sick-leave days (or, more precisely, days absent from work due to illness). This measure has several advantages over alternative public health measures. One important argument is that the ill-health rate is a simple measure and also available for small area units, but there are also at least three important objections to the relevancy of the ill-health rate. Firstly, it is restricted to only a part of the population, those between 16 and 64 years of age and working. Secondly, it can be affected by other factors not considered as having anything to do with the population’s health status; for instance, the ill-health rate alters due to changes in the generosity of the sickness benefit system. Thirdly, it is difficult to make direct comparisons with other countries. Besides ordinary sick-leave days, the measure also contains those who have obtained an early retirement. Early retirements and long-term illness contribute considerably to the ill-health rate and thereby make this measure sensitive to the health status of a relatively small part of the population, especially persons over 50 years of age. Sometimes a distinction is made between early retirements and more normal sick-leave days; however, since the data used here do not allow such a distinction, both categories will be considered as
arising from the same factors, even though it is easy to see that some of the indicators are primarily related to just one of them. Sick-leaves and early retirements have received considerable attention in national and international research. Marklund (1995) provides an overview of factors known to affect the number of sick-leave days. First it is necessary to point out that the individual health situation is of course evident. In understanding the differences in public health between different regions the population’s structure regarding age and gender is of great importance. The elderly and females have more absences due to illness and the ill-health rate increases rapidly especially when approaching pensionable age (65 years of age). It is important though to make a distinction between the individual level and the aggregated level, in this case the microregion and groups of microregions. Most relationships can, in fact, only be given a relevant explanation at the individual level. For instance, it must be assumed that the influence on public health of age and gender depends upon whether the individual is aged and/or a female, and that these circumstances have no effect upon the health status of the rest of the population. Type of work and education are also of significant importance. Persons with manual or monotonous jobs have more sick-leave days. Higher educational level means fewer sick-leave days on average. Educational level is expected to be highly correlated with both the age structure of the population and type of work. The effect of the labour market situation seems to be unclear. It is important to emphasise that losing a job or being exposed to long-term unemployment is always considered to affect the individual’s health in a negative way. 
The effect on the ill-health rate at the aggregated level could, however, be the reverse since it is possible that persons employed in a labour market characterised by high unemployment are more anxious to be visible at the place of work. In contrast, and more controversial, it has been claimed that one way of securing a stable income in an unsafe labour market is to strive for an early retirement. Some investigations suggest a correlation between a troublesome labour market and high ill-health rates mostly resulting from large numbers of early retirements. Several studies have indicated a covariance between household structure and sick-leaves. One-person households and families with small children, in particular single parents, show higher averages than the population in general. A covariance between income and health has also been proposed even though it is difficult to establish in what direction the relationship goes. A Danish study (Bovin and Wandall, 1989) has concluded that people living in small municipalities or in rural areas have fewer sick-leave days than residents in large municipalities and cities. They suggest that one reason for this is that the social control in small societies makes it harder to stay home from work.

13.3.1 Public health in the county of Västerbotten

The county of Västerbotten exhibits ill-health rates well above the national average and within the county there are substantial deviations from the county average. At the municipality level there is a marked core-periphery pattern with high ill-health rates in the rural municipalities and relatively low rates in the coastal areas. The microregional approach unmasks a more complicated pattern with high ill-health rates in the middle of the county, but similar areas are also to be found even within the coastal municipalities and towns (Figure 13.1). Within Skellefteå, in particular, there are several microregions with ill-health rates well above the county average. 
There are other areas with unexpectedly low ill-health rates, for instance many of the electoral wards close to the Norwegian border show much lower values than most neighbouring areas.
NEURAL NETS FOR MODELLING HEALTH VARIATIONS
13.3.2 A priori hypotheses

It is obvious that a large proportion of these variations in public health between microregions can be explained by the age and gender of their populations, but there is also a need to establish whether other factors, especially social and economic circumstances, contribute to the ill-health rate. Once again it is necessary to emphasise that most of the relationships are only relevant at the individual level. However, it is expected that a demographic structure with many elderly men and women implies higher ill-health rates at the microregional level. Similarly, it is likely that a high proportion of one-person households and families with children, as well as microregions with a lower educational level, will show higher ill-health rates. The relationship with income situation is more difficult to handle. A negative relationship might imply that health is negatively affected by a troublesome economic situation, but it can also be argued that the income reduction is due to health problems. With the new and seemingly structural unemployment situation it is of particular importance to investigate the relationship between labour market and public health, even though it could be argued that a large proportion of the effect on the ill-health rate is not entirely health related. A negative relationship between employment levels and the ill-health rate implies that the health status of the individual or of the microregion’s population becomes impaired when exposed to long-term unemployment or an insecure labour market situation. On the other hand, a positive relationship would suggest that, as a measure of public health at the microregional level, the ill-health rate is sensitive to factors not considered to be health related. This indicator is one example showing that the effect on the individual’s sick-leaves could be contradictory, depending on whether one considers the individual or the aggregated level. 
From a geographical point of view it is likely that distance to health care could be another important factor in explaining deviations in public health among the populations in different parts of the county. Primary health care is usually provided in the municipality centres, while the more advanced medical services are concentrated in the major localities and especially in the regional hospital in Umeå. Public health could also be related to the physical environment and to local cultural traditions regarding the consumption of food, alcohol and tobacco. Due to the lack of data such factors will largely be left out of this investigation.

13.4 ARTIFICIAL NEURAL NETWORKS

During recent years artificial neural net technology has penetrated many fields of scientific inquiry. What in the beginning was met with scepticism and disbelief (“a black box”) has today become an established methodology providing a tool with wide applicability (Hewitson and Crane, 1994). In this chapter neural nets will be used as an alternative to linear regression analysis in order to explore microregional variations in health status among the populations of nearly 500 residential areas in the county of Västerbotten. Since it is assumed that neural net technology is not completely unknown to most readers, only a very brief overview will be given. For a short and simple introduction, see Hinton (1992) or the first three chapters in Hewitson and Crane (1994). The final chapters in the latter book also provide some examples of successful applications to geographical problems. More extensive presentations of neural net technology can be found in Bishop (1995), Hertz et al. (1991) and Ripley (1994). The neural net looks for patterns in a set of data and “learns” them. This means that the trained network has the ability to classify new patterns correctly and to make predictions and forecasts. Furthermore neural
Figure 13.2: A single neuron and a feedforward neural net with three layers.
nets are considered as being able to handle complex and non-linear interactions. Another advantage is the neural net’s capability of overcoming problems with noisy data or even missing values. A simple feedforward neural network architecture, such as the three-layer back-propagation net, consists of several building blocks. The basic element is the neuron (node or unit) shown in Figure 13.2. The node sums the weighted inputs from the connected links and an activation function decides whether the input signal to the neuron is powerful enough to “fire” a signal to the neuron(s) in the next layer(s). Most back-propagation neural networks are built with one input layer, one or more hidden layer(s) and one output layer. These layers consist of one or more neurons connected to neurons in other layers. There are two types of “learning”: supervised learning means that the neural net is provided with both input and output data, while in unsupervised learning the neural network is only given the input values. In this chapter, only supervised learning will be performed. The network “learns” (or trains) by gradually adjusting the interconnecting weightings between the layers in order to reproduce the output pattern. The purpose of training is to construct a model that will generalise well upon an unseen set of data. There is a risk that the neural network finally starts memorising the data set, but this can be prevented by saving a part of the data material outside the training set.

13.5 DATA PREPARATION

In the above discussion concerning the ill-health rate we introduced some of the factors expected to contribute to variations in this public health measure. Since the data was originally obtained for a different purpose, the analysis was restricted to what was available in the data base. Nevertheless, it was possible to construct a set of variables with expected relevancy for this investigation (Table 13.1). 
Most variables are census-based and relate to the population’s demographic composition and their socio-economic circumstances. Other indicators describe the settlement pattern.
Table 13.1: Variable list

Ill-health rate
Total population
Average age of total population
Proportion of inhabitants 50–64 years in the 16–64 age-group
Proportion of old inhabitants (over 75 years)
Proportion of population 20–64 years (economically active population)
Proportion of females 16–64 years
Proportion of one-person households
Proportion of single-parent households
Proportion of two-parent households
Proportion of flats in multi-dwelling buildings
Land area, km²
Number of inhabitants per km²
Distance to own municipality centre, km
Distance to regional centre (Umeå), km
Mean income from work (thousands of SEK)
Mean disposable income 1992 (thousands of SEK)
Mean disposable income 1985 (thousands of SEK)
Change in disposable income 1985–1992
Employment intensity 1992
Employment intensity 1985
Change in employment intensity 1985–1992
Proportion of population 16–64 years with compulsory school education
Proportion of population 16–64 years with integrated upper secondary school education
Proportion of population 16–64 years with post secondary school education
Proportion of population 16–64 years with more than 2 years of post secondary school education
Number of privately owned cars per inhabitant
It is worth emphasising that the data are aggregated and do not contain any information on whether actual relationships between indicators at the microregional level are valid at the individual level. This is the well-known problem of the ecological fallacy. Because Statistics Sweden suppresses information on areas with small populations, there was some loss of information. For this reason the investigation was restricted to areas with more than 50 inhabitants and, after this reduction, 439 microregions were available for the final analysis. Since the analysis was performed on area units of different sizes, and sometimes with small populations, there was a need to evaluate the importance of geographical scale. In order to shed some light on the scale effect the same set of variables was also computed for larger regions. These regions were constructed by summarising all values within a certain distance from each microregion’s centroid. This zoning procedure was repeated for radii of 5, 10, 20, 30 and 50 kilometres. In this way a matrix of 6 × 25 variables was obtained.
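The zoning procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name zone_sums, the planar kilometre coordinates and the toy population figures are hypothetical and do not come from the study, and "summarising" is read here as summing raw counts, so that a proportion for a zone would have to be recomputed from the summed numerators and denominators rather than averaged.

```python
import math


def zone_sums(centroids, values, radii_km):
    """Sum `values` over all microregions whose centroid lies within each
    radius of a given microregion's centroid.

    centroids: list of (x_km, y_km) tuples in a planar coordinate system
               (an assumption; the chapter does not state the projection).
    values:    list of raw counts, in the same order as `centroids`.
    radii_km:  iterable of radii; radius 0 reproduces the microregion itself.
    Returns one dict {radius: summed value} per microregion.
    """
    result = []
    for x0, y0 in centroids:
        per_radius = {}
        for r in radii_km:
            total = 0.0
            for (x, y), v in zip(centroids, values):
                # Include every area whose centroid is no further than r away.
                if math.hypot(x - x0, y - y0) <= r:
                    total += v
            per_radius[r] = total
        result.append(per_radius)
    return result


# Hypothetical toy data: four microregions on a line, 4 km apart.
centroids = [(0.0, 0.0), (4.0, 0.0), (8.0, 0.0), (12.0, 0.0)]
population = [100, 200, 300, 400]

# Six spatial scales: the microregion itself plus the five radii in the text.
zones = zone_sums(centroids, population, radii_km=[0, 5, 10, 20, 30, 50])
```

Running the function once per variable would give six values per variable and microregion, which corresponds to the six spatial scales mentioned in the text. The double loop is quadratic in the number of areas, which is unproblematic for a few hundred microregions.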
13.6 NEURAL NET – LINEAR REGRESSION

From a methodological point of view it was desirable to find out whether neural net technology could enhance our understanding of a complex phenomenon, such as public health, beyond what was possible with other multivariate techniques. In this chapter back-propagation neural nets are compared with the more traditional linear regression analysis.

13.6.1 Regression analysis

The first step was to conduct stepwise multiple linear regression at the microregional level (i.e. without any variables computed from the zoning procedure). The model (Equation 13.1) could be simplified to five significant variables without over-reducing the coefficient of multiple correlation (R²):

(13.1)

where:
Yr = Ill-health rate = average number of sick-leave days
X1 = Average age of total population
X2 = Proportion of inhabitants 50–64 years in the 16–64 age-group (%)
X3 = Proportion of single-parent households (percentage units)
X4 = Employment intensity 1992 (%)
X5 = Proportion of population 16–64 years with post secondary education (%).

The same procedure was repeated for the different zonings. Even though some “new” variables also showed significance it was possible to keep the same set of variables to explain the microregional public health status at all levels. The final step was to substitute some of the “microregional” indicators with the corresponding indicators from other spatial scales. It turned out that this could not be done without accepting a substantial reduction in the R²-value.

13.6.2 Neural network analysis

The second step was to perform a similar analysis using a neural network. Since there are no good criteria for including or excluding variables in the neural network models, much experimentation was required with different sets of indicators, network architectures and numbers of neurons in hidden layers. Finally, it was decided to employ a three-layer back-propagation network and it was also necessary to reduce the number of variables. 
After some experimenting, the final choice was to use the same set of variables as in the final regression model described above, but with one-person households also included. This indicator was added because it made an important contribution to the final model and because it was also supported by other studies (Marklund, 1995). Due to the complexity of the neural net solution it is not possible to obtain coefficients of a regular equation as in regression models. It is possible, however, to compare each indicator’s relative importance to the model and to illustrate each variable’s partial effect upon the output. A sensitivity analysis was performed in order to discover the partial relationships between the input and output indicators. By feeding the trained network with slightly altered values for each variable and keeping
the rest of the indicators at the county average, it was possible to obtain the variable’s partial effect on the ill-health rate (Table 13.2). In most cases the partial relationship with the dependent variable is approximately linear and shows the same sign of coefficient as the regression equation. Nevertheless, the two variables concerning employment intensity and one-person households indicate non-linear features.

Table 13.2: Results from neural net analysis at the microregional level.

Variable                                          Weight*  Partial relationship         Signs of coefficient
Average age of total population                   20.24    Approx. linear, increasing   +
Proportion of one-person households               19.72    Non-linear                   +, −
Employment intensity 1992                         18.22    Non-linear (see Fig. 13.3)   +, −, +
Proportion of single-parent households            15.23    Approx. linear, increasing   +
Proportion with post secondary school education   13.62    Approx. linear               −
Proportion of 50–64 years in the 16–64 age-group  10.98    Approx. linear               +

*Weight = contribution factor = the sum of the absolute values of the weights leading from the single variable; a rough measure of the importance of a single variable in predicting the network’s output.
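The sensitivity analysis and the contribution factor described above can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: partial_effect and contribution_factors are hypothetical helper names, toy_model is an invented stand-in for a trained network, and the weights are made-up numbers. It is not the procedure of the software used in the study, whose exact scaling of the contribution factor is not given in the text.

```python
def partial_effect(predict, baseline, index, grid):
    """Vary one input over `grid` while all other inputs stay at `baseline`
    (e.g. the county averages) and collect the model's predictions."""
    outputs = []
    for value in grid:
        x = list(baseline)      # copy so the baseline itself is not modified
        x[index] = value
        outputs.append(predict(x))
    return outputs


def contribution_factors(input_to_hidden):
    """Sum of absolute weights leaving each input node.

    input_to_hidden[i][j] is the weight from input i to hidden neuron j.
    """
    return [sum(abs(w) for w in row) for row in input_to_hidden]


# Hypothetical stand-in for a trained network: linear in the first input
# (average age), U-shaped in the second (employment intensity, minimum at 74).
def toy_model(x):
    age, employment = x
    return 0.5 * age + 0.02 * (employment - 74.0) ** 2


baseline = [40.0, 74.0]                       # assumed "county averages"
curve = partial_effect(toy_model, baseline, index=1, grid=[64.0, 74.0, 84.0])

weights = [[0.5, -1.5], [2.0, 0.25]]          # made-up input-to-hidden weights
factors = contribution_factors(weights)
```

In this toy case the predicted value falls and then rises again as the second input passes its baseline, which is the kind of non-linear partial relationship that a single regression coefficient cannot express.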
Figure 13.3 shows the partial effect of changes in employment intensity. The straight line displays the corresponding partial effect within a linear regression model, while the curve visualises the neural net’s ability to find non-linear relationships. It is difficult, however, to find a simple interpretation of the observed non-linearity. Since almost all microregions have employment intensities above 56 per cent, the graph implies that the ill-health rate decreases with increasing employment intensity until the employment intensity rises above the county average (74 per cent), whereupon the ill-health rate increases with increasing employment intensity. However doubtful, this could indicate that there are at least two types of microregions. In the first type the employment intensity is below the county average. These areas are characterised by a declining labour market and a relatively large proportion of the population have actually obtained early retirement. In the second type of microregions the employment intensity is higher, but this also means that being visible at the place of work is less important. It is also possible that high levels of employment intensity result in persons with a poor health status, who under other circumstances would have been unemployed, being able to find a job. Similar explanations have also been proposed in Marklund (1995). The correlation with one-person households is harder to explain. It is likely that social relations are important to the individual’s health status, but whether household structure in this sense reflects social networks remains questionable. The non-linearity is even more difficult to interpret. For microregions with relatively few one-person households, the relationship with the dependent variable is positive; but in other microregions the correlation is negative. 
Even though the data do not allow a distinction to be made between different types of one-person households, it is a known fact that while some microregions are dominated by elderly persons living alone, others are dominated by younger one-person households. Therefore it is possible that one-person households somehow interact with age in the neural network.
Figure 13.3: Partial effects of changes in employment intensity.
Another interesting characteristic of this variable is that it can be substituted with the same indicator for 10 kilometre zonings, without reducing the R²-value too much. This could be interpreted as meaning that the effect of one-person households is not isolated to the single microregion or individual, but rather that it is a structural phenomenon.

13.6.3 A comparison between regression and neural net models

Linear regression and neural nets show similar results with regard to the sets of variables in the models. It is also possible to compare the variables’ relative importance between different models. In the neural net model the average age of the population showed the highest contribution factor, while in the regression model the proportion of highly educated was the single most important variable (Table 13.3).

Table 13.3: Relative importance of variables in different models.

Relative importance  Regression model: beta coefficient*               Neural net model: contribution factor
High                 Proportion with post secondary school education   Average age of total population
                     Average age of total population                   One-person households
                     Proportion 50–64 of 16–64                         Employment intensity 1992
                     Employment intensity 1992                         Proportion of single parents
                     Proportion of single parents                      Proportion with post secondary school education
Low                  –                                                 Proportion 50–64 of 16–64
R²-value (%)         56.9                                              65.4
Mean abs. error      9.9                                               9.0

* The beta coefficient is a rough measure of the variable’s relative importance in the regression model.
Seemingly the neural net stresses the importance of age and household structure, whereas the regression equation attaches greater significance to educational level and demographic variables. The neural net makes slightly better predictions, even though the model on average miscalculates the actual values by nine days. Both methods have difficulties in predicting extremely high and extremely low ill-health rates.
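The two summary measures used to compare the models, the R²-value and the mean absolute error, can be computed from observed and predicted values as in the following Python sketch. The formulas are the standard ones; whether the software used in the study computed R² in exactly this way is an assumption, and the observed and predicted figures below are invented for illustration only.

```python
def r_squared(observed, predicted):
    """Share of the variance in `observed` reproduced by `predicted`
    (1 minus residual sum of squares over total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mean_abs_error(observed, predicted):
    """Average size of the prediction error, in the units of the data
    (here: sick-leave days)."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical ill-health rates for four microregions (days per person).
observed = [40.0, 50.0, 60.0, 70.0]
predicted = [45.0, 50.0, 55.0, 70.0]

r2 = r_squared(observed, predicted)          # 1 - 50/500
mae = mean_abs_error(observed, predicted)    # (5 + 0 + 5 + 0) / 4
```

Reporting both measures is useful because R² is relative to the spread of the data, whereas the mean absolute error states the typical miss directly in sick-leave days.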
Figure 13.4: The shaded areas show where the predictions from regression analysis (above) and neural network (below) underestimate the observed ill-health rates by more than 10 per cent
13.6.4 Prediction error maps

It is obvious that a large proportion of the explanation for the microregional differences in public health is left outside the applied models. A first step in analysing these deviations is to plot the microregions where the predictions fail substantially. From a public health perspective it is of special interest to study areas with higher ill-health rates than those predicted by the models (Figure 13.4). The maps show that both methods make similar mistakes in predicting the actual ill-health rates. There seems to be a slight clustering in certain parts of the county and this could indicate specific local public health problems. Usually these microregions have relatively high ill-health rates and this stresses the importance of further investigations in these areas. These microregions could also be primary targets for
Figure 13.5: R2-values for regression and neural net models with sets of independent variables from different zonings.
public health campaigns implemented by the County Health Organisation. Similar ideas have been proposed by Jarman (1990) and Jørgensen (1990).

13.6.5 The importance of geographical scale

All attempts to replace some of the variables at the microregional level with the same variables computed for another distance failed. Only when using the neural net and replacing the microregional one-person households with the same variable for zones within a radius of 10 kilometres was the reduction in the R²-value relatively small. In one specific data run this model exceeded the “purely” microregional model substantially, but this trial could never be reproduced. In my opinion this shows that neural nets sometimes find unstable solutions. This also indicates the need for being cautious with the results of “too successful” models when using neural nets. That the microregional differences in ill-health rate could best be explained with indicators at the same level is expected. However, it is also possible to explain a substantial part of the deviations with sets of variables at other levels. Figure 13.5 shows the results of computations using regression analysis and neural networks with the previously described sets of variables but for different zonings. The “purely microregional contribution” to the model is roughly 20–25 per cent. This microregional contribution effect could have two very different interpretations. The first one indicates that the microregional level is the correct level when trying to analyse the ill-health rate and that the microregion is the population’s “natural” residential environment (neighbourhood). The second interpretation is that the effect is due to the fact that many of the microregions have relatively few inhabitants, thereby making the microregional approach sensitive to individual variations. This tends to imply that the microregional level acts as a proxy for the individual level.

13.7 CONCLUDING REMARKS

The study was performed with relatively few indicators and it is likely that some changes would have improved the explanatory power of the models. Earlier in the chapter it was noted that type of work is an
important factor when analysing sick-leaves. Even though the educational level also reflects the labour market structure to some extent, one cannot disregard the fact that such an indicator would presumably have enhanced the model. It is also likely that a subdivision into ordinary sick-leave days and early retirements would have contributed to an improved analysis. Since there are obvious differences between men and women it is possible that a subdivision into gender, for at least some of the indicators, could have improved the final models. This does not necessarily mean that another network architecture and a changed number of neurons in hidden layers would not have been able to improve the model. The selection, or rather reduction, into a few important variables did seem to be of great importance. The problem with selecting the network architecture and the number of variables is also one of the major disadvantages when using neural nets, since there are no good criteria for including or excluding variables (at least the software program used, Neuro Shell 2, did not provide such a tool). Another disadvantage is that it is impossible to grasp how the independent variables interact within the neural network. On one occasion the network provided a solution that could not be repeated and this indicates that neural networks sometimes find unstable solutions. Nevertheless, the neural network does actually provide a model that makes better predictions at the microregional level. Although there were some obvious difficulties in applying a neural net to this problem, this does not mean that neural networks would “fail” when applied to similar problems. Furthermore, the example illustrates the ability of neural nets to detect non-linear relationships. However, this chapter also shows that neural net technology is not a panacea. 
When compared with linear regression analysis the latter provides simpler solutions, but they are less sophisticated than the ones given by neural networks. It is also possible that most relationships between the ill-health rate and the set of independent variables are linear and that this is also the reason why the neural net does not outdo regression analysis. In fact, the partial effects from most independent variables suggest a linear relationship. The analysis shows that individual variations in health status are very important in understanding variations in public health between different parts of the county. Even though a set of social and economic indicators was utilised at the microregional level, almost every significant variable could only be given a reasonable interpretation at the individual level. However, the neural net analysis also suggested other relationships. The non-linear feature of employment intensity could perhaps indicate, however doubtfully, that not only the effect stemming from the individuals being employed or unemployed is important, but that the effect of the microregional employment situation is also relevant when analysing deviations in public health. However, this needs further investigation.

ACKNOWLEDGEMENTS

The chapter is based upon a research project carried out at CERUM (Centre for Regional Science at Umeå University, Sweden) and was partly financed by the Västerbotten County Health Organisation. The author wishes to thank Einar Holm and Ian Layton (Department of Social and Economic Geography, Umeå University) and three anonymous referees.

REFERENCES

BISHOP, C.M. 1995. Neural Networks for Pattern Recognition. Oxford: Clarendon.
BOVIN, B. and WANDALL, J. 1989. Sygedage-fravaer blandt ansatte i amter og kommuner. Köpenhamn: AKF Forlaget.
CAN, A. 1992. Residential quality assessment – alternative approaches using GIS, The Annals of Regional Science, 26, pp. 97–110.
DORLING, D. 1995. A New Social Atlas of Britain. Chichester: John Wiley.
GOULD, P. 1993. The Slow Plague: A Geography of the AIDS Pandemic. Cambridge, MA: Blackwell.
HAY, A.M. 1995. Concepts of equity, fairness and justice in geographical studies, Transactions of the Institute of British Geographers, 20(4), pp. 500–508.
HERTZ, J., KROGH, A. and PALMER, R.G. 1991. Introduction to the Theory of Neural Computation. Redwood City: Addison-Wesley.
HEWITSON, B.C. and CRANE, R.G. (Eds.) 1994. Neural Nets: Applications in Geography. Dordrecht: Kluwer.
HINTON, G.E. 1992. How neural networks learn from experience, Scientific American, September 1992, pp. 105–109.
JARMAN, B. 1983. Identification of underprivileged areas, British Medical Journal, 286, pp. 1705–1709.
JARMAN, B. 1990. Social Deprivation and Health Service Funding. Paper presented as a public lecture at Imperial College of Science, Technology and Medicine, University of London, 22 May 1990.
JØRGENSEN, S. 1990. Regional epidemiology and research on regional health care variations – differences and similarities, Norsk Geografisk Tidskrift, 44(4), pp. 227–235.
KEARNS, R.A. 1996. AIDS and medical geography: embracing the other? Progress in Human Geography, 20(1), pp. 123–131.
MARKLUND, S. (Ed.) 1995. Rehabilitering i ett samhällsperspektiv. Lund: Studentlitteratur.
MAYER, J.D. 1990. The centrality of medical geography to human geography: the traditions of geographical and medical geographical thought, Norsk Geografisk Tidskrift, 44(4), pp. 175–187.
MORRILL, R.L. 1995. Ageing in place, age specific migration and natural decrease, The Annals of Regional Science, 29, pp. 41–66.
NATIONAL BOARD OF HEALTH AND WELFARE 1995. Welfare and Public Health in Sweden 1994. Stockholm: Fritzes.
PACIONE, M. 1995. The geography of deprivation in rural Scotland, Transactions of the Institute of British Geographers, 20(2), pp. 173–191.
PERSSON, L.O. and WIBERG, U. 1995. Microregional Fragmentation: Contrasts Between a Welfare State and a Market Economy. Heidelberg: Physica-Verlag.
PETTERSSON, Ö. 1996. Microregional Fragmentation in a Swedish County. Paper presented at the 28th International Geographical Congress, The Hague, 4–10 August 1996.
PETTERSSON, Ö., PERSSON, L.O. and WIBERG, U. 1996. Närbilder av västerbottningar – materiella levnadsvillkor och hälsotillstånd i Västerbottens län. Regional Dimensions Working Paper No. 2, Umeå universitet: CERUM.
RIPLEY, B.D. 1994. Neural networks and related methods for classification, Journal of the Royal Statistical Society B, 56(3), pp. 409–456.
SCHOENBERGER, E. 1989. New models of regional change, in Peet, R. and Thrift, N. (Eds.), New Models in Geography, Volume I. London: Unwin Hyman, pp. 115–141.
SMITH, D.M. 1994. Geography and Social Justice. Oxford: Blackwell Publishers.
SMITH, N. 1989. Uneven development and location theory: towards a synthesis, in Peet, R. and Thrift, N. (Eds.), New Models in Geography, Volume I. London: Unwin Hyman, pp. 142–163.
SOU 1995. Långtidsutredningen 1995. Stockholm: Fritzes.
SVENSKA KOMMUNFÖRBUNDET 1994. Levnadsförhållanden i Sveriges kommuner. Stockholm: Svenska Kommunförbundet.
Chapter Fourteen
Interpolation of Severely Non-Linear Spatial Systems with Missing Data: Using Kriging and Neural Networks to Model Precipitation in Upland Areas
Joanne Cheesman and James Petch
14.1 INTRODUCTION

For strategic planning purposes, water authorities require accurate yield estimates from reservoirs; precipitation gauge interpolation results are therefore critical for providing areal precipitation estimates. However, the interpolation of precipitation amounts in remote, upland areas is one situation in which input data are severely unrepresentative. Precipitation gauge networks are usually of low density and uneven distribution with the majority of gauges located in the lowland regions of catchments. Results of using traditional interpolation techniques are seriously affected both by the complexity of theoretical data surfaces (Lam, 1983) and by the quality of data, especially their density and spatial arrangement. Typically, a standard interpolation technique will fail to model upland precipitation successfully, as the interpolation is likely to be based upon lowland gauges. The predominant influence of orography on the spatial distribution of precipitation throughout the United Kingdom has been recognised since the 1920s (Bleasdale and Chan, 1972); however, the relationships are not clearly defined. Orography complicates the estimation of mean areal precipitation in upland areas through effects such as the triggering of cloud formation and the enhancement of processes such as condensation and hydrometeor nucleation and growth. Additionally, intense, lengthy precipitation events are typically upwind of the topographic barrier or divide, with sharply decreasing magnitude and duration on the leeward side (Barros and Lettenmaier, 1994). Classical interpolation techniques make simplistic assumptions about the spatial correlation and variability of precipitation and do not handle orographic effects well (Garen et al., 1994). 
In a review of several studies evaluating various methods available for estimating areal precipitation from point values, Dingman (1994) found that optimal-interpolation/kriging methods provide the best estimates of regional precipitation in a variety of situations. It was considered that these methods performed more accurately because they are based on the spatial correlation structure of precipitation in the region of application, whereas other methods impose essentially arbitrary spatial structures. However, kriging requires a stationary field for estimation, i.e. there must be no systematic spatial trend or ‘drift’ in the mean or variance of the process; this is not the case in upland regions influenced by orographic effects. Furthermore, kriging requires a well distributed set of points to achieve optimum performance. Modern Geographical Information Systems (GIS) provide the functionality to carry out most interpolation procedures. However, the inability of interpolation procedures to account for complex, multivariate relationships and unrepresentative data continues to be a major shortcoming of current GIS,
which are considered to lack sophisticated forms of spatial analysis and modelling (Fischer, 1994a, 1994b; Fischer and Nijkamp, 1992; Goodchild, 1991). Future GIS models should be derived in the first instance from data rather than theory and they should be increasingly computationally dependent rather than analytical in nature. Such new spatial analysis approaches should be capable of determining relationships and patterns without being instructed either where to look or what to look for (Openshaw, 1992a). It has been suggested that artificial intelligence technologies such as the artificial neural network (ANN) could provide these more advanced forms of spatial analysis and modelling (Fischer, 1994a, 1994b; Openshaw, 1992a). Original studies into ANNs were inspired by the mechanisms for information processing in biological nervous systems, particularly the human brain. ANNs offer one alternative information-processing paradigm. ANNs comprise networks of very simple, usually non-linear computational units, interconnected and operating in parallel. Most real world relationships involving dynamic and spatial relations are non-linear. This complexity and non-linearity make it attractive to try the neural network approach, which is inherently suited to problems that are mathematically difficult to model. Furthermore, ANNs are reported to display great flexibility in “poor data” situations, which are often characteristic of the GIS world. ANN technology provides data-driven methodologies that can increase the modelling and analysis functionality of GIS (Fischer, 1994a, 1994b; Fischer and Gopal, 1993). Realisation of the potential of ANNs by the GIS community has been relatively slow, with the exception of a few social and economic geographers, for example Openshaw (1992b), Fischer (1994a, 1994b) and Fischer and Gopal (1993).
There has been recent progress in the Hydro-GIS field; for example, Gupta et al. (1996) integrated ANNs and GIS to characterise complex aquifer geometry and to calculate aquifer parameters for ground water modelling. Utilisation of ANNs in geoscience generally is also relatively new; applications to problems so far have included cloud classification (Lee et al., 1990), sunspot predictions (Koons and Gorney, 1990), optimising aquifer remediation for groundwater management (Rogers and Dowla, 1994), short-range rain forecasting in time and space (French et al., 1992) and synthetic inflow generation (Raman and Sunilkumar, 1995). ANNs have also been employed effectively for the classification of remotely sensed data (e.g. Benediktsson et al., 1990; Foody, 1995; Liu and Xiao, 1991). One of the simplest ways to utilise a neural network is as a form of multivariate non-linear regression to find a smooth interpolating function from a set of data points (Bishop, 1994). The data-driven generalisation approach of neural networks should also enable them to handle incomplete, noisy and imprecise data in an improved manner. In such a situation traditional statistical interpolation algorithms would not be expected to provide an adequate representation of the phenomena being studied. Furthermore, multiple input variables that are considered to have a possible relationship with the output variable, e.g. orographic influences upon precipitation amount and distribution, can be fed into the ANN. The aim of this chapter is to present preliminary progress in the evaluation of areal precipitation models, constructed using neural networks and kriging, for upland catchments where the precipitation gauge networks are of low density and uneven distribution and are mainly located in the lowland areas. The topographic nature of the catchments is varied and precipitation distribution is likely to be influenced by orographic effects.
The chapter will assess the overall interpolation performance of each model, and the success of each model in mapping precipitation falling within remote, high altitude areas. This chapter provides the first steps in a comparative evaluation of neural networks and a well established GIS interpolation model, kriging, in a classic geophysical GIS application. The study area comprises the upland regions of north-west England, including the Pennines and the Lake District, and covers an area of approximately 13,000 km².
KRIGING AND NEURAL NETS TO MODEL PRECIPITATION
14.2 ARTIFICIAL NEURAL NETWORKS
14.2.1 Theoretical Background
The ANN typically comprises a highly interconnected set of non-linear, simple information processing elements, also known as units or nodes, analogous to a neuron, that are arranged in layers. Each unit collects inputs from single and/or multiple sources and produces output in accordance with a predetermined transfer function, e.g. non-linear sigmoidal. Creation of the network is achieved by interconnecting units to produce the required configuration. There are four main features that distinguish ANNs from conventional computing and traditional Artificial Intelligence approaches (Fischer, 1994a; Fischer and Gopal, 1994):
1. inherent parallelism: information processing is inherently parallel, which provides a way to significantly increase the speed of information processing;
2. connectionist type of knowledge representation: knowledge within an ANN is not stored in specific memory locations (as with conventional computing and expert systems); knowledge is distributed throughout the system, and it is a dynamic response to the inputs and the network architecture;
3. fault tolerance: ANNs are extremely fault tolerant; they can learn from and make decisions based upon noisy, incomplete and fuzzy information; and
4. adaptive, model-free function estimation that is not algorithm dependent: ANNs require no a priori model and adaptively estimate continuous functions from data without specifying mathematically how outputs depend on inputs.
French et al. (1992) state that the above characteristics can be used to identify suitable application areas for ANNs:
1. situations in which only a few decisions are required from a massive amount of data, e.g. classification problems;
2. operations that involve large combinatorial optimisation exercises; or
3. tasks in which a complex non-linear mapping must be learned, as with the situation which is addressed in this work.
14.2.2 Advantages
There are a number of advantages characteristic of the ANN approach to problem solving (French et al., 1992):
1. application of a neural network does not require a priori knowledge of the underlying process;
2. one may not recognise all of the existing complex relationships between various aspects of the process under investigation;
3. a standard optimisation approach or statistical model provides a solution only when allowed to run to completion, whereas an ANN always converges to an optimal (or sub-optimal) solution and need not run to any pre-specified solution condition; and
4. neither constraints nor an a priori solution structure is necessarily assumed or strictly enforced in the ANN development.
Such characteristics eliminate, to a certain extent, the problems of regression-based methodologies, mainly the need for the modeller to select explanatory variables and the dependence on understanding of both local and non-local conditions.
14.2.3 Disadvantages
The principal drawbacks of ANNs include:
1. the need to provide a suitable set of example data for training purposes, and the potential problems that can occur if the ANN is required to extrapolate to new regions of the input space that are significantly different from those corresponding to the training data;
2. excessive training times: there could be several thousand weights to estimate, and convergence of the non-linear optimisation procedures tends to be very slow;
3. over-trained ANNs can learn how to reproduce random noise as well as structure; and
4. choice of ANN architecture is extremely subjective, for instance how many layers and how many neurons in each layer (Bishop, 1994; Openshaw, 1992a).
14.2.4 Structure
The ANN can be trained to solve complex, non-linear problems. In order to do this, a neural network must first learn the mapping of input to output. In a supervised approach, the weighted connections are adjusted through a learning or training process, via the presentation of known inputs and outputs, in some ordered or random manner. The strength of the interconnections is altered using an error convergence technique so that the desired output will be produced for a known set of input parameters. Once created, the interconnections stay fixed and the ANN can be used to carry out the intended work.
An ANN typically consists of an output layer, one or more hidden layers and an input layer. Each layer is made up of several nodes and the layers are interconnected by a set of weighted connections. The number of processing units in each layer and the pattern of connectivity may vary with some constraints. There is no communication between the processing units within a layer, but the processing units in each layer can send their output to the processing units in the succeeding layers. Nodes can receive inputs from either the initial inputs or from the interconnections. A feed-forward neural network with an error back propagation algorithm, first presented by Rumelhart et al. (1986), was utilised in this research. Error back propagation provides a feed forward neural network with the capacity to capture and represent relationships between patterns in a given data set. The processing units are arranged in layers, and the method takes an iterative non-linear optimisation approach, using a gradient descent search routine. Error back propagation involves two phases: a feed forward phase when the external input information at the input nodes moves forward to compute the output information signal at the output
Figure 14.1: Three Layer Neural Network Model Structure (After: French et al., 1992 and Raman and Sunilkumar, 1995)
unit(s); and a backward phase in which modifications to the strength of the connections are made based on the differences between the computed and observed information signals at the output unit(s). At the start of the learning process, the connection strengths are assigned random values. The learning algorithm modifies the strengths in each iteration until the completion of the training. On convergence of the iterative process, the collection of connection strengths captures and stores the knowledge and information present in the examples used in the training process. The trained neural network is then ready to use. When presented with a new input pattern, a feed forward network computation results in an output pattern which is the result of the synthesis and generalisation of what the ANN has learned and stored in its connection strengths. Figure 14.1 shows a three layer neural network and its input parameters: N data input patterns, each with a set of input values, $x_i$, $i = 1, \ldots, k$, at the input nodes and output values, $O_n$, $n = 1, \ldots, m$, at the output nodes. The input values, $x_i$, are multiplied by the first interconnection weights, $W_{ij}$, $j = 1, \ldots, h$, at the hidden nodes; the values are then summed over the index, $i$, and become the inputs to the hidden layer, i.e.:
$$H_j = \sum_{i=1}^{k} W_{ij}\, x_i \tag{14.1}$$
where $H_j$ is the input to the jth hidden node and $W_{ij}$ is the connection weight from the ith input node to the jth hidden node. The inputs to the hidden nodes are transformed through a non-linear activation function, usually sigmoidal, to provide a hidden node output, $HO_j$:
$$HO_j = f(H_j) = \frac{1}{1 + \exp[-(H_j + P_j)]} \tag{14.2}$$
where $H_j$ is the input to the node, $f(H_j)$ is the hidden node output, and $P_j$ is a threshold or bias that will be learned in the same way as the weights. The output, $HO_j$, is the input to the succeeding layer until the output layer is reached. This is known as forward activation flow. The input to the m output nodes, $IO_n$, is defined as:
$$IO_n = \sum_{j=1}^{h} W_{jn}\, HO_j \tag{14.3}$$
These input values are processed through a non-linear activation function (such as the sigmoidal function defined above) to give the ANN output values, $O_n$. The subsequent weight adaptation or learning process is achieved by the back propagation learning algorithm. The $O_n$ at the output layer will differ from the target value, $T_n$. The sum of the squares of error, $e_p$, for the pth input pattern is:
$$e_p = \sum_{n=1}^{m} (T_{pn} - O_{pn})^2 \tag{14.4}$$
and the mean square error (MSE), $E$, which provides the average system error over all N input patterns, is:
$$E = \frac{1}{N} \sum_{p=1}^{N} \sum_{n=1}^{m} (T_{pn} - O_{pn})^2 \tag{14.5}$$
where $T_{pn}$ is the target value, $T_n$, for the pth pattern and $O_{pn}$ is the ANN output value, $O_n$, for the pth pattern. The back propagation training algorithm is an iterative gradient algorithm designed to minimise the average squared error between values of the output, $O_{pn}$, at the output layer and the correct pattern, $T_{pn}$, provided by a teaching input. This is achieved by first computing the error gradient ($\delta_n$) for each processing element on the output layer:
$$\delta_n = O_n (1 - O_n)(T_n - O_n) \tag{14.6}$$
where $T_n$ is the correct target value for the output unit, n, and $O_n$ is the neural network output. The error gradient ($\delta_j$) is then recursively determined for the hidden layers by calculating the weighted sum of the errors at the previous layer:
$$\delta_j = HO_j (1 - HO_j) \sum_{n=1}^{m} \delta_n W_{jn} \tag{14.7}$$
The errors are propagated backwards one layer at a time until the input layer is reached, recursively applying the same procedure. The error gradients are then used to adjust the network weights:
$$w_{ji}(r+1) = w_{ji}(r) + \Delta w_{ji}(r) \tag{14.8}$$
$$\Delta w_{ji}(r) = \beta\, \delta_j\, x_i \tag{14.9}$$
where r is the iteration number, $w_{ji}(r)$ is the weight from hidden node i or from an input to node j at iteration r, $x_i$ is either the output of node i or is an input, $\delta_j$ is an error term for node j, and $\beta$ is the learning rate or gain term providing the size of step during the gradient descent. The learning rate determines the rate at which the weights are allowed to change at any given presentation. Higher learning rates result in faster convergence, but can result in non-convergence.
Slower learning rates produce more reliable results but require increased training time. Generally, to ensure rapid convergence, large step sizes which do not lead to oscillations are used. Convergence is sometimes faster if a momentum term is added and weight changes are smoothed by
$$w_{ji}(r+1) = w_{ji}(r) + \beta\, \delta_j\, x_i + \alpha\,[w_{ji}(r) - w_{ji}(r-1)] \tag{14.10}$$
where $\alpha$ ($0 < \alpha < 1$) is the momentum coefficient.
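The forward and backward phases described above can be illustrated with a minimal sketch in plain Python. It is not the software used in this study: the function names, the toy one-input data set, the network size (1-3-1), the learning rate of 0.5 and the momentum of 0.5 are all assumptions chosen for illustration, and the bias of each node is stored as an extra weight fed by a constant input of 1.

```python
import math
import random


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_in, n_hidden, n_out, seed=0):
    """Random initial connection strengths; the extra row of each matrix holds the biases."""
    rng = random.Random(seed)

    def rand_matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"w_ih": rand_matrix(n_in + 1, n_hidden),
            "w_ho": rand_matrix(n_hidden + 1, n_out)}


def layer(inputs, weights):
    """Weighted sum over the inputs, then the sigmoid, for every node in a layer."""
    return [sigmoid(sum(v * weights[i][j] for i, v in enumerate(inputs)))
            for j in range(len(weights[0]))]


def forward(net, x):
    """Feed-forward phase: returns the hidden-node outputs and the network outputs."""
    hidden = layer(list(x) + [1.0], net["w_ih"])
    out = layer(hidden + [1.0], net["w_ho"])
    return hidden, out


def train_pattern(net, x, target, rate=0.5, momentum=0.0, prev=None):
    """Backward phase for one pattern; returns the weight changes that were applied."""
    hidden, out = forward(net, x)
    xb, hb = list(x) + [1.0], hidden + [1.0]
    # Output-layer error terms for logistic sigmoid units.
    d_out = [o * (1.0 - o) * (t - o) for o, t in zip(out, target)]
    # Hidden-layer error terms: weighted sum of the error terms in the layer above.
    d_hid = [hidden[j] * (1.0 - hidden[j]) *
             sum(d_out[n] * net["w_ho"][j][n] for n in range(len(d_out)))
             for j in range(len(hidden))]
    step = {"w_ho": [[rate * d * h for d in d_out] for h in hb],
            "w_ih": [[rate * d * v for d in d_hid] for v in xb]}
    for key in step:
        for a, row in enumerate(step[key]):
            for b in range(len(row)):
                if prev is not None:
                    row[b] += momentum * prev[key][a][b]  # smooth with the previous change
                net[key][a][b] += row[b]
    return step


def mse(net, patterns):
    """Mean over patterns of the summed squared output error."""
    total = 0.0
    for x, t in patterns:
        _, out = forward(net, x)
        total += sum((ti - oi) ** 2 for ti, oi in zip(t, out))
    return total / len(patterns)


# Toy data (hypothetical): one normalised input, one normalised target.
patterns = [([x / 4.0], [0.2 + 0.6 * x / 4.0]) for x in range(5)]
net = init_network(1, 3, 1, seed=1)
before = mse(net, patterns)
prev = None
for epoch in range(2000):
    for x, t in patterns:
        prev = train_pattern(net, x, t, rate=0.5, momentum=0.5, prev=prev)
after = mse(net, patterns)
print(before, after)
```

Adding the previous weight change scaled by the momentum is equivalent to adding a multiple of the difference between the current and previous weights, since that difference is the previous change.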
14.2.5 Training the network
In order to train a neural network, inputs to the model are provided, the output is computed, and the interconnection weights are adjusted until the desired output is reached. The number of input, hidden and output nodes (the architecture of the network) used depends upon the particular problem being studied; however, whilst the number of input and output nodes is determined by the input and output variables, no well-defined algorithm exists for determining the optimal number of hidden layers and the number of nodes in each. The performance advantage gained from increasing relational complexity by using more hidden nodes must be balanced with maintaining a relatively short training time. The error back-propagation algorithm is employed to train the network, using the mean square error (MSE) over the training samples as the objective function. The data are divided into two sets, one for training and one for testing.
14.2.6 Testing the network
How well an ANN performs when presented with new data that did not form part of the training set is known as generalisation. Some of the data are set apart or suppressed when building the training data set. These observations are not fed into the network during training. They are used after training in order to test the network and evaluate its performance on test samples in terms of the mean square error criterion.
14.3 ARTIFICIAL NEURAL NETWORK IMPLEMENTATION
In order to account for orographic influences upon the spatial distribution of precipitation falling within an upland catchment, various topographic variables are incorporated in the implementation of the ANN. These variables are obtained from a digital elevation model (DEM) that can also be used within a GIS to derive aspect and slope values for the area covered. The target output values of the ANN are long-term annual average rainfall (LAR), 1961–90.
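The hold-out procedure of Sections 14.2.5 and 14.2.6 can be sketched as follows. This is an illustrative outline only: `split_half`, `hold_out` and the stand-in model `fit_mean` (which simply predicts the training mean) are hypothetical names, and the records are synthetic placeholders rather than gauge data.

```python
import random


def split_half(records, seed=0):
    """Randomly divide the records into two equal-sized sets (training, testing)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean_square_error(predict, records):
    """MSE of a prediction function over (inputs, target) records."""
    return sum((target - predict(inputs)) ** 2 for inputs, target in records) / len(records)


def hold_out(records, fit, seed=0):
    """Fit on the training half only, then report MSE on both halves."""
    train, test = split_half(records, seed)
    predict = fit(train)  # the test records are never shown to the model
    return mean_square_error(predict, train), mean_square_error(predict, test)


def fit_mean(train):
    """Stand-in model (an assumption): always predicts the training-set mean."""
    mean = sum(target for _, target in train) / len(train)
    return lambda inputs: mean


# Synthetic placeholder records: (inputs, normalised target).
records = [((i,), (i % 7) / 7.0) for i in range(1384)]
train_mse, test_mse = hold_out(records, fit_mean)
print(train_mse, test_mse)
```

A test MSE that is much larger than the training MSE would indicate poor generalisation, which is the quantity this procedure is designed to expose.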
In this study we investigated the problem of predicting rainfall amounts, p, from six input variables, four of which are derived from the DEM, using in-built GIS functions. The variables are denoted x (UK National Grid Easting of rain gauge), y (UK National Grid Northing of rain gauge), e (elevation at the rain gauge), s (angle of slope at the rain gauge), $a_{\sin}$ (sine value of east-west aspect, at the rain gauge) and $a_{\cos}$ (cosine value of north-south aspect, at the rain gauge). The aim was to predict precipitation, p, from knowledge of these variables using a functional relationship of the form
$$p = f(x, y, e, s, a_{\sin}, a_{\cos}) \tag{14.11}$$
A four layer network with one input layer, two hidden layers and one output layer was constructed. Each hidden layer had 12 hidden nodes, giving a network architecture of 6-12-12-1. The precipitation data set consisted of 1384 gauges for a 13,000 km² area of north-west England. This set was randomly divided into two sets each of 692 gauges, one for training and one for testing. As the aim of this chapter is to evaluate the performance of the two models in estimating precipitation amounts in the higher altitude zones based upon predominantly lowland located gauges, the data set was also split into four sets of gauges falling within different altitude zones: Zone A 0–99 m, an area of 3815.35 km² with 385 rain gauges; Zone B 100–349 m,
an area of 6505.4 km² with 780 rain gauges; Zone C 350–599 m, an area of 2364.58 km² with 204 rain gauges; and Zone D 600–1000 m, an area of 318.17 km² with 15 rain gauges. These data sets were used in the same manner as the test set in order to facilitate a comparison of the results. Note that normalisation between 0 and 1 formed part of the pre-processing of all input and target variables, thus creating similar dynamic ranges, to assist the modelling process and avoid inter-variable bias. A logistic sigmoid activation function was used. Initially the learning rate, β, of the network was set at 0.8 for 20000 epochs; after each set of 20000 epochs the learning rate was reduced by 0.2. This reduction of the learning rate was done to assist convergence toward an optimal solution. Using a high learning rate early on, the aim was to approach the neighbourhood of the optimal solution rapidly, then decrease the learning rate to allow smooth convergence toward the final solution. An epoch is a training cycle in which all data in the training set are presented to the ANN. The sum squared error (SSE) and mean squared error (MSE) were recorded at every 20000 epochs and finally at 50000 epochs. When training was completed the weights were collected from the training sample to test the network and evaluate its performance on the test data and the altitudinal test sets in terms of SSE, MSE and correlation, as shown in Table 14.1. The real LAR values are plotted against the modelled values (see Figure 14.2).
14.4 KRIGING
Kriging is an optimal spatial interpolation procedure for estimating the values of a variable at unmeasured points from nearby measurements. Kriging regards the surface to be interpolated as a regionalised variable that has a degree of continuity, i.e. there is a structure of spatial autocorrelation or a dependence between sample data values that decreases with their distance apart.
These characteristics of regionalised variables are quantified by the variogram, a function that describes the spatial correlation structure of the data. The variogram is the variance of the difference between data values separated by a distance, h, and is calculated as follows:
$$2\hat{\gamma}(h) = \frac{1}{n} \sum_{i=1}^{n} \left[ Y(x_i + h) - Y(x_i) \right]^2 \tag{14.12}$$
where $2\hat{\gamma}(h)$ is the sample estimate of the variogram, h is the distance between data sites, x is a vector in a two-coordinate system providing the spatial location of a data site, $Y(x)$ is the data value at point x, and n is the number of site pairs separated by the distance h. The function $\gamma(h)$ is known as the semi-variogram. The values of this function are actually used in the kriging calculations to estimate the unknown values. The semi-variogram is usually modelled by one of several analytic functions including spherical, circular, exponential, gaussian and linear. The estimate of a data value at an unmeasured point, $\hat{Y}$, is a weighted sum of the available measurements:
$$\hat{Y} = \sum_{i=1}^{m} w_i Y_i \tag{14.13}$$
where $w_i$ is the weight for measurement $Y_i$, m is the number of measurements, and
$$\sum_{i=1}^{m} w_i = 1 \tag{14.14}$$
Kriging is the algorithm for determining the weights $w_i$ such that the estimate has minimum variance. This is a Lagrangian optimisation problem that requires the solution of a system of linear equations.
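The linear system behind the kriging weights can be illustrated with a small ordinary kriging sketch in plain Python. It is a teaching outline under stated assumptions, not the GIS routine used in this study: the spherical model parameters (nugget 0, sill 1, range 3), the four sample points and their values are invented, and the small Gaussian elimination solver stands in for a proper linear algebra library.

```python
import math


def spherical(h, nugget=0.0, sill=1.0, rng=3.0):
    """Spherical semi-variogram model; gamma(0) is defined as 0."""
    if h == 0.0:
        return 0.0
    if h >= rng:
        return sill
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular kriging system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def ordinary_kriging(points, values, target, gamma=spherical):
    """Return (estimate, weights) at `target`; the weights are constrained to sum to one."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    m = len(points)
    # Semi-variogram values between samples, bordered by the Lagrange-multiplier row/column.
    a = [[gamma(dist(points[i], points[j])) for j in range(m)] + [1.0] for i in range(m)]
    a.append([1.0] * m + [0.0])
    b = [gamma(dist(p, target)) for p in points] + [1.0]
    weights = solve(a, b)[:m]  # the last unknown is the Lagrange multiplier
    estimate = sum(w * v for w, v in zip(weights, values))
    return estimate, weights


# Hypothetical sample sites and values.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
values = [10.0, 12.0, 11.0, 20.0]
estimate, weights = ordinary_kriging(points, values, (0.5, 0.5))
print(estimate, sum(weights))
```

Calling `ordinary_kriging` with a target that coincides with a sample site returns that site's value, because the right-hand side then equals one column of the matrix and all the weight goes to that sample.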
Kriging is considered to be an improvement over other interpolation methods because the degree of interdependence of the sample points is taken into consideration. The kriging estimate is based on the structural characteristics of the point data, which are summarised in the variogram function, and thus results in an optimal unbiased estimate (Delhomme, 1978). Furthermore, kriging provides an estimate of the error and confidence interval for each of the unknown points. Kriging was first developed for use in the mining industry and has subsequently found widespread use in geology and hydrology (it is sometimes referred to as “geostatistics”). Examples of precipitation studies that have utilised kriging include Delfiner and Delhomme (1975), Chua and Bras (1982), Creutin and Obled (1982), Lebel et al. (1987), Dingman et al. (1988) and Garen et al. (1994).
14.5 IMPLEMENTATION OF KRIGING
The spatial correlation structure of precipitation is modelled in kriging by the semi-variogram. Each empirical semi-variogram was examined in order to find an adequate semi-variogram to model the residuals, and the spherical model was chosen. In order to allow a comparative study of ANNs and kriging, the same data sets used for training and testing the ANNs were used to create and test the kriging model. Kriging interpolation was performed using the 692 randomly selected rain gauges and was subsequently tested on the 692 gauges of the test data set and then on the altitude zone sets. The SSE, MSE and correlation were calculated, as shown in Table 14.1.

Table 14.1: Evaluation Measures of the Artificial Neural Network and Kriging Models

                  Training Set   Testing Set   Zone A      Zone B      Zone C      Zone D
Artificial Neural Network
  SSE             0.27888        2.049794      0.295729    1.460579    0.48589     0.092649
  MSE             0.000403       0.002962      0.000768    0.001873    0.000623    0.006177
  CORR            0.98607        0.929808      0.975158    0.925218    0.958466    0.935093
Kriging
  SSE             0              0.609059      0.027621    0.181528    0.232096    0.169811
  MSE             0              0.000888      0.0000723   0.000233    0.001143    0.011321
  CORR            1              0.979072      0.997631    0.99048     0.983369    0.919882
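The three evaluation measures, and the grouping of gauges into altitude zones, can be sketched as below. The helper names (`zone_of`, `evaluate`, `evaluate_by_zone`) and the sample records are hypothetical, the correlation is assumed to be the Pearson coefficient, and the zone boundaries follow the ranges given in Section 14.3, with elevations between two stated ranges (e.g. 99.5 m) assigned to the lower zone as an assumption. The MSE here is the SSE divided by the number of gauges, which agrees with the training-set figures in Table 14.1 (0.27888 / 692 ≈ 0.000403).

```python
import math

# Upper elevation limits (m, exclusive) for zones A to C; zone D extends to 1000 m.
ZONE_LIMITS = [("A", 100.0), ("B", 350.0), ("C", 600.0)]


def zone_of(elevation_m):
    """Altitude zone label for a gauge elevation in metres."""
    for name, upper in ZONE_LIMITS:
        if elevation_m < upper:
            return name
    return "D"


def evaluate(real, modelled):
    """SSE, MSE and Pearson correlation between real and modelled values."""
    n = len(real)
    sse = sum((r - m) ** 2 for r, m in zip(real, modelled))
    mean_r, mean_m = sum(real) / n, sum(modelled) / n
    cov = sum((r - mean_r) * (m - mean_m) for r, m in zip(real, modelled))
    var_r = sum((r - mean_r) ** 2 for r in real)
    var_m = sum((m - mean_m) ** 2 for m in modelled)
    corr = cov / math.sqrt(var_r * var_m) if var_r > 0 and var_m > 0 else float("nan")
    return {"sse": sse, "mse": sse / n, "corr": corr}


def evaluate_by_zone(gauges):
    """gauges: iterable of (elevation_m, real, modelled); returns measures per zone."""
    grouped = {}
    for elevation, real, modelled in gauges:
        pair = grouped.setdefault(zone_of(elevation), ([], []))
        pair[0].append(real)
        pair[1].append(modelled)
    return {zone: evaluate(real, modelled) for zone, (real, modelled) in grouped.items()}


# Hypothetical normalised values for a handful of gauges.
gauges = [(20.0, 0.10, 0.12), (80.0, 0.15, 0.14), (90.0, 0.22, 0.20),
          (250.0, 0.30, 0.33), (300.0, 0.35, 0.36), (340.0, 0.42, 0.40),
          (700.0, 0.80, 0.70), (850.0, 0.90, 0.95)]
print(evaluate_by_zone(gauges))
```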
14.6 RESULTS
Performance measures were computed separately for ANN and kriging interpolation of the training data, test data and the altitude validation data (zone A, zone B, zone C and zone D). These measures indicate how well the ANN learned the input patterns, how well kriging interpolated the points, and the degree to which each method can be used to predict rainfall amounts at gauges not included in the training process. The evaluation measures include comparison of the real precipitation data with those modelled by the ANN and kriging, using the SSE, MSE, correlation between the real LAR values and the modelled LAR values, and real and modelled LAR values plotted against each other. Figure 14.2 provides a visual comparison of the real LAR values (normalised) and the modelled LAR values (normalised) for the training, testing, zone A, zone B, zone C and zone D data sets for each model.
Figure 14.2: Real LAR values (normalised) plotted against the modelled LAR values (normalised) for the training, testing, zone A, zone B, zone C and zone D data
In each figure, perfectly modelled LAR values lie on the line. Figures 14.2a and 14.2g show that with both
models the training data performed more successfully than any of the testing datasets, although this would be expected. The training data for kriging show a perfect relationship between the actual and modelled values; this is because kriging is an exact interpolation procedure, meaning that the interpolated surface fits exactly through the values of the sample points. The neural network training data (Figure 14.2a) show that most of the modelled values fit closely to the real values, although there are some outliers. In order to cope with such noisy data, more iterations of the neural network are required. Both models display a clustering pattern within the data: it is possible that this is due to the nature of the original data, i.e. 30 years of long-term average data. Any trends or extremes within this period may be smoothed out by the averaging procedure. Comparison of the methods within the test data, zone A, zone B and zone C sets shows that kriging provided the best results, with fewer outliers and closer fits to the real LAR values. However, in zone D (the highest altitude zone) the neural network outperformed kriging, with an SSE of 0.092649 compared to 0.169811 and an MSE of 0.006177 compared to 0.011321. Table 14.1 shows that kriging achieves smaller SSE and MSE results with all the data sets except zone D. This is particularly noticeable when comparing the test data set, where the neural network achieved an SSE of 2.049794 and an MSE of 0.002962, considerably higher than those of kriging, 0.609059 (SSE) and 0.000888 (MSE). The ANN SSE and MSE could be improved with further iterations, although overtraining must be avoided (Bishop, 1994). The correlation coefficient results provide further evidence that kriging has resulted in a closer relationship between the actual and the modelled values, except in zone D.
14.7 CONCLUSION
An ANN was developed for the interpolation of precipitation using topographic variables as inputs.
Kriging was also used to create a surface using the same data in order to provide a comparison of the performance. The results show that overall kriging provided the most accurate surface except in the highest altitude zone, where points of known data were scarce. These initial results are encouraging, suggesting that neural networks can provide a robust model in situations where data scarcity is an issue. Possible improvements to the ANN could be made by increasing the number of epochs or hidden nodes in order to reduce the SSE further, and by using actual precipitation data as opposed to average data. Overall, the study has shown that a global ANN model can be employed successfully to interpolate data points of low density and uneven spatial distribution based upon topographic variables with no a priori knowledge of the multivariate relationships. Further work needs to be undertaken to investigate how well ANN models based upon gauges falling within lower altitude zones can predict values within higher altitude ranges and vice versa. This will provide further evidence about the ability of an ANN to extrapolate results to areas of data scarcity. This study is by no means conclusive as to the ability of an ANN approach to precipitation interpolation and only provides an initial step towards the understanding and evaluation of the role for ANNs in spatial modelling of geophysical processes. Further studies should be carried out using simulated as well as real data, different precipitation data such as annual or monthly totals, seasonal variation of the precipitation data, a selection of several spatial distributions of gauges, stability of the methods versus the scale of analysis (topoclimatology), time series data and various neural network configurations before any conclusive statement can be produced.
ACKNOWLEDGEMENTS
The authors would like to thank North West Water for supplying the data and funding the project and Bob Abrahart of the Medalus Project, School of Geography, University of Leeds for providing his expertise and time in the development of the ANN.
REFERENCES
BARROS, A.P. and LETTENMAIER, D.P. 1994. Dynamic modeling of orographically induced precipitation, Reviews of Geophysics, 32(3), pp. 265–284.
BENEDIKTSSON, J.A., SWAIN, P.H. and ERSOY, O.K. 1990. Neural network approaches versus statistical methods in classification of multisource remote sensing data, IEEE Transactions on Geoscience and Remote Sensing, 28(4), pp. 540–551.
BISHOP, C.M. 1994. Neural networks and their applications, Review of Scientific Instruments, 65(6), pp. 1803–1832.
BLEASDALE, A. and CHAN, Y.K. 1972. Orographic influences on the distribution of precipitation, World Meteorological Organisation Publication, 326(326), pp. 322–333.
CHUA, S.H. and BRAS, R.L. 1982. Optimal estimators of mean areal precipitation in regions of orographic influence, Journal of Hydrology, 57, pp. 23–48.
CREUTIN, J.D. and OBLED, C. 1982. Objective analysis and mapping techniques for rainfall fields: an objective comparison, Water Resources Research, 18(2), pp. 413–431.
DELFINER, P. and DELHOMME, J.P. 1975. Optimum interpolation by kriging, in David, J.C. (Ed.) Display and Analysis of Spatial Data. NATO Advanced Study Institute: John Wiley and Sons, pp. 96–114.
DELHOMME, J.P. 1978. Kriging in the hydrosciences, Advances in Water Resources, 1(5), pp. 251–266.
DINGMAN, S.L. 1994. Physical Hydrology. New York: Macmillan.
DINGMAN, S.L., SEELY-REYNOLDS, D.M. and REYNOLDS, III R.C. 1988. Application of kriging to estimating mean annual precipitation in a region of orographic influence, Water Resources Bulletin, 24(2), pp. 329–399.
FISCHER, M.M. 1994a. From conventional to knowledge-based geographic information systems, Computers, Environment and Urban Systems, 18(4), pp. 233–242.
FISCHER, M.M. 1994b. Expert systems and artificial neural networks for spatial analysis and modelling: Essential components for knowledge-based Geographical Information Systems, Geographical Systems, 1, pp. 221–235.
FISCHER, M.M. and GOPAL, S. 1993. Neurocomputing—a new paradigm for geographic information processing, Environment and Planning A, 25, pp. 757–760.
FISCHER, M.M. and GOPAL, S. 1994. Artificial neural networks: a new approach to modeling interregional telecommunication flows, Journal of Regional Science, 34(4), pp. 503–527.
FISCHER, M.M. and NIJKAMP, P. (Eds.). 1992. Geographic Information Systems, Spatial Modelling, and Policy Evaluation. Berlin: Springer-Verlag.
FOODY, G.M. 1995. Land cover classification by an artificial neural network with ancillary information, International Journal of Geographical Information Systems, 9(5), pp. 527–542.
FRENCH, M.N., KRAJEWSKI, W.F. and CUYKENDALL, R.R. 1992. Rainfall forecasting in space and time using a neural network, Journal of Hydrology, 137, pp. 1–31.
GAREN, D.C., JOHNSON, G.L. and HANSON, C.L. 1994. Mean areal precipitation for daily hydrologic modeling in mountainous regions, Water Resources Bulletin, 30(3), pp. 481–491.
GOODCHILD, M.F. 1991. Progress on the GIS research agenda, in EGIS '91: Proceedings Second European Conference on Geographical Information Systems, Volume 1. Utrecht: EGIS Foundation, pp. 342–350.
GUPTA, A.D., GANGOPADHYAY, S., GAUTAM, T.R. and ONTA, P.R. 1996. Aquifer characterization using an integrated GIS-neural network approach, in Proceedings of HydroGIS 96: Application of Geographic Information Systems in Hydrology and Water Resources Management, Vienna, April. Oxfordshire: IAHS Publication no. 235, pp. 513–519.
KOONS, H.C. and GORNEY, D.J. 1990. A sunspot maximum prediction using a neural network, EOS Transactions American Geophysical Union, 71(18), pp. 677–688.
LAM, N.S. 1983. Spatial interpolation methods: a review, The American Cartographer, 10(2), pp. 129–149.
LEBEL, T., BASTIN, G., OBLED, C. and CREUTIN, J.D. 1987. On the accuracy of areal rainfall estimation: a case study, Water Resources Research, 23(11), pp. 2123–2134.
LEE, J., WEGER, R.C., SENGUPTA, S.K. and WELCH, R.M. 1990. A neural network approach to cloud classification, IEEE Transactions Geoscience and Remote Sensing, 28(5), pp. 846–855.
LIU, Z.K. and XIAO, J.Y. 1991. Classification of remotely-sensed image data using artificial neural networks, International Journal of Remote Sensing, 12(11), pp. 2433–2438.
OPENSHAW, S. 1992a. Some suggestions concerning the development of artificial intelligence tools for spatial modelling and analysis in GIS, in Fischer, M.M. and Nijkamp, P. (Eds.) Geographic Information Systems, Spatial Modelling, and Policy Evaluation. Berlin: Springer-Verlag, pp. 17–33.
OPENSHAW, S. 1992b. Modelling spatial interaction using a neural net, in Fischer, M.M. and Nijkamp, P. (Eds.) Geographic Information Systems, Spatial Modelling, and Policy Evaluation. Berlin: Springer-Verlag, pp. 147–164.
RAMAN, H. and SUNILKUMAR, N. 1995. Multivariate modeling of water resources time series using artificial neural networks, Journal of Hydrological Sciences, 40(2), pp. 145–163.
ROGERS, L.L. and DOWLA, F.U. 1994. Optimization of groundwater remediation using artificial neural networks with parallel solute transport modeling, Water Resources Research, 30(2), pp. 457–481.
RUMELHART, D.E., HINTON, G.E. and WILLIAMS, R.J. 1986. Learning representations by back-propagating errors, Nature, 323, pp. 533–536.
Chapter Fifteen Time and Space in Network Data Structures for Hydrological Modelling Vit Vozenilek
15.1 INTRODUCTION

In GIS terms, river networks are systems of connected linear features that form a framework through which water flows. The flow of any resource in the network depends on more than the linear conduits of movement: movement between connecting lines is affected by other elements of the network and their characteristics. It is possible to carry out some hydrological modelling directly within GIS, so long as time variability is not needed. One way of eliminating time as a variable is to take a snapshot at the peak flow condition and to model it by assuming that the discharge is at its peak value throughout the system (Maidment, 1993). It is thus possible to route water through GIS networks using analogies to traffic flow routing, in which each arc is assigned an impedance measured by flow time or distance and flow is accumulated going downstream through the network.

Extensions to GIS are needed to put network analysis into practice, and some GIS offer such tools. For example, PC NETWORK from ESRI provides the flexibility to support the requirements of a broad range of applications. Hydrologists can simulate sources of water pollution and analyse their downstream effects. Networks can also be used to model storm runoff. This chapter explores the use of GIS for this purpose, showing how such tools can be used to represent the complexity of a river network.

15.2 SPACE AND TIME OF THE HYDROLOGICAL PHENOMENA IN RIVER NETWORK SYSTEMS

Most hydrological phenomena are linked to river networks. These networks are the result of long-term endogenic and exogenic landscape processes. They are determined by many factors, including parameters of the earth's surface, soil and rock characteristics, climatic conditions and others. Many geographical, hydrological and ecological topics can be investigated in the river network structure, for example flood prediction, water pollution management or water quality assessment.
Transport processes are characterised by rate of advection, dispersion and transformation of the transported constituent. Advection refers to the passive motion of a transported constituent at the same velocity as the flow. This is the simplest motion that one can conceive, but it is a reasonable approximation, particularly in groundwater flow, where this approximation is used to determine how long it will take leakage from a
contaminant source to flow through the groundwater system to a strategic point, such as a drinking water well, a lake or a river. Dispersion refers to the spreading of the constituent because of mixing in the water as it flows along. In surface water flow, dispersion occurs because eddy motion mixes masses of water containing different concentrations of the constituent. The transformation of constituents by degradation, absorption or chemical reaction can change the concentration from that predicted by the advection-dispersion equation, usually resulting in decreased concentrations as the constituent leaves the solution (Goodchild et al., 1993).

Each application of network analysis requires a special data structure, models and set of algorithms (Bishr and Radvan, 1995). All of these require an accurate understanding of the space and time of river networks in their full complexity. The model creation is determined by the data structure and gives the basic features of the algorithms.

15.2.1 Space

It seems theoretically possible that one-dimensional and two-dimensional steady flow computations could be done explicitly based on a GIS database of river channels (for 1D flow) or shallow lakes, estuaries or aquifer systems (for 2D flow) (Maidment, 1993). That is not done in commonly used GIS at this time, however. Incorporating these flow descriptions for groundwater models within GIS would best be done using the analytical solutions which exist for many different types of groundwater configurations. The motion of water is a truly 3D phenomenon and all attempts to approximate that motion in a space of fewer dimensions are just that: approximations.

Understanding the network in terms of space involves more than network morphology. To bring complex natural phenomena within the scope of river network analysis, there are several levels of dimensionality (see Figure 15.1). The first level, 3D+T space, describes all natural phenomena in their real nature.
They can be defined by the function well known as the interpretation of 4D space:

$$f(x, y, z, t) \tag{15.1}$$

where x, y and z are the spatial coordinates and t is the time parameter.

The second level, 2D+T space, is derived from 3D+T space to simplify phenomena in network analyses. This simplification is made to fit the data structures. The 2D+T level is a derivation of the function f. The description can be expressed by the function f:

$$f(x, y, t) \tag{15.2}$$

where x and y are the spatial coordinates and t is the time parameter, identical to that of 3D+T space. An example is the map representation of spreading rain or any other resource event. The acronym expresses two spatial axes and one temporal axis and is more precise than the acronym 3D.

The third level, 1D+T space, is the final simplification used to simulate the movement of the investigated material in the river. It is derived from the levels above. The description can be expressed by the function f':

$$f'(x, t) \tag{15.3}$$

where x is the spatial coordinate and t is the time parameter, identical to that of 3D+T and 2D+T spaces. An example is water flow in the river bed. The acronym expresses one spatial axis and one temporal axis and is more precise than the acronym 2D.
Figure 15.1: Three Levels of Dimensionality
15.2.2 Time

Hydrological phenomena are driven by rainfall and are thus always time dependent, even though a steady-state model can be created by taking snapshots at particular points in time or by time averaging over long periods (Bonham-Carter, 1996). Time is the basic parameter in the concept of dynamic modelling. Space can be stretched but time must stay continuous. It is only possible to split continuous time into a set of intervals. This simplification allows dynamic modelling with the available data structures. The following temporal items in the structures are needed to simulate real processes in time:
• Time Initial: defines the initial time of the phenomenon state. It is calculated from 2D+T space models and is used for the calculation of any temporal parameters of the phenomenon,
• F-node Time: defines the time of the phenomenon state at the F-node point of any network link,
• T-node Time: defines the time of the phenomenon state at the T-node point of any network link,
• Demand Time: defines the time associated with any feature and can be specified for every arc and stop in the network.

The basic assumption accepts intervals limited by the following times:

• t0: the initial situation when the resource event starts; the primary calculations establish the starting state S0,
• t1: calculation of state S1 according to the distribution and intensity of the resource event, and calculation of T-node time,
• t2: calculation of state S2 according to the distribution and intensity of the resource event, and calculation of T-node time,
• t3: calculation of state S3 according to the distribution and intensity of the resource event, and calculation of T-node time, etc.

The general aspects of space and time in river network analyses can be described as follows:

1. The initial time can vary with the direction and speed of the resource event (rain, pollution, erosion).
2. The initial time must be stable for each link in the network.
3. The variables of the phenomena investigated include:
• the state parameters of river network links (slope, roughness, depth, bank regulation, width and cross-section of the river bed, etc.),
• the morphology of the river network (length of links, number of confluences and their radius, etc.),
• the spatial and temporal changes of the resource event (rain can hit only a part of the catchment, rain can stop and then start again, rain can have varying intensity, etc.),
• the morphology of the surroundings of the streams (fields at the banks steep and/or flat, rough and/or covered by grass, large and/or small, etc.).
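The temporal items listed above can be pictured with a small sketch. It is only an illustration under stated assumptions: the `Link` class, the `node_times` and `reached_links` helpers and the constant mean velocity per link are hypothetical, and the chapter does not prescribe how travel time is derived from link attributes.

```python
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    length_m: float     # link length in metres
    velocity_ms: float  # assumed mean flow velocity (not given in the chapter)


def node_times(path, time_initial_s=0.0):
    """Return {link name: (F-node time, T-node time)} along a downstream path."""
    times = {}
    t = time_initial_s
    for link in path:
        f_time = t
        t_time = f_time + link.length_m / link.velocity_ms
        times[link.name] = (f_time, t_time)
        t = t_time  # the T-node of this link is the F-node of the next one
    return times


def reached_links(times, t):
    """Links whose F-node has been reached at time t (a crude state S at time t)."""
    return [name for name, (f_time, _) in times.items() if f_time <= t]


path = [Link("a", 1000.0, 0.5), Link("b", 600.0, 1.0), Link("c", 1500.0, 0.75)]
times = node_times(path)
print(times)                         # F-node and T-node times in seconds
print(reached_links(times, 2000.0))  # links reached at the end of the first interval
```

Splitting continuous time into intervals then amounts to calling `reached_links` at each interval boundary t0, t1, t2 and so on.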
15.2.3 Models

Network analysis simulating mass movement in the network requires exactly defined paths in the river network (Vozenilek, 1994a). They can be derived from the stream order in the river system, and their direction from spring to confluence can be obtained by changing the link directions. The simplest analysis is a calculation of the time taken for water to flow from one point to another in the river network. Greater problems can arise in the calculation of water volume at confluences. To create a model for this task the algorithm involves two steps:

1. Identification of paths in the river network from springs to confluence and classification of the river order (by Strahler) (Haggett and Chorley, 1969).
2. Calculation of water volumes for each path, starting with the tributaries of the highest order up to the main river; after each path calculation the computed water volume is added to the path of the highest order at the confluence.
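The two steps above can be sketched as follows. This is a hedged illustration, not the chapter's implementation: the network is assumed to be a simple tree given as a mapping from each link to its downstream link, the function name `strahler_and_volume` and the sample link names are hypothetical, and volumes are accumulated by an upstream-first traversal rather than path by path.

```python
from collections import defaultdict


def strahler_and_volume(downstream, inflow):
    """downstream: {link: next link or None}; inflow: {link: local water volume}."""
    upstream = defaultdict(list)
    for link, nxt in downstream.items():
        if nxt is not None:
            upstream[nxt].append(link)

    order, volume = {}, {}

    def visit(link):
        if link in order:
            return
        ups = upstream[link]
        for up in ups:
            visit(up)  # tributaries are resolved before the link they join
        if not ups:
            order[link] = 1  # headwater link
        else:
            top = max(order[u] for u in ups)
            ties = sum(1 for u in ups if order[u] == top)
            # Strahler rule: the order rises only where two equal top orders meet
            order[link] = top + 1 if ties >= 2 else top
        volume[link] = inflow.get(link, 0.0) + sum(volume[u] for u in ups)

    for link in downstream:
        visit(link)
    return order, volume


downstream = {"s1": "m1", "s2": "m1", "s3": "m2", "m1": "m2", "m2": None}
inflow = {"s1": 2.0, "s2": 3.0, "s3": 1.0, "m1": 0.5}
order, volume = strahler_and_volume(downstream, inflow)
print(order)
print(volume)
```

The recursion is adequate for small example networks; a large digitised river system would call for an iterative traversal.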
The simulation of floods can be carried out in two steps:

1. Definition of the base flow from water inputs into streams from springs.
2. Evaluation of the resource event and its addition to the network as the resource demand of links.

The algorithm for simulating the initial state of the river network has the following requirements:

1. Yield of springs: requires data for selected points.
2. Time increase for each link: the time in which water flows from F-node to T-node, calculated according to attribute items (slope, density, etc.) through implementation of a DEM with other layers and corresponding equations.
3. Water volume at the T-node: this is the water volume at the F-node; where several links meet, the F-node volume is the sum of the volumes at their T-nodes.

If rain is considered in the analysis, water from the surrounding slopes must be added to these calculations.

15.3 NETWORK DATA STRUCTURES

The common data structures for linking GIS and hydrological modelling must involve the representation of the land surface and channel network. It is suggested that GIS possess six fundamental data structures, including three basic structures (points, lines and polygons) and three derived structures (grid, triangulated irregular network, and network). Each of these data structures is well understood in hydrological modelling, where they have been used for many years as a basis for flow modelling. It is clear that computation is more efficient if it can rely on a single set of data structures instead of having data passed back and forth between different data structures (Joao et al., 1990).

There are specialised products designed to provide network modelling and address matching capabilities. They allow users to simulate all the elements, characteristics and functions as they appear in the real world. They also provide geocoding capabilities which the user can use to take information stored in tabular address files and add it to the project for subsequent use in analytical applications.
In the research described in this chapter the PC Arc/Info Network module was used, with the ROUTE programme used to determine optimal paths for the movement of water through the network. There are several general elements in network data structures, such as PC Arc/Info Network, that describe real world objects.

• Links are the conduits for movement. Their attributes simulate flow (two-way impedance) and water amount (demand). For the river network the two-way impedance is only used one way: downstream. Links represent streams and channels in river networks.
• Barriers prevent movement between links. Resources cannot flow through them. They have no attributes. Barriers can represent facilities for flow prevention on streams or channels.
• Turns are all possible turns at an intersection. Flow can occur from one link, through the node, and onto another link. Their attributes define the only permitted way of flow, the downstream direction. The other turns are defined as closed. Turns represent all confluences in the river network.
• Stops are locations on a route to pick up or drop off resources along paths through the network. They are used to represent wells, springs, water consumers, factories, point pollution sources or any facilities which have a specific volume of water for distribution. Their attributes define adding or losing water volumes. Stops are used only in the ROUTE procedure. Attributes associated with network elements are represented by items defined according to their data types and values (Vozenilek 1994b). Most of the network elements have one or more characteristics that are an important part of the network. Impedance measures resistance to movement. Impedances are attributes of arcs and turns. The amount of impedance depends upon a number of factors, such as the character of the arc (e.g. roughness, slope, type of channel), types of resources moving through the network, direction the resources are flowing, special conditions along an arc, etc. Turns also have impedance attributes. They measure the amount of resistance to move from one arc, through a node, onto another arc. Turn impedances will vary according to the conditions at the intersection of the two arcs. All arc and turn impedances in a network must always be in the same units for accurate calculations. The purpose of impedances is to simulate the variable conditions along lines and turns in real networks. Negative impedance values in ROUTE block movement along any arc or turn. Negative values can also be used to prevent travel along arcs in a certain direction. Resource demand, the number or amount of resources associated with any feature, can be specified for every arc and stop in the network. Resource demand is the amount of water carried by each arc in a water distribution system (Vozenilek, 1995). To simulate runoff in a river network the allocate procedure can be used. The allocate procedure allows users to model the distribution of resources from one or more centres. 
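The role of impedances described above can be illustrated with a generic least-impedance search. This is a sketch under stated assumptions, not the ROUTE procedure itself: it uses Dijkstra's algorithm over directed arcs, treats a negative impedance as a blocked arc, and leaves turn impedances out for brevity. The function name `least_impedance` and the node names are hypothetical.

```python
import heapq


def least_impedance(arcs, source, target):
    """arcs: list of (from_node, to_node, impedance); negative impedance blocks the arc.

    Returns the smallest accumulated impedance from source to target,
    or None if the target cannot be reached.
    """
    graph = {}
    for u, v, imp in arcs:
        if imp >= 0:  # negative values block movement along the arc
            graph.setdefault(u, []).append((v, imp))

    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, imp in graph.get(node, []):
            new_cost = cost + imp
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt))
    return None


# Impedances in one common unit (for example minutes of flow time).
arcs = [
    ("spring", "A", 5.0),
    ("A", "B", -1.0),   # blocked arc
    ("A", "C", 3.0),
    ("B", "outlet", 1.0),
    ("C", "outlet", 4.0),
]
print(least_impedance(arcs, "spring", "outlet"))
print(least_impedance(arcs, "outlet", "spring"))  # arcs are one-way, downstream only
```

Because every arc is stored in the downstream direction only, the one-way use of impedance in a river network falls out of the data rather than needing a separate rule.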
Geographers can model the flow of resources to and from a centre. Allocation is the process of assigning links in the network to the closest centre.

The use of network data structures and analysis is wide. Water companies can record data about wells, streams, reservoirs, channels, and so on, and use the data to model water availability and distribution. Hydrologists can simulate sources of water pollution and analyse their downstream effects. Networks can also be used to model storm runoff. Forestry departments will find network analysis useful in management studies, such as testing the feasibility of logging system transportation plans. Wildlife managers can use networks to assess the potential environmental impacts on wildlife migration routes. The following section describes some of the applications of the methodology outlined above and identifies key research issues.

15.4 CASE STUDIES

The ability to recognise and quantify spatial structures is an essential component of river system management. A number of approaches may be taken to establish the database, depending on where each catchment data set is to be located and who requires access to it. Providing a powerful system to deal with a huge amount of data of different types and from different sources is a significant opportunity for GIS (Kanok, 1995). There is a need to see GIS as more integrated and oriented towards an information provision process, away from primary data collection and documentation. Three study areas were used to test the approach described above in practice. During this process, several GIS-related questions emerged with respect to:
Figure 15.2: The Network Elements in the Smallest Scale Catchment
• building a digital spatial database including all information from the research,
• investigating the possibility of integrating a knowledge base, model base and GIS,
• evaluating time in the model of the river system,
• replacing manual work by digital data processing,
• modelling the river network,
• modelling the monitoring outcomes,
• visualising the results,
• selecting the best management practice to improve the ecological status of the river system.

15.4.1 Initial Model
At the smallest scale, the approach was tested on a simple artificial catchment (Figure 15.2). The basic network elements were defined and the algorithms implemented. The results were used for model calibration and verification. The relation between the form of a stream channel network and the character of throughput flows is a complex one, and in particular the distinction between cause and effect is very much a function of time. Inputs of precipitation into a stream network produce two types of output:

1. The discharge of water and debris. This discharge is a relatively immediate output, clearly related to a given input, and the prime concern of those who exploit the short-term operation of river systems, and
2. Changes in the channel network itself. These changes in many aspects of channel geometry are the natural result of certain magnitudes and frequencies of runoff and discharge input. The larger the time interval
under consideration the more important are these changes, either due to cumulative effects or to the increasing probability of encountering extreme events of high energy. Thus some attributes of the hydraulic geometry of self-formed channels (e.g. bankfull hydraulic radius) reflect events of relatively short recurrence interval (e.g. the most probable annual flood discharge), whereas other network attributes (e.g. the areal drainage network density) are more the product of events of larger recurrence intervals.

The data were collected following a classification scheme which defined the features as either points, lines or areas. Data collection becomes more cost effective when it concentrates first on key elements of the river system; additional data are collected only if further refinements are needed. Field work is supported by spatial hypotheses, making the selection of representative sites more objective. Data for network analyses were collected in different ways, including digitising topographical and thematic maps, processing aerial photographs, direct and indirect field measurements, results of spatial analyses, and results from laboratory analyses. The practical value of measurements is determined by the degree to which they help in understanding the causative basis of river system dynamics. Ultimately, further research will be needed to establish the actual use of measurements and their importance in river management.

15.4.2 The Trkmanka Catchment

To demonstrate the capabilities of network analysis, the Trkmanka catchment was used as the study area (see Figure 15.3). The Trkmanka river is a left tributary of the Dyje river in the south-eastern part of the Czech Republic. The data were obtained from topographical maps at 1:50 000 scale (digitised river network) and from field research (meteorological and hydrological stations). The calculation of runoff was chosen as the process to be simulated.
This simple task gives immediate results which are presented in maps. To use the project outputs efficiently it is recommended that the implementation of this system be demonstrated at a larger scale. This has implications for the collection of data to create a complex regional database applicable to different environmental applications, the preparation of inputs for the simulation packages, and the analysis of the data to infer the best management practices.

Database

The topographical and environmental databases have been organised mainly from 1:50 000 scale maps. The integrated information ranged from historical series or planning data with geographical references (water flows, rainfall, evaporation data etc.) to non-serial georeferenced data (Vozenilek, 1994c). Database generalisation was taken as a more general process than normally conceived by traditional cartographers (Vozenilek, 1995). The generalisation involved:

• filtering features of specified regions where the phenomenon under study is continuous in nature,
• selecting an appropriate areal base or framework among a very large number of alternatives,
• reducing the number of areal units when data are aggregated to a higher level in a spatial hierarchy,
• modifying the initially recorded position, magnitude, shape or nature of discrete geographical objects, or changing their relationships with other discrete objects,
• changing the measurement scale in the attribute data on aggregation.
Figure 15.3: The Trkmanka Catchment
The generalisation of vector data consists of generalisation of both attribute data and spatial data. Table 15.1 presents the terminology of generalisation procedures according to McMaster (1989), Brassel and Wiebel (1988) and Tobler (1989).

Table 15.1: The Types of Generalisation
Attribute data: thematic data; temporal data.
Spatial data: punctiform data; linear data; areal data; surface data; flow data.
Generalisation procedures: classification, symbolisation, statistical means, omit, aggregate, displace, simplify, smooth, displace, merge, enhance, omit, amalgamate, collapse, displace, smooth, enhance, simplify, amalgamate, omit.
Figure 15.4: The Morava River System
Generalisation can dramatically affect the quality of results derived from a GIS. In addition, GIS analysis can become difficult if new data sets do not match old ones. The impact of generalisation will influence some aspects of the project (Joao et al., 1990):

• Differences in measured lengths, areas, volumes and densities. The length of mapped features (such as rivers) increases with increasing map scale (Steinhaus' paradox). Similar effects occur in relation to area and surface data.
• Shifts of geometric features. Massive shifts of objects occur between different representations. Geological boundaries were digitised at 1:50 000 scale and then replayed precisely at the smaller scale to match the manually generalised topographical base map. Generally the location of spatial means or centroids will also be affected (affecting in turn inter-zonal distance calculations); statistical errors will be generated in the resulting two-dimensional coverage; and topological errors will inadvertently lead to a quite different three-dimensional surface model. The problem of "sliver polygons" occurred frequently.
• Statistically related problems. These problems occurred in handling data extracted from large national databases. Two different cases are involved: the effect of zonal "reporting units" at different levels of spatial aggregation, and the effects of different arrangements of boundaries for a single level of aggregation.

15.4.3 The Morava River System

The ecological research of 4,067 km of the Morava River system shows that only one third is in a healthy state, while the remaining parts are unacceptable from an environmental perspective. As much as 1,086 km can be considered an ecological catastrophe. The basis for this classification is the "ecological value". The analyses of the different factors show that the character (landscape type) of the flood-plain has the strongest impact on the ecological status of the system.
Where forest is located along the river, the ecological status of the ecosystem is good; in the case of arable land, however, the river is usually in a very poor condition. The Trkmanka catchment is a part of the Morava River system (see Figure 15.4).
Figure 15.5: The Relation Between Total Ecological Value and Altitude
The project was originally designed as a large interdisciplinary study. Because of the huge amount of information involved, the project provides a suitable platform for GIS implementation. The project is divided into eight sub-projects:

• water quality monitoring;
• point pollution sources;
• non-point pollution sources;
• sources of drinking water;
• strategy of using water related to water protection;
• evaluation and water quality models;
• ecological status of the rivers;
• synthesis and regulation proposal.
These sub-projects required the collection of vast quantities of data, including: the collection of water quality samples from the Morava River and its main tributaries to analyse the pollutants; observation of paper and wood companies and their waste activities; observation of the impact of the sugar-beet season; update of point sources of pollution in agriculture; analysis of organic and inorganic contents of fluvial sediments; monitoring of soil erosion, USLE implementation and the calculation of the potential soil erosion; investigation of sources of drinking water and identification of contaminated water; observation of the river system with regard to the relationship between water economy and discharge; and global assessment of water quality and ecological values.

Analysis
The major output of the project can be described in terms of maps, tables, reports and photos (see, for example, Figures 15.5 and 15.6). The main analysis of the data was carried out using GIS techniques on spatial environmental databases. Some point features and non-spatial data however were analysed by FoxBase+ database software. A conversion of these data to ARC/INFO was carried out to produce maps. Spatial statistical modelling has been done as a tool to estimate variable values at specific locations. Statistical analysis of the basic data was carried out parallel to data processing, e.g. evaluation of water time
Figure 15.6: The Relation Between Total Ecological Value (EV) and Landscape Type (LT).
series data, creation of results which have been stored as single attributes in the GIS database. An essential part of the assessment procedure for the abiotic aspects was the application of a ground water model in combination with GIS capabilities. The main objectives of the analysis were:

• assessment of regional and local conditions (surface, soil, land use, etc.);
• assessment, modelling and simulation of water movement in the river system in relation to soil, land cover and agricultural practices;
• development of procedures to optimise the use of limited water resources and their range of natural conditions and to provide guidelines for their management.

The techniques developed should be applicable to a wide range of conditions. The potential impacts of river engineering works on the flood-plain ecosystems were evaluated by "ecological impact assessment" and "ecological risk assessment" procedures. Environmental models are becoming increasingly spatial and quantitative, enabling a more precise assessment of the landscape. In addition, improvements in advanced dynamic modelling, for example using data types built on video compression techniques to reduce storage space and processing time, will allow for further refinement in resolution. At the time of writing the project had not yet been completed, but it is expected that it will provide considerable input to environmental decision-making in the area.

15.5 CONCLUSIONS

Environmental studies usually imply knowledge of continuous data, but environmental data are generally measured at specific locations. One goal of the project was to compare spatial trends in order to assess the validity of the techniques used for estimating the ecological status of the Morava River. GIS has played and will continue to play a significant role in the new management paradigm of river systems.
However, the power of GIS as a decision-making tool can be reduced if the accuracy of the results cannot be controlled, independently of the type of users. User experience of such shortcomings seems likely to result
in a backlash against the use of GIS-based methods and even against quantitatively-based planning tools in general.

Further research needs to focus on more complicated phenomena in river networks. An example is fish movement in rivers, as fish can move against the stream and carry diseases or pollution both downstream and upstream. Another is the movement of tides and water in marshlands, as discussed in the next chapter of this volume. From the perspective of this project, the experience gained so far indicates the need for an extension of procedures to account properly for additional aspects (municipal water management, water logging, deep percolation, alkalisation etc.), extensions of the models to manage a complete irrigation system with reservoirs, canals and hydroelectric power generation, verification of the techniques through comparison with controlled field experiments, and improvement of irrigation efficiency through reduction of seepage, deep percolation and similar phenomena. This is a tall research agenda, the findings of which will undoubtedly have very positive applications, particularly in environmentally distressed areas such as that of the Morava.

REFERENCES

BISHR, Y.A. and RADVAN, M.M. 1995. Object orientated concept to implement the interaction between watershed components that influences the selection of the best management practice, in Proceedings of the First Joint European Conference and Exhibition on Geographical Information, The Hague, pp. 265–278.
BONHAM-CARTER, G.F. 1996. Geographical Information Systems for Geoscientists. London: Pergamon Press.
BRASSEL, K. and WIEBEL, R. 1988. A review and framework of automated map generalisation, International Journal of Geographical Information Systems, 3, pp. 38–42.
GOODCHILD, M.F., PARKS, B.O. and STEYAERT, L.T. (Eds.) 1993. Environmental Modeling with GIS. Oxford: Oxford University Press.
HAGGETT, P. and CHORLEY, R.J. 1969. Network Analysis in Geography. London: Edward Arnold.
JOAO, E., RHIND, D. and KELK, B. 1990. Generalisation and GIS databases, in Harts, J., Ottens, H. and Scholten, H. (Eds.), Proceedings of the First European Geographical Information Systems Conference, Amsterdam, 10–13 April. Utrecht: EGIS Foundation, pp. 368–381.
KANOK, J. 1995. Die Farbenauswahl bei der Bildung von Urheberoriginalen der Thematischen Karten in der Computer, in Acta facultatis rerum naturalium Universitas Ostraviensis. Ostrava: University of Ostrava, pp. 21–31.
MAIDMENT, D.R. 1993. GIS and hydrologic modelling, in Goodchild, M.F., Parks, B.O. and Steyaert, L.T. (Eds.), Environmental Modeling with GIS. Oxford: Oxford University Press, pp. 147–167.
McMASTER, R. 1989. Introduction to numerical generalization in cartography, Cartographica, Monograph 40, 26(1), pp. 241.
TOBLER, W. 1989. Frame independent spatial analysis, in Goodchild, M. and Gopal, S. (Eds.), Accuracy of Spatial Data Bases. London: Taylor & Francis, pp. 56–80.
VOZENILEK, V. 1994a. Computer models in geography, in Acta Universitatis Palackianae Olomoucensis, Facultas Rerum Naturalium 118, Geographica-Geologica 33, pp. 59–64.
VOZENILEK, V. 1994b. Data format and data sources for hydrological modelling, in Proceedings of Regional Conference of International Geographical Union, Prague, pp. 256–261.
VOZENILEK, V. 1994c. From topographic map to terrain and hydrological digital data: an ARC/INFO approach, in Acta Universitatis Palackianae Olomoucensis, Facultas Rerum Naturalium 118, Geographica-Geologica 33, pp. 83–92.
VOZENILEK, V. 1995. Environmental databases in GIS, in GeoForum, 2, Bratislava, p. 47 (in Czech).
Chapter Sixteen Approach to Dynamic Wetland Modelling in GIS Carsten Bjornsson
16.1 INTRODUCTION

Developing wetlands to improve water quality involves the analysis of existing hydrological conditions, which are often characterised by time and space variations and discrete sample points. To bring about spatial continuity, hydrologic and water quality models have been developed and implemented using GIS. With respect to spatial distribution, these GIS models are often based on one-dimensional network algorithms or on surfaces with a many-to-one spatial relationship and some level of accumulation of flows. Some hydrologic movements in space are instead described by a many-to-many relationship, characterised by types of behaviour such as dispersion and diffusion, where movement is multidirectional and the object in focus is spreading or dividing itself. This chapter suggests a different approach to modelling multidirectional movements in GIS using existing programming tools within commercial raster GIS.

16.1.1 Problems with the Sustainability of Wetlands

In 1932 Beadle was studying water quality in African lakes and in particular the Chambura Papyrus Swamp. He noted that water quality was much better in the lake where the inflow passed a wetland than in an adjacent lake without a wetland. Based on their biochemical cycles, wetlands can be introduced for remediation purposes to improve water quality in streams and rivers, acting like wastewater treatment for nutrients and other chemicals. In principle there are three types of wetlands used for these purposes: natural wetlands, enhanced wetlands and constructed wetlands. Natural wetlands are wetlands which have not been altered or restored to a previous state. Enhanced wetlands are also natural but have been modified or altered in some way to fulfil one or more purposes. Constructed wetlands are wetlands on sites with no previously recorded wetland data.
Wetlands of every type are balanced ecosystems, and when misplaced they may have the opposite effect, releasing either remediants or substances which degrade the wetland (Mitsch, 1993). Wetlands are often characterised by the presence of water, either at the surface or within the root zone, by unique soil conditions that differ from adjacent upland, and by vegetation adapted to the wet conditions; conversely, they are characterised by the absence of flooding-intolerant vegetation. Misplacement is often the result of a lack of knowledge in locating and analysing site conditions. It is thus of the highest importance to identify areas within the watershed which have the potential for long-term efficiency (Kuslar and Kentula, 1990). Efficiency is primarily dependent on hydrology (Hedin et al., 1994; Stabaskraba et al., 1988). Hydrological, erosion, environmental and ecological landscape models have been
used for many years to study problems related to water quality and to determine the measures to be taken to reduce the load of contaminants. Models in general have the advantage of describing the relationships between processes, thereby allowing simulation of the system's reaction to different management strategies as well as the testing of scientific hypotheses. At the same time models offer the possibility of understanding the processes involved and help identify areas where further knowledge is needed or data are sparse (Jørgensen, 1988). Two different approaches have been taken in modelling water quality and wetlands. The first involves ecosystem modelling, where mathematical models have been used to simulate wetland remediation performance (Mitsch, 1990). These types of models are lumped models in a spatial context. The second approach involves models handling data in a spatial domain, either as surfaces or as networks. In recent years the latter two have been integrated to some extent with GIS (Maidment, 1993; Moore et al., 1993).

16.2 MODELS IN A LUMPED DOMAIN

Wetland models are based on the philosophy of ecosystem modelling, which highlights interaction between a set of variables within an ecosystem (Sklar and Constanza, 1990; Stabaskraba et al., 1988). They include: external variables, which impose changes on the system; state variables, which describe the state of the system; parameters, describing changeable numeric values; and constants, describing global values (Jørgensen, 1988). The interaction between these components describes the rate of change (dynamics) and is expressed through a condensed set of differential equations, thus making these types of models mathematical models (Jørgensen, 1988; Sklar et al., 1994). The rate of change, also termed the fundamental equation of the model, can be expressed as:

\[ x(t + \Delta t) = x(t) + r(*)\,\Delta t \qquad (16.1) \]

in which the state variable x is projected a small time step Δt ahead, with r being the rate of change, so that the change in x is proportional to Δt.
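As a minimal sketch of equation (16.1), the projection of a state variable one time step ahead can be written as a forward (Euler) update. The function names `euler_step` and `simulate` and the first-order loss rate used as an example are hypothetical and are not taken from the chapter; the rate function is assumed here to depend only on the state and the time.

```python
from typing import Callable, List


def euler_step(x: float, t: float, dt: float,
               rate: Callable[[float, float], float]) -> float:
    """Project the state one step ahead: x(t + dt) = x(t) + r(x, t) * dt."""
    return x + rate(x, t) * dt


def simulate(x0: float, t0: float, dt: float, n_steps: int,
             rate: Callable[[float, float], float]) -> List[float]:
    """Apply the one-step projection repeatedly and keep every state."""
    states = [x0]
    x, t = x0, t0
    for _ in range(n_steps):
        x = euler_step(x, t, dt, rate)
        t += dt
        states.append(x)
    return states


def first_order_loss(x: float, t: float, k: float = 0.5) -> float:
    """Hypothetical rate of change: the state decays in proportion to itself."""
    return -k * x


# Three steps of 0.1 time units starting from a state of 10.0.
trajectory = simulate(x0=10.0, t0=0.0, dt=0.1, n_steps=3, rate=first_order_loss)
```

In the notation of equation (16.1), the rate function passed to `simulate` plays the role of r(*).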
The (*) denotes that r may depend on various quantities such as time, external variables, other state variables and feedback mechanisms from x. Expressed with only one state variable, the previous equation becomes:

\[ \frac{dx}{dt} = r(x, t) \qquad (16.2) \]

which states that x is not affected by other state variables. This type of equation is often referred to as a first-order ordinary differential equation. More complex systems with interactions between several state variables are described through a set of differential equations, with one equation for each state variable. For a system described by two state variables x(t) and y(t) the fundamental equations can be expressed by:

\[ \frac{dx}{dt} = r_x(x, y, t), \qquad \frac{dy}{dt} = r_y(x, y, t) \qquad (16.3) \]

Today these differential equations, as well as much more complex ones, are solved using numeric integration techniques, often implemented in modelling programs like SYSL, STELLA and EXTEND or programmed by the user in C, Pascal or Fortran. An often implemented numeric integration technique for solving differential equations is the fourth-order Runge-Kutta method, which approximates r at time t+h through a repeated stepwise approximation from r at time t. The result is not a continuous function r(t) but a discrete set of approximations (r0, r1, r2, …) at points in time (t0, t1, t2, …) (Knudsen and Feldberg, 1994), obtained through a set of iterations for each time step (Chow et al., 1988). Every equation describing the system is thus solved for each iteration before advancing to the next. The number of iterations is controlled either by a user-defined constant or through some minimum rate of change. One advantage of using Runge-Kutta is the possibility of estimating, for each time step, the numerical errors due to either rounding or truncation. In GIS no immediate algorithms exist to perform numeric integration and solve a
set of differential equations. Equations for different state variables formulated as (16.2) can be solved by algebra as long as the model is linear and projected forward in time. This procedure simplifies calculation of the fundamental equations but creates some obvious errors, since each equation has to be solved explicitly and serves as input to the next fundamental equation within the same time step.

16.2.1 Spatial modelling approaches

A spatial model is a model in a bi-space (space, attribute) and a space-time model is a model in a tri-space (space, time, attribute) (Chapter 10, this volume). Wetland and water quality models can take advantage of GIS to implement spatial models. This implementation can broadly be divided into two approaches (Fedra, 1993). The first approach uses the analytical tools of GIS to extract hydrologic parameters (Moore et al., 1993) and export these to a model outside GIS. This model reads and processes input data and produces output which is then routed back for display in GIS. The models used outside GIS have a structure consistent with that of lumped ecosystem models. The second approach involves spatial modelling within GIS, where tools for spatial data handling combined with analytical capabilities make up the spatial model. Models of the latter type are characterised by at least one of five items in their modelling structure:

1. The processes modelled are often expressed in spatially averaged terms at the computational level (Vieux, 1991). In raster GIS this involves models like AGNPS (Young et al., 1987), which uses simple algebraic relations and indexing like USLE (Wischmeier and Smith, 1978) and the SCS curve number (Chow, 1990) to calculate erosion and runoff from each cell. In network GIS, routing of water based on Dijkstra's algorithm has been used to determine steady flows through pipes and channels using simple linear equations (Maidment, 1993).
The reason for this is the inadequacy of present GIS for performing numerical solutions.
2. Hydrologic conditions simulated in watershed management models are often expressed as overland flow, because the models are aimed at the management of flooding, erosion and sediment discharge. Models like AGNPS, ANSWER (Beasley et al., 1980), ARMSED (Hodge et al., 1988) and other models used for estimation of runoff (Nageshwar et al., 1992; Stuebe and Johnston, 1990) assume that stream flow is generated during rainfall when the soil becomes saturated, which leads to Hortonian overland flow running to a stream (Maidment, 1993). In Denmark overland flows of this type only occur in places with impenetrable surfaces, during thunderstorms, or in periods with frozen soil. Most areas contributing to overland flow are located near and around streams where saturated conditions occasionally exist. In GIS there are no algorithms to calculate this type of partial area flow (Maidment, 1993).
3. Temporal domains are often handled as a single-event phenomenon, where an event is released in one instance and a snapshot is taken showing only peak conditions throughout the watershed, for example the distribution of precipitation (Maidment, 1993). Water quality models like ARMSED and ANSWER are based on the single-event approach, calculating the routing of surface water to estimate loads of sediment. Algorithms to describe pulses (the distribution of flow over a surface as well as in the subsurface) are not yet implemented in GIS.
4. Existing algorithms in raster GIS for routing surface water are based on the pour-point model (Jenson, 1991). Through focal analysis, the flow is directed towards the lowest lying of the eight adjacent cells. The spatial relationship is many-to-one. This structure can be used for watershed delineation as well as for tracing flow paths. Using this approach in a DEM with no obstacles might be satisfactory but with
obstacles, like hedgerows or boundaries between farming fields, surface flow follows different patterns. Other phenomena like waves, dispersion, and diffusion also follow other flow patterns. An algorithm based on Darcy’s flux is implemented in commercial raster GIS such as Arc/Info to model dispersion (ESRI, 1996). This algorithm supports a one-to-four spatial relation to adjacent cells, where diagonals are left out.
5. Flows described in networks are modelled as 1D lines and routed along these lines, having no interactions with surrounding zones (Chapter 10, this volume). Describing inundation, partial area flow, or flooding thus creates problems in network GIS.

16.2.2 Spatial modelling and dynamic simulation

A third and very promising type of wetland model is the spatial model based on a merger of ecosystem models and distributed process thinking: the landscape process model (Sklar and Constanza, 1990). Landscape process models are spatial models which divide the landscape into geometric compartments and then describe flows within compartments and flows between compartments according to location-specific algorithms. Flows in this type of model can be multidirectional and are determined by the interaction of variables, thus allowing feedback mechanisms to occur. This type of model is dynamic, and time can be treated either as discrete or as continuous (Chapter 10, this volume; Maidment, 1993). Models like the CELSS simulate movements of nutrients between 1 km² squares in a marshland. Each square (cell) consists of a dynamic non-linear simulation model with eight state variables. Each cell exchanges water and materials with each of its adjacent cells. This connectivity is a function of landscape characteristics like habitat type, drainage density, waterway orientation, and levee height.
In a spatial context rates of change between state variables are transformed into movements, where the same object occupies different positions in space at different times (Galton, 1996).

16.3 A FRAMEWORK FOR SPATIAL MODELLING

According to Galton (1996), the definition of movement involves a definition of space, time, and position as well as object:
• Space may have up to three dimensions. Three dimensions represent a volume, with the boundary being a surface. Two dimensions represent a surface, also called an area or a region, with edges as its boundaries. An edge is one-dimensional; its extension is an arc or length defined by a pair of points.
• Time is characterised by duration and direction, where a certain duration can be defined as an interval bounded by instants. An interval is clearly defined by instants and marks a stretch of time. Direction is often obtained by formally imposing an ordering relation on sets of times, often expressed in linear form. A third issue is also important when defining time: granularity, which means the temporal resolving power. Granularity can be addressed in either of two ways. One is to work with discrete time, where temporal duration is articulated into a set of minimal steps. The other is dense time, where steps can be arbitrarily subdivided into any interval (between any two instants a third can be found, and so on), tending towards infinite subdivision.
• An object is anything to which it makes sense to ascribe movement. An object can be rigid, maintaining its shape and size at all times, or non-rigid, changing shape, size, or position.
Figure 16.1: Commercial GIS addressing of immediate Neighbourhood.
• The position of an object has been formulated as the total region of space occupied by it at a given time, so that position and object are congruent. It can be either an ‘exact’ position or one defined by proximity or contiguity.
Movement can thus be defined by mapping each time t to the position of the body at time t. Working with this definition, t has to be defined as well as the type of position. Movement is continuous if, to get from one position to another, the object has to pass through a set of intermediate positions, making no sudden jumps from one place to another. In a discrete space, positions are neighbours when the object can move directly from one to the other. Such a space is called quasi-continuous, the rule being that objects can only move directly between neighbouring positions.

16.4 IMPLEMENTATION WITHIN GIS

Introducing the above framework for movement together with landscape process thinking may eliminate many of the current constraints in GIS. Using existing tools in GIS, interaction matrices can be implemented to describe movement by defining objects, time, space, and position. In raster GIS, matrices are defined by rows and columns, where each ‘cell’ in the matrix has one attribute besides its row and column number stored in a data model (Hansen, 1993). Using map algebra, different matrices can be modelled together using local, focal, zonal, or global approaches (Tomlin, 1990). Focal analysis calculates a cell value as some function of its immediate neighbourhood, which corresponds to the definition of movement. Focal analysis tools further allow the handling of subsets of the immediate neighbourhood. In commercial GIS, addressing of the immediate neighbourhood occurs in one of three ways (ESRI, 1996):
• The most widely used have been functions calculating the direction of flow based on bit values indicating the direction of steepest descent. Bit values range from 1 to 128, where each of the eight surrounding cells has its own bit value.
The cell will get a value corresponding to the bit value of the lowest lying cell. Many cells can point into one cell, but one cell can only point to one cell (see Figure 16.1, Diagram 1).
• The other type of flow processing calculates which of the surrounding cells will flow to the target cell. The cell will get a bit value indicating which cells flow to the target cell. Identifying these cells involves additional application programming. Conversely, it is possible to identify the adjacent cells addressed by the cell, which gives the ability to describe a many-to-many relationship. If a focal flow value is calculated as 5, only the two cells with values 1 and 4 flow into the cell, whereas flow runs from the cell to the rest of the cells (see Figure 16.1, Diagram 2).
• A much less recognised approach, but one with greater possibilities, is the use of individual cell addressing. This technique involves application programming within GIS, where a cell can be programmed to address each of its surrounding neighbours using a reference relative to itself. This reference consists of two co-ordinates, and the cell value will be a function of the cell addressed (see Figure 16.1, Diagram 3).
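The three ways of addressing the immediate neighbourhood can be sketched in plain Python on a small grid. This is a minimal illustration under stated assumptions, not the GIS vendor's implementation: the assignment of codes to neighbours (east = 1, doubling clockwise to north-east = 128), the use of a distance-weighted drop to pick the steepest neighbour, and all function names are assumed for the example.

```python
import math

# One power of two per neighbour, paired with a (row, column) offset relative
# to the target cell. The east = 1 ... north-east = 128 ordering is assumed.
DIRECTION_OFFSETS = {
    1: (0, 1), 2: (1, 1), 4: (1, 0), 8: (1, -1),
    16: (0, -1), 32: (-1, -1), 64: (-1, 0), 128: (-1, 1),
}


def flow_direction(dem, row, col):
    """Many-to-one addressing: code of the steepest downslope neighbour, or 0."""
    best_code, best_drop = 0, 0.0
    for code, (d_row, d_col) in DIRECTION_OFFSETS.items():
        n_row, n_col = row + d_row, col + d_col
        if not (0 <= n_row < len(dem) and 0 <= n_col < len(dem[0])):
            continue  # neighbour falls outside the grid
        # Drop per unit distance; diagonal neighbours are sqrt(2) cells away.
        drop = (dem[row][col] - dem[n_row][n_col]) / math.hypot(d_row, d_col)
        if drop > best_drop:
            best_code, best_drop = code, drop
    return best_code


def decode_focal_flow(value):
    """Split a summed bit value into the codes of the flagged neighbours."""
    return [code for code in DIRECTION_OFFSETS if value & code]


def neighbour_value(grid, row, col, d_row, d_col):
    """Individual cell addressing: read a neighbour through a relative reference."""
    return grid[row + d_row][col + d_col]


dem = [[9.0, 9.0, 9.0],
       [9.0, 5.0, 9.0],
       [9.0, 9.0, 1.0]]
centre_direction = flow_direction(dem, 1, 1)    # the south-east cell is lowest
inflowing = decode_focal_flow(5)                # bit value 5 = codes 1 and 4
south_east = neighbour_value(dem, 1, 1, 1, 1)   # offset (+1 row, +1 column)
```

The first function can only ever return one code per cell, which is the many-to-one limitation; the relative reference used by `neighbour_value` can be applied to any number of neighbours in turn, which is what makes a many-to-many relationship possible.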
Figure 16.2: Movements in GIS can be described as focal interaction, focal relations and local relations.
In describing movements the last two approaches seem feasible. Two immediate advantages of such approaches are the possibility of using the map-algebraic capabilities of GIS in conjunction with focal cell modelling, and the fact that no data transfer has to occur.

16.5 MOVEMENT IN GIS

We have shown that rates of change between cells are described by movement, where the same object occupies different positions in space at different times. Using individual cell addressing in raster GIS, it is possible to describe the movement of objects based on the landscape process model approach. Conceptually, three issues have to be determined for an object to move from one cell to the next. The first is whether a relation between two cells exists or not. This could be termed the level of focal interaction. Second, if an interaction occurs, the kind of focal relation which describes the interaction needs to be determined. Third, there is a local relation: no matter the type of focal interaction, processes can occur within the cell, so that the value of the cell changes during a certain period of time. By using stepwise individual cell addressing it is possible to determine which relationship exists between the target cell and each of its surrounding cells. Each cell under investigation can be identified through a pointer to the cell, defined as its location in relation to the target cell. If focal interaction is established, processing between cells can occur. The focal and local relations can be expressed as linear mathematical equations to determine the rates of change. Since no immediate numerical integration techniques exist at this level of cell processing, the equations must be solved algebraically. This automatically creates difficulties in describing rates of change simultaneously in a space and time domain. Nevertheless a solution can be reached by accepting an approximation of the rates of change.
(See Figure 16.2) Based on the concept of ecosystem theory, each cell can be interpreted as a single ecosystem governed by a set of fundamental equations, each describing the rates of change involved. Having these equations gives the possibility of describing exchanges between cells. Working with the immediate neighbourhood fulfils the assumption that the processes described do not jump cells but instead pass through cells. The cell can thus be expressed as a dynamic model which describes changes in the spatial pattern of the immediate neighbourhood from time t to time t+m:

\[ X_{t+m} = f(X_t, Y_t) \qquad (16.4) \]

where X is a column vector describing the spatial pattern at time t of a state variable and Y describes other external or internal variables in the immediate neighbourhood. An effect of implementing this definition in individual cell processing is its dynamic structure, where feedback mechanisms are allowed through the interactions of individual cells. Within each cell, parameters from the cell and its neighbours are processed (Constanza et al., 1990).
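The three conceptual issues of this section (focal interaction, focal relation and local relation) can be sketched as one explicit update of a raster from time t to time t+m. This is a minimal sketch under assumptions of its own: the rule that a fixed fraction of a cell's water is released per step and shared among lower neighbours in proportion to the drop in water surface is hypothetical, as are the function and parameter names.

```python
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                     (0, 1), (1, -1), (1, 0), (1, 1)]


def step(water, bed, inflow, rate=0.5):
    """One explicit update of the state from time t to time t+m.

    water  -- state variable X: water depth per cell at time t
    bed    -- part of Y: bed elevation per cell
    inflow -- part of Y: external input per cell during the step
    rate   -- hypothetical fraction of a cell's water released per step
    """
    rows, cols = len(water), len(water[0])
    # Local relation: every cell first receives its own external input.
    new_water = [[water[r][c] + inflow[r][c] for c in range(cols)]
                 for r in range(rows)]
    for r in range(rows):
        for c in range(cols):
            surface = bed[r][c] + water[r][c]
            # Focal interaction: which neighbours lie below this water surface?
            drops = {}
            for d_row, d_col in NEIGHBOUR_OFFSETS:
                n_r, n_c = r + d_row, c + d_col
                if 0 <= n_r < rows and 0 <= n_c < cols:
                    drop = surface - (bed[n_r][n_c] + water[n_r][n_c])
                    if drop > 0.0:
                        drops[(n_r, n_c)] = drop
            if not drops:
                continue  # no interaction with any neighbour
            # Focal relation: share the released water in proportion to the drop.
            leaving = rate * water[r][c]
            total_drop = sum(drops.values())
            for (n_r, n_c), drop in drops.items():
                new_water[n_r][n_c] += leaving * drop / total_drop
            new_water[r][c] -= leaving
    return new_water


flat_bed = [[0.0] * 3 for _ in range(3)]
no_inflow = [[0.0] * 3 for _ in range(3)]
start = [[0.0, 0.0, 0.0],
         [0.0, 8.0, 0.0],
         [0.0, 0.0, 0.0]]
after_one_step = step(start, flat_bed, no_inflow)
```

All drops are evaluated from the state at time t, so the update is an algebraic approximation of the rates of change rather than a numerical integration, and the total volume of water is conserved because every unit leaving one cell is added to a neighbour.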
Figure 16.3: A conceptual model of stream flow using ecosystem modeling approach where stream (ST) represents the state variable and precipitation, evaporation, and groundwater external variables affecting the state of ST
16.6 BACK TO THE HYDROLOGY PROBLEM

Hydrological flow processes represent the most important factor affecting the kinetics of the wetland. Instead of modelling a stream via lines in GIS, a stream feature can be modelled as a set of individual segments, each adjacent to the next. Each cell is a homogeneous entity representing a system of its own, interacting with its immediate neighbours. Expressed as a conceptual model, the variables for a stream could have the following appearance:

\[ \frac{dST}{dt} = PR_t + SF_{in,\,t-1} + GW_t - EP_t - SF_{out,\,t} \qquad (16.5) \]

where the rate of change for stream water (ST) in a cell is precipitation (PR) at time t plus the contribution from adjacent cells (SF_in) at time t−1 and the contribution from groundwater (GW) at time t, minus evaporation (EP) at time t and flow out of the cell (SF_out) at time t (see Figure 16.3).

PR (precipitation) is the daily accumulated precipitation over each cell and gives a positive contribution to the system. It comes from accumulated time series of data expressed as mm/day (24-hour sampling period).

EP (evaporation) is the evaporation from streams expressed as mm/24 hours, with the assumption that there is no wind to speed up evaporation from open water surfaces (Chow, 1990):

(16.6)

where:
T is the temperature expressed in °C
ρ is the density of water expressed in kg/m³
R_n is the net radiation, expressed as R_n = R_i(1 − a) − R_e, with:
R_i being the incoming radiation in W/m²
a being an absorption factor based on the type of land cover
R_e is the emitted or reflected radiation, defined as R_e = εσT⁴, with:
ε being the emissivity of the surface, between 0.97 and 1.00
σ being the Stefan-Boltzmann constant (5.67×10⁻⁸ W/m²·K⁴)
T being the temperature in kelvin

SF (stream flow) is the flow in streams, expressed as steady uniform flow in open channels. Regarding each cell as a homogeneous entity, a stream segment can be approximated by an open channel with the same hydraulic radius throughout the cell. Each cell thus has a different hydraulic radius depending on its physical properties. This gives different flow rates between cells, creating a simple dynamic structure as well as easy algebraic solutions. Depending on data and ambition these equations could be expressed in more complex form. Uniform flow in open channels is expressed as (Chow, 1990):

\[ V = \frac{1}{n} R^{2/3} S_f^{1/2} \qquad (16.7) \]

where:
V is the flow velocity
R is the hydraulic radius, defined as R = A/P, with A being the cross-sectional area of the stream, defined as height multiplied by width, and P being the wetted perimeter, defined as width + (2 × height)
n is the Manning roughness factor, determined for different types of physical features in a stream such as winding, ponds, etc.
S_f is the friction slope, which for uniform flow equals the bed slope S_0.

The flow rate Q can be calculated as:

\[ Q = V A \qquad (16.8) \]

16.7 A DYNAMIC SPATIAL ALGORITHM FOR STREAM FLOW

Transforming the conceptual model for stream flow into an algorithm based on individual cell processing also introduces spatial reasoning (Berry, 1995), expressed through rules for focal interaction. One rule of focal interaction for a stream is that a stream cell receives water from an adjacent stream cell if and only if it lies beneath the donor cell. A cell can flow to many cells, provided that each receiving cell either meets the above criterion or lies below the water table of the donor cell. To implement the algorithm, a small test site of 600×800 m is divided into 6×8 cells with a cell size of 100 m.
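The relations of Section 16.6 can be sketched in a few functions. This is a minimal illustration, not the chapter's implementation: the evaporation function assumes the energy-balance method given by Chow et al. (1988), in which the latent heat of vaporisation is approximated as 2.501×10⁶ − 2370·T J/kg with T in °C, and the velocity function assumes Manning's equation in SI units for a rectangular channel. All function and parameter names are hypothetical.

```python
STEFAN_BOLTZMANN = 5.67e-8  # W m^-2 K^-4


def net_radiation(incoming, a, emissivity, temp_kelvin):
    """R_n = R_i * (1 - a) - R_e, with R_e = emissivity * sigma * T^4 (W/m^2)."""
    emitted = emissivity * STEFAN_BOLTZMANN * temp_kelvin ** 4
    return incoming * (1.0 - a) - emitted


def evaporation_mm_per_day(net_rad, temp_celsius, water_density=1000.0):
    """Energy-balance evaporation (assumed method), converted to mm per 24 hours."""
    latent_heat = 2.501e6 - 2370.0 * temp_celsius            # J/kg
    rate_m_per_s = net_rad / (latent_heat * water_density)   # m/s
    return rate_m_per_s * 1000.0 * 86400.0                   # mm per day


def manning_velocity(width, height, roughness, slope):
    """Uniform-flow velocity V = (1/n) * R^(2/3) * S^(1/2) in m/s."""
    area = width * height                # cross-sectional area A
    wetted = width + 2.0 * height        # wetted perimeter P
    hydraulic_radius = area / wetted     # R = A / P
    return (1.0 / roughness) * hydraulic_radius ** (2.0 / 3.0) * slope ** 0.5


def flow_rate(width, height, roughness, slope):
    """Flow rate Q = V * A in m^3/s."""
    return manning_velocity(width, height, roughness, slope) * width * height


# Hypothetical values for a single stream cell.
evaporation = evaporation_mm_per_day(net_rad=200.0, temp_celsius=25.0,
                                     water_density=997.0)
velocity = manning_velocity(width=2.0, height=1.0, roughness=0.03, slope=0.0009)
discharge = flow_rate(width=2.0, height=1.0, roughness=0.03, slope=0.0009)
```

Because each cell carries its own width, height, roughness and slope, calling `flow_rate` cell by cell yields the differing flow rates between cells that the text describes.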
The algorithm first reads, sets and calculates at time t=0 all the parameters to be used in the following procedure, such as initial water table, precipitation, evaporation and volume. All these represent vertical flows in the model and are calculated by local relations only. Meteorological data are often scattered over large areas and have to be interpolated to the area in focus. In this simulation the time series of data available for each 24-hour period are accumulated precipitation, sun hours, and the highest and lowest temperature in that period. Using 24-hour accumulated precipitation data would, however, give a false value for stream velocity; these data are therefore distributed over 12 intervals using a Gaussian normalisation curve to spread the rain pattern over the 24-hour period. This approach is feasible because a comparison between rainfall patterns and water velocity and levels in the test stream shows steady flows at all times, indicating that the main contribution to the stream comes from subsurface water and not from surface runoff. Evaporation, evapotranspiration and initial water table are calculated using local cell operations and local relations (Tomlin, 1990). Contributions from precipitation and stream
flows from the previous time step represent the total initial surface water in the cell. Some of the initial water will evaporate during the two-hour time step and is removed from the initial water table (see Figure 16.4). Secondly, focal interaction is tested and horizontal flows are calculated through individual cell processing. Focal relations formulated through equations for slope and hence velocity (16.8) and rates are calculated using the equation for uniform flow in open channels (Chow, 1990). Parameters used for this calculation, such as Manning’s roughness factor and physical stream conditions, are described in a set of parameter grids. Calculation of water velocity represents a “chicken or egg” situation, since velocity is determined by the height of the stream, which in turn is determined by water velocity. The problem is solved by introducing initial distributed height values for the stream, interpolated from a set of height measurements along the stream. Velocity is calculated based on the values of the adjacent lower lying stream cells and the slope between these. The algorithm (local relations) then checks the volume of water that will leave the cell within the time step and the travel time to the next cell, to ensure that water only runs a certain distance within a time step. If a volume of water in one cell has not yet reached an accumulated travel time beyond the time step, the volume is allowed to reach the next cell. If the travel time (previous travel time + new travel time) will exceed the time step, only a subset of the water will leave the cell, whereas in the opposite case all the water will leave the cell. In the next step the algorithm checks its immediate neighbourhood to see if any of its adjacent neighbours will contribute to the cell. If the cell itself has an accumulated time past 7200 sec., water can actually run into the cell and even further due to high velocity and low travel time.
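The travel-time check described above can be sketched as a single function. This is a hedged illustration only: the text does not say how the subset of water that leaves the cell is computed, so the proportional rule used here (the share of the travel time that still fits into the time step) is an assumption, as are the function name and the example values.

```python
TIME_STEP = 7200.0  # seconds, the two-hour time step
CELL_SIZE = 100.0   # metres


def volume_leaving(volume, velocity, accumulated_time,
                   time_step=TIME_STEP, cell_size=CELL_SIZE):
    """Return (volume leaving the cell, accumulated travel time afterwards)."""
    if velocity <= 0.0 or accumulated_time >= time_step:
        return 0.0, accumulated_time  # no further movement in this time step
    travel_time = cell_size / velocity
    if accumulated_time + travel_time <= time_step:
        # The water reaches the next cell within the time step: all of it moves.
        return volume, accumulated_time + travel_time
    # Assumed rule: only the share that fits into the remaining time leaves.
    fraction = (time_step - accumulated_time) / travel_time
    return volume * fraction, time_step


fast = volume_leaving(50.0, 0.5, 0.0)    # 200 s to cross the cell
slow = volume_leaving(50.0, 0.01, 0.0)   # 10 000 s to cross the cell
```

With a high velocity the travel time per cell is short, so the accumulated travel time stays below the time step and the same water can pass through several cells within one step.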
This phenomenon is also called a pulse (Chow, 1990). Based on the cell’s physical properties a new water table is calculated (local relation). The algorithm represents an iteration within a time step, where the iteration continues to calculate the distribution of stream water as long as there is a volume which flows out of the cells. When no flow occurs the algorithm leaves the iteration and continues with the next time step, calculating the distribution of water at time t+1. The number of time steps is controlled either by the user or by the range of accumulated precipitation data.

16.8 PRELIMINARY RESULTS

At this stage the model for stream flow is still in a testing mode. Some preliminary results can however be presented. Figure 16.5 illustrates the distribution of water in the test stream during one time step, where volumes are represented as blocks shaded according to size and volume. At time 0 the initial water table has been calculated based on average measurements and precipitation. The following five illustrations show how this volume of water is distributed throughout the stream during one time step. The distribution of volume does not follow a linear rate, since it depends on the spatial variability in the physical properties of the stream. Since no groundwater recharge is implemented in this test run, the stream will slowly dry out during the iterations. This is because the velocity, and thus the travelled distance, exceeds the actual length of the test stream.

16.9 DISCUSSION

Being at a preliminary stage, this model can at present only indicate the spatial behaviour of the stream. Actual quantitative results have yet to come, since the model needs to be further calibrated and verified throughout the whole watershed. It can be argued that the same result could have been obtained using network algorithms within GIS. Using network models, however, does not allow back flows, partial area flows or flooding to be modelled.
The method suggested in this chapter supports these issues, thus allowing
Figure 16.4: Within each time step distribution of water in a stream is calculated through a set of iterations controlled by height of stream surface and accumulated travel time.
modelling more complex flow patterns in a spatial domain. One of the present constraints is the lack of possibilities to solve a set of equations simultaneously or to address and process multiple cells in one calculation. Not being able to do this introduces some crude approximations in relation to time. Nevertheless this approach allows the modelling of spatial behaviour at a much higher level than existing algorithms. The drawback of implementing these types of models in raster GIS is the level of application programming needed, which runs rather slowly, especially in the writing and processing of files. On the other hand, the advantage of modelling in GIS is the possibility of a seamless combination of focal interaction, focal relations, and local relations, running the model using all available tools to interpolate and analyse data. In this algorithm only simple mathematical equations have been used to describe the flow behaviour of water. It is possible to implement more complex equations based on linear solutions.
Figure 16.5: Through one time step water distributes itself and reaches a steady state after 6 iterations. As seen there is not a linear movement of volumes due to the varying physical properties of the stream as well as allowed travel time of water.
16.10 CONCLUSION

The preliminary results of this work show that distributed flows of stream water can be described using individual cell processing techniques in GIS. Using this technique, stream velocity as well as flow rates are calculated for each 100 m, thus creating possibilities to estimate hydrology and sediment loads in the streams. Knowledge of these is crucial in locating wetlands if long-term efficiency of wetlands is to be obtained. Using this type of model also allows interactions between subsurface, surface, and stream, thus describing movements of hydrology and sediments within the whole watershed.

REFERENCES

BEASLEY, D.B., HUGGINS, L.F. and MONKE, E.J. 1980. ANSWER: a model for watershed planning, Transactions of the ASAE, pp. 938–944.
BERRY, J. 1995. Spatial Reasoning for Effective GIS. Fort Collins: GIS World Books.
CHOW, V.T., MAIDMENT, D.R. and MAYS, L.W. 1988. Applied Hydrology. New York: McGraw-Hill.
CONSTANZA, R., SKLAR, F.H. and WHITE, M.L. 1990. Modelling coastal landscape dynamics, Bioscience, 40(2), pp. 91–107.
ESRI 1996. ArcDOC Version 7.0.4. Redlands: ESRI.
FEDRA, K. 1993. GIS and environmental modelling, in Goodchild, M.F., Parks, B.O. and Steyaert, L.T. (Eds.), Environmental Modelling with GIS. Oxford: Oxford University Press, pp. 35–50.
GALTON, A. 1996. Towards a qualitative theory of movement, in Frank, A. (Ed.), Temporal Data in Geographic Information Systems. Vienna: Department of Geoinformation, Technical University of Vienna, pp. 57–78.
HANSEN, H.S. 1993. GIS-datamodeller, in GIS i Danmark. København: Teknisk Forlag, pp. 45–50.
HEDIN, R.S., NAIRN, R.W. and KLEINMANN, R.L.P. 1994. Passive treatment of coal mine drainage, in Bureau of Mines Information Circular/1994. Washington: United States Department of Interior.
HODGE, W., LARSON, M. and GORAN, W. 1988. Linking the ARMSED watershed process model with the grass geographic information system, Transactions of the ASAE, pp. 501–510.
JENSON, S.K. 1991. Applications of hydrologic information automatically extracted from digital elevation models, Hydrological Processes, 5(1), pp. 31–44.
JØRGENSEN, S.E. 1988. Environmental Modelling. Amsterdam: Elsevier.
KNUDSEN, C. and FELDBERG, R. 1994. Numerical Solution of Ordinary Differential Equations with Runge-Kutta Methods. Lyngby: Technical University of Denmark, Physics Department.
KUSLAR, J.A. and KENTULA, M.E. (Eds.) 1990. Wetland Creation and Restoration: The Status of Science. New York: Island Press.
MAIDMENT, D.R. 1993. GIS and hydrologic modeling, in Goodchild, M.F., Parks, B.O. and Steyaert, L.T. (Eds.), Environmental Modelling with GIS. Oxford: Oxford University Press, pp. 147–167.
MOORE, I.D., TURNER, A.K., WILSON, J.P., JENSON, S.K. and BAND, L.E. 1993. GIS and land-surface-subsurface modelling, in Goodchild, M.F., Parks, B.O. and Steyaert, L.T. (Eds.), Environmental Modelling with GIS. Oxford: Oxford University Press, pp. 196–230.
MITSCH, W.J. 1993. Designing constructed wetlands systems to treat agricultural non point source pollution problems, in Created and natural wetlands for controlling non point source pollution problems. Boca Raton: CRC Press.
MITSCH, W.J. and GOSSELINK, J.G. 1993. Wetlands, second edition. New York: Van Nostrand Reinhold.
NAGESHWAR, R.B., WESLEY, J.P. and RAVIKUMAR, S.D. 1992. Hydrologic parameter estimation using geographic information systems, Journal of Water Resources Planning and Management, 118(5), pp. 492–512.
SKLAR, F.H. and CONSTANZA, R. 1990. Dynamic spatial models, in Quantitative Methods in Landscape Ecology. New York: Springer Verlag, pp. 239–288.
SKLAR, F.H., GOPU, K.K., MAXWELL, T. and CONSTANZA, R. 1994. Spatially explicit and implicit dynamic simulations of wetland processes, in Global Wetlands: Old and New. Amsterdam: Elsevier, pp. 537–554.
STABASKRABA, M., MITSCH, W.J. and JØRGENSEN, S.E. 1988. Wetland modelling: an introduction and overview, in Wetland Modelling. New York: Elsevier.
STUEBE, M.M. and JOHNSTON, D.M. 1990. Runoff volume estimation using GIS techniques, Water Resources Bulletin, 26(4), pp. 611–620.
TOMLIN, C.D. 1990. Geographic Information Systems and Cartographic Modelling. Englewood Cliffs, NJ: Prentice Hall.
VIEUX, B.E. 1991. Geographic information systems and non-point source water quality and quantity modelling, Hydrological Processes, 4, pp. 101–113.
WISCHMEIER, W.H. and SMITH, D.D. 1978. Predicting rainfall erosion losses, a guide to conservation planning, Agricultural Handbook 537. Washington DC: US Department of Agriculture.
YOUNG, R.A., ONSTAD, C.A., BOSCH, D.D. and ANDERSON, W.P. 1987. AGNPS, agricultural non-point-source pollution model: a watershed analysis tool, Conservation Research Report, 35. Washington DC: US Department of Agriculture.
Chapter Seventeen
Use of GIS for Earthquake Hazard and Loss Estimation
Stephanie King and Anne Kiremidjian
17.1 INTRODUCTION
The realistic assessment of the earthquake hazard and potential earthquake-induced damage and loss in a region is essential for purposes such as risk mitigation, resource allocation, and emergency response planning. A geographic information system (GIS) provides the ideal environment for conducting earthquake hazard and loss studies for large regions (King and Kiremidjian, 1994). Once the spatially referenced databases of geologic, geotechnical, and structural information for the region have been compiled and stored in the GIS, different earthquake scenarios can be analysed to estimate potential damage and loss in the region.
In addition to the forecasting of future earthquake damage and loss for planning and mitigation purposes, the GIS-based analysis provides invaluable assistance when an earthquake does occur in the study region. The earthquake event is simulated in the GIS to estimate the location and severity of the damage and loss, assisting officials who need to allocate emergency response resources and personnel. As post-earthquake reconnaissance reports are made, the GIS is continuously updated to provide a relatively accurate assessment of the disaster situation in the region.
This chapter describes the development of a GIS-based methodology for earthquake hazard and loss estimation that aids in seismic risk mitigation, resource allocation, public policy decisions, and emergency response. After some brief background, an overview of the various types of data and models that comprise a regional earthquake hazard and loss estimation is presented. Following the overview is a description of the development of the GIS-based methodology, i.e., the implementation of earthquake hazard and loss estimation in a geographic information system, with examples from a case study for the region of Salt Lake County, Utah (Applied Technology Council, in progress).
17.2 BACKGROUND
Most of the previous work in the application of geographic information system technology to regional earthquake damage and loss estimation has been limited to methods usually considering only one type of seismic hazard and often applied to a small region or to a specific type of facility. Rentzis et al. (1992) used a GIS to estimate damage and loss distributions due to ground shaking in a 50-year exposure period for residential and commercial buildings in Palo Alto, California. Borcherdt et al. (1991) developed a GIS-based methodology for identifying special study zones for strong ground shaking in the San Francisco Bay
region based on regional surface geology and an intensity attenuation relationship for a repeat of the 1906 San Francisco earthquake. Kim et al. (1992) developed a GIS-based regional risk analysis program to study the vulnerability of bridges in a regional highway network. McLaren (1992) describes the use of a GIS by Pacific Gas and Electric (PG&E) to aid in the evaluation of the likely effects of high-probability, large magnitude future earthquakes in PG&E’s service territory and to set priorities for the mitigation of seismic hazards.
Due to recent improvements in the availability and quality of GIS technology, tabular database software, and computer hardware, a significant amount of current research has been devoted to incorporating GIS technology in seismic hazard and risk analysis. Very few of these studies, however, have considered combining the effects of the various seismic hazards such as ground shaking, soil amplification, liquefaction, landslide, and fault rupture. Additionally, most studies have been conducted for a specific site or for a specific facility type. This chapter describes a methodology for integrating all of the separate modules necessary for a comprehensive regional earthquake damage and loss analysis in a manner that is flexible in geographic location, analysis scale, database information, and analytical modelling capabilities.
17.3 OVERVIEW OF EARTHQUAKE HAZARD AND LOSS ESTIMATION
Earthquake hazard and loss estimation involves the synthesis of several types of spatially referenced geologic, geotechnical, and structural information with earthquake hazard and loss models. The basic steps in the analysis typically include the following (as illustrated in Figure 17.1):
1. Estimation of ground shaking hazard. Ground shaking estimation involves the identification of the seismic sources that may affect the region, modelling of the earthquake occurrences on the identified sources, modelling the propagation of seismic waves from the sources to the study region, and lastly modifying the ground motion to account for local soil conditions that may alter the shaking intensity and frequency content.
2. Estimation of secondary seismic hazards. Secondary seismic hazards include effects such as liquefaction, landslide, and fault rupture. Models to estimate these hazards are typically functions of the local geologic and geotechnical conditions, the intensity, frequency, and duration of the ground motion, and the occurrence of the hazards in previous earthquakes.
3. Estimation of damage to structures. Structural damage estimation involves the identification, collection, and storage of information on all building and lifeline structures in the region, and the modelling of damage to each type of structure as a function of ground shaking intensity and potential for secondary seismic hazards.
4. Estimation of monetary and non-monetary losses. Monetary losses include repair and replacement cost for structural damage and loss of business income, and non-monetary loss includes deaths and injuries. The estimation of these types of losses involves models that are functions of the level of damage and associated social and economic information for each structure. There are several other types of losses that can be considered, such as clean-up and relocation cost, homelessness, unemployment, emotional distress, and other short and long-term socio-economic impacts on the region; however, the modelling of these is typically too involved, requiring separate and more detailed studies.
Figure 17.1: Basic steps in earthquake hazard and loss estimation.
17.4 A GIS-BASED EARTHQUAKE HAZARD AND LOSS ESTIMATION METHODOLOGY
As illustrated in Figure 17.1, earthquake hazard and loss estimation utilises models that require spatially referenced information as the input parameters. A GIS is ideal for this type of analysis. The geologic, geotechnical, and structural data for the study region are stored in the form of maps with related database tables, and the models are in the form of either look-up tables or short programs written in GIS macro language that involve map overlay procedures and database calculations. Figure 17.2 illustrates the GIS implementation of earthquake hazard and loss estimation, following the same basic steps as those shown in Figure 17.1. Each of the four steps shown in Figure 17.2 is described in more detail below.
17.5 GIS MAPS OF GROUND SHAKING HAZARD
An earthquake event is typically specified in the study region by selecting the seismic source from a fault map of the area, as well as the desired magnitude of the earthquake. The bedrock ground motion resulting
Figure 17.2: GIS-based earthquake hazard and loss estimation.
from the earthquake is estimated with the use of an attenuation relationship, an empirical formula that describes the level of ground shaking at a given location as a function of earthquake magnitude and distance to the fault. In the GIS, buffer zones of equal distance around the fault are created and the level of ground motion in each buffer zone is assigned through a database table look-up. For example, Figure 17.3 shows a map of Salt Lake County, Utah with a scenario fault break that would produce a magnitude 7.5 earthquake, and the corresponding buffer zones around the fault. The database attributes associated with a typical buffer zone are also shown in Figure 17.3. The values of peak ground acceleration (PGA) were computed according to the attenuation relationship developed by Boore et al. (1993) as follows: (17.1)
where:
d = distance to the rupture zone in km
M = the assumed magnitude of the earthquake (7.5 in this study)
Gb = 0 and Gc = 0 for soil type A (shear wave velocity > 750 m/s)
Gb = 1 and Gc = 0 for soil type B (shear wave velocity = 360–750 m/s)
Gb = 0 and Gc = 1 for soil type C (shear wave velocity < 360 m/s)
Peak ground acceleration values were also converted to Modified Mercalli Intensity (MMI) (Wood and Newman, 1931) values for use in earthquake damage estimation described later in this chapter. A map of
Figure 17.3: Map showing buffer zones of ground shaking in Salt Lake County, Utah.
the local site geology is used to define the areas of A, B, and C soil types in the study region. To estimate the final surface ground shaking in the region, the map of ground motion buffer zones is combined with the soil type map to produce the final map of surface ground shaking as illustrated in Figure 17.4 for MMI values for a magnitude 7.5 earthquake in Salt Lake County, Utah. An example list of the attributes associated with each polygon on the map is also shown in Figure 17.4. High ground motion values are shown to occur through the middle portion of the county. This is due to the close proximity of the fault in these regions, as well as the presence of softer soil deposits (soil type C).
17.6 GIS MAPS OF SECONDARY SEISMIC HAZARDS
The secondary seismic hazards that are considered here include liquefaction, landslide, and surface fault rupture. The hazards associated with liquefaction and landslide are defined in terms of “high”, “moderate”, “low”, and “very low” potential of occurrence based on geologic and geotechnical conditions (see King and Kiremidjian, 1994). Maps describing the hazard due to liquefaction and landslide are often available in GIS format in terms of the qualitative potential description. More quantitative descriptions of liquefaction and landslide hazard can be developed in the GIS with spatial modelling of geotechnical and geologic parameters (for example, see Luna, 1993). The hazard due to surface fault rupture is defined in terms of 100 and 200 meter buffer zones around the assumed scenario fault break. The surface fault rupture map is created in the GIS with a buffer procedure similar to that used in the ground shaking map generation. Figure 17.5 shows an example landslide potential map for Salt Lake County, Utah.
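The buffer-zone look-up used for the ground shaking map in Section 17.5 can be sketched in a few lines of Python. This is a minimal illustration rather than the macro-language implementation used in the case study. It assumes a log-linear attenuation form in magnitude, distance, and the soil indicators Gb and Gc; the coefficient values, the zone width, and the choice of evaluating each zone at its mid-distance are all placeholders introduced here, and the function names are hypothetical.

```python
import math

# Placeholder coefficients for an assumed attenuation form
#   log10(PGA) = b1 + b2*(M-6) + b3*(M-6)^2 + b4*r + b5*log10(r) + b6*Gb + b7*Gc
# with r = sqrt(d^2 + h^2). The numbers are illustrative assumptions only and
# are not quoted from this chapter; substitute the published coefficients.
COEFFS = {"b1": -0.038, "b2": 0.216, "b3": 0.0, "b4": 0.0,
          "b5": -0.777, "b6": 0.158, "b7": 0.254, "h": 5.48}

# Indicator variables (Gb, Gc) for soil types A, B and C, as defined in the text.
SOIL_FLAGS = {"A": (0, 0), "B": (1, 0), "C": (0, 1)}


def pga_g(distance_km, magnitude, soil_type, coeffs=COEFFS):
    """Peak ground acceleration (in g) at a given distance from the rupture."""
    gb, gc = SOIL_FLAGS[soil_type]
    r = math.hypot(distance_km, coeffs["h"])
    dm = magnitude - 6.0
    log_pga = (coeffs["b1"] + coeffs["b2"] * dm + coeffs["b3"] * dm ** 2
               + coeffs["b4"] * r + coeffs["b5"] * math.log10(r)
               + coeffs["b6"] * gb + coeffs["b7"] * gc)
    return 10.0 ** log_pga


def zone_index(distance_km, zone_width_km):
    """Index of the buffer zone (0 = nearest the fault) containing a location."""
    return int(distance_km // zone_width_km)


def build_lookup(n_zones, zone_width_km, magnitude):
    """Look-up table keyed by (zone index, soil type), PGA at each zone's mid-distance."""
    table = {}
    for zone in range(n_zones):
        mid_distance = (zone + 0.5) * zone_width_km
        for soil in SOIL_FLAGS:
            table[(zone, soil)] = pga_g(mid_distance, magnitude, soil)
    return table


# Five 2 km zones around the scenario fault break (zone width is an assumption).
table = build_lookup(n_zones=5, zone_width_km=2.0, magnitude=7.5)
site_pga = table[(zone_index(3.9, 2.0), "C")]  # soil type C site, 3.9 km from the fault
```

Keying the table on both zone and soil type mirrors the overlay of the buffer map with the soil type map: each resulting polygon carries one zone index and one soil class, so a single look-up returns its surface ground motion. The same `zone_index` idea would serve for the 100 and 200 meter fault rupture buffers.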
Figure 17.4: Map showing distribution of ground shaking hazard in Salt Lake County, Utah.
17.7 GIS MAPS OF DAMAGE TO STRUCTURES
In order to estimate earthquake damage accurately, a complete and detailed inventory of structures must be developed for the region. The accuracy of the final regional estimates of damage and loss is highly dependent upon the accuracy of the underlying structural inventory developed for the area. The information to be included in a structural inventory often depends upon the classes of facilities under consideration and the type of analysis being conducted. For the most general regional earthquake damage and loss analysis, information about the location, use, and structural properties of each facility is typically desired. Sources of information for the inventory include federal, state, and local governments, as well as private sector databases. Often, knowledge-based expert systems are used to infer missing information and assign predefined engineering classes to structures in the final inventory (see King et al., 1994). Typically the inventory includes information on all structures in the region, such as buildings, bridges, highways, pipelines, and industrial facilities.
Once the inventory is compiled, summary maps may be generated that help to describe the characteristics of the inventory in the study region. For example, the percentage of unreinforced masonry buildings (a typically poor performer in withstanding earthquake shaking) in each Census tract can be displayed as shown in Figure 17.6 for Salt Lake County, Utah. These summaries help to indicate those areas that contain buildings that are relatively more hazardous. If resources are limited, a detailed earthquake damage and loss analysis might be conducted only in the most hazardous areas, while the remainder of the study region could be analysed when more funding is available.
The most widely used measure of earthquake damage is an expression of damage in terms of percentage financial loss that can be applied to all types of structures (Rojahn, 1993). This measure is typically given the name “damage factor” and is defined as (Applied Technology Council, 1985):
Figure 17.5: Map showing landslide potential in Salt Lake County, Utah.
$$\text{damage factor} = \frac{\text{dollar loss}}{\text{replacement value}} \qquad (17.2)$$
Damage is typically estimated individually for all seismic hazards, i.e., ground shaking, liquefaction, landslide, and fault rupture, and then combined to give a final damage estimate for the structures in the region for the given earthquake event; however, the discussion in this chapter is limited to damage due only to ground shaking hazard. Motion-damage relationships are used to estimate earthquake damage for each facility type due to various levels of ground shaking. These relationships, also known as vulnerability curves, are typically expressed in terms of damage-loss curves, fragility curves, damage probability matrices, and expected damage factor curves (King and Kiremidjian, 1994). Figure 17.7 shows an example of an expected damage factor curve for low-rise wood frame buildings located in Salt Lake County, Utah. Curves such as these are usually developed on the basis of expert opinion augmented with empirical data.
In the GIS, the inventory of structures is stored as a series of maps with associated database tables. The expected damage factor curves, such as the one shown in Figure 17.7, are stored in the form of database tables with ground shaking level (MMI), engineering class, expected damage factor, and standard deviation on the expected value as attributes. The maps of inventory data are overlaid in the GIS on the ground shaking hazard map as shown in Figure 17.8 for commercial buildings in Salt Lake County. Through the overlay, each building (stored as a point feature in the GIS) acquires the level of ground shaking intensity as one of its attributes. A table look-up is then used to assign the expected damage factor and standard deviation to the building as a function of the ground shaking intensity and the engineering class of the building. A sub-set of the final attributes stored with each building is also shown in Figure 17.8.
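The overlay and table look-up described above can be illustrated with a short Python sketch. It is a simplified stand-in for the GIS operations, not the procedure used in the case study: the damage table values, the engineering class names, and the toy hazard function `toy_mmi` are all hypothetical placeholders.

```python
# Hypothetical look-up table: (engineering class, MMI) -> (expected damage
# factor, standard deviation). The values are placeholders, not read from
# Figure 17.7.
DAMAGE_TABLE = {
    ("low_rise_wood_frame", 7): (0.03, 0.02),
    ("low_rise_wood_frame", 8): (0.07, 0.04),
    ("unreinforced_masonry", 7): (0.15, 0.08),
    ("unreinforced_masonry", 8): (0.30, 0.12),
}


def overlay_mmi(buildings, mmi_at):
    """Stand-in for the map overlay: attach the MMI at each building's location."""
    return [dict(b, mmi=mmi_at(b["x"], b["y"])) for b in buildings]


def assign_damage(buildings, table=DAMAGE_TABLE):
    """Table look-up on (engineering class, MMI) for each building."""
    result = []
    for b in buildings:
        mean, std = table[(b["eng_class"], b["mmi"])]
        result.append(dict(b, damage_factor=mean, damage_std=std))
    return result


def toy_mmi(x, y):
    """Toy hazard map (an assumption): MMI 8 west of x = 5 km, MMI 7 elsewhere."""
    return 8 if x < 5.0 else 7


inventory = [
    {"id": 1, "x": 2.0, "y": 1.0, "eng_class": "unreinforced_masonry"},
    {"id": 2, "x": 8.0, "y": 3.0, "eng_class": "low_rise_wood_frame"},
]
damaged = assign_damage(overlay_mmi(inventory, toy_mmi))
```

Keeping the overlay and the look-up as separate steps matches the text: the overlay is the only spatial operation, after which damage assignment is a plain database join on two attributes.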
There are several sources of uncertainty in the GIS-based earthquake damage and loss estimation, such as in the ground motion intensity, the inventory information, and the expected damage factor curves. Due to all the uncertainty and simplifying assumptions that are necessary when representing variables as maps, damage and loss estimates are never reported on a structure-by-structure basis but are reported as
Figure 17.6: Map showing percentage of non-reinforced masonry buildings in Salt Lake County, Utah.
Figure 17.7: Expected damage factor curve with standard deviation as a function of MMI for low-rise wood-frame buildings.
aggregated values over small regions. For example, Figure 17.9 shows the expected damage factor due to ground shaking for a magnitude 7.5 earthquake in Salt Lake County, Utah. The areas of high damage are located through the centre of the county, where the ground shaking is relatively high and where many buildings are constructed of unreinforced masonry.
17.8 GIS MAPS OF MONETARY AND NON-MONETARY LOSS
Monetary losses resulting from an earthquake are typically due to: direct structural damage, such as failed beams, excessive deflections, and differential settlement to man-made facilities; and indirect effects, such as damage to non-structural elements and contents, clean-up and financing of repairs, and loss of facility use. Non-monetary losses usually refer to fatalities, injuries, unemployment, and homelessness. The modelling of the various monetary and non-monetary losses is very difficult and the subject of many current research projects. In this chapter, losses are limited to those due to direct structural damage, loss of facility use, and casualties. The latter two categories of loss are assumed to be a function of the facility use and the earthquake damage to the facility.
Figure 17.8: Map showing results for commercial buildings overlaid on seismic hazard map for Salt Lake County, Utah.
In the GIS, losses are estimated with database manipulations; map overlays are not necessary. Loss due to direct structural damage is the expected damage factor of a facility multiplied by its replacement cost. Replacement costs are typically estimated as a function of the facility use and square footage according to local construction estimates. Loss of facility use is assumed to be a function of the use of the facility and the expected damage factor of the facility. Values for loss of facility use are based on expert opinion augmented with empirical data and give the number of days required to restore the facility to various percentages of full use. Casualties are also estimated as a function of the use of the facility and the expected damage factor of the facility. Casualty rates are typically based on empirical data, although the amount of data is limited.
Example maps of earthquake loss estimates are shown in Figures 17.10 and 17.11. Figure 17.10 shows the loss due to structural damage aggregated by Census tract for a magnitude 7.5 earthquake in Salt Lake County, Utah, and Figure 17.11 shows the expected number of deaths for the same event and region, also aggregated by Census tract. As expected, the areas of high losses on the maps shown in Figures 17.10 and 17.11 correspond to the areas of high damage on the map shown in Figure 17.9. Again, these estimates are not reported on an individual building basis because of the numerous sources of uncertainty.
17.9 COST-BENEFIT STUDIES
The GIS-based earthquake hazard and loss estimation provides an efficient and clear means for assessing the effects (e.g., cost-benefit analyses) of various seismic risk mitigation strategies. For example, a proposed city ordinance may require the seismic upgrading of all unreinforced masonry public buildings.
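The database-only loss calculation of Section 17.8 (expected damage factor multiplied by replacement cost, then aggregated by Census tract) can be sketched as follows. The unit costs, field names, and sample buildings are hypothetical values introduced for illustration; they are not data from the Salt Lake County study.

```python
from collections import defaultdict

# Hypothetical replacement cost per square foot by facility use (placeholder values).
UNIT_COST = {"commercial": 100.0, "residential": 80.0}


def replacement_cost(building):
    """Replacement cost as a function of facility use and square footage."""
    return UNIT_COST[building["use"]] * building["sqft"]


def structural_loss(building):
    """Expected damage factor multiplied by replacement cost."""
    return building["damage_factor"] * replacement_cost(building)


def loss_by_tract(buildings):
    """Aggregate building losses to Census tracts for reporting."""
    totals = defaultdict(float)
    for b in buildings:
        totals[b["tract"]] += structural_loss(b)
    return dict(totals)


buildings = [
    {"tract": "T1", "use": "commercial", "sqft": 1000.0, "damage_factor": 0.10},
    {"tract": "T1", "use": "residential", "sqft": 500.0, "damage_factor": 0.30},
    {"tract": "T2", "use": "residential", "sqft": 2000.0, "damage_factor": 0.05},
]
totals = loss_by_tract(buildings)
```

Only tract totals are returned, in keeping with the practice of reporting aggregated values rather than structure-by-structure estimates. A cost-benefit comparison of an upgrade ordinance could reuse `loss_by_tract` with reduced damage factors for the upgraded buildings.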
The cost associated with the upgrading is assumed to be a dollar amount per square foot of building and is computed by utilising the building inventory stored in the GIS database. The benefit is assumed to be the change in the expected loss (including monetary loss, casualties, and loss of use) for one or more earthquake scenarios. Using the GIS, the expected earthquake loss is computed for the unreinforced masonry public buildings in
Figure 17.9: Map showing average expected damage factor due to ground shaking in each Census tract in Salt Lake County.
both their current and upgraded state, assuming that compliance with the city upgrade ordinance decreases the expected earthquake damage to the buildings by a certain percentage. The cost and benefit of the city ordinance are compared to assess the effectiveness of this type of risk mitigation strategy.
17.10 SUMMARY
This chapter describes the implementation of earthquake hazard and loss estimation in a geographic information system. The GIS-based analysis provides decision-making assistance in areas such as seismic risk mitigation, resource allocation, public policy, and emergency response. An overview of the various types of data and models that comprise a regional earthquake hazard and loss estimation is given, followed by a description of how earthquake hazard and loss estimation is done within the GIS, with examples from a case study for Salt Lake County, Utah.
ACKNOWLEDGEMENTS
Funding for the research described in this chapter was provided by several sources including Applied Technology Council, GeoHazards International, Kajima Corporation through the California Universities for Research in Earthquake Engineering Foundation, and National Science Foundation grant number EID-9024032. The authors are grateful for this support. In addition, the authors wish to thank Environmental Systems Research Institute of Redlands, California for providing their ARC/Info™ software for research use at the John A. Blume Earthquake Engineering Center.
Figure 17.10: Map showing loss due to structural damage in each Census tract in Salt Lake County, Utah.
REFERENCES
APPLIED TECHNOLOGY COUNCIL, in progress. Earthquake Loss Evaluation Methodology and Databases for Utah, ATC-36 Report. Redwood City, California.
APPLIED TECHNOLOGY COUNCIL 1985. Earthquake Damage Evaluation Data for California, ATC-13 Report. Redwood City, CA: ATC.
BOORE, D.M., JOYNER, W.B. and FUMAL, T.E. 1993. Estimation of Response Spectra and Peak Accelerations From Western North American Earthquakes: an Interim Report, Open File Report 93–509. Menlo Park, CA: United States Geological Survey.
BORCHERDT, R., WENTWORTH, C.M., JANSSEN, A., FUMAL, T. and GIBBS, J. 1991. Methodology for predictive GIS mapping of special study zones for strong ground shaking in the San Francisco Bay Region, CA, Proceedings of the Fourth International Conference on Seismic Zonation, Stanford, California, 25–29 August 1991, Volume III, pp. 545–552.
KIM, S.H., GAUS, M.P., LEE, G. and CHANG, K.C. 1992. A GIS-based regional risk approach for bridges subjected to earthquakes, Proceedings of the ASCE Special Conference on Computing in Civil Engineering, Dallas, Texas, June 1992, pp. 460–467.
KING, S.A. and KIREMIDJIAN, A.S. 1994. Regional Seismic Hazard and Risk Analysis Through Geographic Information Systems, John A. Blume Earthquake Engineering Center Technical Report No. 111, Stanford, CA: Stanford University.
KING, S.A., KIREMIDJIAN, A.S., ROJAHN, C., SCHOLL, R.E., WILSON, R.R. and REAVELEY, L.D. 1994. Development of an integrated structural inventory for earthquake loss estimation, in Proceedings of the Fifth National Conference on Earthquake Engineering, Chicago, Illinois, 10–14 July, Vol. IV, pp. 397–406.
LUNA, R. 1993. Liquefaction analysis in a GIS environment, Proceedings of the NSF workshop on Geographic Information Systems and their Application in Geotechnical Earthquake Engineering, Atlanta, Georgia, 28–29 January 1993, pp. 65–71.
McLAREN, M. 1992. GIS Prepares Utilities for Earthquakes, GIS World, 5(4), pp. 60–64.
Figure 17.11: Map showing total number of expected deaths in each Census tract in Salt Lake County, Utah.
RENTZIS, D.N., KIREMIDJIAN, A.S. and HOWARD, H.C. 1992. Identification of High Risk Areas Through Integrated Building Inventories, The John A. Blume Earthquake Engineering Center Technical Report No. 98, Stanford, CA: Stanford University.
ROJAHN, C. 1993. Estimation of earthquake damage to buildings and other structures in large urban areas, Proceedings of the Geohazards International/Oyo Pacific Workshop, Istanbul, Turkey, 8–11 October 1993.
WOOD, H.O. and NEWMAN, F. 1931. Modified Mercalli intensity scale of 1931, Seismological Society of America Bulletin, 21(4), pp. 277–283.
Chapter Eighteen
An Evaluation of the Effects of Changes in Field Size and Land Use on Soil Erosion Using a GIS-Based USLE Approach
Philippe Desmet, W. Ketsman, and G. Govers
18.1 INTRODUCTION
Watershed models have an obvious and explicit spatial dimension and increasingly benefit from the use of GIS linked to digital elevation models for data input, analysis, and display of the results. The particular value of GIS lies in capturing the spatial variability of parameters and aiding in the interpretation of the results. The Universal Soil Loss Equation (USLE) is a lumped parameter erosion model that predicts the average annual sheet and rill erosion rate over a long term. The traditional approach uses averaging techniques to approximate characteristics of each parameter needed in the model. Despite its shortcomings and limitations (Foster, 1991; Renard et al., 1994; Wischmeier, 1976), the USLE is still the most frequently used equation in erosion studies, mainly due to the simple, robust form of the equation as well as to its success in predicting the average, long-term erosion on uniform slopes or field units (e.g. Bollinne, 1985; Busacca et al., 1993; Flacke et al., 1990; Jäger, 1994; Moore and Burch, 1986; Mellerowicz et al., 1994).
The USLE is primarily designed to predict erosion on straight slope sections. Foster and Wischmeier (1974) were the first to develop a procedure to calculate the average soil loss on complex slope profiles by dividing an irregular slope into a limited number of uniform segments. In this way, they were able to take the profile shape of the slope into account. This is important as slope shape influences erosion (D’Souza and Morgan, 1976; Young and Mutchler, 1969). Using manual methods the USLE has already been applied on a watershed scale (Griffin et al., 1988; Williams and Berndt, 1972, 1977; Wilson, 1986). Basically, all these methods consist of the calculation of the LS-value for a sample of points or profiles in the area under study: the results of such calculations are then considered to be representative for the area as a whole.
The number of data collected will necessarily be limited by the time-consuming nature of these methods. Furthermore, a fundamental problem may arise: the measurement of slope length at a given point is not straightforward and in a two-dimensional situation slope length should be replaced by the unit contributing area, i.e. the upslope drainage area per unit of contour length (Kirkby and Chorley, 1967). Indeed, in a real two-dimensional situation overland flow and the resulting soil loss do not really depend on the distance to the divide or upslope border of the field, but on the area per unit of contour length contributing runoff to that point (e.g. Ahnert, 1976; Bork and Hensel, 1988; Moore and Nieber, 1989). The latter may differ considerably from the manually measured slope length, as it is strongly affected by flow convergence and/or divergence. GIS technology provides for relatively easy construction and handling of digital elevation models which, in principle, allow for the calculation of the unit contributing area so that the complex nature of the
topography may be fully accounted for. In order to do so, various routing algorithms have been proposed in the literature and some applications have already been made in erosion studies (e.g. Bork and Hensel, 1988; Desmet and Govers, 1995; Moore and Burch, 1986; Moore and Nieber, 1989; Moore et al., 1991). The aim of this chapter is to present an extension of the Foster and Wischmeier (1974) approach for the calculation of the LS-factor on a two-dimensional terrain. It will be shown that incorporating the proposed procedure in a GIS environment may increase the applicability of the USLE, since it allows the calculation of LS-values on a land unit basis. The applicability and flexibility of the GIS procedure will thereafter be demonstrated by an explorative evaluation of the effect of changes in land parcellation and land use on the erosion risks in the last two centuries in areas in central Belgium. Land parcellation and land use were chosen because of their relative importance in erosion assessments, their explicit spatial character and because they offered the best potential to derive their temporal evolution.
18.2 METHODOLOGY
18.2.1 A GIS-based USLE approach
Foster and Wischmeier (1974) recognised the fact that a slope or even a field unit cannot be considered as totally uniform. Therefore, they subdivided the slope into a number of segments, which they assumed to be uniform in slope gradient and soil properties. The LS-factor for such a slope segment might then be calculated as:
$$(LS)_j = S_j\,L_j, \qquad L_j = \frac{\lambda_j^{m+1} - \lambda_{j-1}^{m+1}}{\left(\lambda_j - \lambda_{j-1}\right)(22.13)^m} \qquad (18.1)$$
where:
$L_j$ = slope length factor for the j-th segment (-)
$S_j$ = slope factor for the j-th segment (-)
$\lambda_j$ = distance from the lower boundary of the j-th segment to the upslope field boundary (m)
$m$ = the length exponent of the USLE LS-factor (-)
In a grid-based DEM the surface consists of square cells. If the LS-factor has to be calculated, the contributing area of each cell as well as the grid cell slope have to be known.
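Before turning to the grid-based case, the segment-wise calculation for an irregular slope profile can be sketched in Python. The sketch assumes the usual irregular-slope form of the length factor, in which a segment's factor is the difference of the (m+1)-th powers of the distances to its lower and upper boundaries, divided by the segment length and by 22.13^m (22.13 m being the USLE unit plot length). The function names are hypothetical, and the slope factors in the example are placeholders rather than computed values.

```python
def segment_length_factor(lam_upper, lam_lower, m, unit_plot_length=22.13):
    """Slope length factor of one uniform segment (assumed irregular-slope form).

    lam_upper: distance (m) from the upslope field boundary to the segment's upper boundary
    lam_lower: distance (m) from the upslope field boundary to the segment's lower boundary
    m: length exponent of the USLE LS-factor
    """
    numerator = lam_lower ** (m + 1) - lam_upper ** (m + 1)
    denominator = (lam_lower - lam_upper) * unit_plot_length ** m
    return numerator / denominator


def profile_ls(segments, m):
    """LS-value per segment; segments is a list of (length_m, slope_factor), top down."""
    values = []
    lam_upper = 0.0
    for length, slope_factor in segments:
        lam_lower = lam_upper + length
        values.append(slope_factor * segment_length_factor(lam_upper, lam_lower, m))
        lam_upper = lam_lower
    return values


# Two segments of 22.13 m each; slope factors of 1.0 are placeholders.
ls_values = profile_ls([(22.13, 1.0), (22.13, 1.0)], m=0.5)
```

With these inputs the upper segment has a length factor of one, as expected for a unit-plot length starting at the field boundary, while the lower segment of equal length receives a larger value because it also carries runoff from the segment above it.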
There are various algorithms to calculate the contributing area for a grid cell, i.e. the area upslope of the grid cell which drains into the cell. A basic distinction has to be made between single flow algorithms, which transfer all matter from the source cell to a single cell downslope, and multiple flow algorithms, which divide the matter flow out of a cell over several receiving cells. This distinction is not purely technical: single flow algorithms allow only parallel and convergent flow, while multiple flow algorithms can accommodate divergent flow. Desmet and Govers (1996a) reviewed the algorithms available and found that the use of single flow algorithms to route water over a topographically complex surface is a problem, as minor topographical irregularities may result in the erratic location of main drainage lines. Multiple flow algorithms, which can accommodate divergent flow, do not have this disadvantage. Here, we used the flux decomposition algorithm developed by Desmet and Govers (1996a): a vector having a magnitude equal to the contributing area to be distributed, increased with the grid cell area, is split into its two ordinal components. The magnitude of each component is proportional to the sine or cosine of the aspect direction which gives the direction of the vector. But as the sum of these two components is larger than the original magnitude, the components have to be normalised
afterwards. The unit contributing area may then be calculated by dividing the contributing area of a cell by the effective contour length D′ij. This is the length of contour line within the grid cell over which flow can pass. It equals the length of the line through the grid cell centre that is perpendicular to the aspect direction, and is calculated as:
$$D'_{ij} = D\left(\sin\alpha_{ij} + \cos\alpha_{ij}\right) = D\,x_{ij} \qquad (18.2)$$
where:
$D'_{ij}$ = the effective contour length (m)
$D$ = the grid cell size (m)
$x_{ij} = \sin\alpha_{ij} + \cos\alpha_{ij}$
$\alpha_{ij}$ = aspect direction for the grid cell with co-ordinates (i, j).
At the cell outlet, the contributing area at the inlet has to be increased by the grid cell area. Equation (18.1) can be extended for a two-dimensional topography by substituting the unit contributing area for the slope length, as each grid cell may be considered as a slope segment having a uniform slope. After some rearrangements the L-factor for the grid cell with co-ordinates (i, j) may be written as:
$$L_{ij} = \frac{\left(A_{ij\text{-in}} + D^2\right)^{m+1} - A_{ij\text{-in}}^{\,m+1}}{D^{m+2}\,x_{ij}^{m}\,(22.13)^m} \qquad (18.3)$$
where:
$L_{ij}$ = slope length factor for the grid cell with co-ordinates (i, j)
$A_{ij\text{-in}}$ = contributing area at the inlet of a grid cell with co-ordinates (i, j) (m²)
Different methods can be used to calculate the slope gradient on grid-based DEMs. For this study, the slope gradient for each grid cell of the study area was computed according to the algorithm described by Zevenbergen and Thorne (1987):
$$G_{ij} = \sqrt{G_x^2 + G_y^2} \qquad (18.4)$$
where $G_x$ = gradient in the x-direction (m/m) and $G_y$ = gradient in the y-direction (m/m).
The LS-factor for a grid cell may then be obtained by inserting $G_{ij}$ calculated by equation (18.4) and $L_{ij}$ calculated by equation (18.3) in the equations of the chosen USLE approach. For this study, we employed the equations as proposed for the Revised Universal Soil Loss Equation (RUSLE) (McCool et al., 1987, 1989; Renard et al., 1993). This approach has proved to give reliable, two-dimensional estimates of the long-term erosion risks; more details can be found in Desmet and Govers (1996b).
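The per-cell quantities of this section can be gathered into a short Python sketch. It is an illustration under stated assumptions, not the program used in the study: absolute values of the sine and cosine are taken so that the expressions stay positive for every aspect direction, which of the two neighbouring cells receives the sine-weighted share depends on the aspect convention and is left open, and all function names are hypothetical. The length factor uses the grid-cell form obtained by substituting the unit contributing area for slope length.

```python
import math


def x_factor(aspect_rad):
    """x = |sin(aspect)| + |cos(aspect)|; the absolute values are an assumption."""
    return abs(math.sin(aspect_rad)) + abs(math.cos(aspect_rad))


def effective_contour_length(cell_size, aspect_rad):
    """Effective contour length: cell size multiplied by x."""
    return cell_size * x_factor(aspect_rad)


def split_flux(area_in, cell_size, aspect_rad):
    """Flux decomposition: split (inlet area + cell area) over two neighbours.

    Returns (sine-weighted share, cosine-weighted share), normalised so that
    the two shares add up to the area leaving the cell.
    """
    area_out = area_in + cell_size ** 2
    s, c = abs(math.sin(aspect_rad)), abs(math.cos(aspect_rad))
    return area_out * s / (s + c), area_out * c / (s + c)


def cell_length_factor(area_in, cell_size, aspect_rad, m, unit_plot_length=22.13):
    """Slope length factor of a grid cell from its inlet contributing area."""
    numerator = (area_in + cell_size ** 2) ** (m + 1) - area_in ** (m + 1)
    denominator = (cell_size ** (m + 2) * x_factor(aspect_rad) ** m
                   * unit_plot_length ** m)
    return numerator / denominator


def slope_gradient(gx, gy):
    """Slope gradient (m/m) from the gradients in the x- and y-directions."""
    return math.hypot(gx, gy)


# A divide cell (no inflow) of 22.13 m facing along a grid axis.
l_top = cell_length_factor(area_in=0.0, cell_size=22.13, aspect_rad=0.0, m=0.5)
shares = split_flux(area_in=75.0, cell_size=5.0, aspect_rad=math.pi / 4)
```

For the divide cell the length factor reduces to one, the same result as a unit-plot segment at the top of a one-dimensional profile, which is a convenient consistency check between the grid-based and profile-based forms. For a diagonal aspect the outgoing area is shared equally between the two receiving cells.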
18.2.2 Benefits of this approach
Desmet and Govers (1996b) compared the automated approach with a manual analysis of a topographic map in which the LS-values for a sample of points were derived and considered to be representative of a certain area around these points. A first difference is that the number of data points collected manually will necessarily be limited by the time-consuming nature of the method, whereas the automated approach can handle an almost unlimited number of grid cells. Both the manual and the automated method yielded broadly similar results in terms of relative erosion risk mapping. However, there appeared to be important differences in absolute values. In general, manual methods lead to an underestimation of the erosion risk because the effect of flow convergence cannot be taken into account. This is especially true in plan-concave areas (i.e. zones of flow concentration): as overland flow tends to concentrate in the concavities, it is logical that the sheet and rill erosion risk will be higher here. The plan form convergence responsible for these higher erosion risks can clearly be captured by the automated technique
but not by the manual. Therefore, a two-dimensional approach is required for topographically complex areas to capture the convergence and/or divergence of the real topography. The latter is especially important when rill and gully erosion are considered. While the USLE is said to give the mean annual soil loss by sheet and rill erosion, there is a strong indication that this two-dimensional approach also accounts for (ephemeral) gully erosion. This indication is based on the prediction of rill and gully volumes measured in the field (Desmet and Govers, 1997) by using the above-mentioned LS-factor. After the classification of the LS-values, the mean (predicted) LS-value and the mean (measured) rill section were extracted for each class. A comparison of the predicted mean class values for the LS-factor with the measured mean rill erosion rate shows that the highest LS-class (which corresponds with the ephemeral gully zone) follows the general linear trend of the other data, which reflect rill erosion stricto sensu (Figure 18.1). Thus, a two-dimensional approach for the USLE can be used to predict the variation with topography of both rill and ephemeral gully erosion as long as the effects of soil conditions on ephemeral gully development are not too important.
The integration of the procedure in a GIS environment, in this particular case IDRISI (Eastman, 1992), has several advantages:
• The combination of IDRISI with TOSCA (Jones, 1991) permits the relatively easy construction of a DEM from a standard topographic map. This DEM can be used directly by the program to produce an information layer containing the LS-value for each grid cell.
• The USLE consists of six factors, four of which may gain an obvious advantage from the linkage with a GIS: digitising soil and land unit maps and/or the availability of digital soil information enables easy storing, updating and manipulating of the soil erodibility factor K, the cover-management factor C and the support practice factor P; some information for the rainfall-runoff factor R can also be obtained and improved by using GIS techniques (e.g. interpolation, Thiessen polygons). This linkage enables the incorporation of the spatial variation of some factors causing erosion and the assignment of appropriate C and P values to each of the land units and K values to each of the soil units. With a GIS-based extension of the LS-factor a full linkage is achieved. The predicted soil loss per unit area can now be calculated for each grid cell by a simple overlay procedure. Furthermore, standard IDRISI procedures allow predicted soil losses to be summed and averaged for each land or soil unit, so that total and average soil loss can be calculated on a land unit/soil unit basis.
• In a catchment with mixed land use, theoretical drainage areas are often irrelevant. Very often runoff will not be generated on the whole slope length at the same time, but only on land units with a specific land use. For example, mature grassland or woodland never generates significant amounts of runoff under western European conditions. So, if there are parcels with this land use upslope of an area under cultivation, they should not be taken into account when calculating L-values for the cultivated part of the catchment. It is also possible that land units are separated by drainage systems diverting all the runoff from the upslope area. Therefore, the program was written so that the user may or may not consider the land units as being hydrologically isolated. If this is done, only the area within the land unit under consideration will be taken into account when calculating L-values. Certainly, user experience is required to decide whether two parcels should be considered as being hydrologically isolated or hydrologically continuous.
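The overlay step described in the list above can be sketched as follows. This is only an illustration under stated assumptions, not the IDRISI procedure itself: the tiny rasters, the land-use labels and the helper names are hypothetical, while the factor values (R=67.4, K=0.43, P=1, and C=0.47 for arable land and 0 otherwise) are the ones this chapter adopts from Bollinne (1985).

```python
from collections import defaultdict

R, K, P = 67.4, 0.43, 1.0  # factor values quoted in this chapter
C_BY_LAND_USE = {"arable": 0.47, "pasture": 0.0, "woodland": 0.0}

# Hypothetical co-registered layers on the same 2 x 3 grid.
ls_layer = [[1.2, 1.5, 0.8],
            [2.0, 1.1, 0.6]]
land_use = [["arable", "arable", "pasture"],
            ["arable", "woodland", "pasture"]]
land_unit = [[1, 1, 2],
             [1, 3, 2]]


def soil_loss_layer(ls, use):
    """Cell-by-cell overlay of the factor layers: A = R * K * LS * C * P."""
    return [[R * K * ls_val * C_BY_LAND_USE[use_val] * P
             for ls_val, use_val in zip(ls_row, use_row)]
            for ls_row, use_row in zip(ls, use)]


def mean_per_unit(values, units):
    """Average a value layer over each land unit (a zonal mean)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v_row, u_row in zip(values, units):
        for v, u in zip(v_row, u_row):
            sums[u] += v
            counts[u] += 1
    return {u: sums[u] / counts[u] for u in sums}


loss = soil_loss_layer(ls_layer, land_use)
unit_means = mean_per_unit(loss, land_unit)
```

Keeping the C-values in a lookup table keyed by land use mirrors the assignment of one C-value per land unit; replacing `land_unit` by a soil-unit layer would give averages per soil unit in the same way.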
Figure 18.1: The prediction of gully sections by a power function of slope gradient and length
18.2.3 Implementation procedure
Four study sites were selected in the neighbourhood of Leuven, Belgium. All catchments have a surface area of a few km² and a rolling topography with soils ranging from loamy sand to silty loam. The main crops are winter cereals, sugar beets and Belgian endives, while limited parts are forested or under permanent pasture.
The elevation data used to construct the digital elevation model were obtained by digitising the contour lines from the Belgian topographical map (NGI); these were then converted into a grid-based DEM. The accuracy of the contour line elevations is in the order of 0.5 m (Depuydt, 1969). For all DEMs a grid spacing of 5 m was chosen. In addition, the parcellation of each catchment for 1947, 1969 and 1991 was digitised from aerial photographs. This means that the subdivision into parcels is not based on the land registry, which cannot be seen on aerial photographs, but on exploitation, which is the appropriate option when assessing erosion risks. In order to fit the parcellations to the DEM a rubber sheeting procedure was carried out. For each parcel, the land use for the corresponding year was derived by visual inspection of the aerial photographs. A main distinction was made between arable land, pasture, woodland and built-up area. Distinguishing arable land (especially winter corn) from pasture was not always easy because most photographs were taken in the spring; mostly, we had to rely on the presence of tillage patterns for arable land, and a patchy pattern for pasture. In case of doubt, the field was considered to be arable land.
Besides these aerial photographs, we also used old cartographic documents, namely the map of the Austrian Count de Ferraris dated to 1775 and the first official topographical map of 1870. These sources did not give any information about the parcellation; however, it was possible to retrieve information on land use. Therefore, we had to assume that the parcellation of 1775 and 1870 was identical to the parcellation of 1947.
By doing so, we have probably overestimated the field sizes for these years and, therefore, their effect on the erosion risk.
The GIS-based routine to calculate the LS-factor was then run on all DEMs using the information on the parcellation of the respective years. For this study we assumed the parcels to be hydrologically isolated so that no water could flow from one parcel to another. This means that the L-factor for a certain parcel depends only on the size and the orientation of the parcel. The LS-factor itself may give us information on the effect of changes in field size on the erosion risk. In order to include information on the land use, we also had to change the cover-management factor C. Bollinne (1985) suggested a C-value of 0.47 for arable land and a value of 0 for pasture, woodland and built-up areas. The other RUSLE-factors were taken from Bollinne (1985), who performed an erosion plot study in the area, i.e. R=67.4; K=0.43; and P=1.
18.3 RESULTS
18.3.1 Evaluation of the predicted patterns
Figure 18.2 shows the height and the slope map for the Kinderveld study catchment. A comparison of the slope map with the spatial pattern of the LS-values as calculated for the parcellation of 1991 shows that the areas with the steepest slopes have the highest LS-factors (Figure 18.3). This is simply because the slope gradient is the major control on the LS-value, especially for large parcels. For smaller parcels, the effect of the slope gradient can be partly compensated for by the shorter slope lengths.
18.3.2 Effect of parcellation
For all study areas the average field size has increased significantly since the Second World War (Table 18.1). This is mainly due to re-allocations, a process that still goes on. The effect on the erosion risk is obvious: it causes an increase of the average LS-factor and thus an increase of what we may call the topography-based erosion risk.
This increase ranges from 25 to 30% between 1947 and 1991, which is much lower than the increase in field size (Table 18.1). This is because slope, which is assumed not to change over time, is the major component in the LS-factor, and because the relation between an increase in field size and an increase of the L-factor is spatially variable and dependent on the configuration of the parcel.
Table 18.1: Evolution of the field size and corresponding LS-values
| | 1947 | 1969 | 1991 | Change since 1947 |
|---|---|---|---|---|
| Mean field size (ha) | | | | |
| Kouberg | 0.43 | 0.82 | 1.71 | +398% |
| Bokkenberg | 0.38 | 0.62 | 0.87 | +229% |
| Kinderveld | 0.37 | 0.58 | 1.08 | +292% |
| Ganspoel | 0.39 | 0.59 | 1.32 | +338% |
| Mean LS-value | | | | |
| Kouberg | 1.08 | 1.22 | 1.37 | +26.9% |
| Bokkenberg | 1.19 | 1.32 | 1.50 | +26.1% |
| Kinderveld | 1.17 | 1.34 | 1.46 | +24.8% |
| Ganspoel | 1.48 | 1.76 | 1.92 | +29.7% |
Figure 18.2: Topography for the Kinderveld study catchment A: height map; B: slope map.
18.3.3 Effect of parcellation and land use
The changes in LS-values have to be attributed to the changes in field sizes. In order to estimate the real erosion risks, we also have to introduce information on land use. Table 18.2 gives the evolution of the percentage of arable land for each study area. The maximum percentage of arable land is always found in 1870. This also affects the erosion risks, as for all study areas the mean RUSLE value reaches a (local) maximum in that year (Table 18.2).
Table 18.2: Evolution of the land use and of the RUSLE values
| | 1775 | 1870 | 1947 | 1969 | 1991 | Change since 1947 |
|---|---|---|---|---|---|---|
| % Arable land | | | | | | |
| Kouberg | 96 | 97 | 95 | 95 | 92 | −3.2% |
| Bokkenberg | 73 | 100 | 96 | 70 | 68 | −29.2% |
| Kinderveld | 90 | 100 | 97 | 89 | 82 | −15.5% |
| Ganspoel | 77 | 97 | 90 | 77 | 77 | −14.4% |
| Mean RUSLE-value (ton/ha/yr) | | | | | | |
| Kouberg | 14.25 | 14.45 | 12.87 | 14.97 | 16.55 | +28.6% |
| Bokkenberg | 10.40 | 16.31 | 15.69 | 10.64 | 12.62 | −19.6% |
| Kinderveld | 14.29 | 16.06 | 15.26 | 12.96 | 13.19 | −13.6% |
| Ganspoel | 12.58 | 19.33 | 16.20 | 14.57 | 18.69 | +15.4% |
After 1870 the percentage of arable land gradually decreased. In principle, this causes a decrease of the mean erosion risk, but this may be compensated for by the effect of the previously mentioned increase in field size. In fact, two catchments experienced an increase of the mean erosion risk and the two others a decrease. Whether the erosion risk decreases or increases depends on several factors:
• The evolution of the field size, which is not the same for all catchments (Table 18.1).
• The decrease of the percentage of arable land, which is not of the same order of magnitude for all catchments. For the Kouberg area the percentage of arable land decreased only slightly, which implied a serious increase of the erosion risk since 1947 (Table 18.2). On the other hand, for the Bokkenberg area almost one third of the arable land has been converted to pasture or woodland; this area experienced an important decrease of the mean RUSLE value (Table 18.2).
• The LS-values, which are especially high for the steep slopes (Figures 18.2 and 18.3). When the percentage of arable land decreases, farmers will choose to convert these steeper areas to pasture or woodland. The degree to which this happens is partially dependent on the relative availability of steep slopes: because the steepest slopes have mostly been converted first, the slopes remaining for conversion later on were necessarily less steep. This is illustrated for the Kinderveld catchment (Table 18.3): because the steepest areas were converted to pasture or woodland first, the mean slope of the non-arable land decreased with time. However, the slopes of the non-arable land were always significantly higher than those of the arable land. Moreover, the mean slope of the converted area was always clearly higher than the mean slope of the remaining arable land; for example, the mean slope of the area converted to pasture and woodland between 1947 and 1969 was 13.1 percent compared to a mean slope value of less than 6 percent for the remaining arable land. For the period between 1969 and 1991 these values were respectively 6.6 percent and 5.1 percent.
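The slope comparison made above amounts to averaging a slope layer over masks derived from the land-use layers of two dates. The sketch below shows one way to do this bookkeeping; the small rasters, the land-use codes and the function names are hypothetical and are not meant to reproduce the values reported for the Kinderveld catchment.

```python
def mean_where(values, mask):
    """Mean of `values` over the cells where `mask` is True (None if no cell)."""
    selected = [v for v_row, m_row in zip(values, mask)
                for v, m in zip(v_row, m_row) if m]
    return sum(selected) / len(selected) if selected else None


def class_mask(use, wanted):
    """True where the land-use layer equals the wanted class."""
    return [[u == wanted for u in row] for row in use]


def converted_mask(use_before, use_after, arable="A"):
    """True where a cell was arable at the first date but not at the second."""
    return [[b == arable and a != arable for b, a in zip(b_row, a_row)]
            for b_row, a_row in zip(use_before, use_after)]


# Hypothetical slope gradients (percent) and land use (A = arable, N = non-arable).
slope = [[4.0, 6.0, 12.0],
         [5.0, 14.0, 16.0]]
use_1947 = [["A", "A", "A"],
            ["A", "A", "N"]]
use_1969 = [["A", "A", "N"],
            ["A", "N", "N"]]

mean_converted = mean_where(slope, converted_mask(use_1947, use_1969))
mean_remaining = mean_where(slope, class_mask(use_1969, "A"))
mean_non_arable = mean_where(slope, class_mask(use_1969, "N"))
```

In this toy example the converted cells are steeper on average than the cells that stay arable, which is the pattern the text describes for the real catchment.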
Figure 18.3: Spatial pattern of the pixel-wise LS-values for the Kinderveld catchment based on the parcellation of 1991.
Table 18.3: Mean slope gradients for the Kinderveld catchment
| Kinderveld | 1947 | 1969 | 1991 |
|---|---|---|---|
| Arable land | 5.92% | 5.17% | 5.06% |
| Non-arable land | 15.79% | 13.57% | 10.99% |
The effect of changes in land use on the erosion risk is probably best illustrated by the spatial pattern of the RUSLE values. Figure 18.4 shows the evolution of these patterns for the Kinderveld area. The lower slopes and the main valley (Figure 18.2) in particular experienced a gradual but important decrease of the erosion risk over time, which is mainly due to conversion to woodland. On the other hand, for some other areas the erosion risk increased due to the increase in field size.
Figure 18.4a: Spatial pattern of the parcel-wise RUSLE soil loss values for the Kinderveld catchment A: predicted values for 1870 (parcellation of 1947); B: predicted values for 1947.
Figure 18.4b: Spatial pattern of the parcel-wise RUSLE soil loss values for the Kinderveld catchment C: predicted values for 1969; D: predicted values for 1991.
18.4 DISCUSSION
According to this study, the erosion risk has increased over the last two centuries due to the combined evolution of field size and land use. The representation of the temporal evolution by sequential snapshots is appealing but also implies some dangers (Langran, 1993). A major limitation may be the number of time slices, which cannot fully capture all temporal changes. As the input data used are the sole data available, they have to suffice to detect the general tendency, especially as to the evolution of field size. The aim was to study the evolution of the erosion risk as influenced by field size and land use. The evolution of the erosion risk in the last two centuries was also dependent on other factors (e.g. plowing techniques, soil structure). These factors also changed significantly over time, but even less information is available on their temporal evolution.
18.5 CONCLUSION
A two-dimensional formulation to calculate the topographic LS-factor for topographically complex terrain was implemented in a GIS environment. The linkage of this procedure with a GIS offers several advantages compared to the one-dimensional and/or manual approach: it may account for the effect of flow convergence on rill development and it has advantages in terms of speed of execution and objectivity. The ease of linking this module with a GIS facilitates the application of the (Revised) Universal Soil Loss Equation to complex land units, thereby extending the applicability and flexibility of the USLE in land resources management. This is shown by an application in which the effect of changes in field size and land use on the erosion risk is investigated and quantified. A significant increase of the field size has occurred since the Second World War, which caused a substantial increase of the erosion risk when only topography and field size are considered. This increase was very similar for the four studied catchments. However, this effect may sometimes be counteracted by changes in land use, which were more variable between the catchments. The extent and degree to which this is true depend on physical and socio-economic factors of the region. This approach also enables the identification of those areas that should be converted to pasture or woodland in case supra-national authorities (e.g. the EU) or the market situation force farmers to reduce the arable land further.
REFERENCES
AHNERT, F. 1976. Brief description of a comprehensive three-dimensional process-response model of landform development, Zeitschrift für Geomorphologie Suppl. Band, 25, pp. 29–49.
BOLLINNE, A. 1985. Adjusting the universal soil loss equation for use in Western Europe, in El-Swaify, S.A., Moldenhauer, W.C. and Lo, A. (Eds.) Soil Erosion and Conservation. Ankeny: Soil Conservation Society of America, pp. 206–213.
BORK, H.R. and HENSEL, H. 1988. Computer-aided construction of soil erosion and deposition maps, Geologisches Jahrbuch, A104, pp. 357–371.
BUSACCA, A.J., COOK, C.A. and MULLA, D.J. 1993. Comparing landscape-scale estimation of soil erosion in the Palouse using Cs-137 and RUSLE, Journal of Soil and Water Conservation, 48(4), pp. 361–367.
DEPUYDT, F. 1969. De betrouwbaarheid en de morfologische waarde van een grootschalige a Geographica Lovaniensia, 7, pp. 141–149.
DESMET, P.J.J. and GOVERS, G. 1995. GIS-based simulation of erosion and deposition patterns in an agricultural landscape: a comparison of model results with soil map information, Catena, 25(1–4), pp. 389–401.
DESMET, P.J.J. and GOVERS, G. 1996a. Comparison of routing systems for DEMs and their implications for predicting ephemeral gullies, International Journal of GIS, 10(3), pp. 311–331.
DESMET, P.J.J. and GOVERS, G. 1996b. A GIS-procedure for the automated calculation of the USLE LS-factor on topographically complex landscape units, Journal of Soil and Water Conservation, 51(5), pp. 427–433.
DESMET, P.J.J. and GOVERS, G. 1997. Two-dimensional modelling of rill and gully geometry and their location related to topography, Catena.
D'SOUZA, V.P.C. and MORGAN, R.P.C. 1976. A laboratory study of the effect of slope steepness and curvature on soil erosion, Journal of Agricultural Engineering Research, 21, pp. 21–31.
EASTMAN, R. 1992. IDRISI version 4.0, User's Guide. Worcester: Clark University, Graduate School of Geography.
FLACKE, W., AUERSWALD, K. and NEUFANG, L. 1990. Combining a modified Universal Soil Loss Equation with a digital terrain model for computing high resolution maps of soil loss resulting from rain wash, Catena, 17, pp. 383–397.
FOSTER, G.R. 1991. Advances in wind and water erosion prediction, Journal of Soil and Water Conservation, 46(1), pp. 27–29.
FOSTER, G.R. and WISCHMEIER, W.H. 1974. Evaluating irregular slopes for soil loss prediction, Transactions of the American Society Agricultural Engineers, 17, pp. 305–309.
GRIFFIN, M.L., BEASLEY, D.B., FLETCHER, J.J. and FOSTER, G.R. 1988. Estimating soil loss on topographically non-uniform field and farm units, Journal of Soil and Water Conservation, 43, pp. 326–331.
JÄGER, S. 1994. Modelling regional soil erosion susceptibility using the USLE and GIS, in Rickson, R.J. (Ed.) Conserving Soil Resources: European Perspectives. Wallingford: CAB International, pp. 161–177.
JONES, J. 1991. TOSCA version 1.0, Reference Guide. Worcester: Clark University, Graduate School of Geography.
KIRKBY, M.J. and CHORLEY, R.J. 1967. Through-flow, overland flow and erosion, Bulletin of the International Association of Hydrological Scientists, 12, pp. 5–21.
LANGRAN, G. 1993. Time in Geographic Information Systems. London: Taylor & Francis.
McCOOL, D.K., BROWN, L.C., FOSTER, G.R., MUTCHLER, C.K. and MEYER, L.D. 1987. Revised slope steepness factor for the Universal Soil Loss Equation, Transactions of the American Society Agricultural Engineers, 30, pp. 1387–1396.
McCOOL, D.K., FOSTER, G.R., MUTCHLER, C.K. and MEYER, L.D. 1989. Revised slope length factor for the Universal Soil Loss Equation, Transactions of the American Society Agricultural Engineers, 32, pp. 1571–1576.
MELLEROWICZ, K.T., REES, H.W., CHOW, T.L. and GHANEM, I. 1994. Soil conservation planning at the watershed level using the Universal Soil Loss Equation with GIS and microcomputer technologies: a case study, Journal of Soil and Water Conservation, 49(2), pp. 194–200.
MOORE, I.D. and BURCH, G.J. 1986. Physical basis of the length-slope factor in the Universal Soil Loss Equation, Journal of the Soil Science Society of America, 50, pp. 1294–1298.
MOORE, I.D. and NIEBER, J.L. 1989. Landscape assessment of soil erosion and non-point source pollution, Journal of the Minnesota Academy of Science, 55(1), pp. 18–25.
MOORE, I.D., GRAYSON, R.B. and LADSON, A.R. 1991. Digital terrain modelling: a review of hydrological, geomorphological and biological applications, Hydrological Processes, 5, pp. 3–30.
RENARD, K.G., FOSTER, G.R., WEESIES, G.A., McCOOL, D.K. and YODER, D.C. 1993. Predicting Soil Erosion by Water: A Guide to Conservation Planning with the Revised Universal Soil Loss Equation (RUSLE). Washington, DC: USDA.
RENARD, K.G., FOSTER, G.R., YODER, D.C. and McCOOL, D.K. 1994. RUSLE revisited: status, questions, answers, and the future, Journal of Soil and Water Conservation, 49(3), pp. 213–220.
WILLIAMS, J.R. and BERNDT, H.D. 1972. Sediment yield computed with universal equation, Journal of the Hydraulics Division, 98, pp. 2087–2098.
WILLIAMS, J.R. and BERNDT, H.D. 1977. Determining the USLE's length-slope factor for watersheds, in Soil Erosion: Prediction and Control, Proceedings of a National Conference on Soil Erosion, West Lafayette, 24–26 May 1976. West Lafayette: Purdue University, pp. 217–225.
WILSON, J.P. 1986. Estimating the topographic factor in the universal soil loss equation for watersheds, Journal of Soil and Water Conservation, 41(3), pp. 179–184.
WISCHMEIER, W.H. 1976. Use and misuse of the universal soil loss equation, Journal of Soil and Water Conservation, 31(1), pp. 5–9.
WISCHMEIER, W.H. and SMITH, D.D. 1965. Predicting rainfall erosion losses from cropland east of the Rocky Mountains, USDA Agricultural Handbook 282. Washington, DC: USDA.
WISCHMEIER, W.H. and SMITH, D.D. 1978. Predicting rainfall erosion losses: a guide to conservation planning, USDA Agricultural Handbook 537. Washington, DC: USDA.
YOELI, P. 1983. Digital terrain models and their cartographic and cartometric utilisation, The Cartographic Journal, 20(1), pp. 17–22.
YOUNG, R.A. and MUTCHLER, C.K. 1969. Soil movement on irregular slopes, Water Resources Research, 5(5), pp. 1084–1089.
ZEVENBERGEN, L.W. and THORNE, C.R. 1987. Quantitative analysis of land surface topography, Earth Surface Processes and Landforms, 12, pp. 47–56.
Chapter Nineteen
GIS for the Analysis of Structure and Change in Mountain Environments
Anna Kurnatowska
19.1 INTRODUCTION
19.1.1 Aim of the research
The research presented in this chapter compares the environmental structure of two mountainous areas located in different climatic zones. The analysis is based on delimited, homogeneous, typological units (called geocomplexes) and consists of a detailed description of their morphology (size, frequency, shapes); an assessment of relationships between the delimited units (indication of dominant and subordinate, stable and fragile geocomplexes; determination of the neighbourhood of geocomplexes); a comparison of the frequency and strength of relationships between geocomponents in different types of geocomplexes; and an assessment of the biodiversity of the researched environments. From this comparison it is possible to draw general conclusions (main processes ruling montane environments), to predict further environmental changes and to evaluate simple models of correlation between vegetation, geology and slope class, while stressing the differences between mountainous areas located in different climatic zones. The analysis and comparison were carried out with the help of GIS and statistical methods. The use of GIS enabled the quantification of environmental data, which were later used as the basis for statistical methods.
19.1.2 Brief characteristics of areas of interest
The research covers the Five Lakes Valley located in the Tatra Mountains and Loch Coruisk Valley eroded from the Cuillin Hills on Skye, a Scottish island of the Inner Hebrides (see Figure 19.1). The sites were chosen because both areas represent an alpine type of environment, differentiated with respect to geology, climate, hydrology, biogeography and biocenosis, and also in consideration of their beauty, uniqueness, complexity and fragility.
Figure 19.1: Location of Loch Coruisk Valley and The Five Lakes Valley
The Tatra Mountains are eroded from acidic granite that intruded in the Carboniferous period; the Cuillin Hills are composed mainly of Tertiary intrusions (intermediate gabbro and ultrabasic peridotites). In a humid climate granite weathers quickly, producing round hills; under the climatic conditions prevailing in the Tatra Mountains, weathered granite forms steep and rugged peaks. In alpine environments, the main geocomponent that determines the character of all other geocomponents is relief. Hence the researched areas have one essential common feature: they were formed by glacial processes during the last glacial epoch. Located in the subalpine and alpine zones, both valleys constitute typical glacial valleys: the maximum elevation difference in Loch Coruisk Valley is 900 metres and in the Five Lakes Valley it is 650 metres. Generally speaking, Loch Coruisk Valley is steeper, less accessible and more rugged than the Five Lakes Valley. Both areas of interest are considered high mountain environments and are appreciated by climbing and trekking tourists.
Quaternary deposits in the Five Lakes Valley cover 73 percent of the researched area (moraines 36 percent, slope deposits 27 percent and alluvium 10 percent); in Loch Coruisk Valley they cover 67 percent (34 percent, 11 percent and 12 percent, respectively). Gabbro is an extremely hard rock and, despite its basic character, its great resistance to weathering means that it does not produce particularly basic soils. Generally, in both areas of interest poor soils (lithosols, regosols, rankers and podzols) prevail. More fertile soils in the Five Lakes Valley developed on mylonites that are a result of tectonic crushing on ridge passes. Basic soils in Loch Coruisk Valley developed on ultrabasic peridotite and on basic sills and dikes that cut the gabbroic laccolith.
In both areas of interest surface waters are represented by alpine lakes and streams, and underground waters by a few moraine reservoirs and fissure waters. Due to the impermeable ground, bogs and peats have developed at the very bottoms of both valleys at sites where runoff is impeded. However, in the more humid climate of Loch Coruisk Valley bogs and peats dominate the landscape, while in the Five Lakes Valley they form small patches.
The most obvious feature differentiating the researched valleys is climate. The Tatra Mountains are located in a moderate zone, transitional from Atlantic to continental, with cold winters and hot summers, a precipitation peak in the summer months (mean yearly precipitation is about 1700 mm), strong foehn winds and strong winter temperature inversions. The geographical situation of Skye determines its cool, extremely Atlantic climate with high precipitation all year (yearly precipitation averages 3400 mm), strong winds, and mild winters and summers leading to low temperature amplitudes (37°C, compared with 58°C in the Tatra Mountains). The Atlantic character of Loch Coruisk Valley is further emphasised by the fact that it lies in the southern part of Skye and opens straight to the sea, allowing easy penetration of the valley by humid,
westerly winds. Skye climate can be characterised by its name which derives from the Norse “skuy” meaning “cloud”; in Gaelic it is Eilan a Cheo—“Isle of mists”, as there are more than 300 rainy days in an average year. The Five Lakes Valley is located above the tree line (the lowest point in the valley is the Great Lake at 1664 metres above sea level). Although Loch Coruisk Valley reaches sea level, it is also devoid of trees due to hard rock, steep slopes and exposure to strong winds and humid air masses from the sea. Main vegetation types on both areas of research are: scree and cliff communities, subalpine and alpine grasslands and dwarfshrub heaths, fens, bogs and tall herb communities. Some patches of subnival vegetation can be found at both sites. Both valleys have long been subjected to man’s activity (burning, tourist pressure) and sheep grazing whose influence is depicted by introduction of anthropogenic communities and species. 19.1.3 Methods: construction of the database and processing of maps The Polish site of research covers an area of 5.2 km2 while the Scottish one is 13.8 km2. The small areas of research, the enormous variety of mountainous ecosystems and the aim of the research, required a detailed study. Both case studies were carried out at 1:10,000 scale. Scarce environmental data at such a detailed scale and the mountainous character of the researched valleys determined the choice of three principal components that were used in the analysis: geology, geomorphology (slope class) and vegetation. Geological and contour maps (with 50 metres cut) were available at 1:10,000 scale, while vegetation maps were produced during terrain surveys (mapping of vegetation communities) at 1:7,500 scale and partially on the basis of interpretation of aerial photos at about 1:25,000 scale. All maps were digitised and rasterized. 
For both sites the same size of base unit was assumed: one pixel represented 4.2 m×4.2 m in terrain (which is 0.42×0.42 mm on a map at 1:10,000 scale). The main shortcoming of such a fine pixel size was increased processing time and big file sizes. However, as the credibility of final results depends on quality and detail of input maps, precision in the database construction is a must. The next step in map processing was interpolation of contour maps (development of Digital Elevation Models) and derivation of slope and aspect maps for both areas of interest. Finally slope maps were classified (classification according to Kalicki, 1986 —see Figure 19.2). The maps (Figures 19.2, 19.3, 19.4 and 19.5, see also Tables 19.1 and 19.2) were overlaid to produce maps of geocomplexes (homogeneous land units characterised by one feature of geology and vegetation). Automated overlaying of maps was accomplished using the map algebra module in IDRISI. Simple processing produced maps containing very small contours that were eliminated using a specially designed filter (all areas of less than 4 pixels were joined to the neighbouring class of pixels). Finally maps of geocomplexes were processed with respect to environmental sense of the delimited geocomplexes that is, some similar types were joined in one type. In the Five Lakes Valley 42 types represented by 1407 individual land units were delimited, on Skye values were 41 and 1442, respectively. Maps of types of geocomplexes became base maps to analyse the structure of the environments. The task was implemented through statistical analysis of the geocomplexes. Most statistical methods require numerical data whereas geocomplexes constitute qualitative characteristics of the environment. To quantify the data, three numerical parameters of geocomplexes were calculated: areas, perimeters and quantities of distinctive land units in particular types of geocomplexes. 
Calculation of these measurements was possible with the use of GIS methods and allowed the evaluation of more than 30 statistical indicators and coefficients, some of which are presented in this chapter.
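
The overlay and measurement steps described above can be sketched in a few lines of code. This is only an illustrative sketch in plain Python, not the IDRISI workflow used in the study: the function names `overlay` and `label_units`, the coding of classes as small integers and the tiny example rasters are all assumptions made for illustration. Only the 4.2 m pixel size and the 4-pixel threshold are taken from the text.

```python
from collections import deque

CELL_SIZE_M = 4.2        # pixel edge length on the ground (from the text)
MIN_UNIT_PIXELS = 4      # units below this size were merged by the filter

def overlay(geology, vegetation, n_veg_classes):
    """Combine two class rasters into one code per (geology, vegetation) pair."""
    return [[g * n_veg_classes + v for g, v in zip(g_row, v_row)]
            for g_row, v_row in zip(geology, vegetation)]

def label_units(grid):
    """Return (type_code, pixel_count) for every 4-connected land unit."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    units = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            code, size = grid[r][c], 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:                      # breadth-first flood fill
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == code):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            units.append((code, size))
    return units

# Hypothetical 3 x 4 rasters: geology classes 0-1, vegetation classes 0-1.
geology = [[0, 0, 1, 1],
           [0, 0, 1, 1],
           [0, 0, 1, 1]]
vegetation = [[0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 0]]

combined = overlay(geology, vegetation, n_veg_classes=2)
units = label_units(combined)                            # one entry per land unit
areas_m2 = [size * CELL_SIZE_M ** 2 for _, size in units]
small = [u for u in units if u[1] < MIN_UNIT_PIXELS]     # candidates for merging
```

The sketch only flags units smaller than four pixels. Joining them to a neighbouring class, as the filter in the study did, would need an additional rule for choosing the neighbour, which the text does not specify.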
GEOGRAPHIC INFORMATION RESEARCH: TRANS-ATLANTIC PERSPECTIVES
Figure 19.2: Maps of slopes of the Five Lakes Valley and Loch Coruisk Valley

Table 19.1: Legend for the vegetation communities in the Five Lakes Valley (types of geocomplexes: A = granite; B = mylonite; C = moraines; D = slope deposits; E = river alluvium)

No. Vegetation type
1. Subnival grasslands (Minuartio-Oreochloetum distichae, Com. Oreochloa disticha-Gentiana frigida, Trifido-Distichetum subnival variant with Oreochloa disticha)

Types of geocomplexes (columns A to E): 1, 15
GIS TO ANALYSE STRUCTURE AND CHANGE IN MOUNTAIN ENVIRONMENTS
Figure 19.3: Geological maps of The Five Lakes Valley and Loch Coruisk Valley
Figure 19.4: Vegetation map of the Five Lakes Valley

Table 19.1 (continued)

No. Vegetation type
2. Montane typical grasslands (Trifido-Distichetum typicum)
3. Montane mossy grasslands (T.-D. sphagnetosum)
4. Montane chinophilous grasslands (T.-D. salicetosum herbaceae)
5. Montane boulder shrubs and grasslands (T.-D. salicetosum kitaibelianae, Drepanoclado-Salicetum kitaibelianae)
6. Montane scree grasslands (T.-D. scree variant with Juncus trifidus)
7. Montane grazed grasslands (T.-D. caricetosum sempervirentis)
8. Subalpine grasslands (T.-D. anthropogenic variant with Agrostis rupestris and Deschampsia flexuosa)
9. Mylonite communities (Festuco versicoloris-Agrostietum alpinae, Com. with Silene cucubalus)
10. Scree and boulder communities (Rhizocarpetalia)
11. Chinophilous heaths and mosses (Salicetum herbaceae and mossy communities from Salicetea herbaceae)
12. Chinophilous grasses (Luzuletum spadiceae)
13. Tall herb communities (Calamagrostietum villosae, Aconitetum firmi, Adenostyletum alliariae)
14. Bogs (Com. with Eriophorum vaginatum, Sphagno-Nardetum, Sphagno-Caricetum)
15. Fens (Caricetum fuscae subalpinum)

Types of geocomplexes (columns A to E): 2, 5, 7, 9; 3; 4, 6, 8, 10; 12; 11, 13; 14, 16, 17, 20, 21, 26; 22; 18; 19; 24, 28; 25, 29; 31, 33; 23, 27, 30, 32
Figure 19.5: Vegetation map of Loch Coruisk Valley

Table 19.1 (continued)

No. Vegetation type
16. Dwarf-shrub pine (Pinetum mughi carpaticum)
17. Dwarf-shrub heaths (Vaccinietum myrtilli, Empetro-Vaccinietum)
18. Anthropogenic fresh grasslands (Com. with Deschampsia flexuosa, Hieracio alpini-Nardetum)
19. Anthropogenic wet grasslands (Cerastio fontani-Deschampsietum)
20. Anthropogenic communities (Com. with Stellaria media, Urtico-Aconitetum, Com. with Rumex obtusifolius, Com. with Cardaminopsis halleri, Com. with Ranunculus repens)

Types of geocomplexes (columns A to E): 34; 35, 37, 38; 36; 39; 41, 42; 40
Table 19.2: Legend for the vegetation communities in Loch Coruisk Valley (types of geocomplexes: A = gabbro; B = peridotites; C = moraines; D = slope deposits; E = river alluvium)

No. Type of vegetation
1. Montane grasslands (Com. Festuca ovina-Luzula spicata)
2. Montane grasslands and heaths (Cariceto-Rhacomitretum lanuginosi)
3. Montane heaths (Rhacomitreto-Empetrum)

Types of geocomplexes (columns A to E): 1, 3, 5; 8; D: 2, 4, 6
Table 19.2 (continued)

No. Type of vegetation
4. Montane dwarf shrubs (Com. with Juniperus nana)
5. Scree and boulder communities (Rhizocarpetalia)
6. Acidic tall herb communities (Com. Luzula sylvatica-Vaccinium myrtillus)
7. Calcareous tall herb communities (Com. Sedum rosea-Alchemilla glabra)
8. Eutrophic fens (Com. Carex rostrata-Scorpidium scorpioides, Com. Carex panicea-Campylium stellatum, Com. Eriophorum latifolium-Carex hostiana, Com. with Schoenus nigricans)
9. Mesotrophic fens (Com. Trichophorum cespitosum-Carex panicea, Com. Molinia caerulea-Myrica gale)
10. Ombrotrophic mires (Com. Eriophorum angustifolium-Sphagnum cuspidatum, Trichophoreto-Callunetum, Trichophoreto-Eriophoretum)
11. Bogs and wet-heaths moderately flushed by water (Molinieto-Callunetum)
12. Species-poor dwarf-shrub heaths (Callunetum vulgaris)
13. Mossy dwarf-shrub heaths (Vaccinieto-Callunetum hepaticosum)
14. Anthropogenic species-poor grasslands (Agrosto-Festucetum species-poor)
15. Anthropogenic species-rich grasslands (Agrosto-Festucetum species-rich, Alchemilleto-Agrosto-Festucetum)
16. Coastal communities (Com. Asplenium marinum-Grimmia maritima)

Types of geocomplexes (columns A to E): 7, 9, 12, 15, 18; 10; D: 11, 14, 16; 20; 13, 17, 19; 21; 23; 22; 24; 26; 25; 27, 29, 31, 34, 37; 28, 30, 33, 36, 40; 32, 38; 35, 39; 41
19.2 STATISTICAL ANALYSIS OF THE ENVIRONMENTAL STRUCTURES

The calculated indices and measures can be divided into four groups:
1. parameters of size and frequency of occurrence of land units;
2. indices of shape of land units;
3. measures characterising the spatial (horizontal) structure of geocomplexes (pattern complexity, nearest neighbour frequency, landscape richness, evenness, patchiness, entropy, diversity and dominance of geocomplexes); and
4. measures characterising relationships between different features of geocomponents (strength and frequency of relationships between slope class, type of vegetation and geology).

19.2.1 Parameters of size and frequency of geocomplexes

Size and frequency of geocomplexes constitute simple, basic and introductory descriptions of the morphology of the environment. They are nevertheless very important, as the analysis of the structure of geocomplexes depends on the scale of the research and the size of the delimited geocomplexes.
Mean size of geocomplexes within a type

Large mean sizes of geocomplexes within a type express stability. In the Five Lakes Valley the largest mean sizes characterise the following types of geocomplexes: subalpine grasslands on moraines (14), grazed montane grasslands on slope deposits (13), mylonite communities on slope deposits (16) and scree communities on moraines (18). Subalpine meadows occur in a transitional zone between the alpine and subalpine zones and constitute a characteristic mixture of communities typical of both zones, which explains their large sizes. Moreover, like montane grazed grasslands, they are a result of intensive sheep grazing, which continued until the late 1960s, when the Five Lakes Valley was purchased by the Tatra National Park from private farmers. Degeneration and floristic changes of communities are not as strong as in the subalpine zones, but in both cases sheep grazing led to the development of semi-natural geocomplexes of simplified inner structure. Hence, in spite of the primitive character of pasture management and the short grazing period (two months a year), grazing led to simplification of the characteristics and mosaic structure of the alpine environments. This supports the theory that environmental structure depends on the strength of anthropopression: at first human activity leads to diversification of environmental structure (number and sizes of geocomplexes); with its intensification, however, it leads to the substitution of small, diversified geocomplexes by large, cohesive and unified ones (Pietrzak, 1989). Mylonite communities form “flower meadows” on the alluvial talus slopes below mountain passes with mylonite outcrops formed on dislocation lines. The main determinant of the occurrence of this community is the presence of CaCO3 in flushing water. 
Other geocomponents such as geology, relief, ground humidity and microclimate are of less importance and do not contribute to the diversification of these communities at the scale of the current analysis. Scree communities on slope deposits (19) are communities of extensive scree boulders in side valleys. Their low diversification results from the lack of linear geocomplexes (streams or gullies), which usually cut large geocomplexes. Large boulders also constitute a hostile environment for any other type of vegetation because of the lack of soil. The following types of geocomplexes have the smallest sizes: chinophilous grasses on mylonites (22), scree communities on granite (17), montane grasslands on mylonites (15) and tall herb communities on river alluviums (27). The small areas of types 22, 15 and 27 are due to the underlying geology: both mylonites and alluviums form small patches, which are important in the diversification of the environment. Scree communities on granite exist mainly on rugged summits and in crevices, and in 3-D reality cover large areas. However, as they are presented on a 2-D map, their size parameters are significantly underestimated. In Loch Coruisk Valley the largest geocomplexes are found in the following types: scree communities on slope deposits (11), eutrophic fens on river alluviums (19), anthropogenic species-rich grasslands on gabbro (37), montane grasslands on gabbro (1) and species-poor anthropogenic grasslands on gabbro (34). Types 19 and 1 can be considered stable and typical geocomplexes of the Loch Coruisk Valley landscape. The comparatively large sizes of types 11, 37 and 34 can be explained in the same way as the large sizes of the respective geocomplexes in the Five Lakes Valley in the Tatra Mountains. The smallest mean areas of types of geocomplexes in Loch Coruisk Valley are observed in the following types: mesotrophic fens and flushed bogs on gabbro (21 and 27), and scree communities on peridotite and gabbro (10 and 9). 
Types 21 and 27 form small patches in small hollows in solid rock, often on watershed boundaries in the upper parts of the valley or on roches moutonnées. The small sizes of types 10 and 9 are, again, an effect of the 2-D representation of the 3-D environment and are significantly underestimated. As a final analysis of the sizes of geocomplexes, an index of area variability was calculated for each type of geocomplex (Bocarov, 1976):
$$V = \frac{s}{\bar{x}} \cdot 100\% \tag{19.1}$$

where: $s$ = standard deviation of the area within a geocomplex type, $\bar{x}$ = mean value of the area within a geocomplex type.

The index expresses the standard deviation as a percentage of the mean value and hence eliminates the direct influence of mean values on the standard deviation, which enables comparison of size variability between different types of geocomplexes. The index correlates with the total area of the type of geocomplexes (correlation of 0.63, significant at the 0.00 level). In the Five Lakes Valley the index ranges from 26 percent to 231 percent and reaches its maximum values for the following types: scree montane grasslands on moraines (12), scree communities on moraines (18) and anthropogenic wet grasslands on moraines (41). On Skye it ranges from 44 percent to 367 percent and reaches its maximum for: flushed bogs on gabbro (27), species-poor heaths on moraines (30), and anthropogenic species-poor grasslands on gabbro (34). Generally speaking, these are large and typical geocomplexes which have quite a wide range of habitat requirements. Small variability of size is characteristic of small and rare geocomplexes. Most often these are geocomplexes which require specific habitat conditions. In the Tatra Mountains these are fens (32, 33) and chinophilous communities (20, 22); on Skye they are represented by calcareous communities (8, 15, 16, 37) and montane heaths (5, 3). These communities are homogeneous and, in spite of their episodic character and small areas, are important as they indicate non-typical habitats which contribute to the specific character of the researched environments.

Frequency of types of geocomplexes

High area frequency is characteristic of dominant geocomplexes, which constitute the so-called “landscape background”. 
In the Five Lakes Valley six types cover 40 percent of the total valley area: scree communities on moraines (18), dwarf pine on moraines (35), chinophilous grasses on slope deposits (25), scree communities on slope deposits (19), anthropogenic fresh grasslands on moraines (35) and typical montane grasslands on granite (2). Types of geocomplexes that cover a small percentage of the total valley area but show high dispersion of individual geocomplexes (high frequency of occurrence) are characterised by common but specific habitats covering small areas. Good examples in the Tatra Mountains are tall herb communities occurring in long, narrow belts along abrupt breaks of slope on the border of solid rock and accumulation deposits (types 26 and 28), chinophilous grasses (21, 23, 25) and dwarf pine on slope deposits (36). In Loch Coruisk Valley six types of geocomplexes, most of which are hydrogenic (types 20, 36, 11, 29, 34 and 28), cover as much as 50 percent of the total area. This is further evidence that the dominant geocomponent in Loch Coruisk Valley is climate, which determines all the other geocomponents. Small but common individual geocomplexes are characteristic of basic habitats (types 38, 15, 17, 18, 10 and 4).

19.2.2 Indices of shape

Shape indices provide information on the main processes governing the overall behaviour of geocomplexes.

Index of shape dismemberment
The index of shape dismemberment reaches its minimum value of 1 for circles and approaches infinity for highly dismembered shapes (Pietrzak, 1989):
$$K = \frac{P}{2\sqrt{\pi A}} \tag{19.2}$$

where: $A$ = area of individual geocomplex, $P$ = perimeter of individual geocomplex.

The dismemberment of shapes is highly scale-dependent. In spite of the detailed scale of the research, the shapes of individual geocomplexes are generalised. Hence the index reflects the elongation of shapes rather than their dismemberment. It correlates with the mean perimeters of types and with the mean size of geocomplexes: large geocomplexes are more dismembered or elongated than small ones. In both areas of interest the index reaches its maximum for geocomplexes of the subnival and montane zones, which are under the strongest influence of gravitational forces (rock and boulder fall, soil creep, rock slides, landslides, mud-flows, avalanche erosion and water erosion): in the Five Lakes Valley types 1, 2, 5, 9, 13 and 16, and in Loch Coruisk Valley type 6. It is also high for hydrogenic geocomplexes dependent on water flow (16, 23; and 12, 14, 15, 17, 19, respectively). The mean value of the shape dismemberment index for Skye (2.17) is higher than for the Tatra Mountains (1.91), a result of greater inaccessibility, steeper slopes and a more humid climate.

Roundness index
The index is the inverse square of the index of shape dismemberment and is given by the following equation (Pietrzak, 1989):

$$R_c = \frac{4\pi A}{P^2} \tag{19.3}$$

The index reaches its maximum ($R_c = 1$) for circles and approaches 0 for long and dismembered shapes. In the Five Lakes Valley it reaches its maximum ($R_c = 0.51$) for anthropogenic geocomplexes, which stand out significantly from natural geocomplexes. Otherwise, the highest values of the index (0.37–0.38) are found in two groups of geocomplexes. The first is represented by bogs and chinophilous communities (in the Five Lakes Valley: 8, 33, 20, 31, 37; in Loch Coruisk Valley: 26, 24, 32, 31); the second by shrub and grass communities on “islands” of solid rocks buried in moraine deposits (34 and 34, respectively). Both groups are characteristic of the subalpine zone: the dwarf pine zone in the Tatra Mountains and the corresponding heath zone on Skye, where gravitational forces are weaker than in the montane and subnival zones. The mean value of the roundness index in the Five Lakes Valley (0.32) is higher than that calculated for Loch Coruisk Valley (0.27), which confirms that the Loch Coruisk Valley environment is much more influenced by unidirectional gravitational processes than the Five Lakes Valley.

19.2.3 Analysis of spatial patterning

Landscape indices derived from information theory approximate the homogeneity and diversification of the environment. In this chapter four indices are presented: landscape diversity (absolute entropy), relative landscape diversity (relative entropy), dominance and the likeness index (nearest neighbour analysis).

Diversity (absolute entropy) and relative diversity (relative entropy)
Entropy is a basic notion used in cybernetics which, for continuous features, is based on the probabilities of occurrence of particular features and is expressed by a logarithmic function (Richling, 1992):
$$H = -\sum_{i=1}^{n} p_i \log_2 p_i \tag{19.4}$$

where: $p_i = s_i/s$ = probability of occurrence of a particular state (the proportion of landscape in habitat $i$, or the percentage of geocomplexes within the whole research area), $s_i$ = area of an individual geocomplex within a type (or 1), $s$ = total area of a type (or number of individual units within a type), $n$ = number of observed individual units within a type.

In this study the probabilities of occurrence of individual geocomplexes were measured on the basis of both the areas and the number of geocomplexes. As the logarithm base was set to 2, entropy is given in bits. Entropy increases with the number of individual units within a type. It reaches a minimum (0) for one-element sets and approaches infinity as the number of individual units within a type increases. For a given number of individual units within a type, the absolute entropy of the type of geocomplexes is higher when the probability distribution of a feature is close to even, and reaches its maximum when all units cover the same area. The ratio of absolute entropy to maximum entropy for a given type is called relative entropy:

$$H_{rel} = \frac{H}{H_{max}} \tag{19.5}$$

where:

$$H_{max} = \log_2 n \tag{19.6}$$

Entropy is strongly scale-dependent (it is calculated on the basis of frequencies of occurrence and sizes of landscape units), hence it is difficult to compare entropies of different landscapes evaluated in different studies. However, the entropies evaluated for Loch Coruisk Valley and the Five Lakes Valley can be compared without reservation. Although the mean sizes of landscape units in Loch Coruisk Valley are bigger than those in the Five Lakes Valley, the total researched area of Loch Coruisk Valley is also bigger. A single landscape unit in Loch Coruisk Valley constitutes on average $6.93 \times 10^{-4}$ of the total researched area, while in the Five Lakes Valley it constitutes $7.11 \times 10^{-4}$ (about 0.07 percent in both cases). As these values are comparable, they do not influence indices based on entropy. 
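
The entropy measures defined above can be illustrated with a short calculation. The sketch below assumes the standard Shannon form for absolute entropy (base-2 logarithm, as stated in the text) and takes relative entropy as absolute entropy divided by the base-2 logarithm of the number of units. The function names and the example areas are hypothetical, and returning 0 for a one-unit type is a convention chosen here rather than something stated in the chapter.

```python
import math

def absolute_entropy(sizes):
    """Shannon entropy in bits of the shares s_i / s within one type."""
    total = sum(sizes)
    shares = [s / total for s in sizes if s > 0]
    return -sum(p * math.log2(p) for p in shares)

def relative_entropy(sizes):
    """Absolute entropy divided by its maximum, log2(n)."""
    n = len(sizes)
    if n < 2:
        return 0.0   # assumed convention: the maximum is 0 for a single unit
    return absolute_entropy(sizes) / math.log2(n)

# Hypothetical areas (in pixels) of the land units of two geocomplex types.
even_type = [4, 4, 4, 4]      # all units the same size
uneven_type = [8, 4, 2, 2]    # one unit dominates

h_even = absolute_entropy(even_type)        # 2 bits, the maximum for n = 4
h_uneven = absolute_entropy(uneven_type)    # lower than the maximum
r_even = relative_entropy(even_type)
r_uneven = relative_entropy(uneven_type)

# Count-based variant: every unit contributes 1, so entropy equals log2(n).
h_counts = absolute_entropy([1] * 8)
```

With equal unit sizes the relative entropy is 1, and it falls as one unit comes to dominate the type, which matches the behaviour described in the text.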
Absolute and relative entropies based on areas and on the number of individual units within a type are highly correlated: in the Five Lakes Valley the correlation equals 0.97, while in Loch Coruisk Valley it is 0.94, so both measures are good approximations of landscape diversity. As absolute entropy describes the diversity of a particular type and correlates highly with the number of individual units within a type (correlation 0.96 in the Five Lakes Valley and 0.94 in Loch Coruisk Valley), its measurements do not contribute significantly to the description of the landscape structure of the researched environments. The mean entropy of the Five Lakes Valley equals 5.26, and that of Loch Coruisk Valley 4.73; Loch Coruisk Valley is characterised by lower dispersion of individual units within a type and hence lower landscape diversity. This can be explained by the strong influence of mesoclimate on the Loch Coruisk Valley landscape. Strong winds and high humidity leave their mark on spatial patterning by blurring the impact of microclimate and eliminating small-scale differences in the environment. In the Tatra Mountains the impact of microclimate and local ground humidity is strong and leads to the formation of small patches of vegetation communities (and hence geocomplexes) differentiated with respect to local soil and microclimate conditions. Relative entropy eliminates the influence of the number of units and correlates inversely with the index of area diversification (the correlation equals −0.78 in the Five Lakes Valley and −0.59 in Loch Coruisk Valley, at a significance level of 0.00). Relative entropy ranges from 0 to 1 and reaches its maximum for types
with the lowest index of area diversification. The mean relative entropy of the researched valleys is very high: it reaches 0.975 in the Five Lakes Valley and 0.883 in Loch Coruisk Valley. This is further confirmation that the landscape of the Five Lakes Valley is more diversified than that of Loch Coruisk Valley.

Dominance
Dominance is another statistical index based on entropy (Turner, 1989):

$$D = \log_2 n + \sum_{i=1}^{n} p_i \log_2 p_i \tag{19.7}$$

It correlates highly with frequencies based on areas and shows the dominant types of geocomplexes with respect to the percentage of area taken by a particular type of geocomplex. Furthermore, it correlates highly with the index of area diversification (0.97 at both sites, at a significance level of 0.00) and strongly but negatively with relative entropy (the correlation equals −0.85 in the Five Lakes Valley and −0.71 in Loch Coruisk Valley, both at a significance level of 0.00). To summarise, although indices based on the notion of entropy constitute good approximations of landscape patterning, they can be substituted by the simpler indices presented above.

Likeness index
The likeness index, measured on the basis of the length of common borders between distinct land units, is a good method of nearest neighbour analysis. It is given by the following equation (Richling, 1992): (19.8) where: a (or b) = total length of border of unit a (or b), c = total common length of border between units a and b. The likeness of units which do not border each other is 0, and for two types bordering only each other (which occurs when there are only two types of geocomplexes in the researched landscape) it reaches 100 percent. The index was evaluated for each pair of bordering types of geocomplexes; finally, mean values for each occurrence of a bordering type were calculated. At both research sites the most frequent neighbouring types of geocomplexes are pairs with the same vegetation but different geology. In the Five Lakes Valley the index reaches its maximum value of 45 percent and is above 30 percent for the following pairs: (32, 33); (30, 31); (7, 8); (38, 39); (5, 6); (2, 4) and (41, 42) (see Tables 19.1 and 19.2). In Loch Coruisk Valley the highest values of the index are between 49 and 35 percent for the following pairs: (19, 20); (3, 4); (34, 36); (1, 2); (1, 10); and (31, 33). This can be explained by the fact that vegetation categories are much more detailed than geological categories (the neighbourhood of two types out of a total of 5 is much more probable than the neighbourhood of two types out of a total of 20). It shows that the ecological amplitudes of vegetation communities are much narrower than the variation of soil humidity and mineral composition that can be read from the available geological maps. The granulometric composition of geological material could be a simple proxy for soil conditions. Unfortunately, the geological maps of both researched areas are very detailed with respect to solid rocks (detailed division of granites and gabbro, which have little influence on vegetation) and very rough with respect to Pleistocene and Holocene deposits. 
A good example is the category “moraines” which constitutes 36 percent of the total area of the Five Lakes Valley and consists of ground, lateral, interlobate, terminal, and other moraines which differ significantly with respect to size of particles and hence availability of nutrients and humidity for plants. As maps of vegetation of both areas present a
better description of the researched environments, further detailed recognition of environmental structures could well be based on recognition of the structure of vegetation habitats. The second group of geocomplexes with a high likeness index consists of geocomplexes of different vegetation types occurring on the same geological unit. In the Five Lakes Valley the highest values of the index range between 21 percent and approximately 12 percent for the following pairs of types of geocomplexes: (11, 25); (5, 9); (18, 35); (2, 21). In Loch Coruisk Valley these are represented by the pairs: (8, 10); (32, 38); (19, 22); (15, 29); (20, 23) and (15, 31). This group characterises the vegetation communities that most often neighbour each other. The pairs represent communities with similar habitat requirements, for instance ground humidity requirements (eutrophic and mesotrophic fens) or similar slope aspect (mossy montane grasslands and boulder montane shrubs and grasslands); these are usually communities tied to the same microclimate. However, some of the pairs are characterised by diametrically different habitats, for instance typical montane grasslands and chinophilous grasses on granite. This testifies to the great diversification and mosaic character of montane environments. In the group of pairs with different vegetation and geological units, the likeness index is very low and reaches 5–7 percent. At both sites this group of geocomplexes is derived either from basic geological units (mylonites and peridotites) or from calcareous communities which occur on non-basic geological units but are flushed by basic waters flowing from basic areas above (in the Five Lakes Valley: (16, 1); (2, 15); (2, 16); in Loch Coruisk Valley: (17, 32) and (4, 8)). This group thus relates to the first group, where the main determinant of location was geology. The mean value of the likeness index in the Five Lakes Valley is 3.92 percent and in Loch Coruisk Valley 5.03 percent. 
Another important characteristic is the percentage of neighbourhoods that actually occur out of the total number of all possible neighbourhoods. In the Five Lakes Valley it reaches 51 percent and in Loch Coruisk Valley 43 percent. This is further confirmation that Loch Coruisk Valley is better ordered than the Five Lakes Valley, whose types and individual units are more dismembered and accidental.

19.2.4 Analysis of vertical structure of geocomplexes

Index of strength of relationship
The index of strength of relationship describes the relationship between pairs of geocomponents such as soils and geology, microclimate and vegetation, etc. (Bezkowska, 1986; Richling, 1992). The index expresses the relation of area (or number of occurrences) covered by geocomplexes with particular features to a theoretical, maximum area (or number of occurrences) where the relationship could exist. It is expressed by the following equations: (19.9) (19.10) where: Prg=area or frequency of types of geocomplexes with r type of vegetation and g category of geology, Pr=total area (or frequency) of geocomplexes with r type of vegetation, Pg=total area (or frequency) of geocomplexes with g category of geology.
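
The calculation described above can be sketched as follows. Equations (19.9) and (19.10) are not legible in the available text, so this example assumes one plausible reading of the “theoretical, maximum area where the relationship could exist”: the smaller of the two totals Pr and Pg, giving W = Prg / min(Pr, Pg). The function name, the category labels and the areas are hypothetical.

```python
from collections import defaultdict

def strength_of_relationship(areas):
    """Return W for every (vegetation, geology) pair.

    `areas` maps (vegetation, geology) to the area or frequency P_rg.
    Assumed formula: W = P_rg / min(P_r, P_g), where P_r and P_g are the
    totals for the vegetation type and the geology category.
    All totals are assumed to be positive.
    """
    p_r = defaultdict(float)   # total per vegetation type
    p_g = defaultdict(float)   # total per geology category
    for (veg, geo), value in areas.items():
        p_r[veg] += value
        p_g[geo] += value
    result = {}
    for veg in p_r:
        for geo in p_g:
            p_rg = areas.get((veg, geo), 0.0)
            result[(veg, geo)] = p_rg / min(p_r[veg], p_g[geo])
    return result

# Hypothetical areas (hectares) of geocomplex types.
areas = {
    ("fen", "moraine"): 30.0,
    ("fen", "alluvium"): 10.0,
    ("grassland", "moraine"): 60.0,
}
w = strength_of_relationship(areas)
```

Under this assumed formula the index equals 1 where a category occurs only in combination with one partner (here, alluvium occurs only with fens) and 0 for combinations that never occur together (grassland on alluvium).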
The index was calculated on the basis of both frequencies and areas. In this chapter the author presents the results of the latter method. The index reaches its maximum (W = 1) for the types of geocomplexes which are the only representatives of a particular category of geocomponents. It approaches 0 for features of geocomponents whose relation with features of other geocomponents is loose, and reaches 0 for features of geocomponents which never occur together (for instance, in the Five Lakes Valley typical montane grasslands never occur on river alluviums and dwarf pine never occurs on mylonites). High values of the index are typical of strong and stable relationships, which have a leading role in the environmental structure. The results achieved in this study are presented according to the classification introduced by Bezkowska (1986): very strong relationships (W = 0.8–1), strong relationships (W = 0.6–0.8), moderate relationships (W = 0.4–0.6), loose relationships (W = 0.2–0.4) and very loose relationships (W = 0–0.2). In the Five Lakes Valley six types of geocomplexes reach the maximum value of the index. This is a direct consequence of the method of delimitation of geocomplexes, which after map overlaying were further generalised (use of the filter and classification). This is a good example of a case where the preliminary procedure (GIS analysis) must be considered when interpreting statistical results. In both research areas strong and very strong relationships prevail in three groups of geocomplexes: montane grasslands on solid rocks (in the Five Lakes Valley types 1, 5, 2 and in Loch Coruisk Valley 29, 1, 12, 3), hydrogenic geocomplexes on moraines (31, 38, 33, 31 and 23, 28, 26 and 19, respectively), and montane grasslands on basic material (18 and 8, respectively). It is significant that while in the Five Lakes Valley strong and very strong relationships prevail mainly in the first group, in Loch Coruisk Valley they prevail mainly in the second. 
Hence it may be concluded that in a moderate, transitional climate the most important geocomponents diversifying montane environments are relief and altitude (and hence microclimate), while in a cool, Atlantic climate hydrography (dependent upon meso- and macroclimate) is more important than relief, altitude and microclimate. This phenomenon was observed during the field research. While in the Tatra Mountains one can observe vegetation zones which appear at approximately the same altitude ranges, on Skye the transition between vegetation formations is gradual. A good example of the “blurred” sequence is the gradual transition between the following communities: heaths, montane heaths, montane heaths and grasslands, montane grasslands, dismembered montane grasslands with scree communities, scree communities, bare rocks. The strength of the relationship between different geocomponents describes the inner cohesion of the types of geocomplexes. Geocomplexes with strong relationships between geocomponents are stable and cohesive. The most stable and cohesive geocomplexes are the large and typical ones. Low values of the index of relationship indicate unbalanced and fragile geocomplexes. The most fragile geocomplexes are found in transition zones between different vegetation zones. As the borders of vegetation zones never have a linear character, the most fragile geocomplexes form small patches of communities located at the upper limits of their zones of occurrence. Good examples are small patches of dwarf pine in the Five Lakes Valley (the mean size of geocomplexes with dwarf pine ranges from 1681 m² to 1808.5 m², while the mean size of all geocomplexes equals 3695 m²). They are the most endangered communities, which can be easily destabilised. This fact should be considered when planning tourist routes, which should bypass them.
19.3 INTERPRETATION OF RESULTS: COMPARISON OF THE RESEARCHED ALPINE VALLEYS

The research summarised in this chapter makes it possible to compare the Five Lakes Valley and Loch Coruisk Valley and to confirm more general conclusions concerning the structure of alpine environments. The leading component that determines the character of all other components in alpine environments is relief. Strong dependence on gravitation is expressed by elongated and strongly dismembered shapes of geocomplexes. The most elongated shapes are characteristic of geocomplexes that are under the strongest influence of slope processes and water flow (geocomplexes of the subnival and alpine zones and of hydrogenic character). At the same time these geocomplexes are the least transformed by man. Another important conclusion is the enormous diversity of the researched environments, reflected in the strong fragmentation and dispersion of types of geocomplexes. Such great diversity within small areas is typical of alpine environments and results from a wide range of factors operating at different scales, e.g. macro-, meso- and microclimate. However, the diversity of the two researched areas can be attributed to different factors. In the Five Lakes Valley the main factors differentiating vegetation, and hence landscape, are height above sea level (climatic zoning) and meso- and micro-relief, which determine humidity and microclimate (mainly a result of relief and slope aspect). In Loch Coruisk Valley, the extremely humid climate reduces the influence of temperature and sun exposure and emphasises the importance of slope exposure to humid air masses and strong winds. The effect of these humid conditions is a blurring of the outlines of vertical climatic zoning, e.g. the lack of trees within the forest zone, the lowering of communities typical of the alpine zone towards sea level and, owing to low competition, the descent of some alpine species to the bottom of the valley. 
The main factors influencing the researched landscapes are reflected in a wide range of alpine meadows in the Five Lakes Valley and a wide range of hydrogenic communities in the Loch Coruisk Valley. Geocomplexes within these communities are the dominant geocomplexes in the researched landscapes and, being climax communities, are characterised by high cohesion and stability. An interesting source of differentiation in the researched environments is geology. Geocomplexes on basic geological formations (mylonites in the Tatra Mountains and peridotites on Skye) are unique and characteristic because of their abundant basic flora dominated by herbaceous species with large flowers. Because both mylonites and peridotites weather easily, basic communities are endangered by degradation. This fact should be taken into account when planning tourist routes. Unfortunately, nearly all the basic habitats in the Five Lakes Valley are on mountain passes crossed by tourist routes. These areas are hence subject to strong erosion by tourists and water. Of primary importance to landscape diversity are geocomplexes with low area frequencies but high count frequencies (many small units within one type of geocomplex), which cut across large background geocomplexes. These are mostly represented by geocomplexes with hydrophilic, chionophilous or calcareous communities. Although very common, their ecological amplitudes are comparatively narrow: they are sensitive to the slightest change in any of the environmental factors determining their habitats. In spite of the alpine character and low accessibility of the areas of research, both regions have been subjected to anthropopression for many years. Sheep grazing and burning of shrub communities led to significant changes of environmental structure in both valleys.
The most transformed areas in the Five Lakes Valley are the subalpine zone and lower parts of the alpine zone, where shrub and heath communities were replaced by meadow and sometimes even mossy communities. In Loch Coruisk Valley, subalpine communities stretched down and dominated the forest climatic zone. Because of low competition, even some
GIS TO ANALYSE STRUCTURE AND CHANGE IN MOUNTAIN ENVIRONMENTS
mountain species descended to lower parts of the valley. In both regions man’s activities led not only to disturbance of the natural sequence of vegetation communities but also to unification of structure of subalpine zones. These parts of both valleys are less diversified; individual units are big and cohesive. Apart from changes in the proportion of natural communities, some of the communities are inhabited by lowland species that are alien to the researched areas. Abandonment of man’s activities (establishment of a National Park in the Tatra Mountains, and gradual climate cooling on Skye) leads to gradual withdrawal of lowland and anthropogenic species. However, natural succession may last many years and in extreme cases natural communities may not return at all. For example, heather encroachment in Scotland leads to soil acidification and in some cases to development of peat processes. In the Tatra Mountains grazed meadow communities form compacted sod where dwarf mountain pine (Pinetum mughi carpaticum) seedlings cannot cut through. In these cases anthropogenic changes to the environment are irreversible and lead not only to evident changes of real vegetation but also to permanent changes of habitats that in the long run lead to changes of potential vegetation. Stability of geocomplexes with anthropogenic vegetation is confirmed by high values of indicators of strength between geocomponents. In summary it can be concluded that subnival, alpine and low, hydrogenic parts of both valleys are characterised by strong diversity and dispersion of landscape, with long individual geocomplexes and strong relationships between geocomponents. The subalpine zone, being a transitional zone between alpine and forest zones, includes many geocomplexes that are characteristic for both zones. As many communities reach their extreme habitats, higher diversity and lower stability of subalpine zone should be expected. 
Meanwhile, a significant reduction of landscape diversity (large, cohesive geocomplexes) and strong relationships between geocomponents are observed. This is an evident result of intensive and long-term anthropopression. It also confirms the long-acknowledged fact that slight, short-term human activity (for instance tourist routes) leads to landscape variation, while strong and long-term activity leads to unification of landscapes. One of the most important conclusions reached in this part of the research was confirmation that detailed phytosociological studies can replace complex environmental research. This is because vegetation is a perfect indicator of all habitat characteristics such as geology, geomorphology, hydrology and micro-climate. Abrupt discontinuities in vegetation are associated with abrupt discontinuities in the physical environment, and vegetation patterns in space reflect general patterns of the landscape and can lead to general conclusions on ecological processes. This is not a new conclusion but its confirmation is important, especially for mountain environments where detailed studies of micro-climate or hydrology are often not feasible. Detailed studies of vegetation, being a relatively easy method of research, can be considered the most important method of research in such diversified ecosystems.
19.4 FURTHER RESEARCH
One of the problems encountered during the research was that all the indicators used were based on areas and perimeters calculated on orthogonal maps. As the researched environments have an alpine character, size measures of geocomplexes calculated from two-dimensional maps are significantly underestimated. Many geocomplexes that lie on steep slopes and cover large areas have underestimated size measures, because a map is a horizontal projection of the real situation and the vertical dimension is lost.
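The size of this underestimation can be sketched numerically. The example below is only an illustration under stated assumptions: it takes a hypothetical set of slope angles (in degrees) for the raster cells of one geocomplex and a fixed planimetric cell size, and approximates the inclined surface area by dividing each cell's map area by the cosine of its slope. The function name `surface_area` and the sample values are not from the chapter.

```python
import numpy as np

def surface_area(slope_deg, cell_size):
    """Approximate the inclined surface area of a set of raster cells.

    slope_deg : slope angle of each cell, in degrees
    cell_size : planimetric edge length of a square cell (map units)
    """
    slope_rad = np.radians(np.asarray(slope_deg, dtype=float))
    planimetric = cell_size ** 2
    # A cell tilted by angle s projects onto the map as A * cos(s),
    # so the inclined area is the map area divided by cos(s).
    return float(np.sum(planimetric / np.cos(slope_rad)))

# Hypothetical geocomplex: four 10 m cells on a 60 degree slope.
slopes = [60.0, 60.0, 60.0, 60.0]
map_area = len(slopes) * 10.0 ** 2      # area as measured on the map
true_area = surface_area(slopes, 10.0)  # area on the inclined surface
print(map_area, true_area)
```

On a 60 degree slope the inclined area is about twice the map area, which suggests why area-based indicators computed from a horizontal projection may be strongly biased for steep geocomplexes.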
This prompts the author to research further in this field and to improve the results. The author hopes to accomplish this task with the help of photogrammetry techniques. Further research will include: evaluation of DEMs from stereopairs of aerial photos, production of
orthophotomaps and then calculation of areas and perimeters. This time, as the third dimension will be taken into account, areas will represent real measures. Development of statistical indicators based on real, 3-D space would simplify the interpretation of the resulting statistical coefficients. As further research will hopefully be based on data acquired from aerial images, thematic maps of several geocomponents (vegetation, geology, geomorphology, soil humidity, aspect, slope, shading conditions, wind exposure, etc.) will also be derived from the imagery and then analysed using image enhancement and classification methods. Field work, which constituted a significant part of this research, will be limited to calibration of the aerial data. In conclusion, the results challenge the author to undertake more detailed investigation of the interdependencies between geocomponents of montane environments and to improve and widen the techniques employed. The author believes that further detailed, automated and objective research of alpine environments will help in the management and protection of these fragile environments.
REFERENCES
BEZKOWSKA, G. 1986. Struktura i typy geokompleksów w środkowej części Niziny Południowowielkopolskiej, Acta Geographica Lodziensia, nr 54. Łódź: Ossolineum.
BOČAROV, M.K. 1976. Metody statystyki matematycznej w geografii. Warszawa: PWN.
KALICKI, T. 1986. Funkcjonowanie geosystemów wysokogórskich na przykładzie Tatr, Prace Geograficzne z. 67. Zeszyty Naukowe Uniwersytetu Jagiellońskiego.
PIETRZAK, M. 1989. Problemy i metody badania struktury geokompleksu (na przykładzie powierzchni modelowej Biskupice). Poznań: UAM.
RICHLING, A. 1992. Kompleksowa geografia fizyczna. Warszawa: PWN.
TURNER, M.G. 1989. Landscape ecology: the effect of pattern on process, Annual Review of Ecology and Systematics, 20, pp. 171–197.
Chapter Twenty
Simulation of Land-Cover Changes: Integrating GIS, Socio-Economic and Ecological Processes, and Markov Chain Models
Yelena Ogneva-Himmelberger
20.1 INTRODUCTION
The need to couple GIS and modelling techniques has been strongly expressed by environmental modellers, and some successful attempts have been made to integrate GIS with atmospheric, hydrological, land surface-subsurface process, and biological/ecosystem modelling (Goodchild et al., 1993). However, there are very few examples of coupling GIS with models that link environmental processes to their socio-economic causes, such as relating land-cover change to its human driving forces (Dale et al., 1993; Veldcamp and Fresco, 1995). There are several obstacles to integrating GIS and process modelling, both in terms of GIS functionality and modelling methods. Additional difficulties arise when social and environmental data are to be integrated into a model: both the different nature and scale of the processes (Lonergan and Prudham, 1994) and the lack of a well developed theoretical framework for linking societal and environmental process models make their integration difficult (Brown, 1994). One way of coupling simulation models of land-cover change with GIS is via Markov chain models (Lambin, 1994). This method, although employed in a few landscape ecology studies (Berry et al., 1996; Hall et al., 1991; Parks, 1991; Turner, 1988), has not been extensively explored in the GIS field. This chapter proposes a methodology for integrating GIS with socio-economic and ecological processes for modelling land-cover change via Markov chain analysis. The model is based on dynamic transition probabilities, which are defined as functions of exogenous (ecological and socio-economic) factors via logistic regression. It is tested in a study area in the southern Yucatan peninsula (Mexico), where old growth, semi-evergreen tropical forests have been subjected to significant changes over the past 20 years. This chapter starts with the definition and an overview of the evolution of Markov chain models in environmental sciences.
The description of the study area and the explanation of the methodology follow in the next section. Initial modelling results and the issues encountered during the model implementation stage conclude this chapter. 20.2 HISTORICAL OVERVIEW OF MARKOV CHAIN MODELS A Markov chain model is a “mathematical model for describing a certain type of process that moves in a sequence of steps through a set of states” (Lambin, 1994, p. 28). Transition probability matrices are the basis of this model. These probabilities are estimated from land-cover changes between time t and time t+1
Figure 20.1: Example of calculation of transition probabilities in a raster GIS. Numbers in the crosstabulation table represent the number of cells in each combination of land-cover categories. To create the transition probability matrix, each element of each row in this table is divided by the number in the “TOTAL” column of that row.
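The calculation described in the caption of Figure 20.1 can be sketched in a few lines. The example below is a minimal illustration, not the chapter's implementation: the two small land-cover rasters, the class codes and the function names are hypothetical. It cross-tabulates two maps, divides each row by its total, and then projects the class fractions one step ahead. Because rows are indexed by the initial class here, the projection uses the transpose of the matrix; that convention is an assumption of this sketch.

```python
import numpy as np

def crosstab(map_t0, map_t1, m):
    """Count cells for every (class at t, class at t+1) combination."""
    a = np.zeros((m, m), dtype=int)
    # np.add.at accumulates repeated index pairs correctly.
    np.add.at(a, (map_t0.ravel(), map_t1.ravel()), 1)
    return a

def transition_matrix(a):
    """Divide each row of the cross-tabulation by its row total."""
    totals = a.sum(axis=1, keepdims=True)
    # Classes absent at time t keep a row of zeros instead of dividing by 0.
    return np.divide(a, totals, out=np.zeros(a.shape, dtype=float),
                     where=totals > 0)

# Hypothetical 3x3 maps with classes 0 = forest, 1 = cropland.
lc_t0 = np.array([[0, 0, 0], [0, 1, 1], [0, 1, 1]])
lc_t1 = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1]])

a = crosstab(lc_t0, lc_t1, m=2)
P = transition_matrix(a)

# Fractions of the landscape in each class at time t, then one step ahead.
n_t = a.sum(axis=1) / a.sum()
n_next = P.T @ n_t
print(P)
print(n_next)
```

In this toy case the projected fractions reproduce the class shares of the second map exactly, since the matrix was estimated from the same pair of maps; applying it to a later period relies on the stationarity assumption.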
from agricultural census or forest surveys, or through superimposition of two land-cover maps in a GIS, using the following formula:
$$p_{ij} = \frac{a_{ij}}{\sum_{j=1}^{m} a_{ij}} \qquad (20.1)$$
where $p_{ij}$ is the probability that a given cell has changed from class i to class j during the interval from t to t+1, and $a_{ij}$ is the number of such transitions across all cells in a landscape with m land-cover classes (Anderson and Goodman, 1957). An m × m matrix is built from the probabilities for all possible transitions between all states (Figure 20.1). The major assumptions of these models are that the transition probabilities are stationary over time, and that they depend only on the current distribution of uses, so that history has no effect. Using this matrix, Markov models allow the calculation of the land-cover distribution at time t+1 from the initial distribution at time t. In matrix notation, the model can be expressed as (Baker, 1989):
$$n_{t+1} = M\, n_t \qquad (20.2)$$
where $n_t$ is a column vector whose elements ($n_1, \dots, n_m$) are the fractions of land area in each of m states at time t, and M is an m × m transition matrix whose elements $p_{ij}$ are the transition probabilities. Markov models have a special mathematical property that makes them relevant to simulations of ecological successions: the resulting areal distribution of landscape elements corresponds to steady state conditions of an ecosystem (Hall et al., 1991; Shugart et al., 1973; Usher, 1981; Van Hulst, 1979). Use of
SIMULATION OF LAND-COVER CHANGES WITH GIS AND MARKOV CHAIN MODELS
these models in geography began in the mid 1960’s with diffusion and movement research (Brown, 1963; Clark, 1965). Such models were also widely used to estimate changes in land-cover acreage (Burnham, 1973; Miller et al., 1978; Nualchawee et al., 1981; Vandeveer and Drummond, 1978). Predictions based on the Markov chain models are generally considered better than linear extrapolations (Aaviksoo, 1993). In recent years, Markov chain models have been extended to overcome three major limitations concerning their application to modelling land-cover change: the assumption that the rates of transition are constant through time; the neglect of the influence of exogenous variables on transition probabilities, and the non-spatial character of future land-cover prediction (Baker, 1989; Lambin, 1994). The model proposed in this chapter addresses the last two limitations and assumes stationarity of transition probabilities over the period of study. To overcome stationarity limitation, Collins et al. (1974) suggest the calculation of dynamic transition probabilities by postulating rules of behaviour of certain landscape elements or by switching between different transition matrices at certain intervals (Baker, 1989). The influence of exogenous factors on transition probabilities can be incorporated into the model by using theoretical or empirical functions such as econometric models (Alig, 1986). Multivariate regression techniques are usually applied to analyse the influence of selected economic factors on each type of land-cover transition (Berry et al., 1996; Parks, 1991). The results of regression analysis are then used to recalculate transition probabilities under different economic scenarios and to run new simulation models with the new probabilities (Lee et al., 1992; Parks, 1991). The first spatial model of land-cover change was developed by Tom et al. 
(1978) who analysed the change between 1963 and 1970 using discriminant analysis with 27 physiographic, socio-economic, and transportation variables. They tested two types of models: with a priori change occurrence probability (derived from the Markov matrix) and with equal change occurrence probability (e.g. for five types of landcover change, the probability of each was assumed to be 0.2). Separate discriminant analyses were run for each initial type of land cover. The accuracy of the models was based on the number of cells whose future (1970) land cover was correctly predicted. The authors proposed to combine the Markov chain and discriminant analysis models for improved prediction of change. The Markov transition probability matrix provides the number of cells that are expected to change, and the discriminant analysis model calculates probabilities of change for each cell. These cells are then rank-ordered from the highest to the lowest probability, and the correct number of the top cells is selected for each type of change. This model uses the expected percentage of change as an a priori knowledge to predict where these changes will take place. The strength of the algorithm is that it selects the areas that are “best suited” for change. By running the model for different time scales with different Markov change parameters, it is possible to observe the spatial diffusion of change over time. This type of model works well in situations when there are only a few land-cover types present. They do not provide a solution for conflict situations, when the same cells are selected as the most suitable for several change types. 20.3 THE STUDY AREA The study area (60×60 km) corresponds to the southern Campeche state in Mexico, specifically to a swath between Route 186 (the east-west, cross-peninsula highway) and the Guatemalan border. The area in question corresponds to a tropical monsoon climate (following Köppen). 
The annual rainfall is around 2,000 mm with peak precipitation during summer and a distinct winter dry season. A rolling karstic terrain dominates the centre of the area with elevations reaching about 250–300 meters above sea level; the east
and west edges are, however, low-lying. There are no permanent surface streams in the uplands, although seasonal water courses are apparent. Evergreen and semi-evergreen tropical forests predominate without human interference. The upland soils are shallow but agriculturally fertile mollisols, while depressions (called bajos) are typically filled with vertisols of thick clays. The area so bounded was part of the Río Bec region of the ancient Maya people, one of the most densely settled regions of Mesoamerica from about 1000 BC to 800-1000 AD, in which major deforestation occurred. The forest returned with the collapse and depopulation of the Classic Maya civilisation in this region. Subsequent Maya occupation was sparse, associated with extensive slash-and-burn cultivation. When the Spaniards entered the region in the early XVI century, they encountered a mature forest, occasionally punctuated by a Maya village (Turner, 1990). Such conditions remained until the early part of this century, when market trade in chicle (resin) and tropical hardwoods began (Edwards, 1957). Due to the high value of chicle at that time, logging and burning in tropical forests dominated by chicozapote (Manilkara zapote) were prohibited. But by the end of the 1930s, with the production of synthetic latex and the sharp decrease in the price of chicle, extraction of the product declined and deforestation began with the development of timber production and, to a lesser extent, agriculture (Flores, 1987). Initially, less commercially valuable timber was logged for railroad construction; extraction of more precious hardwoods (cedar and mahogany) started later. More recent and pronounced changes started with the construction of a highway during the late 1960s and the implementation of largescale government-sponsored resettlement projects (Plan de Colonization de Sureste), involving the establishment of ejidos—villages with communally owned lands. 
Cleared land was used for subsistence agriculture through the shifting cultivation of maize, beans, and squash, which expanded along the highway. Later, during the first half of the 1980s, large areas were cleared by the Mexican government for rice fields. Many of these areas have since been abandoned due to the failure of rice production experiments, though some were cultivated with African grasses and converted into pastures. The livestock raised here is cebu and criollo stock, used largely for domestic (local and national) consumption. Overall, land cover has gone through a range of transformations, from intensive cultivation of land to abandonment of cultivated pastures and regrowth of secondary forests. The recent character of these changes and the availability of data facilitate the documentation of different trajectories of change in land cover from the onset of the recent deforestation. Also, the diverse set of human forces driving these changes makes this area an excellent example for studying the links between causes and types of cover change and for exploring the spatial and temporal dynamics of these relationships. 20.4 METHODS The study was conducted in three methodological steps. First, land-cover maps for three different times were created. These maps were then analysed and land-cover change (transition) maps for the two time periods were designed. Second, ecological and socioeconomic factors defining land-cover change in the area were identified, and a set of digital maps representing these factors was made. Finally, land-cover transitions were linked to the factors of change via the Markov chain-based spatially explicit model. The model used data from the first time period to predict land-cover transitions for the second time period. Changes in land cover are examined through the analysis of two types of data—Landsat MSS images and socio-economic census data, both spanning the period from 1975 to 1990. 
The study area is completely covered by one MSS scene. Three radiometrically corrected and geometrically rectified cloud-free MSS scenes—for 1975, 1986 and 1990—were obtained from the North American Landscape Characterisation
Project (NALC). These images were georeferenced to a 60×60 meter Universal Transverse Mercator ground coordinate grid, and coregistered at the EROS Data Center prior to purchase. Since cloud-free data are rare for this area, the acquisition dates in the dataset range from April (1986 and 1990 scenes) to December (1975 scene).
20.4.1 Land-cover classification
A combination of different image classification techniques has been employed to achieve the results most suited for analysis. First, Principal Components Analysis (PCA) was used to reduce four spectral bands in each MSS scene to two components: the first principal component of the two visible bands, and the first principal component of the two near-infrared bands. Each of the component images was scaled so that the mean plus or minus four standard deviations falls within the 0–255 range. The visible component allows for the separation of open vs. forested land, and the near-infrared component is used to delineate water and cloud shadows (Bryant et al., 1993). To classify open and forest areas in more detail, a combination of unsupervised and supervised classification was used. Training sites for supervised classification were selected based on the existing land-cover map for 1986 and a knowledge of the area. A set of aerial photographs for 1984–85 was employed to interpret image classification results.
20.4.2 Defining the driving forces of change
The land uses associated with the various covers and the important factors of change were determined from field observations, interviews conducted with the local farmers in eight ejidos, and from literature reviews. Elevation, slope, and soil type were identified as ecological factors important for land-cover change in the area.
Among the socio-economic factors of change, the amount of governmental subsidies, availability of bank loans, population distribution and affluence level, distance to roads, distance to market, and distance from each land plot to the village, were identified as important. Some socio-economic variables representing these driving forces were collected from the 1970 and 1980 population censuses for each of the 23 ejidos comprising the study area. Agricultural census data were available only for 1970 and 1988 at the municipal level. Data for some of the driving forces (soils, bank loans, amount of governmental subsidies) were not available. Based on the census data, a database of 20 digital maps was created. The village boundary map was used to represent census information in a spatial form. At this point, it was assumed that all socio-economic indicators derived from census data were distributed uniformly within each village boundary. These 20 maps include: ecological factors (elevation, slope, distance to the nearest other land cover) and socioeconomic factors (population, economically active population, population density, percentage of households with electricity and indoor plumbing, number of working animals per hectare of cropland, literacy rate, number of pick-up trucks, number of trucks, number of tractors per hectare of cropland, and the distance to roads, village, market, cropland, grassland, and secondary forest). Two sets of socio-economic factor maps were created. The first set, based on the 1970 population and agricultural census, was used in model development, and the second, based on the 1980 population census and the 1988 agricultural census, was used in model implementation.
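The uniform-distribution assumption described above amounts to a lookup from village identifier to census value for every cell. The sketch below illustrates that idea only; the village identifier raster, the attribute values and the function name `census_to_raster` are hypothetical and are not taken from the chapter's database.

```python
import numpy as np

# Hypothetical raster of village (ejido) identifiers; 0 marks cells
# that fall outside every village boundary.
village_id = np.array([[1, 1, 2],
                       [1, 2, 2],
                       [0, 3, 3]])

# Hypothetical census attribute per village, e.g. population density.
density_by_village = {1: 4.2, 2: 7.5, 3: 1.1}

def census_to_raster(ids, values, nodata=np.nan):
    """Give every cell the census value of the village it belongs to."""
    lookup = np.full(ids.max() + 1, nodata, dtype=float)
    for vid, value in values.items():
        lookup[vid] = value
    # Fancy indexing applies the lookup table to the whole raster at once.
    return lookup[ids]

density = census_to_raster(village_id, density_by_village)
print(density)
```

Every cell inside a boundary receives the same value, so any variation within a village is lost; alternative spatial representations of census data would replace this lookup step.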
20.4.3 Integrated modelling
The model was developed using data on land-cover changes between 1975 and 1986 and was applied to the 1986 data to produce the map for 1990. It was validated by comparing this predicted map with the land-cover map produced from the satellite imagery for 1990. The development of the model included the following steps:
1. Cross-classification of the 1975 and 1986 land-cover maps to produce the transition type map, showing the corresponding transition number for each cell.
2. Extraction of values from the set of 20 ecological and socio-economic maps for each pixel that underwent a particular transition between 1975 and 1986.
3. Use of multinomial logistic regressions (CATMOD procedure in SAS) to produce a transition probability equation for each transition type:
$$P_{ij} = \frac{\exp(\alpha + \beta_1 X_1 + \dots + \beta_n X_n)}{1 + \exp(\alpha + \beta_1 X_1 + \dots + \beta_n X_n)} \qquad (20.3)$$
where $P_{ij}$ is the probability of change from land cover i to land cover j; $\alpha$ and $\beta_1, \dots, \beta_n$ are logistic regression coefficients; and $X_1, \dots, X_n$ are values of ecological and socio-economic factors for a pixel.
The logistic regression was chosen for two reasons: it is suitable for the analysis of continuous and discrete variables (Trexler and Travis, 1993); and the interpretation of predicted values, always ranging between 0 and 1, is straightforward for probability-based Markov chain analysis. If multicollinearity was present in the data, factor analysis was applied first, and then the factor scores were used in the logistic regressions as independent variables.
The model was implemented using the following steps:
1. Generation of transition probability maps corresponding to these equations using map algebra functions in a GIS. Since the considered time period corresponds to 11 years (1975–1986), these maps likewise correspond to 11 year transition probabilities (1986–1997).
2. Normalisation of these 11 year maps to annual probabilities and generation of the four-year transition probability maps (corresponding to 1986–1990). The standardisation procedure assumed that the rate of change of probabilities was linear.
3. Comparison of the transition probability maps on a cell-by-cell basis and creation of the predicted 1990 land-cover map. Each land-cover type was considered separately. In other words, for cells that were cropland in 1986, only maps representing probabilities of transition from cropland to other land covers (plus the probability map of “no change”) were considered. At this time, a deterministic approach was adopted, based on the assumption that the transition that takes place will always be the one with the highest likelihood. Thus, of the four transition probability values corresponding to each cell, the highest was selected, and the cell was changed to the land cover that represented that highest value.
4. Model evaluation: comparison of the simulated land-cover map with the actual land-cover map produced from classification of satellite imagery.
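The implementation steps above can be sketched as follows. This is a hedged illustration rather than the chapter's code: the logistic function uses the standard binary logistic form and does not reproduce the multinomial fit made in SAS, the four-year probability maps are taken as given, and all function names, class codes and probability values are illustrative assumptions. The sketch also returns the gap between the highest and second-highest probability, which may help to judge how firm the deterministic choice is for each cell.

```python
import numpy as np

def logistic_probability(alpha, beta, factors):
    """Logistic transition probability for every cell.

    factors has shape (n_factors, rows, cols); beta has length n_factors.
    """
    z = alpha + np.tensordot(beta, factors, axes=1)
    return 1.0 / (1.0 + np.exp(-z))

def most_likely_cover(prob_stack, target_classes):
    """Deterministic rule: choose the target with the highest probability.

    prob_stack has shape (n_targets, rows, cols); layer k holds the
    probability of moving to target_classes[k] ("no change" included).
    """
    winner = np.argmax(prob_stack, axis=0)
    ordered = np.sort(prob_stack, axis=0)
    margin = ordered[-1] - ordered[-2]  # highest minus second highest
    return np.asarray(target_classes)[winner], margin

def spatial_agreement(predicted, actual, cover):
    """Share of cells with `cover` in the actual map predicted as `cover`."""
    in_class = actual == cover
    return float(np.mean(predicted[in_class] == cover))

# Illustrative probabilities for two cells that were forest (class 0).
# Layers: 0 = forest (no change), 1 = scrub, 2 = grassland, 3 = cropland.
probs = np.array([
    [[0.40, 0.47]],
    [[0.01, 0.01]],
    [[0.00, 0.52]],
    [[0.59, 0.00]],
])
predicted, margin = most_likely_cover(probs, [0, 1, 2, 3])

actual = np.array([[3, 0]])  # hypothetical control map
print(predicted, margin)
print(spatial_agreement(predicted, actual, 3))
```

Here the second cell is assigned to grassland although "no change" is almost as likely, so a small margin flags a prediction that the deterministic rule may get wrong.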
20.5 RESULTS AND DISCUSSION
20.5.1 Land-cover classification
The PCA technique worked well on these data and allowed for the separation of open vs. forested areas with a high degree of accuracy. Unsupervised classification was performed on the false colour composite image of green, red, and infrared bands. The results of this classification were, however, unsatisfactory due to severe striping problems in the original data. Supervised classification allowed for the separation of eight land-cover classes, which were then reduced to six general classes in order to minimise the effect of misclassification error on modelling results. These six land-cover types are: forest (including both medium-tall semi-evergreen and bajo seasonal wetland forests), scrub forest (early successions), grassland (both savanna and cultivated pastures), cropland, bare soil/roads and water. Three land-cover maps (for 1975, 1986 and 1990) were produced using the same technique.
20.5.2 Integrated modelling
Cross-classification of the 1975 and 1986 land-cover maps yielded 29 transition types. Only 16 of them, representing changes between the four main land-cover categories (forest, scrub forest, grassland, and cropland), were chosen for the model. Other land-cover categories, such as water and roads, were assumed to be unchanging. Statistical analysis of the 20 independent variables showed that many of them were highly correlated with each other (for example, the four distance variables, as well as the variables representing the affluence level). Factor analysis was applied to the independent variables and the first ten factors were extracted (each of the factors had at least one variable with a loading higher than 0.7). These factors explained about 97 percent of the variance.
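The idea of replacing correlated predictors with a smaller set of uncorrelated scores can be illustrated with a principal-component extraction from the correlation matrix. This is a stand-in chosen for the sketch: the chapter does not state which factor extraction method was used, and the synthetic data, the function name `principal_factors` and the number of factors retained are all assumptions.

```python
import numpy as np

def principal_factors(X, n_factors):
    """Principal-component factors of the correlation matrix of X.

    X has shape (n_samples, n_variables). Returns the loadings
    (variables x factors), the factor scores, and the share of total
    variance explained by the retained factors.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    R = np.corrcoef(Z, rowvar=False)
    eigval, eigvec = np.linalg.eigh(R)  # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:n_factors]
    eigval, eigvec = eigval[order], eigvec[:, order]
    loadings = eigvec * np.sqrt(eigval)  # variable-factor correlations
    scores = Z @ eigvec                  # uncorrelated by construction
    explained = eigval.sum() / R.shape[0]
    return loadings, scores, explained

# Synthetic example: two nearly identical variables plus an unrelated one.
rng = np.random.default_rng(0)
d = rng.normal(size=200)
X = np.column_stack([d, d + 0.05 * rng.normal(size=200),
                     rng.normal(size=200)])

loadings, scores, explained = principal_factors(X, n_factors=2)
print(np.round(loadings, 2))
print(round(explained, 3))
```

The two correlated variables load on a single factor, and the resulting scores can be passed to a regression in place of the original variables without the multicollinearity problem.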
It is interesting to note that even though the variable loading pattern was quite different for different transitions, variables related to technology use (number of tractors, trucks, working animals per hectare of cropland) as well as the distance to market variable, always had the highest loading in the first component for all transition types. Multinomial logistic regressions were run separately for each of the four initial (i.e. corresponding to 1975) land-cover types. The maximum-likelihood analysis-of-variance tables showed that all models fit and that the factors included in the analysis were significant with respect to each transition type. Comparison of the predicted map (Figure 20.2) with the “control” map (Figure 20.3) was made on both a non-spatial and spatial basis. The former involved the calculation of the total area under each land cover for the two maps, and the latter, the overlaying and crosstabulation of the two maps. The model predicted the extent of mature forest quite accurately (just 7 percent less than in the control map), but underpredicted the area under scrub forest (39 percent less). With regard to the grassland and cropland, the model overpredicted their extent 1.3 and 2.5 times respectively. The fact that the model overestimated the rates of conversion in these categories is not very surprising and can be linked to the model’s main assumption about stationarity of transition probabilities over time. The period between 1975 and 1986 corresponds to the large-scale deforestation by the government for cultivation of rice, as well as the growth of subsistence agriculture, due to the increase in population in the area. By the late 1980s, however, rice production was declining, and no more land was cleared on a large scale. Thus, the rate of
Figure 20.2: Predicted land-cover map for 1990.
these particular transitions (forest-cropland and forest-grassland) was somewhat different during the second time period (1986–1990), but was not accounted for by the model. With respect to the spatial agreement of the results, overlaying of the two maps showed that the model correctly predicted the location of 89 percent of the forest, 11 percent of scrub forest, 9 percent of grassland, and 39 percent of cropland. It is important to keep in mind that the final map corresponds to land-cover changes that had the highest probability for each cell, regardless of the value of this probability and its proximity to the second-highest probability. A closer look at the areas where forest conversion into cropland or grassland was predicted incorrectly shows that the probabilities of these transitions are 0.59 and 0.52, while the second-highest probabilities (corresponding to no change) are 0.40 and 0.47 respectively. In the case of forest-to-grassland transition, the two probabilities are so close to each other (the difference is 0.05), that the decision rule applied in this deterministic model may not be appropriate. This example illustrates that the final land-cover map should always be analysed in conjunction with the maps of the maximum and the second highest transition probabilities. Finally, a comment regarding the data used for the analysis. The difference in the dates of the imagery (the 1975 scene corresponds to the end of the wet season, and the 1986 and 1990 scenes to the end of the dry season) as well as the limited spectral resolution may have introduced some land-cover classification error. A finer resolution imagery (TM or SPOT) acquired on anniversary dates would have improved the classification accuracy. Weak predictions of some of the transitions may be explained by the fact that the 20 independent variables representing the driving forces of change in the area, do not fully describe the
SIMULATION OF LAND-COVER CHANGES WITH GIS AND MARKOV CHAIN MODELS
dynamics present. If a time series of data on bank loans, governmental subsidies, and prices for agricultural and livestock production had been available, the model performance might have improved for certain transitions.
20.6 CONCLUSIONS AND SUGGESTIONS FOR FUTURE RESEARCH
Analysis of the results leads to the conclusion that the Markov chain probability-based model coupled with a logistic regression model allows for improved understanding of land-cover change processes. Its main value is its spatial character and the ability to incorporate explanatory variables (both ecological and socioeconomic). The identification of variables representative of the processes defining land-cover change in an area, as well as the definition of decision rules during the model formulation stage, are the most critical elements for its successful performance. The findings presented above are part of ongoing research. The next step of the analysis will be to calibrate the model using several different approaches. First, some alternative models (e.g., stochastic) will be tested and their results compared with the performance of the deterministic model. Second, the approach suggested by Tom et al. (1978), where the probability maps are rank-ordered from the highest to the lowest value, and the correct number of the top cells (as derived from non-spatial Markov chain analysis) is selected for each type of transition, will be tested. Third, a set of different approaches to spatial representation of socio-economic variables will be tested (e.g. areal interpolation using ancillary data) and their influence on model performance will be evaluated. Finally, a set of behavioural rules for certain land covers will be incorporated to overcome the stationarity limitation of the Markov approach.
ACKNOWLEDGEMENT
This research was partially supported by the US Man and Biosphere Program (Tropical Directorate) Grant No. TEDFY94-003.
Professors Billie Lee Turner II, Ronald Eastman, and Samuel Ratick, who serve on the author’s dissertation committee, have contributed to the design of this research and their comments have been incorporated into this chapter.
REFERENCES
AAVIKSOO, K. 1993. Changes of plant cover and land use types (1950’s and 1980’s) in three mire reserves and their neighborhood in Estonia, Landscape Ecology, 8(4), pp. 287–301.
ALIG, R. 1986. Econometric analysis of the factors influencing forest acreage trends in the Southeast, Forest Science, 32(1), pp. 119–134.
ANDERSON, T. and GOODMAN, L. 1957. Statistical inference about Markov chains, Annals of Mathematical Statistics, 28, pp. 89–110.
BAKER, W. 1989. A review of models of landscape change, Landscape Ecology, 2(2), pp. 111–133.
BERRY, M., FLAMM, R., HAZEN, B. and MACINTYRE, R. 1996. Lucas: a system for modeling land-use change, IEEE Computational Science and Engineering, 3(1), pp. 24–35.
BROWN, L. 1963. The Diffusion of Innovation: a Markov Chain-type Approach. Discussion Paper No. 3, Department of Geography, Northwestern University.
BROWN, D. 1994. Issues and Alternative Approaches for the Integration and Application of Societal and Environmental Data within a GIS, Michigan State University, Department of Geography, Rwanda Society-Environment Project, Working Paper No. 3, 12 April, 1994.
Figure 20.3: Actual land-cover map for 1990.
BRYANT, E., BIRNIE, R. and KIMBALL, K. 1993. A practical method of mapping forest change over time using Landsat MSS data: a case study from central Maine, in Proceedings of 25th International Symposium on Remote Sensing and Global Environmental Change, Graz, Austria, 4–8 April. Ann Arbor: ERIM, Vol. 2, pp. 469–480.
BURNHAM, B. 1973. Markov intertemporal land use simulation model, Southern Journal of Agricultural Economics, 5(2), pp. 253–258.
CLARK, W. 1965. Markov chain analysis in geography: an application to the movement of rental housing areas, Annals of the Association of American Geographers, 55, pp. 351–359.
COLLINS, L., DREWETT, R. and FERGUSON, R. 1974. Markov models in geography, The Statistician, 23, pp. 179–209.
DALE, V., O’NEIL, R., PEDLOWSKI, M. and SOUTHWORTH, F. 1993. Causes and effects of land-use change Rondonia, Brazil, Photogrammetric Engineering and Remote Sensing, 59(6), pp. 997–1005.
EDWARDS, C. 1957. Quintana Roo, Mexico’s Empty Quarter. Berkeley: University of California.
FLORES, G.J. 1987. Uso de los Recursos Vegetales en la Peninsula de Yucatan: Pasado, Presente y Futuro. Xalapa: INIREB.
GOODCHILD, M., PARKS, B. and STEYAERT, L. (Eds.) 1993. Environmental Modeling with GIS. New York: Oxford University Press.
HALL, F., BOTKIN, D., STREBEL, D., WOODS, K. and GOETZ, S. 1991. Large-scale patterns of forest succession as determined by remote sensing, Ecology, 72(2), pp. 628–640.
LAMBIN, E. 1994. Modeling Deforestation Processes: a Review. Ispra: TREES Project.
LEE, R., FLAMM, R., TURNER, M., BLEDSOE, C., CHANDLER, P., DEFERRARI, C., GOTTFRIED, R., NAIMAN, R., SCHUMAKER, N. and WEAR, D. 1992. Integrating sustainable development and environmental vitality: a
landscape ecology approach, in Naiman, R. (Ed.), Watershed Management: Balancing Sustainability and Environmental Change. New York: Springer-Verlag, pp. 499–521.
LONERGAN, S. and PRUDHAM, S. 1994. Modeling global change in an integrated framework: a view from the social sciences, in Meyer, W. and Turner, B. (Eds.), Global Land-use and Land-cover Change. Cambridge: Cambridge University Press.
MILLER, L., NUALCHAWEE, K. and TOM, C. 1978. Analysis of the dynamics of shifting cultivation in the tropical forest of northern Thailand using landscape modeling and classification of Landsat imagery, in Proceedings of the 20th International Symposium on Remote Sensing of Environment, 20–26 April. Ann Arbor: ERIM, pp. 1167–1185.
NUALCHAWEE, K., MILLER, L., TOM, C., CHRISTENSON, J. and WILLIAMS, D. 1981. Spatial Inventory and Modeling of Shifting Cultivation and Forest Land Cover of Northern Thailand with Inputs from Maps, Airphotos and Landsat, Remote Sensing Centre Technical Report No. 4177. College Station: Texas A & M University.
PARKS, P. 1991. Models of forested and agricultural landscapes: integrating economics, in Turner, M. and Gardner, R. (Eds.), Quantitative Methods in Landscape Ecology. New York: Springer-Verlag, pp. 309–322.
SHUGART, H., CROW, T. and HETT, J. 1973. Forest succession models: a rationale and methodology for modeling forest succession over large regions, Forest Science, 19, pp. 203–212.
TOM, C., MILLER, L. and CHRISTENSON, J. 1978. Spatial Land-use Inventory, Modeling, and Projection/Denver Metropolitan Area, with Inputs from Existing Maps, Airphotos, and Landsat Imagery. Greenbelt: NASA, Goddard Space Center.
TREXLER, J. and TRAVIS, J. 1993. Nontraditional regression analyses, Ecology, 74(6), pp. 1629–1637.
TURNER, M. 1988. A spatial simulation model of land use changes in a piedmont county in Georgia, Applied Mathematics and Computation, 27, pp. 39–51.
TURNER II, B. 1990.
The rise and fall of Maya population and agriculture, 1000 BC to present: the Malthusian perspective reconsidered, in Newman, L. (Ed.), Hunger and History: Food Shortages, Poverty and Deprivation, Oxford: Basil Blackwell, pp. 178–211.
USHER, M. 1981. Modeling ecological succession, with particular reference to Markovian models, Vegetatio, 46(1), pp. 11–18.
VANDEVEER, L. and DRUMMOND, H. 1978. The Use of Markov Processes in Estimating Land Use Change. Oklahoma: Agricultural Experimental Station.
VAN HULST, R. 1979. On the dynamics of vegetation: Markov chains as models of succession, Vegetatio, 40(1), pp. 3–14.
VELDKAMP, A. and FRESCO, L. 1995. Modeling Land Use Changes and their Temporal and Spatial Variability with CLUE. A Pilot Study for Costa Rica, Wageningen: Department of Agronomy, Wageningen Agricultural University.
Part Three GIS AND REMOTE SENSING
Chapter Twenty One
Multiple Roles for GIS in Global Change Research
Michael Goodchild
21.1 BACKGROUND
The past ten years have seen a dramatic increase in support for research into the physical Earth system, and the effects of human-induced change, particularly in climate. Such research places heavy demands on geographic data, and on systems to handle those data, in order to calibrate, initialise, and verify models of the Earth system, and also to investigate the relationships that exist between various aspects of the physical system, and the human populations that both cause change and experience its effects. It is widely believed that GIS and related technologies (remote sensing, GPS, image processing, high bandwidth communications) will play an increasingly important role in global change research (Goodchild et al., 1993; Mounsey, 1988; Townshend, 1991). In particular, GIS is seen as a vehicle for collecting, manipulating, and pre-processing data for models; for integrating data from disparate sources with potentially different data models, spatial and temporal resolutions, and definitions; for monitoring global change at a range of scales; and for visual presentation of the results of modelling in a policy-supportive, decision-making environment. This chapter explores these potential multiple roles of GIS, in global change research and more broadly in environmental modelling and analysis. The emphasis throughout the chapter is on the role GIS can play in the science of global change research; in addition, but downplayed in the chapter, are its more general roles in creating and managing data in global databases such as the UN Environment Program’s GRID. The discussion is based in part on the results of two specialist meetings conducted by the National Center for Geographic Information and Analysis (NCGIA) under its Research Initiative 15 (for the full meeting reports see Goodchild et al., 1995, 1996).
21.2 INTRODUCTION
During the last decade there has been increased awareness of the potential for major changes in climate, deterioration of the stratospheric ozone layer, and decreasing biodiversity. At the same time, new political and economic transformations and structures are emerging. These phenomena are described as “global change” (Botkin, 1989; Committee on the Human Dimensions of Global Change, 1992; Price, 1989; Turner et al., 1990) and can be classified as being of two basic types (Botkin, 1989; Turner et al., 1990). In one sense the term applies where actions are global in extent or systemic, that is, at a spatial scale where perturbations in the system have consequences everywhere else, or reverberate throughout the system.
Thus, for example, there is concern over “greenhouse” gases and other climate forcing agents that are manifested globally. The second meaning applies where there is cumulative global change. The loss of biological diversity at so many locations throughout the world is global in scale because its effects are world-wide, even though the causes are localised. The international global change research program (IGBP, 1990; NRC, 1990) has grown out of the need for scientific assessments of both types of global change, and is ultimately intended to aid in policy decisions. Emphasis has focused largely on interactions between the Earth’s biosphere, oceans, ice, and atmosphere. The research strategies that help to provide this scientific foundation were developed in the mid-1980s (ESSC, 1986, 1988; ICSU, 1986; NRC, 1986), and feature an advanced systems approach to scientific research based on: (1) data observation, collection, and documentation; (2) focused studies to understand the underlying processes; and (3) the development of quantitative Earth system models for diagnostic and prognostic analyses. Concepts such as Earth system science (ESSC, 1986), global geosphere-biosphere modelling (IGBP, 1990), and integrated and coupled systems modelling at multiple scales (NRC, 1990) have emerged, focusing broadly on the Earth system, but including subsystems such as atmosphere-ocean coupling. The US Global Change Research Program is one example of a national-level effort to implement this strategy (CES, 1989, 1990; CEES, 1991). GIS could play an important role in this research in two ways: (1) enhancement of models of Earth system phenomena operating at a variety of spatial and temporal scales across local, regional, and global landscapes, and (2) improvements in the capacity to assess the effects of global change on biophysical (ecological) systems on a range of spatial and temporal scales.
In addition to the biogeochemical processes that drive the Earth system, changes in human land use, energy use, industrial processes, social values, and economic conditions are also increasingly being recognised as major forces in global change (Committee on the Human Dimensions of Global Change, 1992). The relationship of these activities and behaviours to global change is critical because they may systemically affect the physical systems that sustain the geosphere-biosphere. Thus additional research strategies that emphasise the human dimension in global change have recently emerged. The National Research Council’s (NRC) Committee on Global Change (Committee on the Human Dimensions of Global Change, 1992) has emphasised that the development of a coherent and systematic assessment and understanding of global change phenomena requires better linkage between the environmental and human dimensions (social and economic). At present, several problems pose formidable challenges in addressing the human dimensions of global change, three of which are central to this initiative. First, there are difficulties in collecting requisite socio-economic and demographic data. Those data that do exist often span a range of temporal and spatial scales, lack appropriate intercalibration, have incomplete coverages, are inadequately checked for error, and have unsuitable archiving and retrieval formats (Committee on the Human Dimensions of Global Change, 1992). Second, there remain serious problems in translating human activities and information across a range of scales (local, regional, national, or global). Human activities that drive and mitigate global change vary significantly by region or place (Feitelson, 1991; Turner et al., 1990) but, as in ecology, methods for explicit translation across disparate scales or levels of organisation are lacking.
Feitelson (1991) noted that geographers have only recently begun to consider how activities at one geographic scale affect activities at other spatial scales, and proposed a conceptual framework for analysing how geographic scale affects environmental problem solving. For conceptually similar problems, ecologists have invoked hierarchy theory as a way of understanding complex, multiscaled systems (Allen and Starr, 1982). Last, there is a dearth of ways of understanding the interactions of socioeconomic systems and global change other than through logical analysis, which often requires a level of abstraction that makes their understanding obscure (Cole and Batty, 1992). Geographic visualisation
MULTIPLE ROLES OF GIS IN GLOBAL CHANGE RESEARCH
could be used to gain insight into both data and models, though such visual “thinking” has been little explored. Four broad themes emerge from this discussion to characterise the potential for GIS use in global change research. These are discussed below.
21.2.1 Use of GIS to support integrative modelling and spatial analysis
Scientifically based mathematical models for computer analysis, that is, environmental simulation, are fundamental to the development of reliable, quantitative assessment tools. One major purpose of these computer-based models is to simulate spatially distributed, time-dependent environmental processes realistically. But environmental simulation models are, at best, simplifications and inexact representations of real world environmental processes. The models are limited because basic physical processes are not well understood, and because complex feedback mechanisms and other interrelationships are not known. The sheer complexity of environmental processes (three-dimensional, dynamic, non-linear behaviour, with stochastic components, involving feedback loops across multiple time and space scales) necessarily leads to simplifying assumptions and approximations (e.g. Hall et al., 1988). Frequently, further simplification is needed to permit numerical simulations on digital computers. For example, the conversion of mathematical equations for numerical processing on a grid (discretisation) can lead to the parameterisation of small-scale complex processes that cannot be explicitly represented in the model because they operate at subgrid scales. There may be significant qualitative understanding of a particular process, but quantitative understanding may be limited. The ability to express the physical process as a set of detailed mathematical equations may not exist, or the equations may be too complicated to solve without simplifications.
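As an illustration of the discretisation step described above, the following sketch (in Python, which is not used in the original chapter) replaces the one-dimensional diffusion equation ∂u/∂t = D ∂²u/∂x² with an explicit finite-difference scheme on a regular, periodic grid. The equation, the function name diffuse_1d, and all parameter values are assumptions chosen for illustration only; they are not taken from any of the models cited. The stability check shows one way in which the choice of grid spacing and time step constrains what a numerical model can represent.

```python
def diffuse_1d(u0, diffusivity, dx, dt, steps):
    """Explicit (forward-time, centred-space) finite-difference solution of
    du/dt = D * d2u/dx2 on a periodic 1-D grid (illustrative sketch only)."""
    r = diffusivity * dt / dx ** 2
    # The explicit scheme is stable only when D*dt/dx^2 <= 1/2.
    if r > 0.5:
        raise ValueError(f"unstable: D*dt/dx^2 = {r:.3f} exceeds 0.5")
    u = [float(v) for v in u0]
    n = len(u)
    for _ in range(steps):
        # u[i - 1] wraps to the last cell when i == 0;
        # (i + 1) % n wraps to the first cell at the other end.
        u = [u[i] + r * (u[i - 1] - 2.0 * u[i] + u[(i + 1) % n])
             for i in range(n)]
    return u


# Hypothetical test case: a single-cell pulse on a 50-cell grid.
initial = [0.0] * 50
initial[25] = 1.0
final = diffuse_1d(initial, diffusivity=1.0, dx=1.0, dt=0.25, steps=100)

# The total is conserved while the peak spreads out and decays.
print(f"total = {sum(final):.6f}, peak = {max(final):.4f}")
```

Any feature narrower than dx cannot be represented on this grid at all, which is why processes operating at subgrid scales have to be parameterised rather than simulated explicitly.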
In addition to incomplete knowledge, simplifications, and parameterisations of real world processes, other general themes emerge from a review of state-of-the-art modelling, and from efforts to link models with GIS. One is cross-disciplinary modelling, which is illustrated by the concept of modelling water and energy exchange processes within the soil-plant-atmosphere system, or ecosystem dynamics modelling with, for example, the environmentally and physiologically structured Forest-BGC model (Running and Coughlan, 1988). These models cross such disciplines as atmospheric science, hydrology, soil science, and plant physiology.
21.2.2 GIS-linked models and conceptual frameworks for hierarchical and aggregated structures
The requirements of global change research place significant emphasis on modelling at multiple time scales and across multiple spatial scales. NRC (1990) outlines a strategy for coupling models across time scales to account for feedbacks in land-atmosphere interactions. For example, the land surface parameterisations for water and energy exchange between the biosphere and atmosphere must adapt to climate-induced changes in vegetation characteristics that exert major influence on such exchange processes. Feedbacks may be further complicated by the existence of thresholds, and by hysteresis effects. Hay et al. (1993) discuss the use of nesting to model interactions between spatial scales, while Nemani et al. (1993), Burke et al. (1991), and King (1991) are concerned with the extrapolation of research results from local study areas to regional analysis. Hall et al. (1988) illustrate some of the problems in linking vegetation, atmosphere, climate, and
remote sensing across a range of spatial and temporal scales. Spatial scaling involves significant research issues, such as how to parameterise (i.e., aggregate or integrate) water and energy fluxes from the plant leaf level to the regional level. In addition to scale problems, the parameterisation process is confounded by structuring processes which operate at different hierarchical levels (e.g., physiological, autecological, competitive, landscape). Finally, the interactions between levels are asymmetric in that larger, slower levels maintain constraints within which faster levels operate (Allen and Starr, 1982). Complex terrain and heterogeneous landscape environments form another major theme in the use of physically based models of spatially distributed processes (Running et al., 1989). Distributed parameter approaches are increasingly used instead of classic lumped-parameter analysis as models become more sophisticated, allowing them to incorporate more realistic, physically based parameterisations of a wide variety of land surface characteristics data (King, 1991; Running et al., 1989). Factors such as terrain and landscape variability are important considerations for land-atmosphere interactions (Carleton et al., 1994; Pielke and Avissar, 1990). Finally, environmental simulation modelling depends on the results of field experiments such as the First ISLSCP (International Satellite Land Surface Climatology Project) Field Experiment (FIFE) (Hall et al., 1988), an intensive study of interactions between land surface vegetation and the atmosphere, and the Boreal Ecosystem-Atmosphere Study (BOREAS; NASA, 1991). Such experiments are integral to the development and testing of models based on direct measurements and remote sensing data from various ground-based, aircraft, and satellite systems.
In addition, focused research to understand processes and to develop remote-sensing-driven algorithms for regional extrapolations will be supplemented by a range of simulation models under BOREAS (NASA, 1991). These themes of environmental systems modelling suggest opportunities for the integration of GIS. For example, detailed consideration of landscape properties and spatially distributed processes at the land surface is fundamental to global climate and mesoscale models, watershed and water resource assessment models, ecosystem dynamics models that are physiologically based, and various types of ecological models involving landscape, population, and community development processes. The themes of multiple space and time scales are basic to coupled systems modelling, a highly cross-disciplinary modelling approach exemplified by the suite of models for land-atmosphere interactions research. In addition to the issue of spatial processes operating at multiple time and space scales, GIS and environmental simulation models share converging interests in geographic data. The availability of geographic data from many sources, including land cover characteristics based on multitemporal satellite data, is growing rapidly. GIS by definition is a technology designed to capture, store, manipulate, analyse, and visualise diverse sets of geographically referenced data. In fact, advanced simulation models require a rich variety of multidisciplinary surface characteristics data of many types in order to investigate environmental processes that are functions of complex terrain and heterogeneous landscapes. To illustrate, land surface characteristics data required by scientific research include land cover, land use, ecoregions, topography, soils, and other properties of the land surface to help understand environmental processes and to develop environmental simulation models (Loveland et al., 1991).
These advanced land surface process models also require data on many other types of land surface characteristics, such as albedo, slope, aspect, leaf area index, potential solar insolation, canopy resistance, surface roughness, soils information on rooting depth and water holding capacity, and the morphological and physiological characteristics of vegetation. GIS along with remote sensing has a role in dealing with these complex data issues. GIS can help meet these requirements and provide the flexibility for the development, validation, testing, and evaluation of innovative data sets that have distinct temporal components. There is the need to create
derivative data sets from existing ones, and GIS tools are also needed for flexible scaling, parameterisation and re-classification, creating variable grid cell resolutions, or aggregation and integration of spatial data. At the same time, methods are needed to preserve information across a range of scales or to quantify the loss of information with changing scales. Thus, this overall modelling environment seems suited for GIS as a tool to support integrative modelling, to conduct interactive spatial analysis across multiple scales for understanding processes, and to derive complex land surface properties for model input based on innovative thematic mapping of primary land surface characteristics data sets. By implementing a full range of spatial data models, GIS offers the ability to integrate data across a range of disciplines despite wide variation in their ways of conceptualising spatial processes and of representing spatial variation.
21.2.3 More efficient integration of models and GIS
Despite the above-mentioned potential, a number of impediments stand in the way of more complete integration of GIS and global environmental modelling. GIS are generic tools, designed for a range of applications that extend well beyond environmental modelling, into data management for utilities, local governments, land agencies, marketing, and emergency response (Maguire et al., 1991). While GIS support a wide range of data models, many of the fundamental primitives needed to support environmental modelling are missing, or must be added by the user (Goodchild, 1991). At present, environmental simulations must be carried out by a separate package linked to the GIS, and the ability to write the environmental model directly in the command language of the GIS is still some distance away. Nyerges (1993) provides a comprehensive discussion of these technical integration issues.
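The separate-package arrangement described above is often called loose coupling: the GIS exports a grid to a file, an external model reads it, and the result is imported back. The sketch below, in Python, assumes a plain-text grid with a six-line header patterned on the common ASCII grid exchange layout. The function names, the coefficient, and the toy model itself are hypothetical stand-ins and do not represent any particular GIS or simulation package.

```python
import os
import tempfile


def write_ascii_grid(path, grid, xll, yll, cellsize, nodata=-9999.0):
    """Write a list-of-rows grid as text with a six-line header."""
    with open(path, "w") as f:
        f.write(f"ncols {len(grid[0])}\n")
        f.write(f"nrows {len(grid)}\n")
        f.write(f"xllcorner {xll}\n")
        f.write(f"yllcorner {yll}\n")
        f.write(f"cellsize {cellsize}\n")
        f.write(f"NODATA_value {nodata}\n")
        for row in grid:
            f.write(" ".join(f"{value:.6f}" for value in row) + "\n")


def read_ascii_grid(path):
    """Read the grid back; returns (rows, header dictionary of floats)."""
    with open(path) as f:
        meta = {}
        for _ in range(6):
            key, value = f.readline().split()
            meta[key] = float(value)
        grid = [[float(v) for v in line.split()] for line in f if line.strip()]
    return grid, meta


def toy_model(in_path, out_path, coefficient=0.3):
    """Hypothetical external model: it sees only the exchange file,
    never the GIS database. Here it simply scales every valid cell."""
    grid, meta = read_ascii_grid(in_path)
    nodata = meta["NODATA_value"]
    scaled = [[v if v == nodata else v * coefficient for v in row]
              for row in grid]
    write_ascii_grid(out_path, scaled, meta["xllcorner"],
                     meta["yllcorner"], meta["cellsize"], nodata)


# Export from the "GIS" side, run the model, import the result.
with tempfile.TemporaryDirectory() as tmp:
    exported = os.path.join(tmp, "input.asc")
    returned = os.path.join(tmp, "output.asc")
    source = [[10.0, 12.0, -9999.0], [8.0, 9.0, 11.0]]
    write_ascii_grid(exported, source, xll=0.0, yll=0.0, cellsize=1000.0)
    toy_model(exported, returned)
    result, meta = read_ascii_grid(returned)

print(result)
```

Because the model sees only the exchange file, every run pays the cost of export and import, and anything not captured in the header (projection, units, time stamp) must be agreed outside the file. This is one reason why writing the model directly within the GIS remains attractive.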
21.2.4 Visualisation of spatial patterns and interactions in global data
Effective use of GIS requires attention to several generic issues, many of which are also of concern to environmental modelers. The discretisation of space that is inherent in both fields forces the user to approximate true geographical distributions, and the effects of such approximations on the results of modelling are often unknown, or unevaluated. Space can be discretised in numerous ways (finite differences and finite elements are two of the examples well known to environmental modelers), and each has its own set of impacts on the results. Effective display of the results of modelling, particularly for use in policy formulation, requires attention to principles of cartographic design. Finally, spatial databases tend to be large, and effective environmental modelling may require careful attention to the efficiency of algorithms and storage techniques. Many of these generic issues are identified in the NCGIA research agenda (NCGIA, 1989, 1992) and are the subject of active research within the GIS community. As this review demonstrates, and as the title of this initiative indicates, we view GIS as a tool that can play many roles in global change research. There is a need to identify those roles more clearly, and also to identify impediments that prevent GIS being used more broadly. We need to address the generic needs of global change research for spatial data handling tools, whether or not those needs will be met five or ten years from now by a technology we recognise as “GIS”.
21.3 IMPEDIMENTS TO PROGRESS
With these issues in mind, the following sections address the problems that stand in the way of a greater role for GIS in global change research, and the research that needs to be conducted to address them. They address five major themes:
• To identify technical impediments and problems that obstruct our use of GIS in global change research, and our understanding of interactions between human systems and regional and global environmental systems.
• To assess critically the quality of existing global data in terms of spatially varying accuracy, sampling methodologies, and completeness of coverage, and develop improved methods for analysis and visualisation of such data.
• Within the context of global change, to develop theoretical/computational structures capable of building up from knowledge at smaller spatial scales and lower levels of aggregation.
• To develop methods for dynamically linking human and physical databases within a GIS and for exploring the regional impacts of global change.
• To develop methods for detecting, characterising, and modelling change in transition zones, thereby addressing the problems that result from overly simplistic representations of spatial variation.
These themes span to varying degrees the concerns of the many disciplines that together constitute the global change research community. For the purposes of this chapter, the wide range of topics addressed by global change research is narrowed to eight areas:
• Atmospheric science and climate
• Oceans, ocean-atmosphere coupling, and coasts
• Biogeochemical dynamics, including soils
• Hydrology and water
• Ecology, including biodiversity
• Demography, population, and migration
• Production and consumption, including land use
• Policy and decision-making.
The following sections address major problem areas within this context.
21.3.1 Perspectives on “GIS”
Most published definitions of “geographic information system” refer to both data and operations, as in “a system for input, storage, manipulation, analysis, and output of geographically referenced information”. In turn, geographically referenced information can be defined fairly robustly as information linked to specific locations on the Earth’s surface. This definition suggests two tests that can be applied to a software package to determine whether it is a GIS: the integration of a broad enough range of functions, and the existence of geographic references in the data. Clearly the first is less robust than the second, and there have been many arguments about whether computer-assisted design (CAD) or automated mapping functions are sufficiently broad to qualify packages for the title “GIS”.
At this time, several hundred commercial and public-domain packages meet these qualifications, and the GIS software industry is enjoying high rates of growth in annual sales which now amount to perhaps $500 million per year. However, the majority of these sales are in applications like parcel delivery, infrastructure facilities management, and local government, rather than science. Moreover, the term “GIS” has come to mean much more than is implied by this narrow definition and test. At its broadest, “GIS” is now used to refer to any and all computer-based activities that focus on geographic information; “GIS data” is often used as shorthand for digital geographic information; and the redundant “GIS system” is becoming the preferred term for the software itself. One can now “do GIS”, specialise in GIS in graduate programs, and subscribe to the magazine GIS World. A further and largely academic perspective is important to understanding all of the ramifications of current usage. In many areas of computer application, such as corporate payroll or airline reservations, the objects of processing are discrete and well-defined. On the other hand many geographically distributed phenomena are infinitely complex, particularly those that are naturally occurring as distinct from constructed by humans. Their digital representations are thus necessarily approximations, and will often embed subjective as well as objective aspects. The use of digital computers to analyse such phenomena thus raises a series of fundamental and generic scientific issues, in areas ranging from spatial statistics to cognitive science. The GIS research community has begun to describe its focus as “geographic information science” (Goodchild, 1992), emphasising the distinction between the development of software tools on the one hand, and basic scientific research into the issues raised by the tool on the other. In summary, three distinct perspectives are identifiable in current use of the term “GIS”: 1. 
GIS as geographic information system, a class of software characterised by a high level of integration of those functions needed to handle a specific type of information. 2. GIS as an umbrella term for all aspects of computer handling of geographic data, including software, data, the software development industry, and the research community. 3. GIS as geographic information science, a set of research issues raised by GIS activities. From the first perspective, we can identify a range of advantages and disadvantages of GIS as a software tool for global change research. Some of these can be seen as technical impediments, implying that further research and development of the software may remove them. Others are more fundamental, dealing with the problems of large-scale software integration and the adoption of such solutions within the research community. In this area, it may be possible to draw parallels between GIS and other software toolkits, such as the statistical packages, or database management systems, or visualisation packages. In each of these cases, the average researcher faces a simple make-or-buy decision—is it preferable to write one’s own code, or to obtain it? The answer can be very different depending on the discipline of the researcher, the cost of the software, and its ease of use. In the specific case of GIS, the following issues seem important: • Ease of use: How much learning is needed to make use of the software? Will it be quicker to learn the package or to write code, or to find code written for this exact problem by some colleague? Many GIS are reputed to be difficult to use, and GIS courses require a heavy investment of time. On the other hand it may be preferable to rely on a GIS than to write code in an area unfamiliar to the researcher, such as map projections. • Cost: Many researchers are averse to purchasing application software, although they expect to pay for more generic packages such as operating systems and compilers. 
Many commercial GIS have a high price-tag. A GIS will be considered worth the investment if it is perceived as needed by a large enough proportion of the research community, like a statistical package.
• Software integration: Is the level of software integration in a GIS viable? A researcher needing to solve a problem in map projections might prefer a public-domain map projection package to a GIS that offers the same functionality bundled into a much more costly and complex package; the same argument could be made in the context of spatial interpolation techniques. Any generic, integrated tool imposes a cost on its users because it cannot achieve the same performance as a tool designed for a specific purpose, so this cost must be compared to the benefits of integration.
• Terminology: Does the GIS use terms familiar to the researcher, or does use of GIS require the researcher to embrace an entirely unfamiliar culture very different from his or her own? Researchers see time as a fixed resource, and fear that adoption of any new technology will be at the expense of other areas of expertise.
If we adopt the second meaning of GIS above, the world of GIS seems very different. Other geographic information technologies, such as remote sensing and GPS, now fall under the GIS umbrella, and the use of GIS is no longer an issue: global change research has no choice but to use computers and digital data; and the vast majority of the types of data needed for global change research are geographically referenced. From this viewpoint, we face a much broader set of issues, including:
• Requirements for computer-based tools in support of global change research, focusing in particular on the need to model dynamic processes in a variety of media, together with relevant boundary conditions and interfaces.
• The need for interoperability between tools, to allow users of one collection of tools to share data and methods of analysis with users of another collection, together with associated standards of format, content description, and terminology to promote interoperability.
• The need to harmonise approaches to data modelling, defined as the entities and relationships used to create digital representations of real geographic phenomena. The current variation in data modelling practice between software developers, the various geographic information technologies, and the different disciplines of global change research is a major impediment to effective use of GIS.
• The accessibility of data, including measurements shared between scientists; and data assembled by governments for general purposes and useful in establishing geographic reference frameworks and boundary conditions for modelling.
• The role of visualisation and other methods for communicating results between global change researchers and the broader communities of decision-makers and the general public.
The third perspective above defines GIS as a science, with its own subject matter formed from the intersection of a number of established disciplines. From this perspective global change research is an application area with an interesting set of issues and priorities, many of which fall within the domain of geographic information science. These include the modelling of uncertainty and error in geographic data; the particular problems of sampling, modelling, and visualising information on the globe; and the development of abstract models of geographic data.
Of the three, the second meaning of GIS is perhaps the most appropriate to a discussion of the multiple roles of GIS in global change research, as it provides a more constructive perspective than the first, and a greater sensitivity to context than the third. All three are necessary, however, in order to understand the full range
of viewpoints being expressed both within and outside the GIS community, and the research that needs to be done to move GIS forward.
21.3.2 Global change research communities
“What is this GIS anyway?” may be the question uppermost in the minds of many global change researchers, but it is quickly supplanted when one realises that the multiple roles of GIS in global change research extend well beyond the immediate needs of scientists for computerised tools. First, global change is a phenomenon of both physical and human systems. Many of the changes occurring in the Earth’s physical environment have human origins, and thus mechanisms for their prediction and control are more likely to fall within the domain of the social sciences. Moreover, many would argue that when measured in terms of their impacts on human society, the most important changes to the globe are not those currently occurring in its physical environment, but are economic and political in nature. The issues raised by computerised tools are very different in the social sciences. Second, the need to integrate physical and social science in order to understand global change creates its own set of priorities and issues. Not only are the characteristics of data different, but the differences in the scientific cultures and paradigms of physical and social science create enormous barriers to communication that are exacerbated by the formalisms inherent in GIS.
A recurring theme in global change research is the need to build effective connections between science and policy. Complaints about the lack of connections surface whenever the US Congress is asked to approve another major investment in global data collection, such as NASA’s Mission to Planet Earth.
Several obvious factors are to blame: scientists are not trained to present their results in forms that can be readily understood by policy-makers; decisions must be made quickly, but science marches to its own timetable; the scientific culture does not provide adequate reward for communicating with policy-makers. GIS as broadly understood is widely believed to have a role to play in this arena. It is visual, providing an effective means of communicating large amounts of information; it is already widely used as a common tool by both the scientific and policy communities; and it supports the integration of information of various sources and types.
One of the biggest impediments to progress in global change research, perhaps the biggest of all, is the general public’s reluctance to accept global environmental change as a major problem requiring the commitment of massive research effort and the development of effective policy. As GIS becomes more widely available, through the Internet, World Wide Web, home computers, and other innovations in digital technology that impact the mass market, the same arguments made above about the roles of policy-makers will become applicable to the general public.
In summary, three major communities should be considered in examining the roles of GIS in global change research: scientists, policy-makers, and the general public. Each creates its own set of priorities for GIS, and its own set of impediments.
Another recurring theme in global change research is the potential role of the general public in collecting data. The GLOBE project (Global Learning and Observations for a Better Environment; http://globe.fsl.noaa.gov) is conceived along these lines as a network of schoolchildren around the world who will collect data on their own local environment, learning about it in the process, and then contribute those data to a central agency responsible for interpretation and synthesis.
In turn, the central agency will return a global synthesis to the primary sources. In a sense, this concept promises to return us to the earliest days of environmental data collection, before the organisation of official climate measurement stations, and offers to give back to the general public the role then played by the amateur scientist. Although there are
substantial concerns about quality control, this concept offers perhaps the only feasible solution to the current dilemma faced by national governments who can no longer support dense networks for primary field data collection in the face of rising costs and steadily decreasing budgets.
21.3.3 Data issues
Several issues associated with data arise in using GIS in support of global change research. First, all of the global change communities are affected by issues of data quality. In any multidisciplinary enterprise it is common for the data used by a scientist to have been collected, processed, manipulated, or interpreted by someone else prior to use, creating a demand for new mechanisms to assure quality that have not been part of traditional science. Tools are needed to model and describe quality; to compare data with independent sources of higher accuracy such as ground truth; to verify the output of models of global environmental change; and to support updating. Much of the necessary theory to support such tools has been developed in the past decade, and needs to be brought to the attention of the global change research community, implemented in readily available tools, and disseminated in education programs.
Second, remote access to data must be supported by effective methods of data description, now commonly termed “metadata”. Search for suitable data can be seen as a matching process between the needs of the user and the available supply, both represented by metadata descriptions; and both user and supplier must share a common understanding of the terms of description. The advent of technology to support remote access, including the World Wide Web, has put enormous pressure on the community to develop appropriate methods of description and cataloguing. Techniques need to be improved to support content-based search for specific features, and there are many other technical issues to be addressed in this rapidly developing area of spatial database technology.
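The idea of search as a matching process between a user's need and the available supply, both expressed as metadata, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Record` fields, the keyword vocabulary, and the ranking rule are hypothetical and are not drawn from any metadata standard, and the bounding-box test ignores extents that cross the International Date Line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A deliberately small metadata description (hypothetical fields)."""
    title: str
    keywords: frozenset
    west: float   # bounding box in decimal degrees
    south: float
    east: float
    north: float


def boxes_overlap(a, b):
    """True if two latitude/longitude bounding boxes share any area.

    Assumption: neither box crosses the International Date Line.
    """
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


def search(need, catalogue):
    """Match a user's need against supplier records.

    Both sides use the same Record structure, which stands in for the
    shared understanding of descriptive terms. Records qualify when they
    share at least one keyword with the need and overlap it spatially;
    they are ranked by the number of shared keywords, then by title.
    """
    hits = []
    for rec in catalogue:
        shared = need.keywords & rec.keywords
        if shared and boxes_overlap(need, rec):
            hits.append((len(shared), rec.title))
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [title for _, title in hits]
```

The sketch only works because user and supplier draw keywords from the same vocabulary; if one side says "rainfall" and the other "precipitation", the match silently fails, which is the shared-understanding problem the text describes.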
Third, issues arise over the institutional arrangements necessary to support free access to global change research data, and the concerns for copyright, cost recovery, and legal liability that are beginning to impact the use of communications technology. While much data for global change research is unquestionably for scientific purposes, other data are also useful for commercial and administrative purposes, and in many cases these tend to dictate access policies.
Fourth, there are a number of issues under the rubric of facilitating input, output, and conversion. These include interoperability, the lack of which is currently a major contributor to GIS’s difficulty of use and a major impediment to data sharing among scientists. Interoperability can be defined by the effort and information required to make use of data and systems; in an interoperable world, much of what we now learn in order to make use of GIS will be unnecessary or hidden from the user. An important role in this arena is being played by the Open Geodata Interoperability Specification initiative (http://www.ogis.org).
21.3.4 Data models and process models
The term “model” is used in two very different contexts in environmental modelling. A process model is a representation of a real physical or social process whose action through time results in the transformation of the human or physical landscape. For example, processes of erosion by wind and flood modify the physical landscape; processes of migration modify the human landscape. A process model operates dynamically on individual geographic entities. Here we should distinguish between process models that define the dynamics of continuous fields, such as the Navier-Stokes equation, and must be rewritten in approximate, numerical
form to operate on discrete entities; and models such as Newton’s law of gravitation or individual-based models in ecology that operate directly on discrete entities.
A data model, on the other hand, is a representation of real geographic variation in the form of discrete entities, their attributes, and the relationships between them. Many distinct data models are implemented in GIS, ranging from the arrays of regularly spaced sample points of a digital elevation model (DEM) to the triangular mesh of the triangulated irregular network (TIN) model.
Under these definitions, there is clearly a complex and important relationship between data modelling and process modelling. In principle, the entities of a process model are defined by the need to achieve an accurate modelling of the process. In practice, the entities of a data model are often the outcome of much more complex issues of cost, accuracy, convenience, the need to serve multiple uses that are frequently unknown, and the availability of measuring instruments. An atmospheric process model, for example, might require a raster representation of the atmospheric pressure field; the only available data will likely be a series of measurements at a sparse set of irregularly located weather stations. In such cases it is likely the data will be converted to the required model by a method of intelligent guesswork known as spatial interpolation, but the result will clearly not have the accuracy that might be expected by a user who was not aware of the data’s history.
Such data model conflicts underlie much of the science of global change research, and yet their effects are very difficult to measure. The availability of data is often a factor in the design of process models, particularly in areas where the models are at best approximations, and distant from well-understood areas of physical or social theory.
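The conversion from sparse station measurements to a raster can be made concrete with a small sketch. The example below uses inverse-distance weighting, which is only one common interpolation method and is chosen here for brevity; the chapter does not name a specific technique. The station tuples, the planar coordinates, and the function names are all assumptions made for illustration.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    stations: iterable of (x, y, value) tuples in planar coordinates
    (an assumption; real station data would be latitude/longitude).
    """
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # exactly on a station: return its measurement
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


def interpolate_raster(stations, x0, y0, cell, ncols, nrows):
    """Fill a raster by estimating the field at each cell centre.

    (x0, y0) is the lower-left corner; rows are returned south to north.
    """
    return [
        [idw(stations, x0 + (col + 0.5) * cell, y0 + (row + 0.5) * cell)
         for col in range(ncols)]
        for row in range(nrows)
    ]
```

Every cell in the resulting raster looks equally authoritative, although cells far from any station are little more than weighted averages of distant values. Nothing in the raster itself records that history, which is the loss of provenance the text warns about.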
We rarely have a complete understanding of the loss of accuracy in modelling that results from use of data at the wrong level of geographic detail, or data that has been extensively resampled or transformed. Clearly the worlds of data modelling and process modelling are not separate, and yet practical reality often forces us to treat them as if they were.
21.3.5 Levels of specificity
Another key issue in data modelling can be summed up in the word specificity. While there may be agreement that data modelling requires the definition of entities and relationships, there is much greater variation in the degree to which those entities and relationships must be specified, and the constraints that affect specification.
One set of constraints is provided by the various models used by database management systems. The hierarchical model, for example, requires that all classes of entities be allocated to levels in a hierarchy; and that relationships exist only between entities at one level and those at the level immediately above or below. If these constraints are acceptable, then a database can be implemented using one or another of the hierarchical database management systems that are readily available. While the model seems most applicable to administrative systems, and has now been largely replaced by less constrained models, it has been found useful for geographic data when the collection of simple entities into more complex aggregates is important; for example, in the ability to model an airport at one scale as a point, and at a finer scale as a collection of runway, hangars, terminal, etc.
The most popular model for geographic data is the relational, and its implementation for geographic data is often termed georelational.
Relationships are allowed between entities of the same class, or between entities in different classes, and this is often used to model the simple topological relationships of connectedness and adjacency that are important to the analysis of geographic data. But even georelational models impose constraints that may be awkward in geographic data modelling.
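To illustrate how relationships between entities can carry topology, the sketch below stores polygons and boundary arcs in two relational tables and derives adjacency with a join. It is a minimal illustration using Python's built-in `sqlite3` module; the table layout (arcs carrying `left_poly` and `right_poly` references) is an assumed, simplified arc-node design and is not the schema of any particular GIS.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with three polygons and their arcs."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE polygon (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE arc (
            id INTEGER PRIMARY KEY,
            from_node INTEGER,
            to_node INTEGER,
            left_poly INTEGER REFERENCES polygon(id),
            right_poly INTEGER REFERENCES polygon(id)
        );
    """)
    con.executemany("INSERT INTO polygon VALUES (?, ?)",
                    [(1, "A"), (2, "B"), (3, "C")])
    con.executemany("INSERT INTO arc VALUES (?, ?, ?, ?, ?)", [
        (10, 1, 2, 1, 2),     # boundary shared by A and B
        (11, 2, 3, 2, 3),     # boundary shared by B and C
        (12, 3, 1, 1, None),  # outer boundary of A (nothing on the right)
    ])
    return con


def neighbours(con, name):
    """Names of polygons that share at least one arc with the named one."""
    rows = con.execute("""
        SELECT DISTINCT q.name
        FROM polygon AS p
        JOIN arc AS a ON p.id IN (a.left_poly, a.right_poly)
        JOIN polygon AS q ON q.id IN (a.left_poly, a.right_poly)
                         AND q.id <> p.id
        WHERE p.name = ?
        ORDER BY q.name
    """, (name,)).fetchall()
    return [row[0] for row in rows]
```

Adjacency falls out of the join without any geometric computation. The awkwardness appears as soon as an entity does not fit the tables: a continuous field, a three-dimensional body, or a boundary that is really a zone has no natural row.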
For many Earth system scientists, the important modelling frameworks are the ones implemented in the various statistical and mathematical packages, which are much more supportive of complex process modelling than GIS and database management systems. Matlab and S-Plus, for example, have their own recognised classes of entities and relationships, and impose their own constraints. Thus to an Earth system scientist, the task of data modelling may consist of a matching of entities and relationships to those classes supported by a common modelling package; whereas a GIS specialist may be more concerned with matching to the constraints of the georelational model. The entity types supported by a modelling or statistical package will likely include simple tables of data, and arrays of raster cells, but not the full range of geographic data types implemented in the more advanced GIS, with their support for such geographic functions as projection change and resampling, and with implementations of data model concepts like planar enforcement and dynamic segmentation. Choices and constraints may also be driven by the nature of data: a field whose primary data comes mostly from remote sensing will naturally tend to think in terms of rasters of cells, rather than vector data, and of the attributes of a cell as averages over the cell’s area rather than samples at the cell’s centre.
The georelational model imposes one level of constraints on data modelling. Further constraints are imposed by the practice of giving certain application-specific interpretations to certain elements of data models. For example, many GIS implement the relational model in specific ways, recognising polygons, points, or nodes as special types within the broad constraints of the relational model.
This issue of specificity, or the imposition of constraints on data modelling, contributes substantially to the difficulty of integrating data across domains.
For example, the data modelling constraints faced by an oceanographer using Matlab are very different from those of a GIS specialist using ARC/INFO. One might usefully try to identify the union of the two sets, or their intersection, in a directed effort at rationalisation.
21.3.6 Generalisations of GIS data models
It is widely accepted that GIS data models have been developed to support an industry whose primary metaphor is the map; that is, GIS databases are perceived as containers of maps, and the task of data modelling is in effect one of finding ways of representing the contents of maps in digital form. Maps have certain characteristics, and these have been largely inherited by GIS. Thus maps are static, so GIS databases have few mechanisms for representing temporal change; they are flat, so GIS databases support a wide range of map projections in order to allow the curved surface of the Earth to be represented as if it were flat; they are two-dimensional, so there are few GIS capabilities for volumetric modelling; they are precise, so GIS databases rarely attempt to capture the uncertainty that is inherent in maps but almost never shown on them; and they present what appears to be a uniform level of knowledge about the mapped area.
There are many possible extensions to this basic GIS data model, with varying degrees of relevance to global change research. The five points made above lead directly to five generalisations:
• temporal GIS, to support spatio-temporal data and dynamic modelling (Langran, 1992);
• spherical GIS, avoiding the use of map projections by storing all data in spherical (or spheroidal) coordinates; computing distances and areas and carrying out all analysis procedures on the sphere; and using the projection for display (Goodchild and Yang, 1992; Raskin, 1994; White et al., 1992);
• 3D GIS, to support modelling in all three spatial dimensions (Turner, 1992);
• support for modelling the fuzziness and uncertainty present in data; propagating it through GIS operations; and computing confidence limits on all GIS results (Heuvelink and Burrough, 1993);
• methods of analysis that allow for variable quality of data.
The spherical data models are clearly of relevance to global change research, but their benefits need to be assessed against the costs of converting from more familiar representations such as the latitude/longitude grid. Methods of spatial interpolation, which are widely used in global change research to resample data and to create approximations to continuous fields from point samples, are particularly sensitive to the problems that arise in using simple latitude/longitude grids in polar regions and across the International Date Line. On the other hand, the benefits of consistent global schemes may be outweighed by the costs of converting from less ideal but more familiar schemes.
21.3.7 The data modelling continuum
The literature contains several discussions of the various stages that lie between reality and a digital database: from reality to its measurement in the form of a spatial data model, to the additional constraints imposed by a digital data model, to a data structure, to the database itself. For example, the sharp change in temperature that occurs along a boundary between two bodies of water might be first modelled as a curved line (perhaps by being drawn as such on a map); the curved line would then be represented in digital form as a polyline, or a set of straight-line connections between points; the polyline would be represented in a GIS database as an arc; and the arc would be represented as a collection of bits. Modelling and approximation occur at each of these four stages except perhaps the last. The polyline, for example, may be no better than a crude approximation to the continuous curve, which is itself only an approximation to what is actually a zone of temperature change.
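The approximation introduced at the curve-to-polyline stage can be quantified for a simple case. The sketch below assumes, purely for illustration, that the "true" boundary is a circular arc and that the polyline's vertices are equally spaced along it; neither assumption comes from the chapter, and the function names are hypothetical.

```python
import math


def polyline_on_arc(radius, angle, n_segments):
    """Vertices of a polyline whose points lie on a circular arc.

    The arc starts at angle 0 and sweeps `angle` radians.
    """
    step = angle / n_segments
    return [(radius * math.cos(i * step), radius * math.sin(i * step))
            for i in range(n_segments + 1)]


def polyline_length(points):
    """Sum of the straight-line segment lengths."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def max_deviation(radius, angle, n_segments):
    """Largest gap between the arc and the polyline.

    For equal segments this is the sagitta of one chord:
    radius * (1 - cos(half the angle subtended by a segment)).
    """
    half_angle = angle / n_segments / 2.0
    return radius * (1.0 - math.cos(half_angle))
```

Two effects are visible even in this idealised case: the polyline always lies inside the curve, so it understates length, and the positional error shrinks only as more vertices are stored. A database that keeps the polyline but not the number or spacing of vertices used to capture it has lost the information needed to estimate either effect.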
It is important to recognise that approximation and data modelling occur even before the use of digital technology.
21.3.8 The data life cycle
Related to the previous concept of a data modelling continuum is the data life cycle, which is conceived as the series of transformations that occur to data as it passes from field measurement to eventual storage in an archive. In a typical instance, this life cycle may include measurement, interpretation, collation, resampling, digitising, projection change, format change, analysis, use in process modelling, visualisation, exchange with other researchers, repetition of various stages, and archiving. The data model may change many times, with consequent change in accuracy. Moreover, data quality is more than simply accuracy, since it must include the interpretation placed on the data by the user. If data pass from one user to another, that interpretation can change without any parallel change in the data, for example if documentation is lost or misinterpreted. In this sense, data quality can be defined as a measure of the difference between the contents of the data, and the real phenomena that the data are understood to represent, and it can rise and fall many times during the life cycle, particularly in applications that involve many actors in many different fields. It is very easy, for example, for data collected by a soil scientist, processed by a cartographer, analysed by a geographer, and used for modelling by an atmospheric scientist, to be understood by the various players in very different ways.
21.3.9 Information management
Recent advances in digital communication technology, as represented by the Internet, and applications such as the World Wide Web (WWW), have created a situation in which there is clearly an abundance of digital data available for global change research, but few tools exist to discover suitable information or assess its fitness for use. Much effort is now going into development of better tools for information management, in the form of digital libraries, search engines, standards for data description, and standards for data exchange. Several recent developments in geographic information management are of relevance to global change research and GIS.
While the Federal Geographic Data Committee’s Content Standard for Geospatial Metadata (http://www.fgdc.gov/Metadata/metahome.html) has attracted much attention since its publication in 1994, the effort required to document a data set using it is very high, particularly for owners of data who may have little familiarity with GIS or cartography. If the purpose of metadata is to support information discovery, search, browse, and determination of fitness for use, then much less elaborate standards may be adequate, at least to establish that a given data set is potentially valuable. At that point the potential user may want to access a full FGDC record, but if the owner of the data has not been willing to make the effort to document the data fully, other mechanisms such as a phone or e-mail conversation may be just as useful, and more attractive to the owner.
Scientists appear reluctant to document data without a clear anticipation that it will be used by others. However, it may be that funding agencies will begin to require documentation as a condition for successful termination of a project. Otherwise, documentation to standards like FGDC may have the character of an unfunded burden.
An owner of data may be willing to provide an initial contribution of metadata to a data catalogue.
But if the data are later modified, or deleted, are there suitable mechanisms for ensuring that the catalogue reflects this? Users of the WWW are acutely aware of the problems caused by “broken” URLs (Uniform Resource Locators) and similar issues. Although it might be possible to provide facilities for checking automatically whether a data set has been modified, owners may not be willing to accept this level of intrusion.
Another issue associated with distributed information management that is already affecting the global change research community concerns the use of bandwidth. The communication rates of the Internet are limited, and easily made inadequate by fairly small geographic data sets. Research is needed to develop and implement methods that reflect more intelligent use of bandwidth, including progressive transmission (sending first a coarse version of the data, followed by increasingly detailed versions) and the use of special coarse versions for browse. While methods already exist for certain types of raster images, there is a need to extend them to cover all types of geographic data.
A final information management issue of major importance to global change research is interoperability. Today, transfer of data from one system to another frequently requires that the user invoke some procedure for format conversion. While such procedures may not be complex, they present a considerable impediment to data sharing and the research it supports. In principle, the need for conversion should not involve the user, any more than it does in the automatic conversion of formats that is now widely implemented in word processors: the user of Microsoft Word, for example, will probably not need to know the format of a document received from someone else, although conversion still needs to occur. Achievement of interoperability between the software packages used to support global change research should be a major research objective.
Reasonable goals for interoperability research might include the following:
• interoperability between representations of imagery tied to the Earth’s surface, which might include recognition of a common description language that can be read automatically, and used to perform
necessary operations such as resampling to a common projection; interoperability between band-sequential and band-interleaved data; interoperability between different representations of spectral response, including different integer word lengths;
• interoperability between data sets based on irregularly spaced point samples, allowing automatic interpolation to a raster, or resampling to another set of sample points;
• interoperability between any data model representations of continuous fields over the Earth’s surface.
21.4 CONCLUSION
Several themes from this discussion are of sufficient generality to warrant revisiting in this concluding section. Data models lie at the heart of GIS, because they determine the ways in which real geographic phenomena can be represented in digital form, and limit the processing and modelling that can be applied to them. GIS has inherited its data models from an array of sources, through processes of legacy, metaphor, and commercial interest, and there is a pressing need for greater recognition of the role of data models, better terminology, and a more comprehensive perspective.
A second strong theme is interoperability. Interest in this area stems largely from the widespread acceptance of the notion that despite its abundant functionality, GIS is hard to use, particularly in exploiting its potential for integrating data from a variety of sources. Even though we now have a range of format standards to help us in exchanging data, and every GIS now supports a range of alternative import and export formats, the fact remains that transfer of data from one system to another is far more time-consuming and complex than it need be. Moreover, every system has its own approach to user interface design, the language of commands, and numerous other aspects of “look and feel” that help to create a steep learning curve for new users.
The discussion identified several areas where current techniques of spatial analysis are inadequate for global change research. One is the availability of techniques for analysis of phenomena on a spherical surface; too many methods of spatial analysis are limited to a plane, and are not readily adapted to the globe. A survey of existing techniques for spatial analysis on the sphere has been published as an NCGIA technical report (Raskin, 1994). In August 1995 NCGIA began a project to develop improved methods for spatial interpolation, including methods for the sphere, that incorporate various kinds of geographic intelligence. These “smart” interpolators will go beyond the traditional generic types such as kriging and thin plate splines by attempting to model processes known to affect geographic distributions of specific phenomena, and to take advantage of known correlations. The current status of the work can be reviewed at http://www.geog.ucsb.edu/~raskin.
With funding from ESRI (Environmental Systems Research Institute) and CIESIN (Consortium for International Earth Science Information Network), NCGIA constructed the first consistent global database of population based on a regular grid. The database was completed in 1995, and is being distributed for use in studies which integrate human and physical processes of global change, and thus need demographic data on a basis compatible with most physical data sets. The work was led by Waldo Tobler, with assistance from Uwe Deichmann and Jonathan Gottsegen. It uses a range of techniques of spatial analysis for disaggregating and reaggregating census population counts from arbitrary regions to grid cells. The work is documented in an NCGIA Technical Report (Tobler et al., 1995).
Another general issue is the need to understand the influence of national government policy and other dimensions of the policy context on the availability of spatial data.
This issue has recently come to the fore in arguments about access to climate records, under the auspices of the WMO (World Meteorological
Organisation). Other debates are occurring in the context of the Internet, and its implications for intellectual property rights and the international market for data. Efforts such as the US Department of State-led Earthmap (http://www.gnet.org/earthmap), the Japanese Millionth Map, and the international community’s Core Data are attempting to coordinate base mapping around the world and achieve a higher level of availability for digital framework data in the interests of global change research (Estes et al., 1995). Other efforts, such as the Alexandria Digital Library (ADL) project (http://alexandria.ucsb.edu) are seeking general solutions to the problems of finding geographic data on the Internet.
While much of the discussion of this chapter has been motivated by the specific context of global change research, similar concerns arise in considering the potential roles of GIS in any area of science. Global change research is particularly complex, involving many disciplines, and of great public interest, so there are good reasons for suggesting that it might form a useful model for the scientific uses of GIS generally.
ACKNOWLEDGEMENTS
The National Center for Geographic Information and Analysis is supported by the National Science Foundation through Cooperative Agreement SBR 88–10917. We acknowledge support for the two I15 specialist meetings from the US Geological Survey. The Alexandria Digital Library project is also supported by the National Science Foundation through Cooperative Agreement IRI 94–11330. The assistance of John Estes, Kate Beard, Tim Foresman, Dennis Jelinski, and Jenny Robinson in co-leading I15 is gratefully acknowledged. Ashton Shortridge also helped to organise the two specialist meetings and to prepare the reports.
REFERENCES
ALLEN, T.F.H. and STARR, T.B. 1982. Hierarchy: Perspectives for Ecological Complexity. Chicago: University of Chicago Press.
BOTKIN, D.B. 1989. Science and the global environment, in Botkin, D.B., Caswell, M.F., Estes, J.E. and Orio, A.A. (Eds.) Changing the Global Environment: Perspectives on Human Involvement. New York: Academic Press, pp. 3–14.
BURKE, I.C., KITTEL, T.G.F., LAUENROTH, W.K., SNOOK, P., YONKER, C.M. and PARTON, W.J. 1991. Regional analysis of the Central Great Plains, Bioscience, 41(10), pp. 685–692.
CARLETON, A.M., TRAVIS, D., ARNOLD, D., BRINEGAR, R., JELINSKI, D.E. and EASTERLING, D.R. 1994. Climatic-scale vegetation-cloud interactions during drought using satellite data, International Journal of Climatology, 14(6), pp. 593–623.
COLE, S. and BATTY, M. 1992. Global Economic Modeling and Geographic Information Systems: Increasing our Understanding of Global Change. Buffalo, NY: National Center for Geographic Information and Analysis, State University of New York at Buffalo.
COMMITTEE ON EARTH SCIENCES (CES) 1989. Our Changing Planet: A US Strategy for Global Change Research. Washington, DC: Federal Coordinating Council for Science, Engineering, and Technology, Office of Science and Technology Policy.
COMMITTEE ON EARTH SCIENCES (CES) 1990. Our Changing Planet: The FY 1990 Research Plan. Washington, DC: Federal Coordinating Council for Science, Engineering, and Technology, Office of Science and Technology Policy.
COMMITTEE ON EARTH AND ENVIRONMENTAL SCIENCES (CEES) 1991. Our Changing Planet: The FY 1992 US Global Change Research Program. Washington, DC: Federal Coordinating Council for Science, Engineering, and Technology, Office of Science and Technology Policy.
COMMITTEE ON THE HUMAN DIMENSIONS OF GLOBAL CHANGE 1992. Report of the Committee on the Human Dimensions of Global Change, in Stern, P.C., Young, O.R. and Druckman, D. (Eds.) Global Environmental Change: Understanding the Human Dimensions. Washington, DC: National Academy Press.
EARTH SYSTEM SCIENCES COMMITTEE (ESSC) 1986. Earth System Science Overview: A Program for Global Change. Washington, DC: National Aeronautics and Space Administration.
EARTH SYSTEM SCIENCES COMMITTEE (ESSC) 1988. Earth System Science: A Closer View. Washington, DC: National Aeronautics and Space Administration.
ESTES, J.E., LAWLESS, J. and MOONEYHAN, D.W. (Eds.) 1995. Proceedings of the International Symposium on Core Data Needs for Environmental Assessment and Sustainable Development Strategies, Bangkok, Thailand, Nov. 15–18, 1994. Sponsored by UNDP, UNEP, NASA, USGS, EPA, and URSA.
FEITELSON, E. 1991. Sharing the globe: the role of attachment to place, Global Environmental Change, 1, pp. 396–406.
GOODCHILD, M.F. 1991. Integrating GIS and environmental modeling at global scales, in Proceedings, GIS/LIS 91, 1, Washington, DC: ASPRS/ACSM/AAG/URISA/AMFM, pp. 117–127.
GOODCHILD, M.F. 1992. Geographical information science, International Journal of Geographical Information Systems, 6(1), pp. 31–46.
GOODCHILD, M.F. and YANG, S. 1992. A hierarchical spatial data structure for global geographic information systems, CVGIP-Graphical Models and Image Processing, 54(1), pp. 31–44.
GOODCHILD, M.F., PARKS, B.O. and STEYAERT, L.T. (Eds.) 1993. Environmental Modeling with GIS. New York: Oxford University Press.
GOODCHILD, M.F., ESTES, J.E., BEARD, K.M., FORESMAN, T. and ROBINSON, J. 1995. Research Initiative 15: Multiple Roles for GIS in US Global Change Research: Report of the First Specialist Meeting, Santa Barbara, California, March 9–11, 1995. Technical Report 95–10. Santa Barbara, CA: National Center for Geographic Information and Analysis.
GOODCHILD, M.F., ESTES, J.E., BEARD, K.M. and FORESMAN, T. 1996. Research Initiative 15: Multiple Roles for GIS in US Global Change Research: Report of the Second Specialist Meeting, Santa Fe, NM, January 25–26, 1996. Technical Report 96–5. Santa Barbara, CA: National Center for Geographic Information and Analysis.
HALL, F.G., STREBEL, D.E. and SELLERS, P.J. 1988. Linking knowledge among spatial and temporal scales: vegetation, atmosphere, climate and remote sensing, Landscape Ecology, 2, pp. 3–22.
HAY, L.E., BATTAGLIN, W.A., PARKER, R.S. and LEAVESLEY, G.H. 1993. Modeling the effects of climate change on water resources in the Gunnison River basin, Colorado, in Goodchild, M.F., Parks, B.O. and Steyaert, L.T. (Eds.) Environmental Modeling with GIS. New York: Oxford University Press, pp. 173–181.
HEUVELINK, G.B.M. and BURROUGH, P.A. 1993. Error propagation in cartographic modelling using Boolean logic and continuous classification, International Journal of Geographical Information Systems, 7(3), pp. 231–246.
INTERNATIONAL COUNCIL OF SCIENTIFIC UNIONS (ICSU) 1986. The International Geosphere-Biosphere Program: A Study of Global Change: Report No. 1. Final Report of the Ad Hoc Planning Group, ICSU Twenty-first General Assembly, September 14–19, 1986. Bern, Switzerland: ICSU.
INTERNATIONAL GEOSPHERE-BIOSPHERE PROGRAMME (IGBP) 1990. The International Geosphere-Biosphere Programme: A Study of Global Change: The Initial Core Projects, Report No. 12. Stockholm, Sweden: IGBP Secretariat.
KING, A.W. 1991. Translating models across scales in the landscape, in Turner, M.G. and Gardner, R.H. (Eds.) Quantitative Methods in Landscape Ecology. New York: Springer Verlag.
LANGRAN, G. 1992. Time in Geographic Information Systems. London: Taylor & Francis.
LOVELAND, T.R., MERCHANT, J.W., OHLEN, D. and BROWN, J.F. 1991. Development of a land-cover characteristics database for the conterminous US, Photogrammetric Engineering and Remote Sensing, 57(11), pp. 1453–1463.
MAGUIRE, D.J., GOODCHILD, M.F. and RHIND, D.W. (Eds.) 1991. Geographical Information Systems: Principles and Applications. London: Longman Scientific and Technical.
MOUNSEY, H.M. (Ed.) 1988. Building Databases for Global Science. London: Taylor & Francis.
NATIONAL AERONAUTICS AND SPACE ADMINISTRATION (NASA) 1991. BOREAS (Boreal Ecosystem-Atmosphere Study): Global Change and Biosphere-Atmosphere Interactions in the Boreal Forest Biome, Science Plan. Washington, DC: NASA.
NATIONAL CENTER FOR GEOGRAPHIC INFORMATION AND ANALYSIS (NCGIA) 1989. The research agenda of the National Center for Geographic Information and Analysis, International Journal of Geographical Information Systems, 3, pp. 117–136.
NATIONAL CENTER FOR GEOGRAPHIC INFORMATION AND ANALYSIS (NCGIA) 1992. A Research Agenda for Geographic Information and Analysis. Technical Report 92–7. Santa Barbara, CA: National Center for Geographic Information and Analysis.
NATIONAL RESEARCH COUNCIL (NRC) 1986. Global Change in the Geosphere-Biosphere, Initial Priorities for an IGBP. Washington, DC: US Committee for an International Geosphere-Biosphere Program, National Academy of Sciences.
NATIONAL RESEARCH COUNCIL (NRC) 1990. Research Strategies for the US Global Change Research Program. Washington, DC: Committee on Global Change, US National Committee for the IGBP, National Academy of Sciences.
NEMANI, R.R., RUNNING, S.W., BAND, L.E. and PETERSON, D.L. 1993. Regional hydro-ecological simulation system: an illustration of the integration of ecosystem models in a GIS, in Goodchild, M.F., Parks, B.O. and Steyaert, L.T. (Eds.) Environmental Modeling with GIS. New York: Oxford University Press, pp. 296–304.
NYERGES, T.L. 1993. Understanding the scope of GIS: its relationship to environmental modeling, in Goodchild, M.F., Parks, B.O. and Steyaert, L.T. (Eds.) Environmental Modeling with GIS. New York: Oxford University Press, pp. 75–93.
PIELKE, R.A. and AVISSAR, R. 1990. Influence of landscape structure on local and regional climate, Landscape Ecology, 4, pp. 133–155.
PRICE, M.F. 1989. Global change: defining the ill-defined, Environment, 31, pp. 18–20.
RASKIN, R. 1994. Spatial Analysis on the Sphere. Technical Report 94–7. Santa Barbara, CA: National Center for Geographic Information and Analysis.
RUNNING, S.W. and COUGHLAN, J.C. 1988. A general model of forest ecosystem processes for regional applications. I: hydrologic balance, canopy gas exchange, and primary production processes, Ecological Modelling, 42, pp. 125–154.
RUNNING, S.W., NEMANI, R.R., PETERSON, D.L., BAND, L.E., POTTS, D.F., PIERCE, L.L. and SPANNER, M.A. 1989. Mapping regional forest evapotranspiration and photosynthesis by coupling satellite data with ecosystem simulation, Ecology, 70, pp. 1090–11.
TOBLER, W.R., DEICHMANN, U., GOTTSEGEN, J. and MALOY, K. 1995. The Global Demography Project. Technical Report 95–6. Santa Barbara, CA: National Center for Geographic Information and Analysis.
TOWNSHEND, J.R.G. 1991. Environmental databases and GIS, in Maguire, D.J., Goodchild, M.F. and Rhind, D.W. (Eds.) Geographical Information Systems: Principles and Applications, 2, London: Longman, pp. 201–216.
TURNER, A.K. 1992. Three-Dimensional Modeling with Geoscientific Information Systems. Dordrecht: Kluwer.
TURNER, B.L. II, KASPERSON, R.E., MEYER, W.B., DOW, K.M., GOLDING, D., KASPERSON, J.X., MITCHELL, R.C. and RATICK, S.J. 1990. Two types of global environmental change: definitional and spatial-scale issues in their human dimensions, Global Environmental Change, 1, pp. 15–22.
WHITE, D., KIMERLING, A.J. and OVERTON, W.S. 1992. Cartographic and geometric components of a global sampling design for environmental monitoring, Cartography and Geographic Information Systems, 19(1), pp. 5–22.
Chapter Twenty Two Remote Sensing and Urban Analysis Hans-Peter Bähr
22.1 REMOTE SENSING AND ACQUAINTED TECHNOLOGY
Remote sensing is defined as a “technology for acquiring information from distant objects without getting in direct contact, taking the electromagnetic spectrum as the transport medium”. We shall in the following restrict this very broad and generally accepted definition to imaging techniques. Remote sensing includes techniques for imaging both from satellites and airborne platforms. Therefore photogrammetry, which as long as 80 years ago had already developed mapping techniques for urban areas using aerial photography (see Schneider, 1974), is a well-established subset of remote sensing and not a separate field of activity (see Bähr and Vögtle, 1998).
Remote sensing is proving to be particularly useful for urban analysis for the following reasons:
1. Imagery, as a function of scale, shows very detailed information in non-generalised mode, ranging from 3D geometry of buildings to contours of urban environment.
2. Imagery may be taken “upon request” according to the needs of the user. This is particularly true for aerial imagery: time, scale, stereoscopic parameters and spectral range may be preselected.
3. Image-based data processing offers advanced automatic procedures.
4. Both the variety of data and the advanced digital image processing techniques available are leading to considerable progress in urban analysis by remote sensing techniques.
22.2 IDENTIFYING RESEARCH TOPICS FOR REMOTE SENSING IN URBAN ANALYSIS
During the GISDATA meeting in Strasbourg in June 1995 research topics were identified according to three major areas. The results are laid down in detail in the GISDATA Newsletter No. 6 (Craglia, 1996) and will not be given here again. Nevertheless, the issues which showed a special level of interest and which were strongly discussed are taken as a guideline for the following discussion.
22.2.1 Remote Sensing Detection of Patterns in Relation to Processes
A feature which is discussed again and again for many applications, particularly in urban analysis, is scale in relation to sensor resolution and object size. No general rule has been confirmed up to now and only heuristics are used in practice. For classical methodology like airborne photogrammetry, empirical rules exist, for instance when drawing topographic maps by stereo restitution procedures. The rule still applied today is (Heissler, 1954):
$$m_b = c\,\sqrt{m_k}$$
where $m_b$ is the scale factor of the image, $m_k$ is the scale factor of the map, and $c$ is an empirical constant.
It is noteworthy that this formula is empirical. Therefore we have to accept that any adequate relation between object size on the one hand and image scale or sensor resolution on the other hand will also be empirical.
A second topic of concern is the segmentation of imagery in order to obtain GIS objects in urban areas. The required methods should be automatic as far as possible. This applies not only to planar objects and land-use patterns in the urban environment but also to 3D modelling. It is suggested that the task should be done by using multiple sensors of different origin. The multisensor concept is expected to give more reliable results than those obtained when taking only one sensor at a time (see Section 22.3.1).
Finally, it should be noted that linguistic concepts should be taken more seriously into account for pattern detection in urban areas. However, this aspect is more future-oriented than the two mentioned above (Bähr and Schwender, 1996).
22.2.2 Remote Sensing in Urban Dynamics
Interestingly, a strong discussion is underway about which features can really be detected and analysed by remote sensing for urban dynamics. Because of their specific nature, remote sensing techniques are able to reveal physical patterns directly. The main concern of urban dynamics, of course, is the growth of cities.
But this question is heavily dependent upon the definition of the limits of an urban cluster. This point was highlighted at the Strasbourg GISDATA meeting and it is clear that many definitions go far beyond physical parameters and shape (see Section 22.3.1).
Another challenging point is the determination of the optimum time and frequency for acquiring remote sensing data for change analysis. On-line observation of urban change will be an exception and may be applied for big construction areas like the “Potsdamer Platz” in Berlin during the period 1995–2000. In most cases samples for a specific time have to be taken and the situation in between has to be interpolated.
The dream of urban analysts is forecasting physical changes. Here remote sensing may provide a reliable basis, at least through nearly continuous spatial information. Nevertheless, forecasting generally carries a very high risk and may be pure speculation.
22.2.3 Data integration (remote sensing, cartography, socio-economics)
The opportunities created by remote sensing systems for urban analysis are growing as an increasing amount of data from high resolution satellites comes on stream. The MOMS-2 system on a space shuttle already provided data with a resolution of 4.5 m on the ground, while a series of new commercial satellites is being developed which may increase the resolution on the ground to about 1 m (Fritz, 1996; Jürgens, 1996). This may be especially important for urban applications.
For cartography it is worthwhile to note that in nearly all countries digital map data bases are under development. The GIS user community should prepare for having digital data in vector form available in the very near future together with digital terrain models (DTMs, 2.5D), and an increasing number of full 3D models of the earth’s surface.
“Integration” requires common features of data. This means for instance that geometry, time, quality and topological relationship should be equivalent or at least be analytically comparable. In addition to “metric conditions”, non-metric conditions like standards and terminology also have to be taken into account.
Integration of data requires models of data geometry, topology and semantics. The quality of the data model has to be in balance with the model of the process which is to be analysed. This means for instance that a bad model for a dynamic process in urban analysis cannot be compensated by excellent data. On the other hand it makes no sense to design an excellent model for analysis when only weak data are available.
22.3 REMOTE SENSING FOR URBAN FEATURE EXTRACTION: SOME EXAMPLES
This section treats “small scale” and “large scale” as distinct cases for urban analysis. Urban analysis does not necessarily require large scale. The scale factor used for urban analysis has to be considered in relation to the respective application. When satellite imagery offered a very coarse resolution of about 70 meters (e.g. Landsat MSS), users strongly demanded higher resolution for applications in the urban environment. Nevertheless, higher resolution turned out to bring new types of problems. Always remember: “No solution just by resolution”.
22.3.1 Small Scale Examples
Satellite imagery is of increasing importance for remote sensing applications in urban environments. However, even the new sensor generation with a pixel size of about 5 m on the ground, or smaller in the case of the commercial series, cannot necessarily be considered as “large scale”. Consequently, satellite data are characteristic of small scale urban analysis. In this field progress will be achieved by adding knowledge.
A growing source of knowledge is the available data itself. This is particularly true for radar imagery because acquisition is independent of atmospheric conditions. A test has been done for segmentation of urban areas near Karlsruhe, Germany, merging five datasets of the optical and three of the microwave domain. It has been shown that “just combining” optical and microwave data will not necessarily improve the results. For some classes pure microwave or pure optical data yield the best results, and for another group of classes a combination does. The best result for each class is then selected and displayed (see Bähr, 1995b; Foeller, 1994; Hagg and Bähr, 1997).
It is not easy to check the quality of the obtained result. One possibility is to take an existing digital cartographic data base, if available. Figure 22.1 shows the differences between a land-use classification based on data fusion of microwave and optical data and a geo-database for the class “urban”. Differences, i.e. non-matching areas, are displayed in black. A general error is the so-called “mixed-pixel effect”, which occurs at the margins of high-contrast objects, like streets or built-up areas bordered by vegetation (e.g. linear features in Figure 22.1). This effect may be overlaid by residuals from non-matching geometry between the different data sets. Larger errors are for instance due to time differences between the two data bases. In the case of Figure 22.1, the cartographic database had not been updated and consequently newly built-up areas are shown as “differences” (black patches). Moreover, the structure of errors based on image classification procedures is clearly displayed.
Figure 22.1: “Urban area”: Differences (in black) between land use classification from merged satellite data (Foeller) and an existing cartographic database. (Area shows approx. 9 km×5 km)
Because of the typical mixed texture in urban areas (for instance due to gardens, plants, different types of roof cover etc.) it is very difficult if not impossible to show urban areas in “uniform blocks”. This problem may be overcome by a special procedure based on Delaunay triangulation, as suggested by Schilling and Vögtle (1996). Figure 22.2 shows the triangulation network of pixels classified as “urban”. In this case, the delineation of “urban areas” seems to be possible by interpreting the size and shape of the triangles. This then enables the automatic drawing of the real contours of the settlement areas.
The result is given in Figure 22.3. The contours of the settlement areas are displayed in bold lines as the result of the image analysis explained above. The problem of defining “urban” or the “limits of urban areas” is widely discussed in geography and urban planning. The procedure shown here, based on Delaunay triangulation and “intelligent” grouping, should be taken as a proposal for a machine-based contribution in this field.
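The triangle-based delineation described above can be illustrated with a minimal sketch. It is not the procedure of Schilling and Vögtle (1996): the grouping rule used here (keep triangles whose longest edge is at most `max_edge`, then group pixels connected by kept triangles) and the names `delaunay`, `settlement_clusters`, `max_edge` and `min_points` are assumptions made for illustration only. Pixel coordinates are assumed to be distinct integers, which keeps the in-circle test exact.

```python
from collections import Counter
from itertools import combinations
from math import dist


def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circle through a, b, c (exact for ints)."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # The sign of det depends on the orientation of (a, b, c); correct for it.
    orient = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return det * orient > 0


def delaunay(points):
    """Bowyer-Watson triangulation of distinct integer points (index triples)."""
    n = len(points)
    m = 1000 * (max(max(abs(x), abs(y)) for x, y in points) + 1)
    pts = list(points) + [(-3 * m, -m), (3 * m, -m), (0, 3 * m)]  # super-triangle
    tris = {(n, n + 1, n + 2)}
    for i in range(n):
        # Triangles whose circumcircle contains the new point form the cavity.
        bad = {t for t in tris if in_circumcircle(*(pts[v] for v in t), pts[i])}
        edges = Counter(frozenset(e) for t in bad for e in combinations(t, 2))
        tris -= bad
        # Re-triangulate the cavity from its boundary edges (those seen once).
        tris |= {tuple(sorted((*e, i))) for e, k in edges.items() if k == 1}
    return [t for t in tris if max(t) < n]  # drop triangles using the super-triangle


def settlement_clusters(points, max_edge, min_points):
    """Group pixels linked by small triangles (hypothetical grouping rule)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for tri in delaunay(points):
        longest = max(dist(points[a], points[b]) for a, b in combinations(tri, 2))
        if longest <= max_edge:
            for v in tri[1:]:
                parent[find(v)] = find(tri[0])
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) >= min_points)


# Hypothetical "urban" pixels: two compact groups and one isolated pixel.
urban_pixels = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1),
                (20, 20), (21, 20), (20, 21), (21, 21), (10, 40)]
clusters = settlement_clusters(urban_pixels, max_edge=2.0, min_points=3)
```

In this toy input the two compact groups are returned as separate clusters and the isolated pixel is dropped. A contour for each cluster could then be traced from the outer edges of its kept triangles; that step is omitted here.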
Figure 22.2: Delaunay-Triangulation for "urban pixels" (Area shows approx. 4×4 kms.)
Figure 22.3: Contours of urban areas derived from analysis of triangles
22.3.2 Large Scale Example
Generally speaking, results may be improved by adding knowledge. In this respect, existing topographic line maps may play an important role, though they have not been used very often up to now. This is due to the fact that large digital cartographic data bases are still under development, and that imagery and line maps show basically very different representations of the “real world”, for reasons not discussed here (see Bähr et al., 1994).
It has been shown that it is possible to use information from both image and map datasets for mutual analysis (Quint, 1997). It is however not possible to do this straightforwardly, i.e. on the “iconic level”. The iconic display has to be transformed first into the “symbolic level”, using for instance semantic networks. Figure 22.4 shows the data flow as developed by Quint. Once both datasets are transformed into a hierarchical semantic net, the verification of objects found in both datasets is possible. For objects where verification was not achieved, the classification has to analyse the reasons for the mismatch.
The analysis of large scale imagery has been found to be much more reliable when line maps are taken as an additional knowledge source. The final result is shown in Figure 22.5. It is evident that the combination of line maps and aerial imagery gives a new quality for urban remote sensing.
Because of the very many objects in the urban environment and the variance of textural and radiometric patterns, remote sensing procedures are becoming increasingly sophisticated in urban analysis for both small and large scale, particularly if supported by an intelligent use of ancillary cartographic databases.
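As a rough illustration of verification at the “symbolic level”, the sketch below compares labelled objects derived from a map with labelled objects extracted from an image. It is not Quint's semantic-network method: the `SceneObject` type, the bounding-box representation, the overlap measure and the `min_overlap` threshold are all hypothetical simplifications chosen to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    label: str   # symbolic class, e.g. "building" or "road"
    box: tuple   # (xmin, ymin, xmax, ymax) in a common ground coordinate system


def overlap_ratio(a, b):
    """Intersection area divided by union area of two axis-aligned boxes."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    inter = width * height
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def verify(map_objects, image_objects, min_overlap=0.5):
    """Split map objects into those confirmed by an image object and the rest."""
    verified, unverified = [], []
    for m in map_objects:
        confirmed = any(m.label == i.label and overlap_ratio(m.box, i.box) >= min_overlap
                        for i in image_objects)
        (verified if confirmed else unverified).append(m)
    return verified, unverified


# Hypothetical objects from a line map and from an aerial image.
map_objects = [SceneObject("building", (0, 0, 10, 10)),
               SceneObject("road", (20, 0, 60, 4))]
image_objects = [SceneObject("building", (1, 1, 11, 11)),
                 SceneObject("building", (40, 40, 50, 50))]

verified, unverified = verify(map_objects, image_objects)
```

Objects left in `unverified` correspond to the cases for which, as the text notes, the reasons for the mismatch still have to be analysed (for example an outdated map or a missed detection).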
Figure 22.4: Data flow for combined analysis in image and cartographic databases (Quint, 1997)
22.4 CONCLUSION 22.4.1 Available tools The increasing development of high resolution satellites i.e. satellites with a resolution ranging between 1 and 5 meters on the ground, promises to contribute widely to the needs of urban analysis. Growing diversity
Figure 22.5: Map-supported image analysis for urban environment using semantic networks (Quint, 1997)
will also become the norm as many different countries will have their own national satellite system in orbit (Bähr, 1997a). Although urban planners and analysts have always requested high geometric resolution, one should not expect too much from data of this kind. A considerable amount of experience with geometrically high resolution data is already available from aerial photography. In the digital domain there is no difference in data extraction between satellite imagery of high resolution and aerial imagery. Consequently, from this viewpoint high resolution satellite imagery for urban analysis in principle does not offer completely new options.
A more important development is that digital maps are becoming more and more common in the cartographic domain. Although this is not purely a remote sensing issue, one may count on digital maps being available as additional information when doing feature extraction from imagery in urban areas (Section 22.3.2). In both Europe and North America digital databases of the urban environment are already on the market or being produced.
Hardware and software developments are also increasingly supporting image analysis in urban areas. Whilst the continuing fall in hardware costs is of major benefit, considerable progress is still needed in automating image processing. For example, much work is still needed to derive 3D models of the urban environment from stereoscopic imagery by automatic procedures.
22.4.2 Application challenges
Of the many topics currently challenging urban analysis, the following four are of particular importance:
1. Developing countries: Remote sensing imagery from satellite platforms may provide relatively cheap (indirect) information about the demographic explosion. In this case, high resolution imagery from satellites may be a partial substitute for aerial photography. For modelling the development of urban areas in the Third World, new concepts are urgently needed.
2. Urban pollution: Here again the models are the weak point. The dynamics of polluted air in an urban environment requires 3D models of cities. There is no other way to get them than by remote sensing, using for instance correlation techniques or laser scan procedures.
3. Disaster management: Disasters always require very rapid information about location and extent. In many cases, like floods and earthquakes, the terrain will not be accessible by cars or other means of ground transportation. Here only procedures from airborne platforms may be used. They should in principle allow on-line processing, i.e. location and extent of the damage should be detected automatically on the fly and then transmitted directly to the ground.
4. Navigation/transportation: This topic is very close to commercialisation, as the financial return seems to be very fast. The whole field of navigation and transportation is a very big market. Strong international firms will occupy that market very quickly once the technology has matured. A technological challenge for remote sensing is continuous traffic flow monitoring, even for larger urban areas. This could include information for drivers and suggestions for diversions, and generally assist in cutting down the cost of transportation.
22.4.3 Crucial factors
Remote sensing in urban analysis requires digital tools and digital thinking in a GIS context.
Completely new tools are available, and this should lead to completely new solutions (“do not try to reproduce the old solutions with new tools!”). New features are, for example, automated processes and the inclusion of options for decision making. GIS is knowledge based; this means that databases and decision processes for spatial information are potentially no longer separated, as is the case with analogue data. The decision making process, which was formerly done by the operator from “knowledge”, should as far as possible be delegated to the GIS.
At the present time, the issues relating to data quality are frequently discussed. However, very often model quality is more crucial. Model in this context means the analytical description of the environmental phenomena. It makes no sense to use good data in weak urban models. In many cases an unacceptable result is not caused by missing or bad data. In an optimised process, data quality and model quality have to be well balanced.
Digital geo-data acquisition, storage and analysis, compared to conventional analogue methods, should open up a completely new field of methodology. The step from analogue to digital is a “paradigm shift” (Ackermann, 1995). This step means revolution rather than evolution of technology. One has to admit that both producers and users are not yet fully prepared for this change.
REFERENCES
ACKERMANN, F. 1995. Digitale Photogrammetrie: Ein Paradigma-Sprung, Zeitschrift für Photogrammetrie und Fernerkundung, 63(3), pp. 106–115.
BÄHR, H.-P. 1997a. Satellite image data: progress in acquisition and processing, in Altan, O. and Gründig, L. (Eds.) Proceedings of the Second Turkish-German Joint Geodetic Days, Berlin, pp. 83–94.
BÄHR, H.-P. 1995b. Image Segmentation Methodology in Urban Environment: Selected Topics, paper presented at the GISDATA Specialist Meeting on Remote Sensing for Urban Applications, Strasbourg.
BÄHR, H.-P. and SCHWENDER, A. 1996. Linguistic Confusion in Semantic Modelling. Wien: Internationale Gesellschaft für Photogrammetrie und Fernerkundung, Commission V.
BÄHR, H.-P. and VÖGTLE, T. (Eds.) 1998. GIS for Environmental Monitoring: Selected Material for Teaching and Learning. Stuttgart: Schweizerbart Verlag.
BÄHR, H.-P., QUINT, F. and STILLA, U. 1995. Modellbasierte Verfahren der Luftbildanalyse zur Kartenfortführung, ZPF: Zeitschrift für Photogrammetrie und Fernerkundung, 6/1995, p. 224 ff.
CRAGLIA, M. (Ed.) 1996. GISDATA Newsletter, No. 6. Sheffield: University of Sheffield, Department of Town & Regional Planning.
FOELLER, J. 1994. Kombination der Abbildungen verschiedener operationeller Satellitensensoren zur Optimierung der Landnutzungsklassifizierung. Diploma Thesis, unpublished.
FRITZ, L.W. 1996. The era of commercial earth observation satellites, Photogrammetric Engineering and Remote Sensing, January, 1/1996, pp. 36–45.
HAGG, W. and BÄHR, H.-P. 1997. Land-use Classification Comparing Optical, Microwave Data and Data Fusion. Rio de Janeiro: IAG Scientific Assembly.
HEISSLER, V. 1954. Untersuchungen über den wirtschaftlich zweckmäßigsten Bildmaßstab bei Bildflügen mit Hochleistungsobjektiven, Bildmessung und Luftbildwesen, pp. 37–45, 67–79, 126–137.
JÜRGENS, C. 1996. Neue Erdbeobachtungs-Satelliten liefern hochauflösende Bilddaten für GIS-Anwendungen, Geoinformation Systems, 6, pp. 8–11.
QUINT, F. 1997. Kartengestützte Interpretation monokularer Luftbilder. Dissertation Universität Karlsruhe, in Deutsche Geodätische Kommission, 477, Serie C.
SCHILLING, K.-J. and VÖGTLE, T. 1996. Satellite image analysis using integrated knowledge processing, in Kraus, K. and Waldhäusl, P. (Eds.) Proceedings of the XVIII Congress of the ISPRS, Vienna, Austria, July 1996. International Archive of Photogrammetry and Remote Sensing, Vol. XXXI, Part B3, Commission III, pp. 752–757.
SCHNEIDER, S. 1974. Luftbild und Luftbildinterpretation. Berlin: Walter de Gruyter.
Chapter Twenty Three From Measurement to Analysis: a GIS/RS Approach to Monitoring Changes in Urban Density Victor Mesev
23.1 INTRODUCTION The monitoring of urban land use change is undoubtedly one of the most challenging goals for GIS, remote sensing, and spatial analysis. Yet the rewards from the design of more precise and reliable urban monitoring methodologies are enormous from the point of view of local government planning, zoning, and management of resources and services (Kent et al., 1993). Indeed, the Official Journal of the European Communities claims that nearly 90 percent of the European population will soon live in areas designated as built-up urban. Information on the spatial structure of urban areas is therefore vital to the monitoring of contemporary processes of urban growth/decline in terms of population shifts, employment restructuring, changing levels of energy use, and increased pollution and congestion problems. The main problem with monitoring urban areas has always been with the initial creation of the digital maps on which urban monitoring scenarios are based. Problems associated with the acquisition and generation of accurate, reliable, and consistent urban spatial data have resulted in maps that have not completely kept pace with the needs of dynamic urban monitoring programs and models. Moreover, the choice in the type of spatial data has not always been fully justified or reflected the choice in the type of analysis. What is needed is effective mapping of the structure of urban areas to act as the baseline component in the assessment of the general sustainability of settlements. Effective mapping should not only be in terms of higher accuracy and consistency, but should also be directly responsive to appropriate spatial analysis techniques. All too frequently the measurement of urban areas is completely independent of analysis, and most work concentrates on one or the other, but rarely on both. What this chapter proposes is a cohesive two-part strategy that links the measurement of urban structure with the analysis of urban growth and density change. 
The first part of the strategy concentrates on generating urban measurements that define the structural properties of the characteristic mix of built-up and natural surfaces. These measurements are generated using a methodology that links conventional image classification with contextual GIS urban land use information. Essentially, supervised maximum-likelihood classifications are constrained by a series of GIS-invoked decision rules using disaggregated population census data (see for example Figure 23.1d) (Mesev et al., 1995). The constraints, which include stratification, training adjustments, and post-classification sorting, represent an innovative suite of techniques that link remote sensing with GIS, and as such contribute to the advancement of research on GIS/RS integration (Star et al., 1991; Zhou, 1989). Results from this GIS/RS link have already shown marked improvements in classification accuracy, particularly at the level of residential density. The methodology is fully documented elsewhere (Mesev et al., 1998), so coverage in this chapter will be kept to a minimum.
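One way such a constraint could enter a maximum-likelihood classifier is through class prior probabilities. The sketch below is an illustration under that assumption and is not the documented methodology of Mesev et al.: the single-band Gaussian class model, the class statistics and the census-derived priors are all hypothetical values.

```python
import math


def log_gaussian(x, mean, std):
    """Log of the normal density N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def classify(value, class_stats, priors):
    """Return the class maximising log prior + log likelihood for one pixel value."""
    scores = {name: math.log(priors[name]) + log_gaussian(value, mean, std)
              for name, (mean, std) in class_stats.items()}
    return max(scores, key=scores.get)


# Hypothetical training statistics (mean, standard deviation) for one band.
class_stats = {"residential": (100.0, 10.0), "vegetation": (120.0, 10.0)}

# Without ancillary data every class is equally likely a priori.
equal_priors = {"residential": 0.5, "vegetation": 0.5}
# Hypothetical priors for a pixel inside a densely populated census zone.
census_priors = {"residential": 0.9, "vegetation": 0.1}

label_equal = classify(111.0, class_stats, equal_priors)
label_census = classify(111.0, class_stats, census_priors)
```

With equal priors the ambiguous pixel value is assigned to the spectrally closer class; with the census-weighted priors the decision shifts towards the residential class. The other constraints named in the text (stratification, training adjustments, post-classification sorting) are not covered by this sketch.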
Figure 23.1: A Selection of “Urban” Spatial Data for the Settlement of Bristol: (a) SPOT HRV-XS satellite image, taken on 17th May 1988; (b) census tracts: enumeration districts; (c) postal sectors; (d) surface model based on enumeration district centroids
Instead, the final product of this GIS/RS link, urban classifications, will be used to reveal important spatial and temporal patterns in urban land use juxtaposition and changes in density profiles, both vital for effective urban monitoring programs. In order to do this, a spatial analysis approach is needed that can summarise urban density measurements and display the degree to which these measurements are a reflection of underlying urban processes. At the same time, in choosing the spatial analysis technique careful consideration needs to be given to one that can also fully exploit the specific advantages of urban mapping by remote sensing (de Cola, 1989). In other words, making analysis accountable to measurement. The choice made in this chapter was to adopt a technique initially developed by Batty and Kim (1992) and later modified to accept remotely-sensed data by Mesev et al. (1995). The technique is based on traditional urban density functions, specifically inverse
power functions, which have proved to be good statistical and graphical indicators of the degree to which urban density attenuates with distance from the city core (Mills, 1992). A modification is made to replace the parameter of the standard power function with one derived from fractal theory (see Batty and Kim, 1992). The assumption is that urban areas exhibit spatial patterns of form that are commensurate with fractal notions of self-similarity and scale-independence. In other words, the shape of settlements and the way they grow can be conveniently represented and consistently summarised across space and time by fractal dimensions (Batty and Longley, 1994). Again, only brief coverage will be given to this now established technique. Instead, support will be given to the contention that urban measurements from GIS/RS are the most appropriate type of source data on which to base fractal-derived density functions. They are appropriate in the sense that the approach allows for flexibility in land use classification, variations in image spatial resolution, and frequency of data capture. These three advantages over more standard source data, when linked with fractal-led density modelling, allow for objective and detailed measurements of spatial and temporal changes not only in the size, form and density of urban land uses for individual settlements but also for comparisons through time and across the urban hierarchy at the national and eventually the international scale. In summary, this chapter will examine two mutually dependent areas of research: (1) the classification of satellite imagery by extensive reference to urban-based data held within a GIS, and (2) the spatial analysis of urban density profiles using concepts from fractal geometry. Most emphasis will be placed on how measurement from (1) is most effectively modelled by analysis in (2).
23.2 URBAN MEASUREMENT 23.2.1 Traditional Urban Measurements The field of urban density function analysis boasts a vast literature pool containing research from subjects as diverse as econometrics, urban geography, regional planning, and civil engineering. However, many papers written on this topic frequently end with concerned comments over the practical relevance of their results (Mills and Tan, 1980). The concern is centred around the type and quality of urban measurements from which their analyses are empirically verified. Much of the earlier work on the empirical fitting of density functions was based on the assumption that population densities could be imputed from conventional zonal data such as census tracts (Figure 23.1b). By this approach, ordinal, interval and ratio census data are directly related to areal or volumetric data which are represented by simple choroplethic surfaces (Clark, 1951). Unfortunately, density was calculated as a gross density, which means that the entire area of the census tract was assumed to contain a uniform population distribution. Furthermore, density was inextricably linked to the size of the tract, where any changes in its areal size directly affected the value of its density. These problems were alleviated, to a certain degree, by using more disaggregated surfaces (Figure 23.1d) (see Latham and Yeates, 1970), where unpopulated areas of land could be filtered out in order to estimate a population net density. However, both census tracts, and their disaggregated derivatives were always constrained by the number of zonal units from core to periphery, typically under 15 for large cities, and as little as four for medium-sized ones (Zielinski, 1980). This means that density gradients were
commonly over-generalised and unable to reveal the full amount of variability in population and density changes. 23.2.2 GIS and RS Measurements With the more recent uptake of digital representations, including GIS and RS, urban measurements have now become much more extensive and detailed, as well as more accurate. Density values can now be calculated down to much finer tessellations, including individual image pixels. This has undoubtedly opened up many more possibilities for monitoring urban areas using various integrated digital applications (Barnsley et al., 1989; Langford et al., 1991; Jensen et al., 1994). Spatial data digitised from topographic maps have become the main back-cloth to many contemporary GIS-based urban monitoring operations and urban density gradients. Examples include urban boundary data from the Department of Environment in the UK and TIGER files holding digitised residential streets in the US (Batty and Xie, 1994). More recently, the proliferation of aspatial data pertaining to census variables (figure 23.1b), mailing information (Figure 23.1c), and demographic and socio-economic characteristics have added a qualitative dimension to the GIS urban measurement process (useful comparisons in Mesev et al., 1996). In the United Kingdom, and no doubt many other countries, this wealth of human-based information is starting to be linked to standard spatial data to produce a kaleidoscope of geo-urban applications, from point-based address matching, line-based transport networks, through to area-based geodemographic profiles linking postal units with census tracts (Longley and Clarke, 1995). As a consequence, urban areas can now be represented in a variety of measurements, tailored to match specific monitoring applications. However, many of these spatial and aspatial datasets are available somewhat infrequently, in the case of census information every ten years. 
Furthermore, most, if not all, of the spatial data are secondary, and as such prone to errors from digitising or scanning, and all aspatial information commonly contains many systematic and interpretative discrepancies. Also, there is a lack of overall co-ordination and standardisation in positional and attribute information, leading to low commonality and inconsistencies between datasets (Star et al., 1991). Some of the problems of temporality and consistency can be addressed by the use of remotely-sensed imagery (Figure 23.1a). In brief, digital imagery has facilitated wider, more repetitive areal coverages of urban areas, that allow rapid and cost-effective computerised updating (Michalak, 1993). Traditional land observation imagery has been mostly taken from the Landsat TM and SPOT multispectral and panchromatic sensors, with nominal spatial resolutions of 30 m, 20 m, and 10 m, respectively. These scanners have had qualified success for local and regional monitoring of urban growth and decline, road network detection, and generalised land use changes (Forster, 1985; Lo, 1986). More precise and detailed urban monitoring may become possible from the next generation of optical technology. Plans for launching programs such as Earth Watch, Eyeglass, and Space Imaging, claim data will be available at spatial resolutions down to 1 m for panchromatic and 4 m for multispectral bands (Barnsley and Hobson, 1996). However, even with these increases remotely-sensed images are still only snapshots of the Earth’s surface able, at best, to differentiate between urban land cover types during cloudless days. As a result, only limited information can be interpreted on the way urban land cover types are used, even less if buildings are spectrally and spatially similar (Forster, 1985). What is needed is a means of inferring land use from mixtures of land cover types. 
Land use information from GIS is a promising way forward in labelling, and discriminating between, spectrally similar land cover types. However, this can only be successfully achieved if GIS and RS data are processed simultaneously, preferably within an integrated methodology (Zhou, 1989).
23.2.3 GIS/RS Integrated Measurements This chapter will now argue how, by using an integrated GIS/RS model, new qualitative land use information can be actively incorporated into the standard classification process of remotely-sensed images. The technique developed in Mesev et al. (1998) is an ideal method to demonstrate the links that can be easily established between remotely-sensed data and census information held as a GIS surface model, and how these links can produce improvements in the accuracy of urban measurements. The technique is based on the premise that census attributes and census surfaces held by a raster-based GIS (Figure 23.1d) (Martin and Bracken, 1991) can be used to inform as well as constrain the standard maximum likelihood (ML) image classifier (ERDAS, 1995). Essentially, GIS surfaces of census data are used as contextual information to perform a series of hierarchical stratifications by determining more reliable class training signatures and class estimates. Area estimates are then further normalised and directly inserted into the ML classifier as prior probabilities, Pr(x|w, z), that is, the probability that pixel vector x belongs to class w, weighted by census variable z (Strahler, 1980). Elsewhere, favourable results have also been generated from area estimates which have been used as part of an iterative process for updating ML a posteriori probabilities (Mesev et al., 1998). A number of empirical applications have already been completed on four settlements (Bristol, Norwich, Swindon, Peterborough) (Figure 23.1 inset) in the United Kingdom (Mesev et al., 1996). In each case data from the land observation satellites, Landsat and SPOT, represented the base and pivotal components for the technique.
After examining other remotely-sensed sources, only these types of satellite imagery were, at the time, able to provide consistently, at regular intervals, accessible multispectral data that allow wide surface coverage and at a spatial resolution that could facilitate large-scale classification of urban land cover categories. Eight types of urban land cover and land use were classified. These were “urban”, “built-up”, “residential”, “non-residential”, and four types of essentially residential density, “detached dwellings”, “semi-detached dwellings”, “terraced dwellings”, and “purpose-built apartment blocks”. The first four of these were classified from Landsat TM images taken either in 1982 or 1984, and represent the “1981” dataset. Later Landsat TM images (1988 or 1989), along with a single SPOT XS (1988) for Bristol (Figure 23.1a), were classified into all eight categories, and represent the “1991” dataset (samples are found in Figure 23.2). Dwelling types are included only for 1991 because of the superior quality of the SPOT and the 1988 and 1989 Landsat images, as well as the introduction of dwelling type indicators in the 1991 UK Census of Population. The discrepancies between the dates the images were taken and the two Censuses were unavoidable but only directly affected the classifications, not the subsequent urban density analyses. As a verification of the classification methodology, class areal estimates derived using equal and unequal prior probabilities were compared against those produced by the GIS census surfaces for all four settlements. Each showed marked reductions in total absolute error, ranging between 1.4 percentage points for Bristol and 4.9 percentage points for Swindon. A detailed site-specific accuracy assessment was also conducted on residential dwelling categories for the Bristol 1991 dataset using 250 ground truth points collected by manual and GPS equipment (Table 23.1).
Small, yet systematic improvements are evident in the number of points correctly classified (shown as bold), or percentage correct (shown in parentheses), and overall accuracy and Kappa coefficients. These GIS/RS classifications are advancements in themselves, but also represent a new type of source data for analysing urban density changes, that can be conveniently summarised by statistical distance-decay functions. However, conventional density functions are inadequate and need to be modified to ensure the specific merits of GIS/RS measurements are upheld. The next section will examine such a modification.
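The effect of prior probabilities on the ML decision rule can be illustrated with a minimal Python sketch. It is not the ERDAS implementation or the procedure of Mesev et al. (1998): it assumes a single spectral band with Gaussian class signatures (an operational classifier would use multivariate statistics across all bands), and the signature values, the prior values, and the function names `gaussian_log_likelihood` and `classify_pixel` are hypothetical.

```python
import math


def gaussian_log_likelihood(x, mean, var):
    """Log of the univariate normal density N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def classify_pixel(x, signatures, priors):
    """Return the class that maximises log-likelihood plus log-prior.

    signatures: {class_name: (mean, variance)} estimated from training data
    priors:     {class_name: prior probability}, summing to 1
    """
    best_class, best_score = None, -math.inf
    for name, (mean, var) in signatures.items():
        score = gaussian_log_likelihood(x, mean, var) + math.log(priors[name])
        if score > best_score:
            best_class, best_score = name, score
    return best_class


# Hypothetical single-band signatures (illustrative values only).
signatures = {"residential": (100.0, 100.0), "non-residential": (120.0, 100.0)}

# Equal priors reproduce the standard ML decision rule.
equal_priors = {"residential": 0.5, "non-residential": 0.5}

# Hypothetical unequal priors, standing in for normalised area
# estimates taken from a census surface.
census_priors = {"residential": 0.9, "non-residential": 0.1}

pixel = 112.0
label_equal = classify_pixel(pixel, signatures, equal_priors)
label_census = classify_pixel(pixel, signatures, census_priors)
print(label_equal, label_census)
```

With these illustrative numbers the pixel lies slightly closer to the non-residential signature, so equal priors assign it to that class, whereas the census-weighted priors move it to residential.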
Figure 23.2: GIS/RS Classified Images of Three Urban Land Uses for the Four Settlements in 1991

Table 23.1: Accuracy Assessment of the 1991 datasets for Bristol (columns: reference data; rows: classified data; Equal/Unequal: prior probabilities)

Classified data   Detached            Semi-detached       Terraced            Apartment blocks
                  Equal     Unequal   Equal     Unequal   Equal     Unequal   Equal     Unequal
Detached          22(76)    25(86)    3(3)      2(2)      2(2)      1(1)      0(0)      0(0)
Semi-detached     2(7)      1(3)      72(82)    81(86)    13(13)    7(7)      3(18)     2(12)
Terraced          3(10)     2(7)      12(14)    5(6)      81(82)    90(91)    5(29)     5(29)
Apartments        2(7)      1(3)      2(2)      0(0)      3(3)      1(1)      9(53)     10(59)

Overall Accuracy: Equal (73.6%) Unequal (82.4%)
Kappa Coefficient: Equal (0.607) Unequal (0.737)
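The overall accuracy figures reported with Table 23.1 can be checked with simple arithmetic, sketched here in Python. The correctly classified counts are the diagonal entries of the table and the total is the 250 ground truth points mentioned in the text. The Kappa function is a generic Cohen's kappa applied to a hypothetical two-class matrix; it is an illustration of the statistic and does not attempt to reproduce the reported Kappa values. The function names are assumptions.

```python
def overall_accuracy(correct_counts, total_points):
    """Share of ground truth points that were classified correctly."""
    return sum(correct_counts) / total_points


def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows: classified,
    columns: reference)."""
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Diagonal (correctly classified) counts from Table 23.1.
equal_diag = [22, 72, 81, 9]
unequal_diag = [25, 81, 90, 10]

acc_equal = overall_accuracy(equal_diag, 250)      # 184 / 250
acc_unequal = overall_accuracy(unequal_diag, 250)  # 206 / 250
print(round(acc_equal, 3), round(acc_unequal, 3))

# Hypothetical two-class matrix, used only to exercise cohen_kappa.
toy_matrix = [[40, 10], [10, 40]]
kappa_toy = cohen_kappa(toy_matrix)
```

The two ratios, 184/250 and 206/250, agree with the 73.6% and 82.4% quoted under the table.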
23.3 URBAN DENSITY ANALYSIS 23.3.1 Traditional Urban Density Functions Urban density functions are defined as mathematical formulations which describe the relationship between distance from a city centre (or indeed any other growth focus) and density measures of some facet of the urban environment, often population, buildings or economic activity. For the purposes of demonstrating the importance of linking measurement with analysis, a simple urban density function will illustrate how classified satellite images can be most effectively spatially analysed using rudimentary fractal geometry. It must be stressed that the results from such an analysis are constrained to quantitative summaries of urban form and density, and urban processes may only be inferred from such measurements. As a starting point, the density function most widely adopted in quantitative urban analysis is the negative-exponential (Clark, 1951). It assumes that population density p(R) at distance R from the centre of the city (where R = 0) declines monotonically according to the following negative-exponential,

$$p(R) = K e^{-\beta R} \tag{23.1}$$

where K is a constant of proportionality which is equal to the central density p(0) and β is the rate at which the effect of distance attenuates. If β is large, density falls off rapidly; if it is small, density falls off slowly. In contrast, in inverse-power functions, K is again the constant of proportionality as in (23.1), but the function is not defined where R = 0, and the parameter on distance, α, is now a power,

$$p(R) = K R^{-\alpha} \tag{23.2}$$

Both (23.1) and (23.2) are poor predictors of central densities, but the inherent flexibility of the inverse-power produces a less steep fall-off at the periphery, reflecting more realistically the growth of urban areas, primarily through suburbanisation (Batty and Kim, 1992). Furthermore, this flexibility in the design of the inverse-power function allows modifications to be made to the distance parameter α.
Unlike β, α is scale independent and is an ideal mechanism for representing the fractal way urban areas grow and fill available space. The assumption that urban systems exhibit fractal patterns of development is contentious but work by Batty and Longley (1994) gives strong support to the important contribution of fractal models as robust and convenient estimators of size, form and density. Moreover, the GIS/RS technique outlined in this chapter typically produces classified scenes of urban land use (Figure 23.2) which exhibit spatial complexities and irregularities similar to, and as such can most efficiently be measured by, fractal-based models.
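The contrast between the two density functions, and the sense in which the power parameter is scale independent, can be sketched in a few lines of Python. The function names, the parameter names `beta` and `alpha`, and the numerical values are illustrative assumptions and are not taken from the chapter's data.

```python
import math


def negative_exponential(r, k, beta):
    """Density at distance r under a negative-exponential decline."""
    return k * math.exp(-beta * r)


def inverse_power(r, k, alpha):
    """Density at distance r under an inverse-power decline."""
    if r <= 0:
        raise ValueError("inverse-power density is undefined at R = 0")
    return k * r ** (-alpha)


# Illustrative parameters only.
K, BETA, ALPHA = 100.0, 0.1, 0.5

# Effect of doubling the distance, near the core and further out.
power_near = inverse_power(2.0, K, ALPHA) / inverse_power(1.0, K, ALPHA)
power_far = inverse_power(20.0, K, ALPHA) / inverse_power(10.0, K, ALPHA)

exp_near = negative_exponential(2.0, K, BETA) / negative_exponential(1.0, K, BETA)
exp_far = negative_exponential(20.0, K, BETA) / negative_exponential(10.0, K, BETA)

print(power_near, power_far)  # the same ratio, 2 ** -ALPHA, at both scales
print(exp_near, exp_far)      # the ratio depends on where the doubling occurs
```

Under the inverse-power form, doubling the distance always scales density by the same factor, whatever the starting distance. Under the negative-exponential form, the factor changes with distance, which is why that function is tied to a particular scale of measurement.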
23.3.2 Fractal-based Urban Density Functions The development of urban fractal models can be found in general texts such as Batty and Longley (1994). In this chapter, density and fractal dimension estimation is based purely on the idea of measuring the occupancy, or space-filling properties, of urban development. According to fractal theory, dimension D will fall between the established range of 1 and 2, where each land use (or occupancy) fills more than a line across space (D = 1) but less than the complete plane (D = 2). The COUNT dimension refers to the estimation process which takes into account the cumulative totals for each concentric zone, R (where each zone is 1 pixel wide) from the urban centre and is calculated by,

$$D = \frac{\log N(R')}{\log R'} \tag{23.3}$$

where N(R') is the total number of occupied cells at mean distance R' from the central point of the settlement. On the other hand, the DENSITY dimension refers to the incremental proportion of zone occupation and is expressed in terms of,

$$D = 2 + \frac{\log p(R')}{\log R'} \tag{23.4}$$

where p(R') is the proportion of occupied cells, again at mean distance R'. The two types of fractal estimates, COUNT and DENSITY, however, do not account for the variance in each land use pattern. As such, it is impossible to speculate upon the shapes of these patterns with respect to density gradients or profiles. To circumvent the problem, regression lines will be fitted to the profiles generated from each surface in terms of counting land use cells in each concentric zone i from the urban centre, given as N_i, along with their normalisation to densities expressed as p_i, producing,

$$N_i = \phi R_i^{D} \tag{23.5}$$

$$p_i = \theta R_i^{-\alpha} \tag{23.6}$$

where φ and θ are constants of proportionality, but the functions are not defined where radius R = 0, and where D and α are the parameters on distance capable of accommodating scale independence observed in urban systems through the notions of fractal geometry (Batty and Kim, 1992).
Fractal dimensions are generated by the intercept parameters, φ and θ, which are, in turn, affected by the slope parameters, α and 2 − D, in (23.6) and (23.5) respectively. Note that only density profiles (23.6) will be examined further. Count profiles typically produce highly linear functions and do not fully illustrate the constraints on urban development (refer to Mesev et al., 1995 for discussion). It has duly been noted that slope parameters may become volatile when confronted with abnormal data sets, leading to fractal dimensions that could lie beyond the logical limits associated with generalised space-filling, i.e. 1 < D < 2.
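Fitting a regression line to a density profile can be sketched as an ordinary least-squares fit on the natural logs of distance and density. The sketch below is a minimal Python illustration and not the FORTRAN or C programs used for the chapter. It assumes that density follows a power law in distance and that the power on distance, written `alpha` here, is related to the fractal dimension by D = 2 − alpha, following the fractal reformulation of Batty and Kim (1992). The function name `fit_log_log` and the synthetic profile are hypothetical.

```python
import math


def fit_log_log(radii, values):
    """Ordinary least-squares fit of ln(value) against ln(radius).

    Returns (slope, intercept, r_squared) of the linearised profile.
    """
    xs = [math.log(r) for r in radii]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Synthetic, perfectly regular profile: density = 0.9 * R ** -0.3.
radii = [1.0, 2.0, 4.0, 8.0, 16.0]
densities = [0.9 * r ** -0.3 for r in radii]

slope, intercept, r2 = fit_log_log(radii, densities)
alpha = -slope           # power on distance
dimension = 2.0 - alpha  # assumed relation D = 2 - alpha
print(alpha, dimension, r2)
```

Because the synthetic profile is an exact power law, the fit recovers the power and returns a coefficient of determination of 1. Measured profiles that are interrupted by physical constraints would give lower values.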
23.3.3 Fractal-based Urban Density Analysis Figure 23.3 represents the complete process, from GIS/RS measurements to the fractal analysis of density functions. The procedure begins with the transformation of classified satellite coverages into circular concentric zones centred on the historical core of the settlement. Classified pixels are then assigned either 1 if they represent urban development (occupation) or 0 if they represent non-development (unoccupied) (occupied pixels are shown in black in Figure 23.2). In Figure 23.4, a simplified occupied/unoccupied pixel distinction is used to illustrate how concentric zones, radius distances, areas, and densities are calculated. These basic parameters are then used in all fractal models and linearised density profiles. It is important to note at this stage the degree to which these binary images exhibit the familiar dendritic pattern associated with fractal images. The notion of density for each pixel is here defined simply as the maximum likelihood land use allocation per pixel size (400 m² for SPOT and 900 m² for Landsat). The main fractal program is a FORTRAN algorithm developed by Batty and Xie, with minor modifications to the coding made by the author to allow successful operations on a UNIX-based platform. Graphic output of urban count and density profiles is produced by a combination of subroutines from statistics and plotting software. Finally, a series of regression programs written in C are applied to the natural logs of both the cumulative count (not shown) and density profiles to determine the degree of linearity (Figures 23.5 and 23.6). A brief visual examination of Figure 23.5 quickly reveals how classified satellite imagery has been able to draw out the detailed characteristics of the four urban morphologies. The combination of higher accuracy and technical expediency over more conventional data sources has allowed satellite images to reveal distinct irregularities in the density of urban development.
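The conversion of a binary occupied/unoccupied image into concentric zones can be sketched as follows. This is a minimal Python illustration under stated assumptions and not the FORTRAN program of Batty and Xie: each zone is taken to be one pixel wide, a pixel is allocated to a zone by truncating its Euclidean distance from the chosen centre, and the 5 × 5 grid and the function name `ring_profile` are hypothetical.

```python
import math


def ring_profile(grid, centre_row, centre_col):
    """Summarise a binary grid by one-pixel-wide concentric zones.

    Returns a list of (zone, occupied_cells, total_cells, density),
    ordered outwards from the centre.
    """
    occupied = {}
    total = {}
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            # Zone index: whole-pixel distance from the centre pixel.
            zone = int(math.hypot(r - centre_row, c - centre_col))
            total[zone] = total.get(zone, 0) + 1
            occupied[zone] = occupied.get(zone, 0) + value
    return [
        (zone, occupied[zone], total[zone], occupied[zone] / total[zone])
        for zone in sorted(total)
    ]


# Hypothetical classified image: 1 = occupied, 0 = unoccupied.
grid = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]

profile = ring_profile(grid, 2, 2)
for zone, n_occupied, n_total, density in profile:
    print(zone, n_occupied, n_total, density)
```

In this toy grid the two inner zones are fully occupied, while only 4 of the 16 cells in the outer zone are occupied, giving a density of 0.25. Multiplying the occupied counts by the pixel size (400 m² for SPOT or 900 m² for Landsat) would convert them to areas.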
Most significantly, these irregularities expose the existence of clear physical constraints on development in all cases, but are most apparent in Bristol. The presence of the river Avon gorge (as well as other central open spaces) in Bristol causes density profiles for all land use categories in both time periods to behave in a highly erratic manner (see also Figure 23.6). Bristol land use densities first fall in the way anticipated as distance increases away from the core. For the dwelling type categories, this decline is steepest at a distance of about 0.5/1.5 kilometres from the core, where the city is then cut by the river gorge. Once these and other constraints (e.g. public open space, derelict land, etc.) are passed, land uses increase in density quite rapidly, soon peaking again in the manner expected. Figure 23.6 is a sample of temporal density comparisons. Generally speaking, there are strong similarities between graphs from the two time points for the same land use. This might have been expected from actual satellite data which are only, in the Bristol and Swindon cases, four years apart. However, this similarity also provides some check on the confidence in the classification strategy that generated these land uses. This claim, of course, allows for inevitable discrepancies between different satellite images of the same area. These discrepancies are very much dependent on factors such as local atmospheric conditions, vegetation cover at various times of the year, and possible specific technical errors in image capture and preprocessing. Nevertheless, this group of profiles does provide an invaluable indication of some of the urban processes that may have been at work during the 1980s. Despite high levels of suburbanisation experienced by many large urban areas in recent times, each of the four sample settlements demonstrate small but marked increases in the urban and residential land uses. 
Of these, only Norwich provides any clear evidence of increased development towards the periphery. The urban profiles indicate that all but the truncated Bristol data sets have an expected pattern of reduced development near the centre and expansion at the periphery. With respect to the non-residential land use,
Figure 23.3: From Measurement to Analysis: flow of operations
there are strong indications of decentralisation, characterised by marked decreases in development close to the centre and growth on the periphery, in all but the Norwich data sets. It should have become clear that most density profiles are far from linear. A simple yet good measure of the degree of irregularity in density profiles is the coefficient of determination r², which is routinely generated from linearised fractal models. The r² is a direct reflection of how measured density behaves away from the smooth and regular idealised gradients. This departure is primarily due to constraints on urban development, both from physical and bureaucratic factors. The consequences of these barriers are most evident in density estimates where the r² values are exceptionally poor, for example, 0.030 (the effects of the Avon gorge in Bristol), and 0.419 (restrictive factors that promote specific patterns of transport development in Peterborough). Reducing the range of density profiles is one way to provide more linear
Figure 23.4: Density Analysis of Classified Image Pixels
profiles. Mesev et al. (1995) document how profiles are partitioned into linear segments at breaks in slope, from where each of these segments is then fitted individually. 23.4 CONCLUSION This chapter has demonstrated how urban measurements can be reliably and routinely extracted from a combination of satellite and socio-economic data, and how these spatially extensive measurements are then analysed using new ideas that link fractal theory with urban density functions. Particular emphasis was given on the need to develop integrated models using GIS and RS, and on the need to link measurement with analysis. With the proliferation of diverse urban spatial data, there are now huge possibilities for the development of models that embrace the frequency and improving resolution of RS with the precision and flexibility of human-based data handled by GIS. Once links have been established and measurements generated, there is then the search for analyses that respond to, and complement, these measurements. One such example is the modification of standard density functions by fractal concepts that replicate the scale independence of RS/GIS measurements. So far this work has been limited to four towns in the United Kingdom. Nevertheless, even within this small group, careful measurement and responsive analysis has revealed insightful patterns on changes in urban growth, in particular detailed repercussions on urban density gradients arising from population and employment shifts and the effects of physical and planning restraints. There are many policy implications
Figure 23.5: Density Profiles: Spatial Comparisons of Four Urban Land Uses for the Four Settlements in 1991
stemming from this work, particularly those involving accessibility and energy use as well as questions of residential and economic spatial shifts. Rapid methods of urban classification, measurement and analysis will go a long way towards understanding how cities fill their available geographical space, and this in turn will contribute to more efficient urban monitoring and planning of infrastructure and services. ACKNOWLEDGEMENTS Special thanks are due to Michael Batty (University College London) for allowing his fractal models to be modified, and to Yichun Xie (Eastern Michigan University) for assistance with algorithm development and programming. Work for this chapter was funded by Research Fellowship number: H53627501295 from the UK Economic and Social Research Council (ESRC).
Figure 23.6: Density Profiles: Temporal Comparisons of Three Urban Land Uses for the Four Settlements between 1981 and 1991
REFERENCES
BARNSLEY, M.J. and HOBSON, P. 1996. Making sense of sensors, GIS Europe, 5(5), pp. 34–36.
BARNSLEY, M.J., SADLER, G.J. and SHEPHERD, J.W. 1989. Integrating remotely sensed images and digital map data in the context of urban planning, in Proceedings of the 5th Annual Conference of the Remote Sensing Society University of Bristol, Nottingham: Remote Sensing Society, pp. 25–32.
BATTY, M. and KIM, K.S. 1992. Form follows function: reformulating urban population density functions, Urban Studies, 29, pp. 1043–1070.
BATTY, M. and LONGLEY, P.A. 1994. Fractal Cities: A Geometry of Form and Function. London: Taylor & Francis.
BATTY, M. and XIE, Y. 1994. Urban analysis in a GIS environment: population density modelling using ARC/INFO, in Fotheringham A.S. and Rogerson P. (Eds.), Spatial Analysis and GIS. London: Taylor & Francis, pp. 189–220.
CLARK, C. 1951. Urban population densities, Journal of the Royal Statistical Society (Series A), 114, pp. 490–496.
DE COLA, L. 1989. Fractal analysis of a classified Landsat scene, Photogrammetric Engineering and Remote Sensing, 55, pp. 601–610.
ERDAS 1995. ERDAS Imagine (8.2): User’s Manual. Atlanta: Earth Resources Data Analysis Systems Inc.
FORSTER, B.C. 1985. An examination of some problems and solutions in monitoring urban areas from satellite platforms, International Journal of Remote Sensing, 6, pp. 139–151.
JENSEN, J.R., COWEN, D.J., HALLS, I., NARUMALANI, S., SCHMIDT, N.J., DAVIS, B.A. and BURGESS, B. 1994. Improved urban infrastructure mapping and forecasting for BellSouth using remote sensing and GIS technology, Photogrammetric Engineering and Remote Sensing, 60, pp. 339–346.
KENT, M., JONES, A. and WEAVER, R. 1993. Geographical information systems and remote sensing in land use planning: an introduction, Applied Geography, 13, pp. 5–8.
LANGFORD, M., MAGUIRE, D.J. and UNWIN, D.J. 1991. The areal interpolation problem: estimating population using remote sensing in a GIS framework, in Masser I. and Blakemore R. (Eds.), Handling Geographical Information: Methodology and Potential Applications. Harlow: Longman, pp. 55–77.
LATHAM, R.P. and YEATES, M. 1970. Population density growth in metropolitan Toronto, Geographical Analysis, 2, pp. 177–185.
LO, C.P. 1986. Applied Remote Sensing. Harlow: Longman.
LONGLEY, P.A. and CLARKE, G. (Eds.) 1995. GIS for Business and Service Planning. Cambridge: GeoInformation International.
MARTIN, D.J. and BRACKEN, I. 1991. Techniques for modelling population-related raster databases, Environment and Planning A, 23, pp. 1069–1075.
MESEV, T.V., BATTY, M., LONGLEY, P.A. and XIE, Y. 1995. Morphology from imagery: detecting and measuring the density of urban land use, Environment and Planning A, 27, pp. 759–780.
MESEV, T.V., LONGLEY, P.A. and BATTY, M. 1996. RS/GIS and the morphology of urban settlements, in Longley P. and Batty M. (Eds.), Spatial Analysis: Modelling in a GIS Environment. Cambridge: GeoInformation International, pp. 127–152.
MESEV, T.V., GORTE, B. and LONGLEY, P.A. 1998. Modified maximum likelihood classifications and their application to urban remote sensing, in Donnay J.P. and Barnsley M. (Eds.), Remote Sensing and Urban Analysis. London: Taylor & Francis, (in press).
MICHALAK, W.Z. 1993. GIS in land use change analysis: integration of remotely-sensed data into GIS, Applied Geography, 13, pp. 28–44.
MILLS, E.S. 1992. The measurement and determinants of suburbanisation, Journal of Urban Economics, 32, pp. 377–387.
MILLS, E.S. and TAN, J.P. 1980. A comparison of urban population density functions in developed and developing countries, Urban Studies, 17, pp. 313–321.
STAR, J.L., ESTES, J.E. and DAVIS, F. 1991. Improved integration of remote sensing and geographic information systems: a background to NCGIA initiative 12, Photogrammetric Engineering and Remote Sensing, 57, pp. 643–645.
STRAHLER, A.H. 1980. The use of prior probabilities in maximum likelihood classification of remotely-sensed data, Remote Sensing of Environment, 10, pp. 135–163.
ZHOU, Q. 1989. A method for integrating remote sensing and geographic information systems, Photogrammetric Engineering and Remote Sensing, 55, pp. 591–596.
ZIELINSKI, K. 1980. The modelling of urban population density: a survey, Environment and Planning A, 12, pp. 135–154.
Chapter Twenty Four Landscape Zones Based on Satellite Data for the Analysis of Agrarian Systems in Fouta Djallon (Guinea) Using GIS Eléonore Wolff
24.1 INTRODUCTION
The aim of this chapter is to delineate landscape zones from raw, high resolution satellite data using standard image processing capabilities (no programming). These landscape zones will be used in further research to generalise the local results of a household survey to the regional level (scale transfer) and to improve the understanding of agrarian systems in Fouta Djallon using GIS. The proposed method begins with a preliminary stage that identifies the optimal resolution for extracting spatial information from an image with local filtering techniques. Then, the spectral information is synthesised to avoid redundancies. The spatial information is extracted using a standard high pass filter applied after the data have been degraded to the optimal resolution. After a generalisation stage, a “pixel per pixel” clustering is undertaken with, if necessary, location variables. The result is the delineation of homogeneous zones on a raw, high resolution satellite image that can be interpreted on the ground as landscape zones.
24.2 BACKGROUND
It is recognised that the failure of rural development projects may often be attributed to a misunderstanding of agrarian facts (Dufumier, 1985). A better understanding may be achieved through a systemic and hierarchical approach to agriculture at the farm level, but also at the regional level. At the farm level, the association of production (such as crops and livestock) and production factors of the household (such as labour and investment) is analysed; these associations are described by the concept of farming systems (Ruthenberg, 1971). At the regional level, the concept of agrarian system is used; a wider definition is given by Jouve (1988), for whom the agrarian system characterises the spatial association of crops and agrarian techniques used by a community to satisfy its needs.
Because of their complexity, farming systems are mainly studied through extensive household and ground surveys over limited areas. The regional representativity of these detailed studies was very difficult to establish (Orstom, 1972); therefore, spatial generalisation of their results was complex. Although it is recognised that it is essential to appraise the complexity of agrarian systems at several scales corresponding to the different levels of the system’s organisation, the integration of results obtained at these scales, or the transfer of scale, is generally not achieved. This problem of scale transfer has also been identified in other scientific fields (Costanza et al., 1993; Dale et al., 1989; O’Neil et al., 1989; Raffy, 1992; Turner et al., 1989).
A bottom-up approach was recommended by Pélissier and Sautter (1970); it involved a local and detailed analysis of the village’s territory, extended to a wider area through a systematic and light survey of some key factors highlighted by the first analysis. This approach is tedious to undertake. A top-down approach is widely used. Aerial photographs, remote sensing data and maps are used to divide the area under study into homogeneous agrarian landscape zones (Deffontaines, 1973, 1985; Lambin, 1988, 1989; Totté et al., 1992). Then, each zone is investigated and analysed in detail.
24.3 OBJECTIVE
Several studies have shown that remote sensing data can be used to delineate landscape zones at a regional level, characterised by their spectral and spatial data (Bariou and Lecamus, 1981; Bariou et al., 1984; Mitchell, 1991; Totté et al., 1995; Wilmet, 1985). Landscapes are generally delineated visually, although it is recognised that visual interpretation varies according to the experience of the photo-interpreter and the degree to which the landscape is generalised during the interpretation (Campbell, 1987). Several authors have used numerical methods to delineate landscape zones from remotely sensed data (Bruneau and Kilian, 1984; Hay et al., 1982; Lambin, 1988; Pain-Orcet et al., 1989; Strahler, 1981). Two cases may be distinguished according to whether a relevant multi-spectral land cover classification can be achieved over the whole study region. Indeed, this is not always the case, especially when eco-climatic and morpho-pedological conditions are too heterogeneous, inducing several spectral responses for a single land cover type. When a multi-spectral land cover classification is possible, the spatial distribution of one or several land cover classes may be used as a criterion to divide the area into homogeneous zones (Bruneau and Kilian, 1984; Lambin, 1988).
Few studies use raw remotely sensed data to delineate landscape zones (Strahler, 1981); none of them handles both the spectral and the spatial characteristics of remotely sensed data. Indeed, many image segmentation techniques exist in the literature (Argialas and Harlow, 1990; Cross et al., 1988; Gerig et al., 1984; Rosenfeld and Davis, 1979; Wang, 1986; Woodcock and Harward, 1992; Wu et al., 1988), but their operational use on a complete high resolution scene (e.g. a LANDSAT TM scene) is not feasible; they are not implemented in commercial image processing systems and their use is restricted to very small images. The objective of this part of our research is to delineate numerically homogeneous landscape zones at a regional level on raw and high resolution remote sensing data using existing functions of standard image processing systems. Such landscape zones may be used for further studies of agrarian systems in order to generalise local household surveys to a regional level.
24.4 STUDIED AREA
Fouta Djallon is one of the four regions of the Republic of Guinea. This region is settled by sedentary Peul populations. Ninety percent of the population is rural. Densely populated, the region is marked by severe rural out-migration. The principal economic activities are food crops and extensive cattle raising. Highlands reaching an approximate altitude of 1500 meters are surrounded by foothills terraced from 700 meters down to 100 meters. The region belongs to the Sudano-Guinean zone, and its climate is influenced by altitude and proximity to the sea. Slopes, drainage conditions and the nature of the bedrock determine the soil type.
Figure 24.1: Study Area, main urban centres and roads
The diversity of physiographic and human factors induces a variety of landscapes. Therefore, delineating homogeneous landscape zones appears to be an essential step in the study of rural systems.
24.5 DATA
High resolution remotely sensed data are used. LANDSAT TM data are chosen because a single scene covers a wide area (185×185 km). Scene 202–52 (WRS-2) covers most of the Fouta Djallon region. Because of the climate, cloud free images are scarce in this region. A cloud free image of 22 November 1987 is used; this date corresponds to the beginning of the dry season. Ground analysis along the main roads traversing the region (Figure 24.1) was carried out in order to interpret the landscape zones.
24.6 METHOD
Whatever numerical methods are used for delineating landscape zones, all involve several steps: extraction and simplification of the information from the satellite image, spatial generalisation, and clustering into contiguous and homogeneous zones. These zones can be interpreted in terms of landscape during a ground survey. The information to be extracted from remotely sensed data is of a spectral and spatial nature.
Figure 24.2: Finding the mean size of the objects structuring an image according to the Woodcock and Strahler (1987) technique.
24.6.1 Finding the optimal spatial resolution to extract spatial information
Taking into consideration the variation of spatial information with scale (Gurney and Townshend, 1983; Woodcock and Strahler, 1987), a preliminary step is to identify the spatial resolution relevant for extracting spatial information from remotely sensed data with local filtering techniques. Woodcock and Strahler (1987) set up a method to identify the size of the objects structuring an image, and therefore to choose the optimal resolution for measuring spatial information with local filtering techniques; it relies on the calculation of the local variance at several resolutions. A graph of local variance versus resolution shows an asymmetrical curve (Figure 24.2). The height of the maximum value is linked to the internal variance of the image. The resolution of the maximum value occurs at a half or a third of the size of the objects structuring the image. The general form of the curve results from the diversity of objects structuring the image. Each of the six spectral bands with a 30 meter resolution is degraded to several coarser spatial resolutions, up to 1500 meters, by averaging the spectral values. Then, the local variance is computed in a 3×3 pixel moving window for each spectral band at each spatial resolution. From these graphs, the optimal resolution for extracting spatial information from the image with local filtering techniques is identified.
24.6.2 Synthesis of spectral information
The second step is to synthesise the spectral information. Remote sensing data very commonly present high statistical correlations between spectral bands; principal component analysis (PCA) may be used to condense information into fewer channels, but also to remove noise induced by uncalibrated detectors (Townshend, 1984). Spectral information is concentrated in the first components; noise is left in the last
ones. Before eliminating these last components, they are visualised on-screen to check their contents (Richards, 1986).
24.6.3 Extraction of spatial information
Among the existing techniques for extracting spatial information, texture is the most commonly used. Many textural parameters have been developed (Haralick, 1979; Gurney and Townshend, 1983); most of them are calculated in a moving window. Comparative studies have shown that second order statistical parameters (i.e. variance) are more efficient than first order ones (i.e. contrast) and that they give better results when used jointly with spectral information (Barber et al., 1993; Ulaby et al., 1986). Four commonly used textural parameters are calculated: three parameters from Haralick (1979), i.e. homogeneity, contrast and entropy, and the local variance. The formulae are: (24.1) (24.2) (24.3) (24.4) where: P(i, j)=probability of co-occurrence of two grey levels for pixels i and j, xi and xj=values of pixels i and j, xm=mean of the pixel values in the analysis window. Textural parameters are calculated in a 3×3 pixel moving window. The correlations between textural images and the visualisation of their results lead to the choice of a single parameter for spatial information extraction.
24.6.4 Spatial generalisation
Spatial generalisation is an essential step in delineating landscape zones. In standard image processing systems, spatial generalisation may be achieved through low-pass filters (mean on raw data and modal on classified data) in a large moving window. The size of the moving window is a critical parameter to fix. It depends on the landscape under analysis. An approximate rule is to choose a window of similar size to the smallest zone to be identified.
24.6.5 Clustering in contiguous zones
To cluster the pixels into contiguous zones, one can either consider the location during the clustering process, or add location variables to the other variables (spectral and spatial images).
The first solution requires additional programming. The second is the easiest to undertake using existing image processing software capabilities. Two location images are calculated as follows: (24.5) (24.6)
where: X=X co-ordinate of a pixel, Y=Y co-ordinate of a pixel, Nc=number of columns, Nr=number of rows.
Clustering is achieved in two steps. The first consists of the definition of the clusters’ statistics. Then, pixels are grouped according to the minimal Euclidean distance to the clusters in a multi-dimensional space (Richards, 1986). Several options are fixed to control the first stage of the clustering process:
• the initial number of clusters (16),
• the maximum number of clusters to be generated (32),
• the number of iterations (8),
• a stability threshold, defined as the mean change of the mean vector measured in the multi-dimensional space in numerical value (0.005),
• the maximal proportion of the image assigned to a class (0.2),
• the minimal proportion of the image assigned to a class (0.05).
A few post-classification processes improve the visual quality of the results:
• modal filter in a small circular window,
• merging of areas under 100 sq. km,
• labelling of areas over 100 sq. km.
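The clustering with location variables described above can be sketched in Python with NumPy. This is only an illustration under stated assumptions: Equations (24.5) and (24.6) are not reproduced in this text, so the scaling of the location images (X/Nc and Y/Nr) is a guess, and the loop below is a plain migrating-means procedure with fixed cluster count, not the full procedure with cluster splitting and merging implied by the options listed above. The names `location_images`, `assign_to_nearest` and `cluster` are hypothetical.

```python
import numpy as np

def location_images(n_rows, n_cols):
    """Two location bands scaled to [0, 1).

    Assumed form (X / Nc and Y / Nr); the chapter's own equations
    (24.5) and (24.6) are not available in this text.
    """
    rows, cols = np.indices((n_rows, n_cols))
    return cols / n_cols, rows / n_rows

def assign_to_nearest(features, means):
    """Label each pixel with the cluster at minimal Euclidean distance.

    features: (n_pixels, n_bands); means: (n_clusters, n_bands).
    """
    distances = np.linalg.norm(features[:, None, :] - means[None, :, :], axis=2)
    return distances.argmin(axis=1)

def cluster(features, initial_means, iterations=8):
    """Plain migrating-means loop: assign pixels, then update cluster means."""
    means = np.array(initial_means, dtype=float)
    for _ in range(iterations):
        labels = assign_to_nearest(features, means)
        for k in range(len(means)):
            if np.any(labels == k):  # leave empty clusters unchanged
                means[k] = features[labels == k].mean(axis=0)
    return assign_to_nearest(features, means), means
```

To use location as a clustering variable, the two location images would be flattened and stacked as extra columns of `features` next to the spectral and textural bands. Because every band then enters the Euclidean distance with equal weight, the relative influence of location depends on how many other bands are stacked with it.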
Figure 24.3 summarises the steps of this method developed for numerically delineating homogeneous and contiguous landscape zones with standard image processing techniques.
24.7 RESULTS
24.7.1 Finding the optimal spatial resolution to extract spatial information
For each spectral band of the LANDSAT TM image with a 30 meter resolution, a graph plots the mean of the local variance versus the spatial resolution. These graphs may be grouped in two classes according to their shape. For TM1 and TM2, the local variance falls rapidly between 30 and 120 meters; then it remains almost stationary (see graph for TM2, Figure 24.4). For TM3 to TM7, the curves share another shape: the local variance rises regularly to its maximum, then decreases slowly. The local variance stays at a maximal value over a range of resolutions, indicating that objects of several sizes structure the data. The resolutions of the maximal values of the local variance are 180 to 300 m for bands TM3, TM5 and TM7 and 270 to 400 m for TM4 (see graphs for TM4 and TM5, Figures 24.5 and 24.6). The graphs for bands TM3, TM5 and TM7 are very similar. Indeed, their correlation coefficients are higher than 0.7 (Table 24.1). Their spatial structure is marked by highly reflective bare soil areas and by the shadows of cliffs and escarpments. TM4 is fundamentally different; this band is poorly correlated with the others (Table 24.1). Its spatial structure is more fragmented, highly dependent upon aggregates of burned mineral areas with low reflectance.
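The local variance analysis of Section 24.6.1 can be sketched as follows. This is a minimal Python/NumPy illustration, not the image processing system used in the study. It assumes that degradation means averaging non-overlapping square blocks of pixels and that only full 3×3 windows are counted; the function names and the default list of degradation factors are hypothetical.

```python
import numpy as np

def degrade(band, factor):
    """Degrade resolution by averaging non-overlapping factor x factor blocks."""
    rows = (band.shape[0] // factor) * factor
    cols = (band.shape[1] // factor) * factor
    blocks = band[:rows, :cols].reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

def mean_local_variance(band):
    """Mean of the variances computed in every full 3x3 window of the band."""
    windows = np.lib.stride_tricks.sliding_window_view(band, (3, 3))
    return float(windows.var(axis=(2, 3)).mean())

def variance_curve(band, pixel_size=30, factors=(1, 2, 4, 8, 16)):
    """Pairs (resolution in meters, mean local variance) for one spectral band."""
    band = np.asarray(band, dtype=float)
    return [(pixel_size * f, mean_local_variance(degrade(band, f))) for f in factors]
```

Plotting the pairs returned by `variance_curve` for each band would give graphs of the kind shown in Figures 24.4 to 24.6. With 30 meter pixels, a factor of 8 corresponds to the 240 meter resolution retained in the chapter, and 5440/8 = 680 and 5680/8 = 710 give the degraded image size.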
Figure 24.3: Numerical method to delineate homogeneous and contiguous landscape zones
According to this analysis, the images are degraded at a resolution of 240 meters, reducing the image size from 5440×5680 pixels to 680×710 pixels. This spatial degradation reduces spatial auto-correlation of the image keeping the main spatial structures of the image.
Table 24.1: Correlation coefficients between 6 spectral bands of LANDSAT TM scene 202–52 of 22–11–1987.
        TM1     TM2     TM3     TM4     TM5     TM7
TM1     1.00    0.88    0.50   −0.05    0.02    0.05
TM2             1.00    0.75    0.07    0.34    0.29
TM3                     1.00   −0.24    0.72    0.78
TM4                             1.00    0.09   −0.37
TM5                                     1.00    0.85
TM7                                             1.00
Figure 24.4: Local variance versus resolution for LANDSAT TM2 (202–52 of 22–11–1987).
Figure 24.5: Local variance versus resolution for LANDSAT TM4 (202–52 of 22–11–1987).
Figure 24.6: Local variance versus resolution for LANDSAT TM5 (202–52 of 22–11–1987).
24.7.2 Synthesis of spectral information
Table 24.1 shows high correlations between bands TM1, TM2 and TM3, and between bands TM3, TM5 and TM7. They illustrate redundancies present in the image. A principal component analysis allows the elimination of these redundancies from further analysis. The results of the transformation are given in Table 24.2.
Table 24.2: Results of the principal component analysis applied on degraded LANDSAT TM scene 202–52 of 22–11–1987.
Component  Eigen     Eigen vectors
           values    TM1      TM2      TM3      TM4      TM5      TM7
KL1        0.594    −0.031   −0.067   −0.237    0.048   −0.837   −0.486
KL2        0.282     0.053   −0.011    0.086   −0.943   −0.214    0.233
KL3        0.107    −0.860   −0.364   −0.324   −0.080    0.108    0.070
KL4        0.009     0.139   −0.075   −0.210   −0.312    0.490   −0.771
KL5        0.007    −0.421    0.196    0.819   −0.025   −0.040   −0.333
KL6        0.001     0.245   −0.905    0.342    0.061   −0.028   −0.003
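The principal component analysis summarised in Table 24.2 can be sketched in Python with NumPy. This is a hedged illustration only: it assumes the analysis is run on the covariance matrix of the bands (the chapter does not say whether covariance or correlation was used), and the function name `principal_components` is hypothetical. The last lines simply re-add the variance shares printed in Table 24.2.

```python
import numpy as np

def principal_components(bands):
    """PCA of a multi-band image given as an array (n_bands, rows, cols).

    Returns the share of variance per component, the eigenvectors
    (one column per component) and the component images.
    Assumption: the covariance matrix of the bands is diagonalised.
    """
    n_bands = bands.shape[0]
    pixels = bands.reshape(n_bands, -1).astype(float)
    centred = pixels - pixels.mean(axis=1, keepdims=True)
    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(centred))
    order = np.argsort(eigenvalues)[::-1]  # largest variance first
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    components = (eigenvectors.T @ centred).reshape(bands.shape)
    return eigenvalues / eigenvalues.sum(), eigenvectors, components

# Variance shares as printed in Table 24.2 (KL1 to KL6).
table_shares = np.array([0.594, 0.282, 0.107, 0.009, 0.007, 0.001])
print(round(float(table_shares[:3].sum()), 3))  # 0.983, the 98.3 percent quoted in the text
```

Keeping only the first three component images, as the chapter does, would correspond to `components[:3]` in this sketch.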
Eigen values show that the first three components account for 98.3 percent of the total variance. Eigen vectors reveal that the first component comprises information coming from the middle infra-red bands (TM5 and TM7), the second from the near infra-red band (TM4) and the third from the visible bands (TM1, TM2 and TM3). After on-screen visualisation, only the first three components are kept. No significant feature structures the fourth, fifth and sixth components.
24.7.3 Extraction of spatial information
Four textural parameters are calculated in a 3×3 pixel moving window on the first three components: variance (VAR), entropy (ENT), homogeneity (HOM), and contrast (CON). Correlations between these images are computed and the images are visualised in order to choose the most suitable textural parameters to delineate landscape zones; results for the first component are shown in Table 24.3.
Table 24.3: Correlation coefficients between 4 textural parameters processed on the first component.
        VAR      ENT      HOM      CON
VAR     1       −0.311   −0.294    0.818
ENT              1        0.983   −0.263
HOM                       1       −0.249
CON                                1
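The four textural parameters compared in Table 24.3 can be illustrated for a single analysis window. The sketch below is written in Python with NumPy and rests on several assumptions, because Equations (24.1) to (24.4) are not reproduced in this text: homogeneity is taken as the sum of squared co-occurrence probabilities, contrast as the sum of squared grey-level differences weighted by P(i, j), entropy as minus the sum of P log P, and co-occurrences are counted between horizontal neighbours only. These are common textbook forms and may differ from the author's exact formulae; the function names are hypothetical.

```python
import numpy as np

def cooccurrence(window, levels):
    """Symmetric, normalised co-occurrence matrix for horizontal neighbours.

    window must hold integer grey levels in the range 0 .. levels - 1.
    """
    P = np.zeros((levels, levels))
    left, right = window[:, :-1].ravel(), window[:, 1:].ravel()
    np.add.at(P, (left, right), 1)
    np.add.at(P, (right, left), 1)
    return P / P.sum()

def texture_parameters(window, levels):
    """Four texture measures of one window (assumed textbook definitions)."""
    P = cooccurrence(window, levels)
    i, j = np.indices(P.shape)
    nonzero = P[P > 0]  # skip empty cells so that log() is defined
    return {
        "homogeneity": float((P ** 2).sum()),
        "contrast": float(((i - j) ** 2 * P).sum()),
        "entropy": float(-(nonzero * np.log(nonzero)).sum()),
        "variance": float(window.var()),
    }
```

Applying `texture_parameters` to every 3×3 window of a component image would yield four textural images whose pairwise correlations could then be compared, in the way Table 24.3 does for the first component. The component values would first need to be quantised to a small number of integer grey levels.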
According to the correlations, textural parameters may be split into two groups: variance and contrast; and entropy and homogeneity. Considering their formulations, one can see that variance and contrast are
sums of differences between pixel values, whereas entropy and homogeneity are sums of co-occurrence probabilities. Textural images are also visualised. The spatial structure of the image is better seen with the first group of textural parameters; zones are distinguished easily. Being simpler in its formulation, the variance is chosen as the textural parameter to extract spatial information on degraded data for the first three components.
24.7.4 Spatial generalisation
Because available clustering algorithms are ‘pixel by pixel’ based, generalisation has to be achieved apart from the clustering process in order to delineate large homogeneous zones. Generalisation is completed with a mean filter applied in a large moving window on the three components but also on their textural images. The size of the window has to be adapted to the landscape characteristics and the level of abstraction required. A ground survey showed that the smallest relevant landscape unit is around 15 km wide. Therefore, a window 15.1 km wide is used to generalise spectral and spatial information.
24.7.5 Clustering process
The clustering process described in Section 24.6.5 is applied with the same options on several sets of images in order to assess the contribution of spatial information and location variables to the delineation of landscape zones. The sets of images are:
• spectral images,
• spectral and spatial images,
• spectral and spatial images for each component,
• spectral and spatial images for the second component with two location variables.
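The spatial generalisation of Section 24.7.4, applied to each image before clustering, can be sketched as a mean filter in Python with NumPy. Two points are assumptions rather than statements from the chapter: the 15.1 km window is read here as 63 pixels of 240 meters (63 × 240 m = 15 120 m), and image borders are handled by repeating edge pixels, since the border treatment of the original software is not described. The function names are hypothetical.

```python
import numpy as np

def window_size_in_pixels(width_m, pixel_m):
    """Round to whole pixels, then bump even sizes up to the next odd size
    so that the window has a centre pixel."""
    n = int(round(width_m / pixel_m))
    return n if n % 2 == 1 else n + 1

def mean_filter(image, size):
    """Mean of a size x size moving window; edges are padded by repetition."""
    pad = size // 2
    padded = np.pad(np.asarray(image, dtype=float), pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return windows.mean(axis=(2, 3))

size = window_size_in_pixels(15000, 240)  # smallest landscape unit of about 15 km
print(size, size * 240)
```

This direct form is not optimised for a 63-pixel window on a full 680×710 image; a running-sum implementation would be faster but would obscure the idea.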
Visual assessment of the results is quick and easy to achieve. Figure 24.7 shows that adding spatial information improves the segmentation of the image into homogeneous zones. New zones are generated only according to their spatial characteristics. Taking the spatial information into consideration avoids the generation of transitional zones induced by the generalisation process. Zoning completed on each component plus its spatial information reveals that the method leads to a good segmentation of each image into a small number of zones, except for the second component. Indeed, this component is less spatially structured than the two others, leading to the generation of many homogeneous zones. Therefore, the second component is used to test the contribution of location variables to the clustering (see Figure 24.8). The use of location variables effectively reduces the number of zones from 17 to 13. During the clustering, the same weight is attributed to the different images. The weight of the location variables therefore depends on the number of bands introduced in the clustering process. For our further analysis, the zoning achieved on the three spectral components plus their spatial images is kept as the most synthetic and the most relevant for the analysis of agrarian systems; it is used in order to generalise results from a household survey to a regional level (Wolff, 1994).
Figure 24.7: Numerical zoning on the three first components and their spatial channels (simple line) and numerical zoning only on the three first components (dotted line), with the original LANDSAT TM image.
24.8 CONCLUSIONS AND PERSPECTIVES
A method for the numerical zoning of high resolution remotely sensed data has been developed using techniques of existing image processing software. The steps of this method are:
• identification of the size of the objects structuring the data using a graph of local variance versus resolution,
• synthesis of spectral information within three bands using a principal component analysis,
• extraction of spatial information by using a high pass filter processing the local variance,
• generalisation of the spectral and spatial information with a mean filter applied in a large window (size of the smallest unit to identify),
• clustering process with or without location variables.
Figure 24.8: Numerical zoning on the second component and its spatial channel with (continuous line) or without (dotted line) location variables
It has been shown that spatial information is of primary importance for delineating large homogeneous zones on high resolution remotely sensed data; it significantly reduces the generation of transitional zones. It is also revealed that spatial generalisation in a large window encourages the generation of contiguous zones when applied to well structured spatial images. When the image is poorly structured, such as the second component, the introduction of location variables reduces the number of zones generated. These location variables may be considered as objective criteria for the spatial consolidation of the zones, but determining their weight in the clustering process is critical. The advantages of this method lie in the use of standard image processing techniques, i.e. techniques available in all image processing software, but also in its application to raw data (no need for classification). The limitations lie in the very high level of spatial generalisation of the zones’ contours, and in the critical choices of the generalisation window size and of the bands introduced in the clustering, which determine their weight. It would be interesting to compare such results with other techniques of image segmentation (e.g. quadtree and region merge-and-split). Another perspective would be to test the stability of this landscape zoning against seasons and sensors (lower resolution).
ACKNOWLEDGEMENTS
The study was performed at the Laboratoire de Télédétection et d’Analyse Régionale of the Université Catholique de Louvain under the direction of Professor Wilmet. It was funded by the Belgian Science Policy Office as part of its national research programme in remote sensing under the contract TELSATII/08. The research was also partly supported by the Belgian National Fund for Scientific Research through a doctoral grant.
REFERENCES
ARGIALAS, D.P. and HARLOW, C.A. 1990. Computational image interpretation models: an overview and a perspective, Photogrammetric Engineering and Remote Sensing, 56(6), pp. 871–886.
BARBER, D.G., SHOKR, M.E., FERNANDES, R.A., SOULIS, E.D., FLETT, D.G. and LEDREW, E.F. 1993. A comparison of second-order classifiers for SAR sea ice discrimination, Photogrammetric Engineering and Remote Sensing, 59(10), pp. 1397–1408.
BARIOU, R. and LECAMUS, D. 1981. Télédétection et agriculture, in Bulletin de la Société Française de Photogrammétrie et de Télédétection, 83, pp. 27–40.
BARIOU, R., HUBERT, L. and LECAMUS, D. 1984. Landsat en pays de bocage (Bretagne). Une nouvelle approche, Espace Géographique, XIII(3), pp. 233–240.
BRUNEAU, M. and KILIAN, J. 1984. Inventaires agroécologiques, paysages et télédétection en milieu tropical, Espace Géographique, XIII(4), pp. 215–224.
CAMPBELL, J.B. 1987. Introduction to Remote Sensing. London: Guilford Press.
COSTANZA, R., WAINGER, L., FOLKE, C. and MÄLER, K.G. 1993. Modeling complex ecological economic systems, toward an evolutionary, dynamic understanding of people and nature, Bioscience, 43(8), pp. 547–555.
CROSS, A.M., MASON, D.C. and DURY, S.J. 1988. Segmentation of remotely sensed images by a split-and-merge process, International Journal of Remote Sensing, 9(8), pp. 1329–1345.
DALE, V.H., GARDNER, R.H. and TURNER, M.G. 1989. Predicting across scales, comments of the guest editors of landscape ecology, Landscape Ecology, 3(3/4), pp. 147–151.
DEFFONTAINES, J.P. 1973. Analyse du paysage et étude régionale des systèmes de production agricole, Revue Française d’Economie et de Sociologie Rurales, 98, pp. 3–13.
DEFFONTAINES, J.P. 1985. Etude de l’Activité agricole et analyse du paysage, L’espace Géographique, 1, pp. 37–47.
DUFUMIER, M. 1985. Systèmes de production et développement agricole dans le Tiers-Monde, Les Cahiers de la Recherche Développement, 6, pp. 31–38.
GURNEY, M.C. and TOWNSHEND, J.R.G. 1983. The use of contextual information in the classification of remotely sensed data, Photogrammetric Engineering and Remote Sensing, 49(1), pp. 55–64.
HALLAIRE, A. 1970. Des montagnards en bordure de plaine: Hodogway (Cameroun du Nord), Études Rurales, pp. 212–231.
HARALICK, R.M. 1979. Statistical and structural approaches to texture, Proceedings of the IEEE, 67(5), pp. 786–804.
HAY, C.M., BECK, L.H. and SHEFFNER, E.J. 1982. Use of vegetation indicators for crop group stratification and efficient full frame analysis, in Proceedings of the Remote Sensing of Arid and Semi-arid Lands Conference, Cairo, Egypt, 19–25 January 1982. Ann Arbor: Environmental Research Institute of Michigan, pp. 737–745.
JOUVE, P. 1988. Quelques réflexions sur la spécificité et l’identification des systèmes agraires, Les Cahiers de la Recherche-Développement, 20, pp. 5–16.
LAMBIN, E. 1988. Apport de la télédétection satellitaire pour l'étude des systèmes agraires et la gestion des terroirs en Afrique occidentale. Exemples au Burkina Faso, thèse de doctorat. Louvain: Université Catholique de Louvain.
MITCHELL, C.W. 1991. Terrain evaluation. An introductory handbook to the history, principles and methods of practical terrain assessment. Cambridge: Longman.
O’NEIL, R.V., JOHNSON, A.R. and KING, A.W. 1989. A hierarchical framework for the analysis of scale, Landscape Ecology, 3(3/4), pp. 193–205.
ORSTOM 1972. Les Petits Espaces ruraux. Problèmes et Méthodes. Paris: ORSTOM.
PAIN-ORCET, M., JEANJEAN, H., LEMEN, H., NORMANDIN, J., CHEVROU, R. and BOUREAU, J.G. 1989. Télédétection spatiale et inventaires forestiers, Bulletin de la Société Française de Photogrammétrie et de Télédétection, 2, pp. 59–63.
PÉLISSIER, P. and SAUTTER, G. 1970. Bilan et perspectives d’une recherche sur les terroirs africains et malgaches, Etudes Rurales, pp. 7–48.
RAFFY, M. 1992. Change of Scale in Models of Remote Sensing: A General Method for Spatialization of Models, Remote Sensing of Environment, 40, pp. 101–112.
RICHARDS, J.A. 1986. Remote sensing digital image analysis. Berlin: Springer-Verlag.
ROSENFELD, A. and DAVIS, L.S. 1979. Image segmentation and image models, Proceedings of the IEEE, 67(5), pp. 764–772.
RUTHENBERG, H. 1971. Farming Systems in the Tropics. Oxford: Clarendon Press.
STRAHLER, A.H. 1981. Stratification of natural vegetation for forest and rangeland inventory using Landsat digital imagery and collateral data, International Journal of Remote Sensing, 2(1), pp. 15–41.
TOTTÉ, M., HENQUIN, B., NONGUIERMA, A. and PENNEMAN, R. 1992. Stratification de l’espace rural par télédétection et caractérisation des systèmes ruraux dans la région de Bobo-Dioulasso (Burkina Faso), Cahiers Agricultures, 4, pp. 113–123.
TOWNSHEND, J.R.G. 1984. Agricultural land-cover discrimination using thematic mapper spectral bands, International Journal of Remote Sensing, 5(4), pp. 681–698.
TURNER, M.G., DALE, V.H. and GARDNER, R.H. 1989. Predicting across scales: theory development and testing, Landscape Ecology, 3(3/4), pp. 245–252.
ULABY, F.T. and McNAUGHTON, J. 1975. Classification of physiography from ERTS imagery, Photogrammetric Engineering and Remote Sensing, pp. 1019–1027.
WANG, R.-Y. 1986. An approach to tree-classifier design based on hierarchical clustering, International Journal of Remote Sensing, 7(1), pp. 75–88.
WILMET, J. 1985. Analyse régionale et observations satellitaires à haute résolution spatiale: l’exemple de la Wallonie, Hommes et Terres du Nord, 3, pp. 184–194.
WOLFF, E. 1994. Contribution à l’analyse de l’agriculture africaine à l’aide de la télédétection et d’un système d’information géographique, thèse de doctorat. Louvain: Université Catholique de Louvain.
WOODCOCK, C. and HARWARD, V.J. 1992. Nested-hierarchical scene models and image segmentation, International Journal of Remote Sensing, 13(16), pp. 3167–3187.
WOODCOCK, C.E. and STRAHLER, A.H. 1987. The factor of scale, Remote Sensing of Environment, 21, pp. 311–332.
WU, J.K., CHENG, D.S., WANG, W.T. and CAI, D.L. 1988. Model based remotely sensed imagery interpretation, International Journal of Remote Sensing, 9(8), pp. 1347–1356.
Chapter Twenty Five Spatial-Time Data Analysis: the Case of Desertification Julia Seixas
25.1 INTRODUCTION
Monitoring the environment involves measuring its properties. A significant change in those properties, discernible from a background of meaningless variation, identifies some process. The goal of environmental data analysis is to discover patterns which can be interpreted properly for some environmental process or phenomenon. This approach can be extended to the analysis of remotely sensed data, the goal of which is mainly the accurate discovery of the environmental properties captured by the sensor, like biomass quantity or land cover. The identification and assessment of the processes underlying the remotely sensed properties of the environment, i.e., the inference of the processes responsible for the spatial and/or temporal patterns captured by remote sensors, has not been fully accomplished. The main difficulty arises from the lack of knowledge of environmental processes at the spatial resolution of the sensor (>10 m). A technical constraint that limits remotely sensed data analysis concerns the huge amounts of data associated with temporal series of images, which create problems of data management and make state-of-the-art data analysis methodologies inadequate. This chapter addresses the problem of environmental analysis based on remote sensing data and proposes a methodology, inspired by methods of exploratory spatial analysis, that deals with spectral data to identify spatial-temporal patterns for the identification of a desertification process. Allen and Hoekstra (1992) state that scaling is not a matter of nature, but is done by the observer as a need to monitor the Earth. This premise leads to the assertion that processes occurring on Earth are scale independent, i.e., may occur at several scales simultaneously, but with scale-dependent external manifestations. One of the major challenges of environmental scientists is to walk through scales, identifying and modelling the way the processes in nature jump from small to large scales.
Although in some cases the integration over space and/or time of the appropriate variables seems to be enough, there are rules and relations, as well as thresholds, that promote or inhibit those jumps. Ideally, we may imagine a framework for walking through scales of spatial processes from the atomic level to the planetary level, using those thresholds as the jumping scales, with the appropriate scale-sampled data. This concept is illustrated in Figure 25.1, applied to the desertification process, according to the hypothesis suggested by Schlesinger et al. (1990, 1996) from studies based on very small-scale spatial data analysis. Field studies in the Chihuahuan Desert of New Mexico have led to the hypothesis that the distribution of soil nutrients in desert shrublands would show spatial autocorrelation up to the average size of the dominant individuals, both for biologically essential elements (e.g., N) and for non-limiting ones (e.g., Na, Li, and Ca) (Schlesinger et al., 1996). In shrublands, soil Navail which is the total available N, considering the sum of
SPATIAL-TIME DATA ANALYSIS: THE CASE OF DESERTIFICATION
Figure 25.1: Increasing scales of the environmental evolution of the desertification process.
NH4-N and NO3-N, is more concentrated under shrubs and autocorrelated over distances extending to a maximum of 100–300 cm, which is likely to be due to biogeochemical cycles acting at the scale of individual shrubs. As the shrubs develop “islands of fertility”, they are more likely to resist environmental perturbation and to persist in the community, since the desertification process takes place as a patchy development through time and not like a wave front. The scale framework described above is used as the environmental hypothesis to guide remote sensing analysis, whose meaningfulness is based on its spatial distribution, in addition to its correlation with Earth properties. The approach we propose in this chapter to infer some clues about the desertification process is based on the analysis of the spatial and temporal patterns of remote sensing data, which are characterised by two terms: the spatial distribution of the spectral values and their spatial variability. Methods used in the field of spatial statistics, designed for kernel smoothing statistics, were tested successfully and extended in order to incorporate temporal data. A brief review of the methodology underlying the assessment of the desertification process is presented in Section 25.2, while Section 25.3 reviews the state-of-the-art methods of data analysis applied to images. Section 25.4 describes the methodology proposed to assess meaningful spatial-temporal patterns of landscapes, and Section 25.5 presents the results of a case study for the south of Portugal. The conclusions are presented in Section 25.6.

25.2 THE ASSESSMENT OF THE DESERTIFICATION PROCESS

Desertification has become a very important problem, due to its economic, social and environmental impacts. The study of desertification has gained increasing importance, mainly under the global change framework. Looking back at early desertification studies, it used to be a problem affecting specific regions (e.g.
the Sahel) that potentially could spread out into adjacent regions, mainly during drought periods. The
desertification process was associated with the over-exploitation of the border ecosystems, mainly agriculture and pasture, which led to an irreversible decrease of their biological productivity. Recently, desertification has been recognised as an emerging problem, including in developed countries, for example in Southern Europe and some regions of the United States. In both cases, the phenomenon is associated with human activities incompatible with the carrying and renewal capacity of the ecosystems, and with natural conditions that strengthen physical degradation, such as decreasing precipitation patterns. Desertification has been analysed at two levels: the macro-scale of the process (Hare et al., 1977), which includes the tasks of identification and evolution of desert areas according to some criteria (e.g. annual rainfall less than 200 mm), and the micro-scale of the process (http:\medalus.leeds.ac.uk\), which involves the study of the physical and geochemical cycles that potentially originate the process. The key problem in studying the desertification process is the identification of the threshold at which ecosystem degradation becomes irreversible within the temporal frame of human generations. The difficulty in evaluating the point of no return results from the capacity of biological systems to adapt to new conditions, which may be adverse, sometimes with a positive feedback (Schlesinger et al., 1990). The problem of the threshold level is emphasised in regions with no explicit desert conditions but where there are some clues that could indicate such a degradation process. The hypothesis underlying this work is that if a desertification process exists, the Earth surface characteristics as captured by satellites should reveal meaningful patterns over a certain time period.
This statement is based on the fact that physical degradation regulates the ecosystem biotic communities, which determine the landscape design, assessed by land cover units, as well as its intrinsic properties, measured by spectral data such as vegetation indices. The study of desertification based only on landscape design changes is very limiting, because land cover evolution depends on human decisions and land management policy. According to Schlesinger et al. (1996), a desertification process is underway if the soil patterns become heterogeneous, which means that the spatial distribution of some soil properties, namely nutrients and water, goes from a random or uniform distribution to some autocorrelated distribution within the same land cover units. Thus, changes in the distribution of soil properties may be a useful index of desertification in arid and semiarid grasslands world-wide.

25.3 STATISTICAL ANALYSIS OF LANDSCAPES

The structure of any spatially continuous variable must be assessed by two types of analysis, the spatial mean and the spatial variability patterns, which provide complementary knowledge. While the former highlights the mean process, by localising meaningful spots of values, the latter captures the way the values vary over space (homogeneously or rather irregularly). The usual assumption of stationarity over space appears highly unrealistic when a large number of spatial observations are used (Anselin, 1993), as in the case of remote sensing images, making exploratory methods appropriate. Spatial analysis is conditioned by the underlying data model, i.e., the “discretization” of geographical variables, which implies a form of spatial sampling in terms of spatial resolution and arrangement of selected units of observation. The raster data model accommodates the spatial variability that occurs naturally in landscapes.
Different approaches are being used to deal with the spatial nature of the data (Cressie, 1991; Isaacs and Srivastava, 1989), so that the computed parameters somehow express the “natural characteristics” of spatially contiguous sets of pixels (De Jong and Burrough, 1995). Although the individual pixel values may vary, the pattern can be distinctive. Global spatial measures, such as Moran’s I and Geary’s autocorrelation
coefficients, give a single scalar measure of spatial patterns, while spatial adaptive filtering based on a moving-window scheme has been used to extract local spatial knowledge from the data. Spatial filters are well suited for raster data models, since they can integrate any local measure and capture the scale characteristics of the data by the use of increasing window size, as shown by Getis (1994). Spatial patterns can be described quantitatively in terms of the semivariance function, which is based on the idea that the statistical variation of data is a function of distance. Although the variogram has been used for the analysis of images (Atkinson, 1993; Seixas et al., 1995b), some disadvantages have been identified. The variogram is a global estimator and does not give information on local variation, and different samples from the same landscape yield different estimated variograms, as shown by Webster and Oliver (1992). Spatial data analysis has been considered from a static perspective. The inclusion of time in spatial data analysis has been proposed in a statistical modelling approach referring to data interpolation (Soares, 1993; and Miller, Chapter 28 in this volume), rather than as an exploratory data analysis tool. The data analysis process must accommodate dynamic issues about statistical patches, defined as sets of contiguous pixels that are aggregated by some statistical criterion. For example, the following questions can be asked: (1) What is the movement of the patch of the 75th percentile of the data during a decade? (2) Do the patches occur spatially at random over time, or do they move systematically towards the north of the region? (3) Does their shape grow slightly from the centre in all directions, or do most of them disappear gradually in the considered time range?
Visualisation of the temporal tracking of statistical patches appears to be an intuitive way to answer those questions, namely by using animation techniques (Rhyne et al., 1993; Seixas et al., 1995a; Stonebraker, 1994). However, scientific visualisation has limitations for human perception and understanding when the attribute is abstract and unfamiliar, and when the changes between time steps are so slight that they become invisible to the human eye. This is the case for desertification evaluation, for which one needs to quantify the dynamic spatial patterns. Temporal tracking of spatial patches, preserving geographical context, as a basis for remote sensing analysis implies two tasks: generation of the statistical landscapes in order to capture knowledge of pixel association and variability; and analysis of the behaviour of statistical landscapes through time. The next section presents a methodology that accomplishes those tasks.

25.4 SPATIO-TEMPORAL PATTERN ANALYSIS: PROPOSED METHODOLOGY

The goal of this research is the assessment of heterogeneity in remote sensing images, both in the spatial and the temporal domain. Heterogeneity refers to the statistically significant variability patterns of the spectral data. Homogeneous areas are those where the spatial or temporal variances of the pixel values are similar, and heterogeneous areas are those where there are significant changes. The assessment of the spectral variability landscapes over time could highlight temporal patterns that can be properly interpreted in terms of the desertification process. An exploratory data analysis approach should be adopted, since one wants to find clues indicating a desertification process, instead of confirming the hypothesis of desertification.
Considering a remote sensing image, for a specific time, a landscape structure is defined by two uncorrelated components, the spatial association of the values which represents large-scale features, and the variability of the pixels over the space which embeds small-scale features. Temporal association patterns allow the discovery of the spots of values for different time lags, while temporal variability patterns highlight the way pixels change through time.
Since the significance of the spatial and temporal patterns derives from their location on the landscape, a kernel-based structure is well suited for remote sensing analysis. The implementation scheme was built on the basis of a square moving window, which results in a set of new images that will be called statistical landscapes. The assessment of the statistical landscapes, over space or through time, allows identification of the spatial-temporal patterns. Let us consider a spatial process, $S(s_t)$, on the two-dimensional space, represented in the raster data model of a satellite image, for a specific time $t$. Considering the sensor data source, one can state that $S(s_t) = X(s_t)$, $X(s_t)$ being a positive real-valued variable representing the spectral data captured at a given time. The spatial resolution of the satellite sensor is assumed constant over time for $X(s_t)$. A spatial process is assessed by the spatial meaningfulness of its values and its variability. The former indicates the large-scale features that cause the location and extension of the patches of high and low values, and thus it can be referred to as the average process, $\bar{X}(s_t)$, while the second captures small-scale features that govern the way values vary over space, being referred to as the variance process, $X'(s_t)$. The spatial process, $S(s_t)$, can be stated as:

$$S(s_t) = \bar{X}(s_t) + X'(s_t) \qquad (25.1)$$

Although in nature processes evolve simultaneously over space and time, for the sake of simplicity the methodology for the spatial assessment of landscapes is presented first, followed by the integration of time.

25.4.1 ASSESSMENT OF THE SPATIAL STRUCTURE OF LANDSCAPES

Identifying the clusters of low and high values and where they are located is the basis for understanding the spatial trend of a landscape.
The local $G_i^*(d)$ statistic was used to detect the spatial association pattern, since it measures the degree of association that results from the concentration of the values at a point ($i$) and at all other points ($j$) included within a radius of distance $d$ from the original point. The statistic, for the case in which $j$ may be equal to $i$, is described as follows:

$$G_i^*(d) = \frac{\sum_j w_{ij}(d)\, x_j}{\sum_j x_j} \qquad (25.2)$$

where $\{w_{ij}\}$ is a symmetric one/zero spatial weight matrix with ones for all links defined as being within distance $d$ of a given $i$, including the link of point $i$ to itself; all other links are zero. Since the moving-window approach was used to compute $G_i^*(d)$, with the $d$ parameter being the size of the window, images of $G_i^*$ values result. The null hypothesis is that the set of $x$ values within $d$ of location $i$ is a random sample. Assuming that $G_i^*$ is approximately normally distributed, when

$$Z_i = \frac{G_i^*(d) - E[G_i^*(d)]}{\sqrt{\mathrm{Var}[G_i^*(d)]}} \qquad (25.3)$$

is positively or negatively greater than some specific level of significance, then a positive or negative spatial association is obtained. A large positive $Z_i$ implies that large values of $x_j$ (values above the mean of $x_j$) are within $d$ of point $i$. A large negative $Z_i$ means that small values of $x_j$ are within $d$ of point $i$. The structure of spatial association of a landscape is captured when the three classes of $Z_i$ are identified. Details on tests and the moments of the distribution of the statistic can be found in Getis and Ord (1992).
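As an illustration only, the standardised local statistic described above can be sketched in a few lines of Python. The sketch assumes a square window of binary weights centred on each pixel (the pixel itself included), truncation of the window at the image border, and the moments of the statistic given by Getis and Ord (1992) under the random-sample null hypothesis. The function name `gi_star_z` and the parameter `half_width` are hypothetical and are not taken from the implementation used in this chapter.

```python
import math


def gi_star_z(image, half_width=1):
    """Standardised Gi* (Z_i) for every cell of a 2-D grid given as a list of rows.

    Assumptions of this sketch: binary weights inside a square window of side
    2 * half_width + 1, the cell itself included, window truncated at the edges.
    """
    rows, cols = len(image), len(image[0])
    n = rows * cols
    values = [v for row in image for v in row]
    mean = sum(values) / n
    var = sum(v * v for v in values) / n - mean * mean
    if var <= 0:
        raise ValueError("Gi* is undefined for a constant image")
    s = math.sqrt(var)

    z = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            local_sum, w = 0.0, 0  # sum of x_j in the window, number of links
            for rr in range(max(0, r - half_width), min(rows, r + half_width + 1)):
                for cc in range(max(0, c - half_width), min(cols, c + half_width + 1)):
                    local_sum += image[rr][cc]
                    w += 1
            if w == n:
                continue  # window covers the whole image: no local contrast, keep 0.0
            # (sum of w_ij x_j - W_i * mean) / (s * sqrt(W_i (n - W_i) / (n - 1)))
            z[r][c] = (local_sum - w * mean) / (s * math.sqrt(w * (n - w) / (n - 1)))
    return z
```

For a 3-pixel window, as used later in the application, `half_width=1`. Large positive values of the result mark clusters of high values and large negative values mark clusters of low values; the critical value chosen for significance is left to the analyst.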
Since the spatial analysis is performed on a kernel basis, the spatial variability is assessed by the local variances associated with the set of pixels belonging to the window. Generally, the sample variances ($s^2$), derived from $n$ observations, should be distributed (after scaling, as $(n-1)s^2/\sigma^2$) as a chi-square ($\chi^2$) variable with $(n-1)$ degrees of freedom, if the samples are taken from a normal population. Following the chi-square distribution approximation, meaningful regions of small or large variances can be identified for a specific level of significance. Considering the remote sensing data sets, there is a threshold (taken from the chi-square distribution) below which the pixel values do not change significantly, and we refer to these areas as patches of homogeneity. Above that threshold, the variability over space is meaningful. A second threshold exists, above which the variability values are very high; this is considered noise. This procedure does not yield a spatial variability model, like the variogram, but rather the areas of significant variability patterns.

25.4.2 Integrating Time in Spatial Analysis

The integration of time in the spatial analysis of landscapes can be done by assessing the differences between the statistical landscapes obtained for each time step, or by temporal measures of the remote sensing images, referring to the evolution of the statistical landscape design and properties, respectively. While the former implies state-of-the-art operations of image processing, the latter requires the development of appropriate temporal-spatial statistics. Following the rationale underlying the spatial analysis, the temporal patterns of the data can be explored in two respects: the temporal average process, which stands for the significant evolution of the image values, and the temporal variability process, which refers to the way pixel values vary through time. A temporal-spatial process is represented as $ST(s_t)$, where $t \in \{t_1, t_2, \ldots, t_m\}$ is the time series of a region.
Let us denote by $X(s_k, t_m)$ a time series ($t_m$) of values for a pixel or a set of pixels ($s_k$) within an image (with $k$ less than the image dimensions); the spatial-temporal process, $ST(s_t)$, can be stated as:

$$ST(s_t) = \bar{X}(s_k, t_m) + X'(s_k, t_m) \qquad (25.4)$$

$\bar{X}(s_k, t_m)$ being the average process and $X'(s_k, t_m)$ the variability process. The evolution of the pixel values through time can be captured by the notion of temporal association for increasing time lags. If the association of a set of pixels remains constant or random for all time lags within the time range, there is no temporal trend. On the other hand, if the degree of association increases with the time lags, a positive trend is identified. The significance of this trend is derived from the slope of the association measure as a function of the time lag, by fitting a linear least-squares function. An extension of the $G_i^*$ measure is proposed to measure the spatial-temporal association. Considering an $n$-sized window, within which the distances among the pixels are less than $d$, the spatial-temporal association for a time lag, $G_{il}^*$, is obtained from the expression: (25.5)
where $l$ is the time lag of values and $T$ the time range of the available time series of images. The analysis of the spatial variability through time follows the principle that observations close in time tend to be more similar than those further apart. Although this principle sounds realistic, its application requires that the time series be continuous, in the sense that the process or conditions that originate the values manifest themselves continuously in time. There is a sort of memory through time that makes it possible to
Figure 25.2: Diagram of the methodology to analyse (a) the spatial and (b) the spatial-temporal patterns of remote sensing images.
assess temporal variability. A simple technique was applied, based on the test of the ratio of two variances, referring to two different dates. Figure 25.2 presents a diagram of the proposed methodology.

25.5 APPLICATION

The hypothesis of desertification in the south of Portugal (Alentejo) is being technically tested by soil and hydrology experts. Potential desert conditions in the southern basin of the Guadiana river have been identified, namely the decrease in spring precipitation, at least over the last 60 years, from 250 to 120 mm (Santo, 1994), and the increase in the amplitude of the number of very dry and very humid years (Sequeira, 1994). The annual ratio of precipitation to potential evapotranspiration is about 0.42, and in summer it decreases to 0.05, representing a water deficit for the biotic system (Seixas et al., 1995). Soil units are mainly cambisols, with low organic matter content (less than two percent) and a depth of 5 to 10 cm. The landscape is composed of wheat fields, open oak forest, and large areas of scrub (Cistus ladeniferus and Cistus monspeliensis). All these results are based on point data, the spatial dimension being dependent on interpolation and extrapolation methods. The thesis of this work is that it should be possible to assess the desertification process from space. A time series of Landsat5-TM images (1985–1994) for August was explored for the analysis of the landscape design evolution, and for the detection of significant spatial-temporal patterns of the spectral data. The application was developed within a GIS, which provides a robust environment for the visualisation and analysis of spatial data.
25.5.1 Land cover analysis

The Landsat5-TM images were corrected to geographic coordinates (Bessel ellipsoid, and Bonne projection), with the 1991 image as the reference image, the mean square error being 17.4 m. Radiometric calibration, which refers to the transformation of the digital numbers (0–255) into absolute radiance values at the on-board sensor, was not performed, since the digital numbers are well suited to the goal of the study. The interannual radiometric calibration is usually assured by the EOSAT procedures on board the satellite. From field work, five thematic classes were identified:

1. Dense vegetation, with cork and holm-oaks, and pine trees, and a soil cover of scrub vegetation.
2. Scrub land, dominated by bushes, shrubs, and herbaceous plants (rock-rose, broom, and gorse).
3. Scarce vegetation, with large areas of bare soil with ephemeral plants, and dispersed scrub vegetation patches, or isolated cork-oaks.
4. Stubble, including wheat wastes from harvest activities.
5. Fallow ground, since the wheat culture is a rotational activity in the region.

Water and urban classes were also identified, but are of no interest for the present study. Spectral confusion among the five themes occurs, mainly due to the weak contrast in August between soil, stubble and dry vegetation, and also due to shadows, topography, and soil unit types; band-ratio images were therefore tried, with success. Three ratio images were built for the classification procedure:

1. TM4/TM3, usually designated as a vegetation index, suited for the discrimination between vegetated and non-vegetated areas.
2. TM5/TM7, used to discriminate the stubble.
3. TM3/TM7, suited to discriminate the non-cultivated fallow ground areas.

The RGB false colour composite of the three ratio images revealed a meaningful contrast between the fallow (dark blue) and stubble (cyan), the vegetation areas (from red to yellow) and the mixture of bare soil and scrub areas (green).
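The construction of the three ratio images can be sketched as follows. This is a minimal illustration in plain Python, assuming that each band is held as a list of rows of digital numbers and that pixels with a zero denominator are assigned a no-data value; the function names `band_ratio` and `ratio_images`, the `nodata` parameter and the dictionary layout are hypothetical and do not describe the GIS procedure actually used.

```python
def band_ratio(numerator, denominator, nodata=0.0):
    """Pixel-wise ratio of two equally sized bands (lists of rows of digital numbers)."""
    if len(numerator) != len(denominator):
        raise ValueError("bands must have the same number of rows")
    ratio = []
    for num_row, den_row in zip(numerator, denominator):
        if len(num_row) != len(den_row):
            raise ValueError("bands must have the same number of columns")
        # A zero denominator cannot be divided; flag the pixel with the no-data value.
        ratio.append([n / d if d != 0 else nodata for n, d in zip(num_row, den_row)])
    return ratio


def ratio_images(bands):
    """Build the three ratio images named in the text from a dict of TM bands."""
    return {
        "TM4/TM3": band_ratio(bands["TM4"], bands["TM3"]),  # vegetation index
        "TM5/TM7": band_ratio(bands["TM5"], bands["TM7"]),  # stubble
        "TM3/TM7": band_ratio(bands["TM3"], bands["TM7"]),  # fallow ground
    }
```

The assignment of each ratio image to a colour channel of the composite is not stated in the text, so it is deliberately left out of the sketch.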
A supervised classification was developed, using the maximum likelihood algorithm, with accuracy coefficients ranging from 86 to 96 percent. Analysing the evolution of the land cover over the ten-year period, scrub areas dominated the region until 1991; afterwards the scarce vegetation areas dominated, with a very important share of bare soil. A transition from scrub lands and cultivated areas was observed, which means the progressive advance of scarce vegetation areas and the abandonment of wheat production. This picture means an even poorer primary productivity scenario, and if the trend continues over the next few years, it could mean a sustained physical degradation of the region. Multitemporal histogram analysis of the spectral data associated with the land cover units has suggested significant time patterns for the understanding of desertification. Natural vegetation areas show an increase of the mean of the TM4/TM3 ratio, which means an increase of the available biomass. Related field work (Veiga et al., 1984) demonstrated that rock-rose fields, associated with poor soils, presented about three times more biomass (about 1140 g/m²) than a diversified brush with ten species (about 400 g/m²). The transition to vegetation species with more biomass, mainly leaf biomass, is in accordance with the ecological principle of the adaptation of the ecosystem species to adverse conditions, in this case water stress. This biological positive feedback in desertification, as pointed out by Schlesinger et al. (1990), seems to be verified in Alentejo. A decrease of the dispersion measure of the TM5/TM7 values was also identified,
Figure 25.3: TM4/TM3 ratio image (lighter values correspond to higher vegetation degree).
indicating a progressive homogenisation of the soil spectral signal. Despite the significance of these non-random temporal patterns of the spectral data, histogram analysis does not determine the spatial dimension of such patterns.

25.5.2 Spatial-temporal analysis

The proposed methodology was implemented for the areas of natural vegetation to ensure temporal stationarity of the data, and in view of the results obtained from the histogram analysis. Figure 25.3 shows the TM4/TM3 ratio image for the study area, which gives the vegetation index. Black areas are not considered since they are areas of non-natural vegetation. Two complementary approaches were implemented to assess temporal patterns: spatial association, and spatial variability. The results are presented separately below.

Association patterns
The spatial patterns associated with significant clusters of low and high vegetation values were obtained with the local $G_i^*$ association measure. To assess local patterns, the $G_i^*$ statistic was computed for the set of temporal TM4/TM3 images, on a moving 3-pixel window basis. Figure 25.4(a) illustrates the spatial arrangement of the meaningful spots of vegetation values (for a 10 percent significance level), for a sample area of the overall study region. Spots of low vegetation values represent about 3 to 10 percent of the overall natural vegetation areas, while the clusters of high vegetation correspond to 15 to 20 percent, indicating a distribution of the vegetation skewed towards the high values, which is consistent with the histogram analysis. The pattern of the low vegetation values is very dispersed, while the extent of the high clusters is usually large, which reveals that water proximity is an important constraint for the design of natural vegetation landscapes. Although field studies have demonstrated that the biomass content of rock-rose fields, on poor soils, is three times that of a diversified brush, Landsat-TM recognises the latter as the most “green”, because scrub areas are usually associated with bare soil, which confuses the satellite sensor, considering its spatial resolution (30 m).
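The step from standardised association values to the area percentages quoted above can be illustrated with a short sketch. It assumes that the 10 percent significance level is applied as a two-sided test on an approximately normal statistic (critical value 1.645), and that pixels outside the natural vegetation areas are masked with `None`; both choices, and the function names `classify_spots` and `class_shares`, are assumptions made for illustration rather than details given in the text.

```python
def classify_spots(z_scores, critical=1.645):
    """Label each cell: 1 = high-value spot, -1 = low-value spot, 0 = not significant.

    Cells holding None (masked, e.g. non-natural vegetation) stay None.
    """
    classes = []
    for row in z_scores:
        labelled = []
        for z in row:
            if z is None:
                labelled.append(None)
            elif z >= critical:
                labelled.append(1)
            elif z <= -critical:
                labelled.append(-1)
            else:
                labelled.append(0)
        classes.append(labelled)
    return classes


def class_shares(classes):
    """Percentage of unmasked cells falling in each of the three classes."""
    counts = {-1: 0, 0: 0, 1: 0}
    for row in classes:
        for label in row:
            if label is not None:
                counts[label] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no unmasked cells to summarise")
    return {label: 100.0 * k / total for label, k in counts.items()}
```

Applied to each year of the series, `class_shares` would give the kind of per-class area percentages reported for the low and high vegetation spots.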
Figure 25.4: Spatial patterns of (a) spot values of vegetation (black colour corresponds to low and grey to high spot values of vegetation) and (b) significant slopes (black corresponds to negative and grey to positive slopes).
Figure 25.4(b) presents the spatial pattern of the slopes, significant at the 10 percent level, derived from the $G_i^*$ values over time. The temporal association $G_i^*$ values were computed on a moving 3-pixel window basis, for increasing time lags relative to the base year of 1985 (for example, a two-year lag (1985–1987), a four-year lag (1985–1989), and so on). Areas of positive slopes represent about 60 percent of the natural vegetation area, while the negative slopes account for only 30 percent. From the $G_i^*$ slope landscape, one can see an increase of the vegetation degree, associated with dense vegetation and scrub areas. From an environmental perspective, the increasing biomass level associated with scrub areas is in accordance with the positive feedback pointed out by Schlesinger et al. (1990).

Variability patterns
Local spatial variances were assessed with a moving 3-pixel window, and three classes were obtained: homogeneous patches, corresponding to small variances (less than the chi-square value at the 10 percent level of significance); heterogeneous patches, referring to very high variances (greater than the chi-square value at the 90 percent level); and “normal” patches, corresponding to the sample variances expected if the samples were taken from a normal population. Homogeneous patches dominate the area, meaning that one can establish an extensive spatial dependence pattern of the natural vegetation areas, at least for the spatial resolution of 30 m. To evaluate the evolution of the three classes of spatial variability, a normalised ratio relative to the year 1985 was computed (e.g., for homogeneous areas (HA): (HA1987 − HA1985)/HA1985). Figure 25.5(a) presents the evolution of the normalised ratio during the last decade. At first glance, the spatially dependent areas decrease and are replaced by higher variance areas. Since variability classes are defined according to the distribution of the spectral values in each year of analysis, the same patch can be homogeneous in two years and yet show a meaningful temporal trend in its variance. The difference between the spatial variances for two dates was assessed by a statistical test of their ratio. This test relies on the fact that the ratio of two sample variances drawn from normal populations with equal variances follows an F distribution with the appropriate degrees of freedom, the null hypothesis being the equality of the variances, against the alternative that the variance in the numerator is greater than that in the denominator. Four ratio images were computed (87/85, 89/87, 91/89, 94/91), as well as the areas of statistically equal variances, considering a 10 percent level of significance. Figure 25.5(b) shows the areas of equivalent spatial variances. The areas with similar variances over two-year periods are increasing, from about 60 percent at the beginning of the time period under study, to
Figure 25.5: (a) Normalised ratio for the homogeneous and non-homogeneous areas, and (b) Areas (%) of equal spatial variances for successive years.
85–90 percent at the end, which means that the temporal pattern of the spatial variances can be considered roughly as increasingly homogeneous.

25.6 CONCLUSIONS

Deserts are not necessarily the product of outside forces like decreasing rainfall; rather, it is the internal ecology of the desert itself (its web of plants, animals, and soil) that drives its growth to maturity and stability. Following the historical evolution of some deserts, it has been found that internal arrangements drive the ecosystems from smooth to patchy, and finding the driving forces of that movement means explaining desertification. Thus, one can hypothesise that temporal spectral changes manifest themselves in privileged spatial niches, instead of uniformly over the whole area. Physically, this hypothesis makes sense because the behaviour of each pixel, or of neighbouring pixels, through time depends on local characteristics such as soil type, biotic communities, proximity of available water, slope and aspect. If the pixel ecology is changing, its spectral properties also change and the patchy trajectory can be assessed. Thus, the consideration of the spatial dimension in multitemporal spectral analysis becomes essential for the study of desertification, which is our research goal. A methodology based on the spatial-temporal assessment of the association and the variances of spectral values, applied to the vegetation index, was proposed. The main results show that there probably is an increase of the biomass in the region, associated with the dense vegetation areas and, more importantly, with the scrub areas. From an environmental perspective, the increasing biomass level associated with scrub areas is due to the positive feedback of the desertification process. In terms of spatial variability, although a spatial dependence pattern of the natural vegetation areas dominates, there has been an increase of the heterogeneous patches.
Considering comparisons over two-year periods, the spatial variances remain almost equal, which means that the heterogeneity pattern is increasing very slightly. The application of the proposed methodology to spectral data has to be extended if we aim to address the environmental goal of assessing desertification consistently, namely by integrating soil-related spectral data. The assessment of spatial patterns could be enriched by considering larger window sizes. As the window size tends towards the image dimension, the local variances tend to equal the global variance, which is equivalent to the conceptual approach of variogram analysis. Also, the temporal association could be evaluated for
increasing n-sized windows, shedding light on the behaviour of temporal trends of the spectral values for increasing spatial distances.

ACKNOWLEDGEMENTS

The National Board for Scientific and Technological Research partially supported this work, under research contract PEAM/C/RNT/84/91. The author wishes to thank P. Gonçalves for his support on programming, W. Schlesinger for his comments, and also the National Center of Geographic Information for the support on image processing.

REFERENCES

ALLEN, T. and HOEKSTRA, T. 1992. Toward a Unified Ecology. New York: Columbia University Press.
ANSELIN, L. 1993. Exploratory spatial data analysis and geographic information systems, Proceedings of DOSES Workshop on Spatial Analysis and GIS, Lisboa, 18–20 November. Luxembourg: Eurostat, pp. 45–54.
ATKINSON, P. 1993. The effect of spatial resolution on the experimental variogram of airborne MSS imagery, International Journal of Remote Sensing, 14(5), pp. 1005–1011.
CRESSIE, N. 1991. Statistics for Spatial Data. New York: John Wiley & Sons.
DE JONG, S.M. and BURROUGH, P. 1995. A fractal approach to the classification of Mediterranean vegetation types in remotely sensed images, Photogrammetric Engineering & Remote Sensing, 61(8), pp. 1041–1053.
GETIS, A. 1994. Spatial dependence and heterogeneity and proximal databases, in Fotheringham, S. and Rogerson, P. (Eds.), Spatial Analysis and GIS. London: Taylor & Francis, pp. 103–120.
GETIS, A. and ORD, J.K. 1992. The analysis of spatial association by use of distance statistics, Geographical Analysis, 24(3), pp. 189–206.
HARE, K., WARREN, A., MAIZELS, K., KATES, W., JOHNSON, D., HARING, J. and GARDUNO, A. 1977. Desertification: Its Causes and Consequences. New York: Pergamon Press.
ISAACS, E. and SRIVASTAVA, R. 1989. An Introduction to Applied Geostatistics. New York: Oxford University Press.
RHYNE, T., BOLSTAD, M. and RHEINGANS, P. 1993. Visualizing environmental data at the EPA, IEEE Transactions on Visualization and Computer Graphics, 13(2), pp. 34–38.
SANTO, F.E. 1994. Variabilidade da precipitação no Alentejo, Lisboa, Associação Portuguesa de Recursos Hídricos.
SCHLESINGER, W., REYNOLDS, J., CUNNINGHAM, G., HUENNEKE, L., JARRELL, W., VIRGINIA, R. and WHITFORD, W. 1990. Biological feedbacks in global desertification, Science, 247(2), pp. 1043–1048.
SCHLESINGER, W., RAIKES, J., HARTLEY, A. and CROSS, A. 1996. On the spatial patterns of soil nutrients in desert ecosystems, Ecology, 77(2), pp. 364–374.
SEIXAS, J., GONÇALVES, P., SILVA, J.P. and NEVES, N. 1995a. Temporal tracking of spatial data, Proceedings of the 1st Conference on Spatial Multimedia and Virtual Reality, Lisbon, 18–20 October. Lisbon: New University of Lisbon, pp. 11–20.
SEIXAS, J., SEABRA, C. and HENRIQUES, R.G. 1995b. Exploratory spatial data analysis of Landsat-TM images, Proceedings of the American Congress of Surveying and Mapping and American Society of Photogrammetry and Remote Sensing, Charlotte, 27 February-2 March. Bethesda: ACSM/ASPRS, pp. 553–561.
SEQUEIRA, E. 1994. Convenção para Combater a Desertificação, Areas abrangidas em Portugal, Personal Communication. Lisboa: Estação Agronómica Nacional.
SOARES, A. 1993. Predicting Probability Maps of Air Pollution Concentration, in Soares, A. (Ed.) Geostatistics Troia ‘92, Amsterdam: Kluwer Academic Publishers, pp. 625–636.
STONEBRAKER, M. 1994. Sequoia 2000: a reflection on the first three years, IEEE Computational Science and Engineering, Winter 1994, pp. 63–72.
VEIGA, A. and PEREIRA, A. 1984. Estudos de Vegetação arbustiva em parcelas de montado de sobro, Working Paper, Instituto Superior de Agronomia. Lisboa: Technical University of Lisbon.
WEBSTER, R. and OLIVER, M. 1992. Sample adequately to estimate variograms of soil properties, Journal of Soil Science, 43, pp. 177–192.
Chapter Twenty Six
The Potential Role of GIS in Integrated Assessments of Global Change
Milind Kandlikar
26.1 INTRODUCTION
There are many investigators all around the world conducting research on problems related to global climate change. By far the largest effort is being directed at the physical science aspects of the problem: understanding the operation and likely future response of the atmosphere and the oceans. There is also work being done to understand how a changed atmosphere and climate might affect plants and managed and unmanaged ecosystems. Social scientists are studying the potential economic and other social implications of climate change. There are studies of strategies and technologies that might be used to reduce future emissions of greenhouse gases. Finally, there are studies of how society and the environment might adapt in response to climate change. Integrating the knowledge gathered from natural, physical and social scientific endeavours is essential for gaining a more complete view of global climate change issues. Integrated assessments of climate change provide a vehicle for attaining this goal.
Integrated Assessments (IA) of climate change are typically carried out using computer models that integrate knowledge from the different disciplines. These computer models link representations of the different natural and social science components, including some of the known feedbacks. Several recent papers provide an overview of current IA models (Rotmans and Dowlatabadi, 1998; Weyant et al., 1995). The objective of IA is to put the various pieces of the climate problem together and look carefully at the big picture so as to:
• keep a proper sense of perspective about the problem, since climate change will occur in the presence of ongoing environmental and social changes;
• develop the understanding necessary to support informed decision making by the many actors and institutions around the world;
• ensure that the type and mix of research undertaken will be as useful as possible to these decision makers in both the near and long term.
Although a majority of IAs are performed using global modelling frameworks, IAs can be seen as more than just a model building exercise. IA models can play the role of ensuring consistency among the disciplinary components and point to substantive areas where more information is required. The actual research program emerges iteratively from the insights that the model provides and investigations in the substantive domains of the sub-components. IA models that attempt to provide meaningful input for policy purposes are difficult to build because of the inherent complexity embedded in the temporal (10 to 100 years), spatial (local,
continental and global) and socio-political (national and international) scales of the analysis and because of the vast uncertainties in relevant scientific and socio-economic variables. Problems with the use of global IA models for policy recommendations have been discussed elsewhere (Kandlikar and Risbey, 1996; Risbey et al., 1996). Nonetheless, in their current form IA models serve an important purpose because they provide an organising framework for interdisciplinary research and for bringing together communities from different disciplines working on different aspects of the problem. Values are an important part of the climate problem. A good integrated assessment must treat values explicitly, not hide them down in the details of the analysis. When possible the treatment of values should be transparent, so that many different actors can all make use of results from the same assessment, and so people can ask “what if” questions about the different value choices they may be considering. Studies of climate change policy show that uncertainties in values and personal beliefs may be more important obstacles to decision making than scientific uncertainty (Lave and Dowlatabadi 1993). This has led us to suggest that the questions integrated assessments seek to answer are most meaningfully framed in terms of multi-actor, multiple metric approaches (Kandlikar and Morgan, 1995). In this chapter, we discuss the potential role of GIS in performing integrated assessments, particularly in facilitating the development of multi-actor multiple metric approaches. Although this chapter focuses on global climate change, the issues raised here relate to other global change phenomena as well. In what follows, we describe the use of multi-actor multiple metric approaches in integrated assessments (Section 26.2). 
We then consider the functional role that GIS can play in facilitating better integrated assessment studies, paying particular attention to the varied perspectives of the different stakeholder groups (Section 26.3). In Section 26.4, we provide two examples of the cross cutting role that GIS can play in integrated assessment work.
26.2 MULTIPLE ACTORS, MULTIPLE METRICS
Many integrated assessment models ignore the role of the different individual and collective values and report their results using a single economic variable, most frequently the sum of impacts and abatement costs reported as a fraction of global GDP (Dowlatabadi and Morgan, 1993; Nordhaus, 1992; Peck and Teisberg, 1992). In doing so they assume that decisions about the future of the planet will be made by a global “commoner” with robust monetary valuations for aggregate market and ecological impacts, as well as net costs of abatement policies. In fact, however, these decisions will be made in a globally distributed manner by scores of national governments, millions of private and public sector managers and billions of citizens. None of these stakeholders can be expected to make decisions based on globally averaged values; instead each will decide on the basis of the specific costs they incur and choices they face.
Although each stakeholder faces different and uncertain market costs, the methodologies for arriving at them are relatively well established. This is far from true for evaluations of non-market impacts, particularly those on ecosystems. Robust economic valuations of ecosystems remain a mythical abstraction, even in culturally homogeneous settings such as the United States, and problems with valuation techniques are well documented (Diamond and Hausman, 1994; Fischhoff, 1991). In addition to individual differences, it is not hard to imagine that significant cross-cultural differences may exist.
A young farmer from Bangladesh may place a very different value on a species of the American maple than an elderly member of the Sierra Club from Boston. In general, actors from different geographic locations, income and resource levels and cultural perspectives will place very different values on the impacts of climate change.
In Table 26.1 we provide a “toy” example taken from Kandlikar and Morgan (1995) to demonstrate the importance of including multiple metrics and multiple actors in climate change policy debates. The example is purely illustrative, and may seem to be making a rather obvious point: that individual and collective values determine preferred policy outcomes. However, recent debates in the literature on climate change impacts, particularly the controversy over chapter 6 of IPCC’s working group III (Pearce et al., 1996), suggest that the importance of this elementary but powerful observation has not adequately sunk in to the collective wisdom on the impacts of global climate change.
Table 26.1: Illustration of Multi-Attribute Decision Making Applied to Climate Change
Actor                      Environmentalist      Industrialist
Decision Rule              2000      2050        2000      2050
Minimise Expected Value    B~C       C           A         A
Minimise Worst Case        C         C           A         B
Actors choose policies (A: No Carbon Tax, B: Medium Carbon Tax, C: High Carbon Tax) using two decision rules. Dominant policies for the years 2000 and 2050 are shown.
The example uses outputs from ICAM, an Integrated Climate Assessment Model developed at Carnegie Mellon University (Morgan and Dowlatabadi, 1996). Subjective preferences of two actors, a pro-environment actor (“Environmentalist”) and a pro-growth actor (“Industrialist”), are evaluated using multi-attribute utility functions. The two actors have different subjective preferences (utility functions) for different levels of ICAM output metrics: changes in ecosystem prevalence, global impacts of sea level rise, global mean temperature change and changes in global per capita income resulting from economic impacts and CO2 abatement measures. Single attribute utility functions for each metric are combined additively; the different actors assign different importance weights to each of the attributes based on their own preferences. Hence, the environmentalist places a larger value on ecosystem changes than on changes in per capita income. The industrialist, on the other hand, may be more concerned about changes in per capita income. In contrast to this approach, common economic approaches convert each impact’s metric into monetary units that are the same for different actors. The resulting multi-attribute utility function can be used to evaluate emissions reduction options. The two actors have very different choices for dominant policy strategies. Environmentalists prefer option C in most situations, except in the very short term where they are indifferent (B~C) between the two tax options. Industrialists, on the other hand, prefer option A in most cases, except when the decision rule is to minimise worst case outcomes in the long term, where option B is preferred. The decisions of the two actors are most similar when either the short term expected damages or chances of long term low probability-high consequence events are minimised.
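The additive combination of single attribute utilities and the two decision rules can be sketched in a few lines. The sketch below is not ICAM and does not use its outputs: the two attributes, the scenario probabilities, the utility values and the importance weights are all invented for illustration, chosen so that the resulting pattern resembles the 2050 columns of Table 26.1. The decision rules are expressed in utility terms, so minimising expected damages becomes maximising expected utility, and minimising the worst case becomes maximising the lowest utility across scenarios.

```python
# Hypothetical single-attribute utilities on a 0-1 scale (1 = best).
# Each policy has a likely scenario and a low-probability bad scenario.
# All numbers are made up for illustration; none are ICAM outputs.
SCENARIOS = {
    "A": [(0.8, {"ecosystem": 0.5, "income": 1.0}),
          (0.2, {"ecosystem": 0.0, "income": 0.5})],   # no carbon tax
    "B": [(0.8, {"ecosystem": 0.7, "income": 0.8}),
          (0.2, {"ecosystem": 0.4, "income": 0.6})],   # medium carbon tax
    "C": [(0.8, {"ecosystem": 0.9, "income": 0.5}),
          (0.2, {"ecosystem": 0.7, "income": 0.4})],   # high carbon tax
}

# Hypothetical importance weights; each actor's weights sum to 1.
WEIGHTS = {
    "environmentalist": {"ecosystem": 0.8, "income": 0.2},
    "industrialist": {"ecosystem": 0.2, "income": 0.8},
}


def additive_utility(outcome, weights):
    """Weighted sum of single-attribute utilities for one outcome."""
    return sum(weights[name] * outcome[name] for name in weights)


def expected_utility(scenarios, weights):
    """Probability-weighted utility over all scenarios of a policy."""
    return sum(p * additive_utility(o, weights) for p, o in scenarios)


def worst_case_utility(scenarios, weights):
    """Utility of the least favourable scenario of a policy."""
    return min(additive_utility(o, weights) for _, o in scenarios)


def preferred_policy(actor, rule):
    """Policy that scores highest for this actor under the given rule."""
    weights = WEIGHTS[actor]
    return max(SCENARIOS, key=lambda name: rule(SCENARIOS[name], weights))


if __name__ == "__main__":
    for actor in WEIGHTS:
        for rule in (expected_utility, worst_case_utility):
            print(actor, rule.__name__, preferred_policy(actor, rule))
```

With these invented inputs the two actors disagree under both rules, and the industrialist moves from A to B only when the worst case is what matters. Changing the weights alone, with the scenario outcomes held fixed, is enough to change the preferred policy, which is the point the toy example makes.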
The previous example was designed to be simplistic in order to illustrate the importance of the differences among the different stakeholders for aggregate climate change policy formulation. In reality, of course, there is a bewildering array of stakeholders, whose concerns span a variety of spatial and temporal scales and who have different goals, resources and motivations. It is precisely these kinds of differences in evaluations of climate change impacts that make GIS an attractive platform for putting multi-actor approaches into practice. Visualisation and manipulation of different kinds of data can be used to describe a wide range of possible climate change outcomes in an integrated manner. Coupling these outcomes with
decision support tools may help in characterising less and more desirable outcomes from the perspectives of the different stakeholder groups.
To better understand stakeholder goals, and the information that IAs may provide towards meeting them, we find it useful to group the different stakeholders into a few canonical categories. These categories are provided in Table 26.2. In addition, we also provide the multiple spatial and temporal scales of interest, as well as the end goals that these stakeholders may have in the context of global change. An examination of the literature suggests that much of IA modelling has focused to date on:
• The concerns of “rational” national policy makers in the USA and Western Europe using IA models as test beds for evaluating macro-economic policy measures (carbon taxes, R&D subsidies, etc.). IA models used in this mode are typically cost-benefit models. These models perform ad hoc aggregations of local and regional impact metrics into global monetary units and compare the resulting numbers with costs of global macro-economic policies to reduce carbon dioxide emissions. Given the uncertainties, time scales and diversity of values inherent in climate change, and the history of failures of long term large scale global modelling efforts, we are guarded about the utility of IA models for explicit macro level policy formulation (Risbey et al., 1996). Further arguments echoing this view can be found in Brunner (1996).
• Using IA models as test beds for research to understand the linkages and feedbacks between natural and social systems (see for example, Morgan and Dowlatabadi, 1996). This is a more modest and perhaps attainable goal for IA modelling.
Hence, current IA models function primarily at aggregate global levels and aim to serve the interests of stakeholder groups at the top and bottom of Table 26.2, with very limited success (Kandlikar and Risbey, 1996).
Table 26.2: Canonical Stakeholder Groups, Spatial and Temporal Scales of Interest and Possible Concerns that IA Can Help Address
Stakeholders               Spatial Scale          Temporal Scale          Goals and IA End Use
National Policy makers     National/Continental   Long term               R&D budgets; Rhetorical value
Public Sector Managers     Local/Regional         Short term              “No regrets options”
Private Sector Managers    Global/Local           Short to medium term    Sectoral efficiency and equity
                                                  Short term              Maximising profits
NGOs                       All scales             All scales              Broaden the range and focus of discussion
Global Change Researchers  Regional/Global        Medium/Long term        Understand interactions between natural and social systems
There is at least one ongoing IA effort that addresses the concerns of other groups in Table 26.2—public and private sector managers and NGOs. The Mackenzie Basin Impact Study (Cohen, 1994, 1996) is an example of a regional integrated assessment that incorporates local and regional decision making and
interests. In this work regional stakeholders were consulted from the start so the analysis could be tailored to the unique circumstances of the Mackenzie basin. Numerical IA modelling was just one of many tools of the analysis. Qualitative reasoning, data analyses and GIS systems were also used when suited. This mixture of modelling and non-modelling approaches can lead to the identification of site specific issues, similar to those described by the Mackenzie Basin Impact Study (Cohen, 1996), that are not dealt with by global scale IA models.
In what follows, we argue that GIS could serve as a cross cutting tool; one that plays a measurable role in responding to the concerns of all the canonical decision makers described in Table 26.2. In particular, GIS has the ability to handle multiple spatial scales and inform the process of aggregation from local to regional and global. GIS can provide coherent mappings between different spatial scales and across the interests of the different stakeholders.
26.3 THE FUNCTIONAL ROLE OF GIS IN BUILDING AND INTERPRETING INTEGRATED ASSESSMENT MODELS
In the past GIS systems have been applied for evaluating climate change impacts, particularly for land use evaluation. Studies of agricultural impacts of climate change have employed GIS systems for classification of regions into crop growth categories (Carter et al., 1991; Darwin, 1995; Easwaran and Van Der Berg, 1992). The classifications characterise how potential shifts in climatic zones may favour some crops and become unfavourable to others and help model crop responses to future climatic change. GIS systems also provide a natural means to model and represent ecosystem impacts of climate change. Land classification schemes can be used to correlate climatic patterns with vegetation distributions (Goodchild, 1976). GIS systems can be used to trace the effect of climate change on land cover patterns and biogeochemistry (Kittel et al., 1995; VEMAP, 1995).
These studies represent advances in understanding sectoral impacts of climate change. An important potential application of GIS systems is in understanding the interactions between sectoral impacts, and in providing a more integrated view of resulting societal outcomes that sectoral studies alone cannot hope to provide. While there are multiple applications for GIS in integrated assessments, in a functional sense GIS can contribute to integrated assessments in three different ways:
• provision of geographical and regional specificity;
• improvement of data quality and model validation;
• capture and display of data along with linkages to analytical and decision support models.
26.3.1 Geographical and Regional Specificity
The benefits of using GIS to provide regional and geographic specificity to integrated assessment models are critical. As noted earlier, integrated assessment models are often highly stylised and represent processes at a very high level of aggregation (see for example, Nordhaus, 1992). Incorporating specific details of a region such as its natural resource characteristics and socio-economic variables might result in significantly different inferences regarding impacts of climate change and response measures when compared with aggregated analyses.
In addition, it is easy to see how regional specificity will address some concerns of the groups in Table 26.2. For example, consider ecosystem impacts of climate change: coupling model experiments with the regional socio-economic data might help illuminate the impacts of climate change on a region; this may be particularly important for local communities and institutions (NGOs, regional planning commissions) most vulnerable to ecosystem changes. From the perspective of scientists and researchers, the integration of ecosystem models and GIS with statistical software will help in managing and evaluating model experiments (Parks, 1993). By making specific regional outcomes transparent, GIS systems can aid in incorporating and highlighting the concerns of different stakeholders on a regional or global basis (see for example, Mounsey, 1991).
26.3.2 Data Quality
Integrated assessments at present lack a systematic grounding in historical data; indeed very few attempts have actually been made to validate them against environmental and societal data (Risbey et al., 1996). Societal and environmental data are collected for different purposes, at different scales, and with different underlying assumptions about the nature of the phenomena. Environmental data often exhibit continuous spatial variation while social phenomena tend to be more spatially discrete. Researchers in the GIS community have encountered these issues and have learnt to address many of them. Integrated assessment researchers, on the other hand, have little experience in handling and integrating inherent differences in data types. As a result, members of the integrated assessment community have seldom been able to generate and address questions of model validation and appropriate use of data. There are several requirements that integrated assessment models place on systems for the adequate handling of data issues.
These include the need to represent uncertainty; to integrate different kinds of data (e.g., coupling socio-economic and physical information), often at different scales and possessing different attributes; to incorporate dynamics; and to address scaling and aggregation issues, because impacts occur in a location-specific fashion, while quantitative estimates of impacts may be required at multiple scales to account for the interests of the various stakeholder groups. Some of these issues are already being addressed by members of the GIS community. Other issues, particularly the representation of uncertainty (Brown, 1995) and incorporation of dynamics (Steyaert, 1993; Wheeler, 1993) are challenges that the GIS community faces.
26.3.3 Coupling with Analytical and Integrated Models
Broadly speaking, there are two advantages of coupling GIS with regional and global integrated assessment models. First, as described above, GIS can play a role in data integration and quality assurance in the validation of integrated assessment models. Second, GIS models have a significant role to play in the interpretation of model outcomes and implications for policy. In particular, GIS systems have the potential for meaningfully synthesising socio-economic and environmental information that integrated assessment models generate. The uncertainties in the climate problem are so large that any single set of model results has very little heuristic value. However, if the models are exercised over a range of possible outcomes and GIS is used to envision the possible interactions and outcomes, a more holistic picture of climate change impacts might emerge. In this regard, it may be useful to combine the multiple outputs from GIS systems with multi-attribute decision frameworks which provide a scheme for integrating the utility of different outcomes based on the values and beliefs of different stakeholders.
One problem with the use of multiple metrics is that their results can be hard to display and interpret (Morgan and Henrion, 1990). Decision-support systems coupled with GIS tools that allow for the representation and manipulation of inherently different kinds of data may be necessary to make full use of the approach. Recent experience with integrated assessment modelling in the US Pacific Northwest also suggests that there may be significant mismatches between the information needs for decision-making and the current form of available information (Jones, 1995). In a broad sense, this points to a need for designing GIS based studies to learn how integrated assessment modelling can be structured to support the information needs of various decision-makers and how these results can be effectively communicated. This will require collaborations between quantitative policy analysts, GIS experts and experts in human decision making.
26.4 CROSS CUTTING ROLE OF GIS: AN EXAMPLE FROM SECONDARY CLIMATE CHANGE IMPACTS
The example we provide in this chapter is conceptual. To our knowledge there are few studies that use GIS systems in the manner that we propose. Additionally, since we want to illustrate the multiple roles of GIS in integrated assessment, the examples can be generalised over different spatial scales and different stakeholder groups.
Evaluating impacts of climate change requires the integration of physical, biological and socio-economic aspects of a region. As we described earlier, most previous impact assessments have been sectoral in nature, i.e., regional assessments have tended to include “several key economic sectors in a parallel sectoral assessment format” (Cohen, 1996). This parallel separation of economic sectors is a way to limit the scope of an analysis in order to make it more tractable. In reality, of course, the impacts from different sectors will interact and result in other “second order” consequences.
Will these second order impacts be necessarily less important than primary impacts on sectors? This is a difficult question whose answer is situated at least partly in the specifics of regional resource constraints, population pressures and institutional capabilities. Additionally, a consideration of the dynamics of social change is intrinsic to studying these interactions (see for example Crosson and Rosenberg, 1993, and Cohen, 1994 who also use a combination of modelling and other approaches to achieve similar integrative goals). Consider for example, the case of China. Although the land area of China is vast, a significant fraction of 1.2 billion Chinese citizens live within 100 miles of the coast. In the next century, as China’s population continues to grow, industrialisation will place increasing constraints on the limited land available for agriculture. In a country where 80 percent of grain crop is irrigated, water resource issues are likely to loom large if climate changes. In addition, sea level rise (land inundation, salination, etc.) could displace people from coastal cities and coastal agricultural communities and force them to migrate inland. In isolation, sea level rise impacts, industrial expansion, and impacts from water resource shifts may, perhaps, be manageable. Together, they could result in increased stress and competition on a decreasing amount of agricultural land. Of course, it is difficult to assign any degree of confidence to particular second order outcomes that result from dynamic interactions. However, using GIS systems to display a range of plausible outcomes provides the means to link isolated sectoral studies in an integrated manner. In order to do such an analysis, one would have to produce (at a desired level of aggregation—regional or national) specific sector assessments for agriculture, water resources, industrialisation and sea level rise. 
Overlaying and displaying maps would characterise the extent of possible interactions among the sectors and highlight problem areas. In addition, one would need to specify the mechanisms and rules through which sectoral impacts could interact, particularly through competition for scarce resources: water and land.
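The overlay step can be illustrated with a minimal raster sketch. Everything in it is hypothetical: the three small grids stand in for sector assessments (sea level rise, industrial expansion, water stress) on a common grid, a value of 1 flags a cell as affected, and the interaction rule is simply "two or more sectors affect the same cell". A real analysis would replace this counting rule with the explicit mechanisms of competition for land and water that the text calls for.

```python
# Hypothetical 3 x 4 rasters on a shared grid; 1 = cell flagged by the
# sector assessment, 0 = not flagged. Values are invented for illustration.
SEA_LEVEL_RISK = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
INDUSTRIAL_GROWTH = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]
WATER_STRESS = [
    [0, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
]


def overlay_count(*layers):
    """Cell-by-cell count of how many layers flag each cell."""
    rows, cols = len(layers[0]), len(layers[0][0])
    for layer in layers:
        if len(layer) != rows or any(len(row) != cols for row in layer):
            raise ValueError("all layers must share the same grid shape")
    return [
        [sum(layer[r][c] for layer in layers) for c in range(cols)]
        for r in range(rows)
    ]


def hotspots(count_grid, min_sectors=2):
    """(row, col) positions where at least min_sectors layers coincide."""
    return [
        (r, c)
        for r, row in enumerate(count_grid)
        for c, n in enumerate(row)
        if n >= min_sectors
    ]


if __name__ == "__main__":
    counts = overlay_count(SEA_LEVEL_RISK, INDUSTRIAL_GROWTH, WATER_STRESS)
    for row in counts:
        print(row)
    print("cells with interacting impacts:", hotspots(counts))
```

The count grid is the kind of layer one would display as a map; the list of hotspot cells marks where second order interactions are at least possible and where an explicit interaction rule would need to be applied.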
Analyses that test alternative mechanisms gleaned from the concerns of the different stakeholder groups would then provide a cognitively coherent picture of possible secondary impact outcomes. It is difficult to comprehend these interactions outside their actual spatial and temporal contexts. Hence, GIS is key to allowing such comprehension because it provides a platform that is the closest approximation to the “real” world. Clearly, this is an ambitious exercise, one that requires the use of consistent data sets, “reasonable” assumptions regarding societal dynamics and creative use of visualisation tools. It also brings together challenges faced by GIS researchers (displaying and visualising dynamics, dealing with uncertainty) with challenges that IA researchers face: incorporating stakeholder views and integrating social dynamics.
In summary, exploring the synergy between GIS and integrated assessment models will provide a better understanding of how the two can be effectively combined. GIS is a tool for data collection, manipulation and quality assurance, all of which are critical issues in the development of better integrated assessment models. Additionally, GIS systems help in integrating data from different sources with different data models and spatial and temporal resolutions. More importantly, GIS systems provide a platform for communicating and visualising information. Coupling GIS with integrated assessment models and associated decision support frameworks (e.g. multi-attribute utility techniques) might lead to the sorts of keen insights about global change impacts and regional policy measures that integrated assessments in their current form will be unable to provide.
REFERENCES
BROWN, D.G. 1995. Issues and Alternative Approaches for the Integration and Application of Societal and Environmental Data within a GIS, Working Paper No. 3, Rwanda Society Environment Project, Department of Geography, Michigan State University.
BRUNNER, R.D. 1996. Policy and global change research: a modest proposal, Climatic Change, 32, pp. 121–147.
COHEN, S.J. (Ed.) 1994. Mackenzie Basin Impact Study Interim Report No. 2. Downsview, Ontario: Environment Canada.
COHEN, S.J. 1996. Integrated assessment of long term climatic change, SDRI Discussion Paper, University of British Columbia, Vancouver, BC.
CROSSON, P.R. and ROSENBERG, N.J. 1993. An overview of the MINK study, Climatic Change, 24, pp. 159–173.
DARWIN, R. 1995. World Agriculture and Climate Change: Economic Adaptations, Economic Research Service, USDA. Agricultural Economic Report No. 703.
DIAMOND, P.A. and HAUSMAN, J.A. 1994. Contingent valuation: is some number better than no number? Journal of Economic Perspectives, 8(4), pp. 45–64.
DOWLATABADI, H. and MORGAN, G. 1993. A model framework for integrated studies of the climate problem, Energy Policy, 21(3), pp. 209–221.
EASWARAN, H. and VAN DER BERG, E. 1992. Impact of doubling CO2 on the length of the growing season, Pedologie, 42(3), pp. 289–296.
FISCHHOFF, B. 1991. Value elicitation: is there anything in there? American Psychologist, 46, pp. 835–847.
GOODCHILD, M.F. 1976. The Determinants of Land Capability. Land Directorate, Environment Canada, Ottawa.
JONES, S. 1995. Climate change information and environmental decision making in the Pacific Northwest, Unpublished Thesis Proposal, Department of Engineering and Public Policy, Carnegie Mellon University.
KANDLIKAR, M. and MORGAN, M.G. 1995. Addressing the human dimensions of global change: a multi-actor, multiple metric approach, Human Dimensions Quarterly, 1(3), pp. 13–16.
KANDLIKAR, M. and RISBEY, J. 1996. Uses and limitations of insights from integrated assessment modelling, in A Usage Guide to Integrated Assessment Models, Model Visualization and Analysis Project, CIESIN (http://www.ciesin.org).
KITTEL, T.G.F., OJIMA, D.S., SCHIMEL, D.S. et al. 1995. Model-GIS integration and dataset development for assessing the vulnerability of terrestrial ecosystems to climate change, in Goodchild, M., Steyaert, L., Parks, B., Crane, M., Maidment, D., Johnston, C. and Glendenning, S. (Eds.), GIS and Environmental Modelling: Progress and Research Issues. Fort Collins, CO: GIS World, Inc.
LAVE, L.B. and DOWLATABADI, H. 1993. Climate change policy: the effects of personal beliefs and scientific uncertainty, Environmental Science and Technology, 27(10), pp. 1962–1972.
MORGAN, M.G. and DOWLATABADI, H. 1996. Learning from integrated assessments of climatic change, Climatic Change, 34, pp. 337–368.
MORGAN, M.G. and HENRION, M. 1990. Uncertainty: A Guide to Dealing With Uncertainty in Quantitative Risk and Policy Analysis. New York: Cambridge University Press.
MOUNSEY, H.M. 1991. Multisource, multinational environmental GIS: lessons learnt from CORINE, in Maguire, D.J., Goodchild, M.F. and Rhind, D.W. (Eds.), Geographical Information Systems: Principles and Applications. London: Longman, pp. 185–200.
NORDHAUS, W.D. 1992. An optimal transition path for greenhouse gas emissions, Science, 258, pp. 1315–1319.
PARKS, B.O. 1993. The need for integration: environmental modeling with GIS, in Goodchild, M., Parks, B. and Steyaert, L. (Eds.), Environmental Modeling with GIS. New York: Oxford University Press, pp. 31–34.
PEARCE, D., FANKHAUSER, S., CLINE, W., TOL, R., VELLINGA, P., ACHANTA, R. and PACHAURI, R.K. 1996. Greenhouse damages and benefits of control, in Intergovernmental Panel on Climate Change: Working Group III Report. Cambridge: Cambridge University Press, pp. 183–219.
PECK, S.C. and TEISBERG, T.J. 1992. CETA: a model for carbon emissions trajectory assessment, The Energy Journal, 13(1), pp. 55–77.
RISBEY, J., KANDLIKAR, M. and PATWARDHAN, A. 1996. Assessing integrated assessments, Climatic Change, 34, pp. 369–395.
ROTMANS, J. and DOWLATABADI, H. 1998. Integrated assessments of climate change: evaluation of methods and strategies, in Rayner, S. and Malone, E. (Eds.), Human Choice and Climate Change: A State of the Art Report. Columbus: Battelle Press, pp. 292–377.
STEYAERT, L.T. 1993. Perspective on the state of environmental simulation modeling, in Goodchild, M.F., Parks, B.O. and Steyaert, L.T. (Eds.), Environmental Modeling with GIS. New York: Oxford University Press, pp. 16–30.
VEMAP Members. 1995. Vegetation/ecosystem modeling and analysis project (VEMAP): comparing biogeography and biogeochemistry models in a continental-scale study of terrestrial ecosystem responses to climate change and CO2 doubling, Global Biogeochemical Cycles, 9(4), pp. 407–437.
WEYANT, J. 1995. Integrated assessments of climate change: an overview and comparison of approaches and results, in Intergovernmental Panel on Climate Change, Working Group III Report. Cambridge: Cambridge University Press, pp. 371–393.
WHEELER, D.J. 1993. Linking environmental models with geographic information systems for global change research, Photogrammetric Engineering & Remote Sensing, 59(10), pp. 1497–1501.
Part Four METHODOLOGICAL ISSUES
Chapter Twenty Seven
Spatial and Temporal Change in Spatial Socio-Economic Units
Jonathan Raper
27.1 INTRODUCTION

In recent GIS research there has been an increasing concern with the fundamental issues of spatial and temporal representation. As the functionality of commercial software systems for handling static georeferenced geometric data has developed rapidly, researchers have in parallel begun to examine the mappings between the entities identified in the world of practical social and environmental problem solving and the representational devices used in GIS. This new research, perhaps most clearly initiated in Mark and Frank (1991), has sought to identify spatial representations that can more fully handle the richness of spatial and temporal variation in these domains. In general this work has focused on problems where, although current representations fall short in some way, specific benefits of spatial representations can be identified. Such potential can be identified in the applications of route finding, environmental management and the analysis of datasets relating to spatial socio-economic units. Of necessity, research in this new field has drawn from a variety of other disciplines, including philosophy, physics, cognitive science, psychology, linguistics, artificial intelligence, computer science and sociology, and has become definitively interdisciplinary (Raper, 1996).

This chapter will focus on the efforts that have been made to develop richer spatial representations for the analysis of spatial socio-economic units (SSEUs). Most human societies find it necessary and desirable to identify temporally persistent units of geographic space by establishing boundaries for areas of significance or by aggregating entities such as people or buildings into spatial clusters. It is suggested here that such defined “spatial units” can be classified according to their characteristics on three scales:

• purposes ranging from the symbolic (e.g. religious) to the instrumental (e.g. governmental);
• lifespans ranging from the transient (short lived) to the permanent (long lived); and
• spatial identities ranging from the diffuse (vaguely defined) to the concrete (sharply defined).

At one end of a spectrum of possible spatial units, communities and social groups tend to create, informally, symbolic spatial units which are transient and diffuse in nature, for example neighbourhoods (defined by identity), favourite vistas (defined by concepts of landscape) or perceptions of “dangerous places-to-be” (defined by social behaviour). By contrast, at the other end of the spectrum governments and commerce tend to define instrumental spatial units using structured approaches which are largely permanent and concrete in nature (though not necessarily completely unchanging). Such spatial units are made, for
Figure 27.1: A preliminary typology of spatial socio-economic units
example, to control access to space (defined by ownership), to limit the uses of space (defined by jurisdiction), to distribute resources efficiently (defined by supply and demand locations) or to characterise localities (defined by a socio-economic classification). The definition and the contesting of these “spatial units” is an important social and political process and involves communities, government and commerce through such means as governance, spatial planning, service delivery or property trading. Fundamental to the process of making and remaking such “spatial units” is the need to communicate the status of the units widely, and hence, the need to describe and publish their nature and extent. In order to debate and realise “spatial units” (especially where governance is concerned) some form of representation of space is desirable: while symbolic, transient and diffuse units may employ oral descriptions referenced to landmarks or street furniture, the instrumental, permanent and concrete units have traditionally been defined in terms of boundaries which are attached to physical objects identifiable on the ground and subject to a wide consensus. Such boundaries may be defined on “official” maps in some jurisdictions. A preliminary spatial typology of such units is given in Figure 27.1, defined firstly by their tendency to exhaust space and secondly by their tendency to overlap. 
In this classification: box 1 is non-overlapping/space exhausting (corresponding to cadastral/census); box 2 is non-overlapping/non-space exhausting (corresponding to various service areas when not all of space is served); box 3 is overlapping/non-space exhausting (corresponding to areas which interpenetrate but where every location must belong to one spatial unit as in travel-to-work areas); and box 4 is space exhausting but overlapping (corresponding to various kinds of responsibility zones such as emergency services where overlaps are desirable but there must never be unexhausted space). However, the creation of all these kinds of spatial unit requires the use of a basic assumption, i.e. that some defining “condition” can be held constant over space and over some span of time. This assumption is in itself highly problematic as many such defining conditions can interpenetrate spatially and/or temporally. In the case of symbolic spatial units which are transient and diffuse in nature, individuals may define their identity or adapt their behaviour relative to each particular “condition” and such interpenetration is normal. In the case of instrumental spatial units the need for universality and consensus on their extent tends to make them permanent and concrete in nature, although commerce may make and remake spatial units for marketing both frequently and without reference to those outside the company. In terms of their spatial character two broad cases can be identified: in one case the “condition” can be a restriction on access or
activity within a geographic area defined physically with reference to the world of entities; in the other case the “condition” refers to the characteristics of the people or property defining the spatial unit, in other words the characteristics of an aggregation. The “holding constant” of the “condition” is usually defined by reference to scales of social, economic or political activity significant to a particular society.

In certain cases the “spatial units” that have been defined have become incorporated in the legislation of nation states, usually those that are instrumental in nature. The best examples are: cadastral systems of land ownership and legal jurisdictions (where activities are restricted within a spatial unit which is defined physically); and postal codes, electoral districts or resource delivery zones (where spatial aggregations are characterised). These “instrumental” spatial units can be called “spatial socio-economic units” (SSEUs) since they are geographic expressions of specific and tangible socio-economic uses of space. Since SSEUs must be communicated throughout society they have been widely “commodified” by government and commerce and are often communicated by paper/digital maps available from designated suppliers. However, commodification implies the creation of units that are unambiguous, especially in geographic terms. As a consequence most SSEUs have been implemented as concrete and permanent spatial units using defining “conditions” subject to some strong form of social consensus (e.g. land ownership, mail delivery address). The traditional (and structured) method of creating SSEUs has been to set design criteria, consider alternative scenarios, draw sharp boundaries on paper maps and update them periodically when change has taken place in the defining “condition” (Raper et al., 1992).
The arrival of geographic information systems (GIS) has made this process easier to carry out; however, it has also opened up the possibility that some of the symbolic, transient and diffuse spatial units in existence (such as no-go areas, favourite views or hazard zones) might in future be recorded and studied (Monmonier, 1997). There are few integrated research studies on spatial units in the context of the wide availability of GIS (Nunes, 1991). The European Science Foundation (ESF) GISDATA programme meeting on “Formalising change in SSEUs” offered a first opportunity to examine some of the questions posed by “spatial units” in general and the instrumental, permanent and concrete kind in particular (Frank et al., 1998). This chapter offers a personal view on the issues raised at the meeting held in Nafplion, Greece (22–25 May 1996) along with an attempt to develop a scheme for the study of temporal change in SSEUs. The issues raised at the meeting are synthesised here under the following headings: philosophical questions about the status of “spatial units”; the social justification of spatially and temporally discrete “spatial units”; methodologies for the design, creation and maintenance of SSEUs; the analysis of spatial and temporal series of data related to SSEUs; and spatial and temporal change in the boundaries and position of SSEUs.

27.2 PHILOSOPHICAL QUESTIONS ABOUT THE STATUS OF “SPATIAL UNITS”

Over the last few years philosophers have begun to consider the question of spatial and temporal reasoning afresh. One particular concern has been with the nature of entities in geographic space and whether they have any ontologically unique properties. Several views have been expressed on this question. Casati (1998) has been concerned with entities which are abstract in their defining “condition” stating that they are “causally sustained by process, but lack physical reality” and gives the “nation” as an example.
Casati (1998) defines three kinds of abstract entity: things which share parts; things which are topologically connected with each other; and separate things which can be “at” each other and yet seem part of them. By contrast Smith (1995) was concerned to distinguish two kinds of explicitly geographic objects which have “the identity of existence”. Firstly, he defined “bona fide” objects as “discontinuities in physical reality”, and secondly, he defined all other objects as “fiat” objects which are created by human whim. Couclelis
(1996) proposes another scheme suggesting that spatial units can be: intrinsically well bounded or not; dependent on whether the representation used produces well bounded units or not; and dependent on whether the user requires well bounded units or not. Frank (1998), discussing the work of Cohn and co-workers (Cohn, 1995), applies the methods of mereology to the study of parts and wholes of geographic objects. He questions whether human concepts of space at the “table-top” scale apply also at the scale of “spatial units”. Eschenbach (1998) discusses the ontology of space employed by spatial units, noting that SSEUs require absolute geographic referencing in order to define their shape and location. She points out ways in which the identity and mereology of instrumental SSEUs are frequently ambiguous and complex, as in the case of the changing definitions of Berlin and “Germany’s capital” during the twentieth century.

In general it might be stated that there is a gulf in the theory of spatial units between those defined above as SSEUs (since they are largely instrumental, permanent and concrete) and the wider and less well studied category of symbolic, transient and diffuse spatial units. The difficulty appears to lie in defining an “identity criterion” for spatial units which covers both boundary drawing and aggregation processes. One preliminary attempt at such a definition which can be suggested here is that a spatial unit is a “concept of space that reproduces the same understanding or behaviour regarding the spatial unit in a large heterogeneous group of people”. Using this approach, stronger definitions of spatial units (e.g. SSEUs) can be made when the mutual understanding of a spatial unit is reinforced by more explicit concepts of space. These might include either widely used spatial representations (e.g. maps) or strong forms of consensus about physical features.
27.3 SOCIAL JUSTIFICATION OF SPATIALLY AND TEMPORALLY DISCRETE SPATIAL UNITS

Persistent units of space or aggregations of clustered entities appear to exist because they are conceived of and sustained by people living in structured societies and because human beings appear to reproduce their experiences of small scale space at larger scales. Hence, on the one hand instrumental spatial units (SSEUs) affect people’s lives by affecting their residency, entitlements and recourse to the law within crisply defined zones, while on the other hand symbolic spatial units may induce conformity to/membership of some collectively held belief in a more diffuse region. As a consequence it is clear that spatial units of all kinds are “socially produced” by diverse forms of social relations both formal and informal.

Geographical research has developed a number of theories of spatial relations in the twentieth century, beginning with positivistic models of the use of space such as the agricultural zonation models of von Thünen that related the accessibility of farms to markets for their produce. In the last 20 years the predominant theorising in the discipline of Geography (Johnston, 1991) has become anti-positivistic by seeking to contradict the view that some kind of universal physical geographic reality is deterministic with respect to human behaviour (Raper, 1996). From this more recent perspective space is seen as socially structured through non-spatial networks and socially produced by economic, political and social transactions (Smith, 1979). In this broadly humanistic view “spatial units” have no explanatory power per se in studying variations of the behaviour or status of populations. This view therefore suggests that SSEUs exist despite social patterns of behaviour rather than because of them. Aitken (1998) points out that space is not transparent: not everything seen is meaningful, and not everything meaningful is seen.
In other words the physical boundaries of the SSEU may not induce dependent social responses whereas social interactions may directly create spatial differentiation across space and time (Livingstone and Harrison, 1980).
Yet clearly SSEUs do exist as explicit social constructs when, for example, social behaviour is adapted to the physical constraints of boundaries because of the changes in property taxes between SSEUs. Similarly, social behaviour is adapted to inclusion in the “membership” of particular SSEUs such as electoral constituencies which have been defined as spatial aggregations of people. Openshaw and Alvanides (1998) argue that SSEUs are made necessary by the fundamental scale limitations of visualising collective social characteristics and privacy considerations. Salvemini (1998) gave an example of the problems of accommodating illegal construction without planning permission within a set of SSEU boundaries and showed how such illegal action could force the redesign of instrumental SSEUs. Hence, SSEUs must be both monitored as they are and made more constitutive of social process, for example through the incorporation of spatio-temporal change or hierarchical structuring (Aitken, 1998). Raper (1998) suggests that some forms of social behaviour can be regionalised (echoing Giddens, 1984) and suggests the use of GIS in seeking space-time conjunctions.

27.4 METHODOLOGIES FOR THE DESIGN, CREATION AND MAINTENANCE OF SSEUS

Most developed countries have found the creation and maintenance of SSEUs to be essential for at least postal and census purposes. In many of these countries the modern SSEUs have been based on historical regions such as the French Departments or communes or they have been created by dividing them. However, historical regions have often been overwhelmed by recent population growth or changes in the shape and extent of cities making it necessary to create new SSEUs or redesign the old ones. Reis (1998) discusses the methods available to construct SSEUs from geometric primitives such as those describing buildings or roads that are available as the descriptions of settled landscapes.
He reviews the range of possible transformations from points, lines, (other) areas and surfaces to areal SSEUs and assesses the methodologies available. He considers the special problems of creating areal SSEUs when there are only linear sources such as postal delivery routes or street centrelines (as in Portugal). The need to exhaust space when creating such units adds difficult constraints to most computational procedures.

By contrast Openshaw and Alvanides (1998) discuss the problems of designing sets of SSEUs for the purpose of socio-economic analysis. As SSEUs are discrete in nature (since continuous representation of discrete people or aggregations of people is impossible) and must be areal geometrically, the biggest SSEU design challenge is the modifiable areal unit problem (MAUP). Openshaw and Alvanides (1998) suggest three strategies for dealing with the MAUP: first, no (re)aggregation of the data at all (use all the data on all the SSEUs at the finest level); second, explicitly re-engineer units as needed, justifying the exercise; or third, create units using frame-free methods. Since the first and the third are usually inappropriate, Openshaw and Alvanides (1998) argue that the explicit engineering of units is acceptable when such methods are transparent to users. In this context zone design is a pattern detector: Openshaw and Alvanides (1998) characterised resistance to the “re-engineering” approach as “zoneism” (the unreasonable use of traditional zones)!
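The sensitivity of aggregate results to zone design, which underlies the MAUP, can be illustrated with a toy calculation. The sketch below is not taken from Openshaw and Alvanides (1998): the records, the `aggregate` helper and both zonings are hypothetical, and the study area is reduced to a one-dimensional transect purely to keep the example short.

```python
# Hypothetical individual-level records along a transect:
# 1 = person has some attribute of interest, 0 = does not.
individuals = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0]


def aggregate(values, zone_starts):
    """Return the proportion of 1s in each zone.

    zone_starts lists the index at which each zone begins; zones are
    contiguous, non-overlapping and together cover every record.
    """
    edges = list(zone_starts) + [len(values)]
    return [sum(values[a:b]) / (b - a) for a, b in zip(edges[:-1], edges[1:])]


# Two ways of cutting the same 12 records into 3 zones (assumed for illustration).
zoning_a = [0, 4, 8]  # zone sizes 4, 4, 4
zoning_b = [0, 3, 6]  # zone sizes 3, 3, 6

rates_a = aggregate(individuals, zoning_a)
rates_b = aggregate(individuals, zoning_b)

print("zoning A:", rates_a)
print("zoning B:", [round(r, 2) for r in rates_b])
```

Under zoning A the attribute appears to decline steadily along the transect, whereas under zoning B the first two zones are sharply contrasted. The underlying records are identical in both cases; only the zone boundaries differ.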
27.5 THE ANALYSIS OF SPATIAL AND TEMPORAL SERIES OF DATA RELATED TO SSEUS

Once SSEUs are defined, a record of changes to their boundaries must be kept if the records which are linked to them are to be analysed through time. In a number of cases extremely long records of social change can be studied from such boundary change records, as in Pred’s study of land parcel reorganisations in southern Sweden (Pred, 1986). Ryssevik (1998) considered the problems of analysing time series of national socio-economic datasets collected for SSEUs in Norway which change in shape and position through time. He discusses the standardisation of the units through time using areal interpolation. Gautier (1998) discussed the dynamics of land use change in rural areas using the “landscape element” which was defined as “the smallest spatial unit with a uniform structure and function”. Gautier (1998) described three methods to study their change through time and space: firstly, the “transition-state” method which focuses on the mapping of successive states of the landscape using aerial photography; secondly, the “historical land-use events” method which aims to discover the events causing the change of landscape elements; and, thirdly, the “functional model” method which aims to link function matrices to evolution scenarios. By forming a two by two matrix with spatial/non-spatial change on one axis and state succession/dynamics on the other, Gautier (1998) is able to classify change in landscape elements into: land use at a date; unit shape at a time; function at a date; and chronology of events at spatial units.

27.6 SPATIAL AND TEMPORAL CHANGE IN THE BOUNDARIES AND POSITION OF SSEUS

Given the range of applications described above for SSEUs there are clearly a variety of implementational issues to resolve in the storage and analysis of SSEUs in databases. Frank (1998) characterises the “life and motion” of SSEUs and presents a preliminary typology.
He suggests a general methodology of analysis, beginning with prototypical situations and moving to more complex and realistic ontologies. Kavouras (1998) suggests that SSEUs undergo change of: location, shape, topology, motion type, order/rank and nature. Several attempts have been made to specify the concepts and implementations required for the handling of SSEUs in GIS and databases. Libourel (1998) studies the requirements that spatio-temporal change places on database design and argues that the definition of the object (SSEU) identity must be the key design principle at the schema level. Cheylan (1998) proposes a classification of spatial unit dynamics by forming a two by two matrix with spatially constrained/unconstrained units on one axis and permanent identifier/temporalised unit identifier on the other. This classification generates four forms of spatial unit dynamics:

1. spatially constrained, permanent identity;
2. spatially unconstrained with a permanent identity;
3. spatially constrained, though varying identity through split/merge through time; and
4. spatially unconstrained, temporalised identity where units completely change.
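One way to read this two by two classification is as a lookup keyed on two yes/no properties of a unit. The following is a minimal sketch in Python, not an implementation proposed by Cheylan (1998); the names `SpatialUnitDynamics` and `classify` are hypothetical and chosen only for illustration.

```python
from enum import Enum


class SpatialUnitDynamics(Enum):
    """The four forms of spatial unit dynamics, in the order listed above."""
    CONSTRAINED_PERMANENT = 1
    UNCONSTRAINED_PERMANENT = 2
    CONSTRAINED_TEMPORALISED = 3
    UNCONSTRAINED_TEMPORALISED = 4


def classify(spatially_constrained: bool, permanent_identifier: bool) -> SpatialUnitDynamics:
    """Map the two axes of the matrix onto one of the four forms."""
    if permanent_identifier:
        return (SpatialUnitDynamics.CONSTRAINED_PERMANENT
                if spatially_constrained
                else SpatialUnitDynamics.UNCONSTRAINED_PERMANENT)
    return (SpatialUnitDynamics.CONSTRAINED_TEMPORALISED
            if spatially_constrained
            else SpatialUnitDynamics.UNCONSTRAINED_TEMPORALISED)


# Hypothetical example: a unit hemmed in by its neighbours whose identifier
# changes whenever it is split or merged.
form = classify(spatially_constrained=True, permanent_identifier=False)
print(form.value, form.name)
```

Making the two axes explicit in this way keeps the identity question (permanent or temporalised identifier) separate from the geometric one (constrained or unconstrained), which is the distinction the matrix is built on.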
Cheylan points out that GIS handle only cases 1 and 2 satisfactorily at present. Worboys (1998) attempts a preliminary model of spatio-temporal change in an object-oriented context. In this scheme the object model must be extended to support events with attributes, event-event relationships,
processes which are temporally composite events and the integration of these concepts with spatial referencing. This implies that progress towards a model of spatio-temporal change requires a richer model of change itself.

27.7 PRELIMINARY RESEARCH HYPOTHESES

Further research in spatial units in general and SSEUs in particular seems overdue. Current GIS operations are predicated on a concept-poor ontology of spatial units which treats them as permanent and concrete. As a consequence of this lack of theory there are few if any methods for analysing relations between spatial units. Of particular current concern is that few approaches genuinely address the spatio-temporal identity of SSEUs despite their growing importance in governance and commercial activity. Some simple observations can be made about the potential for change in the types of spatial units described in Figure 27.1. The spatio-temporal behaviour for each of the boxes seems limited to the following:

1 = Split/merge/move, constrained by adjacent units
2 = Birth/death/split/merge/move/grow (not constrained)/shrink (not constrained)
3 = Birth/death/split/move/grow/shrink
4 = Split/merge/move, constrained by adjacent units

Current analysis and GIS implementations are dominated by attention to box 1 type SSEUs to the detriment of all the other types. Further work on such spatio-temporal ontologies for spatial units is now required to progress this field.

ACKNOWLEDGEMENTS

I am grateful to Andrew Frank and Roberto Casati for their helpful comments on this chapter and to all of the participants of the European Science Foundation GISDATA programme meeting on “Formalising change in SSEUs” in Nafplion, Greece (22–25 May 1996) for stimulating conversations that greatly helped in the development of the ideas in this chapter.

REFERENCES

AITKEN, S. 1998. Critically assessing change: contriving space-time relations and constraining people, in Frank, A.U., Raper, J.F. and Cheylan, J-P (Eds.), Formalising Change in Spatial Socio-Economic Units. London: Taylor & Francis.
CASATI, R. 1998. On the structure of shadows, in Frank, A.U., Raper, J.F. and Cheylan, J-P (Eds.), Formalising Change in Spatial Socio-Economic Units. London: Taylor & Francis.
CHEYLAN, J-P. 1998. Time and spatial database: towards a conceptual application and framework, in Frank, A.U., Raper, J.F. and Cheylan, J-P (Eds.), Formalising Change in Spatial Socio-Economic Units. London: Taylor & Francis.
COHN, A.G. 1995. The challenge of qualitative spatial reasoning, Computing Surveys, 27(3), pp. 323–327.
COUCLELIS, H. 1996. A typology of geographic entities with ill-defined boundaries, in Burrough, P. and Frank, A.U. (Eds.), Geographic Objects with Indeterminate Boundaries. London: Taylor & Francis, pp. 45–56.
ESCHENBACH, C. 1998. On identity and location of socio-economic units, in Frank, A.U., Raper, J.F. and Cheylan, J-P (Eds.), Formalising Change in Spatial Socio-Economic Units. London: Taylor & Francis.
FRANK, A.U. 1998. Socio-economic units: their life and motion, in Frank, A.U., Raper, J.F. and Cheylan, J-P (Eds.), Formalising Change in Spatial Socio-Economic Units. London: Taylor & Francis.
FRANK, A.U., RAPER, J.F. and CHEYLAN, J-P (Eds.) 1998. Formalising Change in Spatial Socio-Economic Units. London: Taylor & Francis.
GAUTIER, D. 1998. Review and integration of three methods for the spatio-temporal analysis of the rural land-use dynamics, in Frank, A.U., Raper, J.F. and Cheylan, J-P (Eds.), Formalising Change in Spatial Socio-Economic Units. London: Taylor & Francis.
GIDDENS, A. 1984. The Constitution of Society. Oxford: Polity Press.
JOHNSTON, R.J. 1991. Geography and Geographers: Anglo-American Human Geography Since 1945. London: Edward Arnold.
KAVOURAS, K. 1998. Understanding and modelling spatial change, in Frank, A.U., Raper, J.F. and Cheylan, J-P (Eds.), Formalising Change in Spatial Socio-Economic Units. London: Taylor & Francis.
LIBOUREL, T. 1998. How do databases perform change? in Frank, A.U., Raper, J.F. and Cheylan, J-P (Eds.), Formalising Change in Spatial Socio-Economic Units. London: Taylor & Francis.
LIVINGSTONE, D.N. and HARRISON, R.T. 1980. The frontier: metaphor, myth and model, Professional Geographer, 32(2), pp. 127–32.
MARK, D.M. and FRANK, A.U. (Eds.) 1991. Cognitive and Linguistic Aspects of Geographic Space. NATO ASI Series D 63, Las Navas del Marqués, Spain, 8–20 July, 1990. Dordrecht: Kluwer.
MONMONIER, M. 1997. Cartographies of Danger: Mapping Hazards in America. Chicago: University of Chicago Press.
NUNES, J. 1991. Geographic space as a set of concrete geographic entities, in Mark, D.M. and Frank, A.U. (Eds.), Cognitive and Linguistic Aspects of Geographic Space. NATO ASI Series D 63, Las Navas del Marqués, Spain, 8–20 July, 1990. Dordrecht: Kluwer, pp. 9–33.
OPENSHAW, S. and ALVANIDES, S. 1998. Designing zoning systems for representation of socio-economic data, in Frank, A.U., Raper, J.F. and Cheylan, J-P (Eds.), Formalising Change in Spatial Socio-Economic Units. London: Taylor & Francis.
PRED, A. 1986. Place, Practice and Structure: Social and Spatial Transformation in S. Sweden 1750–1850. Cambridge: Polity Press.
RAPER, J.F. 1996. Unsolved problems of spatial representation, in Kraak, M-J and Molenaar, M. (Eds.), Advances in GIS Research 2 (7th International Symposium on Spatial Data Handling, Delft, Netherlands, 12–16 August 1996). Delft: International Geographical Union, pp. 14.1–11.
RAPER, J.F. 1998. Spatio-temporal change in spatial socio-economic units, in Frank, A.U., Raper, J.F. and Cheylan, J-P (Eds.), Formalising Change in Spatial Socio-Economic Units. London: Taylor & Francis.
RAPER, J.F., RHIND, D.W. and SHEPHERD, J. 1992. Postcodes: the New Geography. London: Longman.
REIS, R. 1998. On the creation of socio-economic units from linear digital spatial data, in Frank, A.U., Raper, J.F. and Cheylan, J-P (Eds.), Formalising Change in Spatial Socio-Economic Units. London: Taylor & Francis.
RYSSEVIK, J. 1998. Dealing with boundary changes when analysing long-term relationships on aggregate data, in Frank, A.U., Raper, J.F. and Cheylan, J-P (Eds.), Formalising Change in Spatial Socio-Economic Units. London: Taylor & Francis.
SALVEMINI, M. 1998. Elementary socio economic units and city planning: limits and future developments, in Frank, A.U., Raper, J.F. and Cheylan, J-P (Eds.), Formalising Change in Spatial Socio-Economic Units. London: Taylor & Francis.
SMITH, B. 1995. On drawing lines on a map, in Frank, A.U. and Kuhn, W. (Eds.), Spatial Information Theory: A Theoretical Basis for GIS (Lecture Notes in Computer Science 988). Berlin/Heidelberg/New York: Springer, pp. 475–484.
SMITH, N. 1979. Geography, science and post-positivist modes of explanation, Progress in Human Geography, 3, pp. 365–83.
WORBOYS, M. 1998. An object-oriented model of motion and spatial change, in Frank, A.U., Raper, J.F. and Cheylan, J-P (Eds.), Formalising Change in Spatial Socio-Economic Units. London: Taylor & Francis.
Chapter Twenty Eight
Spatio-Temporal Geostatistical Kriging
Eric Miller
28.1 INTRODUCTION

Natural phenomena, such as those found in hydrology, geology, meteorology and oceanography, are inherently four-dimensional in nature, having characteristics represented in both space and time. A rising awareness of the complexities of natural phenomena is providing an impetus to understand spatial and temporal patterns associated with these processes (Miller, 1997). Effective analysis of these patterns requires several components including an accurate understanding of the given processes, an integrated set of tools for handling and organising large volumes of spatially and temporally referenced information, and detailed and accurate sampled observations. A formidable obstacle that stands in the way of this analysis, however, is the unfortunate realisation that, to date, none of these components fully exists.

With regard to an accurate understanding of natural processes, Journel (1996) suggests that the extraordinary success of physics in the explanation of environmental factors has led to a certain arrogance in the earth sciences, namely the belief that with these laws and increased sampled observations all natural processes could be accurately predicted. Macroscopic laws, Journel continues, are most often experimental and shift the uncertainty of the process to residual terms or “fudge-factors” later to be evaluated on an ad-hoc basis.

Geographic information systems (GIS) provide an integrated set of tools for handling and analysing large volumes of spatially referenced data and potentially provide a useful framework for analysing complex space-time processes. The current limitations of these systems, however, specifically with respect to the deficiencies of managing and analysing volumetric and temporal information, compromise their potential effectiveness for the analysis of space-time processes.
These limitations and the importance of such systems are evident in a growing corpus of literature on this area of research including Raper and Kelt (1991), Turner (1992), Langran (1992), Peuquet (1994), Mason et al. (1994), O’Conaill et al. (1994), Mitasova et al. (1995) and Miller (1997).

Analysis of space-time patterns additionally requires detailed spatial and temporal observations. These observations, unfortunately, are generally difficult to collect at scales sufficient to represent and understand complex natural phenomena. The data that are observed originate from many different measurement devices with varying precision, and are taken over different volumes at differing temporal intervals (Journel, 1996). Typically, however, only a limited number of samples in both the spatial and temporal dimensions are available to assess complex, dynamic processes whose state variables change with respect to both time and space (Woldt and Bogardi, 1992).

The reality of these obstacles, however, does not excuse the analyst from attempting to understand complex spatial and temporal patterns associated with natural processes. The ability to interpolate in both
space and time is assumed to be a necessary requirement for overcoming these obstacles and is the general focus of this research. Specifically, this research is concerned with the theoretical development of spatio-temporal geostatistical kriging, an extension to regionalised variable theory, as a method for estimating unsampled values and variances in both space and time. This research additionally focuses on issues associated with the effective exploratory analysis of multi-dimensional data for the determination of the spatio-temporal models of continuity necessary for the geostatistical process.
28.2 GEOSTATISTICS
Very few earth science processes are understood well enough to permit deterministic modelling (Isaaks and Srivastava, 1989; Journel, 1996). Whereas the observed data may seem to exhibit continuity from point to point, the fundamental physical and chemical processes are generally so complex that they cannot be described by a tractable deterministic function. When this is the case, the observed data may be viewed as a regionalised variable (Isaaks and Srivastava, 1989). Thus there is a degree of process uncertainty about how the phenomenon in question behaves between sampled observations. In this case, the available sample data are viewed as the result of some random function which reflects this process uncertainty (Isaaks and Srivastava, 1989). The random function recognises the existence of fundamental uncertainties concerning complex processes and provides the analyst with tools for estimating values at unknown locations based on assumptions made concerning statistical characteristics of the phenomenon (Isaaks and Srivastava, 1989).
Traditional statistics are generally based on independent variables, which assume no continuity between observations and allow for no extension of each data value. In the earth sciences, however, there often exists continuity between spatially “near” observations.
Tobler (1979) observes, for example, that two data values close to each other in space are more likely to have similar values than two data values further apart. The recognition and analysis of these spatial correlations and the statistical approach to the estimation of attribute values at unsampled locations is often referred to as geostatistics.
The geostatistical process is generally a two-step, often iterative procedure. The first step is the calculation of the experimental variogram and the fitting of a model to it. The variogram function describes the correlation between the sampled observations. Models of correlation can be seen as probabilistic complements for the many complex interdependent deterministic processes that relate attribute values (Journel, 1996). The next step is to use the correlational relationships between observations to solve for the cell, block or point estimates at unsampled locations. In this research the technique used for estimating unsampled locations is called kriging and will be discussed later in this chapter. In geostatistics, the correlational relationships are statistical in nature and thus account not only for data uncertainty but also for the uncertainty in the relationships themselves (Journel, 1996).
Geostatistical analysis is not a black box. The iterative component of the geostatistical process arises from the analyst’s assessment (and perhaps refinement) of the “goodness” associated with the predicted values and their corresponding variances, which are directly influenced by the correlation functions.
28.3 THE SPATIO-TEMPORAL VARIOGRAM
One of the oldest methods of defining space dependency between neighbouring observations is through autocorrelation (Vieira et al., 1983). When the neighbouring observations are distributed n-dimensionally, n-dimensional autocovariance functions may be used to ascertain the spatial dependency (Vieira et al., 1983).
However, when observed samples are not contiguous and interpolation between measurements is needed, a more adequate tool is required to measure the correlation between measurements and the spatial continuity of the random function. This tool is the variogram, defined as:
\[ \gamma(h) = \tfrac{1}{2}\, E\left[ \left( Z(v_i) - Z(v_i + h) \right)^2 \right] \tag{28.1} \]
in which E is defined to be the expected value of the set of random variables Z(vi) which are separated by a distance h (Journel and Huijbregts, 1978). With any single sample observation, all that is known about the random function Z(vi) is one realisation. If estimated values are required for unsampled locations, an intrinsic assumption concerning the random function is needed. The intrinsic assumption requires the constraint of stationarity over the random function. A random function is defined as stationary if all pairs of random variables separated by a particular distance h, regardless of their location, have the same joint probability distribution (Isaaks and Srivastava, 1989). A random function may be considered intrinsic if:
\[ E\left[ Z(v_i) \right] = m \quad \text{for all } v_i \in S \tag{28.2} \]
where E is defined to be the expected value, equal to a constant m for all samples inside of S. In geostatistical practice the adoption of the variogram function to represent the stationary random function satisfies the intrinsic hypothesis (Isaaks and Srivastava, 1989).
Traditionally, the variogram function represents the stationary random function in a two-dimensional framework. Geostatistical analysis of a four-dimensional phenomenon, however, requires the extension of traditional geostatistical methods to include the third (vertical) and fourth (temporal) dimensions. The typical method used to extend two-dimensional geostatistical methods into the third dimension is to broaden the intrinsic hypothesis to include the vertical dimension (Journel and Huijbregts, 1978). Similar methods are used in this investigation to extend geostatistical methods to include time.
Temporal extensions to geostatistical methods, however, must first take the special properties of space and time into account, and these representations of space and time can be viewed in many ways (Peuquet, 1994). Views on space and time can be classified into what have historically been termed absolute and relative (Hawking, 1988). Peuquet (1994) defines absolute space-time as objective: space and time metrics are fixed and attributes are measured. Relative space-time is defined as subjective: attributes are fixed and space and time are measured. In this research, space and time are viewed in terms of the absolute and objective. It is assumed that the physical process is embedded in 4-D Euclidean space with the fourth dimension, time, being both continuous and orthogonal to the three spatial dimensions (Pigot and Hazelton, 1992). Given this, the random function Z*(vi) may then be considered intrinsic in both space and time if vi in Equation 28.2 is now defined as a location in Cartesian four-space: [xi, yi, zi, ti].
The spatio-temporal variogram may now be expressed identically to Equation 28.1; however, the separation distance hij is now calculated between any two observations defined in four-space. An example of this separation distance is shown in Figure 28.1. Like the traditional variogram, the four-dimensional spatio-temporal variogram can then be approximated by half the average squared difference between the paired data values:
\[ \gamma(h) = \frac{1}{2N(h)} \sum_{(i,j)\,:\,h_{ij} = h} \left( v_i - v_j \right)^2 \tag{28.3} \]
where N(h) is the number of paired values, vi and vj, whose corresponding hij vector equals the lag vector h. The plot of γ(h) versus h for a set of particular lag distances represents the variogram, and thus the continuity, of the particular process. With the assumption of the intrinsic hypothesis, the spatio-temporal variogram is independent with respect to any spatial or temporal location and consequently is dependent only on the distance or difference
Figure 28.1: The separation distance vector hij between two sampled observations referenced in an absolute space and time framework.
in separation between data points. “Distance” in this context is used rather loosely when dealing with spatially and temporally referenced data. A point may be sampled at time t0 and then again at time t1. There is no spatial difference in the sampled locations; the distance in this case is defined as the difference in the observed values over time. In this investigation, time is considered a direct and measurable extension to space. Thus, the variogram indicates over what distance, and to what extent, values at a given point influence values at adjacent points; and conversely therefore, how close together points must be for a value at one point to be capable of predicting an unknown value at another (Rock, 1988). With the additional representation of the temporal component, the variogram now indicates how close together sampled points referenced in both space and time must be for predicting unsampled spatial and temporal locations.
28.3.1 Multi-Dimensional Issues
An assumption throughout this discussion of the variogram calculations so far has been that natural processes occur in isotropic space and that the corresponding observed data, albeit sparse, are evenly distributed in all sampled dimensions. These assumptions, unfortunately, are often not true. Directional characteristics are often prevalent in natural processes. Also, sampling techniques often result in irregularly spaced sampled data. Multi-port well samples of subsurface water quality, for example, provide data that are distributed more densely in the vertical dimension than in the horizontal dimension. Any geostatistical analysis attempting to assess the spatial and temporal variability of the data can be extremely difficult if there are not enough samples available. This is so because each variogram value requires a minimum number of sample
pairs for the variogram function to be stable and statistically significant (Journel and Huijbregts, 1978). The problems associated with the inference of statistical moments have been discussed by several authors (Matérn, 1960; Matheron, 1970; Myers, 1989; Rossi and Posa, 1990). Analysis that requires statistical inference will suffer from sparse information. These issues define an unfortunate dilemma: earth science observations are generally sparse, and sparse data potentially compromise the accuracy of geostatistical analysis. In order to address (but by no means solve) this dilemma, several issues concerning the nature of the data must be incorporated into the geostatistical analysis.
The analysis of the spatial and temporal continuity evident in a data set is often a frustrating and difficult process. Multi-dimensional data, sparse data samplings, statistically biased sampled data and irregular or anisotropic sampling locations are all problematic in assessing statistical correlation between sampled values. To facilitate this analysis, this investigation has developed several tools for interactively visualising and analysing multi-dimensional data in the attempt to ascertain dimensional continuity better. The details of these tools are outside the scope of this chapter and can be found in Miller (1994); however, the following discussion reflects some of the issues associated with the rationale behind the development of these tools.
Filtering
Continuity of multi-dimensional data may be approximated by incorporating correlational models of each of the primary sampled axes (Miller, 1994). In the case of four-dimensionally referenced data, spatio-temporal correlation may be reflected by variograms representing each primary dimension: horizontal, vertical, and temporal. If correlation calculations rely solely on Euclidean proximity for all sampled locations, the variogram analysis may indicate an erroneous, more continuous, spatial correlation in a particular dimension than is actually present. For this reason it is important to identify the most “representative” observations that reflect the process in each dimension. For calculating the spatial correlations in the horizontal dimension, for example, it is important to filter out statistically biased data in the vertical dimension and identify the best representative cross-sectional area. Figure 28.2, for example, represents the horizontal filtering of point source observations from multi-port wells using the tools and data described in Miller (1994). In this case, all the values that are contained within a specified circle reflecting somewhat homogeneous subsurface characteristics, within a thin vertical slice and a small tolerance interval, are used in the spatial variogram calculations. Additionally, Figure 28.3 represents the filtering along the vertical dimension used to assess the vertical spatial correlations. In this case, the horizontal component of the analysis has been drastically reduced and a much larger vertical slice and tolerance are taken into account. Temporal restrictions should additionally be imposed upon the filtering process in order to include the most representative samples in time for the desired spatial distribution. Only sampled observations that fall within both the spatial and temporal windows should be used to calculate the corresponding spatial horizontal and vertical variograms.
For the accurate assessment of the spatial variograms, a small temporal interval should be utilised in order to minimise the potential biases. Correlations in time are calculated by restricting the horizontal and vertical filters to a small volume and increasing the temporal window of observations. Ideally, one would hold a single point in space constant and assess the temporal continuity by analysing the changes in its state variable over time. This temporal continuity would then be assumed representative over the entire process. In reality, however, this may or may not be the case, and thus the flexibility of providing configurable spatial filters is an important component for effective analysis.
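The three filtering configurations described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Obs` record, the `window_filter` function, the sample coordinates and every tolerance value are hypothetical and are not taken from Miller (1994).

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Obs:
    """One sampled observation referenced in space (x, y, z) and time (t)."""
    x: float
    y: float
    z: float
    t: float
    value: float


def window_filter(obs, centre, radius_xy, z_tol, t_tol):
    """Keep observations inside a horizontal circle, a vertical slice and a
    temporal window, all centred on centre = (x, y, z, t)."""
    cx, cy, cz, ct = centre
    return [
        o for o in obs
        if hypot(o.x - cx, o.y - cy) <= radius_xy
        and abs(o.z - cz) <= z_tol
        and abs(o.t - ct) <= t_tol
    ]


# Hypothetical samples (made-up coordinates and values).
samples = [
    Obs(0.0, 0.0, -5.0, 0.0, 1.0),
    Obs(30.0, 10.0, -5.5, 0.5, 1.2),
    Obs(1.0, 0.0, -20.0, 0.0, 3.0),
    Obs(0.0, 1.0, -5.0, 40.0, 2.0),
    Obs(200.0, 0.0, -5.0, 0.0, 9.0),
]
centre = (0.0, 0.0, -5.0, 0.0)

# Horizontal variogram: wide circle, thin vertical slice, short time window.
horizontal = window_filter(samples, centre, radius_xy=50.0, z_tol=1.0, t_tol=1.0)
# Vertical variogram: narrow circle, thick vertical slice, short time window.
vertical = window_filter(samples, centre, radius_xy=2.0, z_tol=30.0, t_tol=1.0)
# Temporal variogram: small spatial volume, long time window.
temporal = window_filter(samples, centre, radius_xy=2.0, z_tol=1.0, t_tol=100.0)
```

Using one function with three parameter sets mirrors the point made in the text: the same observations are re-selected for each dimensional variogram by widening the window in the dimension of interest and narrowing it in the others.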
Figure 28.2: Data filtering in the horizontal dimension.
Directional and Anisotropic Considerations
Anisotropic characteristics are generally found in most earth science data sets. While dimensional filtering may be useful for masking dimensional differences in sampling distributions, additional mechanisms should be considered for incorporating the possible anisotropic nature of the processes. The anisotropic characteristics of both the process and the sampled data need to be considered in variogram analysis and then reflected in the variogram functions. The degree of anisotropy may be determined through exploratory analysis of the sampled observations. Additionally, qualitative information, such as the orientation of subsurface geology or bedding planes, is generally useful for assessing the axes of anisotropic orientation (Isaaks and Srivastava, 1989). Figure 28.4, for example, represents a simple distribution of values with two different anisotropic models. The first is shown with an anisotropic model of zero (isotropic considerations), and thus all of the values within a sphere of influence would influence the estimation of an unknown value. The second example shows the same distribution with an anisotropic model strongly influenced along the X dimension, with less influence in the Y dimension and an even smaller influence in the Z dimension. This may be representative, for example, of a hydrogeological flow along a bedding plane. In this case, the values along the X axis would have a much greater degree of influence over an unknown value. Directional or anisotropic spatial continuity can be assessed and later incorporated into a stochastic solution. How the anisotropic factors may be assessed in the variogram analysis is covered in the directional variogram discussion in the next section. How these anisotropic factors may be incorporated into the stochastic solution is discussed later in this chapter.
Figure 28.3: Data filtering in the vertical dimension.
Figure 28.4: Dimensional anisotropy and the influences over unknown locations.
With regard to temporal anisotropy, the concept of anisotropic characteristics evident in the temporal dimension is somewhat ambiguous. Anisotropic characteristics evident in the temporal dimension are most likely a reflection of spatial anisotropic characteristics changing over time. While this may indeed be a viable possibility, the conceptualisation, representation and implementation of these issues are at this time unclear. Additional research on these issues is required.
Approximate Variograms
The variogram function as defined in Equation 28.3 represents a summary statistic over the pairs of observations whose separation distance is exactly some defined value. If a large and regularly gridded data set is available, this equation may suffice. If the data set is irregularly spaced, however, a more lenient representation of the variogram function may be required. This flexibility is incorporated by summing the pairs of observations whose separation distance is approximate rather than exact (Isaaks and Srivastava, 1989), and the variogram can be rewritten as:
\[ \gamma(h) = \frac{1}{2N(h)} \sum_{(i,j)\,:\,h_{ij} \approx h} \left( v_i - v_j \right)^2 \tag{28.4} \]
where N(h) is the number of paired values, vi and vj, whose corresponding hij vector approximates the lag vector h plus or minus a particular lag tolerance. In this case, there are two distance parameters that are specified when analysing spatial continuity with an approximate variogram function: the lag vector and the lag tolerance. This flexibility is useful in that it allows for the inclusion of a range of observations and consequently provides a better understanding of the spatial and temporal continuity evident in relatively sparse and/or irregularly gridded sampled data.
Figure 28.5 represents the approximate variogram functions for pairing values in both an isotropic and an anisotropic model. The degree of approximation depends on the sampled data available. In this illustration of the omnidirectional pairing of values, any sample falling within 5 meters in any direction, with a tolerance of ±1 meter, is paired. For the directional pairing, an additional tolerance of 45 degrees along the X axis is specified. Directional pairing is used to reflect the anisotropic nature of the process.
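The approximate pairing rule can be sketched as follows. This is a minimal illustration rather than the implementation used in the original investigation: the function name `approx_variogram`, its parameters and the four sample points are hypothetical, and the sketch assumes that time has already been scaled into units comparable with the spatial coordinates, so that a Euclidean length in four-space is meaningful.

```python
from itertools import combinations
from math import acos, degrees, sqrt


def approx_variogram(points, values, lag, lag_tol, direction=None, angle_tol=45.0):
    """Half the mean squared difference over pairs whose separation length is
    lag +/- lag_tol. If `direction` is given, the separation vector must also
    lie within `angle_tol` degrees of that axis (sign ignored).
    Returns (gamma, number_of_pairs); gamma is None when no pair qualifies."""
    total, n_pairs = 0.0, 0
    for (p, vp), (q, vq) in combinations(zip(points, values), 2):
        h = [b - a for a, b in zip(p, q)]
        dist = sqrt(sum(c * c for c in h))
        if abs(dist - lag) > lag_tol:
            continue
        if direction is not None and dist > 0.0:
            dot = abs(sum(a * b for a, b in zip(h, direction)))
            norm_d = sqrt(sum(c * c for c in direction))
            angle = degrees(acos(min(1.0, dot / (dist * norm_d))))
            if angle > angle_tol:
                continue
        total += (vp - vq) ** 2
        n_pairs += 1
    if n_pairs == 0:
        return None, 0
    return total / (2.0 * n_pairs), n_pairs


# Hypothetical observations as (x, y, z, t) with made-up values.
points = [(0, 0, 0, 0), (5, 0, 0, 0), (0, 5, 0, 0), (0, 0, 0, 5.5)]
values = [1.0, 3.0, 2.0, 4.0]

# Omnidirectional: lag 5 with a tolerance of 1.
gamma_omni, n_omni = approx_variogram(points, values, lag=5.0, lag_tol=1.0)
# Directional: same lag, within 45 degrees of the X axis.
gamma_x, n_x = approx_variogram(points, values, 5.0, 1.0,
                                direction=(1, 0, 0, 0), angle_tol=45.0)
```

With these made-up points, three pairs qualify for the omnidirectional case and only the pair aligned with the X axis survives the directional test, which shows how quickly directional tolerances reduce the number of usable pairs in sparse data.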
28.4 SPATIO-TEMPORAL KRIGING
The kriging method of interpolation, which is based on the theory of regionalised variables and uses the degree of autocorrelation between adjacent samples, estimates values for any coordinate position within the domain measured without bias and with minimum variance (Vieira et al., 1983). There are several derivations of the kriging method of interpolation. Due to the point source sampling techniques used in the original investigation (Miller, 1994), the type of kriging defined as “ordinary kriging” has been extended to provide interpolative capabilities in the volumetric and temporal dimensions and will be discussed in this chapter. Additional information concerning other kriging techniques, including co-kriging, block kriging, and universal kriging, may be found in Journel and Huijbregts (1978), Isaaks and Srivastava (1989), Olea (1975) and Matheron (1970).
The estimation of an unmeasured value z*(v0) at a specific spatial and temporal location is accomplished using the weighted linear combination of the available samples:
\[ z^*(v_0) = \sum_{i=1}^{n} \lambda_i \, z(v_i) \tag{28.5} \]
where n is the number of measured values z(vi) and λi are the corresponding weights attached to each measured value (Journel and Huijbregts, 1978). By taking z*(v0) as a realisation of the random function Z*(v0) and assuming stationarity, the estimator becomes:
\[ Z^*(v_0) = \sum_{i=1}^{n} \lambda_i \, Z(v_i) \tag{28.6} \]
Figure 28.5: An illustration of the tolerances associated with approximate variogram calculations for both omnidirectional and directional calculations.
The weights λi, therefore, must be determined before the estimate of the unmeasured value can be produced. The system of equations that ensures that the average estimation error for the model is zero and minimises the error variance is often referred to as the ordinary kriging system and can be written in matrix notation as:
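\[ \underbrace{\begin{bmatrix} C_{11} & \cdots & C_{1n} & 1 \\ \vdots & \ddots & \vdots & \vdots \\ C_{n1} & \cdots & C_{nn} & 1 \\ 1 & \cdots & 1 & 0 \end{bmatrix}}_{C} \cdot \underbrace{\begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_n \\ \mu \end{bmatrix}}_{\lambda} = \underbrace{\begin{bmatrix} C_{10} \\ \vdots \\ C_{n0} \\ 1 \end{bmatrix}}_{D} \tag{28.7} \]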
In this equation, C is defined as the variance matrix which describes the spatial and temporal continuity of the random function, the vector λ is defined as the weight vector, μ is defined as the Lagrange parameter, which is used for converting a constrained minimisation problem into an unconstrained one (Edwards and Penny, 1982), and D is defined as the distance vector. The set of weights λ that will produce estimated values at unsampled spatio-temporal locations is directly dependent on the variogram functions chosen to represent the spatial and temporal continuity of the phenomenon. Once the correlational functions have been chosen, the C matrix and the D vector can be built. The D vector on the right-hand side of Equation 28.7 represents a weighting scheme similar to that seen in inverse distance approaches (Isaaks and Srivastava, 1989). This vector, however, contains a form of inverse distance weights in which the distance is not based on the geometric distance to the samples but rather upon statistical distance. The statistical distance that corresponds to the D vector is the distance based on the chosen variogram functions and the isotropic or anisotropic characteristics of the process. From this pattern of continuity, a theoretical function is then fitted to each dimensional variogram and used to derive “positive definite” results for any separation distance. In order to ensure that the set of kriging equations has a unique and stable solution, the left-hand matrix C in Equation 28.7 must satisfy the mathematical condition known as positive definiteness (Isaaks and Srivastava, 1989), and thus functions that
are known to be positive definite are used. The main theoretical variogram models that are known to be positive definite are: (28.8)
where C0, commonly called the nugget effect, represents a possible spatial discontinuity at the origin; C0 + C1, commonly called the sill, reflects the value for very large distances; and a, commonly called the range, provides the distance beyond which the variogram value remains essentially constant (Isaaks and Srivastava, 1989). Figure 28.6, for example, illustrates the dimensional variograms and corresponding theoretical functions for the data defined in Miller (1994).
The anisotropic distance and direction associated with the spatial variograms can be incorporated into the kriging process by performing specified transformations on the separation vector h. In the sampled data above, for example, anisotropic considerations of 15 degrees in the X dimension and a slight 5 degree dip in the Y dimension reflect the anisotropic preferential flow of the region. The range, as defined by the distance to the asymptotic level of the dimensional variogram function, reflects the degree of anisotropy evident in the sampled data. In order to incorporate anisotropic distances, a transformation is performed to reduce all directional variograms to a common model with a normalised range. Each separation vector h, therefore, needs to be transformed so that the standardised model will provide a variogram value that is identical to that of any directional model for the pre-transformed separation distance. Any directional model along a particular dimension with a range of ad can be reduced to a standardised model with a range of 1 by replacing the separation distance of the corresponding dimension hd by a reduced distance hd / ad. The transformation to the standardised model can be written in matrix notation as:
\[ T\,h = \begin{bmatrix} 1/a_x & 0 & 0 & 0 \\ 0 & 1/a_y & 0 & 0 \\ 0 & 0 & 1/a_z & 0 \\ 0 & 0 & 0 & 1/a_t \end{bmatrix} \begin{bmatrix} h_x \\ h_y \\ h_z \\ h_t \end{bmatrix} \tag{28.9} \]
where ax, ay, az and at are the ranges of the anisotropic distance models along the coordinate axes x, y, z and t. Anisotropic direction can be similarly incorporated in the kriging estimation process by once again performing a transformation on the separation vector h.
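To make the roles of the nugget, sill and range concrete, the sketch below evaluates three widely used positive definite models on a standardised range of 1, after dividing a separation distance by the range of its dimension. The choice of the spherical, exponential and Gaussian forms, the factor 3 used as a practical-range convention, and all function names and numbers are assumptions made for illustration; they are not claimed to reproduce Equation 28.8.

```python
from math import exp


def reduced_distance(h_d, a_d):
    """Separation along one dimension divided by that dimension's range."""
    return h_d / a_d


def spherical(h, c0, c1):
    """Spherical model on a standardised range of 1; reaches the sill at h = 1."""
    if h <= 0.0:
        return 0.0
    if h >= 1.0:
        return c0 + c1
    return c0 + c1 * (1.5 * h - 0.5 * h ** 3)


def exponential(h, c0, c1):
    """Exponential model; approaches the sill asymptotically
    (about 95% of c1 is reached at h = 1 with the factor 3)."""
    if h <= 0.0:
        return 0.0
    return c0 + c1 * (1.0 - exp(-3.0 * h))


def gaussian(h, c0, c1):
    """Gaussian model; parabolic near the origin, asymptotic sill."""
    if h <= 0.0:
        return 0.0
    return c0 + c1 * (1.0 - exp(-3.0 * h * h))


# Hypothetical numbers: a 50 m separation with a 200 m range, nugget 0.1, C1 0.9.
h = reduced_distance(50.0, 200.0)
gamma_sph = spherical(h, 0.1, 0.9)
gamma_exp = exponential(h, 0.1, 0.9)
gamma_gau = gaussian(h, 0.1, 0.9)
```

Because every model is written for a range of 1, one fitted shape can serve all dimensions: only the reduced distance changes between the horizontal, vertical and temporal directions.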
The spatial anisotropic distance vectors can be rotated within a three-dimensional coordinate system as a function of two rotational angles. Given the set of anisotropic vectors (AX, AY, AZ), the rotational transformation matrix R can be defined by two angles of rotation which correspond to basic trigonometric operations in Cartesian space. The first angle of rotation φ is defined as the clockwise rotation around the Z axis, resulting in the new vectors (AX′, AY′, AZ′). The second angle of rotation β is defined as the clockwise rotation around the new AY′ vector, forming the new set of vectors (AX″, AY″, AZ″). The transformation of a three-dimensional coordinate system can thus be defined by two angles of rotation and written in matrix notation as: (28.10)
Figure 28.6: Spatial (horizontal and vertical) and temporal dimensional variograms and corresponding theoretical functions.
Thus, given a data coordinate system defined by the (X, Y, Z) axes and an anisotropic coordinate system defined by the (AX″, AY″, AZ″) axes, the transformation matrix R as defined by Equation 28.10 will transform any vector h in the data coordinate system to h″ defined in the anisotropic coordinate system. The anisotropic variogram model can then be correctly evaluated using the vector h″.
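A rotation into the anisotropy axes followed by division by the ranges can be sketched as below. The sign and direction conventions are an assumption and may differ from those of Equation 28.10: here the first anisotropy axis lies at `azimuth_deg` from the data X axis (measured from X towards Y) and is tilted by `dip_deg` towards Z. The function names are hypothetical, and the temporal component is left unrotated and only rescaled by its own range.

```python
from math import cos, radians, sin, sqrt


def rotation_rows(azimuth_deg, dip_deg):
    """Rows are the unit vectors of the anisotropy axes expressed in data
    coordinates (assumed convention, see the lead-in above)."""
    a, p = radians(azimuth_deg), radians(dip_deg)
    return [
        (cos(a) * cos(p), sin(a) * cos(p), sin(p)),    # first anisotropy axis
        (-sin(a), cos(a), 0.0),                        # second anisotropy axis
        (-cos(a) * sin(p), -sin(a) * sin(p), cos(p)),  # third anisotropy axis
    ]


def normalised_lag(h, ranges, azimuth_deg=0.0, dip_deg=0.0):
    """Rotate the spatial part of h = (hx, hy, hz, ht) into the anisotropy
    axes, then divide every component by its range (ax, ay, az, at)."""
    hx, hy, hz, ht = h
    rotated = [r[0] * hx + r[1] * hy + r[2] * hz
               for r in rotation_rows(azimuth_deg, dip_deg)]
    rotated.append(ht)  # time is not rotated in this sketch
    return tuple(c / a for c, a in zip(rotated, ranges))


def magnitude(v):
    return sqrt(sum(c * c for c in v))


# Hypothetical ranges: 200 m along the flow axis, 50 m across, 10 m vertical, 30 days.
ranges = (200.0, 50.0, 10.0, 30.0)

# A 100 m lag pointing exactly along the rotated flow axis (azimuth 15 degrees).
h_space = (100.0 * cos(radians(15.0)), 100.0 * sin(radians(15.0)), 0.0, 0.0)
d_space = magnitude(normalised_lag(h_space, ranges, azimuth_deg=15.0))

# A pure 15-day lag in time at the same location.
d_time = magnitude(normalised_lag((0.0, 0.0, 0.0, 15.0), ranges, azimuth_deg=15.0))
```

With these made-up ranges, 100 m along the flow axis and 15 days in time both reduce to a statistical distance of 0.5, which illustrates how the transformation makes spatial and temporal separations comparable before the variogram model is evaluated.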
The statistical distance for any lag vector can be defined by the transformation that incorporates both anisotropic distance and direction. By utilising Equation 28.9, which defines the anisotropic distance transformation, and Equation 28.10, which defines the anisotropic directional transformation, the normalised lag vector hn can be written as:
\[ h_n = T \, R \, h \tag{28.11} \]
where T denotes the distance transformation of Equation 28.9 and R the rotation of Equation 28.10. Therefore, for each entry of the D vector in Equation 28.7, the magnitude of the corresponding hn vector is used as input to the theoretical variogram function to reflect the anisotropic characteristics of the observed process. As with the anisotropic transformations performed on the D vector, the C matrix also records distance in terms of statistical distance rather than geometric distance, providing the ordinary kriging system with information on the clustering of the available samples (Isaaks and Srivastava, 1989). If two samples are close together in space and time, this will be represented by a large entry in the C matrix. Two values far apart, consequently, will be represented by a small entry. The multiplication of D by C⁻¹ adjusts the raw inverse statistical distance weights in D to account for possible redundancies between the samples (Isaaks and Srivastava, 1989). The information on the distances to the various samples and the clustering between the samples is recorded in terms of a statistical distance, thereby customising the estimation procedure to both application-specific patterns of spatial and temporal continuity and anisotropic characteristics.
To solve for the set of customisable weights λ that will produce unbiased estimates with the minimum error variances at a specified spatial and temporal location, both sides of Equation 28.7 are multiplied by the inverse of the variance matrix, C⁻¹:
\[ \lambda = C^{-1} \cdot D \tag{28.12} \]
Utilising these weights, the resultant estimate for any given spatial and temporal location is:
\[ z^*(v_0) = \sum_{i=1}^{n} \lambda_i \, z(v_i) \tag{28.13} \]
where n is the number of measured values vi and λi are the corresponding weights. As mentioned earlier, another powerful mechanism of the kriging method of interpolation is the ability to gauge the accuracy of the estimates. By utilising these weights, the minimised estimation variance at any unsampled point in space and time is:
\[ \sigma_E^2 = \sigma^2 - \sum_{i=1}^{n} \lambda_i \, C_{i0} - \mu \tag{28.14} \]
where σ² is the sill of the variance structure, λi are the corresponding weights attached to each measured value, Ci0 are the variogram function calculations for the D vector, and μ is the Lagrange parameter. Utilising the resultant estimate and corresponding variance equations, any point defined within the constraints of a minimum-maximum bounding box surrounding the sampled observations in space and time may be estimated at arbitrary resolutions.
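The whole procedure (build the variance matrix and distance vector from statistical distances, solve for the weights, then form the estimate and its variance) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation of the original investigation: entries are covariances derived from a spherical variogram as sill minus variogram value, anisotropy is handled by range scaling only (no rotation), and all names, ranges and sample values are hypothetical.

```python
from math import sqrt


def covariance(h, sill=1.0, nugget=0.0):
    """Covariance from a spherical variogram on a standardised range of 1,
    taken as sill - gamma(h), so close samples give large entries."""
    if h <= 0.0:
        return sill
    if h >= 1.0:
        return 0.0
    gamma = nugget + (sill - nugget) * (1.5 * h - 0.5 * h ** 3)
    return sill - gamma


def stat_dist(p, q, ranges):
    """Statistical distance: each component of the lag divided by its range."""
    return sqrt(sum(((b - a) / r) ** 2 for a, b, r in zip(p, q, ranges)))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_krige(points, values, target, ranges, sill=1.0, nugget=0.0):
    """Return (estimate, variance, weights) at a target (x, y, z, t)."""
    n = len(points)
    # Variance matrix bordered by ones for the unbiasedness constraint.
    c = [[covariance(stat_dist(points[i], points[j], ranges), sill, nugget)
          for j in range(n)] + [1.0] for i in range(n)]
    c.append([1.0] * n + [0.0])
    # Distance vector: covariances between each sample and the target.
    d = [covariance(stat_dist(p, target, ranges), sill, nugget) for p in points] + [1.0]
    solution = solve(c, d)
    weights, mu = solution[:n], solution[n]
    estimate = sum(w * v for w, v in zip(weights, values))
    variance = sill - sum(w * di for w, di in zip(weights, d[:n])) - mu
    return estimate, variance, weights


# Two hypothetical samples, 10 m apart, estimated at the midpoint.
pts = [(0.0, 0.0, 0.0, 0.0), (10.0, 0.0, 0.0, 0.0)]
vals = [2.0, 4.0]
rng = (20.0, 20.0, 5.0, 30.0)
est, var, w = ordinary_krige(pts, vals, (5.0, 0.0, 0.0, 0.0), rng)
```

In this symmetric configuration both samples receive equal weight, so the estimate is the mean of the two values; estimating at one of the sample locations with a zero nugget returns that sample's value with zero variance. A production implementation would normally rely on a numerical linear algebra library rather than the hand-written solver used here to keep the sketch self-contained.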
28.5 CONCLUSION
Effective analysis of earth science processes requires detailed spatial and temporal information. This information is generally difficult to collect at scales sufficient to represent and understand complex natural phenomena. The methods presented here provide one possibility for effectively interpolating data, and for quantifying the degree of uncertainty associated with the estimation, using spatio-temporal geostatistical kriging. Discovering the spatial and temporal models of continuity necessary for the geostatistical process, however, is often a frustrating and difficult process. Multi-dimensional data, sparse data samplings, statistically biased sampled data and irregular or anisotropic sampling locations are all problematic in assessing statistical correlation between sampled values. Tools designed for the exploratory analysis of multi-dimensional data and geostatistical kriging provide a powerful coupling for the discovery of spatio-temporal correlations and effective interpolation. One example of this coupling for the analysis, reconstruction and visualisation of a subsurface hydrological contaminant plume can be found at .
Additional research regarding these techniques is required. Better representations of time, incorporation of stochastic imaging techniques (Journel, 1996), computational optimisation and parallelisation of the algorithms, and tighter couplings between these techniques and GIS are but a few of the many possible research directions.
REFERENCES
EDWARDS, C. and PENNY, D. 1982. Calculus and Analytical Geometry. Englewood Cliffs, NJ: Prentice-Hall.
HAWKING, S. 1988. A Brief History of Time. Bantam Books.
ISAAKS, H. and SRIVASTAVA, R. 1989. An Introduction to Applied Geostatistics. Oxford: University Press.
JOURNEL, A. 1996. Modeling uncertainty and spatial dependence: stochastic imaging, International Journal of Geographic Information Systems, 10(5), pp. 517–522.
JOURNEL, A. and HUIJBREGTS, C. 1978. Mining Geostatistics. London: Academic Press.
LANGRAN, G. 1992. Time in Geographic Information Systems. London: Taylor & Francis.
MASON, D., O’CONAILL, M. and BELL, S. 1994. Handling four-dimensional geo-referenced data in environmental GIS, International Journal of Geographic Information Systems, 8.
MATÉRN, B. 1960. Spatial Variations, Lecture Notes in Statistics 36. Berlin: Springer Verlag.
MATHERON, G. 1970. La théorie des variables régionalisées et ses applications. Les Cahiers du Centre de Morphologie Mathématique de Fontainebleau. Fascicule 5. Paris: École Supérieure des Mines de Paris.
MILLER, E. 1994. Volumetric reconstruction of non-point source contamination flow utilizing four-dimensional geostatistical kriging, Unpublished Masters Thesis, School of Natural Resources, The Ohio State University.
MILLER, E. 1997. Towards a four-dimensional GIS: four-dimensional interpolation utilizing kriging, Innovations in GIS 4, London: Taylor & Francis.
MITASOVA, H., MITAS, L.H., BROWN, W., GERDES, D. and KOSINOVSKY, I. 1995. Modeling spatially and temporally distributed phenomena: new methods and tools for GRASS GIS, International Journal of Geographic Information Systems, 9(4).
MYERS, D.E. 1989. To be or not to be…stationary, Mathematical Geology, 21(1), pp. 347–362.
O’CONAILL, M.A., BELL, S.B.M. and MASON, D. 1994. Developing a prototype 4D GIS on a transputer array, ITC Journal, 1992(1), pp. 47–54.
OLEA, R. 1975. Optimum mapping techniques using regionalized variable theory. Lawrence: Kansas Geological Survey, University of Kansas.
PEUQUET, D. 1994. It’s about time: a conceptual framework for the representation of temporal dynamics in geographic information systems, Annals of the Association of American Geographers, 84(3), pp. 441–461.
PIGOT, S. and HAZELTON, W. 1992. The fundamentals of a topological model for a four-dimensional GIS, in Bresnahan, P., Corwin, F. and Cowen, D. (Eds.), Proceedings of the 5th International Symposium on Spatial Data Handling, Charleston, S. Carolina.
RAPER, J.F. and KELK, B. 1991. Three-dimensional GIS, in Maguire, D.J., Goodchild, M.F. and Rhind, D. (Eds.), Geographical Information Systems: Principles and Applications, Vol. 1. Cambridge: Longman/Wiley.
ROCK, N. 1988. Numerical Geology. Berlin: Springer-Verlag.
ROSSI, M. and POSA, D. 1990. 3-D mapping of dissolved oxygen in Mar Piccolo, Journal of Environmental Geological Water Sciences, 16(1), pp. 209–219.
TOBLER, W. 1979. Cellular geography, in Gale, S. and Olsson, G. (Eds.), Philosophy in Geography. Dordrecht: Reidel, pp. 379–86.
TURNER, A. 1992. Three-dimensional Modeling with Geoscientific Information Systems. NATO Advanced Science Institute Series C: Mathematical and Physical Sciences. Dordrecht: Kluwer.
VIEIRA, S., HATFIELD, J., NIELSEN, D. and BIGGAR, J. 1983. Geostatistical theory and application to variability of some agronomical properties, Hilgardia, 51(3), pp. 1–75.
WOLDT, W. and BOGARDI, I. 1992. Ground water monitoring network design using multiple criteria decision making and geostatistics, Water Resources Bulletin, 28(1).
Chapter Twenty Nine
Using Extended Exploratory Data Analysis for the Selection of an Appropriate Interpolation Model
Felix Bucher
29.1 INTRODUCTION
Over the past years, environmental scientists have increasingly used spatial data and the technologies for processing them, such as geographical information systems (GIS) and remote sensing. A major information type in environmental sciences, as well as in spatial data processing, is data describing continuous fields. Data describing (random) continuous fields consist of a series of samples at fixed spatial and temporal locations, e.g., as point values or as aggregations over certain areas and/or time intervals. One of the main challenges of using such field representations in analysis and modelling arises from the restrictions caused by the discretization that occurs when sampling a field. As a consequence, a transformation of the original (sampled) field representation into another field representation, for example by interpolation, is often required before using the data in a particular application. Such transformations, however, are often among the most critical tasks within the whole analysis. On the one hand, performing such transformations properly is at the very heart of the reliable use of field representations in subsequent applications. On the other hand, such transformations can pose considerable challenges. This is particularly true for the selection and parameterization of the model used for the transformation.
As most spatial data users do not have all the skills and experience required to process spatial data reliably and efficiently, there has been an increasing awareness of the need to develop tools that support these functions. With this in mind, this chapter presents some conceptual features as well as some implementation issues for a tool that aims to support data users in the selection of an appropriate spatial interpolation model. The following two sections point to the importance and to some of the problems of selecting a suitable interpolation model.
Section 29.4 introduces extended exploratory data analysis (EEDA) as a strategy to support the decision-making process when selecting an interpolation model. This is followed by a description of features required in the decision-making process. Section 29.6 briefly outlines the implementation of this conceptual framework as a decision support system in which the decision is optimised through close human-machine interaction, and is followed by some conclusions.
EXTENDED EXPLORATORY DATA ANALYSIS TO SELECT INTERPOLATION MODELS
29.2 THE SELECTION OF AN APPROPRIATE SPATIAL INTERPOLATION MODEL
In many environmental as well as socio-economic applications, data describing continuous fields are a major type of spatial information. In order to get information about the behaviour of a real-world phenomenon, the continuous field has to be sampled. This sampling may be sparse, for example, at a few irregularly distributed sites, or it may be quasi-continuous, such as a remotely sensed image or census data collected for administrative spatial units. The sampling of a continuous field, however, always implies some discretization and thus results in some uncertainty about the spatial behaviour of the real-world phenomenon between the sampled locations. As a consequence of this discretization, the (sampled) original field representation often does not meet the specific data requirements of an intended application and thus needs to be transformed into another field representation (see Goodchild, Chapter 21, this volume). Examples of such transformations are interpolation, extrapolation and aggregation.
Consider an original field representation consisting of annual mean surface temperature over Switzerland sampled at irregularly distributed sites. A numerical simulation of the potential evapo-transpiration might require the temperature information for each cell of a 1 km resolution grid. In order to prepare the temperature data for the numerical simulation, the original field representation has to be transformed into an alternative field representation by means of an appropriate spatial interpolation model.
The remainder of this chapter focuses exclusively on spatial interpolation, but is in general also relevant for other types of spatial transformations, for example, extrapolation or aggregation. Spatial interpolation is defined as the procedure of estimating data values at unsampled sites within the area covered by measured or observed sites.
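To make the temperature example concrete, the following is a minimal sketch of transforming a handful of point samples into a grid representation. It uses inverse distance weighting purely because it is short to write down, not because it is the appropriate model for such data. The station coordinates, the temperature values, the function names and the distance exponent of 2 are all hypothetical.

```python
import math

def idw_estimate(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    samples: list of (xi, yi, value) tuples for the sampled sites.
    power: distance-decay exponent (2.0 is a common but arbitrary choice).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for xi, yi, value in samples:
        distance = math.hypot(x - xi, y - yi)
        if distance == 0.0:
            return value  # the target coincides with a sampled site
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

def interpolate_grid(samples, x_min, y_min, n_cols, n_rows, cell_size=1.0):
    """Estimate a value at the centre of every grid cell (list of rows)."""
    grid = []
    for row in range(n_rows):
        cy = y_min + (row + 0.5) * cell_size
        grid.append([idw_estimate(samples, x_min + (col + 0.5) * cell_size, cy)
                     for col in range(n_cols)])
    return grid

# Hypothetical annual mean temperatures (deg C) at four sites, coordinates in km.
stations = [(0.5, 0.5, 9.1), (3.5, 0.5, 8.4), (0.5, 3.5, 6.2), (3.5, 3.5, 5.0)]
temperature_grid = interpolate_grid(stations, 0.0, 0.0, n_cols=4, n_rows=4)
```

Every grid value here is a weighted average of the samples, so the result can never fall outside the range of the observed values. That is one example of a model property that may or may not suit the intended application.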
Spatial interpolation is a problem that has occupied scientists from various disciplines for a long time. Over the past decades, some basic models have been introduced for spatial interpolation, such as trend surface and regression models (Haining, 1990), thin plate splines (Mitasova and Mitas, 1993), kriging (Isaaks and Srivastava, 1989), conditional simulation (Deutsch and Journel, 1992), and inverse distance weighted models. Moreover, within each of these basic models, various sub-models have been developed for a wide range of potential spatial interpolation situations. As an example, kriging is a collection of several dozen models which might respond quite individually to any particular interpolation situation. An excellent overview of available 2D and 3D kriging models and their behaviour is given in Deutsch and Journel (1992), and extensions to 4D kriging models are presented by Eric Miller (Chapter 28, this volume). This has led to a situation that is both comfortable and challenging: on the one hand, there is now an appropriate model for almost all potential interpolation situations. On the other hand, with the variety of interpolation models, the selection process is becoming demanding and tricky. It is a truism, however, that no one method will ever exist that will be able to account reliably for all potential applications of spatial interpolation (Lam, 1983).
The selection of a model that is appropriate to a particular spatial interpolation situation is fundamental to using the resulting field representation reliably in subsequent applications. More specifically, the selection of an interpolation model has strong implications for the representation of spatial patterns as well as for the accuracy of interpolated data.
The former aspect is strongly related to the purpose of subsequent use of interpolated data: as an example, some interpolation models are more suitable for estimating whole-map features of the variable of interest (such as mean value, variance or quantiles) whereas other interpolation models provide better estimates for the extreme values. As a consequence, the use of an inappropriate interpolation model may significantly compromise the specific data requirements of subsequent applications, e.g., mapping a continuous field or using the data as input to a numerical simulation of
environmental processes. Again, no one method is optimum for all purposes, and trade-offs in the selection of the interpolation model must be considered (Dungan et al., 1994). The latter aspect, the implications of the selection of an interpolation model for the accuracy of interpolated data, is more related to the fact that each model imposes some assumptions on the data being used. This means that even an advanced interpolation model will inevitably yield inaccurate results unless the input data conform to the assumptions of the model. As a consequence, each interpolation should be accompanied by careful consideration of how the accuracy of interpolated data was affected by the data assumptions inherent in the model. A good practice is to calculate some summary statistics of estimation errors for a few alternative interpolation models, and to select the model that works best. The calculation of estimation error statistics can be done with cross-validation (Isaaks and Srivastava, 1989) or bootstrapping (Efron and Tibshirani, 1993).
29.3 THE ROLE OF EXPLORATORY DATA ANALYSIS IN THE SELECTION PROCESS
The decision-making process behind the selection of an appropriate interpolation model must be based on various criteria. These criteria include the real-world behaviour of the field under consideration, characteristics of the sampled field representation, and their mutual relationships. Moreover, properties of the various interpolation models, the purpose of the interpolated data and many other factors influence the selection and parameterization of an interpolation model. Taking all these dependencies into account leads to a very demanding selection process which is ideally based on a considerable body of information (Vckovski and Bucher, 1996). Much of the required information, however, is only implicitly represented in the data, and thus has to be derived from explicit information, which includes the data itself and metadata.
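The practice recommended at the close of Section 29.2, comparing estimation error statistics for a few alternative models, can be sketched with leave-one-out cross-validation. The two candidate models (nearest neighbour and inverse distance weighting), the synthetic planar field and all function names are assumptions chosen for brevity; a real comparison would use the candidate models and the error summaries relevant to the application.

```python
import math

def nearest_neighbour(samples, x, y):
    """Estimate: value of the closest sampled site."""
    return min(samples, key=lambda s: math.hypot(x - s[0], y - s[1]))[2]

def inverse_distance(samples, x, y, power=2.0):
    """Estimate: inverse distance weighted average of all sampled sites."""
    num = den = 0.0
    for xi, yi, value in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

def loo_rmse(samples, model):
    """Leave-one-out cross-validation: root mean squared estimation error."""
    squared_errors = []
    for i, (x, y, observed) in enumerate(samples):
        remaining = samples[:i] + samples[i + 1:]  # drop the site being estimated
        squared_errors.append((model(remaining, x, y) - observed) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical samples on a 5 x 5 lattice from a planar field z = 10 - 0.5x - 0.2y.
sites = [(x, y, 10 - 0.5 * x - 0.2 * y) for x in range(5) for y in range(5)]
scores = {m.__name__: loo_rmse(sites, m) for m in (nearest_neighbour, inverse_distance)}
best_model = min(scores, key=scores.get)  # model with the smallest error summary
```

A single summary such as the root mean squared error hides where the errors occur; inspecting the individual cross-validation residuals, for instance near the boundary of the sampled area, is usually informative as well.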
The selection of an appropriate interpolation model ideally also includes knowledge that is usually neither explicitly nor implicitly represented in the data, such as information about the sampling design, measurement errors, and pre-processing of the data. However, since this knowledge is rarely available outside the data producer’s domain, it is not considered in the proposed selection procedure. The extraction of implicit information is largely based on the examination of various data characteristics, such as the magnitude and range of spatial autocorrelation, or distributional properties of the field representation being interpolated. The totality of tests for the derivation of the implicit information required for a particular task, such as the selection of an interpolation model, is often referred to as exploratory data analysis (EDA) (Haining, 1990; Tukey, 1977). However, a reliable EDA for the acquisition of the implicit information needed for the selection of an appropriate interpolation model is a demanding procedure and may cause various problems, including:
• Reliably examining data characteristics requires a considerable statistical background, such as knowledge about the relevant criteria, and test procedures that may yield the required information for the selection of an appropriate interpolation model.
• The interpretation of test statistics is often not simple and clear.
• Most test statistics are not robust or resistant, i.e., they fail in the presence of certain data characteristics and/or (spatial) outlier values.
Such statistical background, and the awareness of the importance of EDA, is often lacking within the community of data users. Beyond that, the problems around the selection of an appropriate interpolation model are often reinforced by the fact that current software tools for spatial data processing usually provide only a few, quite simple interpolation models that might not be adequate for covering the whole range of potential spatial interpolation situations (see Cheesman and Petch, Chapter 14, this volume). In fact, this is particularly true for most commercial and widespread GIS software. To conclude, it would be very helpful to have a tool that points to the variety of available interpolation models and supports the selection of an appropriate model with respect to the particular interpolation situation.
29.4 EXTENDED EXPLORATORY DATA ANALYSIS
The previous section has pointed to the relevance and potential problems of EDA for the selection of an appropriate interpolation model. In order to overcome problems of conventional EDA procedures, a standardised specification of the EDA steps needed at the various stages of the selection process would be very helpful. Extended exploratory data analysis (EEDA) (Bucher and Vckovski, 1995) is a standardised procedure that supports the derivation of the required implicit information and navigates data users through the decision-making process behind the selection of an appropriate interpolation model. EEDA is based on two principles:
• Formalization: In order to open reliable and straightforward interpolation even to inexperienced users, the required statistical knowledge is formalised by means of selection rules. These selection rules incorporate data characteristics relevant to the selection of an interpolation model, suitable test procedures for evaluating the respective data characteristics, and knowledge on how to interpret the results of these test statistics.
• Structuring: In order to help data users navigate through the decision-making process, the selection rules are incorporated in an overall structure, e.g., in a decision tree. At each node of the decision tree, the user is provided with the formalised statistical knowledge as described above, thus supporting the user’s correct navigation through the decision-making process. On the way through the decision tree the user is required to examine all relevant data characteristics in a logical order, ending up with a proposal for an appropriate interpolation model.
A promising feature of formalising statistical knowledge would be to facilitate a scoring of available interpolation models with respect to their appropriateness for a particular application. Such a scoring could be based on probability values resulting from test statistics used to diagnose the presence or absence of data characteristics relevant for the selection of an appropriate interpolation model (under a fixed significance level). The scoring would allow direct comparisons of available models within a particular interpolation situation and could be an important input in cost-benefit studies, i.e., whether it is worth using a more sophisticated but demanding interpolation model. In the following section, some features of a procedure for the selection of an appropriate interpolation model are proposed.
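The two principles can be illustrated with a small data structure: each node holds one selection rule (a data characteristic plus a test returning a probability value), and walking the tree yields both a proposed model and the record of decisions that led to it. This is only a sketch under stated assumptions. The class names, the rule set, the leaf proposals (including the "global mean" fallback) and the 0.05 significance level are hypothetical and are not taken from the EEDA implementation.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Leaf:
    proposal: str  # name of the proposed interpolation model

@dataclass
class Node:
    characteristic: str            # data characteristic examined at this node
    test: Callable[[dict], float]  # p-value under the null hypothesis "characteristic absent"
    if_present: Union["Node", Leaf]
    if_absent: Union["Node", Leaf]

def select_model(tree, data, alpha=0.05):
    """Walk the tree; return the proposal and the path of decisions taken."""
    path = []
    while isinstance(tree, Node):
        p_value = tree.test(data)
        present = p_value < alpha  # reject the null hypothesis of absence
        path.append((tree.characteristic, p_value, present))
        tree = tree.if_present if present else tree.if_absent
    return tree.proposal, path

# Hypothetical rules; the p-values are simply read from precomputed pre-test results.
tree = Node("second-order variation", lambda d: d["p_autocorrelation"],
            if_present=Node("first-order variation", lambda d: d["p_trend"],
                            Leaf("universal kriging"), Leaf("ordinary kriging")),
            if_absent=Node("first-order variation", lambda d: d["p_trend"],
                           Leaf("trend surface model"), Leaf("global mean")))

proposal, path = select_model(tree, {"p_autocorrelation": 0.01, "p_trend": 0.40})
```

Keeping the probability values in the returned path is what would make a later scoring of alternative models possible, since the path records how decisively each branch was taken.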
29.5 A PROCEDURE FOR THE SELECTION OF AN APPROPRIATE INTERPOLATION MODEL
The literature offers some case studies that compare the performance of alternative interpolation models in a particular application (e.g., Dubrule, 1984; Dungan et al., 1994; Englund, 1990; Leenaers et al., 1990; Ripley, 1981). However, a compilation of the results of such case studies into an overall selection procedure is still missing. In the following, some features of a procedure for the selection of an appropriate interpolation model are proposed. As mentioned above, the selection criteria are operationalized by means of implicit information, i.e., relevant data characteristics. Beyond that, appropriate test statistics have to be defined in order to evaluate these characteristics properly for the data being interpolated.
29.5.1 Key Data Characteristics for the Selection Procedure
Basically, spatial interpolation is a two-step procedure: in the first step, a model of the spatial variation has to be selected and fitted to the set of sampled values (parameterization). The second step consists of calculating values at unsampled sites by means of the model of spatial variation. The problem of interpolation is thus mainly a problem of selecting an appropriate model of spatial variation (Burrough, 1986). Regionalized variable theory (Matheron, 1971) provides a useful starting point to define and operationalize some key criteria for the selection of an appropriate model of spatial variation for the data being interpolated. This theory, which has much influenced geostatistics, states that the spatial variation of any data can be modelled as the sum of three major components. These components are: a large-scale or first-order variation; a random but spatially correlated (small-scale) second-order variation; and a random noise or residual error term. For the selection of an appropriate model of spatial variation, only the first two components are relevant.
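As a compact restatement, the three-component decomposition is often written in the following form. The symbols are an assumption borrowed from common geostatistical notation and do not appear in the chapter itself.

```latex
Z(\mathbf{s}) = m(\mathbf{s}) + \varepsilon'(\mathbf{s}) + \varepsilon''
```

Here $Z(\mathbf{s})$ is the value of the field at location $\mathbf{s}$, $m(\mathbf{s})$ is the large-scale (first-order) variation, $\varepsilon'(\mathbf{s})$ is the random but spatially correlated (second-order) variation, and $\varepsilon''$ is the random noise term.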
The random noise component, however, provides a suitable means to check whether the first two components have been correctly modelled, i.e., the third component should always be a spatially uncorrelated Gaussian term having zero mean and constant variance σ². Thus, to select a suitable model of spatial variation, and the respective interpolation model, the presence or absence of first-order and second-order variation has to be determined carefully. The evaluation of the first-order variation is mainly concerned with testing whether the expected value of the (random) continuous field is constant over the complete dataset, i.e., stationary, as is required by certain interpolation models. The evaluation of second-order variation is about testing the magnitude and range of spatial dependence present in the data, i.e., spatial autocorrelation, and how this autocorrelation might be properly modelled. Consider the family of regression models: in the presence of second-order variation, it has to be decided whether the autocorrelation is represented by means of spatially lagged explanatory variable(s), a lagged response variable, or an additional spatially correlated error term.
Many available interpolation models, however, explicitly incorporate secondary data to model the first-order or the second-order spatial variation more reliably. Examples of the former type are regression models or some forms of universal kriging; co-kriging is an example of the latter type. As a consequence, a third key criterion for the selection of an appropriate interpolation model is the suitability of secondary data for modelling spatial variation. The evaluation of this criterion, however, must be done such that the suitability of secondary data for representing either order of spatial variation is carefully distinguished.
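As an illustration of evaluating second-order variation, the sketch below computes the global Moran's I statistic for point data. The binary distance-band weights (neighbours are all sites within a threshold distance), the function name and the two synthetic lattices are assumptions made for the example; other weighting schemes are equally common.

```python
import math

def morans_i(points, values, threshold):
    """Global Moran's I with binary weights: w_ij = 1 if 0 < d_ij <= threshold."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0  # sum over i, j of w_ij * dev_i * dev_j
    s0 = 0.0     # sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= threshold:
                cross += dev[i] * dev[j]
                s0 += 1.0
    return (n / s0) * cross / sum(d * d for d in dev)

grid = [(x, y) for x in range(6) for y in range(6)]
smooth = [x + y for x, y in grid]         # gradual change: neighbours are similar
checker = [(x + y) % 2 for x, y in grid]  # alternating: neighbours are dissimilar
i_smooth = morans_i(grid, smooth, threshold=1.0)    # strongly positive (0.8)
i_checker = morans_i(grid, checker, threshold=1.0)  # perfectly negative (-1.0)
```

The smooth example is in fact a pure linear trend, so its large positive value also shows why first-order variation needs to be assessed before an autocorrelation measure is interpreted as second-order variation.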
Figure 29.1: Discrimination of interpolation models by means of key data characteristics
To conclude, the selection of an appropriate model of spatial variation is based mainly on three key data characteristics: presence/absence of first-order and second-order variation, and the suitability of secondary data. Figure 29.1 shows the respective three-dimensional attribute space and how the properties of a dataset with respect to these key data characteristics can be used to discriminate between a set of interpolation models.
29.5.2 Test Statistics for Evaluating Key Data Characteristics
The second step in the design of a procedure for the selection of an appropriate interpolation model is concerned with choosing suitable test statistics for the evaluation of the above-mentioned key characteristics of the data being interpolated. To evaluate the key data characteristics properly, these test procedures should ideally meet the following requirements:
• Robustness: Many test statistics fail under certain circumstances, i.e., the diagnostic power of test statistics suffers from the presence of “disturbing” data properties, such as deviations from Gaussian distribution, heteroscedasticity, autocorrelation, multicollinearity, or (spatial) outliers. Statistically speaking, in the presence of disturbing data properties, the theoretical distribution of test statistics under the null hypothesis is distorted. As a consequence, the null hypothesis, e.g., no autocorrelation,
might be significantly more or less likely to be rejected. Therefore, the data being interpolated should be tested beforehand for the presence of such disturbing data properties (see Section 29.5.3). If such presence is evident, appropriate measures have to be taken, such as data transformations, removal of outliers, or use of alternative, i.e., resistant or robust, test statistics. Over the past decades, so-called resistant and robust test procedures have been developed for a wide range of statistical test situations (see Cressie, 1991; and Haining, 1990). The distinction between robust and resistant methods is not very clear, and these terms are sometimes used interchangeably. Following Haining (1990), a test procedure is called resistant if its behaviour is insensitive to a small number of extreme values, whereas a robust test statistic is used when underlying distributional assumptions have been violated. However, in order to keep the selection procedure straightforward, test statistics should be chosen such that they are a priori as robust as possible.
• Transparency: The output of test statistics should be easily interpretable and should facilitate objective decisions. This is realised, for instance, when a test statistic returns a value for the probability that the null hypothesis could be rejected. Such probability values could also be a suitable means for scoring the relative appropriateness of interpolation models as proposed in Section 29.4.
• Straightforward: The evaluation of key data characteristics should be based on test procedures that are as undemanding as possible with regard to the time spent in performing the test and understanding its proper functioning. Beyond that, respective test routines should be available for a reasonable range of software packages and hardware platforms.
However, for more sophisticated test procedures, for example, for the majority of robust and resistant test statistics, this is far from being realised.
• Locally Applicable: A common feature of the majority of test procedures is that they are applied globally, i.e., they return a single statistic for the complete dataset. In many instances, however, the assumption of a single interpolation model that holds over the complete dataset is not tenable. Instead, spatial heterogeneity may be present and a spatial subdivision with individual interpolation models for each region might be preferable. In order to become aware of such situations, test procedures should be provided that facilitate the evaluation of key data characteristics on a local level. Examples of local statistics are moving-window statistics, and several of the G statistics (Getis & Ord, 1992) or the local Moran statistic (Anselin, 1995) for the evaluation of spatial autocorrelation.
There is now a wide range of test procedures for each of the three key data characteristics that may be more or less suitable for evaluation. None of these tests, however, covers all four requirements thoroughly. One of the reasons is that some of the requirements might be mutually exclusive to a certain degree. Consider the evaluation of a key data characteristic by visualisation. The resulting statistical plot might be quickly derivable and intuitively interpretable, and its generation might be supported by most statistical software packages. On the other hand, its interpretation might be at least partially subjective, because the visualised data property is not fully robust against disturbing data properties, and because there is no calculation of a test statistic that could be piped into a significance test.
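The moving-window statistics mentioned above as an example of a locally applicable procedure can be sketched in a few lines. The example also contrasts the window mean with the window median to illustrate resistance to a single extreme value. The lattice, the outlier value and the function name are hypothetical, and the sketch assumes gridded data with square, overlapping windows.

```python
from statistics import mean, median

def moving_window(grid, size, statistic):
    """Apply `statistic` to every full size x size window of a 2D list."""
    n_rows, n_cols = len(grid), len(grid[0])
    result = []
    for r in range(n_rows - size + 1):
        row = []
        for c in range(n_cols - size + 1):
            window = [grid[r + dr][c + dc] for dr in range(size) for dc in range(size)]
            row.append(statistic(window))
        result.append(row)
    return result

# Hypothetical field: constant value 10 on a 7 x 7 lattice with one outlier of 100.
field = [[10.0] * 7 for _ in range(7)]
field[3][3] = 100.0
local_means = moving_window(field, 3, mean)      # raised to 20 wherever the window covers the outlier
local_medians = moving_window(field, 3, median)  # stays at 10 in every window
```

A map of the local means would suggest a local rise in the expected value around the outlier, whereas the medians do not. The results also depend on the chosen window size, which has to be selected by the analyst.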
Table 29.1 gives an overview of test procedures and of some of their properties with respect to their suitability for the evaluation of key data characteristics according to the above-mentioned requirements. This table is far from being comprehensive, both with respect to the available test procedures and with respect to the criteria to be considered for the definitive choice of test procedures. However, Table 29.1 highlights two key issues:
• To provide the flexibility required for a wide range of potential interpolation situations, each node of the decision tree must be designed such that several test procedures, both visual and analytical, are available.
It is now a requirement of our current research to choose a suitable combination of test procedures for each node by means of empirical testing.
• It is obvious that most conventional test procedures fail in the presence of one or more disturbing data properties, i.e., they are not robust. As a consequence, the presence of potentially disturbing data properties has to be reliably tested as well in order to evaluate key data characteristics.
The presence of disturbing data properties has various consequences for the evaluation of key data characteristics. On the one hand, it has to be decided whether to use conventional test procedures after first editing the data (e.g., removing outliers or autocorrelation, transforming the data to normalise them, or dividing the data into several sub-populations), or to use resistant and/or robust test procedures, which are generally more demanding and complex.

Table 29.1: Conventional test procedures for the evaluation of key data characteristics

| Test procedure | Key data feature | Test output | Comments and restrictions | Disturbing data properties | More robust alternatives |
|---|---|---|---|---|---|
| mean polishing | 1st-order variation | decomposition of data into trend and residuals | less suited for non-lattice data; may provide indication of order of trend surface | 4, 6, 7 | median polishing |
| trend surface analysis | 1st-order variation | significance value for each coefficient | autocorrelation in OLS-residuals may indicate too low order of polynomials | 1, 2, 3, 4, 5, 7 | |
| moving window statistics | 1st-order variation | mean value per window; visual interpretation of contour maps | results strongly dependent on selected window size | 4, 6, 7 | overlapping windows; calculation of medians |
| semi-variogram | 2nd-order variation | parameters and fit of model for autocorrelation | intransitive model types -> 1st-order variation; large nugget value -> no autocorr.; range and anisotropic behaviour derivable | 1, 2, 4, 6, 7 | relative semivariogram types; madograms; rodograms |
| Moran’s I statistics | 2nd-order variation | significance value for dataset | results strongly dependent on the hypothesis used | (1), 2, 4, 8 | |
| Gi*(d)/Gi(d) stats. | 2nd-order variation | significance value for each data value | only applicable for positive data values with natural origin | 4, 7 | |
| Lagrange Multiplier Test | 2nd-order variation | significance value for different forms of autocorr. | only applicable for residuals of trend surface and regression models | 2, 4, 7, 8 | robust LM-tests available but rarely implemented |
| correlogram | 2nd-order variation | significance value per lag distance | often used when severe forms of non-stationarity are assumed | (1), 4, 7, 8 | |
| added variable plot | 2nd-order variation | statistical plot; correlation coefficient for point cloud | visual equivalent of Lagrange multiplier tests | 2, 4, 7, 8 | |
| Pearson correlation coefficient | suitability of secondary data | significance value | large value may be artefact of identical spatial structures | 1, 3, 4, 5, 7 | adjusted statistics available but rarely implemented |
| cross-semivariogram | suitability of secondary data | parameters and fit of model for cross-autocorr. | see variogram | 1, 2, 4, 6, 7 | relative cross-semivariogram types |
| cross-h-scatterplot | suitability of secondary data | statistical plot; correlation coeff. for point cloud | visual equivalent of cross-variogram when applied for a series of lag distances | 1, 2, 4, 6, 7 | |

Legend of main disturbing data properties: 1) non-normality, 2) heteroscedasticity, 3) multicollinearity, 4) outliers, 5) spatial autocorrelation, 6) clustered spatial configuration of point data, 7) small sample size, 8) inappropriate spatial weight matrix
On the other hand, the presence of disturbing data properties might even influence the logical order of evaluation of key data characteristics. The following section focuses on the evaluation of disturbing data properties, and on measures that have to be undertaken when their presence is evident.
29.5.3 The Evaluation of Disturbing Data Properties
Table 29.1 clearly shows that most conventional test procedures for the evaluation of key data characteristics might fail under certain circumstances, i.e., in the presence of one or more disturbing data properties. Depending on whether such disturbing data properties are present and to what degree, the diagnostic power of these test procedures might be significantly reduced. There is now a vast literature on the findings of experiments about the properties of test procedures under many situations. In parallel, over the past years, various alternatives to conventional test procedures have been proposed that are apparently less sensitive to the usual disturbing data properties. Many properties of both conventional and alternative test procedures, however, remain fuzzy and are still the subject of extensive research. Therefore, a
considerable body of additional statistical knowledge is required to evaluate key data characteristics. This knowledge is derived by comparing the results of a series of pre-tests summarising disturbing data properties with the assumptions made by the test procedures used for the evaluation of key data characteristics. The presence and severity of departures from these assumptions may have strong impacts on the evaluation procedure of key data characteristics. Four of the most prominent disturbing data properties, and their impacts on the evaluation of key data characteristics, are the following:
• Non-normality: Many statistical test procedures depend on distributional assumptions and in particular on normality, i.e., they assume data to be Gaussian. There are now many ways to test for normality, ranging from more visual to strictly analytical test procedures. However, when non-normality is evident, the reasons for the departure from this assumption have to be assessed as well in order to define an appropriate reaction. A data distribution with several distinctive peaks may indicate the need to identify several sub-populations within the complete dataset. Alternatively, a linear relationship between local means and local variances, i.e., a proportional effect, is a good indication of log-normally distributed data, and a respective data transformation might be successful for subsequent analysis. In those cases where the reasons for departures from normality are not very obvious, the use of robust test procedures might be the most appropriate reaction.
• Outliers: The evaluation of outliers is concerned with the identification of atypical values. Such outliers may be errors arising from measurement or coding of data. The identification of atypical values, however, must always be followed by an assessment of whether these outliers are erroneous or are rather values that are different enough from the rest of the data to distort the average properties of the complete dataset.
In addition to these so-called distributional outliers, spatial data may also contain spatial outliers, i.e., data values which are significantly unusual with respect to neighbouring values. Spatial outliers may have serious distorting effects on the results of non-resistant test procedures of spatial properties, such as first-order or second-order spatial variation. Many test statistics for detecting spatial outliers are based on the concept of leverage, i.e., the detection of observations that exert a (too) large influence on the general results of a test procedure. In general, results from outlier investigations may suggest the need for data editing or removal, or may indicate the use of resistant test procedures if the outliers are isolated, or robust procedures if the outliers are more continuous, e.g., a heavy-tailed distribution.
• Heteroscedasticity: A common property of spatial data is that the data values in some sub-regions are more variable than in others. The statistical term for such anomalies in the variance is heteroscedasticity. Many test procedures, however, only work properly under the assumption that the variance is roughly constant over the complete dataset. A well-known consequence of heteroscedasticity is the biased estimation of the error variance matrix of trend surface and regression models, i.e., models representing first-order spatial variation, and hence the invalidity of test procedures against misspecifications of such models. Many tests for heteroscedasticity are now available. The choice among these tests in a particular application is strongly dependent on the presence of other disturbing data properties. As an example, in the analysis of regression residuals, the most common test is the Breusch-Pagan test. When the residuals are non-normal, however, the Koenker-Bassett test should be used instead.
For situations where there is little prior information about the form of heteroscedasticity, a test developed by White is more appropriate, since it has power against any unspecified form of heteroscedasticity (Anselin, 1992).
• Spatial Autocorrelation: Although spatial autocorrelation was considered a key data characteristic, it may appear as a disturbing data property as well. The presence of spatial autocorrelation may affect the validity of a number of conventional test procedures, such as tests against heteroscedasticity or non-normality, and standard misspecification tests of regression coefficients. As an example, a large value for
the Pearson correlation coefficient may be entirely an artefact of spatial dependence in the data, e.g., two variables with identical autocorrelation structures. As a consequence, the assumed suitability of secondary data for modelling first-order spatial variation may be misleading. However, over the past years, alternative test procedures with a certain robustness against spatial autocorrelation have been designed (Haining, 1990). These tests either rely on a preliminary removal of spatial dependence or take it explicitly into account by means of scaling factors.
Many other disturbing data properties could be listed here as well, such as multicollinearity, anisotropy, spatial configurations of data sparsity or clustering, or small sample sizes, but it is clear that deriving the knowledge required for the assessment of key data characteristics is a demanding procedure. The complexity of this procedure is further increased by the fact that some disturbing data properties usually appear in combination, but should be evaluated in isolation. As an example, in regression analysis, model misspecifications and measurement errors may lead to both spatial autocorrelation and heteroscedasticity in the residuals (Anselin and Griffith, 1988).
29.6 A POSSIBLE IMPLEMENTATION OF THE SELECTION PROCEDURE
The selection procedure proposed in this chapter may serve as a conceptual framework for a decision support system as well as for a digital tutorial for training a user in selecting an appropriate interpolation model. Here the focus is exclusively on the design of a prototype decision support system. It is important to note that this prototype is implemented such that the decisions are not made by the computer but rather emerge from a high degree of human-computer interaction. Figure 29.2 shows a possible implementation of the proposed selection procedure.
The core component of this prototype is called the inference engine and is responsible for communication with, and access to, both the rule base and the method base. The rule base incorporates the selection rules, such as those proposed in the selection procedure described above, i.e., the formalised statistical knowledge required for the selection of an appropriate interpolation model. The method base consists of statistical tests for the derivation of the required implicit information, and a set of interpolation models that covers a wide range of potential interpolation situations. Information about the characteristics and performance of available methods is also provided as method metadata. A graphical user interface (GUI) is set on top of the inference engine to make the selection of an appropriate interpolation model a product of a high degree of human-computer interaction. This GUI might be implemented by means of a World Wide Web (WWW) browser: at each stage of the selection process, the user is first presented with an input form written in HyperText Markup Language (HTML). These input forms are designed such that the requests to the method base required for the evaluation of the respective data properties are made properly. The input forms are enhanced with on-line help which explains the fundamental concepts and theories. When the input form is submitted, the web browser sends a request to a web server, which invokes the appropriate test procedures included in the method base. The interaction between the web server and the method and rule bases is mediated by means of the Common Gateway Interface (CGI). The results of these CGI programs are then reformatted into a web page and returned via the web server to the browser, which renders them for display to the user. To keep the interactivity as high as possible, the results include additional input forms to accept further user input, along with the respective functionality, thus beginning the cycle anew.
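To make the idea of a rule base more concrete, the following minimal Python sketch encodes the three heteroscedasticity rules stated in Section 29.5 (Breusch-Pagan as the common choice, Koenker-Bassett for non-normal residuals, White when the form of heteroscedasticity is unknown). It is an illustration only and not the prototype described in this chapter: the class name, the field names, and the order in which the rules are checked are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResidualFindings:
    """Pre-test results confirmed by the user (hypothetical field names)."""
    residuals_normal: bool
    form_of_heteroscedasticity_known: bool


def choose_heteroscedasticity_test(findings: ResidualFindings) -> str:
    """Return the name of the test suggested by the rules quoted in the text.

    The precedence of the rules is an assumption of this sketch; the text
    does not say which rule wins when several conditions hold at once.
    """
    # Little prior information about the form: White's test has power
    # against any unspecified form of heteroscedasticity.
    if not findings.form_of_heteroscedasticity_known:
        return "White"
    # Non-normal residuals: use Koenker-Bassett instead of Breusch-Pagan.
    if not findings.residuals_normal:
        return "Koenker-Bassett"
    # Otherwise fall back to the most common test.
    return "Breusch-Pagan"


if __name__ == "__main__":
    case = ResidualFindings(residuals_normal=False,
                            form_of_heteroscedasticity_known=True)
    print(choose_heteroscedasticity_test(case))
```

In the interactive setting described above, a function of this kind would only propose a test; the user would still confirm the findings and decide whether to run it.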
The conceptual framework behind such an implementation is sometimes called amplified intelligence (Schlegel and Weibel, 1995): the user is kept in the decision-making process and initiates and controls the test procedures required for the evaluation of the selection rules. Both interpolation models and statistical test procedures are embedded in an interactive environment and complemented by various supporting facilities, such as tools providing guidance on how to navigate correctly through a particular decision-making process, on-line help about each stage in the decision-making process, and other related information.
Figure 29.2: A possible structure of implementation of EEDA
29.7 CONCLUSIONS
This chapter presents some conceptual features as well as some implementation issues for a decision support system for the selection of an appropriate spatial interpolation model. Given that the problem of spatial interpolation is mainly a problem of appropriately selecting and parameterizing a model of spatial variation, three data characteristics are identified as main criteria for the decision-making process: magnitude and manner of first-order and second-order spatial variation in the data being interpolated, and suitability of secondary data to assist the modelling of either form of spatial variation. To evaluate these key data characteristics properly, the assessment of a variety of disturbing data properties, occurring alone or jointly, has to be included in the decision-making process as well. The approach presented here lays less stress on novelty (indeed many of the proposed issues are far from being new) than on the integrated nature of the framework proposed. From this point of view, this research foresees several benefits, including:
• The prototype system supports a more reliable and efficient selection of an appropriate interpolation model even for non-experts.
• The prototype system promotes the use of interpolation models that are little known but suitable for many interpolation situations.
• The conceptual framework of extended exploratory data analysis, i.e., formalising and structuring of statistical knowledge, could also be applied to a variety of other tasks in spatial data processing.
• The proposed conceptual framework should contribute towards greater sensitivity among data users to the general usefulness of exploratory data analysis, and to the suitability of available exploratory techniques in a particular application.
Future research will concentrate on the final design of the presented conceptual framework, and particularly on the definition of the required range of test procedures for the derivation of implicit information under various situations. Beyond that, a prototype decision support system for the selection of an appropriate interpolation model based on the proposed concepts will be implemented.
ACKNOWLEDGEMENT
This research has been partially funded by the Swiss National Science Foundation under contract No. 50–35036.92. The author would like to thank Andrej Vckovski for many fruitful and motivating discussions.
REFERENCES
ANSELIN, L. 1992. A Workbook for Using SpaceStat in the Analysis of Spatial Data. Morgantown: Regional Research Institute, West Virginia University.
ANSELIN, L. 1995. Local indicators of spatial association—LISA, Geographical Analysis, 27, pp. 93–115.
ANSELIN, L. and GRIFFITH, D.A. 1988. Do spatial effects really matter in regression analysis? Papers of the Regional Science Association, 65, pp. 11–34.
BUCHER, F. and VCKOVSKI, A. 1995. Improving the selection of appropriate spatial interpolation methods, in Frank, A. and Kuhn, W. (Eds.), Spatial Information Theory: A Theoretical Basis for GIS. Lecture Notes in Computer Science. Berlin: Springer Verlag, pp. 351–364.
BURROUGH, P.A. 1986. Principles of Geographical Information Systems for Land Resources Assessment. Oxford: Clarendon Press.
CRESSIE, N. 1991. Statistics for Spatial Data. Chichester: John Wiley.
DEUTSCH, C.V. and JOURNEL, A.G. 1992. GSLIB: Geostatistical Software Library and User’s Guide. New York: Oxford University Press.
DUBRULE, O. 1984. Comparing splines and kriging, Computational Geosciences, 101, pp. 327–338.
DUNGAN, J., PETERSON, D. and CURRAN, P. 1994. Alternative approaches for mapping vegetation quantities using ground and image data, in Michener, W. (Ed.), Environmental Information Management and Analysis: Ecosystems to Global Scales. London: Taylor & Francis.
EFRON, B. and TIBSHIRANI, R.J. 1993. An Introduction to the Bootstrap. New York: Chapman & Hall.
ENGLUND, E.J. 1990. A variance of geostatisticians, Mathematical Geology, 22(4), pp. 417–455.
GETIS, A. and ORD, J.K. 1992. The analysis of spatial association by use of distance statistics, Geographical Analysis, 24(3), pp. 189–206.
HAINING, R. 1990. Spatial Data Analysis in the Social and Environmental Sciences. Cambridge: University Press.
ISAAKS, E.H. and SRIVASTAVA, R.M. 1989. An Introduction to Applied Geostatistics. New York: Oxford University Press.
LAM, N.S. 1983. Spatial interpolation methods: a review, American Cartographer, 10, pp. 129–149.
LEENAERS, H., OKX, J.P. and BURROUGH, P.A. 1990. Comparison of spatial prediction methods for mapping floodplain soil pollution, CATENA, 17, pp. 535–550.
MATHERON, G. 1971. The theory of regionalized variables and its applications, Les cahiers du Centre de Morphologie Mathématique de Fontainebleau. Paris: Ecole Nationale Supérieure des Mines.
MITASOVA, H. and MITAS, L. 1993. Interpolation by regularized spline with tension: I. Theory and implementation, Mathematical Geology, 25, pp. 641–655.
RIPLEY, B. 1981. Spatial Statistics. New York: Wiley.
SCHLEGEL, A. and WEIBEL, R. 1995. Extending a general-purpose GIS for computer-assisted generalization, in Proceedings of the 17th International Cartographic Conference. Barcelona: International Cartographic Association, pp. 2211–2220.
TUKEY, J.W. 1977. Exploratory Data Analysis. Reading, MA: Addison-Wesley.
VCKOVSKI, A. and BUCHER, F. 1996. Virtual data sets—smart data for environmental applications, in Proceedings of the Third International Conference on Integrating GIS and Environmental Modelling, Santa Fe, USA. http://www.ncgia.ucsb.edu/conf/sf_papers/vckovski_andrej/vbpaper.html
Chapter Thirty
Semantic Modelling for Oceanographic Data
Dawn Wright
30.1 INTRODUCTION
The traditional home of the geographical information system (GIS) in terms of managing, mapping and modelling spatial data and associated attributes, as well as spatial decision-making, has been in the land-based sciences and professions. However, within the past decade researchers have noted that GIS and related technologies will become increasingly important in support of oceanographic and atmospheric research, particularly in the context of monitoring global change (see Goodchild, Chapter 21, this volume). This chapter describes research in collaboration with the Vents Program at the Pacific Marine Environmental Laboratory (PMEL) in Newport, Oregon, USA. PMEL conducts interdisciplinary scientific investigations in physical oceanography, marine meteorology, marine geology and geophysics, geochemistry, and related subjects under the auspices of the National Oceanic and Atmospheric Administration (NOAA). The Vents Program focuses on determining the oceanic impacts and consequences of submarine hydrothermal venting and has a strong marine geographical component (Fox, 1995; Hammond et al., 1991). Most of its efforts are directed at mapping specific geographical locations on the seafloor, and determining patterns and pathways for the regional transport of hydrothermal emissions, along with their relationship to the geology and tectonics of Northeast Pacific Ocean sea-floor-spreading centres. The understanding obtained from this relatively isolated system will eventually be extended to a prediction of the impact of sea-floor hydrothermal systems on the global ocean. The attainment of the overall program goal therefore requires a long-term interdisciplinary approach (Fox, 1995; Hammond et al., 1991).
The integration of multidisciplinary data gathered from multiple sampling platforms for analysis and interpretation is of great importance in marine geography.
The cost of acquiring these data alone justifies the development of dedicated systems for data integration and analysis (e.g., an oceanographic research vessel usually costs $15,000–$25,000 or more a day to operate). Not only must a wide variety of data sources be dealt with, but also a myriad of data “structures” (e.g., tables of chemical concentration versus raster images versus gridded bathymetry versus four-dimensional data, etc.). The synergy of different types of data provides the scientific community with more information and insight than that obtained by considering each type of data separately. Such an approach requires that the vast amount of information that will accrue be intelligently catalogued, as well as spatially and temporally co-registered. Scales of information range from hundreds of kilometres to millimetres, and decades to milliseconds. It is important that data are in such a format that the real-time information is available to the broad community within one fieldwork cycle (~1 year).
In this fashion all related and germane experiments performed in one year will be able to draw on previously obtained information and insights. Data quality criteria must be established and Internet connections made available to all. Here a scientific information management infrastructure, which provides an essential technology for data dissemination, sharing, cataloguing, archiving, display, and mapping, has an obvious relevance. The Vents Program has therefore established scientific information management as an organisational priority and is serving as a role model in this regard for multiagency, multidisciplinary national and international research initiatives such as the Ridge Inter-Disciplinary Global Experiments (RIDGE) program for understanding the geophysical, geochemical and geobiological aspects of the global seafloor spreading system (Detrick and Humphris, 1994). According to Gritton and Baxter (1992), any comprehensive scientific information management infrastructure should have at least three components: (1) provision of uniform access to data from multiple sources; (2) testing and evaluation of both commercial and “homegrown” technologies; and (3) assimilation of new data acquisition instrumentation, computers and database management software into the existing system. The Vents Program has adopted a technical approach that emphasises these components. A GIS is the foundation of the computing environment. It integrates data from multiple conventional sensor and sample analysis systems, as well as interpretative information from video and sidescan sonar data analysis. One crucial component in the effective implementation of the entire infrastructure is the definition of a semantic data model that will allow users and designers to capture the meaning of the database. 
A semantic model essentially defines objects, relationships among objects, and properties of objects, thus providing a number of mechanisms for viewing and accessing a database schema at different levels of abstraction (Ram, 1995). A number of semantic models have been developed and described in the computer science and information management literature (e.g., Hull and King, 1987; Mann and Haux, 1993; Miura and Moriya, 1992/93; Peckham and Maryanski, 1988; Ram, 1995; Yazici et al., 1995). The application of a database in the field of marine geography leads to specific requirements for data acquisition and analysis (previously considered only by Gritton et al. (1989) for undersea data from a single vehicle). These requirements are considered for multiple data types from a variety of sensors in the semantic model to be described below, hereafter referred to as the Vents Program Scientific Information Model or VPSIM.
30.2 RATIONALE FOR THE VPSIM
Specific scientific goals of Vents Program research include: (1) the coordinated collection of geological, chemical, biological, and physical data sets; and (2) high level analyses of the processes and process interactions represented in the data sets. The VPSIM provides a formal description of the scientific information requirements for this work. Normally the formulation of a semantic data model should be the first and most critical activity of any database system implementation (Environmental Systems Research Institute, 1994; Gritton and Baxter, 1992). Specifically, it represents the objects, places, concepts, or events, as well as their interrelationships, which are pertinent to the real-world scenario of data flow and data analysis. It includes important concepts of scientific information management such as flexible data grouping, data traceability, and data quality assessment.
Such models provide a basis for communicating information requirements, performing database design, and enhancing user interaction with the information management system. The formulation of the VPSIM is necessary for the following reasons:
• Provision of a schema for database designers. The VPSIM aids in the implementation of data or information management, thereby helping the information management system to reach its full potential. The system should be as efficient as possible in storing and sharing data, as well as integrating the data with other technologies. The VPSIM provides a schema for this, not unlike a flow chart that helps a computer programmer code a program.
• Simplification of a complex situation for users. Individual end-user scientists must not be overwhelmed by the complexity of an integrated database. Each must be able to view the database in a way pertinent to their part of the scientific problem.
• Effective communications tool for everyone. In terms of the Vents Program database, the VPSIM provides to those within and without the Program a picture of where the Program is, where it’s going, and how it will get from one level to the next. It is thus anticipated that the model will increase in detail and complexity with time.
30.3 VPSIM FORMULATION AND IMPLEMENTATION
One approach to formulating an information model might emphasise aligning database structures with sensor acquisition systems. This is fine if one wants to manage only the sensor data itself and not the information derived from it. Vents Program researchers deal with so many different acquisition systems that this is not feasible (e.g., bathymetry from the Sea Beam, Hydrosweep, Sea Beam 2000, or SeaMARC II systems; Alvin, Turtle, or Shinkai 6500 submersible observations and samples; acoustic imagery from the SeaMARC I, IA, II, or DeepTow vehicles). Sensor acquisition systems are constantly in flux. What happens when the technology changes? For the Vents Program a better approach is to emphasise the management of the scientific information that comes out of the acquisition systems.
This better reflects the information requirements of the scientists and thus helps them to gain a better understanding of the scientific problems to be addressed. This is especially important as the interdisciplinary research of the Vents Program often requires each contributor to work with information in areas where they may have limited expertise but whose results they must still interpret. Such an approach also allows the model to remain stable despite changes in data acquisition systems or perhaps even data management systems. The result is the ability to evolve to more advanced technical solutions. The VPSIM must therefore identify the inherent structure of oceanographic facts and hypotheses (e.g., hypotheses about processes associated with seafloor-spreading), rather than the structure of the sensor data record.
In order to address its scientific goals and objectives, the Vents Program performs certain organisational functions to accommodate its interdisciplinary goals. These functions were the starting point for the design of the VPSIM. Modelling proceeded with the identification of the functions and a general description of the activities within each function (Figure 30.1). Once the functions were compiled, the next logical step was to identify the data supporting each function. Table 30.1 is a data/function cross-reference matrix, where function is essentially a particular subdiscipline of oceanography. The data/function matrix shows a high-level classification of data for the Vents Program, the interdependence of data and function, those functions or disciplines which create commonly required data, and the interdependence of certain functions (such as “Geology” and “Chemistry” in Table 30.1). It is particularly useful for pinpointing information generalities (i.e., data sets useful to more than one or all sub-disciplines), which should in turn be reflected in the VPSIM.
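The way such a cross-reference matrix exposes information generalities can be sketched in a few lines of Python. The example below is a hedged illustration rather than part of the VPSIM: the row and column labels follow Table 30.1, but the marks assigned to each data set are hypothetical and chosen only to show the two queries, and the function names are invented for this sketch.

```python
# Hypothetical excerpt of a data/function matrix. The discipline sets are
# illustrative assumptions and are NOT copied from Table 30.1.
MATRIX = {
    "Bathymetry (Sea Beam)": {"Geology", "Geophysics"},
    "Casts (CTD)": {"Biology", "Chemistry"},
    "Locations (hydrothermal vents)": {"Geology", "Geophysics",
                                       "Biology", "Chemistry"},
    "EQ focal mechanisms": {"Geophysics"},
}


def shared_data_sets(matrix, minimum_functions=2):
    """Data sets used by at least `minimum_functions` disciplines."""
    return sorted(name for name, functions in matrix.items()
                  if len(functions) >= minimum_functions)


def function_overlap(matrix, first, second):
    """Data sets that two disciplines have in common."""
    return sorted(name for name, functions in matrix.items()
                  if first in functions and second in functions)


if __name__ == "__main__":
    print(shared_data_sets(MATRIX))
    print(function_overlap(MATRIX, "Geology", "Chemistry"))
```

The first query corresponds to the "information generalities" mentioned above, and the second to the interdependence of two functions.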
Table 30.1: Data/Function Table for the Vents Program Scientific Information Model
FUNCTION (columns): Geology, Geophysics, Biology, Chemistry
DATA (rows):
Bottom Pressure Recorder
Casts—CTD
Casts—dissolved/particulate
Casts—biology macroplume
Casts—biology microplume
Current Meter Arrays
EQ focal mechanisms
EQ locations—terrestrial
EQ locations—from T-phase events
EQs—T-phase source location fixes
Flow fronts
Gridlines
Images—35 mm, selected
Images—ESC, selected
Images—other raster
Images—Satellite
Images—SM I
Images—SM II
Locations—ESC shots
Locations—Hydrothermal vents
Locations—35 mm photo (photo-mrg)
Locations—video frames, selected
Locations—Whales
Markers
Miniature Temperature Recorder
Moorings
Overlays (logos, borders, interps)
Samples—submersible rock or bio
Samples—submersible water or gas
Bathymetry—Sea Beam
Bathymetry—Sea Beam 2000
Bathymetry—other (Hydrosweep, etc.)
Extensometer
Tows—camera
Tows—camera—interpretations
Tows—CTD
Tows—Helium
Tracks/nav points—Alvin
Tracks/nav points—Shinkai 6500
Tracks/nav points—Deep Tow sidescan
Tracks/nav points—ROPOS
Tracks/nav points—SB
Tracks/nav points—SM I
Tracks/nav points—SM II
Transponders
Volcanic System Monitor

Figure 30.1: Organizational function diagram for the Vents Program Scientific Information Model. Refer to Table 30.2 for explanation of acronyms.

Table 30.2: List of Acronyms for the Vents Program Scientific Information Model
ATV: Advanced Tethered Vehicle (US Navy).
CM: Current Meter (see Cannon et al., 1995).
CTD: Conductivity-Temperature-Depth.
DBMS: Database Management System.
EQ: Earthquake.
ESC: Electronic Still Camera.
GIS: Geographical Information System.
GPS: Global Positioning System.
HARU: Hydrophone Acoustic Research Underwater (recently developed at NOAA-PMEL, Newport, OR).
MTR: Miniature Temperature Recorder (see Baker and Cannon, 1993).
NOAA: National Oceanic and Atmospheric Administration.
PMEL: Pacific Marine Environmental Laboratory (Newport, OR and Seattle, WA).
ROPOS: Remotely-Operated Platform for Ocean Science (Fisheries and Oceans Canada) (see Embley et al., 1995).
ROV: Remotely-Operated Vehicle.
SB: Sea Beam (see Chadwick et al., 1995a).
SeaMARC: Sea Mapping and Remote Characterization (see Applegate, 1990).
SOSUS: Sound Surveillance System (U.S. Navy) (see Fox and Hammond, 1994).
T-Phase: Tertiary-phase (seismically generated acoustic waves that propagate over great distances in the oceanic sound channel) (see Dziak et al., 1995; Fox and Hammond, 1994).
USGS: United States Geological Survey.
VSM: Volcanic System Monitor (see Fox, 1993).
VPSIM: Vents Program Scientific Information Model.
WOCE: World Ocean Circulation Experiment (see Lupton et al., 1985).
XTS: Extensometer (see Chadwick et al., 1995b).
30.4 DESCRIPTION OF THE VPSIM
The VPSIM (Figure 30.2) is expressed mainly in terms of entities, attributes, and relationships. An entity (shown as a labelled box in Figure 30.2) is an information template corresponding to objects, places, concepts, or events about which one wishes to store information (Fleming and von Halle, 1989; Gritton et al., 1989). Attributes (plain, unitalicised text in Figure 30.2) define the information to be recorded about entities, and relationships (labelled lines in Figure 30.2) express an association between entities (Fleming and von Halle, 1989; Gritton et al., 1989). Figure 30.2 illustrates a very high level view of the VPSIM; for simplicity, attributes are not included for every entity, and relationship cardinality is limited to two classes: “one to one” or “one to many.”
The VPSIM illustrates a three-part pipeline from the marine environment being measured to an abstract description of the scientist’s understanding of that environment. Part 1 (Figure 30.2a) focuses on data acquisition, the first piece of the pipeline, and is thus concerned with the accurate sensing and collection of measurements from the marine environment and the transformation of these measurements from raw into processed data for scientific consumption. Part 1 of the VPSIM therefore illustrates a situation where the entities are largely independent of the data model of the GIS or database management system (DBMS). Here data collection is made in two ways: (1) short-term (2 weeks to 1 month) mobile entities, such as research cruises employing various instrument platforms, deployments of remotely-operated vehicles or submersible dives; or (2) long-term (1 year or more) fixed entities, such as hydrophones and various other types of moored instrumentation. For the mobile entities, data collection runs gather in situ samples, casts, and tows of various types for future laboratory analysis.
These runs may be made at predefined sites, or along predefined transects or tracklines. For the fixed entities, raw data are sent back to shore from the hydrophones or moorings for encryption and/or processing. Ocean observations from the mobile and fixed entities are made at points in space and time and may have multiple assessments of the geophysical, geological, chemical, physical, and biological properties that exist in the marine environment. Each component of an ocean observation group may have one to many assessments of data quality or error. In turn, one to many observation sets or types may arise from a particular ocean observation group. These observation sets or types may be derived by a theoretical method or derived from a sensor, created from a laboratory run, or derived by observing a process (Figure 30.2a).
Figure 30.2: The Vents Program Scientific Information Model (Parts 1–2). Refer to Table 30.2 for explanation of acronyms.
Figure 30.2: The Vents Program Scientific Information Model, Part 3. Refer to Table 30.2 for explanation of acronyms.
Part 2 of the VPSIM (Figure 30.2b) focuses on the next portion of the pipeline, database management. This portion must provide integrated management of diverse scientific data which will be accessed, often simultaneously, in numerous ways by a variety of users. Further, each end-user scientist must be able to access the database in a way pertinent to their particular scientific function or discipline. In this part of the VPSIM the entities are thus very much constrained by the data model of the chosen GIS or DBMS. The Vents Program uses the ARC/INFO GIS. The entities in Part 2 are essentially coverages, the basic unit of vector data storage in ARC/INFO. A coverage stores geographic features as primary features, such as arcs, nodes, polygons, and label points, and secondary features, such as map extents, links, and annotation (Environmental Systems Research Institute, 1994). Associated attributes of the primary geographic features are stored in feature attribute tables. In Figure 30.2, Part 2, entities are once again shown as labelled boxes but each entity representing an ARC/INFO coverage is shaded according to its topology, which is classed as “point,” “line,” “polygon,” “line and point,” or “line and polygon.” Topology is the spatial relationship between connecting or adjacent features in a coverage (Environmental Systems Research Institute, 1994). In Figure 30.2, Part 2, the associated attributes for each ARC/INFO coverage are listed below it.
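The entity, attribute, and relationship vocabulary used in Parts 1 and 2 can be illustrated with a small Python sketch. It is offered only as one possible reading of the model, under stated assumptions: the class and field names are hypothetical, "one to many" is interpreted as "at least one", and the coverage class records nothing beyond the topology classes and attribute lists mentioned in the text. It does not reproduce the ARC/INFO data model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Topology(Enum):
    """Topology classes named in the text for coverage entities."""
    POINT = "point"
    LINE = "line"
    POLYGON = "polygon"
    LINE_AND_POINT = "line and point"
    LINE_AND_POLYGON = "line and polygon"


@dataclass
class QualityAssessment:
    note: str  # hypothetical attribute describing data quality or error


@dataclass
class ObservationComponent:
    """One measured property, related one-to-many to quality assessments."""
    property_name: str
    assessments: List[QualityAssessment]

    def __post_init__(self) -> None:
        # Assumption: "one to many" requires at least one assessment.
        if not self.assessments:
            raise ValueError("a component needs at least one quality assessment")


@dataclass
class Coverage:
    """A coverage entity: a topology class plus feature attribute names."""
    name: str
    topology: Topology
    attributes: List[str] = field(default_factory=list)


if __name__ == "__main__":
    temperature = ObservationComponent(
        "temperature", [QualityAssessment("sensor calibrated before cast")])
    vents = Coverage("hydrothermal_vents", Topology.POINT, ["vent_id", "depth_m"])
    print(temperature.property_name, len(temperature.assessments))
    print(vents.name, vents.topology.value, vents.attributes)
```

Enforcing the cardinality in the constructor is a design choice of this sketch; a database implementation would more likely express it as a constraint between tables.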
The boxed “M” next to each coverage stands for the metadata and lineage information associated with that coverage. Metadata or “data about data” would include such ancillary information as sensor calibration, data quality assessment, processing algorithm used, etc., while lineage would be the time stamp for each of these.
Part 3 of the VPSIM (Figure 30.2) focuses on the final piece of the pipeline, data presentation, analysis, and visualisation, in which scientists derive meaning and knowledge from the data. In this portion of the VPSIM, the entities are largely dependent on the scientist. Here it is more of a flow chart than an entity-attribute-relationship diagram as it illustrates the flow of activity by Vents Program users in the course of doing empirical and positivist science. For example, in Figure 30.2, Part 3, scientists may retrieve and display an ARC/INFO coverage either directly from ARC/INFO, from ArcView (the graphical user interface to ARC/INFO), or by way of a World Wide Web page that provides an interactive link to ArcView and the ARC/INFO coverages illustrated in Figure 30.2, Part 2 (the Uniform Resource Locator for this page is http://www.pmel.noaa.gov/vents/coax/coax.html; Fox et al., 1996). Concurrently, data manipulation or analysis may occur in the form of overlays, buffering, clipping, and the like. The end result after data retrieval/display and/or data manipulation/analysis is that a Vents Program scientist has the power to view, query, summarise, calculate, and make decisions (Figure 30.2, Part 3). Data are interpreted, new knowledge is gained, and hypotheses may be derived and/or tested. This cycle of activity may then return either to more data retrieval/display, more manipulation/analysis or to the marine environment in Part 1 of the VPSIM (Figure 30.2, Part 1) to collect more data.
30.5 INSIGHTS RESULTING FROM THE VPSIM
These are the preliminary insights resulting from the VPSIM at this stage in its development:
• The VPSIM distinctly illustrates the interdependency of Vents Program data sets. Great insights can be gained into the information generalities that exist across scientific disciplines and levels of interpretation (Table 30.1 and Figure 30.2b).
• The VPSIM elucidates the challenges with Vents Program chemical and physical oceanography data. This kind of data may be sparse in one or two dimensions as compared to others (e.g., vertical casts for salinity/temperature/density where sampling is frequent in the vertical direction but sparse in the horizontal).
In this case, it is extremely difficult to provide input data for all points in the sample space. It is clear now that adequate interpolation methods will thus be needed in order for the GIS to fill the sample space adequately. • Sidescan sonar imagery is a very important Vents Program data set that is still not part of the scientific information management system. Work is in progress to remedy this with the integration of SeaMARC and Sea Beam 2000 imagery into ARC/INFO and Arc View. • The VPSIM provides a clearer view of how some data will require more metadata and lineage than others. For example, camera tow and seafloor mooring coverages, which carry an extensive number of attributes, will clearly require more attention than other coverages, such as bathymetry and lava flows (Figure 30.2-b). Metadata cataloguing has just been completed and will be reported on by Lubomudrov et al. (1997). • Since a good portion of the Vents Program data sets includes visual marine observations, the VPSIM may be useful in the future for aiding human interpretation of marine video. Similar to what has been accomplished at the Monterey Bay Aquarium Research Institute, it may be possible to use the VPSIM to define a structured language syntax for use by an expert video annotator (Gritton and Baxter, 1992). The idea is to provide the most accurate and efficient delivery of visual marine observations to an integrated scientific database system or GIS. This work with the video data would also build upon the work of Fox et al (1988), which is a system that automatically displays, interprets, and performs statistical analysis on deep sea bottom photographs.
• Part 3 of the VPSIM shows that the GIS provides additional power beyond that presently used by Vents Program researchers (Figure 30.2c). For instance, there is currently very little “analysis” going on in the strictest sense of the term, as defined in terms of spatial statistics by experts such as Fotheringham and Rogerson (1993) or Openshaw (1991). However, it has been enough of a challenge to get data into a GIS and to convince colleagues of the great utility of the technology. More rigorous spatial analysis is a goal for the future, as discussed in the next section.

• The VPSIM has been especially helpful as a communication tool for Vents Program scientists, most of whom have little or no previous experience with GIS and have not been intimately involved in database development. It has provided them with a simplified picture of the structure of GIS operations, as well as a picture of where the Vents Program is, where it is going, and how to get from one level to the next in terms of database development.

30.6 FUTURE DIRECTIONS: VISUALISATION AND MODELLING

The VPSIM clearly portrays the two-dimensionality of Vents Program data. This is a satisfactory start for current mapping needs, but because oceanographic data are often three- and four-dimensional in nature, new GIS storage structures and more powerful query facilities will be required in order to handle these data more effectively. At present, few commercial GISs are able to deal with three-dimensional or time-dependent data (Hamre, 1993; Mason et al., 1994; Raper and Kelk, 1991). There will therefore be limitations when using the current packages, which are only equipped with two-dimensional spatial objects. What is planned for the future is that the “analysis” depicted in Part 3 of the VPSIM will include coupling of the GIS with a 3-D visualisation package or interfacing the GIS to a numerical model or simulation (Figure 30.2, Part 3).
Gahegan (1996) and Hay and Knapp (1996) point out that, while it would be desirable to have sophisticated visualisation and numerical modelling tools available from within GIS, it is fairly simple and sufficient to pass data from the GIS to the visualisation or modelling environment, since integration occurs at the external level within both systems. Lam et al. (1996) argue that for interfacing environmental models over a large range of possible input parameters, it is beneficial to have a level of integration that is much greater than simple file transfer (e.g., shared link libraries). However, the current priority within the Vents Program is to facilitate the integration of various types of data (e.g., marine geological, biological, and chemical data) at various scales of resolution. Complex environmental models have not yet been developed from the various types of data.

The proprietary nature of ARC/INFO code will necessitate the use of either the file transfer approach or the design of some kind of intermediate data file format that can be read by both the GIS and the visualisation system. Kuiper et al. (1996), in integrating ARC/INFO with the Site View™ visualisation system, elected to design and implement an ASCII file format that was uniquely adapted to Site View™ and its data model, and then to program an ARC/INFO translator using ARC/INFO software development libraries and C code. The ASCII file format consists of an informational header, a structure definition (i.e., object type, geometric type, data attributes), and data records (i.e., object type followed by data elements) (Kuiper et al., 1996). The Site View™ object-oriented data model is very similar to that of Fledermaus, a visualisation system developed specifically for ocean floor data by the Ocean Mapping Group at the University of New Brunswick. Given the success of the Kuiper et al. (1996) approach, it is conceivable that a “Fledermaus” translator could be coded for Vents Program GIS data. Such a translator would benefit ARC/INFO users by giving them the ability to explore data more interactively in 3-D and to view numerical model output in both the spatial and temporal dimensions.
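The three-part layout reported for the Kuiper et al. (1996) exchange file (informational header, structure definition, data records) can be illustrated with a minimal Haskell sketch. The concrete syntax is not taken from the source: the "#" header prefix, the DEFINE keyword, the names StructDef, Record and render, and the sample values are all hypothetical, chosen only to show how a file of this general shape might be assembled.

```haskell
-- Hypothetical sketch: assembling a three-part ASCII exchange file.
-- Keywords and layout are illustrative assumptions, not the actual format.

-- Structure definition for one object type.
data StructDef = StructDef
  { objType  :: String    -- object type, e.g. "mooring"
  , geomType :: String    -- geometric type, e.g. "point"
  , attrs    :: [String]  -- data attribute names, in record order
  }

-- A data record: the object type followed by its data elements.
data Record = Record String [String]

-- Render header lines, structure definitions and records as one text file.
render :: [String] -> [StructDef] -> [Record] -> String
render header defs recs =
  unlines (map ("# " ++) header ++ map defLine defs ++ map recLine recs)
  where
    defLine d             = unwords ("DEFINE" : objType d : geomType d : attrs d)
    recLine (Record o xs) = unwords (o : xs)

-- Made-up sample content, for illustration only.
sample :: String
sample = render
  ["sample export"]
  [StructDef "mooring" "point" ["x", "y", "temp"]]
  [Record "mooring" ["10.0", "20.0", "2.5"]]

main :: IO ()
main = putStr sample
```

A plain-text intermediate of this kind keeps the GIS and the visualisation system decoupled, which matches the file transfer approach discussed above; the price is that both sides must agree on the structure definitions in advance.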
30.7 CONCLUSION

The Vents Program has provided a framework for successful integrated data management by establishing scientific information management as an organisational priority. The ultimate objective is a system that integrates the functions of data storage, selective retrieval, display and archiving. Ongoing efforts in semantic data modelling and information management have produced a relational database in which marine geological, geophysical, chemical, and biological observations can be accessed by any investigator within the Vents Program and throughout the marine science community. The VPSIM provides an effective schema for database design, simplifies a complex data management scenario for end-user scientists, serves as an effective communications tool for scientists and database managers, and demonstrates that GIS technology can be an effective, if not crucial, tool for supporting oceanographic research.

ACKNOWLEDGEMENTS

The support of NOAA and the Oregon State University Department of Geosciences is gratefully acknowledged. Data used in this chapter were collected and supplied by NOAA. The assistance and expertise of Andra Bobbitt and Chris Fox of NOAA/PMEL are gratefully acknowledged.

REFERENCES

APPLEGATE, T.B. 1990. Volcanic and structural morphology of the south flank of Axial Volcano, Juan de Fuca Ridge. Results from a SeaMARC I side scan sonar survey, Journal of Geophysical Research, 95(B8), pp. 12,765–12,783.
BAKER, E.T. and CANNON, G.A. 1993. Long-term monitoring of hydrothermal heat flux using moored temperature sensors, Cleft segment, Juan de Fuca Ridge, Geophysical Research Letters, 20(17), pp. 1855–1858.
CANNON, G.A., PASHINSKI, D.J. and STANLEY, T.J. 1995. Fate of event hydrothermal plumes on the Juan de Fuca Ridge, Geophysical Research Letters, 22(2), pp. 163–166.
CHADWICK, W.W. Jr, EMBLEY, R.W. and FOX, C.G. 1995a. Sea Beam depth changes associated with recent lava flows, CoAxial segment, Juan de Fuca Ridge. Evidence for multiple eruptions between 1981–1993, Geophysical Research Letters, 22(2), pp. 167–170.
CHADWICK, W.W. Jr, MILBURN, H.B. and EMBLEY, R.W. 1995b. Acoustic extensometer. Measuring mid-ocean spreading, Sea Technology, 36(4), pp. 33–38.
DETRICK, R.S. and HUMPHRIS, S.E. 1994. Exploration of global oceanic ridge system unfolds, EOS, Transactions, American Geophysical Union, 75, pp. 325–326.
DZIAK, R.P., FOX, C.G. and SCHREINER, A.E. 1995. The June-July 1993 seismo-acoustic event at CoAxial segment, Juan de Fuca Ridge. Evidence for a lateral dike injection, Geophysical Research Letters, 22(2), pp. 135–138.
EMBLEY, R.W., CHADWICK, W.W. Jr, JONASSON, I.R., BUTTERFIELD, D.A. and BAKER, E.T. 1995. Initial results of the rapid response to the 1993 CoAxial event. Relationships between hydrothermal and volcanic processes, Geophysical Research Letters, 22(2), pp. 143–146.
ENVIRONMENTAL SYSTEMS RESEARCH INSTITUTE 1994. ARC/INFO Data Management Concepts, Data Models, Database Design, and Storage. Redlands, California: Environmental Systems Research Institute.
FLEMING, C.C. and VON HALLE, B. 1989. Handbook of Relational Database Design. New York: Addison-Wesley.
FOTHERINGHAM, A.S. and ROGERSON, P.A. 1993. GIS and spatial analytical problems, International Journal of Geographical Information Systems, 7(1), pp. 3–19.
FOX, C.G. 1993. Five years of ground deformation monitoring on Axial Seamount using a bottom pressure recorder, Geophysical Research Letters, 20(17), pp. 1859–1862.
FOX, C.G. 1995. Special collection on the June 1993 volcanic eruption on the CoAxial segment, Juan de Fuca Ridge, Geophysical Research Letters, 22(2), pp. 129–130.
FOX, C.G. and HAMMOND, S.R. 1994. The VENTS program T-phase project and NOAA’s role in ocean environmental research, Marine Technological Society Journal, 27(4), pp. 70–73.
FOX, C.G., MURPHY, K.M. and EMBLEY, R.W. 1988. Automated display and statistical analysis of interpreted deep-sea bottom photographs, Marine Geology, 78, pp. 199–216.
FOX, C.G., BOBBITT, A.M. and WRIGHT, D.J. 1996. Integration and distribution of deep-sea oceanographic data from the NE Pacific using Arc/Info and ArcView, in Proceedings of the 16th Annual ESRI User Conference, Palm Springs, California, 20–24 May, URL: http://www.esri.com/resources/
GAHEGAN, M. 1996. Visualisation strategies for exploratory spatial analysis, in Proceedings of the Third International Conference/Workshop on Integrating GIS and Environmental Modelling, Santa Fe, New Mexico, 21–25 January, URL: http://www.ncgia.ucsb.edu/conf/santa_fe.html.
GRITTON, B.R. and BAXTER, C.H. 1992. Video database systems in the marine sciences, Marine Technological Society Journal, 26(4), pp. 59–72.
GRITTON, B., BADAL, D., DAVIS, D., LASHKARI, K., MORRIS, G., PEARCE, A. and WRIGHT, H. 1989. Data management at MBARI, in Oceans ’89 Proceedings: The Global Ocean, Long Beach, California, pp. 1681–1685.
HAMMOND, S.R., BAKER, E.B., BERNARD, E.N., MASSOTH, G.J., FOX, C.G., FEELY, R.A., EMBLEY, R.W., RONA, P.A. and CANNON, G.A. 1991. The NOAA VENTS Program: understanding chemical and thermal oceanic effects of hydrothermal activity along the mid-ocean ridge, in Eos, Transactions of the American Geophysical Union, 72(50), pp. 561–566.
HAMRE, T. 1993. Integrating remote sensing, in situ and model data in a marine information system (MIS), in Proceedings Neste Generasjons GIS 1993, Ås, Norway, pp. 181–192.
HAY, L. and KNAPP, L. 1996. Integrating a geographic information system, a scientific visualization system and an orographic precipitation model, in Kovar, K. and Nachtnebel, H.P. (Eds.), HydroGIS 96: Application of Geographic Information Systems in Hydrology and Water Resources Management (Proceedings of the Vienna Conference), Vienna, 16–19 April, Wallingford, Oxfordshire: International Association of Hydrological Sciences, pp. 123–131.
HULL, R. and KING, R. 1987. Semantic database modelling. Survey, applications, and research issues, Association of Computing Machinery Computing Surveys, 19(3), pp. 201–260.
KUIPER, I., AYERS, A., JOHNSON, R. and TOLBERT-SMITH, M. 1996. Efficient data exchange integrating a vector GIS with an object-oriented, 3D visualization system, in Proceedings of the Third International Conference/Workshop on Integrating GIS and Environmental Modelling, Santa Fe, New Mexico, 21–25 January, URL: http://www.ncgia.ucsb.edu/conf/santa_fe.html.
LAM, D.C.L., SWAYNE, D.A., MAYFIELD, C.I. and COWAN, D.D. 1996. Integration of GIS with other software systems: integration versus interconnection, in Proceedings of the Third International Conference/Workshop on Integrating GIS and Environmental Modelling, Santa Fe, New Mexico, 21–25 January, URL: http://www.ncgia.ucsb.edu/conf/santa_fe.html.
LUBOMUDROV, L., FOX, C.G., BOBBITT, A.M. and WRIGHT, D.J. 1997. A metadata catalogue for the VENTS Marine GIS, National Oceanic and Atmospheric Administration-Pacific Marine Environmental Laboratory Internal Report. Washington: NOAA.
LUPTON, J.E., DELANEY, J.R., JOHNSON, H.P. and TIVEY, M.K. 1985. Entrainment and vertical transport of deep-ocean water by buoyant hydrothermal plumes, Nature, 316, pp. 621–623.
MANN, G. and HAUX, R. 1993. Database scheme design for clinical studies based on a semantic data model, Computational Statistics and Data Analysis, 15, pp. 81–108.
MASON, D.C., O’CONAILL, M.A. and BELL, S.B.M. 1994. Handling four-dimensional geo-referenced data in environmental GIS, International Journal of Geographical Information Systems, 8(2), pp. 191–215.
MIURA, T. and MORIYA, K. 1992/93. On the completeness of visual operations for a semantic data model, Data and Knowledge Engineering, 9, pp. 19–44.
OPENSHAW, S. 1991. Developing appropriate spatial analysis methods for GIS, in Maguire, D.J., Goodchild, M.F. and Rhind, D.W. (Eds.), Geographical Information Systems: Principles and Applications. New York: John Wiley & Sons, pp. 389–402.
PECKHAM, J. and MARYANSKI, F. 1988. Semantic data models issues, Association of Computing Machinery Computing Surveys, 20(3), pp. 153–189.
RAM, S. 1995. Intelligent database design using the unifying semantic model, Information and Management, 29, pp. 191–206.
RAPER, J.F. and KELK, B. 1991. Three-dimensional GIS, in Maguire, D.J., Goodchild, M.F. and Rhind, D.W. (Eds.), Geographical Information Systems: Principles and Applications. New York: John Wiley & Sons, pp. 299–317.
YAZICI, A., BUCKLES, B.P. and PETRY, F.E. 1995. A semantic data model approach to knowledge-intensive applications, International Journal of Expert Systems, 8(1), pp. 77–91.
Chapter Thirty One
Hierarchical Wayfinding: A Model and its Formalisation
Adrijana Car
31.1 INTRODUCTION

This chapter focuses on hierarchical spatial reasoning. The topic belongs to spatial information theory, as it investigates the hierarchical structuring of space and its use for reasoning. The human ability to experience space directly plays a fundamental role in cognition (Mark and Frank, 1996): experientialists assume that humans build their basic concepts according to their spatial experience (Lakoff and Johnson, 1980; Johnson, 1987). To do so they use abstraction mechanisms which allow those details that are necessary for a particular spatial task to be chosen (Frank, 1990). Hierarchisation is such an abstraction mechanism. There is evidence that humans use hierarchical structures which they impose on space to reduce the cognitive load and to improve performance: for example, in the organisation and retrieval of spatial knowledge (Palmer, 1977; Stevens and Coupe, 1978), for organising route knowledge (Hirtle and Jonides, 1985) or for representing spatial knowledge needed in robot navigation (Kuipers and Byun, 1991). This experimental observation leads to a framework of hierarchical reasoning; however, current models are insufficiently detailed to allow their implementation in a GIS (Medyckyj-Scott and Blades, 1992). Understanding how spatial hierarchies are formed and used is one of the most important questions in spatial reasoning research today (Golledge, 1992), and research in formalising spatial hierarchical reasoning is necessary.

Wayfinding in large networks is studied as a particular case to gain better insight into the general problem of hierarchical reasoning. This is a case where sufficient evidence for hierarchical structure in human performance is available (Elliott and Lesk, 1982; Gotts, 1992; Hirtle and Hudson, 1991; Hirtle and Jonides, 1985; Stevens and Coupe, 1978; Tversky, 1992), and where algorithms for the non-hierarchical case are well known (Ahuja et al., 1993; Gibbons, 1985). Humans can find the fastest path very quickly, even in very large street networks, by applying a hierarchical strategy. Non-hierarchical algorithms, as currently used in GIS, show performance that decreases rapidly with increasing size of the network (Kuipers, 1982; Lapalme et al., 1992; see also Patterson, Chapter 9, this volume). Therefore, a different approach to modelling networks is necessary (Blades, 1991; Claussen and Mark, 1991).

We propose a conceptual model of hierarchical wayfinding based on human cognition. The model describes the hierarchical structure of a network and explains how the reasoning process progresses on this structure. This leads to an efficient wayfinding algorithm. The novel contribution of this work is the combination of advances in human knowledge representation and hierarchical reasoning with classical algorithms for wayfinding. The model for wayfinding is based on hierarchically structured space (i.e., the road network), as opposed to a hierarchy of tasks, as is the case in Timpf’s
model of highway navigation (Timpf, 1992) or Kuipers’ robot navigation (Kuipers and Byun, 1991). A combination of both approaches would be necessary to achieve an optimal model of navigation. A more general contribution is a proposed theory of hierarchical spatial reasoning. The theory is expected to be applicable in cases where large spatial data sets need to be processed, and the requirements indicate how to apply the theory to such cases.

Formalising the conceptual models of space is essential for their implementation. As an intellectual tool, formalisation helps in understanding the underlying theory and structure, i.e., in describing the meaning of the data in a model, and it provides a basis for the comparison of similar models. The formalisation method introduced in this chapter uses object orientation as a design method, algebraic specifications as a formalism, and a functional programming language as a specification and prototyping tool (Car and Frank, 1996). Software engineering proposes object orientation as a design method (Khoshafian and Abnous, 1990) because there is a strong correspondence between the idea of modelling both the structure of an object and its behaviour and the way humans build classes of similar objects according to their experience (e.g., drive a car, sell or buy a parcel, etc. (Couclelis, 1992)). Algebraic specifications support object-oriented modelling, as they combine the advantages of data abstraction with an axiom-based (mathematical) method to describe semantics (Guttag et al., 1978). This is a direct link to functional programming, because functional programming allows for such descriptions of abstract models in the form of executable code (Frank and Kuhn, 1995).

The remainder of this chapter is organised as follows: Section 31.2 explains the fundamentals of hierarchical spatial reasoning. Section 31.3 presents a cognitive approach to modelling wayfinding as a case of hierarchical reasoning in space. The design of the cognitive model (Section 31.4) follows the general requirements of hierarchical spatial reasoning. Details of the formalisation method are presented in Section 31.6.

31.2 HIERARCHICAL SPATIAL REASONING

Hierarchical spatial reasoning is any reasoning process which applies hierarchy to subdivide either a task or space. Hierarchisation is used to structure knowledge about the problem domain. It is a prime method to reduce the complexity of the real world: a divide and conquer strategy where a problem is divided into smaller parts that are then solved. A strategy of breaking the problem into manageable sub-problems seems reasonable in the context of spatial memory (McNamara et al., 1989). A hierarchical structure is used to reduce the spatial domain in such a way that many elements can be excluded quickly. Through this reduction we expect more efficient problem solving. This method is different from the types of hierarchical methods where the hierarchy just provides simpler subdivisions of the underlying data structure, e.g., tree walk or merge sort (Wirth, 1976). The difference is that in those methods hierarchy is not used to infer spatial information or to enhance problem solving. Conceptual hierarchisation of space has proven useful as a spatial concept and as such is transferred metaphorically to other domains such as user interface design (Kuhn, 1992, 1993), information processing (Fotheringam, 1992), multiple representations for cartographic objects (an intelligent graphic zoom) (Frank and Timpf, 1994) or visualisation (Robertson et al., 1991).
31.2.1 Types of Hierarchies

Hierarchical structuring can be used to refine the general problem to a particular task. This leads to a hierarchical structure where objects of a different type occur at different levels (level-specific structure in terms of Freksa (1991)). Studies of hierarchical subdivisions of tasks are more common: for example, levels of highway navigation (Timpf et al., 1992), the semantic hierarchy in robot navigation defined by Kuipers and Levitt (1988) and Kuipers et al. (1993), or the segmentation hierarchy for a multimodal, incremental route description process as proposed by Maass (1993).

The hierarchical structure of space is level-independent. It organises spatial objects such that each level contains objects of the same type. The difference between levels is only in resolution or detail. The quadtree (Samet, 1984), hierarchical irregular triangular networks (De Floriani and Puppo, 1992), a global hierarchical spatial data structure called triacon or quaternary triangular mesh (Goodchild and Shiren, 1992; Dutton, 1996), and regionalisation à la Christaller (1950) are some examples of spatial hierarchies.

31.2.2 General Principles of Hierarchical Spatial Reasoning

Assume that a spatial algorithm to solve a spatial reasoning problem is given. The goal is to subdivide the space hierarchically so that the algorithm uses smaller sets of data. The same (or nearly the same) result is achieved but processing is more economical. For each method of hierarchical spatial reasoning the following elements must be given:

• a hierarchical structure, with objects and operations found on each level, the criteria on how objects are assigned to levels and how objects from one level are connected to objects at the next level;

• rules on how to use this structure and how reasoning progresses over the levels, e.g., a description of the algorithm considering how results from solutions on one level are used when the algorithm proceeds on the next level;

• comparisons of the results of the hierarchical processing with the results achieved with a non-hierarchical algorithm; and

• a study of the performance improvement from this restructuring.

These are the general requirements of hierarchical spatial reasoning (Car and Frank, 1994a). In this chapter a model for wayfinding is proposed that satisfies these general requirements.

31.3 APPROACH

The approach to the problem of representing the knowledge needed for hierarchical spatial reasoning in wayfinding is based on the following cognitive assumptions (Car and Frank, 1993): (1) humans divide a large road network hierarchically; (2) the amount of detail decreases from the top to the bottom; and (3) the path found with hierarchical reasoning is (close to) optimal if hierarchical levels are formed according to expected travel speed. From these assumptions we posit our primary hypothesis that the fastest path can be found by just considering the highest appropriate level of the network.
Our approach uses conceptual modelling as the formal investigation method. The tools used are graph theory, object-oriented design, algebraic specifications and functional programming. We derive a conceptual model of wayfinding in a hierarchically structured network. This model contains formalised human concepts and their semantics. The complete set of objects to be modelled should be collected and brought together into an ontology. This includes deriving the properties and behaviour of the objects. Objects are formalised with the help of algebraic specifications. Functional programming languages provide executable specifications which can be formally checked. The code can be used as a prototype which allows testing of the captured semantics.

31.4 CASE STUDY: WAYFINDING

We propose an algorithm for determining the minimum-cost path between two nodes in a hierarchically structured graph (Car and Frank, 1994b). It is based on a flat (non-hierarchical) algorithm enriched by rules about how to deal with a hierarchical structure.

31.4.1 Ontology

We define an ontology for both the non-hierarchical and the hierarchical case, an ontology being an abstracted, idealised model of reality containing only those objects, relations among them, and rules that govern them which are of interest in a particular reasoning system (Davis, 1990). The objects in the conceptual model emerged from the formalisation by algebraic specifications. Selected parts of the algebraic specification are given for the non-hierarchical and hierarchical case. A short introduction to the method of algebraic specifications is given in Section 31.6.

Ontology for the non-hierarchical case
The ontology for a non-hierarchical case contains places and road segments. A place is a distinctive location where humans make decisions about the path. A road segment connects two places. Attributes like travel time or length are assigned to it. In terms of graph theory, nodes represent places and edges represent road segments. Together they form a graph representing the road network. The road network forms a planar graph, excluding the cases where streets do not cross at the same level. At a node, travel may continue along any adjacent edge. No restrictions on turns or one-way streets are considered here, as they do not influence the general requirements. However, these issues must be included in the algorithm. The nodes are situated in Euclidean space and Euclidean distances between them can be computed. A node can be specified as follows:

    class Node n c where
      getId  :: n c -> Id
      getPos :: n c -> PosD c

    data NodeD c = NodeT Id (PosD c)

    instance Node NodeD c where
      getId  (NodeT i p) = i
      getPos (NodeT i p) = p

A node (NodeD c) is constructed by the operation NodeT, which requires a name (Id) and a position (PosD c). The operations getId and getPos, applied to a node, retrieve the name and position, respectively. (Remark: any get-operation retrieves one of the object’s components.) A position is constructed by the operation PosT as a “package” of two coordinates of the type c.

    data PosD c = PosT c c
This data type has one parameter c, which allows positions of the type Integer or Float to be produced. This also determines the kind of nodes or edges in this specification. Operations on positions are grouped in the class:

    class (…) => Pos p c where
      getX :: p c -> c
      getY :: p c -> c
      dx   :: p c -> p c -> c
      dy   :: p c -> p c -> c
      dx a b = getX a - getX b
      dy a b = getY a - getY b
      dist :: p c -> p c -> Float
      dist a b = sqrt (toFloat ((dx a b)^2) + toFloat ((dy a b)^2))

Operations like coordinate differences (dx, dy) are used to compute the distance (dist) between two positions. An edge is specified as follows:

    class (...) => Edge ed c where
      getEId       :: ed c -> Id
      getSN        :: ed c -> NodeD c
      getEN        :: ed c -> NodeD c
      getEdgeClass :: ed c -> RoadClassD
      lengthE      :: ed c -> Float
      lengthE a = dist (getSN a) (getEN a)
      weight       :: ed c -> Float
      weight a = (lengthE a) / toFloat (getRCProp a)

    data EdgeD c = EdgeT Id (NodeD c) (NodeD c) RoadClassD

    instance Edge EdgeD c where
      getEId       (EdgeT i sn en rc) = i
      getSN        (EdgeT i sn en rc) = sn
      getEN        (EdgeT i sn en rc) = en
      getEdgeClass (EdgeT i sn en rc) = rc

An edge (EdgeD c) is constructed by the operation EdgeT, which requires a name (Id), two nodes (NodeD c) and a road class (RoadClassD). The operation lengthE computes the length of an edge as the distance (dist) between its two end nodes; this operation assumes straight edges. The operation weight determines the weight of an edge as the ratio between the length of the edge and the average travel speed. The average travel speed is a property of the road class to which the edge belongs (getRCProp). A graph (GraphD c) is specified as follows:

    class Graph gr c where
      emptyGraph :: gr c
      getNL      :: gr c -> NodeList c
      getEL      :: gr c -> EdgeList c
      addE       :: EdgeD c -> gr c -> gr c
      addN       :: NodeD c -> gr c -> gr c
      …

    data GraphD c = GraphT (NodeList c) (EdgeList c)

    instance (...) => Graph GraphD c where
      emptyGraph = GraphT [ ] [ ]
      getEL (GraphT nl el) = el
      getNL (GraphT nl el) = nl
      addE e gr = GraphT (getNL gr) (addToList e (getEL gr))
      addN n gr = GraphT (addToList n (getNL gr)) (getEL gr)
      …

A graph consists of a list of nodes and a list of edges. An empty graph (emptyGraph) has an empty list of nodes and an empty list of edges. One can add a node (addN) or an edge (addE) to a graph.

Ontology for the hierarchical case
The ontology for the hierarchical case contains places, road segments, and hierarchical levels (Figure 31.1). The hierarchical structure is built by selecting a set of connected edges in the graph and having them form a connected subgraph representing one hierarchical level. The process can be applied repeatedly to form a multilevel hierarchy. The ontology of the non-hierarchical case (i.e., places and road segments) must be preserved, so that the algorithm for the flat case can be applied in each hierarchical level independently. The selection of edges is based on a road classification scheme (e.g., interstate highway, federal highway or local road).

Figure 31.1: A model of a US road network with three levels.

A road class is specified as follows:

    class RoadClass rcl where
      getRCname :: rcl -> String
      getRCLv   :: rcl -> Level
      getRCProp :: rcl -> Int

    type Level = Int

    data RoadClassD = RoadClassT String Level Int

    instance RoadClass RoadClassD where
      getRCname (RoadClassT rc l rcp) = rc
      getRCLv   (RoadClassT rc l rcp) = l
      getRCProp (RoadClassT rc l rcp) = rcp

A road class is constructed by the operation RoadClassT, which requires a name or textual description for that class (String), a level (Level) and a property of that road class (Int). A property in this case is the average travel speed for a particular road class (on an interstate 130 km/h, on a highway 100 km/h, etc.). Level is a numerical representation of a road class, with 0 being the highest level and larger numbers denoting lower levels.

31.4.2 Algorithm for the Non-Hierarchical Case

Dijkstra’s algorithm is used here as the fundamental, general-purpose algorithm (algorithm for the flat case) (Dijkstra, 1959). It solves shortest path problems with non-negative arc lengths. The original implementation of that algorithm takes $O(n^2)$ operations in a completely dense network, where $n$ is the number of nodes in the graph. It can be improved for sparse networks to $O(n \log n)$ time (Johnson, 1977) and to $O(n \sqrt{\log n})$ in planar graphs (Frederickson, 1987) (see also Ahuja et al., 1993).
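The flat case can be illustrated with a minimal, self-contained Haskell sketch of Dijkstra's algorithm in which the edge weight is the travel time, i.e., length divided by the average speed of the road class, as in the weight operation of the specification. This is not the chapter's prototype: the record type Edge, the helper names adjacency and dijkstra, the use of Data.Map and Data.Set from the containers library, and the small demo network (including the 50 km/h assumed for a local road) are assumptions made for illustration only.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type Id = Int

-- A road segment: two end nodes, a length in km, and the average travel
-- speed of its road class in km/h (the role played by getRCProp in the text).
data Edge = Edge { from :: Id, to :: Id, lengthKm :: Double, speedKmh :: Double }

-- Edge weight as described in the text: length divided by average speed (hours).
weight :: Edge -> Double
weight e = lengthKm e / speedKmh e

-- Adjacency map for an undirected network: node -> [(neighbour, travel time)].
adjacency :: [Edge] -> M.Map Id [(Id, Double)]
adjacency es = M.fromListWith (++)
  (concat [ [(from e, [(to e, weight e)]), (to e, [(from e, weight e)])] | e <- es ])

-- Dijkstra's algorithm, using an ordered set as the priority queue.
-- Returns the minimum travel time from src to every reachable node.
dijkstra :: [Edge] -> Id -> M.Map Id Double
dijkstra es src = go (S.singleton (0, src)) (M.singleton src 0)
  where
    adj      = adjacency es
    best m v = M.findWithDefault (1 / 0) v m   -- unreached nodes count as infinitely far
    go queue m = case S.minView queue of
      Nothing -> m
      Just ((d, u), rest)
        | d > best m u -> go rest m            -- stale queue entry, skip it
        | otherwise    -> uncurry go (foldl relax (rest, m) (M.findWithDefault [] u adj))
        where
          relax (q, acc) (v, w)
            | d + w < best acc v = (S.insert (d + w, v) q, M.insert v (d + w) acc)
            | otherwise          = (q, acc)

-- Made-up three-node network, for illustration only.
demo :: [Edge]
demo =
  [ Edge 1 2 50 100    -- highway, 0.5 h
  , Edge 2 3 130 130   -- interstate, 1.0 h
  , Edge 1 3 80 50     -- local road (50 km/h is an assumed speed), 1.6 h
  ]

main :: IO ()
main = print (M.toList (dijkstra demo 1))   -- [(1,0.0),(2,0.5),(3,1.5)]
```

In the demo, the direct local road from node 1 to node 3 is shorter in distance but slower (1.6 h) than the route over the highway and the interstate (1.5 h), which is why weighting by travel time rather than length matters for the level structure introduced next.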
31.4.3 Algorithm for the Hierarchical Case

The hierarchical algorithm is based on the idea of the step-wise reduction of the initial graph: a node on the lower level always climbs up to a node on the higher level. Each time the nearest entrance to the higher level is found, a new subgraph is created excluding nodes and edges from the lower level. Therefore, the flat algorithm is applied only to a subgraph. Let me explain it by a simple example. Assume that Ai and Bj are given, where i<j and the level i is the highest level in the network. The procedure to find the shortest path, in terms of the minimum travel time based on the distance and travel speed associated with the edges in the network, is a recursive procedure as follows:

1. Determine a node to start with {this is a node at the lower level, Bj}
2. Use the flat algorithm to find the nearest entrance from the start node to the higher level k, k<j (Bk)
3. Store a partial shortest path found at the level j
4. Create a new subgraph {it includes the level k and higher than k, k>0}
5. Take Bk as a new start node
6. If Ai and Bk are at different levels, then go to Step 2; else apply the flat algorithm to the subgraph containing only the highest level to find a partial shortest path to the goal node
End

In the final step the flat algorithm finds a partial shortest path in a subgraph containing only the highest level. The hierarchical algorithm terminates when the goal node is reached. The total shortest path found by the hierarchical algorithm is a concatenation of the partial shortest paths. A bi-directional search is used in cases where neither the start nor the goal node has access to the highest level (e.g., one node has access only to level 2 and the other only to level 1).
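The steps above can be sketched in Haskell as follows. This is one possible reading of the procedure, not the prototype described in the chapter. It assumes that the goal node lies on the highest level of the network (so the bi-directional case is not covered), that each edge carries its level and its travel time directly, that a node's level is the highest level among its incident edges, and that the subgraph searched from a node at level j contains the edges of level j and higher. All names (Edge, nodeLevel, flat, pathTo, hierPath) and the demo network are hypothetical.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type Id    = Int
type Level = Int   -- 0 is the highest level, as in the text

-- A road segment with the level of its road class and its travel time.
data Edge = Edge { from :: Id, to :: Id, level :: Level, cost :: Double }

-- Highest level a node has access to (smallest level number among its edges).
-- Assumes every node queried is incident to at least one edge.
nodeLevel :: [Edge] -> Id -> Level
nodeLevel es n = minimum [ level e | e <- es, from e == n || to e == n ]

-- Flat algorithm (Dijkstra): travel time and predecessor for each reached node.
flat :: [Edge] -> Id -> M.Map Id (Double, Id)
flat es src = go (S.singleton (0, src)) (M.singleton src (0, src))
  where
    adj = M.fromListWith (++)
      (concat [ [(from e, [(to e, cost e)]), (to e, [(from e, cost e)])] | e <- es ])
    best m v = maybe (1 / 0) fst (M.lookup v m)
    go queue m = case S.minView queue of
      Nothing -> m
      Just ((d, u), rest)
        | d > best m u -> go rest m      -- stale queue entry, skip it
        | otherwise    -> uncurry go (foldl relax (rest, m) (M.findWithDefault [] u adj))
        where
          relax (q, acc) (v, w)
            | d + w < best acc v = (S.insert (d + w, v) q, M.insert v (d + w, u) acc)
            | otherwise          = (q, acc)

-- Follow predecessors back from a reached node to the search origin.
pathTo :: M.Map Id (Double, Id) -> Id -> [Id]
pathTo m = reverse . walk
  where
    walk v = case M.lookup v m of
      Just (_, p) | p /= v -> v : walk p
      _                    -> [v]

-- Hierarchical search from a lower-level start to a goal on the highest level.
hierPath :: [Edge] -> Id -> Id -> Maybe [Id]
hierPath es start goal = climb start
  where
    top = nodeLevel es goal
    climb cur
      | j <= top       = if M.member goal reached          -- step 6, else-branch
                           then Just (pathTo reached goal)
                           else Nothing
      | null entrances = Nothing                           -- no way up from here
      | otherwise      = fmap (init (pathTo reached b) ++) (climb b)   -- steps 3 and 5
      where
        j         = nodeLevel es cur
        sub       = [ e | e <- es, level e <= j ]          -- step 4: level j and higher
        reached   = flat sub cur                           -- step 2: flat search in subgraph
        entrances = [ (d, v) | (v, (d, _)) <- M.toList reached, nodeLevel es v < j ]
        (_, b)    = minimum entrances                      -- nearest entrance to a higher level

-- Made-up three-level network, for illustration only.
demo :: [Edge]
demo =
  [ Edge 4 3 2 1.0    -- local road from the start node
  , Edge 3 2 1 0.5    -- highway
  , Edge 2 1 0 0.25   -- interstate to the goal
  , Edge 4 5 2 3.0    -- local detour
  , Edge 5 1 2 3.0
  ]

main :: IO ()
main = print (hierPath demo 4 1)   -- Just [4,3,2,1]
```

In the demo, node 4 climbs to node 3 (the nearest node with access to level 1), then to node 2 (access to level 0), and the final flat search runs on the level-0 edge only. Because each climb discards all edges below the level just reached, the local detour over node 5 is never examined after the first step.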
For the example given in Figure 31.2, the hierarchical algorithm finds the shortest path between the nodes Ai and Bj, i=0 and j=2, in a three-level network as follows:
• Figure 31.2a: Bj is selected as the start node (step 1); the flat algorithm finds the shortest path between Bj and B1 (steps 2 and 3) with access to the level k=1;
• Figure 31.2b: a new subgraph is created containing all levels higher than or equal to k=1, i.e., levels 1 and 0 (step 4); B1 is the new start node (step 5); Ai and B1 are still at different levels (step 6), therefore go to step 2; the flat algorithm finds the shortest path between B1 and B0 (steps 2 and 3) with access to the level k=0;
• Figure 31.2c: a new subgraph is created containing all levels higher than or equal to k=0, i.e., only level 0 (step 4); B0 is the new start node (step 5); Ai and B0 are now at the same level, therefore take the else part of step 6.
The total shortest path between Ai and Bj is a concatenation of the partial shortest paths Bj-B1-B0-Ai.
31.4.4 Analysis of the Results
Structuring knowledge hierarchically incurs a cost. Performance must consider both computational operations and storage requirements for data. A larger gain in performance, exceeding this initial cost, must be achieved to make it worthwhile. Performance of the hierarchical algorithm will increase by excluding some of the input data from consideration. It is possible to demonstrate that the results (costs of paths) are
Figure 31.2: Determination of the shortest path in a hierarchical network.
correct in the sense of being identical to the results achieved by the flat algorithm when using the full data set. The prototype has been tested on a theoretical network with three hierarchical levels (Figure 31.3). Shortest paths between different nodes were computed by both the flat (non-hierarchical) and the hierarchical algorithm. The criteria for the analysis were:
1. cost of the shortest paths found by the hierarchical and non-hierarchical algorithm and nodes building these paths; and
2. number of operations needed by each algorithm.
Comparison of the Results
Shortest paths computed by both algorithms had the same cost and they consisted of the same nodes (Table 31.1).
Table 31.1: Cost and nodes for shortest paths determined by hierarchical and flat algorithm.
Sp-Name  Nodes in non-hierarchical path (nh)         Nodes in hierarchical path (h)                    Cost diff. (h-nh)
sp1      hn4,hn3,hn2,hn1                             hn1,hn2,hn3,hn4                                   0
sp2      hn1,hn2,hn3,hn4,hn14                        hn14,hn4,hn3,hn2,hn1                              0
sp3      hn14,hn4,hn9,hn6,hn20,hn25                  hn14,hn4,hn9,hn6,hn20,hn25                        0
sp4      hn25,hn20,hn6,hn9,hn4,hn14                  hn25,hn20,hn6,hn9,hn4,hn14                        0
sp5      hn5,hn6,hn7,hn8,hn24                        hn24,hn8,hn7,hn6,hn5                              0
sp6      hn25,hn20,hn6,hn7,hn8,hn28                  hn25,hn20,hn6,hn7,hn8,hn28                        0
sp7      hn22,hn2,hn1                                hn22,hn2,hn1                                      0
sp8      hn16,hn10,hn2,hn1                           hn16,hn10,hn2,hn1                                 0
sp9      hn24,hn8,hn2,hn10,hn11,hn17,hn18            hn24,hn8,hn9,hn4,hn12,hn11,hn17,hn18              0.03
Figure 31.3: A three-level test network.
One deviation in behaviour has been found: the shortest path between the nodes hn24 and hn18 determined by the hierarchical algorithm is longer than that determined by the flat algorithm. There is also a difference in nodes. This is due to the exclusion of the lower level. However, this ‘mistake’ is insignificant because the cost difference is about one quarter of the shortest edge (the smallest cost) in that graph. Results of earlier studies (Car and Frank, 1994b) have shown that the path determined with the hierarchical algorithm is as fast as that determined with the flat algorithm if travelling on the higher level is twice as fast as on the lower level. In a three-level network, ratios of 1.5 to 2 for the travel speed between levels are quite realistic, and translate to, say, 50 km/h on local roads, 75 km/h on regular roads, and 120 km/h on highways. A hierarchical algorithm often produces paths which take longer than the ‘flat solution’, but a driver still takes the hierarchical solution because it is ‘faster’.
Performance analysis
Figure 31.4: A diagram that shows the performance of both algorithms (in 1000 operations).
Dijkstra’s algorithm, used as the flat algorithm, needs O(n²) operations to find the shortest path in a non-hierarchical graph with n nodes. This is much slower than in a hierarchically structured network, where the hierarchical algorithm needs O(m²) operations, m being the number of nodes inspected by the hierarchical algorithm. The number of nodes to be inspected by the hierarchical algorithm grows with the length of the path: e.g., in a grid, for a particular path of length s the computation takes O((m²)²) because a subnetwork of radius s is considered and it contains m² nodes (for more details see Car and Frank, 1994b). Figure 31.4 is a diagram presenting the number of operations needed by each of the algorithms to determine shortest paths. The hierarchical algorithm needed 30–80 percent fewer operations than the flat algorithm to find a particular shortest path. However, we assume that these figures depend on the network configuration and the number of levels, and this requires systematic exploration.
31.5 SUMMARY
We proposed a theory on hierarchical spatial reasoning and applied it to wayfinding in large road networks. In fact, we adapted human skills in spatial problem solving and used them for computational methods. Results of this study provide: (1) a conceptual model for the hierarchy of space based on cognitive assumptions; (2) an efficient hierarchical algorithm; and (3) an understanding of the underlying heuristics. These three points fulfil the requirements for hierarchical spatial reasoning. Formal specifications and a prototype of the proposed model are available. Tests on theoretical data have shown that a complex spatial problem can be solved more efficiently if the reasoning process uses only a hierarchical structure of the domain. Comparison of the results of the hierarchical and non-hierarchical algorithm verified the hypothesis that the fastest path can be found by considering the highest appropriate level of the network.
The method of formalisation introduced here uses algebraic specifications and a functional programming language. This method provides executable specifications which are valuable for GIS because they allow for: (1) formal checking of the specification (syntax); and (2) prototyping (checking of semantics). The achieved results have proven the validity of the conceptual model because they have shown that the implementation is possible and that the proposed model behaves as expected. We expect the applied methods and results to provide better insight into the theory than the heuristics currently proposed and, therefore, their better applicability in cases where large data sets need to be
processed. Example applications are car navigation systems, where the use of a hierarchical structure can achieve more effective wayfinding in very large road networks.
31.6 APPENDIX: FORMALISATION METHOD
The formalisation method is based on an object-oriented approach which uses algebraic specifications and a functional programming language. Algebraic specifications (Liskov and Guttag, 1986) are a formalism able to capture the meaning of data and operations, i.e., their semantics. Functional programming languages (Bird and Wadler, 1988), used as a tool for writing these specifications, allow for their formal checking and for observing whether the specifications capture the intended behaviour (rapid prototyping). The functional language Gofer (Jones, 1994) is used as the specification language. The approach integrates specification and prototyping in a single, easy-to-use environment. An algebraic specification describes an object by a name for its type (sort), a set of operations applicable to this type, and a set of axioms defining the behaviour of these operations. The main idea is to express the meaning (semantics) of operations through axioms, i.e., to define the behaviour of one operation in terms of other operations. Gofer supports two separate concepts: (1) abstraction, which describes the behaviour of objects; and (2) implementation, which constructs these objects and describes how they achieve the intended behaviour. Abstraction uses classes, and the implementation uses data types and instances. A class defines a set of operations (expressed through functions) that are applicable to a family of types. A data type definition describes the internal structure of an object. An instance of a class shows the implementation of operations for a particular data type. Here is an example of a specification for positions in a two-dimensional space which contains operations for creating a position and for retrieving its coordinates.
    class Pos p c where
        getX :: p c -> c
        getY :: p c -> c
    data PosD c = PosT c c
    instance Pos PosD c where
        getX (PosT x y) = x
        getY (PosT x y) = y
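For readers without a Gofer system, the specification above can be approximated in present-day Haskell. The sketch below is an adaptation, not the chapter's code: standard Haskell has no multi-parameter classes without extensions, so the class is written over the type constructor p alone, and intPos, floatPos and coordSum are hypothetical names added for illustration.

```haskell
-- A class over type constructors: p is applied to a coordinate type c.
class Pos p where
  getX :: p c -> c
  getY :: p c -> c

-- A position is a 'package' of two coordinates of the same type c.
data PosD c = PosT c c

-- The instance shows how the observers are executed for PosD.
instance Pos PosD where
  getX (PosT x _) = x
  getY (PosT _ y) = y

-- The same constructor and observers serve integer and floating-point positions.
intPos :: PosD Int
intPos = PosT 3 4

floatPos :: PosD Double
floatPos = PosT 1.5 2.5

-- Hypothetical helper (not in the chapter): works for any Pos instance
-- whose coordinates are numeric.
coordSum :: (Pos p, Num c) => p c -> c
coordSum p = getX p + getY p
```

Here getX intPos evaluates to 3 and getY floatPos to 2.5, so a single specification covers both coordinate types.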
The first part of the specification is the definition of classes. A class contains signatures of the operations, defining only the types of arguments and of the result. The operation PosT takes two inputs (coordinates) of the type c and constructs the data type PosD c. The data type PosD c can be seen as a ‘package’ of two coordinates. The operations getX and getY are observers. These operations allow the retrieval of one of the components from such a ‘package’. The constructor PosT is used to construct a data type PosD c. This data type has one parameter c, which makes it possible to produce positions of the type Integer or Float. An instance of a class for a particular data type shows how to execute the operations.
REFERENCES
AHUJA, R.K., MAGNANTI, T.L. and ORLIN, J.B. 1993. Network Flows: Theory, Algorithms, and Applications. Englewood Cliffs, NJ: Prentice Hall.
BIRD, R. and WADLER, P. 1988. Introduction to Functional Programming. Hemel Hempstead: Prentice Hall International.
BLADES, M. 1991. Wayfinding theory and research: the need for a new approach, in Mark, D.M. and Frank, A.U. (Eds.), Cognitive and Linguistic Aspects of Geographic Space. Dordrecht: Kluwer Academic Publishers, pp. 137–165.
CAR, A. and FRANK, A.U. 1993. Hierarchical street networks as a conceptual model for efficient way finding, in Proceedings of EGIS'93, Genova, Italy, 29 March–1 April, Vol. 1. Utrecht: EGIS Foundation, pp. 134–139.
CAR, A. and FRANK, A.U. 1994a. General principles of hierarchical spatial reasoning – the case of wayfinding, in Waugh, T.C. (Ed.), Proceedings of Spatial Data Handling (SDH 94), Edinburgh, 5–9 September, Vol. 2. Edinburgh: IGU Commission on GIS, and the Association for Geographic Information, pp. 646–664.
CAR, A. and FRANK, A.U. 1994b. Modelling a hierarchy of space applied to large road networks, in Nievergelt, J. et al. (Eds.), Proceedings of IGIS'94: Geographic Information Systems. International Workshop on Advanced Research in GIS. Monte Verita, Ascona, Switzerland, Lecture Notes in Computer Science 884. Berlin: Springer-Verlag, pp. 15–24.
CAR, A. and FRANK, A. 1996. Formalisierung konzeptioneller Modelle für GIS – Methode und Werkzeug. Internal report. Vienna: Dept. of Geoinformation, TU Vienna.
CHRISTALLER, W. 1950. Das Grundgeruest der raeumlichen Ordnung in Europa: die Systeme der europaeischen zentralen Orte. Frankfurt am Main: Kramer.
CLAUSSEN, H. and MARK, D.M. 1991. Vehicle navigation systems, in Muller, J.C. (Ed.), Advances in Cartography. London: Elsevier Science Publishers, pp. 161–179.
COUCLELIS, H. 1992. People manipulate objects (but cultivate fields): beyond the raster-vector debate in GIS, in Frank, A.U., Campari, I. and Formentini, U. (Eds.), Theories and Methods of Spatio-Temporal Reasoning in Geographic Space, Lecture Notes in Computer Science 639. Berlin: Springer-Verlag, pp. 65–77.
DAVIS, E. 1990. Representations of Commonsense Knowledge. San Mateo, CA: Morgan Kaufmann Publishers.
DE FLORIANI, L. and PUPPO, E. 1992. A hierarchical triangle-based model for terrain description, in Frank, A.U., Campari, I. and Formentini, U. (Eds.), Theories and Methods of Spatio-Temporal Reasoning in Geographic Space, Lecture Notes in Computer Science 639. Berlin: Springer-Verlag, pp. 236–251.
DIJKSTRA, E.W. 1959. A note on two problems in connection with graphs, Numerische Mathematik, 1, pp. 269–271.
DUTTON, G. 1996. Improving location specificity of map data – a multi-resolution, metadata-driven approach and notion, International Journal of Geographical Information Systems, 10(3), pp. 253–268.
ELLIOTT, R.J. and LESK, M.E. 1982. Route finding in street maps by computers and people, in Proceedings of National AAAI-82 Conference. Los Altos, CA: American Association for Artificial Intelligence, pp. 258–261.
FOTHERINGHAM, A.S. 1992. Encoding spatial information: the evidence for hierarchical processing, in Frank, A.U., Campari, I. and Formentini, U. (Eds.), Theories and Methods of Spatio-Temporal Reasoning in Geographic Space, Lecture Notes in Computer Science 639. Berlin: Springer-Verlag, pp. 269–287.
FRANK, A.U. 1990. Spatial concepts, geometric data models and data structures, in Proceedings of GIS Design Models and Functionality Conference. Leicester: Regional Research Laboratory, pp. 1–15.
FRANK, A.U. and KUHN, W. 1995. Specifying open GIS with functional languages, in Egenhofer, M.J. and Herring, J.R. (Eds.), Advances in Spatial Databases (SSD'95), Lecture Notes in Computer Science 951. Berlin: Springer-Verlag, pp. 184–195.
FRANK, A. and TIMPF, S. 1994. Multiple representations for cartographic objects in a multi-scale tree – an intelligent graphical zoom, Computers and Graphics, Special Issue on Modelling and Visualization of Spatial Data in GIS, 18(6), pp. 823–829.
FREDERICKSON, G.N. 1987. Fast algorithms for shortest paths in planar graphs, with applications, SIAM Journal on Computing, 16(6), pp. 1004–1022.
FREKSA, C. 1991. Qualitative spatial reasoning, in Mark, D.M. and Frank, A.U. (Eds.), Cognitive and Linguistic Aspects of Geographic Space. Dordrecht: Kluwer Academic Press, pp. 361–372.
GIBBONS, A. 1985. Algorithmic Graph Theory. Cambridge: Cambridge University Press.
GOLLEDGE, R.G. 1992. Do people understand spatial concepts: the case of first-order primitives, in Frank, A.U., Campari, I. and Formentini, U. (Eds.), Theories and Methods of Spatio-Temporal Reasoning in Geographic Space, Lecture Notes in Computer Science 639. Berlin: Springer-Verlag, pp. 1–21.
GOODCHILD, M.F. and SHIREN, Y. 1992. A hierarchical spatial data structure for global geographic information system, Graphical Models and Image Processing, 54(1), pp. 31–44.
GOTTS, N.M. 1992. Human Way finding in Path-Networks: A Survey of Possible Strategies, Working Paper 364. Leeds: Institute for Transport Studies, University of Leeds.
GUTTAG, J.V., HOROWITZ, E. and MUSSER, D.R. 1978. The design of data type specifications, in Yeh, R.T. (Ed.), Current Trends in Programming Methodology: Vol. 4 – Data Structuring. Englewood Cliffs, NJ: Prentice-Hall, pp. 60–79.
HIRTLE, S.C. and HUDSON, J. 1991. Acquisition of spatial knowledge for routes, Journal of Environmental Psychology, 11, pp. 335–345.
HIRTLE, S.C. and JONIDES, J. 1985. Evidence of hierarchies in cognitive maps, Memory & Cognition, 13(3), pp. 208–217.
JOHNSON, D.B. 1977. Efficient algorithms for shortest paths in sparse networks, Journal of the Association for Computing Machinery, 24(1), pp. 1–13.
JOHNSON, M. 1987. The Body in the Mind: The Bodily Basis of Meaning, Imagination, and Reason. Chicago: The University of Chicago Press.
JONES, M.P. 1994. The Implementation of the Gofer Functional Programming System. Research Report YALEU/DCS/RR-1030, Yale University.
KHOSHAFIAN, S. and ABNOUS, R. 1990. Object Orientation – Concepts, Languages, Databases, User Interfaces. New York, NY: John Wiley & Sons.
KUHN, W. 1992. Paradigms of GIS use, in Proceedings of 5th International Symposium on Spatial Data Handling, Charleston, SC, 3–7 August, Vol. 1, pp. 91–103.
KUHN, W. 1993. Metaphors create theories for users, in Frank, A.U. and Campari, I. (Eds.), Spatial Information Theory: Theoretical Basis for GIS, Lecture Notes in Computer Science 716. Berlin: Springer-Verlag, pp. 366–376.
KUIPERS, B. 1982. The “map in the head” metaphor, Environment and Behavior, 14(2), pp. 202–220.
KUIPERS, B.J. and BYUN, Y.-T. 1991. A robot exploration and mapping strategy based on a semantic hierarchy of spatial representations, Robotics and Autonomous Systems, 8(8), pp. 1–17.
KUIPERS, B. and LEVITT, T.S. 1988. Navigation and mapping in large-scale space, AI Magazine, 9(2), pp. 25–43.
KUIPERS, B., FROOM, R., LEE, W.-Y. and PIERCE, D. 1993. The semantic hierarchy in robot learning, in Connell, J. and Mahadevan, S. (Eds.), Robot Learning. Dordrecht: Kluwer Academic, pp. 1–24.
LAKOFF, G. and JOHNSON, M. 1980. Metaphors We Live By. Chicago: The University of Chicago Press.
LAPALME, G., ROUSSEAU, J.-M., CHAPLEAU, S., CORMIER, M., COSSETTE, P. and ROY, S. 1992. GeoRoute, Communications of the ACM, 35(1), pp. 80–88.
LISKOV, B. and GUTTAG, J. 1986. Abstraction and Specification in Program Development. Cambridge, MA: MIT Press.
MAASS, W. 1993. A cognitive model for the process of multimodal incremental route description, in Frank, A.U. and Campari, I. (Eds.), Spatial Information Theory: Theoretical Basis for GIS, Lecture Notes in Computer Science 716. Berlin: Springer-Verlag, pp. 1–13.
MARK, D.M. and FRANK, A.U. 1996. Experiential and formal models of space, Environment and Planning B: Planning and Design, 23, pp. 2–24.
McNAMARA, T.P., HARDY, J.K. and HIRTLE, S.C. 1989. Subjective hierarchies in spatial memory, Journal of Experimental Psychology: Learning, Memory, and Cognition, 15(2), pp. 211–227.
MEDYCKYJ-SCOTT, D. and BLADES, M. 1992. Human spatial cognition: its relevance to the design and use of spatial information systems, Geoforum, 23(2), pp. 215–226.
PALMER, E.S. 1977. Hierarchical structure in perceptual representation, Cognitive Psychology, 9, pp. 441–474.
ROBERTSON, G.G., MACKINLAY, J.D. and CARD, S.K. 1991. Cone trees: animated 3D visualizations of hierarchical information, in Proceedings of ACM CHI'91 Conference on Human Factors in Computing Systems, pp. 189–194.
SAMET, H. 1984. The quadtree and related hierarchical data structures, ACM Computing Surveys, 16(2), pp. 187–260.
STEVENS, A. and COUPE, P. 1978. Distortions in judged spatial relations, Cognitive Psychology, 10, pp. 422–437.
TIMPF, S. 1992. Conceptual modelling of highway navigation, Master Thesis, University of Maine, USA.
TIMPF, S., VOLTA, G.S., POLLOCK, D.W. and EGENHOFER, M.J. 1992. A conceptual model of wayfinding using multiple levels of abstraction, in Frank, A.U., Campari, I. and Formentini, U. (Eds.), GIS-from Space to Theory: Theories and Methods of Spatio-Temporal Reasoning, Lecture Notes in Computer Science 639. Berlin: Springer-Verlag, pp. 348–367.
TVERSKY, B. 1992. Distortions in cognitive maps, Geoforum, 23(2), pp. 131–138.
WIRTH, N. 1976. Algorithms+Data Structures=Programs. Englewood Cliffs, NJ: Prentice Hall.
Chapter Thirty Two
Integrated Topological and Directional Reasoning in Geographic Information Systems
Jayant Sharma
32.1 INTRODUCTION
This chapter deals with computational methods that exploit qualitative spatial knowledge, in addition to geometric information, for making inferences about objects in a spatial database. The motivation is to provide such systems as geographic information systems (GIS), which manage the storage and retrieval of large data sets, with intelligent mechanisms to deal with complex spatial concepts for data selection and integration. One purpose of these intelligent mechanisms is to enable intuitive interaction with the data by capturing and reflecting the user’s perception of the world. Intuitive interaction entails providing facilities for the representation of qualitative spatial information and making inferences. Qualitative spatial information is characterised by a finite set of symbols that specify distinctions among spatial configurations. For example, the symbol set {north, south, east, west} denotes a system of directions and the set {near, far} a system of qualitative distances. Inference is the process of combining facts and rules to deduce new facts. We investigate the inference of qualitative spatial information from stored base facts. Thus the core problem is to find the spatial relations that are implied by a particular configuration, given a set of objects and a set of spatial constraints relating these objects. While spatial inferences may appear trivial to humans, they are difficult to formalise for implementation in an automated system. This problem is germane to GIS because spatial reasoning is extremely useful for searching in large databases containing complex geographic datasets.
Previous work on spatial data models and spatial relations concentrated on defining models for representing spatial objects and on defining spatial relations within these models. Work on defining spatial relations independent of the data model or underlying representation usually tackled specific types of relations or their combinations.
For example, considerable work has been done on defining topological relations and using the formalism for consistency checking (Egenhofer and Sharma, 1993; Smith and Park, 1992), specifying integrity constraints (Hadzilacos and Tryfona, 1992), maintaining a spatial knowledge base (Hernández, 1993), and optimising queries on topological relations (Clementini et al., 1994; Papadias et al., 1995). These research efforts have concentrated on specific aspects of spatial reasoning rather than developing a comprehensive framework. This chapter concentrates on the use of these formalisms for spatial reasoning. In particular it defines a framework within which the formalisms can be integrated.
32.2 SPATIAL REASONING
Since humans are particularly skilled in spatial cognition and spatial reasoning tasks, such as wayfinding, computational models for spatial reasoning are guided by research in the cognitive sciences and psychology. Spatial reasoning is the process by which information about objects in space and their interrelationships is gathered by various means, such as measurement, observation, or inference, and used to arrive at valid conclusions regarding the objects’ relationships or to determine how to accomplish a certain task. For example, spatial reasoning is used in navigation, designing the layout of an office, or inferring all possible spatial relations between a set of objects using a specified subset of the relations. The latter aspect of spatial reasoning is the focus of this research. We attempt to build computational models and formalisms of spatial relations that permit the inference of new spatial relations from a specified, but possibly incomplete, set of spatial relations between objects. The following sections present an overview of cognitive aspects of human spatial reasoning and their implications for developing formal models of reasoning.
32.2.1 Spatial Cognition
The computational issue of interest is the mental representation and model of spatial relations that humans construct. A study of human spatial reasoning abilities helps provide answers to questions like: Which are the properties of the spatial domain and objects that are preserved in the mental model? Which properties are discarded? What is the level of abstraction? What is explicit in the mental representation and what is implicit? Are hierarchical structures used? Answers to these questions determine the level of detail, the encapsulated properties, the built-in functionality, and the overall organisation of knowledge in computational models for spatial reasoning.
Research in cognitive psychology has examined humans’ use of hierarchies in organising landmarks in a cognitive map of their environment (Hirtle and Jonides, 1985) and in mental representations of spatial knowledge (McNamara, 1986). Researchers also investigated: (1) whether people use rules of inference and mental models, multiple or unified, for spatial reasoning; and (2) whether they use only categorical representations defining classes of spatial relations between objects (Byrne et al., 1989). Examples of categorical representations are connected/disconnected and left/right, whereas coordinate representations determine a division of space based on some unit or resolution of the visual system (Kosslyn et al., 1992). Results of this research indicate that humans: (1) organise cognitive maps hierarchically; and (2) use deductive reasoning via simple inferences and non-deductive reasoning via mental imagery. The research in cognitive issues has implications and utility for artificial intelligence (AI). An essential component of any AI system is a knowledge representation scheme that highlights the core issues and constraints of a problem and thereby facilitates its solution. The representation should reflect its intended use and contain information at the appropriate level of granularity. One possible approach is to develop representation and manipulation schemes that mimic the methods humans use in similar problem solving situations. As a result there is considerable interest, in the areas of AI and robotics, in qualitative reasoning and particularly spatial reasoning. The insights that these fields have drawn from the cognitive sciences about human spatial reasoning are (Freksa, 1991; Freksa and Röhrig, 1993; Hernández, 1994; Kuipers, 1994):
• The information humans store and process is necessarily qualitative, since their cognitive mechanisms are limited in resolution and capacity. Hence only comparisons between features are made possible, whereas details such as size, shape, and location within some grid are disregarded. Topological information such as inclusion, coincidence, and connectivity is retained fairly precisely in comparison.
• The number of features and distinctions encoded is just sufficient to make the identification of objects or situations possible within a given context.
• Structural similarities between the represented and representing world are used to capture the constraints and inherent properties of the domain.
• Humans use multiple representations. Two commonly used kinds of representation are depictional, where the image acts as an analogy of the percept, and propositional, in which the relations between identified entities are stored as facts.
• Within the context of human spatial reasoning the static data structures that encode the qualitative information about the represented world can be viewed as data or information depending on their semantic content. Knowledge can be viewed in terms of the active processes performed on these data structures. Reconstructing a scene from its verbal description or performing inferences are examples of such active processes.
32.2.2 Qualitative Spatial Information
Qualitative information and reasoning deals with a small set of symbols whose semantics may vary depending on the context or scale. For example, the notion of proximity (near) depends on the task (such as walking, driving, or flying) and the scale. One would say that Bangor is near Boston when describing its location to someone in India, which sets the scale implicitly to be the whole of the US, but not when the context is the New England region. Quantitative information in contrast is ideally of arbitrary precision and detail, and independent of context.
For example, depending on the desired precision one could state that downtown Orono is 10, 10.4, or 10.438 miles from downtown Bangor. Thus qualitative information is concerned with the “what” and quantitative with “how much” as illustrated in the statements, “parcel A neighbours parcel B” and “parcel A shares a 20.5 meter boundary with parcel B.” Qualitative and quantitative approaches to spatial reasoning are complementary methods. Quantitative spatial relations include such observations as bearings (150° 25'), distances (4.3 miles), and corresponding values derived from coordinates. Such quantitative values are in close relationship with some qualitative spatial relations (Hong et al., 1995). For example, if the azimuth to a point (measured clockwise from due north) is 90° then this corresponds to the cardinal direction East. Likewise, if two regions meet topologically, then the distance between their boundaries is 0. Unlike qualitative spatial relations, quantitative spatial relations depend on precise metric information. For many decision processes qualitative information is sufficient; however, occasionally quantitative measures, dealing with precise numerical values, may be necessary, which would require the integration of quantitative information with qualitative reasoning. Qualitative approaches allow the users to abstract from the myriad of details by establishing “landmarks” (Gelsey and McDermott, 1990) when “something interesting happens”; therefore, they allow users to concentrate on a few, but significant events or changes (Egenhofer and Al-Taha, 1992). This working pattern is typical for scientists and relevant for geographic databases in which scientists record the data of their experiments—frequently time series observations— with the goal of subsequently extracting the “interesting” stages. By abstracting details and highlighting
significant aspects of a problem, qualitative spatial information facilitates planning an approach to a solution and determining what further information is needed.
32.3 FORMALISMS FOR QUALITATIVE SPATIAL REASONING
The interest in inference mechanisms and formalisms for qualitative spatial reasoning stems from cognitive science research in the area of human spatial cognition and reasoning. The research has led to a better understanding of humans’ ability to deal with incomplete, imprecise, and vague information, which has multiple possible interpretations. Spatial relations between objects can be classified as being metrical, directional, or topological relations (Pullar and Egenhofer, 1988; Worboys, 1992). We denote spatial objects with a distinct identity by the uppercase letters A, B, C, etc. The composition of spatial relations, denoted by “;”, is a useful inference mechanism that permits the derivation of a spatial relation between two objects A and C based on their relation with a common object B. A spatial relation is denoted by the symbol ri. The composition of A r1 B with B r2 C, yielding A r3 C, is denoted by r1 ; r2 => r3. The composition may often result in a set of relations denoted by {}, for example r1 ; r2 => {r3, r4, r5}, implying that any one of them can be the relation between objects A and C. Consistency requirements dictate that the inferred set of relations between objects A and C be the set intersection of the compositions over common objects. For example, if A r1 B ; B r2 C => {r3, r4, r5} and A r1 D ; D r2 C => {r3, r5, r6}, then the set of possible relations between A and C is {r3, r5}. Such inferences require a definition of the spatial relations involved and the corresponding composition tables that define the results of each possible composition. The formalisations of the relations and their compositions taken together form the model for spatial reasoning used in this work.
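The intersection rule can be made concrete with a small Haskell sketch. It is an illustration under stated assumptions, not the formalism developed in this chapter: the relations are the abstract symbols r1 to r6 of the example, the table entries are invented for the demonstration, and the path through D uses the pair (r2, r1) so that a single lookup table can return two different sets. The names CompTable, compose and infer are hypothetical.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

-- Abstract relation symbols r1..r6, as in the example in the text.
data Rel = R1 | R2 | R3 | R4 | R5 | R6
  deriving (Eq, Ord, Show, Enum, Bounded)

type Object    = String
type Fact      = (Object, Rel, Object)          -- (A, r, B) reads "A r B"
type CompTable = M.Map (Rel, Rel) (S.Set Rel)

-- The set of all relations: nothing is known.
universe :: S.Set Rel
universe = S.fromList [minBound .. maxBound]

-- Composition r ; s. Pairs missing from the table are treated as
-- uninformative (an assumption of this sketch).
compose :: CompTable -> Rel -> Rel -> S.Set Rel
compose table r s = M.findWithDefault universe (r, s) table

-- Possible relations between a and c: the intersection of the
-- compositions over every common object b with known a-b and b-c facts.
infer :: CompTable -> [Fact] -> Object -> Object -> S.Set Rel
infer table facts a c = foldr S.intersection universe
  [ compose table r s
  | (a', r, b) <- facts, a' == a
  , (b', s, c') <- facts, b' == b, c' == c ]

-- Invented table entries and base facts for the demonstration.
exampleTable :: CompTable
exampleTable = M.fromList
  [ ((R1, R2), S.fromList [R3, R4, R5])
  , ((R2, R1), S.fromList [R3, R5, R6]) ]

exampleFacts :: [Fact]
exampleFacts = [("A", R1, "B"), ("B", R2, "C"), ("A", R2, "D"), ("D", R1, "C")]
```

Evaluating infer exampleTable exampleFacts "A" "C" intersects {r3, r4, r5} with {r3, r5, r6} and leaves {r3, r5}, mirroring the example in the text.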
Composition tables have been defined for topological (Egenhofer, 1991) and directional relations and qualitative distances (Frank, 1992; Hong et al., 1995). The formalism for topological relations permits the inference that A is inside C from the facts: (1) A is inside B, and (2) B is inside C. Similarly, knowing that A is North of B and B is Northeast of C allows the inference that A is North or Northeast of C.
32.3.1 Heterogeneous and Integrated Spatial Reasoning
While the categorisation of spatial relations is a useful tool for organising different types of spatial knowledge, developing formalisms, and providing a match with such natural-language terms and prepositions as adjacent, in, and left of, in some situations the spatial knowledge must be considered all together. Unlike topological and directional relations, which have useful individual composition tables, reasoning about qualitative distances requires considering knowledge of the relative orientation of the objects involved. For example, if A is near B and B is near C then A could be near C or at a medium distance from it depending on the orientation of AB and BC. If AB and BC have opposite orientations then A is near C, whereas if they have the same orientation then A is at a medium distance from C. The information on the relative orientation enhances the knowledge regarding relative distances. Reasoning about such combinations of spatial relations, for example qualitative distance and orientation, will be called integrated spatial reasoning when all the relations are used in conjunction. In the previous example both the distance and orientation information were used. In such cases the spatial relations between objects are given as tuples, for example A [near, left] B. Heterogeneous spatial reasoning differs from integrated spatial reasoning in that combinations of single spatial relations of different types are considered at each step of the
INTEGRATED TOPOLOGICAL AND DIRECTIONAL REASONING IN GIS
Figure 32.1: An example that requires heterogeneous spatial reasoning.
reasoning process. For example, heterogeneous spatial reasoning is used in inferring that A is North of C from A North of B and B contains C, since both a directional and a topological relation are involved in the inference. Heterogeneous and integrated spatial reasoning enhance the capabilities of an automated spatial reasoning system, given the appropriate composition tables and hence inference mechanisms.
32.3.2 The Need for Integrated and Heterogeneous Spatial Reasoning
In a pictorial representation or natural language description of a scene, all types of spatial relations coexist, and their coexistence illustrates the artificiality of categorising spatial relations into topological, directional, and metrical relations. Often both topological and directional information is available about objects in a scene, and it is necessary to combine the two types of information in order to infer new facts that could not be inferred by considering either type of relation in isolation. For example, an appropriate formalism would allow the inference of the facts A disjoint D and A West of D from the specified facts A West of B, B overlap C, and C West of D (see Figure 32.1). It is evident by visual inspection that object A in Figure 32.1 is West of D; however, this fact cannot be inferred using purely symbolic manipulation and the composition tables for topological and directional relations independently. The directional relations between objects A and C and between objects B and D are unknown, and hence neither the composition (A rd B ; B rd D) nor (A rd C ; C rd D) is feasible. Since the directional relation West implies the topological relation disjoint, the composition of topological relations suggests itself. However, the result of a composition of topological relations is a non-empty set of topological relations, and hence the directional relation between objects A and D would remain unknown.
In order to infer the directional relation, the reasoning process must include the deduction that since B and C overlap they have some part in common, say C’. A is West of C’ since C’ is a part of B, and C’ is West of D since C’ is also a part of C; hence A is West of D. Facilitating this reasoning process requires a comprehensive formalism for the composition of combinations of various types of spatial relations.
For an example of a situation where both the topological and the directional information must be used in conjunction, consider the scene depicted in Figure 32.2. Suppose that the spatial relationships A meet B, A North B, B meet C, and B North C are specified and the relations between A and C must be inferred. The composition A meet B ; B meet C results in a set of possibilities, i.e., A {disjoint, meet, equal, overlap, coveredBy, covers} C, and therefore provides little information. If, however, the topological and directional relations are considered in conjunction, i.e., A meet and North B ; B meet and North C, it is evident that the result should be A disjoint and North C.
The above examples indicate the usefulness of heterogeneous and integrated spatial reasoning. We present a formal description of various types of spatial reasoning in the following section.
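The argument for Figure 32.1 can be checked on concrete numbers. The sketch below is an illustration only: it reduces each object to its extent on the x-axis and reads West as "ends strictly before the other begins", which is an assumption rather than the chapter's definition; the coordinates and the helper names `west_of` and `common_part` are hypothetical.

```python
def west_of(a, b):
    """True if interval a lies strictly West of interval b (assumed reading)."""
    return a[1] < b[0]


def common_part(b, c):
    """Shared x-extent of two overlapping intervals, or None if they do not overlap."""
    lo, hi = max(b[0], c[0]), min(b[1], c[1])
    return (lo, hi) if lo < hi else None


# Hypothetical x-extents satisfying A West of B, B overlap C, C West of D.
A, B, C, D = (0, 2), (3, 6), (5, 8), (9, 11)

c_prime = common_part(B, C)      # the part C' that B and C share
step1 = west_of(A, c_prime)      # holds because C' lies within B
step2 = west_of(c_prime, D)      # holds because C' lies within C
a_west_of_d = step1 and step2    # chaining through C' gives A West of D

print(c_prime, a_west_of_d)  # (5, 6) True
```

Because West implies disjoint, the same chain also yields A disjoint D, which neither composition table delivers on its own.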
Figure 32.2: An example that requires integrated spatial reasoning.
32.3.3 Formal Definition of Heterogeneous and Integrated Spatial Reasoning
An automated spatial reasoning system may deal with each type of relation individually, with various combinations of relations, or with conjunctions of relations when, for example, both the topological and directional relations between pairs of objects are known. The three situations will be referred to as isolated, heterogeneous, and integrated spatial reasoning, respectively. In the following paragraphs the symbols rt and rd denote the topological and directional relation respectively, and the braces {} indicate that the result of a composition is a non-empty set of possible spatial relations.
• Homogeneous spatial reasoning involves deriving a single type of spatial relation between objects given two spatial relations of the same type. An example is inferring A disjoint C given (1) A disjoint B and (2) B contain C, where disjoint and contain are both topological relations. Formally this form can be expressed as: (32.1a) and: (32.1b)
• Heterogeneous spatial reasoning involves deriving a spatial relation of either type given two spatial relations of different types. An example is the inference of A North C from (1) A inside B and (2) B North C, where inside is a topological relation and North a directional relation. Formally this form can be expressed as: (32.2a) and: (32.2b)
• Mixed spatial reasoning involves the derivation of spatial relations of one type from the composition of two spatial relations of a different type. An instance of mixed spatial reasoning is the inference of A disjoint C from the facts A North B and B North C. The formal description of this form is: (32.3a) and:
Figure 32.3: Composition of combinations of spatial relations.
(32.3b)
• Integrated spatial reasoning involves deriving each type of spatial relation given two sets of identical types of spatial relations between objects. The instances of the spatial relations in each set may differ, but both sets have the same number and types of spatial relations. An example is inferring A disjoint and North C from (1) A meet and North B and (2) B meet and North C. Given the set of spatial relations between two objects, i.e., {rt, rd}, the operator group creates a tuple, [rt, rd]. The inverse operator, ungroup, creates the set from the tuple. Group and ungroup are required because the composition is defined on a tuple of spatial relations, and for mapping from the heterogeneous to the integrated form. The integrated form can be formally expressed as: (32.4)
The three mechanisms of homogeneous, heterogeneous, and mixed spatial reasoning will together be considered as a combined spatial inference mechanism that works with individual spatial relations. Combined and integrated spatial reasoning can be considered as two forms of spatial reasoning. Either form can be used, depending on the completeness of the available information. For the example illustrated in Figure 32.1, not all the topological and directional relations between the objects concerned are available, and hence the heterogeneous form is required. The integrated form of composition (Equation 32.4) is not a completely new or distinct method. It is a combination of the isolated and heterogeneous forms, as shown in Equations 32.5a and 32.5b: (32.5a) and: (32.5b)
The research question is whether the two methods, Equation 32.4 and Equations 32.5a and 32.5b, are equivalent (see Figure 32.3).
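The comparison posed by the research question can be sketched for the single case of Figure 32.2. Everything here is illustrative: the dictionaries contain one entry each, the entry North ; North => {North} is assumed rather than quoted from this chapter, and `ungroup` is adapted to act on a set of tuples so that its output can be compared with the combined result.

```python
# One-entry excerpts of the four kinds of composition table (hypothetical names).
TOPOLOGICAL = {("meet", "meet"): {"disjoint", "meet", "equal", "overlap", "coveredBy", "covers"}}
DIRECTIONAL = {("North", "North"): {"North"}}      # assumed entry
MIXED = {("North", "North"): {"disjoint"}}         # directions implying topology
INTEGRATED = {(("meet", "North"), ("meet", "North")): {("disjoint", "North")}}


def group(rt, rd):
    """Build the tuple [rt, rd] from one topological and one directional relation."""
    return (rt, rd)


def ungroup(tuples):
    """Split a set of tuples into its topological and directional relations."""
    return ({rt for rt, _ in tuples}, {rd for _, rd in tuples})


def combined(ab, bc):
    """Compose each relation type separately and intersect with the mixed result."""
    (rt1, rd1), (rt2, rd2) = ab, bc
    topological = TOPOLOGICAL[(rt1, rt2)] & MIXED[(rd1, rd2)]
    directional = DIRECTIONAL[(rd1, rd2)]
    return topological, directional


def integrated(ab, bc):
    """Compose the tuples directly, then ungroup the result."""
    return ungroup(INTEGRATED[(ab, bc)])


ab = group("meet", "North")
bc = group("meet", "North")
print(combined(ab, bc) == integrated(ab, bc))  # True
```

For this one case both routes give disjoint and North; whether that holds across the whole table is exactly what the comparison in the next section examines.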
Figure 32.4: Iconic representation of integrated relations.
A significant contribution of this work is demonstrating that the set of results obtained by individual compositions is identical to the set formed by ungrouping the result of tuple compositions.
32.4 COMPARING INTEGRATED WITH COMBINED SPATIAL REASONING
In order to compare the two inference methods (Equations 32.4 and 32.5) we need additional composition tables for individual combinations of topological and directional relations, i.e., rt ; rd and rd ; rt, and for tuples of relations, i.e., [rt, rd] ; [rt, rd]. The composition tables are based on the formal definitions of topological and directional relations. Topological relations are described by the 4-intersection model (Egenhofer and Franzosa, 1991), which is based on the elementary concepts of boundary and interior in point-set topology. The values, empty or non-empty, of the four intersections between the boundaries and interiors of the objects are used to specify the relations. Directional relations are described by the projection-based system (Frank, 1995), which segments the space surrounding an object’s minimum bounding rectangle (MBR) into nine zones. The nine zones are the eight cardinal directions, {North, Northeast, East, Southeast, South, Southwest, West, Northwest}, and a neutral zone denoting the relation “at the same location”.
Since projection-based direction relations use MBRs, we propose their use for topological relations also, thus providing a common method for specifying both. Relations between MBRs can be represented as pairs of interval relations (Guesgen, 1989). Hence we use existing composition tables for interval relations (Allen, 1983) to determine the composition tables for combinations of topological and directional relations. The composition of interval pairs results in a set of interval pairs that specifies the possible relations between the MBRs. We map this set of possibilities onto topological and directional relations, thereby defining the composition of the two relation types.
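The representation of an MBR relation as a pair of interval relations can be sketched as follows. The function `allen` classifies two proper intervals into one of Allen's thirteen relations; the mappings in `direction` and `topology` are a simplified assumption for illustration and are not the mapping derived in Sharma (1996). MBRs are given as hypothetical tuples (xmin, ymin, xmax, ymax).

```python
def allen(a, b):
    """Allen relation of interval a = (a1, a2) to b = (b1, b2), with a1 < a2 and b1 < b2."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if b2 < a1:
        return "after"
    if b2 == a1:
        return "met-by"
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "started-by"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2:
        return "during"
    if a1 < b1 and b2 < a2:
        return "contains"
    return "overlaps" if a1 < b1 else "overlapped-by"


def mbr_relation(p, q):
    """Pair of Allen relations (x-axis, y-axis) between two MBRs."""
    return (allen((p[0], p[2]), (q[0], q[2])), allen((p[1], p[3]), (q[1], q[3])))


def direction(pair):
    """Assumed mapping from an interval pair to a projection-based direction."""
    x, y = pair
    ns = "North" if y in ("after", "met-by") else "South" if y in ("before", "meets") else ""
    ew = "East" if x in ("after", "met-by") else "West" if x in ("before", "meets") else ""
    if ns and ew:
        return ns + ew.lower()
    return ns or ew or "SameLocation"


def topology(pair):
    """Assumed coarse mapping from an interval pair to a topological relation."""
    if "before" in pair or "after" in pair:
        return "disjoint"
    if "meets" in pair or "met-by" in pair:
        return "meet"
    return "overlap"


# A sits directly on top of B, as in the first pair of Figure 32.2.
A, B = (0, 2, 2, 4), (0, 0, 2, 2)
pair = mbr_relation(A, B)
print(pair, topology(pair), direction(pair))  # ('equals', 'met-by') meet North
```

Composing two such pairs with Allen's table and mapping the resulting pairs back through `direction` and `topology` is the step that produces the composition tables described above.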
The composition tables for heterogeneous and integrated spatial reasoning are given below. The integrated topological and directional relations must at least comprise the Cartesian product of the sets {disjoint, meet} and {North, Northeast, East, Southeast, South, Southwest, West, Northwest}, giving 16 relations. These are: DisjointNorth, DisjointNortheast, DisjointEast, DisjointSoutheast, DisjointSouth, DisjointSouthwest, DisjointWest, DisjointNorthwest, MeetNorth, MeetNortheast, MeetEast, MeetSoutheast, MeetSouth, MeetSouthwest, MeetWest, MeetNorthwest. This set of 16 relations, however, does not provide complete coverage, since objects may overlap or have a containment relationship. For such situations we introduce the broadly defined relationship OverlapSameDirection. This relation specifies that the objects share a common region and includes the possibility that one object is contained in or equal to the other. The relation OverlapSameDirection is used only as an entry in the cells of the 16×16 composition table for the integrated relations.
The integrated spatial relations are represented iconically using a cyclic pattern of circles corresponding to the eight directions, two topological relations, and overlapping objects (Figure 32.4). The outer circles correspond to the relations DisjointNorth through DisjointNorthwest, the inner circles to the relations MeetNorth through MeetNorthwest, and the innermost to OverlapSameDirection. The same
Figure 32.5: Iconic representation of disjunctions of integrated relations.
iconic representation can be used to indicate a disjunction of relations. Figure 32.5 shows the representation of the relations Not South and (DisjointNortheast or MeetNortheast). Table 32.1 gives the results of the composition of heterogeneous and integrated topological and directional relations. The completely filled-in circles indicate inferences that are obtainable using integrated or combined isolated and heterogeneous reasoning. The circles with thick borders indicate combined topological and directional relations that are not inferred if integrated spatial reasoning is used but are included in the inferred set if combined isolated and heterogeneous spatial reasoning is used. Table 32.1 is the composition table for the case when both pairs of objects (A, B) and (B, C) are disjoint.
Table 32.1: Composition table for A disjoint B and B disjoint C.
The details of the derivation of this and other composition tables for reasoning about topology and direction can be found in Sharma (1996).
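The set of integrated relations described in this section can be enumerated mechanically. The sketch below only reproduces the naming scheme given in the text; the variable names are hypothetical.

```python
from itertools import product

TOPOLOGICAL = ("Disjoint", "Meet")
DIRECTIONS = ("North", "Northeast", "East", "Southeast",
              "South", "Southwest", "West", "Northwest")

# Cartesian product of the two sets: the 16 integrated relations that
# label the rows and columns of the composition table.
INTEGRATED = [t + d for t, d in product(TOPOLOGICAL, DIRECTIONS)]

# OverlapSameDirection appears only as a cell entry, never as a row or column.
CELL_VALUES = INTEGRATED + ["OverlapSameDirection"]

print(len(INTEGRATED), INTEGRATED[0], INTEGRATED[-1])
# 16 DisjointNorth MeetNorthwest
```

A full integrated table therefore has 16 × 16 = 256 cells, each holding a subset of the 17 values in `CELL_VALUES`.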
32.5 DISCUSSION
The composition table can be used to solve the two problems outlined in Section 32.3.2. Consider the second example, where the given spatial relations are (1) A meet B; (2) B meet C; (3) A North B; and (4) B North C. Using the composition table one can determine that the integrated relation between objects A and C is A DisjointNorth C, and therefore the individual relations are A disjoint C and A North C. The composition table also exhibits some patterns from which the following conclusions can be drawn:
• Integrated spatial reasoning gives the same results as combined spatial reasoning if mixed spatial reasoning is used for inferring topological relations from the composition of directional relations. The composition of directions that are conceptual neighbours never results in the topological relation meet. Therefore only disjoint is in the inferred set of spatial relations whenever the composition of topological and directional relations involves directions that are conceptual neighbours. This creates the pattern observed among the diagonal and off-diagonal entries in Table 32.1.
In summary, we may conclude that integrated spatial reasoning is the preferred choice whenever the directional relations involved are conceptual neighbours.
32.6 CONCLUSIONS
This chapter has presented a method for inference using combined knowledge of topological and directional relations. The use of Allen’s intervals as a canonical representation for both topological and directional relations has permitted a limited but useful amount of reasoning to be performed. In particular we show that:
• Heterogeneous spatial reasoning about topological and directional relations is useful when there is a containment relationship between one or more pairs of objects. This is because the contained object has the same directional relationships with other objects as the containing object. Hence a topological relation implies a directional relation.
• Integrated spatial reasoning about topological and directional relations is useful when the topological relations among objects are either disjoint or meet. In such a situation the directional relation helps localise the positional relationship and therefore the inferences result in a smaller set of possibilities than the set obtained using topological or directional relations in isolation.
• Integrated and combined spatial reasoning give the same set of inferred relations. Hence integrated spatial reasoning can be used in place of combined spatial reasoning whenever all the necessary spatial information is available. This is computationally more efficient since the predetermined and stored composition tables for integrated spatial relations simplify the inference process.
Future work will include:
• Reasoning with interval relations for both axes to generate compositions for Northeast, Southeast, Southwest, and Northwest.
While this extension is straightforward, the complexity of the problem is increased by the number of ambiguous situations that can result.
• Compositions for other sequences of relations, for example topological; direction; topological, the result of this sequence of compositions being direction and topological relations. This would permit inferences like: A inside B ; B North C ; C covers D => A North of D and A disjoint D.
• Testing different definitions for direction relations between extended objects. The strict definition of West, for example, can be relaxed to allow partial overlap of regions and permit the notion of PartiallyWest. This relaxation would allow inferences such as determining that Oregon is West of Wyoming given that Oregon is West of Montana and Montana is PartiallyWest of Wyoming.
• Using information about legacy in reverse mappings. Legacy information will help reduce the ambiguity when mapping back from interval to topological relations, since it provides information about the initial mapping of topological relations onto interval relations.
ACKNOWLEDGEMENTS
This work was performed while the author was with the Department of Spatial Information Science and Engineering and the National Center for Geographic Information and Analysis, University of Maine. It was partially supported by a University Graduate Research Assistantship and by NSF grant IRI-9309230 (Principal Investigator: Max Egenhofer).
REFERENCES
ALLEN, J.F. 1983. Maintaining knowledge about temporal intervals, Communications of the ACM, 26(11), pp. 832–843.
BYRNE, R.M., and JOHNSON-LAIRD, P.N. 1989. Spatial reasoning, Journal of Memory and Language, 28(5), pp. 564–575.
CLEMENTINI, E., SHARMA, J. and EGENHOFER, M.J. 1994. Modelling topological spatial relations: strategies for query processing, Computers and Graphics, 18(6), pp. 815–822.
EGENHOFER, M.J. 1991. Reasoning about binary topological relations, in Günther, O. and Schek, H.-J. (Eds.), Advances in Spatial Databases-Second Symposium, SSD ‘91, Vol. LNCS 525. Switzerland: Springer-Verlag, pp. 143–160.
EGENHOFER, M.J., and AL-TAHA, K. 1992. Reasoning about gradual changes of topological relationships, in Frank, A.U., Campari, I. and Formentini, U. (Eds.), Theories and Models of Spatio-Temporal Reasoning in Geographic Space, Lecture Notes in Computer Science, No. 639. Berlin: Springer-Verlag, pp. 196–219.
EGENHOFER, M.J., and FRANZOSA, R. 1991. Point-Set topological spatial relations, International Journal of Geographical Information Systems, 5(2), pp. 161–174.
EGENHOFER, M.J., and SHARMA, J. 1993. Assessing the consistency of complete and incomplete topological information, Geographical Systems, 1(1), pp. 47–68.
FRANK, A. 1992. Qualitative reasoning about distances and directions in geographic space, Journal of Visual Languages and Computing, 3(4), pp. 343–371.
FRANK, A. 1995. Qualitative spatial reasoning: cardinal directions as an example, International Journal of Geographical Information Systems, 10(3), pp. 269–290.
FREKSA, C. 1991. Qualitative spatial reasoning, in Mark, D. and Frank, A.U. (Eds.), Cognitive and Linguistic Aspects of Geographic Space. Dordrecht: Kluwer Academic, pp. 361–372.
FREKSA, C. and RÖHRIG, R. 1993. Dimensions of qualitative spatial reasoning. Paper presented at the QUARDET (Qualitative Reasoning in Decision Technologies), Barcelona, Spain.
GELSEY, A., and McDERMOTT, D. 1990. Spatial reasoning about mechanisms, in Chen, S. (Ed.), Advances in Spatial Reasoning, Vol. 1. Norwood, NJ: Ablex Publishing Corp, pp. 1–33.
GUESGEN, H.W. 1989. Spatial Reasoning Based on Allen’s Temporal Logic, Technical Report TR-89-049, International Computer Science Institute.
HADZILACOS, T., and TRYFONA, N. 1992. A model for expressing topological integrity constraints in geographic databases. Paper presented at the Theories and Models of Spatio-Temporal Reasoning in GIS, Pisa, Italy.
HERNÁNDEZ, D. 1993. Maintaining qualitative spatial knowledge. Paper presented at the European Conference on Spatial Information Theory, COSIT 93, Italy.
HERNÁNDEZ, D. 1994. Qualitative Representation of Spatial Knowledge, Vol. LNAI 804. New York: Springer-Verlag.
HIRTLE, S.C., and JONIDES, J. 1985. Evidence of hierarchies in cognitive maps, Memory and Cognition, 13(3), pp. 208–218.
HONG, J.-H., EGENHOFER, M.J., and FRANK, A.U. 1995. On the robustness of qualitative distance and direction reasoning. Paper presented at the Autocarto 12, Charlotte, NC.
KOSSLYN, S.M., CHABRIS, C.F., MARSOLEK, C.J. and KOENIG, O. 1992. Categorical versus coordinate spatial relations: computational analysis and computer simulations, Journal of Experimental Psychology: Human Perception and Performance, 18(2), pp. 562–577.
KUIPERS, B. 1994. Qualitative Reasoning: Modeling and Simulation with Incomplete Knowledge. Cambridge, MA: MIT Press.
McNAMARA, T.P. 1986. Mental representations of spatial knowledge, Cognitive Psychology, 18(1), pp. 87–121.
PAPADIAS, D., THEODORIDIS, Y., SELLIS, T. and EGENHOFER, M.J. 1995. Topological relations in the world of minimum bounding rectangles: a study with r-trees. Paper presented at the ACM SIGMOD, San Jose, CA.
PULLAR, D.V., and EGENHOFER, M.J. 1988. Toward the definition and use of topological relations among spatial objects. Paper presented at the Third International Symposium on Spatial Data Handling, Sydney, Australia.
SHARMA, J. 1996. Integrated Spatial Reasoning in Geographic Information Systems: Combining Topology and Direction. Ph.D. thesis. Orono: Department of Spatial Information Science and Engineering, University of Maine.
SMITH, T., and PARK, K. 1992. Algebraic approach to spatial reasoning, International Journal of Geographical Information Systems, 6(3), pp. 177–192.
WORBOYS, M.F. 1992. A generic object model for planar geographic data, International Journal of Geographical Information Systems, 6, pp. 353–373.
Chapter Thirty Three Distances for Uncertain Topological Relations Stephan Winter
33.1 INTRODUCTION
Uncertainty is an inherent property of observations. Abstracting the real world to conceptual objects is a step of generalisation, and measurement, which takes place on the abstracted objects, propagates this uncertainty and adds systematic, gross, and random errors. Every spatial analysis is affected by these sources of uncertainty. It is therefore necessary to introduce the propagation of uncertainty into spatial analysis to allow an assessment of the results. The scope of this chapter is to combine the process of observation with a mathematical model of qualitative spatial relations, modelling the randomness of the observations. A methodology is presented for probability-based decisions about spatial relations.
When determining spatial relations from positionally uncertain objects, one has to distinguish between quantitative relations, which become imprecise, and qualitative relations, which become uncertain. Topological relations, being of a qualitative nature, may or may not be true in the presence of positional uncertainty. For example, if the overlay of two independent objects indicates a very small overlap, the question arises whether the two objects could in reality be neighbours. Comparing the degree of overlap with the size of the uncertainty allows a decision to be made, and this decision to be assessed.
This chapter describes the inference from positionally uncertain objects to observations characterising the topological relations. As observations, it introduces a distance function based on a skeleton. These observations allow a representation of topological relations which is equivalent to the 9-intersection model by Egenhofer but additionally yields metric information. The inference from this description to the uncertainty of derived topological relations is treated with a statistical classification approach.
Probabilities of single relations are determined, and the relation with maximum probability, given the evidence from observations, is chosen. It is shown that it is sufficient to use the minimum and maximum distance, and to classify the relation, depending on the signs of these two values, which is done by a Bayesian classification of the two distances. From these distance classes the decision about the topological relation of the two objects is derived. Other approaches for handling uncertainty are discrete, using error bands, or fuzzy, with the problem of weaker results. With the strong connection to an observation process, the research described in this chapter intends to give a more valuable decision method, with probabilities as interpretable results, which should be useful for the assessment and propagation in spatial reasoning processes.
33.2 RELATED AND FUNDAMENTAL WORK
33.2.1 Uncertainty of Objects and Relations
There are a few approaches to handling the positional uncertainty of spatial objects in GIS, mainly by bands or by fuzzy sets. The modelling of spatial relations has until now been based on these models. Statistical models are neglected in GIS because of their complexity. While the positional uncertainty of a point can be described by a 2×2 covariance matrix, objects of higher dimensionality (curves or bounded regions) need additional effort in modelling correlations and superpositions (see also Vauglin, Chapter 36, this volume). For these reasons existing models remain discrete or fuzzy.
Bands, replacing linear boundaries of regions, are a discrete two-dimensional representation of positional uncertainty. There are bands in use with a constant width (ε-bands) (Chrisman, 1982; Perkal, 1956), and so-called error bands, which may have variable widths. The error band for linear segments can be calculated stochastically and has a hyperbolic shape (Caspary and Scheming, 1992; Wolf, 1975). Bands disturb Euclidean topology, but they allow, in a first instance, topological relations to be differentiated according to their uncertainty (Clementini and Di Felice, 1996). The interpretability of a discrete subrelation is poor, of course, and an assessment or a numerical scale cannot be given. Kraus and Haussteiner calculate a map of the probabilities of points in R2 being inside a polygon. In principle their map is also characterised by hyperbolic isolines (Kraus and Haussteiner, 1993). Propagation of positional uncertainty to derived metric parameters, such as line intersections and surface areas, is treated, for example, by Kraus and Kager (1994).
Another way of modelling uncertainty is the interpretation of regions as two-dimensional fuzzy sets (Zadeh, 1965; Molenaar, 1994). Then the problem arises of how to determine the fuzzy membership values, and how to interpret the results of fuzzy reasoning.
In contrast to fuzzy sets, a probability distribution can be interpreted by specifying an experiment which follows the distribution. Wazinski used an ε-band to derive graded topological relations between very restricted objects (Wazinski, 1993). These graduations are comparable to fuzzy measures of the relations.
33.2.2 Representation of Topological Relations
This chapter uses the 9-intersection, a specific representation model for binary topological relations (Egenhofer and Herring, 1991), the main principles of which are summarised below. The nine intersection sets between the interior (X°), the boundary (∂X), and the exterior (Xe) of two spatial objects are used to characterise sets of topological relations. For simple and regular closed regions (Worboys and Bofakos, 1993) in R2 a set of eight relations can be distinguished (Egenhofer and Franzosa, 1991). Spatial objects with other topological properties differ in the number of topological relations, but this number is always greater than eight (Egenhofer and Herring, 1991). The research described in this chapter focuses on simple, regular closed regions, the aim being to develop a method rather than to be complete. The set of topological relations R consists of (see also Figure 33.1):
R = {Disjunct, Touch, Overlap, Covers, CoveredBy, Contains, ContainedBy, Equal} (33.1)
Figure 33.1: The set of eight topological relationships which can be distinguished by the 9-intersection for simple, regular closed regions.
This set can be ordered by a planar graph, the conceptual neighbourhood graph (CNG), based on concepts like topological neighbourhood or topological distance (Egenhofer and Al-Taha, 1992). Additionally, the edges of the graph can be directed using the concept of dominance (Galton, 1994; Winter, 1994).
Definition (dominance): A relation dominates its neighbouring relations if, in some translation or deformation, it holds only at a single point of time.
The edge directions in the CNG run from the dominating relation to the dominated relation. Figure 33.2 shows this CNG, extended by some ideas from Section 33.3 below. The next section develops a distance function to replace the non-emptiness of intersection sets in the determination and representation of topological relations, while in Section 33.4 a model of observation is applied to the function, which allows classification of distances and of topological relationships.
33.3
This section develops an alternative representation to the 9-intersection with full compatibility, but with additional properties, yielding metric information on the distance between two regions.
33.3.1 Relation Clusters
The considered regions are restricted by two conditions:
Figure 33.2: The conceptual neighborhood graph, partitioned into clusters of the neighborhood of Equal and the neighborhood of Touch.
• The uncertainty about the position of each region has to be small compared with the size of the region.
• The question whether a point is inside a region must be related (in practice) to at most one segment of the boundary.
In geodetic contexts these restrictions are usually fulfilled. For example, in cadastral surveying object dimensions are in decametres while precision is in centimetres, and in topographic mapping object dimensions are in hectometres while precision is in metres. The geometric restrictions justify partitioning the CNG into two connected subgraphs, or partitioning R into two relation clusters C1, C2.
Definition (clusters of relations): C1 is the set of relations which consists of Touch and its neighbouring relations: C1 = {Disjunct, Touch, WeakOverlap}. C2 is the set of relations which consists of Equal and its neighbouring relations: C2 = {StrongOverlap, Covers, CoveredBy, Contains, ContainedBy, Equal}.
This partitioning cuts the central relation Overlap into a WeakOverlap and a StrongOverlap, so that both clusters are centred around a dominant relation (Touch and Equal respectively) and contain all the relations which are directly neighbouring these central relations (Figure 33.2). The weight of the relation Overlap is based on an overlap factor OF: (33.2)
With that it can be defined simply:
The idea behind splitting Overlap is the observation that a situation with an overlap factor near to 0.5 is insensitive to imprecision and always to be classified as Overlap. 33.3.2 Ternary Skeletons If an observed relation between two regions A and B is out of C1, then positional uncertainty is linked with the uncertainty of the relation about the intersection sets and And if an observed relation is out of C2, then the positional uncertainty is linked with the uncertainty of the relation about the intersection sets , and . The link between positional uncertainty and other intersection sets is then redundant. In order to develop a metric measure of the uncertainty of intersection sets three zones O, P, and Q, are defined, each consisting of the following intersection sets: (33.3)
(33.4)
All zones are open sets. Following the concept of a zonal skeleton (Lantuejoul, 1978), for skeletonisation the closure of the zones P and Q is considered as foreground X, and O as background X^c. A skeleton S(X) is the set of centres of the maximal discs contained in a closed set X (Serra, 1982). An exoskeleton is then the skeleton S(X^c) of the background.
Definition (zonal skeleton): A (ternary) zonal skeleton is the subset of an exoskeleton on disjunct particles P and Q where the maximal discs touch both P and Q.
With the two different initialisations of P and Q (Equations 33.3 and 33.4) it is also possible to distinguish S1 (Rel ∈ C1) and S2 (Rel ∈ C2). A zonal skeleton is a finite union of simple lines (Lantuejoul, 1978), but neither S1 nor S2 need be connected.
33.3.3 Morphological Distance Functions
Now it is possible to use the zonal skeleton to introduce a distance function between two particles, i.e. a diameter function in uncertain intersection sets. The idea is based on the diameter d of the maximal disc at each point s of the skeleton Si, i ∈ {1, 2}.
Definition (morphological distance): A function AB (s) between two regions A and B with the following properties:
for s ∈ S1 and d ∈ R+ is defined as the morphological distance.
The name and sign of the morphological distance follow from the morphological operations of dilation (+) or erosion (-) of A needed to reach the skeleton. In the following paragraphs the morphological distance is indicated in short as & AB. It is now easy to show that the morphological distance is symmetric along S1 () and antisymmetric along S2 (). The next step is to concentrate on the range of morphological distances between A and B, and to reduce the actual values further to distance classes based on their sign.
Definition (distance classes): The following classes for morphological distances are defined as:
With the triple consisting of the relation cluster C1, the class β min of the minimum distance, and the class β max of the maximum distance along S1, we have found an equivalent representation of the 9-intersection (see Table 33.1): (33.5) The variations of the triple in Table 33.1 are complete (Winter, 1996).

Table 33.1: Equivalence in the relations of the 9-intersection and of the triple consisting of the relation cluster C and the classes of extremal morphological distances

C     Class ω(min)   Class ω(max)   Relation
C1    2              2              Disjunct
C1    0              2              Touch
C1    1              2              WeakOverlap
C2    1              2              StrongOverlap
C2    1              0              Covers
C2    0              2              CoveredBy
C2    1              1              Contains
C2    2              2              ContainedBy
C2    0              0              Equal
33.4 CLASSIFICATION OF TOPOLOGICAL RELATIONS

33.4.1 Uncertainty of Abstraction

Distance classes have been defined through a mathematical definition: α 0 stands for AB=0, which is impossible for a continuous random variable (P( = 0) = 0). This argument coincides with the observation that human concepts of IsZero always have a natural width. The width depends merely on the context of an observation.
Figure 33.3: The inquiry among experts yields a probability distribution for the fuzziness of an abstract concept of IsZero.
Consider for example a surveyor, who will always avoid marking a new point at a distance of, let us say, 5 cm from an already captured point; instead he will use that one. Therefore, if one finds in cadastral datasets two points nearer than 5 cm, one has strong support for the assumption that the same point was meant. Modelling the width of a concept (μ α can be done by asking experts. Let us collect a high number of answers from independent experts. Then the function which describes the number of agreements with the concept for all values of R is a probability density function Pμ α (Figure 33.3). This function could be described by two parameters: a mean width α , and a smoothness of the concept, given with α α . Such a model would be consistent with the mathematical concept of “=0”, because in the case of α =0 and α α =0 the density function degenerates to a δ-function. Density functions with α 1 and α 2 can be set up in a similar way because both intervals are in practice limited either by the region diameters or by a possibly minimum bounding rectangle containing A and B.

33.4.2 Uncertainty of Measurement

The width of a concept is (in a first step) independent of measurement, which additionally introduces gross, systematic and random errors. Here only random errors are considered, because gross and systematic errors are in principle avoidable. Measured variables are the boundaries of the regions. In a simple model we may assume that the random error may be described by a probability density function p9, which depends on α A, α B and of two regions A and B.

33.4.3 Combined Observation Uncertainty

With the assumed independence between abstraction and measurement we can write: It follows that: or, with a sufficiently small interval α &:
(33.6)
33.4.4 Classification

Under the given uncertainties, it is necessary to classify an observed (extremal) morphological distance into one of the classes of R. With a maximum likelihood classification: (33.7) The theorem of Bayes (Koch, 1987) is used to calculate the vector of three conditional probabilities, i ∈ {1, 0, 2}: (33.8) The probability p(9/α i) can be calculated with Equation 33.6, with the benefit of eliminating the factor α , which appears in Equation 33.8 in each product of numerator and denominator. Because of finite intervals in α 9, the probability of a class β i varies along the length of the interval i, which leads to a low probability of β 0, and high probabilities for α 1 and β 2. With this consideration, Equation 33.8 can be solved with given α , β B, and . Finally we need to show the transition from classifying distances to a classification of relations. An observed triple (C1, min, max) can be classified by Equation 33.7 to (Ci, β min, β max). With P(C=Ci)=1, a consequence of the discussion in Section 33.3.1 is that: (33.9) Referring to Table 33.1 this probability is also the probability of the topological relation between A and B. Probabilities of alternative classifications can also be calculated allowing a decision to be assessed.

33.5 DISCUSSION

The research presented in this chapter has defined a morphological distance function which is used to determine the topological relation between two regions. It has also shown the equivalence of the extremal distances and the known representation by intersection sets. Considering the two regions as (positionally) uncertain, the observation of the distances can be modelled statistically. Within this model it is possible to propagate the uncertainty of the observations to the uncertainty of topological relations. Applying a statistical decision rule, the decision yields probabilities of the classification result, and also of alternative relations, which allow an assessment of the decision.
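The Bayesian classification step of Section 33.4.4 can be illustrated with a minimal sketch. It assumes that the likelihood of the observed distance under each of the three distance classes is already available, and that prior probabilities for the classes are given. The function name `classify_distance`, the class labels and all numeric values are hypothetical and are not taken from the chapter. The low prior for the zero class only mimics the remark above that this class receives a low probability.

```python
from typing import Dict, Tuple


def classify_distance(likelihoods: Dict[str, float],
                      priors: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Return the most probable class and all posterior probabilities."""
    # Bayes' theorem: the posterior is proportional to likelihood times prior.
    joint = {c: likelihoods[c] * priors[c] for c in likelihoods}
    evidence = sum(joint.values())
    if evidence <= 0.0:
        raise ValueError("the observation has zero probability under every class")
    posterior = {c: p / evidence for c, p in joint.items()}
    # Decision rule: choose the class with the largest posterior probability.
    # With equal priors this coincides with a maximum likelihood decision.
    best = max(posterior, key=lambda c: posterior[c])
    return best, posterior


# Hypothetical values for one observed extremal distance.
likelihoods = {"negative": 0.10, "zero": 0.60, "positive": 0.30}
priors = {"negative": 0.45, "zero": 0.10, "positive": 0.45}
best, posterior = classify_distance(likelihoods, priors)
print(best, posterior)
```

The posterior probabilities of the classes that were not chosen are returned as well, so that the decision can be assessed against its alternatives.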
The approach has some new aspects:
• a combined statistical model of observation uncertainty and relation uncertainty;
• a statistical model of the lack of definition in spatial abstraction;
• a new view on the conceptual neighbourhood graph, which earlier was used in the context of motion or deformation, and now is adapted to positional uncertainty;
• a partitioning of the conceptual neighbourhood graph, based on weighting the central relation Overlap;
• the use of morphological distances, which keep metric information about the magnitude of intersection sets, instead of empty or non-empty intersection sets.

The proposed method is relevant for all aspects in GIS:
• Input. Single data layers are less involved here, because they follow semantic constraints in their topological structure. But data homogenisation between layers of different thematic classes, as for example after data import, requires techniques to support decisions for eliminating slivers, etc. The problem of a geometric determination of common boundaries is not touched.
• Management. The topological structure usually is the basis of a data model. In a first step data homogenisation may lead to a topological structure keeping the relations certain, for example, by maximum likelihood decisions, cutting alternatives and probabilities. Then storing positional uncertainty of objects has to be solved elsewhere. But for keeping alternatives and probabilities further, data models must be developed, and the propagation of uncertain topological structures in spatial analysis must be investigated.
• Analysis. Up to now, reasoning from a set of known topological relations to additional ones has been a logical problem. Now the propagation of probabilities could be introduced in the reasoning. Also the existence and the probability of alternative relations to the known uncertain relations have to be investigated. Possibly additional rules for reasoning, or combined probabilities in reasoning, must be handled. Another problem is the consistency in a network of topological relations, which includes alternative relations.
• Presentation. Visualising uncertainty is a current theme of research, but in the context of topological relations, or more generally of qualitative relations, it is, to our knowledge, a completely new question.

The ideas presented in this chapter are worked out for simple regions and conditions of small positional uncertainty. Further research should extend these ideas to complex objects (an intermediate step could be generalised regions (Egenhofer et al., 1994)) or to objects of other dimensions. This research should prove that the ideas hold for simple 1D-objects in R1, or for simple 3D-objects in R3, and investigate whether the method can be applied to other qualitative spatial relationships. Additional research is also needed extending the sources of uncertainty, or combining other techniques for handling uncertainty, for example, knowledge from thematic properties, or cartographic generalisation rules.

REFERENCES

CASPARY, W., SCHEURING, R. 1992. Error-bands as measures of geometrical accuracy, in Proceedings of EGIS ’92, Utrecht: EGIS Foundation, pp. 226–233.
CHRISMAN, N.R. 1982. A theory of cartographic error and its measurement in digital databases, Proceedings of AutoCarto 5, Crystal City, pp. 159–168.
CLEMENTINI, E. and DI FELICE, P. 1996. An algebraic model for spatial objects with indeterminate boundaries, in Burrough, P.A., Frank, A.U. (Eds.), Geographic Objects with Indeterminate Boundaries. London: Taylor & Francis, pp. 155–169.
EGENHOFER, M.J., AL-TAHA, K.K. 1992. Reasoning about gradual changes of topological relationships, in Frank, A.U., Campari, I., Formentini, U. (Eds.), Theories and Models of Spatio-Temporal Reasoning in Geographic Space, Lecture Notes in Computer Science 639. New York: Springer, pp. 196–219.
EGENHOFER, M.J., FRANZOSA, R.D. 1991. Point-set topological spatial relations, International Journal of Geographical Information Systems, 5(2), pp. 161–174.
EGENHOFER, M.J., HERRING, J.R. 1991. Categorizing Binary Topological Relationships Between Regions, Lines, and Points in Geographic Databases, Technical report. Orono, ME: Department of Surveying Engineering, University of Maine.
EGENHOFER, M.J., CLEMENTINI, E. and DI FELICE, P. 1994. Topological relations between regions with holes, International Journal of Geographical Information Systems, 8(2), pp. 129–142.
GALTON, A. 1994. Perturbation and Dominance in the Qualitative Representation of Continuous State-Spaces. Technical Report 270. Exeter: Department of Computer Science, University of Exeter.
KOCH, K.R. 1987. Parameterschätzung und Hypothesentests, 2nd edition. Bonn: Dümmler.
KRAUS, K. and HAUSSTEINER, K. 1993. Visualisierung der Genauigkeit geometrischer Daten, GIS, 6(3), pp. 7–12.
KRAUS, K. and KAGER, H. 1994. Accuracy of derived data in a geographic information system, Computers, Environment and Urban Systems, 18(2), pp. 87–94.
LANTUEJOUL, C. 1978. La squelettisation et son application aux mesures topologiques des mosaiques polycristallines. PhD thesis. Paris: Ecole Nationale Superieure des Mines de Paris.
MOLENAAR, M. 1994. A syntax for the representation of fuzzy spatial objects, in Molenaar, M. and de Hoop, S. (Eds.), Proceedings of Advanced Geographic Data Modelling, Delft: Netherlands Geodetic Commission, pp. 155–169.
PERKAL, J. 1956. On epsilon length, Bulletin de l’Academie Polonaise des Sciences, 4, pp. 399–403.
SERRA, J. (Ed.) 1982. Image Analysis and Mathematical Morphology, vol. 1. London: Academic Press.
WAZINSKI, P. 1993. Graduated Topological Relations. Technical Report no. 54. Universität des Saarlandes.
WINTER, S. 1994. Uncertainty of topological relations in GIS, in Ebner, H., Heipke, C. and Eder, K. (Eds.), Spatial Information from Digital Photogrammetry and Computer Vision, Proceedings of ISPRS Commission III Symposium. München: ISPRS, pp. 924–930.
WINTER, S. 1996. Unsichere topologische Beziehungen zwischen ungenauen Flächen. PhD thesis. Bonn: Landwirtschaftliche Fakultät der Universität Bonn.
WOLF, H. 1975. Ausgleichungsrechnung. Bonn: Dümmler.
WORBOYS, M.F. and BOFAKOS, P. 1993. A canonical model for a class of areal spatial objects, in Abel, D. and Ooi, B.C. (Eds.), Advances in Spatial Databases, Lecture Notes in Computer Science 692. Berlin: Springer-Verlag, pp. 36–52.
ZADEH, L.A. 1965. Fuzzy sets, Information and Control, 8, pp. 338–353.
Part Five DATA QUALITY
Chapter Thirty Four Spatial Data Quality for GIS Henri Aalders and Joel Morrison
34.1 INTRODUCTION

Implementing today’s changing technology means that fields such as cartography, photogrammetry and geodesy, which have traditionally been responsible for the collection and conveyance of spatial data and knowledge primarily through the making and use of maps, are in the process of switching from their historic (particularly the last 300 years) common end-product, a printed map on a flat sheet of paper, to a series of new products created from digital files containing data representing features on the earth. These disciplines are also facing, perhaps for the first time, a situation in which today’s technology can and does enable the routine collection and use of data whose quality exceeds that which is needed for the map’s purpose and/or that requested by the data user. In the digital mapping world data producers are collecting spatial data and structuring them into databases that contain topology, enhanced attribute information, and/or relationships amongst features. The fact that these digital databases may also be distributed allows other data producers to add attributes or relationships to features already in the database or to add new features. Therefore, any given data file may be the product of a number of data producers. The ease and speed of data collection result in many more datasets of varying quality which duplicate earth features at different resolutions and which compete to satisfy the user’s needs. These readily available distributed data files are accessible electronically at numerous isolated workstations. The end result is that data collectors, who originally collect data for a feature, have lost the degree of control over the datasets which traditional data collectors for analogue maps enjoyed.
As a result, collectors of data face the pertinent question: “How does the producer describe the spatial data quality so that it appears to meet the demands of the potential customers?” From another perspective, the data user must select amongst competing dataset delimitations for a given feature. What criteria can the data user use to make this selection decision? Several possibilities exist and may compete in any given instance: (1) the quality of the data, (2) data availability, or (3) the ease of accessibility. Traditional cartographers, who have a reputation for excellent maps, have mastered the use of these criteria, often unconsciously. For today’s graphically sensitive world, the user must still impart to a visualisation
indications of the data quality. In the digital world, two aspects become important for the conscientious user of these digital spatial data:
• what spatial data quality information is needed, and
• which methods are available to impart to a visualisation the quality information of a digital set of feature data.

This chapter discusses quality elements that can be used in practical applications. Both the quality of geometric data and the quality of non-geometric attributes, following the rules that are applied in information technology, are considered.

34.2 HISTORY OF QUALITY IN GIS

In the analogue era, concern for the quality of spatial data was almost synonymous with the concern for the accuracy of the planimetric position of a feature relative to its position on the earth. On topographic maps this concern concentrated on the elevation of the land surface above a precise datum. Only rarely was a similar degree of attention paid to other attributes of the data. The norm for society became that certain institutions, over a long period of time, earned a reputation and prestige for producing quality products from spatial data, including their visualisation in printed map form. In contrast, today’s technology allows every user surfing the Internet to collect datasets from a variety of sources, and download them to the user’s terminal. The visualisation of each layer of these datasets can be made to look precise and accurate regardless of the original quality of spatial data. The user may often find that data from differing sources will not perfectly agree, i.e., the visualisations of the points, lines and areas representing the same earth features do not match exactly. In the absence of quality information, or any a priori knowledge, the user has no logical way to resolve these differences. This results in a need for spatial data quality information. To provide this quality information about spatial data, three areas of general agreement are necessary:
1. definition of the elements of spatial data quality,
2. derivation of easily understood indices of spatial data quality that may accompany a dataset, and
3. methods for representing or rendering the specified data quality in a visualisation.

34.3 DATA QUALITY

In general a comprehensive statement of data quality represents for the user the fitness of that dataset for a potential application. This is a very general, but also very often used, interpretation of data quality. A more precise statement, adhering to the ISO 8402 Standard, describes quality as being the combination of properties and characteristics of a product or a service that fulfils both the expressed and implicit needs of a user. One can distinguish between an internal and an external quality. The internal quality represents the difference between the data and the specifications to which they were collected, while the external quality indicates how well these specifications fulfil the user needs. This chapter is about internal quality.
34.4 INFORMATION TECHNOLOGY IN GIS

Geographic data are created by abstracting phenomena existing in the real world. This is often done according to predefined rules that define the details of what is to be represented, how it is represented, and how it is described in a geographic database. This subjective interpretation of real world objects creates an idealised view of the world called the “nominal ground” or “abstract view of the universe”. This interpretation is subjective for two reasons: object-type descriptions and application dependency.
• Object-type descriptions: It is assumed that the real world is composed of a set of interrelated objects. When creating a model of the real world, the abstraction describes these real world objects and their properties as well as the relationships between the objects and their properties. The nominal ground is a set of descriptions of properties of objects and relations that occur in the real world; they are called object-types, relationships (both sometimes called entities), and attributes.
• Application dependency: According to a given specification, only those objects that are of interest for the intended application(s) are selected into the nominal ground, omitting all other objects.

Referring to the nature of geographic objects there are different types of attributes:
• Geometric: The geometric attributes refer to the position of the object, stated according to some system of georeferencing; shape of the object, defined in mathematical terms; and topology, e.g. defining the order and neighbourhood of an object. Geometry can be expressed by either raster or vector representation and is by nature continuous in value. This type of attribute also has a value for which the range is predefined, e.g., descriptions in the form of text, photographs, drawings, video images.
• Semantic: The semantic attributes can be either: continuous, i.e., expressed or measured in a given unit, e.g., width, temperature, time, etc.; or discrete, referring to one specific value of a finite set, e.g., class names, addresses, colours, etc. A discrete attribute can be nominal, i.e., there is no order between the possible values, e.g., class names; or ordinal, where there is an explicit order between the values, e.g., elevations above a datum.

Relations, entities (being the abstract representation of real world objects), and attributes can be defined as in Table 34.1.

Table 34.1: Definition of Entities, Attributes, and Relations
ENTITY (id, { n attributes | n ∈ N \ {0}})
ATTRIBUTE ({ n types, n values | n ∈ N \ {0}})
RELATION (id, {type, n entity id’s | n ∈ N \ {0,1}}, {m attributes | m ∈ N \ {0}})
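The definitions of Table 34.1 can be sketched as simple data structures. The following is only an illustration under stated assumptions: the class and field names are hypothetical, each attribute is simplified to a single type and value pair, and the cardinality rules are checked only for the attributes of an entity (at least one) and the entity identifiers of a relation (at least two).

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Attribute:
    """One attribute, simplified here to a single type and value pair."""
    type: str
    value: Any


@dataclass
class Entity:
    """An entity carries an identifier and at least one attribute."""
    id: str
    attributes: List[Attribute]

    def __post_init__(self) -> None:
        if len(self.attributes) < 1:
            raise ValueError("an entity needs at least one attribute")


@dataclass
class Relation:
    """A relation carries an identifier, a type, and at least two entity ids."""
    id: str
    type: str
    entity_ids: List[str]
    # Assumption of this sketch: relation attributes are not length-checked.
    attributes: List[Attribute] = field(default_factory=list)

    def __post_init__(self) -> None:
        if len(self.entity_ids) < 2:
            raise ValueError("a relation links at least two entities")


# Hypothetical usage: two entities and one relation between them.
road = Entity("e1", [Attribute("class", "road"), Attribute("width_m", 6.5)])
river = Entity("e2", [Attribute("class", "river")])
crossing = Relation("r1", "crosses", [road.id, river.id])
```

A quality statement could then be stored as one more `Attribute` of an entity, which is the arrangement the following paragraph describes.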
Quality indicators may be attached to entities in the form of attributes. It is assumed that quality attributes describe the quality of one or more entities and/or attributes that are contained in the dataset. Each dataset for which a given quality statement applies can be considered a metadataset. In defining datasets one can apply set theory. As a dataset consists of a set of occurrences of entities, these can, on the basis of their semantics and according to set theory, be hierarchically subdivided into superclasses, classes and
Figure 34.1: The role of the Nominal Ground
subclasses. A subclass-entity may be a member of a class-entity, and a class-entity in turn may be a member of a superclass-entity. Quality attributes follow the concept of metaclasses, i.e. super/subclasses are not members of a metaclass but carry the attributes of a metaclass. As such, quality data are always metadata, i.e., data about data. Quality attributes are valid for all occurrences of entities in a dataset that are related to the metaclass, i.e., quality metaclasses describe the quality of homogeneous entities or entity-sets.

34.4.1 Nominal Ground

To cast a set of real world objects into an ideal form by position, theme, and time, i.e., to make these objects intelligible and representable in a database, a nominal ground is defined to model the potentially infinite characteristics of the objects in the real world. The process used describes the abstraction of objects from the real world into an ideal concept (see Figure 34.1). To model the quality of a dataset, it is also necessary to define precisely the process that allows one to derive the dataset from the real world. This process is decomposed into two steps:
• modelling: which contains both the content specifications of what should be considered in the real world and the abstraction of the selected objects; and
• mensuration: which specifies the measuring methods and the measurement requirements for the capturing and storing of the data itself.

To fill the conceptual gaps between the real world and the nominal ground, and between the nominal ground and the dataset, quality indicators are defined for the specifications and abstraction of real world objects. The nominal ground forms the basis for specification, against which the quantitative contents of datasets can be tested.
These specifications are usually defined by either the users or producers of geographic datasets and they should depend directly on their intended use. The accompanying quality model consists of a set of quality indicators that will allow users or producers to define the relation between the specifications and the expected performance of geographic datasets. So, the nominal ground can be considered as the user-intended dataset in an ideal situation. Although one can speak of the quality of the modelling process, one should realise that the intended use is defined in the dataset contents specification. Since quality is defined as a combination of characteristics for both the expressed and the intended use, it is obvious that any quality model for an existing dataset should refer to the quality of both the data modelling and data capturing process(es) (see Chapter 37, this volume).

34.4.2 Metaquality

The existence of the quality of each quality statement in a dataset should also be noted. In other words, to each quality indicator a metaquality statement should be attached, expressing the reliability of the quality indicator (the quality of quality) (see Chapter 36, this volume). This information comprises:
1. a measure of the confidence of the quality information, indicating the level of confidence in the dataset;
2. a measure of the reliability of the quality information, reflecting how well the information represents the whole dataset;
3. a description of the methodology used to derive the quality information, to indicate how this result was obtained; and
4. a measure of the abstraction to account for the differences between reality and the nominal ground. (Note: this measure is not relevant for factual statements concerning quality that are independent of the content of the geographic dataset.)
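The four components of a metaquality statement listed above can be attached to a quality indicator as in the following minimal sketch. All names are hypothetical, and expressing confidence, reliability and abstraction as numbers between 0 and 1 is an assumption of this sketch; the chapter does not prescribe a scale.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Metaquality:
    """The 'quality of quality' for one quality indicator."""
    confidence: float                    # assumed scale: 0 (none) to 1 (full)
    reliability: float                   # assumed scale: 0 to 1
    methodology: str                     # how the quality information was derived
    abstraction: Optional[float] = None  # None for purely factual statements

    def __post_init__(self) -> None:
        for name in ("confidence", "reliability"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie between 0 and 1")


@dataclass
class QualityIndicator:
    """A quality statement together with its metaquality statement."""
    name: str
    value: float
    metaquality: Metaquality


# Hypothetical example: a positional accuracy figure and how far to trust it.
indicator = QualityIndicator(
    name="horizontal RMSE (m)",
    value=0.35,
    metaquality=Metaquality(
        confidence=0.95,
        reliability=0.80,
        methodology="comparison with 120 independently surveyed check points",
    ),
)
```

Leaving `abstraction` unset corresponds to the note in item 4, where the measure is not relevant for factual statements.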
34.5 STRUCTURING OF QUALITY INDICATORS AND ELEMENTS

In 1982 in the United States of America, a National Committee on Digital Cartographic Data Standards (NCDCDS) was established under the auspices of the American Congress on Surveying and Mapping (ACSM). The reports of this committee (Moellering 1985, 1987) contain sections on data quality which specify five components of data quality. The ICA Commission on Spatial Data Quality, instituted in 1991, has added two additional components, and published a book defining each of the seven elements of spatial data quality (Guptill and Morrison, 1995). In the CEN/TC 287 (1995) Quality model all seven of these components are considered. In the following discussion, all seven elements are discussed under indicators and elements constituting a set of quality parameters. The quality indicators in a quality model are of four main types:
• Source of the dataset: This indicator lists the name and the organisation responsible for the dataset, as well as the purpose, date, and creator of the dataset’s original production. Source information can be considered as part of the metadata of the dataset as can all quality information. Source information can also be seen as part of the “overall” dataset’s lineage. Clarke et al. (1995) state that “Lineage is usually the first component given in a data quality statement. This is because all of the other components of data
Figure 34.2: Pyramid of Quality Indicators
quality are affected by the contents of the lineage and vice versa” (p. 13). They admit that “the ultimate purpose of lineage is to preserve for future generations the valuable historical data resource” (p. 15). In this context, source information is considered as the minimum quality information that should always be available for a dataset or a data(sub)set. The attributes of the source information may tell the present user in general terms about the possible validity of the data. This impression of the data may be created from a consideration of the time interval that has elapsed since the original creation of the dataset, an awareness of the reliability of the producer, and knowledge of who should be considered ultimately liable for errors in the dataset. The original purpose for the dataset should also be included (see Figure 34.2).
• Usage: Any previous use of a dataset by other users for various applications may be a good indication of the fitness of the data for the present use and the general reliability of the dataset in different circumstances. For each usage a separate statement should be given indicating the organisation that has used the dataset, the type of usage and its perceived fitness and any possible constraints or limitations that were imposed or discovered during that use. Usage of a dataset is considered to be important from the point of view of new users. If a dataset has been used for applications similar to the one envisaged by the current user then it clearly adds to the potential user’s confidence. Also previous combinations of the actual dataset with other datasets can be a firm indication of the quality of the dataset. However, information about usage is only useful if it is honest, and errors, constraints, and limitations are also recorded.
• Quality parameters: Descriptions of measurable aspects of the performance of occurrences in a dataset for an intended use can be considered as quality parameters. Some of these parameters, referred to as elements, are described in the next section of this chapter.
• Lineage: A description of the processing history which each occurrence of an entity has undergone since its original creation is termed lineage. For each process, a statement has to be given describing the processing the entity has undergone (including details of the methods used and possibly references to documents containing actual algorithms applied), who performed the process, when it was performed, and why (see Figure 34.3). Processes that have been performed on each occurrence of an entity in the dataset may refer to the type of transformations, selections, generalisations, conversions, updates, consistency checks, validations, etc. For this purpose metaentities may be defined and related to occurrences of entities in the dataset.
Figure 34.3: The Quality Parameter Set
Occurrences of specific entities may be assumed to follow the quality statements of the other entities in the same metaclass, and thereby enable the user to obtain their quality properties. This involves a complex logical structure within the dataset. As a dataset matures and is frequently used, each occurrence of a metaclass of the above described quality indicators tends to become larger in size, successively in the order of source, usage, quality parameters and lineage, and at the same time also tends to refer to fewer occurrences of objects in the database.

34.5.1 Quality Parameter Set

A quality parameter is a specific instance of a particular quality parameter type, or element, and represents its performance characteristic. Each quality parameter element may have one or more ‘metrics’, each of which is a quantitative statement of the quality parameter. Metrics may be statistical (expressed in real numbers, with or without units) or binary and in some instances may be descriptions. All metrics should be dated to indicate their temporal validity. The descriptions of the algorithms used to determine the metrics can be described as part of a metaquality methodology component. The quality parameter set may consist of different quality parameter metrics for each of the following elements of spatial data quality:

Positional accuracy: the quality report section on positional accuracy must include the relationship of the data coordinates to latitude and longitude (or eastings and northings). It includes measures of the horizontal and vertical accuracy of the features in the dataset, and it must also consider the effects on the quality of all transformations performed on the data as well as the result of any positional accuracy testing performed on the data (Moellering 1985, pp. 18–19). So one can speak of:
1. absolute accuracy, i.e., the accuracy measured (in both the horizontal and vertical dimensions) against the spatial reference system;
2. relative accuracy, i.e., the accuracy of the positions of occurrences of entities relative to each other (in both the horizontal and vertical dimensions); and
3. the results of testing procedures.

The possible metrics for positional accuracy can be (assuming they are continuous valued attributes):
1. for single-valued attributes: the root mean square error (RMSE), the standard deviation, the range (minimum, maximum), a histogram of deviations from the mean value, the confidence level and/or interval; and
2. for multiple valued attributes: a list of accuracies for single-valued quantitative attributes, an error ellipse, a correlation matrix, a function range, or the eigenvalues of the correlation matrix.

Drummond (1995) uses practical examples to summarise the knowledge currently held by land surveyors on positional errors and their propagation, and posits that the “likely maximum error” is a concept that most users will find helpful.

Thematic or attribute accuracy: this refers to the accuracy of continuous (also called scalar or quantitative) and discrete (also called qualitative or nominal) values associated with a feature or relationship in the dataset. Metrics for scalar attributes may involve the same procedures as those for positional accuracy. For discrete attributes possible metrics include the probability of a correct classification or misclassification, and the probability of correctly assigning alternative values. Thematic accuracy can be expressed by a percentage of correctly classified attributes, by a misclassification matrix, a list of likely alternative classifications, the probability {Þi, P(X=Þi)} of correct classifications, the maximum probability, or the largest m probabilities in Þi.
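Two of the metrics named above, the RMSE for positional accuracy and the percentage of correctly classified attributes derived from a misclassification matrix, can be sketched as follows. The function names and the sample data are hypothetical; the RMSE is computed here for horizontal positions against reference coordinates, which is one possible reading of absolute accuracy.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def rmse(measured: Sequence[Point], reference: Sequence[Point]) -> float:
    """Root mean square error of horizontal positions against reference positions."""
    if len(measured) != len(reference) or not measured:
        raise ValueError("need two non-empty point lists of equal length")
    squared = [(mx - rx) ** 2 + (my - ry) ** 2
               for (mx, my), (rx, ry) in zip(measured, reference)]
    return math.sqrt(sum(squared) / len(squared))


def misclassification_matrix(true_labels: Sequence[str],
                             assigned_labels: Sequence[str]) -> Dict[Tuple[str, str], int]:
    """Count how often each true class was assigned to each class."""
    counts: Dict[Tuple[str, str], int] = {}
    for true, assigned in zip(true_labels, assigned_labels):
        counts[(true, assigned)] = counts.get((true, assigned), 0) + 1
    return counts


def percent_correct(matrix: Dict[Tuple[str, str], int]) -> float:
    """Percentage of occurrences whose assigned class equals the true class."""
    total = sum(matrix.values())
    if total == 0:
        raise ValueError("empty misclassification matrix")
    correct = sum(n for (true, assigned), n in matrix.items() if true == assigned)
    return 100.0 * correct / total


# Hypothetical check data: four land cover labels, one of them wrong.
matrix = misclassification_matrix(
    ["forest", "forest", "water", "urban"],
    ["forest", "water", "water", "urban"],
)
print(percent_correct(matrix))
```

The off-diagonal entries of the matrix also show which classes are confused with each other, information that the single percentage does not retain.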
Goodchild (1995) raises the important issue “of the spatial structure of error, and demonstrates its significance, particularly in the analysis of attribute accuracy in geographic information systems”.
Temporal accuracy: describes the correctness of time and updating of a data(sub)set (currentness) by such metrics as:
1. moment of last update in case of creation, modification, deletion or unchanged use;
2. rate of change of entities per unit of time;
3. a trigger value indicating the number of changes before a new version of the dataset is issued;
4. temporal lapse, giving the average time period between the change in the real world and the updating of the database; and
5. temporal validity indicating data to be out of date, valid or not yet valid.
Temporal accuracy is also mentioned in the NCDCS (Moellering, 1995) report. However, it is not treated separately, but as an integral part of each of the five quality aspects reported. Guptill (1995) states that in the use of digital spatial data and geographic information system technology users will collect, combine, modify and update various components of spatial information. This will occur in a distributed, heterogeneous environment, with many parties participating in the data enterprise. In such an environment, having information about temporal aspects of the spatial data becomes paramount.
Completeness: indicates the estimation of errors of omission and commission which can be expressed by percentages of missing or over-complete data in the dataset relative to the specification; estimates of change
of occurrences of entities between the last update of the dataset and the actual time of use; and density of occurrences relative to one another. Moellering (1995) states “the report on completeness shall describe the relationship between the objects represented and the abstract universe of all such objects. In particular, the report shall describe the exhaustiveness of a set of features” (p. 21). Brassel (1995, p. 81) distinguishes “between two types of completeness: data completeness (which is an error of omission and a measurable data quality component) and the model completeness (which is an aspect of fitness for use)”, and relates to the semantic accuracy. His goal is to “provide a comprehensive explanation of completeness and to develop criteria for users and developers of GIS that enable them to evaluate data completeness.”
Logical consistency: describes the fidelity of relationships encoded in the data structure (Moellering, 1985, 1987). For example, it describes the number of features, relationships, and attributes that have been correctly encoded in accordance with the integrity constraints of the feature’s data specification. As such, tests should be both topological and visual. Kainz (1995) states “According to the structures in use, different methods may be applied to test for logical consistency in a spatial dataset. Among these are attribute database consistency tests, metric and incidence tests, topological and order related tests, as well as scene consistency checks” (p. 109). A metric for topological consistency in one dimension could be reported by indicating the percentages of junctions that are not formed when they should be, or in two dimensions by indicating the percentage of incorrectly formed polygons. The results of redundancy checking the correctly formed relations; file structure adherence testing for file structure conformance; and validity checks on all attribute values are examples of metrics to describe logical consistency.
These metrics may indicate whether or not such a check has successfully been performed.
Semantic accuracy: Salgé (1995) defines semantic accuracy as “the quality with which geographical objects are described in accordance with the selected model. Related to the meanings of ‘things’ of the universe of discourse (reality), semantic accuracy refers to the pertinence of the meaning of the geographic object rather than to the geometrical representation.” He believes that “semantic accuracy is an element of the evaluation of ‘fitness for use’ that users have to perform before using spatial data” (p. 139). As an illustration, one aspect of semantic accuracy could be termed textual fidelity. One measure of textual fidelity could indicate the accuracy of spelling, perhaps by the percentage of wrong spellings. Another measure could be of the use of exonyms or alternative spellings. A third metric may indicate the consistency of abbreviations.
Cartographers, surveyors, and other spatial scientists must recognise that our current technology allows any person to select a point on the surface of the earth and to record the position of that point to a degree of precision that will serve well over 95 percent of the possible uses of that data. The need for surveying, in the traditional use of that term, has been superseded by technology that can be easily employed by any person, not to mention robots. The recording of attributes for an entity can also be increasingly precisely measured, but the true degree of precision depends upon the attribute being measured, upon its definition and its intended use. Information about completeness and logical consistency also follows directly from their definitions and measuring activities. The temporal element of spatial data quality during the first part of the twenty-first century will probably require and receive the most attention.
It is postulated that by sometime early in the twenty-first century, positions and attributes will be routinely recorded in digital files to a sufficient level of accuracy to satisfy most users’ needs. The element that is currently lagging behind is the specification and use of the temporal element or the specification of the currency qualities of spatial data.
It is also recognised that the metrics and quality parameters of the temporal element may affect, to a very large extent, the overall size of most spatial data files. The position and description of one feature and its attributes contribute a fixed amount of information in a data file. It is conceivable that, as that feature evolves and the historical character of the feature is archived, time stamps of changes, processes applied and uses will contribute significantly to an increase in the size of the accompanying quality information file.
34.6 METRICS FOR THE ELEMENTS OF SPATIAL DATA QUALITY
The seven elements of geospatial data quality as defined in Guptill and Morrison (1995) are measured along at least three different axes: space (location, places), time, and theme (content, characteristics). The space axis is characterised by positional accuracy metrics. The time axis can be specified by the temporal accuracy and completeness metrics. The theme axis reflects the metrics of semantic accuracy, attribute accuracy and completeness. The element of logical consistency cannot be related to a single axis, but refers directly to interrelationships of the entities in the dataset. Finally, lineage should accompany any and all of the other spatial data quality elements.
But not all of these quality elements have well defined universally agreed upon quality metrics associated with them. Metrics for semantic accuracy, for example, are not well understood. Some of the measures currently used for quality elements may be inadequate for GIS processing. Consider an example where the location and extent of small bodies of water are being captured from aerial images. Some of the water bodies are ponds, and others are reservoirs. What are the proper measures of quality? How does one encode the information that we have a high degree of certainty that the area in question is covered with water, and a lesser degree of certainty as to whether the body of water is a pond or a reservoir?
To fire fighters looking for water sources to combat a canyon brush fire, misclassification between pond and reservoir is of much less concern than the fact that a body of water exists at that location (as opposed to a dry lake bed or gravel pit). This example seems to argue for multiple levels of misclassification matrices (see Tables 34.2 and 34.3) to characterise fully the uncertainty of attributes for certain features.
Table 34.2: Misclassification Matrix of Land Versus Water for the Lake/Pond Feature
          Land    Water
Land      0.95    0.05
Water     0.05    0.95
Table 34.3: Misclassification Matrix of Lakes Versus Reservoirs for the Lake/Pond Feature
             Lake    Reservoir
Lake         0.75    0.25
Reservoir    0.25    0.75
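One way to read the two tables together is as a two-level description of uncertainty, sketched below in Python. The sketch rests on assumptions that the text does not state: rows are taken as the mapped class and columns as the true class, each row is read as a conditional probability, and the lake/reservoir matrix is assumed to apply only once the feature is known to be water. The dictionary and function names are illustrative.

```python
# Values copied from Tables 34.2 and 34.3; the row/column orientation is assumed.
land_water = {
    "land": {"land": 0.95, "water": 0.05},
    "water": {"land": 0.05, "water": 0.95},
}
lake_reservoir = {
    "lake": {"lake": 0.75, "reservoir": 0.25},
    "reservoir": {"lake": 0.25, "reservoir": 0.75},
}


def feature_certainty(mapped_subtype):
    """Certainty at both levels for a feature mapped as water of a given subtype."""
    p_water = land_water["water"]["water"]
    p_subtype_given_water = lake_reservoir[mapped_subtype][mapped_subtype]
    return {
        "is_water": p_water,
        # Product rule, assuming the second matrix is conditional on water.
        "is_water_and_subtype": p_water * p_subtype_given_water,
    }


certainty = feature_certainty("reservoir")
```

Under these assumptions the fire fighter's question (is there water here?) is answered with a certainty of 0.95, while the more specific question (is it a reservoir?) drops to roughly 0.71.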
Other situations, for example measuring the error in modelling geographic phenomena such as vegetation stands or soil types that are not homogeneous and have ill defined boundaries, continue to pose methodological problems in measuring data quality. The temporal heterogeneity of datasets increases the complexity of all the data quality elements. Also needed are linkages between the quality information
present in the data and GIS processing algorithms, so that software can produce error estimates of the end results. A number of quality elements, such as logical consistency and completeness, are spatially variant and context dependent. Any given feature is consistent only with respect to its neighbours at a given instant in time. Does one attach a consistency measure to an individual feature? A group of features? If the spatial extent of a study area changes, does the consistency metric of the data change, and if so, what does this mean to the user? Tobler (1989) argues that only spatial analysis methods that exhibit invariance of conclusions under alternate spatial partitionings should be used. Is frame independence necessary, or even feasible, for determining measures of data quality? This is an example of a subject that needs further study. The relationship between quality measures and the area to which the measure applies is likely to become even more troublesome in the future. At present, robust descriptions of data quality (when they exist) are usually limited to specified study areas (e.g., a map quadrangle, a drainage basin, or a county). As datasets are pieced together to form large “seamless” coverages, answering questions about data quality will become quite complex. The fact that data quality will be highly variable runs against the desire of many users to have “one number” to describe the quality of a dataset. What is needed is a data quality “tricorder” that the user moves across (or through) the three-dimensional data quality (space, time, theme) and gets data quality readings on the features that are in the field of view of the tricorder. The user could adjust the field of view to encompass a region or an individual feature and change the settings to tune in a given data quality element. Other, less fanciful, schemes to provide data quality reports in a GIS environment have been developed. 
In 1994, the International Standards Organisation (ISO) instituted Technical Committee 211 (TC211) to devise standards for geographic information/geomatics. Working item 13 of Working Group 3 of ISO/TC211 is directly tasked with the creation of standards for data quality. Working item 14 of Working Group 3 began its work by trying to specify metrics for standard data quality elements. The work of these two working items will establish the basis for the world’s spatial data gatherers to tag their digital spatial data with quality information in the future.
34.7 VISUALISATION OF SPATIAL DATA QUALITY
In the world of the twenty-first century, people will be able to reconstruct a visualisation of any locality on the earth for any past date. One could reminisce about the “good old days” by creating a virtual reality of a small town in the American Midwest in the 1920s, or the detailed movements within the Kremlin in the Soviet Empire at its presumed height under Stalin. Equally plausible would be the recreation of the battles of the late twentieth century wars in Vietnam, Afghanistan, or Nicaragua, or the large scale displacements of people in Central Africa or the Balkans during the 1990s. Some people will be interested in recreations of the Nile before and after the Aswan Dam; Mount St Helens before and after its eruption in 1980; or the Yangtze Gorge, before its planned modifications are implemented. One could speculate that generals could fight wars, conquer territory, and incapacitate major enemy functions without subjecting humans to the battlefield. The limiting factor will be the availability of historic digital spatial data. More precisely, the accuracy of the temporal tagging of spatial data will to a large extent control the reconstructed visualisations. The point is that digital spatial data visualisations could easily become an expected and accepted part of daily life and its activities in the next century.
The utility of those spatial displays will depend directly on
the quality of the spatial data used in their preparation. One can envisage a time when the basic digital spatial data about the earth and the human activity which results in semi-permanent features on its physical surface, will be readily available. Data on human activities among these semi-permanent and physical features on the earth will be constantly changing and knowledge about these changes will be in great demand. Knowledge about all of the elements of spatial data quality for the digital data in a database will be required. How can we efficiently present that knowledge to the twenty-first century spatial data user? We believe spatial data quality is one of the absolute building blocks needed for the twenty-first century functioning of human society.
34.8 CONCLUSIONS
The implementation of the above developed quality model to describe the performance of geographic information will require the combination of many different aspects to describe the quality of the geographic data or datasets. The basic assumption of the model is to model a database according to homogeneous data (sub)sets, and then to consider these as new entities. This makes it possible to describe the different aspects of quality in terms of quality elements and their metrics and apply this quality information to all members of a data subset occurrence. However, this additional data quality information will make the database structure extremely complex and will extend the database size to a multiple of its original size. As yet we have no experience in implementing such a model, completely and correctly, from either a theoretical or a technological point of view. Experience with attempts at the implementation of such a model should help users decide whether this model is feasible, and/or useful.
REFERENCES
BEARD, K. and MACKANESS, W. 1993. Visual access to data quality in geographic information systems, Cartographica, 30, pp. 37–45.
BRASSEL, K. 1995. Completeness, Chapter 5 in Guptill, S. and Morrison, J. (Eds.), pp. 81–108.
BUTTENFIELD, B. 1993. Representing data quality, Cartographica, 30, pp. 1–7.
CEN TC 287 WG 2 1994. Data description: Quality, Working Paper N15. Paris: CEN.
CEN TC 287 PT05 1995. Draft Quality Model for Geographic Information, Working Paper D3. Paris: CEN.
CLARKE, D., CLARKE, D. and CLARKE, D. 1995. Lineage, Chapter 2 in Guptill, S. and Morrison, J. (Eds.), pp. 1–12.
DRUMMOND, J. 1995. Positional Accuracy, Chapter 3 in Guptill, S. and Morrison, J. (Eds.), pp. 31–58.
FISHER, P. 1993. Visualizing uncertainty in soil maps by animation, Cartographica, 30, pp. 20–27.
GOODCHILD, M. 1995. Attribute Accuracy, Chapter 4 in Guptill, S. and Morrison, J. (Eds.), pp. 59–80.
GUPTILL, S. 1995. Temporal Information, Chapter 8 in Guptill, S. and Morrison, J. (Eds.), pp. 153–166.
GUPTILL, S. and MORRISON, J. (Eds.) 1995. The Elements of Spatial Data Quality. New York: Elsevier.
ISO. Quality. Vocabulary. Standard ISO 8402.
ISO 1993. Guide to Expression of Uncertainty in Measurement.
KAINZ, W. 1995. Logical Consistency, Chapter 6 in Guptill, S. and Morrison, J. (Eds.), pp. 109–138.
MOELLERING, H. (Ed.) 1985. Digital Cartographic Data Standards: An Interim Proposed Standard, Report no. 6. Bethesda, MD: ACSM.
MOELLERING, H. (Ed.) 1987. A Draft Proposed Standard for Digital Cartographic Data, Report no. 8. Bethesda, MD: ACSM.
NAISBITT, J. 1994. Global Paradox. New York: Avon Books.
SALGÉ, F. 1995. Semantic Accuracy, Chapter 7 in Guptill, S. and Morrison, J. (Eds.), pp. 139–154.
TOBLER, W. 1989. Frame independent spatial analysis, in Goodchild, M. and Gopal, S. (Eds.), Accuracy of Spatial Databases. New York: Taylor & Francis, pp. 115–122.
VEREGIN, H. 1989. Error modelling for the map overlay operation, in Goodchild, M. and Gopal, S. (Eds.), Accuracy of Spatial Databases. New York: Taylor & Francis, pp. 3–18.
Chapter Thirty Five
Assessing the Impact of Data Quality on Forest Management Decisions Using Geographical Sensitivity Analysis
Susanna McMaster
35.1 INTRODUCTION
While the usefulness of GIS is evident in its diverse application to many aspects of natural resource management, the reliability of GIS as a spatial decision making tool depends upon the quality of the source data. Issues associated with data quality, error, and uncertainty in spatial databases have recently attracted the attention of researchers in the field. The quality of data is related to the accuracy of the information and the type and amount of error or uncertainty present in source information. Accuracy is associated with the true or correct representation of some real-world phenomenon while precision deals with the exactness of measurement, i.e., the number of significant digits used to measure some phenomenon. Error refers to the amount of deviation from the true value (or in a statistical sense the difference between the actual and observed values). Uncertainty is viewed as a more appropriate term by MacEachren (1991) since the exact amount of error present in a data object is never known. All of these terms influence the confidence a user can have in the source data to depict a real-world situation and to produce accurate analytical results.
Mead (1982) provides a broader definition of data quality in terms of its usefulness stating: “Data quality is not simply a reflection of data accuracy or precision. It relates more to the usability and versatility of data and may be synonymous with data usefulness. Data quality is a reflection of how many different, appropriate uses there are for the data. The more versatile and useful data are, the higher the quality” (p. 53). For this chapter, data quality is defined in terms of not only accuracy but also the usefulness of the data for a particular application. An important consideration in assessing the influence of data quality on the results of spatial analysis using GIS relates to its dependence on a specific application.
For example, broad, administrative level forest management plans may require less accurate data than site-specific, operational level management plans requiring more accurate data for reliable model results. Openshaw (1989) supports developing methods of quantifying and evaluating the significance of error in terms of a particular application. Rather than being concerned about precise error estimates or eliminating error, he suggests that research should focus on determining some level of confidence or reliability that the error in data is not so high as to impact adversely the validity of analytical results for a specific application. Sensitivity analysis has been proposed as a method for examining the impact of data quality on spatial analysis using GIS (Lodwick et al., 1990; Openshaw, 1989) when precise knowledge about the occurrence of error in a database is unavailable. The purpose of this research is to examine the impact of attribute error upon the result of spatial modelling using geographical sensitivity analysis. Attribute error is important in terms of characterising the
spatial units that define the various features of forest stands. The attributes studied in this project are quantitative, interval-ratio scale data. By manipulating particular attributes in a systematic manner (i.e., introducing varying degrees of random error into attribute values) it is possible to examine how these variations affect the reliability of the results of spatial modelling. The primary research objectives are:
1. to assess the impact of variations in attribute error on the results of a spatial model designed to identify sites suitable for pulpwood management in northern Minnesota, emphasising the operational level of forest resource management; and
2. to identify strategies for dealing with uncertainty in GIS applications in forest resources management.
This research represents an exploratory approach that evaluates attribute error in terms of its effect on the decision to cut stands for pulpwood production. The conclusions drawn from the sensitivity analysis relate directly to this specific application and can be viewed as an individual test case. A goal of this research is to contribute to a better understanding of the level or degree of data quality required for a specific GIS application in forest resources management. The results of this analysis may indicate that certain variables are more sensitive to particular types of error or that certain types of error have more of an impact on the final result of spatial modelling.
The analysis uses data from the Minnesota Department of Natural Resources (DNR) coupled with a spatial model designed to determine the suitability of an area for pulpwood management. Minnesota’s DNR is responsible for managing the State’s forest lands and must do so under budgetary and other constraints.
The decision to use data from the Minnesota DNR represents an effort to define the project more realistically within the current context of data availability and GIS use by state agencies for analysis and decision making purposes. This agency was one of the first of its kind to implement GIS, and the Division of Forestry is actively involved in expanding its GIS capabilities (Minnesota DNR, 1989). At present, the role of GIS in such agencies consists of mapping or inventory activities, and the data acquired from DNR reflect the current state of database development and GIS use at this agency. By having a better grasp of data quality’s impact on GIS model results, the agency may be able to assess better how to utilise limited resources efficiently and improve the reliability of spatial models used as part of the decision making process.
35.2 THE USE OF GEOGRAPHIC SENSITIVITY ANALYSIS TO EXAMINE THE IMPACT OF DATA QUALITY
Data quality and the accuracy of spatial databases have increasingly attracted the attention of researchers in the area of GIS. This recent interest is highlighted by the National Center for Geographic Information and Analysis (NCGIA) initiatives related to the topic of data quality (NCGIA, 1989; 1991). Veregin (1989) compiled an extensive annotated bibliography of the GIS literature associated with error and data quality that highlights the diversity of specific research topics. Several chapters in this volume address the issue of data quality (see Vauglin, Chapter 36; Joos, Chapter 37; and Folly, Chapter 38, this volume). This chapter focuses on the use of sensitivity analysis to examine the impact of attribute error on spatial analysis using GIS.
Only a handful of examples of using sensitivity analysis to examine the impact of data quality on the results of spatial analysis using GIS exist in the literature. Lodwick et al. (1990) referred to its use for the study of spatial problems as geographical sensitivity analysis.
They defined geographical sensitivity analysis as “the study of the effects of imposed perturbations (variations) on the inputs of a geographical analysis on the
outputs of that analysis” (p. 413). Sensitivity analysis offers researchers an alternative approach to estimating the reliability of the final results of a GIS analysis. Openshaw (1989) outlined a general approach to using sensitivity analysis to study the impact of error or uncertainty on GIS operations using Monte Carlo simulation. This study adopts a similar approach for the analysis of the effects of attribute error on spatial modelling results, as detailed in the discussion of methodology. Stoms et al. (1992) cited several examples of using sensitivity analysis to study how geographic information system applications are affected by classification error (Ramapriyan et al., 1981), scale and resolution (Laymon and Reid, 1986; Lyon et al., 1987; Turner et al., 1989), different classification schemes (Lyon et al., 1987), and weighting factors (Heinen and Lyon, 1989). Other research has focused specifically on the application of sensitivity analysis to examine the impacts of attribute error on spatial analysis.
35.2.1 Examining attribute error sensitivities
Lodwick et al. (1990) focused in their work on the types of perturbations that could be imposed upon data and the types of measures that could be used to evaluate the effects of these perturbations on output values. The authors presented five different geographical sensitivity measures for a suitability map based on a comparison of the attributes resulting from no perturbation with those resulting from some perturbation. They applied the proposed methods to a spatial model examining ground water vulnerability. Based on Lodwick’s framework for undertaking sensitivity analysis to examine the effects of attribute error on suitability analysis, Fisher (1991) presented two alternative algorithms for simulating inclusions in categorical natural resource maps. He used Monte Carlo simulation to evaluate the accuracy of agricultural land valuation using land use and soil information.
Using two error algorithms and land valuation as a summary value for the analysis, Fisher’s analysis indicated that the original valuation might be in error, with an error term of up to 6 percent. Additionally, the simulations resulted in valuations both above and below the original values. Heuvelink (1993) investigated the issue of attribute accuracy and error propagation in GIS. Specifically, this involved developing and implementing a methodological approach for handling attribute error and tracking error propagation in quantitative spatial modelling with GIS. Haining and Arbia (1993) used simulation experiments to study error propagation through map operations. The authors were particularly interested in examining how locational and measurement error contained in some maps affect maps derived through certain GIS operations. While the authors focused on the spatial structure of source errors and their propagation, they recognised the need for other approaches that relate to understanding the impacts of source errors on the decision making process.
The scarcity of studies that investigate the influence of attribute error on natural resource management applications in GIS highlights the need for further research in this area. Some articles focused strictly on developing the theoretical aspects of this problem and did not use actual case examples to test the proposed theory or assess associated impacts on decision making outcomes. Since data quality’s impacts are application specific, more studies are needed using different types of data and analytical approaches to identify critical impacts of error sources for various applications in addition to continued theoretical development.
35.3 METHODOLOGY
35.3.1 The Study Area
The study area is located within the boundaries of the Pine Island State Forest in north-central Minnesota. Since the Division of Forestry uses the township as its basic management unit, the data were available based on this land partitioning system. According to this system, the study site is specifically referred to as Township 153, Range 26. The choice of this study site was based on the availability of the data in digital format and the large area of State-owned forest lands within its bounds. Pine Island State Forest represents the largest of Minnesota’s state forests comprising 878,040 acres in Koochiching County. The county’s abundant forest resources make forestry its major economic enterprise. Swampland dominates the region which is located in the basin of glacial Lake Agassiz. The area is identified with wood products made from pines, black spruce, white cedar and tamarack. The area is characterised by low relief (the relative relief is approximately 80 feet). Within the study site, the elevation ranges from a minimum of 1215 feet to a maximum of 1295 feet. Due to this lack of topographic relief, elevation is not included as a variable in the analysis.
35.3.2 Data Collection
The source information was acquired from the State of Minnesota, Department of Natural Resources, Division of Forestry. These data came in the form of ARC/INFO digital data layers including forest stand boundaries, roads, and rivers, and associated attribute information. The forest stand map included the boundaries for 513 stands in Township 153, Range 26. A stand represents a group of trees in a given area that is sufficiently uniform in species composition, age and other conditions. Associated with the stand map layer was a forest inventory file containing various attribute data.
Ninety-six fields of attribute data were available for each of the 513 stand records including variables such as cover type and size, site index, age, height and cords per acre.
35.3.3 Data Processing
The ARC/INFO data layers were converted into ARCEXPORT file formats and imported into the Spatial Analysis System (SPANS). SPANS capabilities include the ability to build and integrate spatial data sets, explore relationships between these data sets, query and identify suitable locations, and assess the impact of decisions through predictive modelling using the SPANS modelling language. This software employs a quadtree data structure consisting of 15 quadtree levels. The decision to use SPANS was based mainly on its data import and analytical capabilities. Additionally, its quadtree data structure facilitated the creation of maps at varying resolutions and few applications have used this particular data structure. The ability to use its modelling language to set up the spatial model and run it in batch mode facilitated the application of sensitivity analysis. The software proved to be useful in conducting a suitability analysis through its index overlay feature in which weights can be assigned to individual maps as well as to categories within maps.
Figure 35.1: The general methodological approach for testing a spatial model’s sensitivity to varying degrees of random attribute error.
35.3.4 Data Analysis
Examining the impact of variations in attribute error on the result of spatial modelling using geographical sensitivity analysis involved perturbing the source data in order to assess the sensitivity of the final map result to these variations. The overall analytical approach consisted of four major steps and is illustrated in Figure 35.1. Step 1 involved developing and running a spatial model that assessed the suitability of stands for pulpwood management using the original source data. In Step 2, specified data layers were manipulated to reflect variations in attribute error by varying the percentages of random error introduced into the original attribute source data. A computer program facilitated the introduction of different degrees of random error into three stand attributes, namely, age, site index, and basal area per acre. Once these perturbations were introduced into the data, Step 3 could be undertaken. This step involved substituting the original source data layer with each of the perturbed data layers for each test case and
running the spatial model identified in Step 1. For example, the original age map was replaced with the one generated from the perturbed age data; several replications were then run for each percentage of error for each attribute being investigated. Finally, Step 4 consisted of comparing the original model result with each of the perturbed model results to assess the sensitivity of these model results to the associated data quality issues. An assessment of the results used a combination of summary statistics and visual map analysis. Evaluation of the analytical results involved identifying any patterns relating to the impact of attribute error on the spatial analysis in terms of the decisions made based on the suitability ranks. For example, how had the decision to cut certain stands changed as a result of error introduction? How did the volume to be harvested change based on different amounts of error?
35.3.5 Development of the Spatial Model
This section explains the development of the spatial model used in the sensitivity analysis. The spatial model assessed the suitability of stands in the study site for pulpwood management based on certain forest inventory criteria. Figure 35.2 illustrates the source data and derived maps used to create the final suitability map. Fourteen source maps were created for use in the model: cover type, physiography, cover size, cover density, age, diameter at breast height, basal area per acre, site index, mortality, damage, volume per acre, cords per acre, acreage, and roads. The spatial model consists of three submodels: the pulpwood cover type submodel, the biological submodel and the economic submodel. Each of the model components is discussed in more detail below. The final result of this model was a map that ranked the stands from one to ten, with ten as the highest rank, in terms of their suitability to be harvested for pulpwood.
Implementation of the model involved using a combination of SPANS commands and use of its modelling language. Much of the analysis involved using the four types of overlay procedures available in SPANS. These overlay approaches include reclassification, matrix, index, and modelling overlays. Reclassification overlays allow a user to change values on a map through the use of a reclassification file generated through SPANS, a user-written reclassification file, or directly from the screen by specifying a range of old values and the new value to be assigned. Matrix overlay allows a user to specify class values based on the two maps specified for the overlay procedure. This procedure uses a cross tabulation file that contains the desired class values for the various combinations of class values found on the two overlay maps. A template is produced that shows the class values of one map in the rows and the values of the other map in the columns. It can then be filled in, using a text editor, with user-specified class values for each different type of class intersection of the original two maps. The index overlay operation proves to be useful for suitability analysis and allows for the combination of up to 15 maps. This method involves specifying weights for each map and map category to be used in the overlay procedure. Again, a template must first be created, then the user can specify the maps to be included in the index overlay. Using a text editor, weights are assigned to individual maps and map categories. The formula used to calculate the suitability index value for each cell is: ((weight of map 1 × rank of legend) + (weight of map 2 × rank of legend) +…+(weight of map n×rank of legend)) / (total weight). Modelling overlay involves the use of the SPANS modelling language that resembles a computer programming language. This process requires the user to write modelling equations that are stored in an ASCII input file. 
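The index overlay formula quoted above is a weighted average of category ranks. A minimal Python sketch of that arithmetic for a single cell follows; it is not SPANS code, and the function name, the three example maps, and their weights and ranks are hypothetical values chosen only for illustration.

```python
def index_overlay(ranks, weights):
    """Weighted-average suitability index for one cell.

    ranks   -- rank of the legend category the cell falls in, one per map
    weights -- weight assigned to each map, in the same order
    """
    if len(ranks) != len(weights):
        raise ValueError("one weight is needed per map")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("total weight must be non-zero")
    # (w1*r1 + w2*r2 + ... + wn*rn) / (total weight)
    return sum(w * r for w, r in zip(weights, ranks)) / total_weight


# Hypothetical cell: three maps with category ranks 8, 4 and 6,
# weighted 3, 1 and 2 respectively.
cell_ranks = [8, 4, 6]
map_weights = [3, 1, 2]
score = index_overlay(cell_ranks, map_weights)
```

Dividing by the total weight keeps the index on the same scale as the input ranks, so the result can be read directly as a suitability rank.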
Once an equation has been written and stored in this file, it can be used to combine maps according to the equation specifications. For example, forest management prescriptions or groundwater
pollution models can be translated into the SPANS language. This procedure results in the computation of the equation value for each cell of the resultant quadtree map. The pulpwood cover type submodel was designed to identify the operable pulpwood cover types within the study area. In this case, stands composed of operable pulpwood types were identified according to their cover type and physiographic class. Stands with ash, aspen, balsam poplar, balsam fir, black spruce, or tamarack cover types, and mesic or hydromesic physiographic classes were considered operable pulpwood stands. The cover type map was reclassified to create a map that depicted all of the pulpwood cover types, each having its unique class number, and assigning all other types a value of zero. The pulpwood cover type and physiographic maps were then combined using a modelling overlay approach to generate a map of stands with operable pulpwood cover types. The biological submodel used maps generated from the biological variables in the forest inventory data file to determine stand suitability for pulpwood management. These variables included age, diameter at breast height (dbh), basal area per acre, site index, mortality, damage, cover size, and cover density. The two components of the biological submodel included models that: generated a map of management prescriptions based on management guidelines; and combined the mortality and damage maps as an indicator of stand health based on impacts of pests and disease. These two parts were then combined to create a map that indicated stand management decisions based on the biological factors for each operable pulpwood cover type. The management prescriptions map was generated using decision trees based on management guidelines for the Lake States Forest region published by the North Central Forest Experiment Station (Brand, 1981). 
A series of management guidelines for ash (Erdmann et al., 1987), balsam fir (Johnston, 1986), black spruce (Johnston, 1977), aspen (Perala, 1977), and balsam poplar (Tubbs, 1977) for the Lake States Forest region provided additional information in producing the decision trees. Figure 35.3 is an example of the decision tree used for determining aspen and balsam poplar management prescriptions. These models were developed specifically for use in areas such as this study site in northern Minnesota. The decision trees relevant to the cover types used in the pulpwood management suitability model were selected and then translated into model equations using the SPANS modelling language. The aspen and poplar decision tree also used stems per acre as a criterion for deciding on the management outcome so a matrix overlay was undertaken using the cover size and cover density maps. The management prescription equations were run for each pulpwood species under consideration. Then all of the management prescriptions for each cover type were combined into one map that showed the various management categories for each pulpwood cover type including “Do Nothing,” “Clearcut,” “Thin,” and “Regenerate”. This map was simplified by combining the similar management decisions for each cover type into the four basic management decisions to portray the distribution of the four management categories. Maps showing the percentage of mortality and percentage of damage each contain five classes representing the percentage of mortality and damage occurring in a stand: class 1 = 1–10 percent; class 2 = 11–25 percent; class 3 = 26–50 percent; class 4 = 51–80 percent; and class 5 = 81–100 percent. These two maps were combined into one map depicting stand health conditions in terms of the impact of pests and disease using the SPANS index overlay.
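The five mortality and damage classes listed above amount to a simple reclassification of a percentage value. The Python sketch below illustrates that lookup under stated assumptions: the function name is hypothetical, percentages are assumed to be whole numbers, and a stand with less than 1 percent mortality or damage is assigned class 0, a case the text does not define.

```python
import bisect

# Upper bounds of classes 1-5 as given in the text:
# 1-10, 11-25, 26-50, 51-80 and 81-100 percent.
CLASS_UPPER_BOUNDS = [10, 25, 50, 80, 100]


def percent_to_class(percent):
    """Map a mortality or damage percentage to its class number (1-5)."""
    if not 0 <= percent <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent < 1:
        return 0  # assumption: values below 1 percent are left unclassified
    # bisect_left finds the first upper bound that is >= percent.
    return bisect.bisect_left(CLASS_UPPER_BOUNDS, percent) + 1


classes = [percent_to_class(p) for p in (1, 10, 11, 25, 26, 50, 51, 80, 81, 100)]
```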
Next, a map ranking the stands based upon biological factors was created by combining the mortality and damage map with management prescriptions for a resulting map that displayed stand rankings based on biological criteria ranging from 1 (low) to 10 (high). The economic submodel involved using volume per acre, cords per acre, and acreage maps as well as proximity to roads to assess the economic variables associated with the stands. The volume per acre and acreage maps required some manipulation before being used in combination with the biological rank map. The original volume map did not distinguish between values representing stems and cords so a model was
Figure 35.2: The spatial model for determining the suitability of forest stands for pulpwood management
written to separate out these two volume measures. After distinguishing between these two measurements, the acreage map was simplified using a reclassification. This resulted in a map containing ten acreage classes. Another economic consideration was the distance over which the cut timber had to be transported to the road. The roads map was buffered by a quarter-mile. The decision to use only the all-weather roads for the analysis was based on the designation of Highway 71 as a high priority forestry transport route. Finally, an index overlay was used to combine the biological factors with the economic factors resulting in a map showing the overall suitability of the stands for pulpwood management. In other words, the biological ranking map, road buffer map, volume maps, and acreage maps and their map categories were assigned weights to be used in producing the final pulpwood management suitability map. The final map
Figure 35.3: Decision tree for aspen and balsam poplar based on Brand (1981).
contained rankings ranging from 1 (low) to 10 (high) with 10 indicating stands that were most suitable to harvest for pulpwood production.
35.3.6 Testing Attribute Error Sensitivities
This portion of the sensitivity analysis involved the use of simulation techniques to introduce random error into specific attribute variables. The analysis focused on the source attributes used in the biological submodel, namely, age, site index, and basal area per acre, since this submodel was most critical for
determining the management decisions. The selection of these three variables was also made based on discussions with forest management experts. A computer program designed to introduce random error into each of these attributes was run 25 times for two error levels, specifically, up to ±5 and ±25 percent. The decision to use ±25 percent as the maximum expected error percentage was based on the reasonable amount of error that might be expected in such field surveyed source data according to forest management experts. The actual calculation of the perturbed attribute involved taking the original value and adding or subtracting up to 5 or 25 percent of that value (depending on random number value) to derive the new attribute value. The decision to use 25 replications for each case represents a reasonable number for the exploratory nature of this study and was based upon consultation of the literature relating to similar studies (Fisher, 1992; Goodchild, et al., 1992; Haining and Arbia, 1993; Openshaw, 1989). A FORTRAN program was written to introduce random error into the attributes associated with operable stands in the original attribute file and to generate a new perturbed attribute file. After testing and verification, the program was run on a UNIX-based SUN workstation using IMSL routines for random number generation. For each of the three variables for each error percentage, the simulation program was run 25 times and the perturbed attribute files generated from the program were transferred to SPANS. These files were then used to create new data layers for use in the spatial model. This process resulted in 25 final suitability maps for each attribute for each error percentage (a total of 150 maps) that could then be compared to the original model result. The results were summarised using a combination of techniques including maps that indicated the frequency of change experienced by each of the stands based on comparisons with the original, unperturbed map. 
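The perturbation step described above can be sketched in a few lines of Python. This is not the FORTRAN/IMSL program used in the study: the function names, the example ages, and the fixed seed are hypothetical, and the error is assumed to be drawn uniformly between the negative and positive maximum percentage, since the text states only that up to 5 or 25 percent of the original value was added or subtracted.

```python
import random


def perturb(value, max_error_pct, rng):
    """Add or subtract up to max_error_pct percent of the original value."""
    # Assumption: the error fraction is uniform on [-max, +max].
    fraction = rng.uniform(-max_error_pct, max_error_pct) / 100.0
    return value * (1.0 + fraction)


def make_replications(values, max_error_pct, n_reps=25, seed=0):
    """Return n_reps perturbed copies of an attribute column."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    return [[perturb(v, max_error_pct, rng) for v in values]
            for _ in range(n_reps)]


# Hypothetical stand ages (years), perturbed at the +/-25 percent level.
ages = [52.0, 80.0, 30.0]
reps = make_replications(ages, 25)
```

Running the same routine for three attributes at two error levels gives the 3 × 2 × 25 = 150 perturbed attribute sets referred to in the text.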
Error maps were generated by comparing each perturbed map to the original map and reclassifying these 150 maps into maps that indicated the impact on management decisions, i.e., the decision to cut stands. Best and worst case suitability maps were then selected from each set of 25 maps to provide a sense of the range of impact the introduced error had on final map results. Comparison of these two maps helped to identify potentially sensitive areas and assess how the decision to harvest stands might vary depending on the different degrees of introduced error.
35.4 EVALUATION AND DISCUSSION OF RESULTS
35.4.1 The Final Suitability Map
The final suitability map generated from the pulpwood management model based on the original, unperturbed source data at quadtree level 10 resulted in eight suitability ranks with 1 indicating the least suitable stands for pulpwood management and 8 signifying the stands most suitable for management. For purposes of this study, stands with ranks 6–8 were designated as being suitable to cut for pulpwood. Approximately 36 percent of the total suitability area consisted of ranks 6–8 with a total suitable cut volume of 35,135 cords.
35.4.2 Results of Attribute Error Sensitivity Analysis
The initial phase of this analysis resulted in the creation of a set of 25 final suitability maps for each combination of attribute (age, site index, and basal area per acre) and error percentage (up to ±5 percent and ±25 percent). Each of these 150 suitability maps was then compared with the original suitability map
based on the unperturbed source data. Two types of comparisons were made resulting in 300 comparative maps (two sets of 150 maps). The first set of comparisons centred on examining the frequency of change experienced by various stands. Here, change means any difference or deviation in the suitability rank based on a comparison between the perturbed data and the original data. This change included misclassified ranks, eliminated ranks, or newly created ranks. The results of this analysis were useful in visually depicting areas on the map that were more sensitive to the perturbations. The second type of comparison involved an examination of actual changes in the suitability ranks between the original map and each perturbed map according to their impact on the decision to clearcut particular stands.
35.4.3 Frequency of Change Maps
The six change maps summarised the results of each of the six sets of 25 maps by indicating the frequency of change experienced by different stands and providing a spatial sense for which stands were undergoing the most change and appeared to be most sensitive to the introduced random error. Visual comparison of these six maps illustrated that the maps subjected to a maximum expected error of up to ±25 percent experienced a greater frequency of change in comparison to those perturbed at the ±5 percent error level. Among the maps subjected to ±25 percent error levels, the map displaying the greatest change was represented by the age attribute with one stand that experienced 23 changes in comparison to its original suitability rank. The minimum number of changes on the map was 1. The change map displaying the least amount of change was associated with the basal area per acre attribute. The maximum number of changes on this map was 16 while the minimum was 2. The number of changes on the site index change map associated with the ±25 percent error level ranged from a minimum of 1 to a maximum of 18.
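The frequency-of-change count summarised on these maps can be illustrated with a short Python sketch. It assumes, purely for illustration, that each map is held as a dictionary from a stand identifier to its suitability rank, with a missing key standing for an eliminated or not yet existing stand; the function name and the three example runs are hypothetical.

```python
def change_frequency(original, perturbed_runs):
    """Count, per stand, how many perturbed runs differ from the original.

    A difference is a changed rank (misclassification), a stand missing
    from the perturbed run (eliminated), or a stand present only in the
    perturbed run (newly created).
    """
    stands = set(original)
    for run in perturbed_runs:
        stands |= set(run)
    counts = dict.fromkeys(stands, 0)
    for run in perturbed_runs:
        for stand in stands:
            # dict.get returns None for an absent stand.
            if original.get(stand) != run.get(stand):
                counts[stand] += 1
    return counts


# Hypothetical original map and three perturbed runs.
original_map = {"A": 7, "B": 3}
runs = [
    {"A": 7, "B": 3},   # no change
    {"A": 5, "B": 3},   # stand A misclassified
    {"B": 3, "C": 6},   # stand A eliminated, stand C newly created
]
freq = change_frequency(original_map, runs)
```

With 25 runs per set, the count for any stand lies between 0 and 25, which is the scale on which the minima and maxima above are reported.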
Of the ±5 percent error change maps, site index was most volatile while basal area was least volatile. The frequency of change map associated with site index and a ±5 percent error level ranged from 4 to 14. In contrast, the basal area per acre map indicated a minimum change of 1 and maximum change of 10. Finally, the age data perturbed at the ±5 percent error level showed the number of changes ranging between 1 and 12. Based on these findings, age and site index attributes were more sensitive to the introduced error when compared with basal area per acre. Thus, for this particular application, the spatial model result was most sensitive to changes in the quality of the age and site index data.
35.4.4 Impact on Decision to Cut Stands
The second set of comparative maps focused on determining the impact of attribute error perturbations on the decision to clearcut stands. These comparisons were achieved by combining each perturbed map with the original map to produce a map that provided all possible combinations of original versus perturbed suitability ranks. These 150 maps showing all possible combinations were then reclassified to produce 150 maps with eight categories indicating the impact on the decision to clearcut stands. Category 1 represented stands that had been eliminated and thus resulted in a cut loss (i.e., they were classified as suitable for cut on the original map but were eliminated on the perturbed map). Category 2 also indicated an eliminated stand; however, there was no cut loss since the stand was designated as unsuitable for cut on the original map. A newly represented stand that indicated suitability to cut on the perturbed map but did not exist on the original map (false suitability) was classified as category 3 while category 4 was a newly represented stand with a non-cut rank value. Categories 5–7 indicated different impacts associated with misclassification: 5
signified misclassification resulting in cut loss; 6 represented misclassification resulting in false suitability to cut; and 7 indicated a misclassified stand with no change in the decision to cut. Finally, category 8 showed no change in suitability ranks between the original map and perturbed map. Once all 150 maps were reclassified according to these criteria, the best and worst case maps for each set of 25 maps were chosen based on the greatest and least amount of error in terms of cut decision. In other words, the best case maps depict the least amount of deviation from the original suitability map while the worst case maps indicate the most deviation from the original suitability maps, particularly those indicating the greatest degree of impact on the decision to cut. The emphasis in determining the worst case maps focused on categories 1 and 5 indicating cut loss, and categories 3 and 6 indicating false suitability.
Best and Worst Case Maps for Six Perturbation Cases
Table 35.1 summarises the percentage of area affecting the decision to cut stands in terms of stands signifying cut loss and false suitability to cut as well as the volumes associated with the changes in the decision to cut stands. Each of the six test cases is discussed in further detail below.
Table 35.1: Comparison of best and worst case maps for various perturbation sets based on the impact on the decision to cut stands in terms of percentage of area and volume of cut
                          Percentage of area impacted      Volume (in cords)
Best/worst case map         Cut loss False cut     Total    Cut loss False cut     Total
Best age 25%                    3.27      1.81      5.08        2184       682      2866
Worst age 25%                  12.93     12.35     25.28        6274      8709     14983
Best age 5%                     0.00      0.00      0.00           0         0         0
Worst age 5%                    6.23      2.94      9.19        2540       556      3096
Best site index 25%             2.72      0.31      3.03        2429       210      2639
Worst site index 25%           12.40      6.25     18.65       11933      5531     17464
Best site index 5%              0.00      0.00      0.00           0         0         0
Worst site index 5%             2.07      4.50      6.57        1693      5026      6719
Best basal area 25%             0.00      1.33      1.33           0       448       448
Worst basal area 25%            6.01      4.48     10.49        2442      1970      4412
Best basal area 5%              0.00      0.00      0.00           0         0         0
Worst basal area 5%             1.66      0.00      1.66         972         0       972
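The eight impact categories defined in Section 35.4.4, on which the cut loss and false cut columns of Table 35.1 are based, can be expressed as a small decision function. The Python sketch below is illustrative rather than the reclassification actually run in SPANS: the function names are hypothetical, an absent stand is represented by None, and a stand is treated as suitable to cut when its rank is 6, 7 or 8, as stated in Section 35.4.1.

```python
CUT_RANKS = range(6, 9)  # ranks 6-8 were designated suitable to cut


def is_cut(rank):
    return rank in CUT_RANKS


def impact_category(original_rank, perturbed_rank):
    """Return the impact category (1-8) for one stand.

    None means the stand does not exist on that map.
    """
    if original_rank is None and perturbed_rank is None:
        raise ValueError("stand is absent from both maps")
    if perturbed_rank is None:          # eliminated stand
        return 1 if is_cut(original_rank) else 2
    if original_rank is None:           # newly represented stand
        return 3 if is_cut(perturbed_rank) else 4
    if original_rank == perturbed_rank:
        return 8                        # no change in rank
    was_cut, now_cut = is_cut(original_rank), is_cut(perturbed_rank)
    if was_cut and not now_cut:
        return 5                        # misclassified, cut loss
    if now_cut and not was_cut:
        return 6                        # misclassified, false suitability
    return 7                            # misclassified, same cut decision


CUT_LOSS = {1, 5}           # categories counted as cut loss
FALSE_SUITABILITY = {3, 6}  # categories counted as false cut
```

For example, a stand ranked 6 on the original map and 5 on a perturbed map falls in category 5, so its area and volume would be counted as cut loss.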
• Age 25 percent perturbations: Figure 35.4 depicts the worst case map for the set of perturbations associated with the age attribute and the ±25 percent maximum expected error and resulted in 25.28 percent of the area impacting the decision to clearcut stands. More specifically, 12.93 percent of this area was misclassified and resulted in cut loss while 12.35 percent of the area was characterised as suitable to cut when the stands on the original suitability map represent non-cut areas. Of the 12.35 percent characterised by this false suitability, 5.14 percent was attributed to the designation of a new stand not found on the original map and 7.21 percent resulted from misclassification. This worst case map represents the greatest impact on the decision to cut among all the maps generated for comparison. The best case map had 5.08 percent of its total area affecting the decision to cut stands; 3.27 percent represented misclassification that resulted in cut loss and 1.81 percent represented misclassification that
Figure 35.4: Worst case map for perturbation associated with the age attribute subjected to ±25 percent error level.
signified false suitability. Both maps showed other stands that had been eliminated, newly represented, or misclassified but these did not have an impact on the decision to cut stands.
• Age 5 percent perturbations: The worst case map chosen from the set of perturbations associated with the age attribute and a maximum error level of ±5 percent showed that the area affecting the decision to cut amounted to 9.17 percent; 6.23 percent of this was attributed to misclassification resulting in cut loss while 2.94 percent represented false suitability to cut. This map represented the biggest impact on decisions to cut among the worst case maps at the ±5 percent error level. One newly represented, as well as two misclassified stands, appeared; however, these did not influence the cut decision. The best case map for this set of perturbations had only one stand that was misclassified and represented no change in the decision to cut.
• Site index 25 percent perturbations: The worst case for the site index attribute perturbed at the ±25 percent error level possessed the second highest impact on the decision to cut totalling 18.65 percent of the area. The majority of this affected area was attributed to several eliminated stands (12.4 percent of the area) that resulted in cut loss. The remaining 6.25 percent of the area represented a false suitability designation (three new stands accounted for 5.6 percent while two misclassified stands comprised 0.65 percent of this false suitability). The best case map had a total of 3.03 percent of its area that impacted
the decision to cut with 2.72 percent attributed to eliminated stands representing cut loss and one new stand (0.31 percent) classified as a false cut.
• Site index 5 percent perturbations: The second highest impact on the decision to cut among the ±5 percent error level perturbations occurred on the worst case map for site index. Eliminated stands indicating cut loss accounted for 2.07 percent of the area while 4.5 percent represented two new stands with false suitability designations. The best case map for this set of perturbations indicated no impact on the decision to cut but did include one eliminated stand, three new stands, and one misclassified stand.
• Basal area 25 percent perturbations: The worst case map for this set of perturbations had the least impact on the decision to cut in comparison with the other worst case maps perturbed at the ±25 percent error level. The total area impacting the decision to cut amounts to 10.49 percent; 6.01 percent was misclassified resulting in cut loss while 4.48 percent was misclassified as falsely suitable to cut. The best case map showed only 1.33 percent of the area impacting the decision to cut in the form of a misclassified stand representing false suitability to cut. This map had the least impact on the decision to cut among the best case maps at the ±25 percent error level.
• Basal area 5 percent perturbations: The worst case map for this map set had only 1.66 percent of the area that affected the decision to clearcut. This impact was in the form of one misclassified stand that resulted in cut loss. Among the worst case maps at the ±5 percent error level, this one represented the least impact on the decision to cut. The best case map for these perturbations indicated no change in suitability ranks between the original and perturbed maps.
Overall, the age variable displayed the most sensitivity to introduced random error based on the percentage of area impacting the cut decision and the frequency of change experienced by stands on the change map, particularly at the ±25 percent error level. Site index displayed an intermediate level of sensitivity to the introduced random error based on the percentage of area and frequency of change. Basal area per acre experienced the least overall change and impact in terms of the decision to clearcut stands. In terms of the total volume associated with changes in the cut decision, the worst case ±25 percent perturbation map for site index showed the largest amount equalling 17,464 cords. The worst case ±25 percent perturbation map for age followed with a total of 14,983 cords. Among the ±5 percent error cases, the worst case map for site index indicated a total of 6719 cords (a volume greater than the worst case ±25 percent perturbation map for basal area per acre). There were no impacts on the cut decision in terms of volume amounts for the best case ±5 percent perturbation maps associated with the age, site index, and basal area per acre attributes. These findings indicate that there were differences among attributes in terms of their sensitivity to introduced random error and the impact on the final model result. For this particular study, approximately 25 percent of the area on the worst case ±25 percent perturbation map for age adversely impacted the decision to clearcut stands. The explanation for the sensitivity of the age attribute is associated with two major factors: the critical cut-off values of the decision trees; and the predominant pulpwood species in the study site. A detailed examination of the specific characteristics of each of the stands identified on the best and worst case maps revealed that these stands were mostly comprised of black spruce, tamarack, and aspen cover types. 
Examining the decision trees for these specific cover types revealed not only the critical cut-off values for age (for black spruce and tamarack, a value of 80; for aspen, values of 20, 30, 45, and 55; see Figure 35.3) but also that age is used to determine the cut-off values for basal area per acre. Additionally, the average stand age for the study site is approximately 52 years. For comparison, the decision trees for the least abundant operable cover types in the study site (namely, black ash and balsam fir) do not use age to determine cut-off values for basal area. Additionally, the average value for basal area per acre is
approximately 62 square feet for stands within the study site, while the cut-off values are 100 square feet for black ash, and 20 and 40 square feet for balsam fir.
35.5 SIGNIFICANCE AND IMPLICATIONS OF RESEARCH
With increasing use of GIS to support decision making for a variety of application areas, a critical issue associated with GIS research relates to the quality of the decisions made based on spatial model results. The significance of this study lies in its contribution to a better understanding of the influence of data quality on the results of spatial modelling that can be used to support resource management decisions. It also highlights a methodology for examining data quality impacts by illustrating the potential usefulness of geographic sensitivity analysis. The important implications of this work can be addressed through several strategies for dealing with uncertainty in GIS applications.
35.6 STRATEGIES FOR DEALING WITH ERRORS AND UNCERTAINTY IN GIS APPLICATIONS IN FOREST RESOURCES MANAGEMENT
Based on the results of this study, several recommendations can be made regarding approaches for improving the reliability of spatial analysis using GIS. Some suggestions can also be made concerning useful features that might be incorporated into newly developed systems. These recommendations are presented as a series of strategies for dealing with errors and uncertainty in GIS. Since it is not possible to assess comprehensively the actual error present in a database, strategies must be developed to better understand and deal with the error that exists in spatial databases.
Strategies include documenting data quality and tracking error, developing more advanced spatial decision support systems that have the capability to handle error and exploratory data analysis, developing visual error assessment methods to assist managers in understanding the potential impacts of data quality on decision-making results, and expanding studies that use sensitivity analysis to examine the impact of data quality on spatial model results.
35.6.1 Documenting and Tracking Data Quality
An important initial strategy towards improving GIS reliability is to adopt procedures for minimising error and carefully documenting source data. Data quality standards need to be defined and documented at the outset of a project to ensure that managers have an understanding of the data and its limitations. National data standards in the United States have been developed by the National Committee for Digital Cartographic Data Standards in an effort to address data quality issues. In Europe, the Technical Committees of the International Standards Organisation and European Normalisation Institute have been addressing similar issues related to data quality standards. The development of data quality standards should help in at least achieving a better understanding and awareness of the potential limitations of the data and its use for decision making purposes. Careful documentation of metadata for various projects can help to inform a user as to the reliability of certain databases. Researchers have also been working on designing systems that can document source data and track lineage as data are manipulated during analysis (Lanter, 1991).
35.6.2 Spatial Decision Support Systems for Forest Resources Management
A major limitation of GIS is their inability to enable decision makers to reflect their particular approach to decision making, i.e., they cannot accommodate various decision making viewpoints and processes. Spatial decision support systems (SDSS) expand on the capabilities of decision support systems by including a spatial component and represent a means of achieving management level applications including analytical and modelling techniques that are unique to spatial analysis. In addition to facilitating better decision making, a truly effective SDSS should also incorporate some aspect of dealing with error or uncertainty in spatial databases to give a user some sense of the reliability of the spatial analysis. Heuvelink (1993) discusses a GIS/SDSS capable of dealing with uncertain data. Such a system would be capable of handling stochastic attributes such as means and standard deviations as well as visualising data uncertainty. A system could also be designed to include warnings when data are manipulated inappropriately. The actual development of such a system will be an important step in designing improved systems for dealing with error and uncertainty and in facilitating more powerful decision making.
35.6.3 Visualising Data Quality and Its Impact on Decision-Making
Research in the area of visualisation has made important contributions toward the understanding of error and uncertainty. The NCGIA sponsored Initiative Seven (Visualisation of the Quality of Spatial Data), which investigated the possible use and development of visualisation techniques in portraying spatial data quality. Howard (Chapter 41, this volume) discusses the development of an interface to represent data reliability. The development of methods using sensitivity analysis discussed in this chapter resulted in various maps that helped to visualise the possible impacts of data quality on model results.
These findings resulted in a conceptual design for an interface that could be incorporated into a spatial decision support system (Figure 35.5). It represents a visual error assessment interface that would allow a user to assess in real-time the potential impacts of changes in attribute error and resolution on a final model result. Such an interface could also be used to examine other forms of error that were not specifically addressed in this study. This interface would allow a user to change the error or resolution levels through a series of sliding bars. As these bars move to indicate different error levels, the map image would change to show the potential impacts on the map. Another feature of the interface would allow a user to examine how specified decision criteria might change based on different types and degrees of error and could aid managers in achieving a better understanding of their data and its limitations.

35.6.4 Expanding Research in Sensitivity Analysis

Another strategy involves expanding current research using sensitivity analysis to understand better the impacts of data quality on a variety of GIS applications. Many possibilities exist for expanding the analysis discussed in this chapter. Since this research represented a single test case (sample of one), a similar study site could be examined to test the consistency of these results. Other possibilities include expanding the types of attributes that were examined, examining the sensitivity of nominal scale data, adding different error percentages to determine more specifically if there is some critical point at which error has an impact on model results and implementing the analysis with different software. While the approach for this study isolated each attribute and then perturbed that attribute by a certain level of error, further research could
THE IMPACT OF DATA QUALITY ON FOREST MANAGEMENT DECISIONS
Figure 35.5: Conceptual design of a visual error assessment interface for inclusion in a spatial decision support system.
involve simultaneously perturbing a set of attributes, since it is likely that these attributes would contain different levels of error. Additional research could examine the impact of different weight assignments on the results of the spatial suitability analysis. The impacts of data quality on decision making could also be examined for different applications, such as a wildlife habitat suitability model or a forest pest management model, to study the consistency of results. Such research could lead to the development of a GIS module to conduct exploratory data analyses prior to undertaking the final analysis. Expansion of sensitivity analysis to many different case applications should contribute to an increased understanding of the way source data and spatial models behave when subjected to introduced random error.

REFERENCES

BRAND, G. 1981. Simulating Timber Management in Lake State forests. St Paul: US Department of Agriculture Forest Service, North Central Forest Experiment Station.
ERDMANN, G., CROW, T., PETERSON, R. and WILSON, C. 1987. Managing Black Ash in the Lake States. St Paul: US Department of Agriculture Forest Service, North Central Forest Experiment Station.
FISHER, P. 1991. Modelling soil map-unit inclusions by Monte Carlo simulation, International Journal of Geographical Information Systems, 5(2), pp. 193–208.
FISHER, P. 1992. First experiments in viewshed uncertainty: simulating fuzzy viewsheds, Photogrammetric Engineering and Remote Sensing, 58(3), pp. 345–352.
GOODCHILD, M., GUOQING, S. and SHIREN, Y. 1992. Development and test of an error model for categorical data, International Journal of Geographical Information Systems, 6(2), pp. 87–104.
HAINING, R. and ARBIA, G. 1993. Error propagation through map operations, Technometrics, 35(3), pp. 293–305.
HEINEN, J. and LYON, J. 1989. The effects of changing weighting factors on wildlife habitat index values: a sensitivity analysis, Photogrammetric Engineering and Remote Sensing, 55(10), pp. 1445–1447.
HEUVELINK, G. 1993. Error Propagation in Quantitative Spatial Modelling: Applications in Geographical Information Systems. Utrecht: Faculteit Ruimtelijke Wetenschappen Universiteit Utrecht.
JOHNSTON, W. 1977. Manager’s Handbook for Black Spruce in the North Central States. St Paul: US Department of Agriculture Forest Service, North Central Forest Experiment Station.
JOHNSTON, W. 1986. Manager’s Handbook for Balsam Fir in the North Central States. St Paul: US Department of Agriculture Forest Service, North Central Forest Experiment Station.
LANTER, D. 1991. Design of a lineage-based meta-data base for GIS, Cartography and Geographic Information Systems, 18(4), pp. 255–261.
LAYMON, S. and REID, J. 1986. Effects of grid-cell size on tests of a spotted-owl HSI model, in Verner, J., Morrison, M. and Ralph, C. (Eds.), Wildlife 2000: Modeling Habitat Relationships of Terrestrial Vertebrates. Madison: The University of Wisconsin Press, pp. 93–96.
LODWICK, W., MONSON, W. and SVOBODA, L. 1990. Attribute error and sensitivity analysis of map operations in geographical information systems: suitability analysis, International Journal of Geographical Information Systems, 4(4), pp. 413–428.
LYON, J., HEINEN, J., MEAD, R. and ROLLER, N. 1987. Spatial data for modeling wildlife habitat, Journal of Surveying Engineering, 113, pp. 88–100.
MACEACHREN, A. 1991. Visualization of data uncertainty: representational issues, in Buttenfield, B. and Beard, K. (Eds.), Initiative Seven: Visualization of the Quality of Spatial Data. Castine: National Center for Geographic Information and Analysis.
MEAD, D. 1982. Assessing data quality in geographic information systems, in Johannsen, C. and Sanders, J. (Eds.), Remote Sensing for Resource Management. Ankeny: Soil Conservation Society of America, pp. 51–59.
MINNESOTA DEPARTMENT OF NATURAL RESOURCES. 1989. Forestry Information Systems Blueprint: Technical Document. St Paul: DNR, Division of Forestry.
NCGIA. 1989. Initiative 1 Specialist Meeting Report. Santa Barbara: National Center for Geographic Information and Analysis.
NCGIA. 1991. Initiative 7: Visualization of the Quality of Spatial Data. Buffalo: National Center for Geographic Information and Analysis.
OPENSHAW, S. 1989. Learning to live with errors in spatial databases, in Goodchild, M. and Gopal, S. (Eds.), The Accuracy of Spatial Databases. London: Taylor & Francis, pp. 263–276.
PERALA, D. 1977. Manager’s Handbook for Aspen in the North Central States. St Paul: US Department of Agriculture Forest Service, North Central Forest Experiment Station.
RAMAPRIYAN, H., BOYD, R., GUNTHER, F. and LU, Y. 1981. Sensitivity of geographic information system outputs to errors in remotely sensed data, in Burroff, P. and Morrison, D. (Eds.), Machine Processing of Remotely Sensed Data. West Lafayette: Laboratory for Applications of Remote Sensing, Purdue University, pp. 555–565.
STOMS, D., DAVIS, F. and COGAN, C. 1992. Sensitivity of wildlife habitat models to uncertainties in GIS data, Photogrammetric Engineering and Remote Sensing, 58(6), pp. 843–850.
TUBBS, C. 1977. Manager’s Handbook for Northern Hardwoods in the North Central States. St Paul: US Department of Agriculture Forest Service, North Central Forest Experiment Station.
TURNER, M., O’NEILL, R., GARDNER, R. and MILNE, B. 1989. Effects of changing spatial scale on the analysis of landscape pattern, Landscape Ecology, 3(3/4), pp. 153–162.
VEREGIN, H. 1989. Accuracy of Spatial Databases: Annotated Bibliography. Santa Barbara: National Center for Geographic Information and Analysis.
Chapter Thirty Six
Probability Assessments for the Use of Geometrical Metadata
François Vauglin
36.1 INTRODUCTION

Vector geographical data bases (GDB) contain data dealing with geometry and space (shape and location) linked with descriptive or attribute data. As digitising processes are not perfectly deterministic, some problems due to imprecise geometry occur, such as slivers in overlay. This chapter is a theoretical study that aims to provide GDBs with coherent metadata on positional accuracy, as well as rules for the correct use of these geometrical metadata.

The chapter begins with a discussion of the “nominal ground” and quality controls (Section 36.2), which form the basis for the proposed modelling of geometric accuracy. A model for the geometric information comes from prior research (Vauglin, 1995) and is presented in Section 36.3. Metadata on positional accuracy are created by associating the coordinates with some probabilistic information that represents the positional uncertainty. In this way, the positional information is augmented by a statistical function representing the probability of each feature being at another location not too far from the recorded coordinates. This model is compared to other techniques that are useful for describing or handling positional uncertainty. New rules for handling geometric information with respect to the geometrical metadata are then designed.

The data samples used for testing this approach come from the French BDTopo and BDCarto. The French topographic database (BDTopo) is a vector database. For its production, aerial photographs are stereoplotted at the 1:25,000 scale. It can be used at all scales from 1:10,000 down to 1:50,000. One of its main purposes is the production of the 1:25,000 topographic maps of the French National Geographic Institute (IGN). BDTopo is made of two layers: one for the relief and one for all the other topographic features, like hydrography, buildings, roads, land-use, etc.

The French cartographic database (BDCarto) is also a vector database. It is digitised from the 1:50,000 maps. It is made of eight layers: road network, railways, hydrography, administrative boundaries, names, infrastructure, and land-use, plus one for the relief. BDCarto can be used at all scales from 1:50,000 down to 1:200,000. One of its main purposes is the production of the 1:100,000 maps.
Figure 36.1: “Nominal ground” and data quality. The “nominal ground” is an idealised perfect data base.
36.2 THE “NOMINAL GROUND”

In BDTopo, a river is represented as a single line if it is less than 7.5 metres wide. This is one of the rules used for digitising the database. The specifications of the database, or the specified requirements, consist of an extensive collection of such rules to be obeyed when digitising. Formally, the “nominal ground” is defined as the real terrain viewed through the filter of the GDB’s specifications. Using the same example, the nominal ground contains a single line for the river when its width is smaller than 7.5 m. In other words, if we call universe the set of all desired features, the nominal ground is the abstracted view of this universe, as shown in Figure 36.1. This means that the nominal ground is an idealised content of the GDB: up-to-date, exhaustive, without any topological inconsistency, abiding by every specification and containing neither semantic nor geometrical errors. This idea is described by Aalders and Morrison (Chapter 34, this volume) as well as in the literature (Bender, 1993; CNIG, 1993; GOM, 1991; Goodchild, 1987), and in most of the standards on geographical data quality (CEN, 1996). This kind of abstraction may seem complicated, but it turns out to be of very practical use, because assessing the data quality means comparing the nominal ground with the database.

A study is ongoing at the COGIT Laboratory of the French National Geographic Institute (IGN) which aims to model the positional accuracy in GDBs. An important point is that the modelling and its exploitation should keep their roots in the reality of measuring geographical information. The next section briefly presents the results of this work. For an extensive presentation, see Vauglin (1996) and Yousfi (1995).
PROBABILITY ASSESSMENT FOR THE USE OF GEOMETRICAL METADATA
36.3 EXPLOITATION OF QUALITY CONTROL DATA

36.3.1 Generation of inaccuracy

Digitising of points is an uncertain process. It is comparable to the result of a dart player aiming at the bull’s eye: the distribution of the shots is often bell-shaped, which resembles a Gaussian distribution. The digitisation of points is a source of positional inaccuracy, but any other stage of the production process generates errors too (e.g. geodetic calculations, projections, formatting the data, etc.). The Gaussian distribution was chosen to cope with the inherent inaccuracy of the data by considering the positional error of coordinates as a random variable. Once one is aware of this fact, the second step is to know how these errors are actually dispersed: what is the probability density function for the error? This approach naturally leads to statistical studies and probability-based modelling (Keefer et al., 1988, 1991; Marble et al., 1984; Mertikas, 1994).

36.3.2 First results of quality control

Quality controls of BDCarto have already been carried out, especially controls of the positional accuracy of the land-use layer. The geometric references used to emulate the nominal ground are SPOT images. Similar results have been obtained from quality controls of BDCarto using BDTopo as geometric reference, and of BDTopo using field measurements as reference. In each case, the differences between the reference and the data set with unknown positional accuracy are evaluated by the Hausdorff distance (Vauglin, 1996). Histograms of distances are computed, and a statistical analysis of these histograms makes it possible to choose the type of probability density function (pdf) that best describes the behaviour of the positional accuracy. The resulting pdf is then validated by the Kolmogorov test. For a detailed description of these statistical techniques, see Tassi (1989) and Isaaks and Srivastava (1989). The features on which quality controls have been performed are linear features (polylines).
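The distance measurement described above can be illustrated with a minimal sketch. It is not the implementation used for the BDCarto or BDTopo controls: the function names and the sample coordinates are hypothetical, and the Hausdorff distance is approximated by measuring from the vertices of one polyline to the segments of the other, which assumes that both lines have at least two vertices and are sampled densely enough.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment [a, b]."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:                      # degenerate segment
        return math.hypot(px - ax, py - ay)
    # Parameter of the orthogonal projection, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def directed_distance(line_a, line_b):
    """Largest distance from a vertex of line_a to the polyline line_b."""
    return max(
        min(point_segment_distance(p, a, b)
            for a, b in zip(line_b, line_b[1:]))
        for p in line_a
    )

def hausdorff_vertices(line_a, line_b):
    """Symmetric vertex-based approximation of the Hausdorff distance."""
    return max(directed_distance(line_a, line_b),
               directed_distance(line_b, line_a))

# Hypothetical reference line and digitised counterpart (arbitrary units).
reference = [(0.0, 0.0), (10.0, 0.0)]
digitised = [(0.0, 1.0), (5.0, 2.0), (10.0, 1.0)]
print(hausdorff_vertices(reference, digitised))
```

Repeating this measurement over many matched features would give the sample of distances from which the histograms mentioned in the text are built.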
A similar methodology could be used for points. As shown by Vauglin (1996), the pdf describing the statistical behaviour of positional accuracy of the linear features can be expressed as the weighted sum of a Gaussian distribution and a symmetric exponential distribution:

$$f(x) = \alpha\, g_{\sigma}(x - m) + (1 - \alpha)\, e_{\beta}(x - m) \qquad (36.1)$$

The first part of Equation 36.1, $g_{\sigma}$, is a Gaussian distribution. The second part, $e_{\beta}$, is a symmetric exponential distribution. α is the ratio of the Gaussian distribution in the pdf f, therefore (1 − α) is the ratio of the symmetric exponential distribution; σ is the standard deviation of the Gaussian distribution; β is the parameter of the symmetric exponential distribution; and m is the bias of the data set. As pdfs f of type (36.1) are weighted sums of Gaussian and Symmetric Exponential distributions, they will be called “GSE distributions”. For a better understanding of this kind of distribution, it is useful to display a Gaussian distribution (Figure 36.2A), a symmetric exponential distribution (Figure 36.2B), and the cumulated pdf f of Equation 36.1 (Figure 36.2C). The symmetric exponential distribution decays much more slowly than the Gaussian
Figure 36.2: GSE pdf computed on Perpignan in the South of France, compared with a Gaussian distribution and a symmetric exponential distribution. Note that the GSE has relatively long branches in comparison to the Gaussian distribution.
distribution. It enables the modelling of higher positional inaccuracies than is possible with a Gaussian distribution. Hence a rough interpretation of Equation 36.1 is that the Gaussian part stems from the digitising error, which is often described in the literature as being a Gaussian distribution when digitising points (Hord and Brooner, 1976; Rosenfield and Melley, 1980), while the other part stems from some degradation of the geometry (such as generalisation processes). The ratio α of the Gaussian part usually varies between 60 percent and 100 percent (100 percent corresponds to a pure Gaussian pdf). The lower values for α (60 percent) are reached when about 40 percent of the pdf is symmetric exponential. This was obtained in the comparison between the BDCarto land-use layer and SPOT images, because the land-use layer contains much generalised data. In other cases, the symmetric exponential contribution is often lower, between 5 and 15 percent, which corresponds to 95–85 percent for α. Figure 36.2 plots f and its normal and symmetric exponential components with a Gaussian ratio α = 0.72 (72 percent), a Gaussian standard deviation σ = 0.30, a symmetric exponential parameter β = 0.68, and a bias m = 0. The value of m is often found to be zero on large data sets. This means there is no bias.
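A minimal numerical sketch of a GSE density may help to see why its branches are longer than those of a Gaussian. Two points are assumptions made for illustration and are not confirmed by the chapter: the symmetric exponential component is written as a Laplace density in which β acts as a scale parameter, and the three values quoted for Figure 36.2 are assigned to α, σ and β in that order. The function names are hypothetical.

```python
import math

def gaussian_pdf(x, m, sigma):
    """Gaussian density with mean m and standard deviation sigma."""
    z = (x - m) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def symmetric_exponential_pdf(x, m, beta):
    """Symmetric exponential density; beta is ASSUMED to be a scale parameter."""
    return math.exp(-abs(x - m) / beta) / (2.0 * beta)

def gse_pdf(x, alpha, sigma, beta, m=0.0):
    """Weighted sum of the two components, alpha being the Gaussian ratio."""
    return (alpha * gaussian_pdf(x, m, sigma)
            + (1.0 - alpha) * symmetric_exponential_pdf(x, m, beta))

def integrate(f, lo, hi, n=20000):
    """Composite trapezoidal rule on [lo, hi] with n intervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h

# Values quoted for Figure 36.2, assigned under the assumptions stated above.
params = dict(alpha=0.72, sigma=0.30, beta=0.68, m=0.0)

# The mixture should still integrate to one over a wide enough interval.
area = integrate(lambda x: gse_pdf(x, **params), -20.0, 20.0)

# Far from the centre the GSE density dominates a pure Gaussian of equal sigma.
tail_ratio = gse_pdf(2.0, **params) / gaussian_pdf(2.0, 0.0, 0.30)

print(round(area, 4), tail_ratio > 1.0)
```

With α = 1 the second term vanishes and `gse_pdf` reduces to the plain Gaussian density, which is the special case discussed in the comparison with other models.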
Figure 36.3: Epsilon-band and its associated pdf.
This analysis gives a good idea of the origin of the phenomena, but it cannot be totally exact: the digitising error and the changes arising from generalisation are probably correlated. What is more, the cumulated contribution of the two processes would not, in most cases, result in a sum of the digitising pdf (Gaussian) and the generalisation pdf (symmetric exponential). That is why this interpretation gives an idea of where the two parts come from, but cannot be considered definitive. Furthermore, considering the symmetric exponential part as the effect of generalisation processes hides the fact that there are faults (gross errors). It seems more probable that the symmetric exponential part should be divided into two parts (maybe exponential or alike). The next section compares the modelling of positional uncertainty using GSE with older models such as Gaussian noise or epsilon-bands.

36.4 COMPARISON WITH OTHER DESCRIPTIONS OF POSITIONAL UNCERTAINTY

It is often seen in the literature that positional uncertainty of points can be described by a Gaussian distribution or be handled with the help of an epsilon-band (see Chrisman, 1983; Dougenik, 1980; Edwards, 1994; Goodchild, 1977; Perkal, 1956; Peuker, 1976; Pullar, 1993; Zhang and Tulip, 1990). The same technique as presented in Section 36.3.2 was used to test the statistical validity of modelling uncertainty with Gaussian or epsilon-band methods. A Gaussian distribution for uncertainty is easy to compare with the GSE distribution (Equation 36.1) because a Gaussian is a particular GSE in which the ratio α of the Gaussian part equals 1. This value can be obtained on point features like crossroads, but on most of the linear data samples, α had a value that is statistically different from 1. This means that GSE is a more detailed and more generic description of uncertainty than Gaussian distributions can be.

Epsilon-bands can be interpreted as very simple distributions with equiprobability within the epsilon-band and a probability of zero outside the epsilon-band (see Vauglin, 1994). The normalisation of this distribution enables us to compute the value of the equiprobability density within the band. In this way, a pdf associated with an epsilon-band can be defined as in Figure 36.3. Therefore, the corresponding probability density function f for the location of the point in the epsilon-band is:
$$f_{\varepsilon}(x) = \begin{cases} \dfrac{1}{2\varepsilon} & \text{if } |x| \le \varepsilon \\ 0 & \text{otherwise} \end{cases} \qquad (36.2)$$

In Equation 36.2, ε is the radius of the epsilon-band and $f_{\varepsilon}(x)$ is the pdf. None of our data samples could be fitted to this kind of distribution. Even so, the successful use of epsilon-bands can be explained by considering the band as a confidence interval: the GSE distribution is a complex function that can be properly used through the definition of an adapted confidence interval. This simplification gives a description of uncertainty very similar to epsilon-bands. This may explain both their positive results and their limitations, because the statistical basis was never stated. For example, the tests for matching points are simply based on the calculation of the geometric intersection between the epsilon-bands (Pullar, 1993; Zhang and Tulip, 1990). That purely geometric procedure is very common, but it does not rely on the confidence interval nature of the epsilon-bands and results may be of unknown accuracy. The setting of a relevant epsilon for the resulting point remains an unresolved problem (Harvey and Vauglin, 1996; Schorter et al., 1994).

The next section focuses on the possible use of the GSE distribution. GSE is first evaluated in terms of potential metadata. The use of the corresponding metadata is presented for the merging of two points, including proof. An example of the results obtained is then given.

36.5 IMPROVING KNOWLEDGE ON GEOMETRICAL QUALITY WITH THE HELP OF METADATA

36.5.1 Choosing metadata

Some important work has been carried out recently aiming at defining what metadata should be stored to describe the quality of geographical data sets (CEN, 1996). It is very common to use the mean distance, the variance of distance, the standard deviation of distance, the 95 percent range value of distance, etc. to assess the absolute positional accuracy. These are parameters describing the statistical behaviour of the data set or of the distance between the data set and the nominal ground.
The complete mathematical description of a statistical behaviour is given by the pdf (Isaaks and Srivastava, 1989; Tassi, 1989). This means that the above four parameters (mean, variance, standard deviation, and 95 percent range value) can be computed from the pdf. The opposite is not possible: it is impossible to compute the pdf having just the above four parameters as input. Thus, the pdf GSE is the best description of the statistical behaviour of absolute positional accuracy. Assessing the parameters α, σ, β and m of Equation 36.1 (Gaussian ratio, Gaussian standard deviation, symmetric exponential parameter and bias) gives the pdf GSE. The advantage of using GSE instead of mean, variance, standard deviation, and 95 percent range value is that it requires no more metadata than the set of four parameters, but it gives richer information. The parameters α, σ, β and m of Equation 36.1 will be used and referred to as metadata on positional accuracy in the following sections of this chapter.
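The claim that the usual summary parameters follow from the pdf can be sketched in a few lines. This is an illustration under stated assumptions, not the chapter's procedure: the error is treated as a signed one-dimensional quantity, the symmetric exponential component is assumed to be a Laplace density with scale β (so its variance is 2β²), and the function name `gse_summary` is hypothetical.

```python
import math

def gse_summary(alpha, sigma, beta, m=0.0, level=0.95):
    """Mean, variance, standard deviation and central range of a GSE pdf.

    Assumes a mixture of a Gaussian (ratio alpha, standard deviation sigma)
    and a Laplace density (ratio 1 - alpha, scale beta), both centred on m.
    """
    # Both components share the centre m, so the mixture variance is the
    # weighted sum of the component variances.
    variance = alpha * sigma ** 2 + (1.0 - alpha) * 2.0 * beta ** 2

    def coverage(r):
        """Probability that the error lies within r of the bias m."""
        gauss = math.erf(r / (sigma * math.sqrt(2.0)))
        laplace = 1.0 - math.exp(-r / beta)
        return alpha * gauss + (1.0 - alpha) * laplace

    # Bracket the requested level, then refine the radius by bisection.
    lo, hi = 0.0, 1.0
    while coverage(hi) < level:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if coverage(mid) < level:
            lo = mid
        else:
            hi = mid

    return {"mean": m, "variance": variance,
            "std": math.sqrt(variance), "range_95": hi}

# Values quoted for Figure 36.2, in the assumed order alpha, sigma, beta.
summary = gse_summary(0.72, 0.30, 0.68)
print(summary)
```

The reverse direction is not available: many different combinations of α, σ and β share the same variance, which is the sense in which the four summary values carry less information than the pdf.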
36.5.2 Use of metadata through probabilistic analysis

Defining a standard for geographical information is a long process in which most of the efforts aim to provide good definitions, including for data quality matters. One of the greatest difficulties in establishing standards is the definition of the metrics for metadata. The European Normalising Committee (CEN, Comité Européen de Normalisation) has chosen to give no metrics (CEN, 1996). How to use metadata may be very obscure to users, as reported in Timpf et al. (1996). This section shows how to use the metadata α, σ, β and m defined in Equation 36.1 (Gaussian ratio, Gaussian standard deviation, symmetric exponential parameter and bias) for positional accuracy. As an example, it focuses on a very common operation of GIS based on geometry: the merging of points.

The functions of type (36.1) represent the density of probability f(x) for a digitised point to be at a distance x from its nominal counterpart. Hence, the probability for the point to be at a distance between a and b from its nominal counterpart is $\int_a^b f(x)\,dx$ (Tassi, 1989). Let there be two points P0 = (x0, y0, z0) and P1 = (x1, y1, z1), with (α0, σ0, β0, m0) and (α1, σ1, β1, m1) respectively as metadata. If P0 and P1 come from two different layers and have close positions, they might be the same. Three problems are then raised:

1. How to decide whether P0 and P1 are actually the same point P = (x, y, z) with (α, σ, β, m) as metadata? (matching)
2. If they are proven to be the same, how to assess the seven resulting parameters x, y, z, α, σ, β, m? (merging)
3. How to compute the position of the resulting point? A position can then be derived once (α, σ, β, m) are computed.

For the problem of merging points, the input data are the former locations of the points and their pdfs, while the required output data are the new location(s) and the corresponding pdf(s). The following solves (2) and (3) of the above points. The first step is to find a list of possible operations based on the most probable solution.
Given two pdfs GSE (see Equation 36.1) fA and fB of two points A and B, it is possible to compute the probability Pp for the two points A and B to be within a small disk of radius p ≥ 0 (as the coordinates are considered as continuous random variables, the probability for the two points to be at exactly the same place is zero). In the one-dimensional case, and supposing that fA and fB are non-correlated:

(36.3)

If j is a convolution kernel, let * denote the convolution product, defined as follows:

$$(f * j)(x) = \int_{-\infty}^{+\infty} f(t)\, j(x - t)\, dt \qquad (36.4)$$

For the computation of the probability Pp for the two points A and B to be within a disk of radius p, the convolution kernel is of the type of Equation 36.2 if p > 0 and a Dirac distribution if p = 0. Therefore, let Cp be the normalised convolution kernel:

$$C_p(x) = \begin{cases} f_p(x) & \text{if } p > 0 \\ \delta(x) & \text{if } p = 0 \end{cases} \qquad (36.5)$$

where $f_p$ is the pdf of Equation 36.2 with radius p and δ is the Dirac distribution. Pp can then be expressed as follows:

(36.6)

In order to use the resulting distribution Pp (Equation 36.6) as a pdf for future assessment of the new quality parameters, it has to be normalised. Thus, it is possible to write the resulting pdf R as follows:

(36.7)

Hence,

(36.8)
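The merging step can be sketched numerically under one plausible reading of the derivation, which is an assumption rather than the chapter's exact formulae: Pp(x) is taken as proportional to the product (fA * Cp)(x) · (fB * Cp)(x), where Cp averages a pdf over [x − p, x + p], and R is that product divided by its integral. The GSE components use the assumed Laplace form with scale β, both points are given the same hypothetical metadata, and the names `smooth` and `merged_pdf` are illustrative. The sketch does not attempt to reproduce the figures of the chapter.

```python
import math

def gse_pdf(x, alpha, sigma, beta, m=0.0):
    """GSE density; the Laplace form with scale beta is an assumption."""
    gauss = math.exp(-0.5 * ((x - m) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    laplace = math.exp(-abs(x - m) / beta) / (2.0 * beta)
    return alpha * gauss + (1.0 - alpha) * laplace

def smooth(f, x, p, steps=40):
    """(f * C_p)(x): mean of f over [x - p, x + p]; f(x) itself when p = 0."""
    if p == 0.0:
        return f(x)          # Dirac kernel leaves the pdf unchanged
    h = 2.0 * p / steps
    return sum(f(x - p + (i + 0.5) * h) for i in range(steps)) / steps

def merged_pdf(f_a, f_b, p, lo, hi, n=2000):
    """Normalised product of the two smoothed pdfs on a regular grid."""
    h = (hi - lo) / n
    xs = [lo + i * h for i in range(n + 1)]
    prod = [smooth(f_a, x, p) * smooth(f_b, x, p) for x in xs]
    area = h * (sum(prod) - 0.5 * (prod[0] + prod[-1]))   # trapezoidal rule
    return xs, [v / area for v in prod], h

# Hypothetical example: identical metadata, second pdf shifted by 2 pixels.
def f_a(x):
    return gse_pdf(x, 0.72, 0.30, 0.68, m=0.0)

def f_b(x):
    return gse_pdf(x, 0.72, 0.30, 0.68, m=2.0)

xs, r, h = merged_pdf(f_a, f_b, p=0.1, lo=-9.0, hi=11.0)

# Expected position of the merged point under the resulting pdf R.
position = h * sum(x * v for x, v in zip(xs, r))
print(round(position, 3))
```

Because the two input pdfs in this example are mirror images of each other, the merged position falls midway between them; with different metadata for fB the result would move towards the more accurate point. The non-correlation of the two errors is assumed throughout, as in the text.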
36.5.3 Examples

The last form is now easier to manipulate for finer exploitation of Pp since we just have one parameter: p. Of course, the convolution kernel should stay smaller than the pdf itself. There is a point in computing Pp only when p is lower than σ or β. Figure 36.2C shows the pdf fA obtained on Perpignan (in the South of France) with α = 0.72 (72 percent), σ = 0.30, β = 0.68, and m = 0 (with 35,000 measurements). Another set of values, obtained for similar data, defines the pdf fB. Figure 36.4 permits a comparison of the two pdfs. The product of fA and fB is also represented. It has been computed with a radius p of 0.1 pixels, or 2 metres (C0.1 in Equation 36.6). The major hypothesis here is the non-correlation between the variables. This is a critical point. Pp(x) represents the probability of concomitant presence of both points within a disk of radius p centred on x.

This kind of calculation can be performed even when the means of the variables are different. Figure 36.5 shows the same fA and fB, but with the pdf fB shifted by 2 pixels (the means m differ by 40 metres: m = 2 pixels). The product of fA and fB has also been computed with p = 2 metres. In that example, the resulting point P would have the position x = 0.89, with the resulting metadata α = 0.81, σ = 0.82, β = 0.76 and m = 0. The above calculations have been made only for the x-coordinate. The same rationale can be applied to the y-coordinate. Figure 36.6 is the plot of the resulting pdf for the same example, where the shift of fB is 2 pixels along x, but 0 along y. For questions of homogeneity, the resulting R function can easily be adjusted to a GSE distribution using the Kolmogorov test as in Section 36.3.2 (Tassi, 1989). Figure 36.6 plots the resulting distribution after its
Figure 36.4A: fA and fB
Figure 36.4B: Corresponding R
Figure 36.5A: fA and fB (convolved by a C0.1)
Figure 36.5B: Corresponding R (g was shifted by 2 pixels)
fit to a GSE. That method enables us to produce another GSE, which is important for mathematical stability matters.

36.6 CONCLUSION

The method presented in this chapter for modelling the positional accuracy is coherent with the specificity of the geographic data and aims to provide tools that are justified by and for such data. Everything is based on
Figure 36.6: Resulting GSE distribution
actual data quality controls. It could be useful to develop more test sets for assessing the pdf systematically by feature class. Part of this theoretical research remains to be done before codifying how the product presented here should be used to decide when points should be merged. In particular, the way to compute the threshold of separation of two points must be defined. Similar reasoning has been successfully applied to other operations based on geometry, such as computation of distance. This chapter was intentionally limited to positional accuracy of points and lines. A closer inspection of matters of relative or absolute accuracy should be included in future research, as well as the modelling of positional uncertainty for surfaces.

The major contribution of this study is to prove that it is possible to assess the probability density function of positional accuracy in GIS and to handle it using probabilistic reasoning. This means it is possible to have a complete description of the statistical behaviour of positional uncertainty using the pdf GSE. It was also proved that this modelling provides more useful metadata with no increase in the information stored when compared with the ISO standard project on metadata and quality evaluation procedures. Section 36.5 presented how the GSE pdf can be used for merging points. In fact, every GIS operation based on geometry has to be redefined using the same rationale in order to obtain the complete metric of these metadata.

ACKNOWLEDGEMENT

This study was conducted in the COGIT laboratory of IGN, within the French Program for research in GIS (PSIG). It is closely linked with a French research network called Cassini. Within this framework, research efforts on quality issues embrace far more facets of data quality than positional accuracy (Cassini, 1995).
REFERENCES

BENDER, L. 1993. Spatial Data Quality, An Overview. Internal Report of the IGN. Paris: IGN.
CASSINI. 1995. Compte-rendu des recherches menées par le PSIG. Paris, http://lieu.univ-mrs.fr/GDR-CASSINI/rapport95.html
CEN (Comité Européen de Normalisation). 1996. Draft prEN 287008, Information Géographique, Description des Données - Qualité. Paris: AFNOR, Association Française de Normalisation.
CHRISMAN, N. 1983. Epsilon filtering: a technique for automatic scale changing, in Proceedings of 43rd annual meeting of ACSM. Washington DC, 1983, pp. 322–331.
CNIG (Conseil National de l’Information Géographique). 1993. Qualité des Données Géographiques Échangées. Paris: IGN.
DOUGENIK. 1980. Whirlpool: a program for polygon overlay, in Proceedings of Auto Carto 4, pp. 304–311.
EDWARDS, G. 1994. Characterising spatial uncertainty and variability in forestry data bases, in Congalton, R. (Ed.), Proceedings of the International Symposium on the Spatial Accuracy of Natural Resource Data Bases, 16–20 May, Williamsburg, Virginia. Bethesda, Maryland: ASPRS, pp. 88–97.
GROUPE ORGANISATION ET MÉTHODES (GOM). 1991. La Qualité de l’Information Géographique Numérique à l’IGN, Internal report of the IGN, version 1.0. Paris: IGN.
GOODCHILD, M. 1977. Statistical aspects of the polygon overlay problem, in Harvard papers on GIS, Vol. 6. Reading, MA: Addison-Wesley.
GOODCHILD, M. 1987. A model of errors for choropleth maps, with applications to GIS, in Proceedings of Auto Carto 8, pp. 165–174.
HARVEY, F. and VAUGLIN, F. 1996. Geometric match processing: applying multiple tolerances, in Kraak, M.J. and Molenaar, M. (Eds.), Proceedings of 7th International Symposium on Spatial Data Handling. Delft, 12–16 August. Delft: Faculty of Geodetic Engineering, pp. 4A13–4A30.
HORD, R.M. and BROONER, W. 1976. Land-use map accuracy criteria, in Photogrammetric Engineering and Remote Sensing, 42(5), pp. 671–677.
ISAAKS, E.H. and SRIVASTAVA, M.R. 1989. An Introduction to Applied Geostatistics. New York: Oxford University Press.
KEEFER, B.J., SMITH, J.L. and GREGOIRE, T.G. 1988. Simulating manual digitizing error with statistical models, in Proceedings of GIS/LIS’88, pp. 475–483.
KEEFER, B.J., SMITH, J.L. and GREGOIRE, T.G. 1991. Modelling and evaluating the effects of stream mode digitizing errors on map variables, in Photogrammetric Engineering and Remote Sensing, 57(7), pp. 957–963.
MARBLE, D.F., LAUZON, J. and McGRANAGHAN. 1984. Development of a conceptual model of the manual digitizing process, in Proceedings of International Symposium on Spatial Data Handling, Zürich, August, pp. 146–171.
MERTIKAS. 1994. Description of accuracy using conventional and robust estimates of scale, in Marine Geodesy, 17(4), pp. 251–269.
PERKAL. 1956. On epsilon length, in Bulletin de l’Académie Polonaise des Sciences, Vol. 4, pp. 399–403.
PEUKER. 1976. A theory of cartographic line, in International Yearbook of Cartography, pp. 134–143.
PULLAR. 1993. Consequences of using a tolerance paradigm in spatial overlay, in McMaster, R. (Ed.), Proceedings of Auto Carto 11. Minneapolis, 30 October–1 November. Bethesda: ASPRS, pp. 288–296.
ROSENFIELD, G.H. and MELLEY, M. 1980. Applications of statistics to thematic mapping, Photogrammetric Engineering and Remote Sensing, 48(1), pp. 1287–1294.
SCHORTER, G., RAYNAL, L. and VAUGLIN, F. 1994. GéO2: module de superposition, in Laurini, R. and Servigne (Eds.), Proceedings of Les journées de la recherche Cassini, Lyon, 13–14 October, pp. 251–261.
TASSI, P. 1989. Méthodes Statistiques, collection Économie et Statistiques Avancées (ESA), série École Nationale de la Statistique et de l’Administration Économique (ENSAE). Paris: ECONOMICA.
466
GEOGRAPHIC INFORMATION RESEARCH: TRANS-ATLANTIC PERSPECTIVES
TIMPF, S., RAUBAL, M. and KUHN, W. 1996. Experiences with metadata, in Kraak M.I and Molenaar M. (Eds.), Proceedings of the 7th International Symposium on Spatial Data Handling, Delft, 12–16 August. Delft: Faculty of Geodetic Engineering, pp. 12B31– 12B43. VAUGLIN, F. 1994. Modélisation de localisation, in Proceedings of Sixième journée de la recherche du CNIG (Conseil National de l’Information Géographique), Montpellier, 31 May. VAUGLIN, F. 1995. Approche probabiliste pour des métadonnées sur la géométrie des objets géographiques, in Proceedings of Journées de la recherche CASSINI, Marseille, 15–17 November. VAUGLIN, F. 1996. La qualité géométrique: un état, des concepts, des outils. Paris: IGN, Cours de DE A SIG. YOUSFI, K. 1995. Mesure de la précision géométrique dans les bases de données géographiques, rapport de stage. Paris: IGN. ZHANG, G. and TULIP, I 1990. An algorithm for the avoidance of sliver polygons and clusters of points in spatial overlay, in Brassel, K. and Kishimoto (Eds.), Proceedings of 4th International Symposium on Spatial Data Handling, Zürich, 23–27 July. Vol. 1, Ohio: International Geographical Union, pp. 141–150.
Chapter Thirty Seven Assessing the Quality of Geodata by Testing Consistency with Respect to the Conceptual Data Schema Gerhard Joos
37.1 INTRODUCTION The quality report of a dataset in a geographic information system (GIS) is very important information for a potential user to enable him or her to decide whether the data meets specific demands. Quality is described by a set of the quality parameters completeness, correctness, consistency and accuracy (Veregin, 1989; Lanter and Veregin, 1992). Once they are defined, methods must be applied for checking particular datasets to find data errors and obtain values for the quality parameters. Consistency is a basic requisite for handling geodata. Consistency can be examined at different levels corresponding to the data model. Therefore data modelling will be explained in more detail. Modelling of geographic data is based on a hierarchical structure of levels, each defined by a certain data schema (Figure 37.1).
Figure 37.1: Different Levels of Data Modelling: Physical, Logical and Conceptual Data Schema
At the lowest level there is the physical data schema, in which data-handling specific to a particular GIS software is defined, e.g. internal file structure and interfaces. On top of the physical data schema is the logical data schema, in which the type of data representation is defined. It can be raster or vector based, administered as objects in an object-oriented or relational environment, and it can have different geometric primitives to represent the geometrical information. The conceptual data schema defines what types of features
of the real world are to be captured and their properties (CEN/TC 287, 1995a, 1995b). Modelling rules (e.g. minimum sizes, geometric representation) and quality objectives are laid down at this level. These levels of data modelling show the relationship between geodata, hardware and software, since the view someone has about geodata with respect to a certain data model is always realised in a certain format (internal GIS format, system-dependent or standardised exchange format). A GIS user, who wants to model his or her view of the real world, must therefore know about the abilities and restrictions of the GIS software (Joos, 1994). The European Standardisation Committee for Geographic Information, CEN/TC 287, suggests describing data quality by looking at the difference between the actual geodata and the abstract view of the real world, given by a particular data schema (CEN/TC 287, 1994, Aalders and Morrison, Chapter 34, this volume). Considering this definition of data quality, these different levels of modelling have to be taken into account, when the quality of geodata is assessed. In several standards and scientific publications dealing with quality in GIS, logical consistency is listed with accuracy and completeness as criteria which describe the quality of a dataset. Logical consistency defines how well a particular dataset meets the specifications in terms of internal structure on the level of the logical data schema (Kainz, 1995). These include correct topology, adherence to appropriate values for variables and attributes and consistency of the file structure itself. Whereas logical consistency is a fundamental requirement for uses of a dataset, the data also has to meet the specifications given by the conceptual data schema as a prerequisite for deriving reliable information from a GIS for a particular application. 
When a potential user examines the data specification, especially the object definitions provided, in order to judge whether a particular data set fulfils his or her needs, the user must rely on the assumption that this dataset coincides with its specification. This chapter deals with consistency with respect to the conceptual data schema. This special aspect of quality is hereafter referred to as conceptual consistency. It must not be mixed up with the consistency of the conceptual data schema itself, since this is not a matter of data quality, even if it has direct influence, but a matter of model quality, which is not the objective of this chapter. Model quality describes how appropriately a data model represents phenomena of the real world. The idea of introducing a richer set of consistency checks was published by Chrisman in 1991: “A simple computation of point-in-polygon could avoid placing buoys on dry land or rivers outside their floodplain. Although these seem ridiculous, a GIS database will contain such errors until the full analytical power of the tool is turned back against its own database” (p. 172). The objective of this chapter is to introduce a new language for describing formally the rules of the conceptual data schema and to test the objects of a given dataset with respect to these rules. The new language for formulating rules and restrictions is called FRACAS (formal rules for assessing the consistency with respect to application schemata). 37.2 THE CONCEPTUAL DATA SCHEMA The conceptual data schema provides instructions on how to represent objects of the real world in the GIS. It is usually comprised of a list of object class names, definitions of these object classes, the sub- and supertypes referring to this object class, their attributes with definitions, general rules concerning all object classes and special rules related to a particular object class only.
Figure 37.2: External Consistency: a) identical data schemata, adjacent regions, b) different data schemata, identical regions, and c) identical data schemata, identical regions
General rules determine the relationship between object classes and how objects or parts of objects must be formed—they also determine which interpolation methods of line features or borders of area features are allowed. These rules also regulate the handling of user-defined, unique object identifiers. Default values for accuracy and minimum sizes, whether to include instances of objects in the GIS, are given. Special rules assign attributes to the particular object classes, they give minimum sizes, if different from the default values, and they specify which geometric primitives may be used to represent the shape of real objects. Special rules determine which attributes are mandatory, i.e. which attributes must have a valid value, and they give ranges of values for certain numerical attributes. 37.3 INTERNAL AND EXTERNAL CONCEPTUAL CONSISTENCY Quality parameters refer to certain data, which can be a single data set or several data files, which can be based either on the same or different data models. If there is a single data set, consistency refers to objects, their attributes and relations between objects of that data set. Then, we speak about internal consistency with respect to its data schema. Physical and logical consistency can only be internal. If there are several data sets, they can cover either distinct areas or they can be adjacent or congruent in their spatial extension. For completely distinct data sets it does not make sense to speak about consistency between them. But adjacent data sets based on the same conceptual data schema must fit at their common border in terms of objects, which continue seamlessly and with the same attribute values unless there are reasons for change (Figure 37.2a). When you investigate external consistency you need to know whether the data is divided into tiles delimited by coordinate lines or whether it includes complete objects only. Data sets can cover the same region or have at least some area in common. 
Their data schemata may be completely different or comparable (which means they deal with the same topic). In the case of different data schemata the geometric representation must still be consistent, even if they refer to different topics. First of all the coordinates must belong to the same coordinate reference system. They should then be comparable in accuracy and resolution (Figure 37.2b). Even if two data sets of the same region are based on identical data schemata, they will not be identical due to different data sources, object interpretation, freedom of setting vertices at data capturing and coarse errors. If two such data sets were used simultaneously for analysis and deriving information, misinterpretations would be unavoidable. Overlays produce sliver polygons, line features have different shapes, because the number and position of vertices will differ, and point features are missing or positioned at different locations (Figure 37.2c). This occurs, for instance, when two data sets with the same thematic content (e.g. traffic objects such as roads) have been captured with different sets of attributes, and some attribute values (e.g. street names) are to be transferred to the corresponding objects of the other data set. As long as there is no other means of identifying identical objects, the geometry of the objects must serve as the reference. Another example: if
there are two different data sets with different levels of accuracy and resolution, using the geometry of the more accurate one will increase the precision of objects in the coarser one without altering its attributes. External consistency has to be tested. But it is not necessary to introduce specific rules for external consistency, since it can be treated with the same functionality as internal consistency when the data sets are merged. 37.4 USING THE RULES OF THE CONCEPTUAL DATA SCHEMA FOR TESTING GEODATA ISO 9000 is a series of international standards for quality management (ISO, 1992b, 1992c). Its vocabulary refers to the previous standard ISO 8402 (ISO, 1992a). ISO 9000 is cited whenever quality management for production or services is concerned. It does not explain how to introduce a quality management system, but it stipulates that whenever quality management is fixed within a handbook, everybody involved in the production process has to keep to those rules. Applied to geodata, ISO 9000 stipulates that it must be proven that the data reliably fulfils the stated rules of the data schemata. The rules of a conceptual data schema are usually written in normal language, because they are intended as instructions for institutions who are digitising data or as data description for potential or actual users. If the rules are specified in a formal language, testing can be done automatically. This is achieved using queries. The queries investigate the database for objects that do not comply with the conceptual data schema, i.e. the search condition is the logical negation of the corresponding rule. The rule can be formulated using the GIS software’s macro, programming or query language, assuming, of course, that the software is powerful enough to formulate all these conditions (especially when spatial operators are required).
The programming code can get so complex that a normal GIS user might be able neither to read it nor to change existing rules, let alone add further ones. For that reason it is better to introduce a system-independent language, which the user can read and edit. This chapter suggests what such a formal language can look like. Using FRACAS the user does not necessarily need to have a deep knowledge of the GIS software in order to modify or create new rules within the existing functionalities. Because the same testing routines can be implemented in different geographical information systems, it is possible to compare results from quality checks even if they are executed on heterogeneous systems. System-independent rules must be transformed in such a way that the analysis tools of the target GIS software can handle them. A conversion tool is required for this step. The conversion can be performed by another program which analyses each item to see whether it is an object class name, an attribute name, a relationship between them or a statement composed of reserved words, known by the converter as stated rules. The “UNIX” operating system includes tools for writing these kinds of programs. The tools are called “lex” and “yacc”. They can be used for checking syntax, dividing a file into its smallest logical units (morphemes), and recognising variables, reserved words and separators. As output they create a C program, which transforms the predefined syntax for conceptual consistency rules into a query program for the GIS software. The compiled C program can be used to translate the system-independent rules into queries. It does not need to be modified as long as the reserved words, which represent the principal rules, do not change. Using a standard ASCII text editor, the user can easily adapt system-independent rules to different conceptual schemata.
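As a rough illustration of the conversion step, the sketch below translates one kind of system-independent rule into query text. It is written in Python rather than with lex and yacc, it handles only a set definition followed by a "requires" statement, and it emits generic SQL-like text. The function name `translate_requires`, the table name `data_set`, and the example class and attribute names are assumptions made for illustration; they are not part of FRACAS or of any GIS product.

```python
import re

# Hypothetical mini-translator: set definitions plus "requires" rules only.
SET_DEF = re.compile(r"(\w+)\s*:=\s*\{([^}]*)\}\s*;")
REQUIRES = re.compile(r"(\w+)\s+requires\s+([^;]+);")


def translate_requires(rule_text, table="data_set"):
    """Return query text that selects objects violating a 'requires' rule."""
    # Collect set definitions: set name -> list of object class names.
    sets = {}
    for name, members in SET_DEF.findall(rule_text):
        sets[name] = [m.strip() for m in members.split(",") if m.strip()]

    queries = []
    for set_name, attr_text in REQUIRES.findall(rule_text):
        if set_name not in sets:
            raise ValueError(f"undefined set: {set_name}")
        attrs = [a.strip() for a in attr_text.split(",") if a.strip()]
        # The search condition is the negation of the rule: an object is
        # faulty if any one of its mandatory attributes is missing.
        condition = " or ".join(f"{a} is null" for a in attrs)
        for object_class in sets[set_name]:
            queries.append(f"select {object_class} from {table} where {condition}")
    return "\nunion\n".join(queries) + ";"


rule = "SET1 := {road, bridge}; SET1 requires name, max_load;"
print(translate_requires(rule))
```

A real converter would need a proper grammar for all reserved words and a back end per target GIS; the point here is only that the rule text stays readable while the generated query carries the negated condition.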
37.5 FORMAL RULES AND TESTING 37.5.1 Unique object identifier Every instance of an object class must be addressed in an unmistakable way. This is especially important if the database exists more than once, e.g. in different institutions. In this case, the system’s internal identifier cannot be used. Therefore, a user-defined identifier must be assigned. Objects change, disappear and new objects are captured. The database management systems of the other GIS mirroring the primary system must keep track of which objects’ attributes or geometry have to be modified, which object has to be deleted or what the identifier of the new object is. If an object ID exists more than once, ambiguous constellations occur and the consistency condition is violated. The database management system can test the uniqueness of the identifier. For that reason, all object IDs must belong to the same database. A qualifier “unique” in the table definition for the field “objectID” prevents double assignment of the same number. A returning error code may be handled by the testing software so that the user receives a log he or she can understand. 37.5.2 Minimum Size The size of an object, i.e. length for line features and area for area features in the real world, can serve as a criterion as to whether an object is digitised. In analogue data sources the operator can only approximately estimate the spatial extents of objects. If too many objects, which do not fulfil this requirement, are gathered, the amount of data increases. Once they are digitised the system can measure the extents of the objects and apply tolerances to find out if they are big enough to maintain in the data set. 37.5.3 Required Attributes Attributes can be divided into optional and mandatory attributes. Optional attributes are those which may be irrelevant for certain objects or whose values cannot be captured due to lack of information. Mandatory attributes are those necessarily required to perform certain applications. 
The object identifier, for instance, is a mandatory attribute. Another example of a mandatory attribute, for instance in a system used for planning routes of heavy trucks, is the maximum load for bridges. In FRACAS notation this rule is written as shown below.

    SET1 := {OC1, OC2, …, OCn};
    SET1 requires AT1, AT2, …, ATm;

From that notation a FRACAS compiler will produce a query like the one given below. It is written in pseudocode, since this chapter will not favour any GIS vendor or query language.

    select OC1 from data_set where AT1 = NULL or AT2 = NULL or … or ATm = NULL
    union
    select OC2 from data_set where AT1 = NULL or AT2 = NULL or … or ATm = NULL
    union
    …
    union
    select OCn from data_set where AT1 = NULL or AT2 = NULL or … or ATm = NULL;
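One way to try out such a query is against a small in-memory table. The sketch below uses Python's standard `sqlite3` module. The table layout (one row per object with the columns `object_id`, `object_class`, `name` and `max_load`) and the sample rows are assumptions for illustration, and an `IN` list replaces the chain of unions. Note that real SQL tests for missing values with `IS NULL` rather than `= NULL`.

```python
import sqlite3


def find_missing_attributes(conn, object_classes, attributes):
    """Return (object_id, object_class) rows lacking a mandatory attribute."""
    # Column names are interpolated into the statement, so they must come
    # from a trusted schema description, never from user input.
    condition = " OR ".join(f"{a} IS NULL" for a in attributes)
    placeholders = ", ".join("?" for _ in object_classes)
    sql = (
        "SELECT object_id, object_class FROM data_set "
        f"WHERE object_class IN ({placeholders}) AND ({condition}) "
        "ORDER BY object_id"
    )
    return conn.execute(sql, list(object_classes)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_set ("
    "object_id TEXT PRIMARY KEY, object_class TEXT, name TEXT, max_load REAL)"
)
conn.executemany(
    "INSERT INTO data_set VALUES (?, ?, ?, ?)",
    [
        ("B1", "bridge", "Old Bridge", 30.0),
        ("B2", "bridge", "New Bridge", None),  # maximum load is missing
        ("R1", "road", None, None),            # road is not in the tested set
    ],
)
faulty = find_missing_attributes(conn, ["bridge"], ["name", "max_load"])
print(faulty)
```

The `PRIMARY KEY` declaration in this hypothetical table also plays the role of the "unique" qualifier from Section 37.5.1: a second insert with an existing `object_id` would be rejected by the database.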
37.5.4 Valid attribute values Most attribute values only make sense within a certain range. If values outside this range are assigned they are very likely to be wrong. If attribute ranges are regulated within the scope of the conceptual data schema, these values are inconsistent. Testing this range will improve the plausibility of geodata. This, of course, does not guarantee that the correct values are assigned. Attribute type testing is done by the GIS software, since assigning a value of a different type will cause an error at the logical data schema level. The FRACAS formulation for the rule of valid attribute values and its query expression are shown below.

    SET2 := {OC1, OC2, …, OCn};
    SET2 needs_values_for ATi in [lowerlimit1, upperlimit1] or [lowerlimit2, upperlimit2] or […, …];

Corresponding query in pseudocode:

    select OC1 from data_set where (ATi < lowerlimit1 or ATi > upperlimit1) and (ATi < lowerlimit2 or ATi > upperlimit2) and (… or …)
    union
    select OC2 from data_set where (ATi < lowerlimit1 or ATi > upperlimit1) and (ATi < lowerlimit2 or ATi > upperlimit2) and (… or …)
    union
    …
    union
    select OCn from data_set where (ATi < lowerlimit1 or ATi > upperlimit1) and (ATi < lowerlimit2 or ATi > upperlimit2) and (… or …);

The predicate of the query can be derived by using the de Morgan rule several times. In Boolean algebra the de Morgan rule is used for the negation of expressions connected with the logical operators “AND” ($\wedge$) or “OR” ($\vee$):

$$\neg(A \wedge B) = \neg A \vee \neg B, \qquad \neg(A \vee B) = \neg A \wedge \neg B \tag{37.1}$$

The rule from the conceptual schema reads:

$$AT_i \in [ll_1, ul_1] \;\vee\; AT_i \in [ll_2, ul_2] \;\vee\; \dots \tag{37.2}$$

Since objects which do not fulfil this condition are searched, the negation is required:

$$\neg\bigl(AT_i \in [ll_1, ul_1] \vee AT_i \in [ll_2, ul_2] \vee \dots\bigr) = AT_i \notin [ll_1, ul_1] \;\wedge\; AT_i \notin [ll_2, ul_2] \;\wedge\; \dots \tag{37.3}$$

Replacing the interval expression with “greater than” (>) and “less than” (<) operators yields the SQL statement listed above ($ll_i$ = lower limit $i$ and $ul_i$ = upper limit $i$):

$$(AT_i < ll_1 \vee AT_i > ul_1) \;\wedge\; (AT_i < ll_2 \vee AT_i > ul_2) \;\wedge\; \dots \tag{37.4}$$
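The negation step can be checked on a handful of values. The following Python sketch states the rule (a value lies in at least one permitted closed interval) and its negated form (the value lies outside every interval) as two separate functions and confirms that they are complementary. The function names and the sample intervals are hypothetical.

```python
def satisfies_ranges(value, intervals):
    """The rule itself: value lies in at least one permitted closed interval."""
    return any(lower <= value <= upper for lower, upper in intervals)


def violates_ranges(value, intervals):
    """The negated rule used as search condition.

    Equivalent to (v < ll1 or v > ul1) and (v < ll2 or v > ul2) and ...
    """
    return all(value < lower or value > upper for lower, upper in intervals)


# Hypothetical permitted ranges for one numerical attribute.
allowed = [(0.0, 10.0), (50.0, 60.0)]
samples = [5.0, 10.0, 25.0, 55.0, 61.0]

# De Morgan: the negated rule is exactly the complement of the rule.
assert all(
    violates_ranges(v, allowed) != satisfies_ranges(v, allowed) for v in samples
)

faulty = [v for v in samples if violates_ranges(v, allowed)]
print(faulty)  # [25.0, 61.0]
```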
37.5.5 Overlapping of objects There are object classes which can only be exclusively assigned to one area. Others are allowed to overlap. For example an industrial site may simultaneously be an urban area, but it cannot be greenland at the same time. Conditions like this are usually defined in the conceptual data schema, or they can be derived with common sense for testing the plausibility of geodata.
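A pairwise test over the classes that must not share area can be sketched as follows. Axis-aligned bounding boxes stand in for real geometry here, which is a strong simplification; a GIS would use its own polygon overlay operator. Classes are also paired with themselves, on the assumption that objects of one exclusive class should not overlap each other either. The function names, object identifiers and coordinates are all hypothetical.

```python
from itertools import combinations_with_replacement


def boxes_overlap(a, b):
    """True if two boxes (xmin, ymin, xmax, ymax) share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def find_overlaps(objects_by_class, exclusive_classes):
    """Return pairs of object IDs from mutually exclusive classes that overlap."""
    conflicts = []
    # Every class is paired with every class, including itself,
    # which gives n(n+1)/2 class pairs for n classes.
    for class_a, class_b in combinations_with_replacement(exclusive_classes, 2):
        for id_a, box_a in objects_by_class[class_a]:
            for id_b, box_b in objects_by_class[class_b]:
                if class_a == class_b and id_a >= id_b:
                    continue  # skip self-comparison and mirrored duplicates
                if boxes_overlap(box_a, box_b):
                    conflicts.append((id_a, id_b))
    return conflicts


objects = {
    "industrial": [("I1", (0, 0, 4, 4))],
    "greenland": [
        ("G1", (3, 3, 6, 6)),      # overlaps the industrial site I1
        ("G2", (10, 10, 12, 12)),
        ("G3", (11, 11, 13, 13)),  # overlaps G2 within the same class
    ],
}
conflicts = find_overlaps(objects, ["industrial", "greenland"])
print(conflicts)  # [('I1', 'G1'), ('G2', 'G3')]
```

Because the number of class pairs grows quadratically, a production test would normally add a spatial index so that only nearby objects are compared.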
Implementing test routines for these rules requires that the GIS software supports spatial analysis tools. Otherwise, the programmer has to write his or her own functionalities using coordinates or topology of the objects. The overlap function needs two object classes as arguments. In a list of object classes which are not allowed to overlap, they are checked in pairs in all combinations, since if object class A does not overlap object class B, and B does not overlap C, object class A may still overlap C. The number of pairs to be checked in a list of n object classes ends up as n(n+1)/2, i.e. it increases with the square of the number of object classes. Therefore it is reasonable to have fast algorithms for overlap checking in order to achieve good performance. With FRACAS this condition can very easily be written as listed below:

    SET3 := {OC1, OC2, …, OCn};
    SET3 must_not_overlap;

The query to the dataset is more complicated even if it is already simplified as listed below:

    RESULT[1] = select OC1 from data_set;
    RESULT[2] = select OC2 from data_set;
    …
    RESULT[n] = select OCn from data_set;
    overlap(RESULT[1], RESULT[1])
    union overlap(RESULT[1], RESULT[2])
    union overlap(RESULT[1], RESULT[3])
    union …
    union overlap(RESULT[n-1], RESULT[n]);

37.5.6 Forming objects or parts of objects A conceptual schema provides rules on how objects and their subclass instances have to be formed. Where does an object start or end? Under which conditions is an object to be divided into several parts? An example of such rules is given in the documentation of the German topographic database “ATKIS” (Authoritative Topographic-Cartographic Information System):

1. A new object has to be built:
   (a) if objects of different object classes are adjacent;
   (b) if the name of a feature changes;
   (c) if the type of geometric representation changes (e.g. from linear to complex);
   (d) if an important attribute value changes;
   (e) at a state border;
   (f) in individual, object-dependent cases.
2. A new part of an object has to be built:
   (a) if a less important attribute value changes;
   (b) at topological nodes.
Figure 37.3: Area and Line Features Must not Intersect (State-) Border Lines
An existing data set can be checked for compliance with the given rules. Most of them are trivial because they cannot be violated due to the implementation of the conceptual schema in the GIS (1a, c) or they depend on the logical schema, e.g. at which object class the attribute declaration is situated (1b, d; 2a). If the “NAME” or another important attribute is defined at the base feature level, it is possible for different parts of the same object to be assigned different values. This violates the rules and leads to inconsistency. This condition can be checked by queries. A data set can also be investigated for objects which overlap a (state-)border (corresponds to rule 1e) and it is possible to search for topological nodes where no new part of an object is built (rule 2b). These rules can be formulated using the overlap functionalities shown in the section “Overlapping of objects” (Figure 37.3). 37.5.7 Topological relations between objects Like features in the real world, digitised objects have a topological relation. Roads run into other roads, rivers flow into other rivers or lakes, or into the sea. To model this, the geometric representations of the objects in the data set must be connected. It is easy to fail to notice small gaps between them. Therefore, most commercial GIS have tools to detect so-called over- or undershoots and sliver polygons. The parameters for these functions are related to the tolerance or accuracy of the data set and must be determined empirically. There is another kind of relation at the crossing of objects, where no intersection exists. This can be modelled by introducing an object, which serves as a separator. At the point where two roads cross there might, for instance, be a bridge. For further analysis it is recommended to have references to the objects underneath or above this separator. These reference attributes must be consistent with the geometric situation and they can be checked automatically.
FRACAS can formulate this condition as follows:

    SET4 := {OC1, OC2, OC3, …};
    SET5 := {OCx, OCy, …};
    SET5 has_reference_on SET4;

For example OC1, OC2, OC3 can be road, river, and railroad, and OCx and OCy can be bridge and tunnel. The corresponding query is rather complicated and it depends highly on the GIS software, therefore only the idea of compiling it to an analysis tool is given in pseudocode below:

    select OCx from data_set where
      intersect(touches(select OCx from data_set, select OC1 from data_set),
                touches(select OCx from data_set, select OC2 from data_set))
      and (OCx.above != OC1.ID or OCx.underneath != OC2.ID or OCx.ID != OC2.above or OCx.ID != OC1.underneath)
      and (OCx.above != OC2.ID or OCx.underneath != OC1.ID or OCx.ID != OC1.above or OCx.ID != OC2.underneath);

37.5.8 Other consistency conditions It is possible to formulate an unlimited number of meaningful conditions. Within this chapter a selection of rules is presented. Only the violation of conditions explicitly paraphrased in the conceptual data schema would be called conceptual inconsistency. Any other condition a user might state for his or her particular demand would be called a plausibility condition, since any occurrence of objects which do not fulfil this condition is not against the conceptual data schema, but is still very likely to be wrong with respect to the situation in the real world. Additional conditions can be formulated with the syntax of FRACAS. If not, the set of operations supported by FRACAS has to be supplemented. 37.6 RESULT SETS AND ERROR LOGS The result of any of these queries is a list of objects which do not conform with the conceptual data schema. This list can be used by the user either as a log for documenting data errors or for editing the data and correcting the inconsistencies. The log is written to a file with the same name as the data set and with the filename extension “log”. It gives a list of every faulty object with its unique object identifier, the object class name and the type of error or the rule it is violating. That list can be used for complaints to the data producer or to check every occurrence in the data set separately. This can be done more easily if the result set is put into a queue, which can then be revised interactively. The log is an essential part of the data and it must be stored together with the data as metadata. The person performing this test, the date of the check and the version of the test program should also be included in the log. This generates a comprehensive record of edits and changes made to the data.
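The log described above could be produced along the following lines. This Python sketch writes a short header (tester, date of check, test program version) followed by one line per faulty object. The field order, the tab-separated layout, and the names `write_error_log` and `log_filename` are assumptions for illustration, not the format used by the author's implementation.

```python
import datetime
import io


def log_filename(dataset):
    """Same name as the data set, with the filename extension 'log'."""
    return f"{dataset}.log"


def write_error_log(stream, dataset, faults, tester, program_version, check_date):
    """Write a header plus one line per faulty object to a text stream."""
    stream.write(f"dataset: {dataset}\n")
    stream.write(f"tested by: {tester}\n")
    stream.write(f"date of check: {check_date.isoformat()}\n")
    stream.write(f"test program version: {program_version}\n")
    # One record per faulty object: identifier, object class, violated rule.
    for object_id, object_class, rule in faults:
        stream.write(f"{object_id}\t{object_class}\t{rule}\n")


# Hypothetical result set from two consistency checks.
faults = [
    ("B2", "bridge", "requires max_load"),
    ("G1", "greenland", "must_not_overlap industrial"),
]
buffer = io.StringIO()  # stands in for open(log_filename("region_a"), "w")
write_error_log(
    buffer, "region_a", faults, "tester_1", "0.1", datetime.date(1997, 1, 15)
)
print(log_filename("region_a"))
print(buffer.getvalue(), end="")
```

Keeping the records in a plain, line-oriented form makes it straightforward to load them into a queue for interactive revision.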
An organisation producing geodata should also make its error detection workflow available to the public. This will increase the reliability of the information gained by analysis of geodata and will promote confidence in those using the data. 37.7 CONCLUSIONS For all types of rules formulated in FRACAS a compiler was written using the UNIX tools “lex” and “yacc”. The target code is meant for the GIS software of INTERGRAPH, MGE Dynamic Analyst (MGDYNAMO). As corresponding conceptual data schema the German project for topographic geodata, ATKIS, is used. Any other quality parameter than consistency can also be tested, if a reference of higher quality is used (e.g. field survey, a more precise data set or a newer data source). The advantages of the method shown in this chapter are: • No secondary geodata is necessary, because the testing refers only to the conceptual data schema.
• FRACAS can be compiled for analysis tools of different GIS software; this is important as long as no standard for spatial extension of SQL is available.
• Users need an easy-to-understand interface for the formulation of rules.
• Users get an error log as quality report.
• Users can interactively correct the faulty objects.
• Testing geodata during the production process of capturing is not only important for the quality management system, but it also increases the reliability of information deduced from the geographical data base.

REFERENCES
CEN/TC 287 1994. Geographic information - Description: Quality. Draft document CEN/TC287/WG2 N15 revision 4, August. Paris: Association Française de Normalisation.
CEN/TC 287 1995a. Geographic information - Reference model. prEN 12009, Brussels: European Committee for Standardization.
CEN/TC 287 1995b. Geographic information - Description: Geometry. prEN 12160, Brussels: European Committee for Standardization.
CHRISMAN, N.R. 1991. The error component in spatial data, in Maguire, D.J., Goodchild, M. and Rhind, D.W. (Eds.), Geographical Information Systems: Principles and Applications, vol. 1. Harlow: Longman Scientific and Technical, pp. 165–174.
GUPTILL, S.C. and MORRISON, J.L. (Eds.) 1995. Elements of Spatial Data Quality. Kidlington, New York, Tokyo: Elsevier Science.
ISO. 1992a. INTERNATIONAL STANDARD ORGANISATION/DIS 8402, Quality Management and Quality Assurance Standards - Vocabulary.
ISO. 1992b. INTERNATIONAL STANDARD ORGANISATION/DIS 9000 Part 2, Quality Management and Quality Assurance Standards; Generic Guideline for the Application of ISO 9001.
ISO. 1992c. INTERNATIONAL STANDARD ORGANISATION/DIS 9004 Part 3, Quality Management and Quality System Elements; Guidelines for Processed Materials.
JOOS, G. 1994. Quality aspects of geo-informations, in Proceedings of EGIS/MARI ‘94, Paris, 29 March–1 April. Utrecht: EGIS Foundation, pp. 1147–1153.
KAINZ, W. 1995. Logical consistency, in Guptill, S.C. and Morrison, J.L. (Eds.), pp. 109–138.
LANTER, D.P. and VEREGIN, H. 1992. A research paradigm for propagating error in layer-based GIS, Photogrammetric Engineering & Remote Sensing, 58(6), pp. 825–833.
VEREGIN, H. 1989. A Taxonomy of Error in Spatial Databases, Technical Paper 89–1. Santa Barbara, California: National Center for Geographic Information and Analysis.
Chapter Thirty Eight Data Quality Assessment and Perspectives for Future Spatial-Temporal Analysis Exemplified Through Erosion Studies in Ghana. Anita Folly
38.1 INTRODUCTION Many countries in sub-Saharan Africa are faced with tremendous environmental problems, especially land degradation, which call for immediate attention. One important tool to address these issues is land use planning which has been defined as “a means of helping decision-makers to decide how to use land: by systematically evaluating land and alternative patterns of land use, choosing that use which meets specified goals, and the drawing up of policies and programmes for the use of land” (Dent, 1988, p. 183). When looking at land degradation problems, it is important to be aware of the interdisciplinary approach needed to solve these complex problems (Reenberg, 1996). It is here that geographical information systems (GIS) prove to be powerful for carrying out spatial analysis where for instance land use mapping can be combined with socio-economic information. When using GIS for planning purposes in some developing countries one is, however, faced with a number of problems. First and foremost data are often not available. This includes maps, census data, etc. If data are available, their quality does not always allow one to carry out the required type of analysis. GIS has the advantage over other methods that information from several data layers can be combined and analysed quickly and efficiently, linked with extensive databases. The end product is often seemingly reliable maps, but the question is whether the output reflects a “true” analysis. The risk of misinterpretation is considerable when data of mixed origin and of varying quality are linked in a geographical database (Jacobi et al., 1994). Therefore it becomes essential to consider the data quality aspect. Data quality has been defined in many ways. According to Ralphs (1993) it is the fitness of a dataset for use in a particular set of circumstances.
With this definition, quality becomes largely application dependent and can be regarded as a question about fitness for use (Dale, 1992; Brassel et al., 1995). What constitutes a high quality dataset for one user therefore may be entirely inappropriate for another (Ralphs, 1993). The most cited element of data quality is “error”, here defined as the difference between reality and our representation of reality (Heuvelink, 1993). The major elements of data quality are lineage, completeness, logical consistency, currency, accessibility and semantic, positional and attribute accuracy (Burrough, 1986; Beard and Mackaness, 1993; Buttenfield, 1993; Jacobi et al., 1994; Morrison, 1995). In this chapter emphasis will be put on attribute accuracy although some aspects of data quality related to lineage and currency will be addressed. An attribute is a fact about some location, set of locations, or a feature on the surface of the earth. In this way the attribute serves to distinguish one location or set of locations from another, or one feature from another
(Goodchild, 1995). The emphasis on attribute accuracy does not mean that other elements of data quality are considered unimportant, but information on these elements in Ghana is scanty.

In many developing countries, government institutions are not capable of providing new data or of updating and improving the existing data material, owing to financial constraints. When using existing data it is therefore important to evaluate them. One also needs to look at how best to provide new information at the lowest cost for present and future analysis. Due to the complex nature of problems in the developing world, it is deemed necessary to use models which can handle a wide spectrum of information, both bio-physical and socio-economic, to ensure optimum decision making for sustainable development.

This chapter discusses problems of data availability and quality with special emphasis on attribute accuracy and makes suggestions on how best to utilise existing data sources within a GIS framework. The ongoing research project uses a methodology incorporating both bio-physical and socio-economic parameters which can be used for land use mapping in an area in Northern Ghana with a generally high risk of erosion. It is demonstrated how GIS can be used in different ways to improve data quality. It is further anticipated that with other tools, such as linear programming, decisions on land use planning can be optimised. The issue of data quality is here discussed together with the spatial/temporal modelling because it is considered to be an integrated part of the GIS.

38.2 THE STATE OF GIS IN GHANA

GIS was introduced in Ghana in the late 1980s, initially in a limited number of isolated projects (Allotey and Gyamfi-Aidoo, 1989). In 1991 Ghana’s National Environmental Action Plan (NEAP) recognised the importance of collecting land resource information and identified inadequate information on land use as a major stumbling block in environmental interventions (EPC, 1991).
This was followed by the Ghana Environmental Resources Management Project (GERMP), which aims at strengthening the capacity of existing national institutions to manage environmental resources. As part of this project, national digital mapping is being undertaken of topography, land use, land suitability, land ownership and climate, together with non-spatial data including the national census as well as information on air and water quality (Allotey and Gyamfi-Aidoo, 1995). Apart from the above-mentioned project, a number of private initiatives are taking place. At the moment, however, only a limited amount of material is available in digital form.

38.3 THE STUDY AREA

The study area (approx. 900 km²) is in the Upper East Region of Ghana, West Africa, which is highly affected by soil erosion. The average rainfall in the region is about 1050 mm yearly but is characterised by considerable annual variations in both total amount and distribution. Soils are generally shallow, low in inherent soil fertility and weak, with a low organic matter content (Quansah, 1990). The area falls within the Guinea savannah zone, characterised by a natural vegetation of savannah woodland, whereas land use mainly comprises compound farming areas, bush fields, pasture, fallow and natural savannah woodland (IFAD, 1990). The proportion of fallow has, however, declined and given way to arable land under permanent or semi-permanent cropping, although the broad pattern of vegetation cover has changed very little over the past 20 years (Agyepong and Kufogbe, 1994). The majority of the population is rural
with population densities varying from 37 to 204 inhabitants per km². This has created an immense pressure on the land, and both seasonal and permanent migration are widespread (IFAD, 1990).

38.4 METHODOLOGY AND DESCRIPTION OF THE MODEL

Assessment of desertification related issues requires a hierarchical approach where components operate at different spatial/temporal scales or levels within the hierarchy and where different modes of assessment are relevant at different levels within the hierarchy. Four different levels of decision making can be identified:

1. the field level, where the farmer is the decision maker;
2. the local level, with detailed data and analysis which can guide specific management interventions;
3. the regional analysis for general policy oriented assessment; and
4. the continental or global evaluations representing a more general level of assessment (Grunblatt et al., 1992; Reenberg, 1996).
This study can be placed in the third category, where decision-makers are presumed to be bodies within the regional administration, including the extension service. According to Reenberg (1996), the selection of spatial scale for analysis may influence conclusions concerning important issues such as land use changes and the use of conservation measures.

The overall methodology used for the land use planning is illustrated in Figure 38.1. Emphasis is put on the erosion risk assessment (ERA), which is a purely bio-physically oriented analysis within GIS (UNIX-based Arc/Info). The analysis is carried out by adding a series of overlays, thereby creating an erosion risk map. When assessing risk of erosion the most widely used model is the Universal Soil Loss Equation (USLE) developed by Wischmeier and Smith (1978). In the USLE the soil loss A (t ha⁻¹) is determined by multiplying the following factors:

• R factor representing the erosivity of the precipitation;
• K factor describing the erodibility of the soil;
• LS factor characterising the length and steepness of the slope;
• P factor being the cropping support factor;
• C factor describing the effects of cover and management on erosion.
It has been widely discussed to what extent this empirically derived model is applicable in West Africa, where validation of the model is sparse and the ranking of the parameters determining it, e.g. erodibility, seems to differ from what has been observed elsewhere (Folly, 1995). The model does, on the other hand, incorporate the most important parameters determining erosion risk and provides a guideline for assessment. In this study the erosivity was considered homogeneous because of the relatively small extent of the study area and was therefore not mapped. The cropping support factor dealing with land management had to be left out due to the scale of analysis and the temporal variation (often on a yearly basis). The assessment of the length and steepness of slope, erodibility and the cover and management factor will be discussed in detail below.
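The multiplicative structure of the USLE, and the simplification adopted in this study, can be illustrated with a minimal Python sketch. It is not the Arc/Info overlay procedure used in the project. The class and function names, the choice of 1.0 as a neutral value for the omitted factors, and all numeric values are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsleFactors:
    """USLE factors for one map cell (all values used below are illustrative)."""
    r: float   # erosivity of the precipitation
    k: float   # erodibility of the soil
    ls: float  # length and steepness of the slope
    c: float   # cover and management
    p: float   # cropping support


def soil_loss(f: UsleFactors) -> float:
    """Return soil loss A as the plain product of the five USLE factors."""
    return f.r * f.k * f.ls * f.c * f.p


def relative_risk(k: float, s: float, c: float) -> float:
    """Relative erosion risk when R is treated as constant and P is left out.

    Both omitted factors are set to 1.0 (an assumption of this sketch), and
    slope steepness alone stands in for the LS factor, so the result ranks
    cells against each other rather than estimating t/ha.
    """
    return soil_loss(UsleFactors(r=1.0, k=k, ls=s, c=c, p=1.0))


if __name__ == "__main__":
    # Hypothetical cells: (erodibility, slope steepness index, C-factor)
    cells = {"cell_a": (0.30, 1.2, 0.40), "cell_b": (0.15, 0.5, 0.10)}
    ranked = sorted(cells.items(), key=lambda kv: relative_risk(*kv[1]), reverse=True)
    for name, (k, s, c) in ranked:
        print(name, round(relative_risk(k, s, c), 4))
```

Because every factor enters as a multiplier, holding R and P constant changes the magnitude of A but not the ranking of cells, which is what an erosion risk map of this kind relies on.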
Figure 38.1: The overall methodology used for the land use planning.
The output from the erosion risk assessment makes it possible to carry out a temporal analysis and thereby to compare the state of degradation with the risk of erosion. In an attempt to incorporate socio-economic parameters, mostly available at a lower scale and often not in map form, the output from the erosion risk assessment (delimiting areas with different risk of erosion) will be used as an input to a linear programming (LP) module (Optimiser). LP is a mathematical technique which has been used since the late 1950s in a wide range of planning situations (de Wit et al., 1980; Shakya and Leuschner, 1990). The main purpose of LP is to optimise (either maximise or minimise) an objective function while respecting a set of constraints. Both the objective function and the constraints must be formulated as linear equations (Chuvieco, 1993). First, the decision variables are defined, here the allocation of different types of land use, with emphasis on the use of certain conservation measures. This is followed by a statement of the objectives of the land-use planning, which could be to increase income (gross margin) per unit area, to secure yield stability, to minimise erosion, etc. Secondly, the
constraints which reduce the number of feasible land use options in a particular area need to be identified. These could be labour shortage in periods of peak labour demand, a restricted market demand, available land (output from the ERA), etc. (Huizing and Bronsveld, 1994). Results from the LP are used as an input to the GIS to create land use scenarios reflecting different policy goals, so that the analysis ends up being spatial.

38.5 AVAILABILITY OF DATA AND DATA QUALITY ASSESSMENT

When looking at the existing spatial data to be used for the three types of information (length and steepness of slope, erodibility and land cover), it appears that the most common sources of error are currency, scale and the relevance of data (Table 38.1).

Table 38.1: Sources of error associated with existing spatial data for length and steepness of slope, erodibility and the cover and management factor (C)

Required information            Existing spatial data        Lineage                     Scale/Currency   Sources of error
Length and steepness of slope   Topographical sheets         Aerial photographs (1960)   1:50000/1965     Scale
Erodibility                     FAO soil map                 Soil map (1:250000)         1:125000/1992    Relevance
                                Local soil map, Adu (1969)   Aerial photographs (1960)   1:250000/1969    Scale
C-factor                        Aerial photographs           Aerial photographs (1960)   1:30000/1960     Currency
                                Landsat TM                   -                           1994/95          Relevance
                                Land use map                 SPOT images                 1991             Scale & Relevance
Regarding the length and steepness of slope, the scale of the topographical maps makes it difficult to assess the length of slope. Using the FAO soil map as an erodibility map poses the question of whether the information contained in the soil type is relevant for an erodibility assessment. An existing soil map from 1969, which describes land suitability and soil types in detail, is at a relatively small scale, making it unsuitable for the required land use planning. Moreover, the soil classification on this map is based on a local system which is not easily comparable with classifications from other areas. Finally, there is a dilemma with respect to the mapping of the cover and management factor C. This factor cannot be derived directly from the satellite images and therefore has to be assessed by determining land cover. This poses questions of relevance and currency of the available data. If a suitable scale is to be satisfied, the available information is outdated, but if the requirement of currency is to be met, the scale becomes a problem because the local compound farming system is characterised by small, irregularly shaped fields that are not easily detected in the satellite images.

One of the biggest problems with socio-economic data is that they lack currency. With respect to the spatial aspects of socio-economic information a problem of scale arises, and most data are not even in map form. The socio-economic information is generally collected at farm and/or village level and the question is how to generalise this information. At the other extreme, information which has been given a spatial dimension, such as population density, is only available on a district basis and does not show the variability within districts.
38.6 INPUT PARAMETERS TO THE MODEL AND THEIR DATA QUALITY

The methods used to evaluate and create spatial databases for the erosion risk assessment will be presented here. Emphasis will be laid on the data quality aspects of erodibility and the cover and management factor C. A brief outline of the linear programming part will also be given.

38.6.1 Length and steepness of slope

In order to obtain information on slope, toposheets at 1:50,000 were digitised and processed using the TIN module, a digital terrain model (DTM) within Arc/Info. Slope length could not reasonably be deduced from the topographical maps because of the scale of analysis (1:50,000), and with contour intervals of 15.2 m it was not possible to detect, for example, local depressions that could accumulate transported sediment. Slope steepness calculated with a DTM may therefore have errors because slope length is used for calculating slope steepness. In this study only slope steepness, which is considered to be by far the more important parameter of the two (Hudson, 1989), was used as an input to the ERA. The resulting map was validated using field estimates of slope steepness.

38.6.2 Erodibility mapping

The concept of erodibility and which parameters reflect erodibility has been debated extensively. Nevertheless it appears that the parameters affecting erodibility of the soil are to some extent site-specific (Bryan et al., 1989; Manrique, 1993; Folly, 1995). Erodibility mapping is often based on existing soil maps where different soil types are assigned an erodibility value derived from either plot experiments or from the nomograph used by Wischmeier and Smith (1978), taking into consideration information on soil texture, organic matter content, soil structure and permeability. Roose and Sarrailh (1989) have, on the other hand, questioned whether soil types do reflect differences in erodibility.
In this study it was therefore tested to what extent an existing soil map classified according to the Revised Legend of the FAO-UNESCO soil map of the world reflects erodibility. The general methodology is outlined in Figure 38.2. A ten by ten kilometre grid was laid out covering four soil types (Luvisol/Lixisols, Leptosols, Plinthosols and Vertisols), which are the dominant soil types in the study area. Through systematic sampling topsoil samples were collected, shear strength was measured and soil surface characteristics were described at an interval of 1 km (100 samples). Within the delimitations of the four soil types, four smaller grids were laid out with a distance between sample points of 60 m (nine sample points for each grid). In the laboratory the following parameters were determined: aggregate stability by wet sieving, organic matter content, content of iron and aluminium bound with organic matter, and the textural composition of the soil.

Analysis of variance (ANOVA) was carried out on all erodibility parameters according to Webster and Oliver (1990). The ANOVA showed a significant difference between the soil types with respect to all the soil parameters except for the indices describing aggregate stability. When ranking the soils with respect to factor K, the most erodible soil types turned out to be Luvisol/Lixisol and Vertisol, followed by Plinthosol and Leptosol. Kriging was carried out on each of the erodibility parameters using the GRID module within Arc/Info. This was done in order to get a visual overview of how the soil properties vary spatially and to be able to
Figure 38.2: Methodology used for assessing the relevance of a soil map as an erodibility map.
compare the kriged maps with the FAO soil map. For the two parameters used for the K-factor, organic matter content and texture, it appeared that the distribution of these parameters varied significantly within each soil type. One particular soil type therefore contained areas characterised by a uniform distribution of the two parameters, and these areas could be seen overlapping the soil boundaries. The same pattern was discovered for the parameters soil strength, shear strength, iron and aluminium, whereas aggregate stability varied more or less randomly over the grid. On this basis one has to be cautious when using existing soil maps as erodibility maps because considerable spatial differences occur within the respective soil types. Because of the limited spatial extent of the grid, the kriging exercise could not be used for an erodibility mapping of the study area but only as an indication of how the soil map and a future erodibility map would differ. This method therefore evaluates the relevance of the existing soil map.

38.6.3 C-factor mapping using a knowledge-based approach

Knowledge-based approaches have been found to be powerful means of correcting unreliabilities in geographic databases (Fisher, 1992; Folly et al., 1996). The C-factor mapping in this study is based on the “post-classification” knowledge-based approach as described by Hutchinson (1982) and Bronsveld et al. (1994). The overall methodology is outlined in Figure 38.3. The image processing part was carried out using WINCHIPS (CopenHagen Image Processing System for WINdows) whereas the post-classification procedure was handled within the Arc/Info framework.
Figure 38.3: The knowledge-based approach for the C-factor mapping.
For the ERA it is of the utmost importance that land cover classes depict differences with respect to the C-factor. This is often difficult because many land cover classes have the same reflection characteristics, which are influenced by the soil reflectance. In this analysis two Landsat TM images from the dry season (April 1994, bands 3, 4 and 5) and the wet season (June 1995, bands 3, 4, 5 and 7) were classified using the maximum likelihood algorithm. The classification of the dry season image was done using training areas based on knowledge about the area acquired through fieldwork. The classes derived from the classification were broad and resulted in a preliminary land cover map with the following classes: water, urban, forest, degraded, intensively and extensively cultivated land. The differentiation between forest, degraded and cultivated land turned out to be good, whereas it proved impossible to identify classes such as fallow, grazing land and different crop types.

The satellite image from the wet season was classified using training points (97 observations) representing field observations collected in 100 by 100 m grids. Each of the observations contained information on the cover percentage in general and the ground cover percentage, number of trees, land use and the dominant vegetation species. Excluded from the classification was an irrigated area, which had been
delimited by screen-digitising. The preliminary land cover map resulting from this classification had the classes: water, fallow, forest, grazing, groundnut and sorghum/millet. Because of the relatively dense vegetation cover at this time of the year, the class forest was difficult to distinguish from the other classes. Due to similar reflectance characteristics the classification did not depict the land cover well.

The preliminary land cover maps were therefore imported into Arc/Info and a post-classification was carried out including information from an FAO soil map (as mentioned in the erodibility section) and the crop calendar from the region. This “re-classification” of the images made it possible to change certain classifications which were obviously wrong and to differentiate partly between grazing areas and agricultural fields, as well as pointing out fields grown with sorghum/millet (inter-cropped) alone. The classification matrix and a verification of the final land cover map are shown in Table 38.2.

Table 38.2: Classification algorithm using satellite images from the dry (1994) and the wet (1995) season respectively. Where more than one option is given for a particular combination of the two satellite images it refers to the use of the FAO soil map as follows: Luvisol/Lixisol, Leptosol, Vertisol, Plinthosol and Fluvisol. S = sorghum, M = millet, G = groundnut. Below, a classification matrix showing the final land cover classification as opposed to the plot observations.
April
June 1995 image
1994
Water Irrigated Forest
Grazing
Fallow
Groundnut
Sorgh/mil
Water Urban
Water
Water Irrigated
Water Forest
Water Grazing
Degrad.
Water
Irrigated
Degrad.
Water S/M/G Grazing Degrad.
Forest Intensive
Water Water
Irrigated Irrigated
Forest S/M/G
Forest Grazing
Forest Groundnut
Water Urban Groundnut Groundnut Grazing Forest S/M/G Grazing Groundnut Grazing S/M/G
Water Urban S/M Groundnut S/M Forest Groundnut S/M
Grazing
Groundnut Grazing G/M/S G/M/S Forest
Groundnut S/M/G
Forest
Graz.
Grou.
S/M/G
S/M
5 1
1 7 1
2 3
1 1 3 6
5
Exten sive
Water
Irrigated
Final Land Cover Map Plots Water Irrig. Water 4 Irrig 2 Urban Degr Forest Graz Grou
Forest
Urban
Groundnut S/M/G S/M/G S/M/G Grazing
Degr.
Grazing
3 5
April
June 1995 image
1994
Water Irrigated Forest
S/M/G S/M
Grazing 1 1
Fallow
Groundnut
Sorgh/mil
3
5 1
20 8
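The post-classification logic summarised in Table 38.2 amounts to a lookup keyed on the dry-season class and the wet-season class, with the soil type used to choose between options where more than one is possible. A minimal Python sketch of such a lookup follows. The rule entries, the function name and the fallback behaviour are hypothetical and are not taken from the table. The actual procedure was carried out in Arc/Info.

```python
from typing import Dict, Optional, Tuple

# Hypothetical rules: (dry-season class, wet-season class) -> final class.
# The key None holds the default; a soil-type key overrides it where the
# soil map decides between options. None of these values come from Table 38.2.
RULES: Dict[Tuple[str, str], Dict[Optional[str], str]] = {
    ("water", "water"): {None: "water"},
    ("forest", "forest"): {None: "forest"},
    ("degraded", "fallow"): {None: "degraded"},
    ("intensive", "grazing"): {None: "sorghum/millet/groundnut", "Leptosol": "grazing"},
    ("extensive", "groundnut"): {None: "groundnut", "Vertisol": "sorghum/millet"},
}


def reclassify(dry: str, wet: str, soil: Optional[str] = None) -> str:
    """Return the final land cover class for one pixel.

    Falls back on the wet-season class when no rule exists for the pair
    (an assumption made for this sketch).
    """
    options = RULES.get((dry, wet))
    if options is None:
        return wet
    # Use the soil-specific option if there is one, else the default.
    return options.get(soil, options[None])


if __name__ == "__main__":
    pixels = [
        ("intensive", "grazing", "Leptosol"),
        ("intensive", "grazing", "Luvisol/Lixisol"),
        ("urban", "fallow", None),
    ]
    for dry, wet, soil in pixels:
        print(dry, wet, soil, "->", reclassify(dry, wet, soil))
```

Keeping the rules in a plain table rather than in nested conditionals makes it straightforward to revise individual combinations after verification against plot observations.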
The final land cover map comprises the classes: irrigated land, urban, degraded land, forest, grazing, groundnut, sorghum/millet and sorghum/millet/groundnut. From the verification of the final land cover map using another set of plot observations it appears that areas with grazing often get classified as sorghum/millet/groundnut. This is because of the similar reflection characteristics, which cannot fully be compensated for by using the information in the soil map. The class grazing is, however, well identified and thereby reduces the area on which land use planning should be focused, because the C-factor for grazing areas is very low. Fields with only sorghum and millet are well demarcated whereas groundnut fields were difficult to identify. Finally, many fields could only be classified as a mixed class (sorghum/millet/groundnut). The problems of differentiating between the various classes are partly due to the rainfall conditions in 1995 (the rains were delayed), which subjected the plants to immense water stress. This method solves the problems of currency and relevance, whereas scale is still a problem because the pixel size of the satellite images makes it difficult to identify individual fields (mixels are therefore registered). In those areas where rice farming is widespread, as is the case outside the study area, the additional use of a slope map is considered to be a way of further improving the classification.

38.6.4 Optimising land use planning: different land use scenarios

In the linear programming three decision-makers with conflicting goals have been identified: the farmers, an NGO/soil conservationist, and a government institution. Non-spatial data collected through questionnaires and interviews will be used as input. This will solve the problem of currency and scale related to the socio-economic data.
The output, together with a GIS analysis using, among other things, distance to road, will identify specific map locations most suitable for using conservation measures (different land use scenarios).

38.7 CONCLUDING REMARKS

In this chapter data quality issues related to soil erosion studies in northern Ghana were addressed with special emphasis on attribute accuracy. The most common sources of error for both the bio-physical and socio-economic data are currency, scale and the relevance of data. For the two input parameters used in the USLE (erodibility and the cover and management factor) GIS was used to assess the data quality. Kriged maps for various erodibility parameters were compared with an existing FAO soil map, showing that although soil types differ significantly, the spatial distribution of the erodibility parameters cuts across soil boundaries. For the land cover mapping used for a C-factor map, a knowledge-based classification incorporating additional information made it possible to differentiate partly between grazing areas and areas dominated by particular crops. Finally, it was discussed how data can be used in a future temporal-spatial analysis for optimising land use planning.
REFERENCES

AGYEPONG, G.T. and KUFOGBE, S.K. 1994. Assessing the Spatial Patterns of Land Cover and Land Degradation in the Upper East Region of Ghana Using Satellite Images. Paper presented at The Inter-University Seminar Workshop On Social Science Research, University of Ghana, Legon, 25–29 July 1994.
ALLOTEY, J.A. and GYAMFI-AIDOO, J. 1989. Potentials of remote sensing and geographic information systems for environmental management applications in Ghana, in Proceedings of the National Seminar/Workshop on Remote Sensing and Geographical Information Systems: Remote Sensing in Ghana, Accra, 20–22 June 1989, pp. 26–28.
ALLOTEY, J.A. and GYAMFI-AIDOO, J. 1995. Environmental information networking in Ghana, in Proceedings of AFRICAGIS’95, Abidjan, 6–10 March. Geneva: OSS-UNITAR, pp. 623–630.
BEARD, K. and MACKANESS, W. 1993. Visual access to data quality in geographic information systems, Cartographica, 30(2,3), pp. 37–45.
BRASSEL, K., BUCHER, F., STEPHAN, E. and VCKOVSKI, A. 1995. Completeness, in Guptill, S.C. and Morrison, J.L. (Eds.), Elements of Spatial Data Quality. Oxford: Elsevier Science, pp. 81–108.
BRONSVELD, K., CHUTIRATTANAPAN, S., PATTANAKAKOLE, B., SUWANWERAKAMTAM, R. and TRAKOLDIT, P. 1994. The use of local knowledge in land use/land cover mapping from satellite images, ITC Journal, 4, pp. 349–358.
BRYAN, R.B., GOVERS, G. and POESEN, J. 1989. The concept of soil erodibility and some problems of assessment and application, CATENA, 16, pp. 393–412.
BURROUGH, P.A. 1986. Principles of Geographical Information Systems for Land Resources Assessment. Monographs on Soil and Resources Survey, 12. Oxford: Clarendon Press.
BUTTENFIELD, B.P. 1993. Representing data quality, Cartographica, 30(2,3), pp. 1–7.
CHUVIECO, E. 1993. Integration of linear programming and GIS for land-use modelling, International Journal of Geographical Information Systems, 7(1), pp. 71–83.
DALE, P. 1992. Data quality, Official Publication, OEEPE (European Organization for Experimental Photogrammetric Research), 28, pp. 125–132.
DENT, D. 1988. Guidelines for Land Use Planning. Fifth draft. Rome: FAO.
DE WIT, C.T., VAN KEULEN, H., SELIGMAN, N.G. and SPHARIM, I. 1980. Application of interactive multiple goal programming techniques for analysis and planning of regional agricultural development, Agricultural Systems, 26, pp. 211–230.
EPC 1991. Ghana Environmental Action Plan (1). Ghana: Environmental Protection Council.
FISHER, P.F. 1992. Knowledge-based approaches to determining and correcting areas of unreliability in geographic databases, in Goodchild, M. and Gopal, S. (Eds.), Accuracy of Spatial Databases. London: Taylor & Francis, pp. 45–54.
FOLLY, A. 1995. Estimation of erodibility in the savanna ecosystem, northern Ghana, Communications in Soil Science and Plant Analysis, 25(5,6), pp. 799–812.
FOLLY, A., BRONSVELD, M.C. and CLAVAUX, M. 1996. A knowledge-based approach for C-factor mapping in Spain using Landsat TM and GIS, International Journal of Remote Sensing, 17(12), pp. 2401–2415.
GOODCHILD, M.F. 1995. Attribute accuracy, in Guptill, S.C. and Morrison, J.L. (Eds.), Elements of Spatial Data Quality. Oxford: Elsevier Science, pp. 59–80.
GRUNBLATT, J., OTTICHILO, W.K. and SINANGE, R.K. 1992. A GIS approach to desertification assessment and mapping, Journal of Arid Environments, 23, pp. 81–102.
HEUVELINK, G.B.M. 1993. Error Propagation in Quantitative Spatial Modelling: Applications in Geographical Information Systems. Utrecht: Netherlands Geographical Studies.
HUDSON, N. 1989. Soil Conservation. Batsford, England.
HUIZING, H. and BRONSVELD, K. 1994. Interactive multiple-goal analysis for land use planning, ITC Journal, 4, pp. 366–373.
HUTCHINSON, C.F. 1982. Techniques for combining Landsat and ancillary data for digital classification improvement, Photogrammetric Engineering and Remote Sensing, 48, pp. 123–130.
IFAD 1990. Upper East Region Land Conservation and Smallholder Rehabilitation Project. Appraisal Report. Vol. II Working Papers. Africa Division. Project Management Department, Report No. 0244-GH, November 1990. Accra, Ghana: IFAD.
JACOBI, O., BRANDE-LAURIDSEN, O. and BUCK 1994. Datakvalitet og ajourføring, in Balstrøm, T., Jacobi, O. and Sørensen, E.M. (Eds.), GIS i Danmark. Denmark: Teknisk Forlag, pp. 175–183.
MANRIQUE, L.A. 1993. Technology for soil erosion assessment in the tropics: a review, Communications in Soil Science and Plant Analysis, 24(9,10), pp. 1033–1064.
MORRISON, J.L. 1995. Spatial data quality, in Guptill, S.C. and Morrison, J.L. (Eds.), Elements of Spatial Data Quality. Oxford: Elsevier Science, pp. 1–12.
QUANSAH, C. 1990. Soil erosion and conservation in the northern and upper regions of Ghana, Topics in Applied Resource Management, 2, pp. 135–157.
RALPHS, M. 1993. Data quality issues and GIS: a discussion, Mapping Awareness and GIS in Europe, 7(7), pp. 39–41.
REENBERG, A. 1996. A hierarchical approach to land use and sustainable agricultural systems in the Sahel, Quarterly Journal of International Agriculture, 35(1), pp. 63–77.
ROOSE, E. and SARRAILH, J. 1989. Erodibilité de quelques sols tropicaux. Vingt années de mesure en parcelles d’érosion sous pluies naturelles, Cahiers ORSTOM, Série Pédologie, XXV(1 and 2), pp. 7–30.
SHAKYA, K.M. and LEUSCHNER, W.A. 1990. A multiple objective land use planning model for Nepalese hills farms, Agricultural Systems, 34, pp. 133–149.
WEBSTER, R. and OLIVER, M.A. 1990. Statistical Methods in Soil and Land Resource Survey. Oxford: Oxford University Press.
WISCHMEIER, W.H. and SMITH, D.D. 1978. Predicting Rainfall Erosion Losses: A Guide to Conservation Planning, Agriculture Handbook No. 537. Washington DC: US Department of Agriculture.
Part Six VISUALISATION AND INTERFACES
Chapter Thirty Nine Visual Reasoning: the Link Between User Interfaces and Map Visualisation for Geographic Information Systems Keith Clarke
39.1 INTRODUCTION

Two topics of heightened interest to geographic information scientists as GIS software reaches maturity are scientific visualisation and computer user interfaces. These areas of software development, with origins in computer graphics and in software engineering respectively, have now begun to merge their functions within GIS queries and displays. At first sight, they appear to occupy opposite ends of the spectrum of human-computer interaction. In the former case, a new branch of scientific inquiry has evolved in which the scientist views and interacts with the data in all its complexity directly, rather than with abstractions, generalisations or statistical descriptions (Freidhoff and Benzon, 1989; Keller and Keller, 1993). In the latter, the direct visualisation of data has required major improvements in the flexibility of the tools available for active manipulation of the visualisations themselves. This interactive capability is an assumed capability of the typical visualisation system and increasingly in the computer mapping and GIS environment (Peterson, 1995). Visualisation and user interfaces for GIS have significant common ground in that both depend upon visual reasoning, a particularly spatial human cognitive capability that lends GIS its strength of approach.

This discussion examines the components of visualisation systems and user interfaces that are indispensable to the next generation of GIS software, and covers some of the more innovative couplings of the two elements in existing software systems. Examination of the existing definitions of both visualisation and user interfaces in particular shows that GIS and cartographic research have seen visualisation in perhaps too broad a scientific and human cognitive sense, and less in a way that would benefit the software engineering approach to system design.
A narrower definition of the visual information flow during GIS use is presented, that borrows and builds upon the phenomenon of visual reasoning. This approach is then used to examine some anticipated elements of future GIS, and to speculate on some future visual data transfer mechanisms of interest to geographic information science. In addition, a test for the existence of visual reasoning is suggested, so that computer modelling can move forward by simplifying the visual understanding process. Comprehension of the entire process is probably beyond our capability at this time. The concept of visual reasoning can be used to build possible unified approaches to the study of GIS interactions, and an initial definition of visualisation is presented based on a binary model of visual data processing. Binary visual processing simply detects whether information flows take place at all, rather than attempting to quantify or examine human cognitive aspects of the information content and meaning. In short, computer simulation modelling of information flow and information use as an automated process
VISUAL REASONING FOR GIS
circumvents the problems associated with the quantification of individual and collective human cognition, learning, and ability. 39.2 SCIENTIFIC VISUALISATION User interface design and software for scientific visualisation have evolved largely through software engineering and developments in computer graphics hardware and software. GIS’s visual information transfer has evolved from the discipline of cartography, and most recent scholarly work has been within the cartographic design and GIS literature (Hearnshaw and Unwin, 1994; MacEachren and Taylor, 1994; Raper, 1989). In spite of the developments in cartographic visualisation, there remains a fundamental difference between computer graphics and cartographic symbolisation. The accepted methods of cartographic representation have only just recently begun to incorporate issues long dealt with in computer graphics such as three and four dimensional visualisation, representation of complex multivariate and dynamic phenomena, and the representation of ambiguous and missing information. While cartography and GIS are relative newcomers to the human-machine interface, computer science has a long tradition, based in the needs of specific disciplines such as aeronautics, space flight, medical imaging, CAD/CAM, and military systems. Given this heritage, it is natural to ask the following questions. What is different about the human-machine interface in a GIS setting? Can the difference be used to assist in the definitional aspects of both visualisation systems and user interfaces? Is there a difference between “Geographic” visualisation and the GIS user interface and those of other types of systems? If GIS is indeed different, can its uniquely spatial viewpoint expand the knowledge base of these two fields? These questions remain as rich opportunities for future research. A first step in the direction they imply is taken in this review and discussion. 
In general, it is possible to characterise the GIS experience from a human-machine interaction viewpoint as a history of overselling. Most GIS have rather simplistic user interfaces given the level of sophistication they offer the analyst. And while most GIS can make quite pretty maps, few if any allow the use of even the most basic elements of visualisation systems, and none offer help to the novice user in this area. On the other hand, developments in computer graphics and extensions to the hypertext transfer protocol for the World Wide Web such as the virtual reality modelling language are already close to standardising the user interface for three-dimensional interaction with virtual solid objects. Globus and Raible (1995) raise many of the same points that cartographers have brought up over the years. Thoughtless use of simplification and generalisation methods, choosing a convenient view angle or disguising errors, and the frequent use of representational methods and symbolisation techniques in an inappropriate manner are the very basis of effective misuse of maps (Monmonier, 1991). An understanding of map mistakes is obviously an important component of an interactive map design process. By continuously improving map effectiveness, the design loop can be used to ensure map efficiency (Clarke, 1995; Dent, 1993). Clearly, visual reasoning is a driving force behind the design loop, as not just errors in visual design and execution are corrected, but the map moves towards a design that “looks right”. Incremental improvement of map effectiveness through map design implies that the cartographic interaction between map and map user is a quantifiable process. If so, how do we measure a map’s visual effectiveness? The subjective nature of the process of interaction makes measurement complex and multivariate in and of itself, depending necessarily upon the individual learning levels of the information recipients and their cartographic experience. 
A rational approach to this sort of quantification would be to
start simple. Binary detection of information flows and computer modelling of flows is proposed rather than seeking the “quantity of information” as an initial measure. Building an approach to the understanding of both user interfaces and visualisation within GIS supposes not only the existence of visual reasoning, that is reasoning almost exclusively by seeing and thinking alone, but also that elements of this reasoning process be accessible to the scientific method. The study of such thinking might be called analytical visual reasoning. Of the means of communication, verbal, written, symbolic and visual, visual reasoning resembles most closely the symbolic reasoning of abstract mathematics. It seems ironic, therefore, that scientific convention requires its intellectual investigation using written text and spoken conference presentations, although at least in the latter it is possible to show graphics and demonstrate interactions. This discussion of visual reasoning first examines the relationship between cartography, GIS and scientific visualisation. It is then extended to user interfaces, and some opportunities for more GIS contributions to theory and practice are pursued. 39.3 VISUALISATION FOR CARTOGRAPHY AND GIS While it can clearly be argued that virtually all of cartography is scientific visualisation, the opposite is certainly not the case. A review of Keller and Keller’s extensive survey of visualisation applications shows that only perhaps a third of them are cartographic (Keller and Keller, 1993). While definitions of the map have been proposed that are galactic in scope (e.g. Hall, 1991), most academic cartographers limit maps to a display medium with only two dimensions. In a few cases, stereo displays, plastic and plaster models, globes and holograms have given a truly three-dimensional view in mapping. 
More usually, when concerned with a four-dimensional representation we use multiple displays or animation; with three dimensions we use map projection, upper surface only or field display, or a projected view such as a fish net or realistic perspective. Buckley (Chapter 40, this volume) gives a broad characterisation of the types of maps used within visualisations. Some examples are shown in Figure 39.1, and include multiple views, multidimensional symbols, flow vectors, streamers and cross sections with transparency. Tools for generating visualisations have been reviewed and compared by Belien and Leenders (1996) and in MacEachren and Taylor (1994). A typology of representational methods is contained in an appendix in Keller and Keller (1993). Maps are also generalisations of the mapped world in that they replace objects with symbols correctly located in space at a reduced cartographic scale. For multivariate data, cartography has traditionally used statistical generalisation (e.g., image maps of principal components of multispectral data) or intricate symbols at points, such as icons, Chernoff faces and, recently, morphed pictorial faces (Figure 39.2). No existing GIS or computer mapping package gives the user any of the basic, let alone the more advanced, visualisation capabilities. Among the first cartographers to examine scientific visualisation from a GIS perspective was DiBiase (MacEachren and Taylor, 1994). DiBiase proposed a framework for visualisation, in fact a whole new subfield of visualisation systems termed geographic visualisation information systems (GVIS). It is easy to make the association between GIS and scientific visualisation systems such as Data Explorer, yet in reality no existing GIS incorporates scientific visualisation capability. Even basic three-dimensional methods, such as perspective rendering, have only recently migrated to a few of the GIS packages. 
Nevertheless, GVIS includes all aspects of science from data exploration to hypothesis formulation to final results, indeed a scope as broad as Goodchild’s (1992) geographic information science itself. In reality,
Figure 39.1: Some Methods Used in Cartographic Scientific Visualisation. Top left: perspective view with illumination (Source: NASA Goddard). Top right: multiple views (Source: EPA). Below left: field colours and animation (Source: EPA). Below right: translucent and opaque perspective views (Source: NASA Goddard).
scientific visualisation capabilities can only ever comprise a definite subset of GIS functionality, to be placed firmly in the output stage of GIS functions. DiBiase, in collaboration with MacEachren, has proposed that maps foster private thinking and facilitate public visual communication of results. That is, the map serves both the role of a device to think about spatial data by a non-communicating individual and the means by which to communicate the same thought to an audience (Figure 39.3). DiBiase saw GVIS as a “renewed way of looking at one application of cartography (as a research tool)”. This approach would imply agreement with Steven Hall’s implied definition of the map. In Mapping the Next Millennium, Hall (1991) included virtually all of scientific visualisation, geophysics, remote sensing, cartography, microbiology, physics and astronomy dealing with space as “mapping”. Taylor has taken a somewhat more selective definition of scientific visualisation with GIS, and has recognised the critical historical link to computer graphics. Taylor defined scientific visualisation as “a field of computer graphics that attempts to address both analytical and communication issues of visual representation” (MacEachren and Taylor, 1994). Taylor’s definition, which reasonably assumes that GIS visualisation is a subset of all scientific visualisation, reflects an old division within cartography. Cartography always has balanced science and art. These polar opposites or schools of cartography have become known as analytical cartography and design cartography. Analytical cartography (Clarke, 1995; Moellering, 1991; Tobler, 1979) considers the necessary data, computation, algorithms, modelling, and analysis that go with the process of visual reasoning. 
Communication and design cartography consider colour, balance, communications effectiveness, design and aesthetics as the direct components of visual reasoning, and as invokers of a response from the human visual system (Dent, 1993; MacEachren, 1994; Peterson, 1995). Both parts of cartography use scientific methods and apply data to generate solutions. While many have espoused one extreme or the other, it is reasonable to ask if there can ever be a “unified” approach. From
Figure 39.2: Morphed Face Images as an Election Map Visualisation
Figure 39.3: DiBiase’s conceptual model of visualisation as a tool for scientific research (Source DiBiase, 1994)
the cartographic theory perspective, such a cartographic paradigm could be called a cartographic theory of everything, like the grand unification theory of Physics. Can a unified approach to cartography, within the context of GIS, generate knowledge of use in interpreting the relatively new areas of user interface and visualisation studies? Such an approach would cover the process, the form and the response of visual reasoning. Computer graphics defines scientific visualisation as the “use of the human visual processing system assisted by computer graphics, as a means for the direct analysis and interpretation of information” (Freidhoff and Benzon, 1989). Scientific visualisation seeks to use the processing power of the human mind, coupled
with the imaging and display capabilities of sophisticated computer graphics systems, to seek out empirical patterns and relationships visible in data but beyond the powers of detection using standard statistical and descriptive methods (Clarke, 1997). The advantage of this more succinct definition is that it suggests a method to determine when visual reasoning is being used. Visual reasoning must be in operation when a spatial pattern can be detected in spite of the fact that statistical testing and non-graphic communication have been prevented. 39.4 SOME RESEARCH ISSUES Placing scientific visualisation in GIS within the information theoretic approach has some advantages. While in the past the paradigm proved to be of less value in cartography than in communications theory, it does permit some interesting issues and themes to be raised. For example, how does information content move through both the software visualisation system and the human visual system? How does learning (visual and non-visual) influence the human visualisation system? Is visual learning analogous to storing many images or part of images and rapidly matching them against the current input for pattern detection? Can analytical methods trace the information flow in scientific visualisation? What tools, methods, and measurements are appropriate? If these questions have answers, then it may be possible to apply the analytical cartographic method to visual human reasoning, a sort of analytical visualisation. Cartography and GIS are good testing grounds for studying human reasoning because formalisation of mapping methods is quite complete. Work by Bertin (1983), Robinson and Petchenik (1976) and Imhof (1982), for example, has already placed into context work that Tufte (1983, 1990) has since expanded to scientific information display. 
In addition, the scope for testing information flow within the map to map-user data transformation is broad, with some pioneering work already in place in cartography and other disciplines such as human cognition and psychology. Science questions that could arise out of the study of cartographic or GIS visual reasoning are intriguing. For example, are some people better at visual map reasoning than others? If so, then different types of visual display may suit different people, implying that customisable displays, which fully integrate visualisation with the user interface, are desirable. Are males better at visual reasoning than females, as some psychological work has suggested? Is visual information processing performed in parallel? Is there value in studying why some people are visually illiterate, since in these people the visual system works but the visual reasoning process does not? How do visual and non-visual reasoning compare? Are they compatible? Are they sometimes or always used together? Gleick’s biography of the Nobel-prize winning physicist Richard Feynman notes that at least one of his major accomplishments involved generating a new method of visualising in quantum mechanics that led to a significant mathematical breakthrough (Gleick, 1992). Given this clear link between visualisation and scientific thinking, there is a very strong possibility that improvements in our understanding of visual reasoning will have payoffs far beyond GIS or even geography. 39.5 DETECTION BEFORE MEASUREMENT While the distant research questions and issues of analytical reasoning hold an exciting promise, more practical constraints exist in demonstrating the approach. An appropriate starting point is not to measure information flows but simply to ask, does a flow take place at all? This might be called a binary analysis of
visual information. Information theory defines information as the part of a message placed there by the sender and not known by the receiver. The traditional approach to data is that data contains information, that is, there is some distillation of data that is the critical subset that constitutes information. The simple information theory definition of information can be interpreted to have two corollaries. First, that a single data item in a data set may be all of the information if the remainder is already known to the receiver, what might be called the “today’s news” information component. Second, that information must be a part of the message. The latter is complex. For example, if a message is expected, and none arrives, has information content been passed? The definition implies no. Yet if learning has taken place, a non-message can pass information. Learning, then, could be defined as the expectation of information with a particular structure detected from prior messages. A visual reasoning example is informative. At the National Severe Storms Center, skilled weather interpreters monitor satellite images of clouds and other atmospheric data hoping to detect extreme conditions such as hurricanes and tornadoes, with the goal of warning the public of danger. A particular, though complex, set of circumstances must exist for such a weather system to develop, and the interpreters must be using visual reasoning to scan the large amount of incoming data for known and expected patterns. A storm develops and an interpreter calls a storm warning on the basis of a single radar image. Detection has taken place, and so the radar image must contain all of the information necessary to detect the storm. Nevertheless, a layman would not have detected the event, because no learned expected image base was present in their visual reasoning system. But how is it possible to prove that the reasoning is purely visual? 
Can the skilled interpreter make the detection without consulting written reports, numbers, and without speaking or hearing? Learning has always been a central component in map reading and in photo interpretation. Learning by visual reasoning must obviously involve storing images or parts of images with expected patterns. If this is the case, if we stored the radar image of every known severe storm as an image on the Internet, is this memory, and by implication learning? With an automated image matching system, could the layman have detected a storm? The immensity of these issues requires a fundamental definition of graphic data. Yet in GIS, data can have a huge number of formats, structures, models, and probably meanings. A fundamental definition must first reduce all GIS data to a sequential bit stream. If information is the part of a message placed there by the sender and not known by the receiver, then a graphic message must be defined at the most basic level as a bit stream of known length and structure that contains all or part of a map. Metadata has to be included as part of a map in this case, just as sources and attributions must be text elements of a printed map. Two different characteristics of the map message are content and amount. Amount is the numbers of bits, content the bits themselves. If the map is a compound object, say many vectors rather than a raster image, then storage is necessary to “see” the map in entirety. Disk storage for a raster image might be considered essential for visual display and reasoning, just as memory and learning were necessary in the example above. A diagrammatic simplification of the flow is shown in Figure 39.4. The binary nature of information flow can then be used to define visualisation. 
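The bit-stream view of a map message described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `MapMessage`, `unravel`, `reassemble` and `information_flows` are hypothetical, the map is assumed to be a binary raster, the row and column counts stand in for the "known length and structure" (a very reduced form of metadata), and row-major order is one arbitrary choice of sequencing.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MapMessage:
    """A graphic message: a bit stream of known length and structure."""
    rows: int                # structure: raster height (assumed metadata)
    cols: int                # structure: raster width (assumed metadata)
    bits: Tuple[int, ...]    # content: the bits themselves

    @property
    def amount(self) -> int:
        """Amount is simply the number of bits in the message."""
        return len(self.bits)


def unravel(raster: List[List[int]]) -> MapMessage:
    """Reduce a binary raster to a sequential bit stream (row-major order)."""
    rows = len(raster)
    cols = len(raster[0]) if rows else 0
    bits = tuple(1 if cell else 0 for row in raster for cell in row)
    return MapMessage(rows, cols, bits)


def reassemble(message: MapMessage) -> List[List[int]]:
    """Rebuild the raster so the whole map can be 'seen' at once."""
    return [
        list(message.bits[r * message.cols:(r + 1) * message.cols])
        for r in range(message.rows)
    ]


def information_flows(message: MapMessage,
                      known: Optional[MapMessage]) -> bool:
    """Binary test: does the message hold anything the receiver lacks?

    This does not try to quantify information; it only reports whether
    the incoming content differs from what the receiver already holds.
    """
    if message.amount == 0:
        return False
    if known is None:
        return True
    return message != known


raster = [[0, 1], [1, 1]]
msg = unravel(raster)
print(msg.amount, msg.bits)               # 4 (0, 1, 1, 1)
print(information_flows(msg, None))       # True: nothing known beforehand
print(information_flows(msg, msg))        # False: yesterday's news
```

The sketch deliberately answers only the yes-or-no question of whether a flow occurs, and leaves the measurement of information quantity aside.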
If information flows within the message transmission, the message information content is non-zero (visual information has been communicated) and if pattern detection is not possible using non-visual methods, then the process is visualisation. A natural extension of a binary flow to two dimensions is possible if we simply unravel the image, or allow packets of binary arrays (each the size of one image) to flow. Map visualisation is the successful transmission of map data so that detection of pattern in the mind of the interpreter without non-graphic information results. The television, the World Wide Web, and the fax machine are excellent examples of transmission mechanisms for visual information. Visual reasoning takes
Figure 39.4: Information Flow as a Binary Sequence
place, however, when they download an image in which the receiver detects an expected (not an unexpected) spatial pattern. GIS scientific visualisation consists then of the entire system, message, transmission and reception systems, and the reasoning, memory, and learning of the receiver. An example of such a system is the GIS hardware, software, the user and all the user’s prior GIS knowledge. Most, or even all, of these systems components can be modelled, and therefore the information flow should be able to be quantified. Nevertheless, the simplest form of model deals only with the binary image information flow and its detection (Figure 39.5). Critical here is the computed match between images. Obviously exact matches will be rare, and a fuzzy definition of a match may be more desirable. Finding the best match algorithms for binary raster images seems comparatively simple. Those for vector maps, and for matching maps at different scales and with varying spatial extents will be more complicated. A binary view of the message transmission shows that two elements must act in concert for scientific visualisation to be effective. First, the nature of the information flow must be exclusively visual for detection. Second, the transmission mechanism, including the hardware and software must act with the cognition system enhanced by visual learning as a distinctive way of thinking, that is visual reasoning. Figure 39.6 shows three different cognitive “triggers” to communicate the detection of a cat: one text, one symbolic and one purely visual. Maps share elements of the latter two, and involve far more complex visual reasoning than this example. In all three cases, detection of the information is possible, so the information content is identical using the binary model. 
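The detect-and-match model of Figure 39.5 is simple enough to simulate. The Python sketch below is one possible reading, not a definitive implementation. It assumes binary rasters of one-pixel-wide lines, treats a line pixel as an intersection when its eight neighbours contain at least three separate runs of line pixels (one interpretation of "non-sequential" neighbours), and uses a Jaccard overlap with an arbitrary threshold as the fuzzy definition of a match. All function names and the threshold value are hypothetical.

```python
from typing import List, Set, Tuple

Image = List[List[int]]
Cell = Tuple[int, int]

# The eight neighbours listed clockwise, so consecutive entries are adjacent.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def is_line(image: Image, r: int, c: int) -> bool:
    """True if (r, c) lies inside the image and is a line pixel."""
    return 0 <= r < len(image) and 0 <= c < len(image[0]) and image[r][c] == 1


def line_intersections(image: Image) -> Set[Cell]:
    """Locate line pixels with three or more non-adjacent line neighbours."""
    found = set()
    for r, row in enumerate(image):
        for c, cell in enumerate(row):
            if cell != 1:
                continue
            ring = [is_line(image, r + dr, c + dc) for dr, dc in RING]
            # Count separate runs of line pixels around the circular ring:
            # a run starts wherever a line pixel follows a non-line pixel.
            branches = sum(1 for i in range(8) if ring[i] and not ring[i - 1])
            if branches >= 3:
                found.add((r, c))
    return found


def match_score(a: Set[Cell], b: Set[Cell]) -> float:
    """Jaccard overlap of two feature sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def detect(stored: Image, incoming: Image, threshold: float = 0.8) -> bool:
    """Binary detection: do the incoming features match the stored ones?"""
    score = match_score(line_intersections(stored),
                        line_intersections(incoming))
    return score >= threshold


plus = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
print(line_intersections(plus))   # {(2, 2)}
print(detect(plus, plus))         # True
```

Only the yes-or-no outcome of `detect` matters for the binary model; the score itself is an internal convenience. Thick lines, vector maps and maps at differing scales or extents would need considerably more elaborate matching.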
The advantage of binary information models is that computer simulations of the human task are comparatively simple to build and test, and may therefore shed light on the real problems of understanding visualisation. For example, suppose a computer program recognised binary images of lines and was programmed to detect pixels that had at least three non-sequential eight cell neighbours that were also lines. This algorithm could create a new image showing the location of these features (line intersections), and store it. Given another image, the program could respond with “Six line intersections detected at the same locations as the previous image”. If the program took the form of a black box, was fed only images and was able to provide information of use to a sightless or blindfolded user, then the black box would be
Figure 39.5: Binary Visual Reasoning by Detection
Figure 39.6: Three Visual Representations of a Cat: a picture, an icon, and a word All use visual detection, but only the picture and the icon use visual reasoning. A map lies on a continuum between the icon and the picture.
performing visual reasoning by itself. Other examples could be detecting specific parts of a picture (a cat), a configuration of features on a map, a type of drainage system by its shape, and so forth. Of interest, however, is the speed with which the process of detection is enacted, a process probably measurable. Cartographic testing tasks have normally involved only retention of information quantity or specific content. Only occasionally have reasoning tasks been compared for efficiency by purely graphic testing mechanisms. A map testing equivalent could compare assigned tasks under different route finding methods, perhaps written or spoken turn directions versus sketch maps and road maps in an automated virtual system. Determining how the different mechanisms for reasoning are used together is a more difficult task, but could involve combinations of the tests above, with mixed information sources. Quite obviously, the use of formal tools can enhance visual reasoning since they can detect and show relationships. Many experiments in cartographic systems have combined visual and statistical representations, regression lines and statistics with their spatial distributions in an interactive “brushing” environment, (e.g. Monmonier, 1990). As the information transaction takes place from sender to receiver, it is likely that building more and more complex messages and transmission mechanisms, with higher bandwidth, can overload or confuse the receiver’s
cognition systems quite easily. This level can also probably be determined easily by experiment. Nevertheless, the information quantity and saturation level can probably be increased by using feedback, allowing the message transmission mechanism to become two way, so that information content and quantity can be adjusted so that detection can take place, and visual reasoning take over. This is the benefit of linking user interfaces directly to scientific visualisation systems in a geographic context. In cartography, the well-documented design loop is a feedback system between the software system and the cognition system. Most cartographic design packages, though very few GIS packages, allow this interactive process to make visualisation more effective. Design loops are slowly being expanded beyond those which simply move and resize cartographic elements, or change classes, shades, and colours, to include interactive animations, fly-throughs, and fly-bys. As in the normal sequence of software progression, these tasks have been highly batch-oriented in the past. New user interfaces are necessary to make such processing possible, for example three-dimensional cursors, new and expanded metaphors, and even new control and display hardware such as stereo displays, headsets and data gloves. 39.6 VISUALISATION AND THE FUTURE OF GIS The power of scientific visualisation is the ability to model complex, very large data sets and to seek the inherent inter-data relationships by visual processing alone, or with the assistance of standard empirical and modelling methods. Visualisation for GIS demands spatial data, implying the usual control over scale, resolution, edge-matching, and cartographic symbolisation. GIS and scientific visualisation software have so far functioned as stand alone elements with data sharing by common file formats. 
GIS should move towards a full integration of a selected set of tools and techniques of scientific visualisation, and has much to gain by doing this. Research to accomplish this merger needs both perceptual and analytical approaches. The merger will be only partial, because there are some major differences between scientific visualisation systems and GISs. The major differences are:
1. The complexity of the information in the database (not the amount);
2. The use of visual reasoning versus reasoning about spatial phenomena with software tools (better included within geographic information science);
3. Visualisation inherently includes perceived motion and the third dimension;
4. Visualisation data need not be about geographic space or scales; and
5. Visualisation systems need not support an attribute data base.
The possibilities of animated and interactive cartography are remarkable, and will strongly influence the future of GIS, especially as the computing power and tools necessary for animation become cheaper and more widespread. Animation has a particularly powerful role to play in showing time sequences in GIS applications. It is likely that animation will be one of the first parts of scientific visualisation systems to find its way into a next generation GIS. Animation also raises some interesting problems for binary visual tests. Slow motion and speeded action have strong parallels to different scaled insets on static maps. The multiple views method in cartography, illustrated by Buckley (Chapter 40, this volume), implies or at least gives the impression of time as discrete images. Time itself is discrete only as far as the visual reasoning system is concerned. The eye can be fooled into seeing continuous motion with images displayed at durations of one-thirtieth of a second, but information is obviously taken in much faster, at very high rates indeed. Can time gaps be interpolated in the same
averaging method used for spatial variation? And should geographic information scientists care? These are significant issues to be addressed as visualisation meets GIS. 39.7 USER INTERFACES FOR CARTOGRAPHY AND GIS The user interface of a GIS is a critical point, where the human geographic analyst meets the hardware and software components of the GIS in a two way transmission. From a pure computing and information science perspective, a user interface is the physical means of communication between a human and software program or operating system. At the most basic, this is the exchange of statements in English or another textual language or in a program-like set of commands. From the GIS viewpoint, the interaction involves textual, numerical, statistically descriptive, and most important of all, graphic communication. The history of computer software interfaces is somewhat parallel to the history of GIS user interfaces. Initially, computers lacked effective user interfaces entirely, and programs were coded or hard-wired into the system. These programs remain today as part of the interaction process embedded as firmware in computers and peripherals. For interactive use, computer interfaces evolved slowly through command-line interactions, to hard-coded simple graphic menus, to the now-ubiquitous WIMP (windows, icons, menus and pointers) graphical user interfaces. The advent of object-oriented software tools, and their use to define the principal elements of the WIMP interface as objects has greatly expanded the functionality and capability of generic and GIS user interfaces. Some of the consequences are discussed at more length in Clarke (1997), and have included the increase of high end systems power, increased use of inexpensive but high power microcomputers, and a remarkable degree of mobility as mobile communications technology, the global positioning systems, and mass storage technologies have become commonplace. 
Future graphical user interfaces for GIS will challenge the ubiquitous desktop metaphor for interaction. In the desktop, the metaphor of the desk, with a roll file, a telephone, a calendar, an in- and out-tray and so on, is used to control file, data and computational operations. This metaphor has some obvious limitations for GIS routines, not the least being the complexity of the display and respond process. In addition, there is a need to support multiple views, networks, sound, animation, and so on as systems move towards multimedia and the Internet. Two interesting glimpses of future systems are offered by Magic Cap, the pen-based GUI used in personal digital assistant computers, and by the Geographer’s Desktop. Magic Cap broadens the desktop metaphor to allow some movement away from the desk. In Egenhofer’s Geographer’s Desktop (Egenhofer and Richards, 1993), the metaphor of the map layer as a mylar separate, complete with light tables for viewing, has been expanded to include simple analytical operations. These examples are discussed in Clarke (1997). Some sophisticated user interface innovations have evolved directly from scientific visualisation packages. The Khoros package’s Cantata visual icon-based operation control system has much in common with IBM’s Data Explorer and ERDAS Imagine’s GIS Modeler. The concept is that the processing flow within an operation, which might have been described as a sequential macro in a command line environment, can be coded visually within a simple diagramming sketch that links data, process and display by linking icons symbolically. These interfaces could easily be programmed with a “binary detection” operator. Other intriguing user interface elements include research in computer science at Georgia Tech into three-dimensional cursors, magic cubes and display sensors. In the latter case, objects that can show multivariate
information by visualisation methods such as balls, traces, colours, etc., are placed into the graphic by the user for the purpose of allowing the user to detect process or form in time and space.

39.8 SIMILARITIES

From the discussion, there are clearly many similarities between the user interface and visualisation systems within the context of GIS software. Developments in both fields will quite distinctively influence future GIS innovations, as GIS user interfaces become more user friendly and elements of scientific visualisation increasingly find their way into GIS. In the years ahead, geographic information scientists, in addition to those in computer science, cartography, and software engineering, will make important contributions to research in these areas. A review of the chapters of this book, or the program for a typical GIS research meeting, already confirms this fact. The similarities between the user interface and visualisation, however, beg a far more interdisciplinary approach to these fields than cartography or GIS alone has to this date brought. Clearly, the look and effectiveness of the GIS display and the user’s satisfaction with the two-way software to human user interface are both at the very core of expanding the scope of GIS as an approach to science and as a technology, both in the private realm as vehicles for spatial thinking, and in the public realm as communications media. Both fields, along with GIS have been for some years at the very cutting-edge of software engineering, and so have a lure for scholars interested in anticipating the future of GIS. A common approach to research in this area would seem a fruitful ground for innovative software solutions. Many display and GUI interactions are clearly determined by the visual reasoning under way. As such, placing binary image detection operators into the display and the GUI would allow many of the distinctions between the two to be dropped.
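As an illustration only, a binary detection operator of this kind can be sketched in a few lines of Python. The sketch assumes the display has already been reduced to a binary raster (1 where a cell meets the preset constraints, 0 elsewhere). The function name `largest_region_bbox`, the 4-connectivity rule, and the sample grid are assumptions made for the example and do not come from any GIS package discussed in this chapter. The operator finds the largest connected block of qualifying cells and reports the bounding box that a window could zoom to.

```python
from collections import deque


def largest_region_bbox(mask):
    """Find the largest 4-connected region of truthy cells in a binary raster.

    Returns (cell_count, (row_min, col_min, row_max, col_max)),
    or None when no cell qualifies. Hypothetical helper for illustration.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    best = None
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited qualifying cell.
            queue = deque([(r, c)])
            seen[r][c] = True
            count = 0
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                count += 1
                r0, c0 = min(r0, y), min(c0, x)
                r1, c1 = max(r1, y), max(c1, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if best is None or count > best[0]:
                best = (count, (r0, c0, r1, c1))
    return best


# Assumed sample: 1 marks cells left available after a suitability overlay.
suitable = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0],
]
result = largest_region_bbox(suitable)
```

For the sample grid the call returns six cells and a bounding box running from row 1, column 2 to row 3, column 4. A working system would still have to convert those raster indices to map coordinates before redrawing the view.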
For example, a GUI window could always detect and zoom onto a feature meeting certain preset constraints. In a GIS, for example, the largest space left available during a site suitability overlay procedure could be located and shown at a more detailed scale with additional information, a sort of intelligent inset. Or a colour could be used to highlight automatically features matching the shape and configuration of streams in a satellite image. The GIS need not be doing perfect visual reasoning to provide assistance to the user. Such first generation analytical visual reasoning systems might be called GIS visual assistants. While much research remains before a new generation of GIS can evolve, there exist some fascinating future prospects for where visually assisted GIS could lead science. Shepard (1994) has argued that “GIS users would benefit in many ways from being able to exchange information with the computer using multisensory GIS”. A multisensory GIS allows two-way user communication using all of the available senses. Traditionally, GIS has relied virtually entirely on visual reasoning. However, Shepard has pointed out that the other four senses, taste, touch, smell, and sound will be included in the user interaction far more in the era of multimedia GIS. The data exploration now undertaken using scientific visualisation and visual reasoning, could be supplemented by exploring the data by sound or feel. Shepard goes further to guess at future systems. The high degree of mobility of computer systems, developments in virtual imaging and portable communications, and the ever decreasing size of GPS hardware, make GIS systems that are worn a reasonable expectation. 
These systems, with head-worn displays perhaps on glasses or inside helmets, sound and touch input and output will probably first be worn by spies and the military, but could be of immense use in emergency response, and police enforcement; and multisensory output could greatly increase the use and value of geographic information systems and geographic information science.
While a GIS with output based entirely on smell seems ridiculous, one that reads smells and chemically analyses and interprets them, and suggests their likely origins based on GIS data, could be of great analytical value. Theoretically, understanding the full range of human sensory interactions with space seems an underdetermined problem. Practically, simple computational and algorithmic visual models can generate new generations of slightly intelligent GIS. It is perhaps time that geographic information science concentrates less on problems of deep spatial cognition, and more on the technical and other impediments that prevent these futuristic ideas from becoming practical and beneficial intelligent assistants to support our existing imperfect systems.

REFERENCES

BELIEN, S. and LEENDERS, R. 1996. Comparison of Visualization Techniques and Packages. Version 2, with Applications to Plasma Physics. http://www.sara.nl/Rik/Report.html
BERTIN, J. 1983. Semiology of Graphics, translated by W.J. Berg. Madison, WI: University of Wisconsin Press.
CLARKE, K.C. 1995. Analytical and Computer Cartography, 2nd ed. Upper Saddle River, NJ: Prentice Hall.
CLARKE, K.C. 1997. Getting Started with Geographic Information Systems. Upper Saddle River, NJ: Prentice Hall.
DENT, B.D. 1993. Cartography: Thematic Map Design, 3rd ed. Dubuque, IO: Wm. C. Brown.
DIBIASE, D. et al. 1994. Multivariate display in geographic data: applications in earth system science, in MacEachren, A.M. and Fraser Taylor, D.R. (Eds.), Visualization in Modern Cartography. New York: Elsevier.
EGENHOFER, M.J. and RICHARDS, J.R. 1993. Exploratory access to geographic data based on the map-overlay metaphor, Journal of Visual Languages and Computing, 4 (2), pp. 105–125.
FRIEDHOFF, R.M. and BENZON, W. 1989. Visualization: The Second Computer Revolution. New York: H.A. Abrams.
GLEICK, J. 1992. Genius: The Life and Science of Richard Feynman. New York: Vintage.
GLOBUS, A. and RAIBLE, E. 1995. 14 ways to say nothing with scientific visualization, IEEE Computer, 27 (7), pp. 86–88. See also http://www.nas.nasa.gov/NAS/TechReports/RNRreports/aglob 006.html
GOODCHILD, M.F. 1992. Geographical information science, International Journal of Geographical Information Systems, 6 (1), pp. 31–45.
HALL, S.S. 1991. Mapping the Next Millennium: The Discovery of New Geographies. New York: Random House.
HEARNSHAW, H.M. and UNWIN, D.J. (Eds.) 1994. Visualization in Geographical Information Systems. New York: Wiley.
IMHOF, E. 1982. Cartographic Relief Presentation, translated by H.J. Steward. Berlin: de Gruyter.
KELLER, P.R. and KELLER, M.M. 1993. Visual Cues: Practical Data Visualization. Los Alamitos, CA: IEEE Computer Society Press.
MACEACHREN, A.M. 1994. Some Truth with Maps: A Primer on Symbolization and Design. Washington, DC: Association of American Geographers Resource Publications in Geography.
MACEACHREN, A.M. and TAYLOR, D.R. 1994. Visualization in Modern Cartography. New York: Elsevier.
MOELLERING, H. (Ed.) 1991. Analytical Cartography, Cartography and Geographic Information Systems, 18 (1), Special issue on Analytical Cartography.
MONMONIER, M. 1990. Strategies for the interactive exploration of geographic correlation, in Brassel, K. and Kishimoto (Eds.) Proceedings, 4th International Symposium on Spatial Data Handling, Zurich, vol. 2, pp. 512–521.
MONMONIER, M. 1991. How to Lie with Maps. Chicago: University of Chicago Press.
PETERSON, M.P. 1995. Interactive and Animated Cartography. Upper Saddle River, NJ: Prentice Hall.
RAPER, J. (Ed.) 1989. Three Dimensional Applications in Geographical Information Systems. New York: Taylor & Francis.
ROBINSON, A.H. and PETCHENIK, B. 1976. The Nature of Maps: Essays Towards Understanding Maps and Mapping. Chicago: University of Chicago Press.
SHEPARD, I.D.H. 1994. Multi-sensory GIS: mapping out the research frontier, in Waugh, T.C. and Healey, R.G. (Eds.) Advances in GIS Research, Proceedings of the Sixth International Symposium on Spatial Data Handling. London: Taylor & Francis, vol. 1, pp. 356–390.
TOBLER, W.R. 1979. Analytical cartography, The American Cartographer, 1 (4), pp. 21–31.
TUFTE, E.R. 1983. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.
TUFTE, E.R. 1990. Envisioning Information. Cheshire, CT: Graphics Press.
Chapter Forty
Visualisation of Multivariate Geographic Data for Exploration
Aileen Buckley
40.1 INTRODUCTION

Vision is the most powerful and efficient human information-processing mechanism. Visualisation enhances the human visual system by using the computer to produce images that were previously difficult or impossible to generate. These images can provide the user with previously unimagined displays from previously unattainable perspectives of previously nonvisual phenomena (Friedhoff and Benzon, 1989). This new technology is revolutionising the way data are being viewed and interpreted in such diverse fields as medicine, psychology, mathematics, geosciences, astrophysics, engineering, art, architecture, communication, and entertainment (Friedhoff and Benzon, 1989; McCormick et al., 1987). Geographers are also embracing visualisation, often coupling it with geographic information systems (GIS), to produce high-tech graphics for spatial problem solving in what is called “geographic visualisation”. Geographic or cartographic visualisation is one tool that can be used to process, explore, and analyse the vast volumes of digital spatial data now available. The stores of digital spatial data have increased dramatically in the past few decades due largely to the collection of remotely-sensed imagery, conversion of maps from paper (analogue) to digital form, and creation of new data through the manipulation and analysis of existing data. Computerised methods for dealing with these data are constantly being introduced and updated, and exploration and analysis of these digital data would be impossible without computers. Visualisation may well be “the second computer revolution” (Friedhoff and Benzon, 1989), providing scientists with methods to recast data into displays from which hypotheses, ideas, theories, and confirmation can be derived. With these methods, large volumes of map information can be merged, manipulated, reassembled, and displayed specific to the geographic problem of interest.
The resulting visual representation can provide insight that would have been difficult or impossible to obtain without the use of computers. Certain characteristics of visual representations can be exploited to explore data in more depth. These characteristics include dimensionality (i.e., the number of data dimensions), abstraction (as opposed to realism), and the number of variables displayed (i.e., multivariate information). Solving complex spatial problems often requires that some or all of these characteristics be manipulated in geographic data representations. This chapter reviews progress on visualisation within the field of geography. It describes the terminology related to visualisation in geographic and general scientific inquiry, the state-of-the-art methods for the visualisation of complex geographic phenomena, and the contributions of the ancient art and science of cartography to these modern developments in spatial data display. Primary emphasis is placed on the use of
visualisation for exploring rather than storing or presenting geographic information, which has been the traditional role of maps. Additionally, this chapter focuses on methods for displaying multiple variables in geographic visualisation, thereby allowing for more in-depth exploration of the data. Deliberately integrating the more traditional works of cartographic lineage with the most advanced computer technology, this chapter introduces the reader to a variety of visualisation methods to display multivariate spatial data.

40.2 TERMINOLOGY

40.2.1 Visualisation

Like a telescope or microscope, visualisation is a tool, external to the brain, that can be used to enhance the extraordinary power of visual thinking. By combining technologies such as image processing, computer graphics, computer-aided design, animation, simulation, and holography, computers can restructure a problem so more of it can be processed by the human visual system. Some believe that the biggest contribution of visualisation to the process of scientific thinking is liberating the brain from the fundamental activity of information retrieval and manipulation required to produce an image, thereby allowing the brain to devote its time and energy to higher levels of analysis and synthesis (Friedhoff and Benzon, 1989; McCormick et al., 1987). While most scientists would agree that “visualisation” has become an acceptable scientific practice, they often disagree on its definition. Terms such as visualisation (with an “s”), visualization (with a “z”), ViSC (visualisation in scientific computing), visual analysis, visual representation, and visual display are often confused and confusing.
In its 1987 commissioned report to the United States National Science Foundation (US NSF), the Panel on Graphics, Image Processing, and Workstations defined visualisation as “a method of computing…a tool both for interpreting image data fed into a computer, and for generating images from complex multi-dimensional data sets…,” the goal of which is “…to leverage existing scientific methods by providing new insight through visual methods” (McCormick et al., 1987, p. 3). Some scientists counter that visualisation is not only a method of computing. MacEachren et al. (1992, p. 101) argue that “visualisation…is definitely not restricted to a method of computing…it is first and foremost an act of cognition, a human ability to develop mental representations that allow us to identify patterns and create or impose order.” Visvalingham (1994) distinguishes visualisation (with an “s”) as the mental process of prompting visual images from visualization (with a “z”) as the sophisticated computing technology to create visual displays that facilitate thinking and problem solving. ViSC is the discipline concerned with both human and computer mechanisms to “perceive, use, and communicate visual information” (McCormick et al., 1987, p. 3). As Visvalingham (1994) suggests, it may be impossible to distinguish between the computing process, the cognitive process, and the created product, especially when the activity is highly interactive and exploration-oriented. Often, the image generated on the computer screen mirrors the imaginative thinking process and new mental images lead to new computer images. Once visualisation is externalised as a visual representation (i.e., the visual depiction of data and/or mental images), the distinction between process and product is easier to make. Although visual representations can be expressed in any medium, visual displays are electronic and usually temporary and easily modifiable (Visvalingham, 1994).
Currently, electronic scientific visualisation is limited to display on the computer screen. The visual displays of tomorrow, however, may be able to break the cathode ray tube (CRT) bonds as technology
Figure 40.1. MacEachren’s (1994) conceptual model of map use space. Map use, which can vary according to the characteristics along each of the three axes, involves both visualization and communication. Visualization, or prompting visual thinking, involves highly interactive map use for revealing unknowns by one or few individuals. Communication is a less interactive form of map use for presenting known facts to a wider public.
continues to advance in holography and virtual reality. Additionally, large format wall displays, akin to military “war boards”, are under development (Clarke, personal communication). Interactive wallboards, holographic projections, and virtual reality will help free visualisation from limitations of the output device, and data will no longer need to be reduced or eliminated to fit into a confined display space.

40.2.2 Geographic Visualisation

A subset of scientific visualisation, geographic visualisation is defined by MacEachren et al. (1992, p. 101) as “the use of concrete visual representations—whether on paper or through computer displays or other media—to make spatial contexts and problems visible, so as to engage the most powerful human information-processing abilities, those associated with vision.” The form of visual representation most often used by geographers to display spatial information is the map. In terms of map use (Figure 40.1), visualisation can be described as a highly interactive process involving one or few people to examine unknowns in the data (MacEachren, 1994).

40.2.3 Exploration

Data exploration is a key part of the scientific endeavour. Exploratory data analysis was formalised by the statistician Tukey (1977) who describes arithmetic methods for “looking at data to see what it seems to say” (p. v). The methods are intended to make problem solving that involves large bodies of data easier by making description simpler, and looking one layer deeper than the previously described surface. “The greatest value of a picture”, he states, “is when it forces us to notice what we never expected to see” (p. vi). Tukey’s work pointed to the importance of exploratory data analysis as the precursor to confirmatory data analysis and has since been adopted by most fields of scientific inquiry, including geography. The value of
Figure 40.2. DiBiase’s (1990) conceptual model of visualization as a tool for scientific research. For data exploration, maps facilitate visual thinking. For data presentation, maps communicate spatial information.
visualisation was reiterated to the scientific community in the 1987 NSF report, and scientists have since turned towards visual data exploration with renewed vigour (see Bucher, Chapter 29, and Seixas, Chapter 25, this volume). Exploratory visualisation is a creative process that leads to the construction of new ideas, hypotheses, and questions. DiBiase (1990) suggests that exploration is at one end in a continuum of visualisation use (Figure 40.2). At this end, visualisation is a private endeavour by an individual as he or she investigates data relationships, formulates new ideas, generates new questions, develops new hypotheses, or finds answers. At the other end of the continuum, visualisation is used for public communication to convince a larger audience of already-formulated ideas or conclusions. Whereas problem solving involves previous knowledge of a defined task, data exploration often involves multiple, changing, or ill-defined tasks. In the process of data exploration, experiments are done for the sake of experimenting, not hypothesis testing. With computers, data can be arranged and rearranged, and data representations can be edited and updated immediately and interactively. Experimentation or exploration in visualisation leads to more displays, more hypotheses, and eventually more technology. Simple curiosity rather than defined tasks (as in experimentation or modelling) can lead to scientific insight or discovery.

40.2.4 Multivariate

In general, multivariate data refers to a number of independent mathematical or statistical variables. In geography, multivariate mapping refers to the display of multiple themes or variables of interest (e.g., soil, slope, and aspect) over the framework of a locational basemap (Robinson et al., 1995). In addition to multiple themes, multivariate spatial data might be considered to include multiple data structures (e.g., raster, vector), extents (i.e., areas of interest), and resolutions (i.e., finest levels of detail).
In this chapter,
multivariate geographic data refers to a spatial data set that has more than one theme, resolution, extent, and/or data structure. Most GIS databases would fit this description. Visualisation of multivariate geographic data refers to the ability to see multiple views into the data set, either through multiple windows or multiple attributes (i.e., theme, resolution, extent, data structure). For example, under this definition, visualisation of one theme at various resolutions for one extent or visualisation of various themes at one resolution for various extents would both qualify as multivariate data visualisation.

40.3 FRAMEWORK FOR VISUALISING MULTIVARIATE GEOGRAPHIC DATA

Data visualisation in general scientific inquiry can take many forms, such as graphs, charts, tables, plots, figures, and images. The form most often used to visualise spatial data is the map, which, in addition to showing the quantity and/or type of data, also shows the spatial distribution of data. Maps can show one or more types of data, but the more data shown, the more complex (and perhaps confusing) the map. Multivariate mapping is a challenging form of data display because the spatial distribution must be preserved (taking up two dimensions, x and y) and multiple variables must be depicted (in other dimensions or superimposed on the x, y space). At the same time, readability must be preserved and complexity must be held in check. Methods for displaying (often large amounts and different kinds of) geographic data clearly and concisely have been formalised in cartography. This formalisation is based on graphic elements and the graphic variables used to symbolise them. Combinations of graphic elements and graphic variables are related to levels of measurement, and the effectiveness of graphic variables, given the graphic elements and levels of measurement, can be evaluated.
40.3.1 Graphic Elements

Graphic elements “constitute the primitive building blocks of pictorial representation” (Robinson et al., 1995, p. 318). The basic graphic elements are points, lines, and areas. Points are x, y coordinate locations that depict position, lines are arrays of points that depict position and direction, and areas are two-dimensional arrays of points that depict position, direction, and extent. These three graphic elements can be used to construct both discrete units and continuous surfaces. Continuous surfaces are generally represented in GIS by a raster data structure in which pixels (picture elements) are combined to create a matrix of the total areal extent. Each pixel can be thought of (and symbolised) as a small area. Together, these small areal symbols represent the complete surface display.

40.3.2 Graphic Variables

Systematic adjustment of the graphic properties of graphic elements is used to symbolise the data. Symbolisation is achieved by manipulating graphic variables. The list of graphic variables, which originally included only visual variables, has been expanded throughout the years to include aural and tactile variables, as well as dynamic variables (Table 40.1). Graphic variables were first proposed by Bertin (1981) who identified seven classes of visual variables (position, size, value, texture, colour, orientation, and shape). Improvements to his classification are
described by Robinson et al. (1995) who categorise visual variables as either primary or secondary. The primary visual variables include shape, size, orientation, hue (colour, such as red, green, or blue), value (lightness or darkness of colour), and chroma (saturation of colour). Repetitions of the basic graphic elements (i.e., combinations of points, lines, and areas) form patterns, and variations of the visual variables of the combined graphic elements form pattern visual variables. Derived from the primary visual variables and the basic graphic elements, pattern arrangement (shape of the elements), pattern texture (size of the elements), and pattern orientation (directional arrangement of the elements) are considered secondary visual variables.

Table 40.1. Graphic variables used to symbolize data. The list of seven visual variables originally proposed by Bertin has since been expanded to include variables for other senses (sound and touch) and for dynamic displays. Whether variables in italics should be included may be debated.
A notable distinction between the classifications of Robinson et al. (1995) and Bertin (1981) is the omission of Bertin’s “position” visual variable by Robinson et al. (Table 40.1). Since the position of a geographic feature does not involve symbolisation (although it does involve generalisation), it cannot be varied (i.e., it cannot be a variable). If location of the graphic element in the visual display is determined by its relative location in geographic space, then position is not a “graphic variable” (i.e., it cannot be “changed” to “symbolise” some value) and perhaps should not be included in the list of graphic variables. In the context of visualising data uncertainty, MacEachren (1995, p. 276) proposed three additional visual variables that relate to clarity (Table 40.1). “Crispness” defocuses an element using selective spatial filtering of edges, fill, or both to adjust the visible detail of a map. “Transparency” is depicted as a “fog” whose hue and value are proportional to the values of the attribute of interest. “Resolution” refers to the spatial precision of the base information and can be translated as the grid size for raster data or the level of simplification for vector data. Spatial resolution is a function of cartographic generalisation, not symbolisation; therefore, like position, it may be argued that it should be eliminated from the list of graphic variables. Based on the idea that stimuli can be transposed from one sense to another (Smets and Overbeeke, 1989), variables for senses other than sight have also been proposed (Table 40.1). These variables can be used to
develop maps for the visually impaired or to augment visual representations. Nine aural variables have been proposed by Krygier (1994). These are location (position of sound in two- or three-dimensional space), loudness (magnitude of sound), pitch (frequency of sound expressed as highness or lowness), register (location of pitch within a range), timbre (general character of a sound), duration (length of time a sound is heard), rate of change (variations between sound and silence over time), order (sequence of sounds), and attack/decay (time required for a sound to reach maximum or minimum loudness). Vasconcellos (1992) proposed seven tactile variables, three of which are directly comparable to Bertin’s original set of visual variables. Size, shape, and orientation are similar to their corresponding visual variables; elevation is substituted for hue, pattern texture is substituted for value, and pattern arrangement is substituted for chroma. Volume, the substitute for position, should perhaps, for the same reasons as position, not be included in the list. Three dynamic variables originally suggested by DiBiase et al. (1992) were later expanded by MacEachren (1995) to a total of six (Table 40.1). These variables are used to incorporate change in visualisation, allowing for animated or dynamic maps. The six dynamic variables are display date (time the display is initiated), duration (length of time each scene is displayed), order (the sequence of scenes), rate of change (magnitude of change between scenes), frequency (number of scenes per unit of time), and synchronisation (correspondence between time series of different data sets). It should be noted that the graphic variables can sometimes be combined. For example, Peterson (1995) selects from the visual and aural variables in Table 40.1 to produce his own list of eight animation variables (position, size, shape, speed, viewpoint, distance, scene, and a combined variable called texture/pattern/shading/colour).
Whether used solely or in concert, all graphic variables should be related to the level of measurement of the data they symbolise. Inappropriate use of the variables with regard to the level of measurement will diminish or eliminate their effectiveness for symbolisation.

40.3.3 Levels of Measurement

Most cartographic literature generally accepts that there are four basic levels of data measurement: nominal, ordinal, interval, and ratio. This classification is based on the work of Stevens (1946), who proposed four scales of measurement and described the statistics permissible for each. Nominal data only use numbers for names of classes; no quantitative measure is implied. Permissible statistics include counts of the number of cases in each class, modes, and contingency correlation. Ordinal data are ranked or ordered according to some quantitative measure. In addition to the statistics applicable to nominal data, medians or percentiles can be computed. Interval data are quantitative data with a zero point determined by convention or convenience. As well as the statistics for ordinal data, means, standard deviations, rank-order correlation, and product-moment correlation can be calculated. Ratio data are also quantitative, but an absolute zero is always implied. Permissible statistics include the coefficient of variation, as well as the statistical measures for all other levels of measurement. For symbolisation, data can be reduced to three levels of measurement: nominal, ordinal, and interval/ratio (Table 40.1). Although interval and ratio data are different in terms of their level of measurement, they can be combined for symbolisation since techniques for one apply to the other.
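The cumulative structure of the four levels, as summarised in the preceding paragraph, can be written down as a small lookup. The following Python sketch is illustrative only: the names `LEVELS`, `ADDED_STATISTICS`, `permissible_statistics`, and `symbolisation_level` are hypothetical, and the statistics listed are just those named in the text rather than an exhaustive catalogue.

```python
# Levels of measurement in increasing order of information content.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Statistics that first become permissible at each level, as listed in the text.
ADDED_STATISTICS = {
    "nominal": ["count per class", "mode", "contingency correlation"],
    "ordinal": ["median", "percentiles"],
    "interval": [
        "mean",
        "standard deviation",
        "rank-order correlation",
        "product-moment correlation",
    ],
    "ratio": ["coefficient of variation"],
}


def permissible_statistics(level):
    """Return every statistic allowed at `level`, including lower levels."""
    index = LEVELS.index(level)  # raises ValueError for an unknown level
    allowed = []
    for name in LEVELS[: index + 1]:
        allowed.extend(ADDED_STATISTICS[name])
    return allowed


def symbolisation_level(level):
    """Collapse the four levels into the three used for symbolisation."""
    if level not in LEVELS:
        raise ValueError(f"unknown level of measurement: {level!r}")
    return "interval/ratio" if level in ("interval", "ratio") else level


ordinal_stats = permissible_statistics("ordinal")
ratio_stats = permissible_statistics("ratio")
```

Under these assumptions, `ordinal_stats` contains the nominal statistics plus medians and percentiles but not the mean, while `symbolisation_level("interval")` and `symbolisation_level("ratio")` return the same combined class.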
40.3.4 Effectiveness of Graphic Variables

The effectiveness of a particular graphic variable for displaying data with a particular level of measurement (Table 40.1) can generally be evaluated empirically. Cartographers have amassed a large volume of literature on the effectiveness of some graphic variables; however, others have yet to be tested empirically. Current consensus on effectiveness of the graphic variables based on empirical evidence or hypothesis is shown in Table 40.1 but will not be further elucidated in this chapter.

40.4 MAP USE AND MAP USERS

If geographic visualisation is related to map use, as DiBiase (1990) and MacEachren (1994) suggest, then it is also related to the map user. The utility of visual representations for exploration is dependent upon not only how well they are created but also how well they are comprehended.

40.4.1 Complexity

Simple data representations may be useful in all the phases of map use suggested by DiBiase (1990), from exploration, through confirmation and synthesis, to communication (Figure 40.2). More complex representations may be particularly useful in exploration, allowing users to ask more complex questions and see more complex patterns. Complexity can be increased with a corresponding increase in the number of dimensions, the level of abstraction, and the number of variables (Figure 40.3). Increasing the complexity of the visual representation may compromise its readability; therefore, the two must be kept in balance.

40.4.2 Readability

Readability of a map is related to both comparability and distinguishability (DiBiase et al., 1994). These are defined as the ease of identifying correspondence between data sets (comparability) and the ease of detecting distinctions between data sets (distinguishability) (DiBiase et al., 1994). Readability can be compromised when symbols become too diverse in form or too densely distributed (Robinson et al., 1995).
Complex representations (i.e., multivariate, multi-dimensional, abstract displays) are particularly prone to these drawbacks. Readability depends in part upon the appropriate use of symbol variables, but it also depends upon the expertise of the user.

40.4.3 Expertise of Users

Currently, evaluation of scientific visualisation in relation to its users draws on the expert-novice paradigm in cognitive psychological studies. This paradigm places emphasis on “the amount of domain-specific prior knowledge which is brought to a task” (McGuinness, 1994, p. 186). This information is used to determine the level of expertise of the users, ranging from expert (for those who have an extensive amount of “domain-specific” prior knowledge) to novice (for those who do not).
Figure 40.3. Three characteristics of visualization. The level of abstraction, number of variables, and number of dimensions in a visual display can be increased to allow for “deeper” exploration of the data. DiBiase’s visualization continuum may be overlaid in this space showing that increasing complexity may improve use of the visual display for data exploration; however, complexity may compromise readability if user expertise is not adequate.
Use of the term “domain-specific” is not always clear or consistent. In some studies, “domain-specific” refers to the level of understanding of the content of the information (i.e., the type of phenomena represented). In other studies, the term relates to the ability and aptitude of the user to understand maps and other spatial representations. In addition, “domain-specific” is sometimes used to describe the ability of the user to interact with the visualisation method (e.g., level of computer expertise) (McGuinness, 1994). I suggest that, for visualisation, expertise should be evaluated based on four criteria that correspond to the ability and aptitude of the user to understand:
1. the nature of data (e.g., measurement, sampling, distribution);
2. the content of the data (i.e., the subject matter, such as climate data, soils data, census data);
3. the visual format (e.g., maps, graphs); and
4. the media (e.g., computer hardware and software).
Together, these will influence the ability of the user to understand and use visualisation for exploration. For more complex visual representations to be used and to be useful, expertise in some or all of these areas is required.
40.5 TECHNIQUES FOR MULTIVARIATE GEOGRAPHIC VISUALISATION Cartographers were interested in developing methods for the display of multivariate data even before the advent of computers. Traditional cartographic methods for multivariate data representation have been seized and sometimes improved upon by the visualisation community (e.g., Cleveland, 1993; Tufte, 1990), and new methods that are possible only through the use of the computer have recently been introduced. The techniques described below represent a synthesis of traditional cartographic methods, many of which are described in a chapter entitled “Multivariate Mapping and Modelling” in Elements of Cartography (Robinson et al., 1995), and recent advancements in computer mapping technology, such as those described by DiBiase et al. (1994). Some of the techniques can be combined, further increasing the number of variables that can be visualised, but there must be a balance between complexity and readability. Although cartographers have been trained always to strive for this balance, visualisation initiates with no mapping or graphic background may soon find themselves the creators of complex, unreadable, and therefore useless displays. Cautionary notes for each technique are included in the hope that this cartographic knowledge will be transferred to the next generation of spatial data display artists. Although the following section includes only those methods that can be used to visualise geographic data, these techniques can be used for non-spatial data if the x and y axes are substituted with variables other than location. 40.5.1 Superimposition of Features One method for displaying multivariate data is to superimpose themes using different graphic variables. Changes in the values of the attributes are shown by changing the properties of the graphic variables. 
For example, meteorological maps often show barometric pressure as thin isolines, different types of storm fronts as lines with different patterns, the jet stream as thick isolines, and cloud masses as shaded areas. This technique is conceptually simple, and it is useful for displaying a few variables (two or three). However, readability decreases as the number of variables increases. Although useful for inspecting individual distributions (DiBiase et al., 1994), it is difficult to convey the relative importance of the various themes (Robinson et al., 1995). 40.5.2 Segmented Symbols There are two methods for using segmented symbols. One method segments or divides symbols to map the variables of interest. Most often, point symbols, such as pie charts, are divided to show proportion and are then placed on a locational base map. The other method displays multiple graphic variables in a single symbol, sometimes referred to as a “glyph”. Chernoff faces, which were originally designed to portray up to 18 variables and have since been expanded to show 36, are a fascinating example of glyphs (Dorling, 1994). The premise for Chernoff faces is that the human eye is able to distinguish, easily and simultaneously, many of the facial features that form a facial expression (Dorling, 1994). Different features are used to represent different variables (e.g., a variable each for the eyes, ears, nose, mouth, eyebrows, and shape of the head) to create a cartoon face depicting multiple variables. These techniques are good for inspection of individual variables, but it may be difficult to estimate and compare proportions, especially if there are many different graphic variables (e.g., size, shape, orientation).
In addition, the visual “field effect” of nearby symbols can alter perception of a symbol (Robinson et al., 1995). To overcome these limitations, it is best to use symbols that the user is already able to interpret easily (e.g., pie charts and cartoon faces). 40.5.3 Cross-variable Mapping Bivariate mapping is used to “simultaneously depict magnitude of variables within a homogeneous area for two map themes” (Robinson et al., 1995, p. 390). Trivariate mapping is used to show three variables in the same way. Although black-and-white schemes, such as Carstensen’s (1986) unclassed areal texture maps, have been proposed, most methods use colour (i.e., variation in hue and value) to depict variation in the value of variables. Brewer’s (1994) colour scheme typology identifies methods for selecting appropriate colours for both bivariate and trivariate maps with various levels of measurement. Because the number of classes the human eye can distinguish is limited, cross-variable mapping is generally restricted to combinations of either two or three variables. Appropriate colour selection is important for map readability, and even then, trivariate mapping may be interpretable only by more expert users (Robinson et al., 1995). 40.5.4 Composite Indices Also called cartographic modelling or composite variable mapping, composite indices are created when several data variables are combined into one. Multiple variables are generalised by statistically collapsing spatial data into fewer variables using combinations of links (+, –, *, /) or multivariate techniques (e.g., principal components analysis, cluster analysis, discriminant analysis, canonical correlation analysis). The variables can be weighted, as can the links between the variables. This technique is good for distinguishing patterns between variables (DiBiase et al., 1994). 
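As a rough illustration of a composite index built with weighted additive links, the following Python sketch combines two variables into a single value per areal unit. It is only one possible reading of the technique: the variable names, the data values, the weights, and the choice to standardise each variable to z-scores before combining are all assumptions made for the example, not part of the method as described in the sources cited above.

```python
from statistics import mean, pstdev


def standardise(values):
    """Convert raw values to z-scores so that variables measured in
    different units can be combined (an assumption of this sketch)."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def composite_index(variables, weights):
    """Weighted additive composite: one index value per areal unit.

    variables: dict of name -> list of values, one per areal unit (same order)
    weights:   dict of name -> weight applied to that variable
    """
    names = list(variables)
    n = len(variables[names[0]])
    if any(len(variables[name]) != n for name in names):
        raise ValueError("all variables must cover the same areal units")
    z = {name: standardise(variables[name]) for name in names}
    # The '+' link with weights; other links (-, *, /) would replace this sum.
    return [sum(weights[name] * z[name][i] for name in names) for i in range(n)]


# Hypothetical data for four areal units (illustrative values only).
variables = {
    "population_density": [120.0, 450.0, 80.0, 300.0],
    "air_pollution": [30.0, 55.0, 20.0, 45.0],
}
weights = {"population_density": 0.4, "air_pollution": 0.6}

index = composite_index(variables, weights)
```

The resulting list can then be classed and mapped like any single variable; changing the weights and recomputing is a simple way to see how strongly the mapped pattern depends on them.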
GIS can facilitate computation of the indices, although complex problems may require more sophisticated computing capabilities as more variables are added. This technique requires that some thought be given to the variables to be included, the importance of the variables (weighting), and the combination of the variables (links). The final model can be tested using sensitivity analysis (to examine the model behaviour in relation to changes in input and to evaluate the importance of input values) and uncertainty analysis (to estimate the uncertainty in model output as a function of the propagation of uncertainty in model input). The model can also be tested for goodness-of-fit based on an objective of generality (similarity of the model behaviour under different conditions). 40.5.5 Multiple Displays Multiple displays can be generated in either constant or complementary formats. Constant formats, like Tufte’s (1983) “small multiples”, are a series of displays with the same graphic design structure depicting changes in variables from multiple to multiple (i.e., map to map). The consistency of design ensures that attention is directed towards changes in the data. Complementary formats can also be used, combining maps with graphs, plots, tables, text, images, photographs, and other formats for the display of data. This “multimedia” approach can be extended to “hypermedia” by “linking the multiple channels of information
transparently” (Buttenfield, 1996, p. 466). An example of the use of complementary formats is geographic brushing (Monmonier, 1989) in which points in a scatterplot are linked to features on a locational base map. When a point or set of points is selected, they are highlighted in both displays. Multiple displays may be better for comparing data sets rather than distinguishing between data sets, especially if complexity of the display is increased. For small multiples to be useful, they should be comparable, multivariate, “shrunken, high-density graphics” that are based on a large data matrix and used to show shifts in relationships between variables (Tufte, 1983, p. 175). The utility of multiple format displays is dependent to some degree on the ability and aptitude of the user to understand each format. 40.5.6 Multi-dimensional Displays In multi-dimensional data display, each dimension can be used to depict one (or more) variables. A distinction can be made between 2-D (two-dimensional) mapping (in which location is expressed by the x and y axes), 2.5-D mapping (in which the surface is elevated in relation to some variable), and 3-D mapping (in which volume is expressed). 4-D mapping is generally considered as an extension of 2.5-D or 3-D mapping that includes time (see Miller, Chapter 28, this volume). Often, 2.5-D visualisation is used to display the topography of the earth’s surface; however, the elevation variable can be substituted with another variable of interest (e.g., population density). Examples of 3-D visualisation include volumetric displays of geologic formations below the earth’s surface, atmospheric distributions above the earth’s surface, and oceanic distributions below the sea surface (see Wright, Chapter 30, this Volume). One method for displaying multiple dimensions simultaneously is to show two 2.5-D planes moving through one another. This is achieved most easily with dynamic displays, but small multiples can also be used. 
Another method for displaying multiple dimensions is with the use of transparency indices, in which a variable is symbolised as a “fog” through which the underlying distributions can be seen. The fog is symbolised with the visual variables for colour (value, hue, and chroma). Because the fog is transparent, the map theme below it remains visible. Although the eye can discern variations in colour, especially when it approximates a realistic representation (e.g., clouds, smog), the effectiveness of transparency indices breaks down when there are several transparent layers and when the layers are used to represent more abstract variables. There are a number of factors to keep in mind when using multi-dimensional displays. A common rule of thumb is that the dimensionality of the display should not exceed the dimensionality of the data (Tufte, 1983). Because elevation of the surface in one location may obscure another location, varying perspectives are often used when more than two dimensions are displayed. If change in perspective cannot be achieved using either multiple or dynamic displays, the utility of this technique may be limited. MacEachren (1995) cautions against inappropriate use of realism in multi-dimensional displays reminding us that realistic representations tend to convince the user that the information on the map is “real”, when, in fact, all maps are abstractions of reality. 40.5.7 Dynamic Displays Dynamic displays introduce an element of change in time, space, or display parameters. These types of displays are described in terms of interaction and animation.
In the context of computer technology, interaction relates to the flow of information between the user and the computer. Interactivity provides the user with the capabilities of selection and transformation. Selection relates to the ability to select the data sets displayed, the graphic variables used to symbolise them, and/or individuals or groups of individuals in the data sets. Transformation relates to the ability to change coordinates or measurements of the visual displays. This allows the user to change perspective (vantage point, orientation, illumination), generalisation parameters (classification, simplification, exaggeration), scale (extent, resolution), level of measurement, and number of dimensions (Robinson et al., 1995). Animation is “a dynamic visual statement that evolves through movement or change in the display” (Peterson, 1995, p. 48). Achieved through the sequential display of computed and ordered scenes, change between scenes is used to depict something that would not be evident otherwise. Temporal animation depicts change over time; non-temporal animation shows change in some attribute other than time. One example of non-temporal animation is morphing, or the distortion of one image into another. However, the most common form of non-temporal animation is perhaps the “fly-through” in which flight (e.g., over a terrain or around an object) is simulated. The most straightforward approach to animation is to produce and order scenes in advance. The (often preferred) alternative is real-time animation in which the scenes and their order are computed in such rapid response to user controls that the change appears to be simultaneous. Dynamic displays are associated with increased costs in terms of both time for development and software/hardware requirements. Methods for dynamic mapping have not been fully developed, and few studies on the effectiveness of such mapping techniques have been completed. 
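The two interactive capabilities described above, selection and transformation, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a description of any particular system: the record structure, the field names, the function names, and the use of equal-interval classification as the example of a generalisation parameter are all hypothetical.

```python
def select(records, predicate):
    """Selection: keep only the individuals the user has asked to see."""
    return [r for r in records if predicate(r)]


def equal_interval_classes(values, n_classes):
    """Transformation: assign each value to one of n equal-width classes
    (numbered from 0), so the user can change the classification on demand."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_classes
    # min(...) keeps the maximum value in the top class rather than one past it.
    return [min(int((v - lo) / width), n_classes - 1) for v in values]


# Hypothetical records for five areal units (illustrative values only).
records = [
    {"name": "A", "density": 0.0},
    {"name": "B", "density": 10.0},
    {"name": "C", "density": 20.0},
    {"name": "D", "density": 30.0},
    {"name": "E", "density": 40.0},
]

# Selection: the user restricts the display to the denser units.
dense = select(records, lambda r: r["density"] > 15.0)

# Transformation: the same data shown with four classes, then with two.
values = [r["density"] for r in records]
four_classes = equal_interval_classes(values, 4)
two_classes = equal_interval_classes(values, 2)
```

In an interactive display, each change of the predicate or of the number of classes would trigger a redraw, which is where the development and hardware costs noted above begin to accumulate.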
40.5.8 Auxiliary Senses According to the theory of direct perception, “our perceptual system is tuned to patterns in the stream of energy striking our senses rather than the energy as such. These patterns, specifying the behavioural meaning of objects, are not sensory-specific and so can be transposed from one perceptual modality to another” (Smets and Overbeeke, 1989, p. 227). If this is true, then patterns that stimulate one sense can theoretically be transposed to another sense. By transposing the stimuli from one sense (e.g., sound) to another (e.g., sight), an additional variable can be symbolised or an existing variable can be amplified. For example, in a fly-through of a populous urban area, a coughing sound can increase as areas with higher levels of air pollution are approached. Due to a generalisation effect, it is also possible to use stimuli that are approximate if not actual representations of the symbolised pattern (Smets and Overbeeke, 1989). Tactile variables are currently limited to maps that can be handled manually and therefore cannot be generated in a computer medium. Current computer technology also prohibits the use of smell and taste as auxiliary senses in visualisation. 40.6 CONCLUSION “In many fields [visualisation] is already revolutionising the way scientists do science” (McCormick et al., 1987, p. 3). Geography has already been dramatically impacted by the computer advancements, particularly through GIS; the discipline may soon witness a second wave of the technological impact through visualisation. Because geographers routinely deal with visual data, they have much to gain from the use of visualisation as an analysis tool. Because they have formalised the presentation of complex information in visual (map) form, they have much to offer visualisation in scientific computing.
Geographic visualisation provides the opportunity to go beyond static images, to view spatial data more realistically (or more abstractly), to group and regroup data, and to examine the detail of the parts as well as the synthesis of the whole. With visualisation, design testing has the potential to be faster, easier, and more economical. Prototype scenarios and their outcomes can be generated without danger of adversely affecting the human or natural landscapes manipulated. Visualisation can also be used for training and education, providing students the opportunity to test sophisticated spatial concepts and see the results “before their eyes”. The increased use of maps (both in number and variety) and technological advancements (particularly in regard to computers) have dramatically changed the nature of map making and map use. Increased capabilities for multi- and hyper-media, interactivity and animation, multi-dimensional representation, multivariate representation, and increased realism (as well as abstraction) have combined to increase the potential for visualisation in geography. As Goodchild (1992, p. 157) notes, “The geographer’s view of the world has always been coloured by the data available for analysis and the ways by which those data have been presented”. Abler (1987, p. 515) echoes this, stating, “Geographers think in and through maps, and changes in mapping capabilities will affect the way we think about the world”. Visualisation will no doubt affect our mapping capabilities and the ways that data will be presented, thereby affecting our “view of the world” and “the way we think about the world”. At the present time, “geographic visualisation is still more overlooked than it is overused” (White, 1996); however, with increased use of visualisation in geographic inquiry and demonstration of its utility, this may change.
REFERENCES
ABLER, R. 1987. What Shall We Say? To Whom Shall We Speak? Annals of the Association of American Geographers, 77, pp. 511–524.
BERTIN, J. 1981. Graphics and Graphic-Information Processing. New York: Walter de Gruyter.
BREWER, C. 1994. Color use guidelines for mapping and visualization, in Visualization in Modern Cartography. New York: Elsevier Science Ltd., pp. 123–147.
BUTTENFIELD, B. 1996. Scientific Visualization for Environmental Modeling: Interactive and Proactive Graphics, in GIS and Environmental Modeling: Progress and Research Issues. Colorado: GIS World Books, pp. 463–467.
CARSTENSEN, L. 1986. Bivariate choropleth mapping: the effects of axis scaling, The American Cartographer, 13(1), pp. 27–42.
CLEVELAND, W. 1993. Visualizing Data. New Jersey: Hobart Press.
DIBIASE, D. 1990. Visualization in the Earth Sciences, in Earth and Mineral Sciences, Bulletin of the College of Earth and Mineral Sciences, 59(2). Pennsylvania: Pennsylvania State University, pp. 13–18.
DIBIASE, D., MACEACHREN, A., KRYGIER, J. and REEVES, C. 1992. Animation and the role of map design in scientific visualization, Cartography and Geographic Information Systems, 19(4), pp. 201–214, 265–266.
DIBIASE, D., REEVES, C., MACEACHREN, A., VON WYSS, M., KRYGIER, J., SLOAN, J. and DETWEILER, M. 1994. Multivariate display of geographic data: applications in earth system science, Visualization in Modern Cartography. New York: Elsevier Science Ltd., pp. 287–312.
DORLING, D. 1994. Cartograms for visualizing human geography, Visualization in Geographical Information Systems. New York: John Wiley & Sons, pp. 85–102.
FRIEDHOFF, R.M. and BENZON, W. 1989. Visualization: The Second Computer Revolution. New York: Harry N. Abrams.
GOODCHILD, M. 1992. Analysis in geography’s inner worlds: pervasive themes, Contemporary American Geography. New Jersey: Rutgers University Press, pp. 138–162.
KRYGIER, J. 1994. Sound and geographic visualization, Visualization in Modern Cartography. New York: Elsevier Science Ltd., pp. 149–166.
MACEACHREN, A. 1994. Visualization in modern cartography: setting the agenda, Visualization in Modern Cartography. New York: Elsevier Science Ltd., pp. 1–12.
MACEACHREN, A. 1995. How Maps Work. New York: The Guilford Press.
MACEACHREN, A., BUTTENFIELD, B., CAMPBELL, J., DIBIASE, D. and MONMONIER, M. 1992. Visualization in geography’s inner worlds: pervasive themes, Contemporary American Geography. New Jersey: Rutgers University Press, pp. 101–137.
McCORMICK, B., DEFANTI, T. and BROWN, M. 1987. Visualization in scientific computing, Computer Graphics, 21, pp. i-E-8.
McGUINNESS, C. 1994. Expert/novice use of visualization tools, Visualization in Modern Cartography. New York: Elsevier Science, pp. 185–199.
MONMONIER, M. 1989. Geographic brushing: enhancing exploratory analysis of the scatterplot matrix, Geographical Analysis, 21(1), pp. 81–84.
ROBINSON, A., MORRISON, J., MUEHRCKE, P., KIMERLING, A. and GUPTILL, S. 1995. Elements of Cartography, 6th Ed. New York: John Wiley & Sons.
SMETS, G. and OVERBEEKE, C. 1989. Scent and sound of vision: expressing scent or sound as visual forms, Perceptual and Motor Skills, 69, pp. 227–233.
STEVENS, S. 1946. On the theory of scales of measurement, Science, 103(2684), pp. 677–680.
TUFTE, E. 1983. The Visual Display of Quantitative Information. Connecticut: Graphics Press.
TUFTE, E. 1990. Envisioning Information. Connecticut: Graphics Press.
TUKEY, J. 1977. Exploratory Data Analysis. Massachusetts: Addison-Wesley Publishing Company.
VISVALINGHAM, M. 1994. Visualization in GIS, cartography, and ViSC, Visualization in Geographical Information Systems. New York: John Wiley & Sons, pp. 18–25.
VASCONCELLOS, R. 1992. Knowing the Amazon through tactile graphics, in Proceedings of the 15th Conference of the International Cartographic Association, Bournemouth, UK, 23 September-1 October. Germany: International Cartographic Association, pp. 206–210.
WHITE, D. 1996. Personal communication. 9 February.
Chapter Forty One Metaphors and User Interface Design David Howard
41.1 INTRODUCTION Geographic visualisation systems (GVIS) and geographic information systems (GIS) now serve an important role in science, allowing researchers to study many problems involving spatial data. Both of these tools encourage interaction between the system user and the data being displayed and manipulated and therefore the user interfaces for GIS and GVIS are critical. In relation to GIS, Frank (1993) states that the user interface is the most important single part of the system. He is highlighting the fact that the user interface is usually the only part of the system that the user has direct contact with (especially the novice or basic user) and that its design is thus crucial to the success of a GIS. In this chapter an approach for thinking about interfaces is described. The description of this approach and how it relates to GIS will lead to an extended discussion about the use of metaphor in user interface design and how it applies to systems for geographic analysis. 41.2 USER INTERFACE DESIGN At a basic level, an interface can be defined as the place where independent systems act on or communicate with each other. For example, a GIS provides an interface between the person using the system and data contained within the system. More specifically, however, the term ‘user interface’ is commonly used to specify the parts of a computer program that provide the means for this connection or interchange. Within a GIS or GVIS, then, the user interface is the combination of program modules that query the user for input, interpret that input, and provide feedback to the user in response to that input. A well-designed interface should facilitate simple, painless and intuitive use of a system. Everything should be easy for the user so that they can focus on the data, not the interface. This is true not only for novices but for experts as well. 
Interfaces are built precisely so that people do not have to become experts in computer usage and can spend their time being experts in their own discipline. An additional concern for designers of interfaces for geographic applications is that users must deal with the potentially new domain of interpreting and manipulating spatial data. Therefore, the interface designer needs to pay special attention to issues such as the design of map displays so as to minimise the additional amount that a user must learn to interpret the output of the system. However, system developers should not assume that a novice user can meaningfully interpret the output from a GIS or GVIS no matter how well the maps, graphs and other displays are designed. Some knowledge is necessary on the part of the user about how to understand spatial
Figure 41.1: Relations between Howard and MacEachren’s three levels for interface design (Howard and MacEachren, in press), Marr’s levels for analyzing information processing systems (Marr, 1982), and Lindholm and Sarjakoski’s levels for interface design (Lindholm and Sarjakoski, 1994).
displays, the data being explored, and data analysis in general in order to interact usefully with the system. This set of requirements should lead the system designer to an evaluation of the expertise of the expected users in several different areas (see Buckley, Chapter 40, this volume) before and during development. The present chapter will present a hierarchical way to think about interface design that will help interface developers to keep the interface focused and responsive to the users it was designed to serve. 41.3 A HIERARCHICAL APPROACH TO USER INTERFACE DESIGN Successful interface design requires a systematic approach to linking information, computer tools, and users. Alan MacEachren and I developed a hierarchical approach to user interface design that was targeted at geographic visualisation (Howard and MacEachren, 1996). We based our approach upon earlier work by Marr (1982) and Lindholm and Sarjakoski (1994), among others. Marr proposed that machines performing information processing tasks should be examined at three levels: the level of computational theory, the level of representation and algorithms, and the device/hardware implementation level (see Figure 41.1). Analysing machines at the level of computational theory yields a description of the processes it must carry out and why the processes are attempted. At the level of representation and algorithms, an examination of the system describes how the processes might be carried out. Marr suggests that an examination at this level should include both a description of the representation used for the input and output of the processes and a description of the algorithms that are used to accomplish the process. Lastly, scrutiny of the device/hardware implementation level describes how the processes are implemented given specific materials being used. The second approach that heavily influenced our research was provided by Lindholm and Sarjakoski (1994). 
Lindholm and Sarjakoski divide interfaces into three levels: conceptual, functional, and appearance. The conceptual level deals with the users of the system and their goals. This requires consideration of users’ previous training, their needs, and how best to meet these needs. Lindholm and Sarjakoski’s focus on the users of the system and their specific attention to user interfaces with the system is an important difference between this approach and that offered by Marr. However, the conceptual level deals with some of the same issues as Marr’s level of computational theory in that there is consideration of the goals that
need to be met, either goals of the users or goals for the machine. Lindholm and Sarjakoski’s functional level of the interface encompasses the specific kinds of actions the user can perform and the meanings of these actions. At this level the general capabilities of the system, and thus the controls that are necessary, are determined. The issues addressed at this functional level are similar to those examined in the second of Marr’s levels: the level of representation and algorithm. The last level introduced by Lindholm and Sarjakoski is the appearance level, which deals with “…the output language (how to present the application and data to the user).” (Lindholm and Sarjakoski, 1994, p. 172). Once the necessary elements of the interface have been determined, decisions are made about what the user interface, which encompasses both input to and output from the system, will look like. This appearance level is closely related to both Marr’s representation and algorithm level and hardware implementation level. Although Lindholm and Sarjakoski’s structure was developed with interfaces for geographic visualisation in mind, the distinction between the functional level and the appearance level seemed unclear. Also, the appearance level did not have the depth of Marr’s hardware implementation level. However, Marr’s information processing approach was not intended for application to development of user interfaces and therefore it does not specifically encompass the users and their goals. In the end, a synthesis of the most useful facets of these approaches seemed to be the best way to begin designing our own approach to user interface design (Howard and MacEachren, 1996). Our synthesis follows Marr’s lead in emphasising the goals of using a system over the details of system implementation on particular hardware-software configurations. 
We also draw heavily on Lindholm and Sarjakoski’s efforts to include the consideration of user training and needs as an integral part of the conception of the system. Lastly, we attempt to retain the distinction between representation and implementation made by Marr. Like both Marr and Lindholm and Sarjakoski, our approach has three hierarchical levels of analysis: conceptual, operational, and implementational (Howard and MacEachren, 1996). Next I will present a short synopsis of these three levels, adapting the original approach (developed for geographic visualisation) to GIS applications. 41.3.1 Conceptual Level At the conceptual level we are concerned with the GIS as a connection between a user and the information that may be accessed through (or created by) the system, rather than with specifics of the user interface that allow the user to command the GIS. The needs of the user and characteristics of the data are the two things that must be uppermost in the system designers’ minds at this level. When considering the needs of the user, Lindholm and Sarjakoski (1994) present a series of questions to help the interface builder: what need is met by the system, how is this goal reached, and what should be the result of working with the system? Of course, users are not a homogeneous group. Different users have different needs. They might be interested in different implications or applications of the data and should be allowed to explore these varying interests. Perhaps the three things that would be highest on most users’ wish lists for a system are: ease of use, flexibility, and power. These three goals may be contradictory, but by keeping all of them in mind, there is a chance of striking a balance between them. Another consideration at this level is the characteristics of the users of the system. This seems particularly important for GIS because of the broad range of users that GIS are built for. 
GIS is now used extensively in many geographic fields of research, as well as in forestry, marketing, urban planning, and other fields that deal with spatial information. The users in these fields have varying degrees of knowledge about computers, spatial data, display formats (e.g., maps), geographic analysis, and the data sets being explored. Therefore the systems that are built for them must have different characteristics.
41.3.2 Operational Level We can start to address conceptual level goals by dividing them into operations or functions that can be performed on available information. This is the operational level of interface design: the delineation of appropriate operations to match conceptual level goals (e.g., for a goal of checking individual data values, a table operation may be useful). It is at this level that the special requirements of the system, outlined in the conceptual level based on the characteristics and needs of the users, will be realised. When designing a GIS for general use, for instance, many operations must be included that encompass spatial data analysis, mapping, database management, data entry, etc. However, GIS that are developed for specialised purposes may only need a subset of these functions. At the operational level, designers make decisions about what general sets of tools will be required for users to meet the goals that were delineated at the conceptual level. An example that could apply to either a GIS or a GVIS, based upon the expertise of the users, might be: does the system require a “help” operation? Such an operation might include several specific tools. For instance, there could be a guided tour of the system and its functions, a way to search some database of tools to find out what is available, or even some quite sophisticated help tools that would guide the users through a set of actions to achieve a final result (like the “Wizards” that are now available in many Microsoft products). The decisions that are being made at the operational level should not be affected by the particular hardware/software environment that is being used. However, in many cases the environment will constrain the operations that may be considered. Therefore, the operational level blurs into the third and last level, the implementational level. 
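The matching of conceptual-level goals to operations described above can be sketched as a simple lookup. This is only an illustration under stated assumptions: apart from the pairing of "checking individual data values" with a table operation, which comes from the text, every goal name, operation name and the `operations_for` helper are hypothetical.

```python
# Hypothetical catalogue: each conceptual-level goal is matched to the
# operations that could serve it. Only the first pairing (data values ->
# table) comes from the text; the others are illustrative assumptions.
GOAL_TO_OPERATIONS = {
    "check individual data values": ["table"],
    "compare spatial patterns": ["map", "overlay"],
    "summarise an attribute": ["graph", "table"],
    "learn the system": ["help"],
}


def operations_for(goals):
    """Return the sorted set of operations needed to meet the given goals."""
    needed = set()
    for goal in goals:
        if goal not in GOAL_TO_OPERATIONS:
            raise KeyError(f"no operation delineated yet for goal: {goal!r}")
        needed.update(GOAL_TO_OPERATIONS[goal])
    return sorted(needed)


# A specialised system serves fewer goals, so it needs only a subset of
# the operations that a general-purpose system would have to offer.
specialised = operations_for(["check individual data values", "learn the system"])
general = operations_for(GOAL_TO_OPERATIONS)
print(specialised)
print(general)
```

Keeping the lookup free of any hardware or software detail mirrors the point that operational decisions should, ideally, be made before implementation constraints are considered.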
41.3.3 Implementational Level

The implementational level, as its name implies, involves all issues that pertain to implementing the decisions made at the conceptual and operational levels using the hardware and software that are available. Some of these issues might be method of data storage and retrieval, choice of hardware/software platform, and optimisation of program routines. However, the most important considerations at the implementational level from the perspective of cartographers are those that affect the visual display. These issues include “…consideration of anything that the user will have to see and decipher in order to interact with the system” (Howard and MacEachren, 1996), such as colour choices, layout of controls and display windows, and default settings for map and graph display. One example of these visual display issues that we explore further is interaction style (Howard and MacEachren, 1996). Interaction style is a significant concern for interface designers because the kinds of controls used in the interface should match the functions of the system that they activate, so that each control invites an appropriate type of interaction for its function. For instance, tools that require the user to set a value within a large range of choices are best controlled with slider bars or direct entry of values, while tools that allow users a choice among a small number of options might be better controlled with a menu from which a user can make the proper selection. Other examples of visual display issues that are particularly important in a GIS context might be:

1. the placement of controls and windows on the display;
2. default values for the appearance of maps and other representations (colour schemes, legends, etc.) as well as mechanisms for overriding these defaults;
3. consistency of interaction style for similar tools;
4. methods for switching between different views of the data;
5. navigation style; and
6. in systems that have different commands available in different modes, ways of telling the user where they are and how to get to a certain command that is not available in the current mode.

Figure 41.2: Idealised design process for user interface developers. The entire process should be carried out in concert with the eventual users of the system. The first circle represents the process of creating a prototype system. This prototype will then be evaluated and redesigned with respect to each of the three levels. As the iterative redesign process continues, emphasis will shift from conceptual issues to operational issues and later to implementational issues.

It is important to realise that the conceptual, operational and implementational levels are not meant to be used one after the other in lockstep order (Figure 41.2). The process will be one of moving back and forth among the different types of decisions that are required at each level. The three levels are meant to suggest a way to structure the decisions that must be made. If conceptual level decisions are tentatively made early in the design process then operations will be suggested and controls can be implemented. This preliminary process should then lead to more discussion about (perhaps with!) the users of the system and revisions of the goals and needs they have as well as the operations needed to meet these new and revised goals.

The three-level approach to interface design outlined by Howard and MacEachren (1996) serves as a way for interface designers to keep the development of an interface focused. Posing the conceptual level questions makes sure that the designer has a set of user needs in mind and the three-level structure helps the designer keep in mind decisions that need to be made at various stages of the process and how these decisions affect one another. Unfortunately, none of this offers any reassurance that the interface will be easy to understand when it is completed. To accomplish this goal, designers often turn to some framework within which to place controls. This framework can lend cohesion to the final product. Most commonly, the framework is a metaphor that relates the functions of the system to the functions available in some common environment.
Metaphors make their most obvious contributions at the implementational level as a framework to guide the placement and interaction style of various tools. However, as will become clear in the next section, metaphors also spark ideas about other operations that might be useful in the system and metaphors are often suggested by the goals of the system and the users. Therefore the idea of metaphor in interface design permeates all of the levels while having the most visible impact at the implementational level.
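The interaction-style guideline from Section 41.3.3 (sliders or direct entry for large ranges of values, menus for small sets of options) can be written as a small rule. The sketch below is a minimal illustration, not a method from the chapter: the function name `suggest_control` and the cut-off `menu_limit=7` are assumptions chosen for the example.

```python
def suggest_control(n_choices, menu_limit=7):
    """Suggest a control type for a tool from its number of possible settings.

    Large ranges suit slider bars or direct entry of values; small sets
    of options suit a menu. menu_limit is an assumed cut-off, not a
    value taken from the chapter.
    """
    if n_choices < 1:
        raise ValueError("a tool needs at least one choice")
    if n_choices > menu_limit:
        return "slider or direct entry"
    return "menu"


# Hypothetical tools: a 256-level opacity setting and a choice among
# four classification schemes.
print(suggest_control(256))
print(suggest_control(4))
```

Applying one rule to every tool is also a cheap way to obtain a consistent interaction style for similar tools.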
41.4 USING METAPHORS IN INTERFACE DESIGN

It has long been recognised by computer scientists that metaphors could be useful in user-interface design. For instance, a session was held at a 1982 Institute of Electrical and Electronics Engineers (IEEE) conference that featured papers such as “Metaphorical Restructuring of Ill-Defined Problems”, “An A-Frame Model for Metaphor”, “A Structure-Mapping Approach to Analogy and Metaphor”, and “Metaphor, Computing Systems, and Active Learning”. But what is meant by metaphor in the context of user interface design? The dictionary definition of a metaphor is: “A figure of speech in which a term is transferred from the object it ordinarily designates to an object it may designate only by implicit comparison or analogy” (American Heritage Dictionary, p. 790). However, in the language of computer scientists and cognitive scientists, this definition has been extended to apply to the learning of unfamiliar situations. The main idea is that people use skills and concepts from one environment or ‘domain’ when trying to understand another domain (Halasz and Moran, 1981; Johnson, 1987). These two domains are sometimes known as the source and the target.

Cartographers may have an interesting point of view about the use of metaphor. Downs (1981) points out that maps can be seen as a double metaphor because the map stands for the real world, while also standing for the representation of the world in the head. In this second instance, Downs is referring to the concept of the mental or cognitive map as a way to organise spatial knowledge (for example, see Langer, 1951; Polanyi, 1964; Toulmin, 1960). Downs notes a fact that can be both a strength and a weakness of metaphor in stating that the relationship between the source and target domain is implicit in a metaphor and that the user of the metaphor is thus permitted “…free rein of the imagination with respect to the meaning” of the metaphor (Downs, 1981).
This openness allows the users to apply any interpretations, connotations, or associations that they may have with the source domain to the target domain. The power of this is that rich source domains thus allow many associations to be made which, in turn, allows many operations to be included in the interface within the structure of the metaphor. However, Downs’ observation also points out that users can have many different ideas about what the metaphor may encompass and thus they may make mistakes about what actions the interface will allow. Probably the best known example of metaphor in computer interface design is the Apple Desktop metaphor. On the Apple Desktop, files are stored in folders, which can be stored in still more folders. Files are placed in these folders by simply dragging the icon for the file onto the icon for a folder. Similar actions allow files to be moved between folders and copied from disk to disk. Files are deleted by dragging them onto a trash can. The idea is to mimic the workings of a real desk complete with its manila folders containing all sorts of documents and its handy trash can for disposing of unwanted debris. In so doing, icons are created that resemble objects in the office in order to remind people of the actions that can be carried out. Familant and Detweiler (1993) discuss the power of icons as well as some drawbacks in the context of their use in user interfaces. An example of the use of metaphor within the geographic realm is the use of ‘overlay’ and related terms in GIS packages. This terminology obviously stems from a mental picture of laying maps over one another (on a light table, for example) so that one can look at both of them simultaneously in order to examine the relationships between two mapped features. User-interface designers use metaphor in order to enhance the usability of an interface. 
This idea is similar to the concept of schemata as “…structures for representing and organising knowledge” (MacEachren, 1995). Neisser (1976) says that schemata are “…plans for finding out about objects and events”. This emphasises their role as an aid to understanding, much like metaphors. Carroll and Thomas (1982) argue that people learn about new things by trying to extend their knowledge about old things. They
call this the Metaphor Principle: “People develop new cognitive structures by using metaphors to cognitive structures they have already learned” (Carroll and Thomas, 1982, p. 109). This definition suggests a link between metaphors and schemata. It has been suggested that a particular schema is applied to a situation even if it imperfectly matches the details of the situation (Eastman, 1985). For instance, when viewing a new type of graph, a schema for interpreting a different type of graph might be applied if there is no schema for the new graph. This old schema is then modified to apply to the new type of representation (Eastman, 1985; Pinker, 1990). Perhaps metaphors provide the link between the current situation and the old schema until a new, more appropriate schema has been created.

The approach of using metaphors to aid users in the understanding of interfaces has been successful. For example, the text-editor = typewriter and screen = tabletop metaphors have been shown to exert measurable effects on learning (Mack et al., 1983; Rumelhart and Norman, 1981). Another example is again the familiar Apple Desktop (Kuhn and Frank, 1991). Although there are certain problems with this metaphor, it has proven to be very successful at making users feel more comfortable with their computers. Kuhn and Frank are of the opinion that much of the perceived power of metaphor today is a result of the success of the Desktop metaphor. The idea of metaphors is powerful enough that Neves et al. (1996) use a GIS metaphor to direct exploration of spatial data sets within a virtual reality system. As mentioned earlier, the use of metaphors in user interface design aids not only the users of the system but also the designers. Having a metaphor can help designers retain sight of the desired capabilities of the system and can suggest extra capabilities for the application.
One example of well thought-out extensions is found in some implementations of the stage and play metaphor used by many animation programs. In some of these programs, the animator is able to define certain objects as ‘lead actors’ who are always in front of the ‘supporting actors’ unless specifically directed to stay in the background. Another potential benefit of metaphors is that they can be used to make a routine job less onerous. Carroll and Thomas use the example of a worker whose job it is to monitor a set of measurements continuously for warning signs. They suggest that the output from various sensors might be input into a flight simulator and a game be made out of the task in order to keep workers interested (Carroll and Thomas, 1982).

41.4.1 Designing with Metaphors

Many designers agree about the utility of metaphors (Halasz and Moran, 1981; Kuhn and Frank, 1991; Johnson, 1987; Mack et al., 1983; Rumelhart and Norman, 1981). Carroll and Thomas state that interface builders should try to anticipate and support metaphors that are likely to be intuitively applied to the task. As a designer, however, how does one successfully “anticipate and support” a metaphor? Carroll and Thomas offer best-guess recommendations for choosing and developing a metaphor. Building on previous work in computer science and human factors engineering, Madsen (1994, p. 59) offers several guidelines for generating metaphors, listing five strategies: 1. “Listen to how users understand their computer systems.” Madsen suggests recording users when they are talking about interacting with their computers and then taking advantage of metaphorical relations that already seem to exist within the minds of the users. If we believe the arguments made by Tilton and Andrews (1993), a procedure like this might have led us to something different from the navigation metaphor for hypertext interfaces.
They argue that people would not intuitively apply the navigation metaphor to the activity of connecting to remote computers
and viewing the files that are stored there. Further, they imply that much confusion among those that are using the World Wide Web today results from this improper choice of the navigation metaphor. 2. “Build on already existing metaphors.” This suggestion can refer to the metaphors that users reveal when they talk about their systems as in rule (1) above or metaphors used in applications with which the users are already familiar. Building on existing metaphors allows the development of universally accepted standards for certain types of applications. 3. “Use predecessor artefacts as metaphors.” Predecessor artefacts are objects or sets of objects that were used in the past to perform similar tasks to those being programmed into the interface. Again, the best example of this is the Apple Desktop metaphor. The designer should look around for things that are used often and successfully. Gersmehl (1990) lists nine metaphors for animated cartography and among them he lists the flipbook (where one flips pages and small drawings make an animation) and the slideshow metaphors (where animation is used to switch smoothly between static representations). Both of these are examples of things that users would be familiar with (and hopefully have used successfully) before using the program. 4. “Note metaphors already implicit in the problem description.” When designing an application for a specific use, the use itself can sometimes give clues to picking a successful metaphor. Again, Gersmehl provides an example from animated cartography: the model and camera metaphor. This model and camera metaphor is used extensively in three-dimensional rendering programs in which the object being drawn is treated like a model being photographed: it can be rotated, spotlighted in different ways, etc. 5. “Look for real-world events exhibiting key aspects.” There are often parts of the problem that can be described in physical, real-world terms. 
An example would be talking about the flow of information and then using a metaphor of rivers to describe that flow. Having chosen a metaphor, Madsen further suggests methods for developing it into a useful interface. These are helpful hints which basically centre around fully exploring the facets of the metaphor. People make all sorts of associations that can be used for their own benefit with a complete user-interface metaphor. For instance, the trash can on the Apple Desktop realises the non-permanent nature of a trash can under someone’s desk. If one throws something away, it is not truly gone until the trash is emptied. Similarly, the items placed in the Apple trash can may be retrieved until the time that the user actually selects a menu item entitled “Empty Trash”. This is an example of exploring an extension of a metaphor to make better use of it for human-computer interaction. See Familant and Detweiler (1993) for a discussion of the use of icons to resemble real-world objects both in looks and in functionality. 41.4.2 Problems with Metaphors In spite of their apparent successes, there are potential problems with the use of metaphor in interface design. Nardi and Zarmer (1993) note that metaphors are not well suited for expressing rich semantics. This problem stems from the fact that metaphors lack precision. They are necessarily incomplete representations of the systems they explain. Metaphors serve well in the role of guiding users with a general orientation but poorly in terms of communicating exact semantics. Halasz and Moran noticed that although users are usually quite successful at deriving the central meaning from a metaphor, they have difficulty when they
have to figure out exactly what concepts are mapped to what actions and which assumptions that are valid in the source domain are allowable in the target domain (Halasz and Moran, 1981).

A secondary problem is that it is difficult to find metaphors that can be used for multiple specific applications. It would be handy, as a computer programmer, to find a metaphor that one could always use, no matter what the application. Unfortunately, the stage-and-play metaphor, which works so well within a program like Macromedia Director, would only work imperfectly in a presentation package like Microsoft PowerPoint, which successfully and intuitively uses the slideshow metaphor as a basis for constructing and editing presentations. The stage-and-play metaphor would further be totally unsuitable for a database program like FileMaker Pro, since a database has no logical stage and nothing in it moves. FileMaker Pro uses a notecard metaphor which is quite user-friendly for anyone who has ever kept track of information with a box full of index cards. In Carroll and Thomas’ eight recommendations for the use of metaphor, the very first is that metaphors should be developed on a case-by-case basis (Carroll and Thomas, 1982). Nardi and Zarmer summarise their argument by stating that they have little hope that any set of metaphors could be bundled together for use as a sort of interface designer’s toolkit.

Later in their eight recommendations, Carroll and Thomas touch on a third weakness of metaphors which links with the incompleteness problem mentioned above. They recommend that users be slowly weaned from the metaphor after initial learning has taken place. This weaning is done to instil a “more literally correct view of the system”. Carroll and Thomas argue that the metaphor eventually becomes a barrier to correct knowledge (Carroll and Thomas, 1982).
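The mapping problem noted by Halasz and Moran, and the incompleteness of any metaphor, can be made concrete by treating a metaphor as a partial mapping from source-domain actions to target-domain actions. The sketch below is illustrative only: the action names, the `METAPHOR` table and the `mismatches` helper are assumptions made for the example, loosely modelled on the desktop metaphor.

```python
# Actions a user already knows from the source domain (a physical desk).
SOURCE_ACTIONS = {
    "put document in folder",
    "throw document in bin",
    "staple pages together",
}

# Actions the target domain (the computer) actually offers.
TARGET_ACTIONS = {
    "drag file onto folder icon",
    "drag file onto trash icon",
    "save new file into default folder",
}

# The metaphor itself: a partial mapping from source to target actions.
METAPHOR = {
    "put document in folder": "drag file onto folder icon",
    "throw document in bin": "drag file onto trash icon",
}


def mismatches(source, target, mapping):
    """Return the two kinds of gap left by a metaphor.

    unsupported: source actions that users may expect but that have no
                 counterpart in the target domain.
    unexplained: target actions that the metaphor does not account for.
    """
    unsupported = sorted(source - set(mapping))
    unexplained = sorted(target - set(mapping.values()))
    return unsupported, unexplained


unsupported, unexplained = mismatches(SOURCE_ACTIONS, TARGET_ACTIONS, METAPHOR)
print(unsupported)
print(unexplained)
```

Both lists being non-empty is the normal case; the entries in the second list are the behaviours for which users eventually need a more literal view of the system.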
Metaphors, by their nature, try to explain the possible actions in one domain in terms of the possible actions in another domain. But what happens when there are things that do not match well? In the Desktop metaphor used for Macintosh computers, there are several mismatches between the source domain, a common office work area, and the target domain, the computer. For instance, in an office area, new documents are created in some common area (on the desk) and must be placed into folders for organisational purposes. However, on the computer, there are often default placements of new documents into certain folders. This may result in confusion when a user says: “How could that file be in this folder? I didn’t put it there!” At some point, it is important to start to educate users away from the metaphor. Otherwise, they may not be able to comprehend certain actions by the system. 41.4.3 Spatial Metaphors: Their Promise and Problems Geographic software has its own set of unique problems because of the inherent intuitiveness of spatial metaphors for the user-interfaces. Although spatial metaphors seem natural for geographic applications, they run the risk of confusing the metaphor with the spatial data itself. Mark (1992) has explored some of these issues in the context of spatial human-computer interface models for geographic information systems. Mark suggests a mapping from cognitive models of space to human-computer interaction paradigms. The cognitive models of space that Mark considered are all based on ways that we deal with the world around us: haptic space is the representation based on sensory experiences of touch, pictorial space is based upon ‘remotely-sensed’ perceptions, and transperceptual space is composed of a set of haptic and pictorial objects experienced over time (Mark, 1992). According to Mark, these can be mapped, respectively, to the direct manipulation paradigm, the camera (or pan and zoom) metaphor, and the wayfinding metaphor. 
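Mark's mapping from cognitive models of space to interaction paradigms is small enough to write down as a lookup table. The following is a minimal sketch of that mapping as described above; the names `CognitiveSpace`, `INTERACTION_PARADIGM` and `paradigm_for` are hypothetical and do not come from Mark (1992).

```python
from enum import Enum


class CognitiveSpace(Enum):
    """The three cognitive models of space considered by Mark (1992)."""
    HAPTIC = "haptic"                    # based on sensory experiences of touch
    PICTORIAL = "pictorial"              # based on 'remotely-sensed' perceptions
    TRANSPERCEPTUAL = "transperceptual"  # objects experienced over time


# Mapping as summarised in the text; the identifiers are illustrative.
INTERACTION_PARADIGM = {
    CognitiveSpace.HAPTIC: "direct manipulation",
    CognitiveSpace.PICTORIAL: "camera (pan and zoom)",
    CognitiveSpace.TRANSPERCEPTUAL: "wayfinding",
}


def paradigm_for(space):
    """Look up the interaction paradigm or metaphor for a cognitive space."""
    return INTERACTION_PARADIGM[space]


for space in CognitiveSpace:
    print(f"{space.value} space -> {paradigm_for(space)}")
```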
Mark goes on to explain how most of the metaphors that we know best are grounded in one or more of these three spaces. Mark (1992) relates the conceptual points of his paper to research themes raised at the National Center for Geographic Information and Analysis (NCGIA) Specialist Meeting on “User Interfaces for GIS”, some of which are presented below:
• the need for typologies of GIS tasks, users and use types;
• recognition that spatial concepts are critical to the design of user interfaces for GIS;
• the issue of trade-offs between learnability on the one hand, and performance for experienced users on the other
(see http://www.spatial.maine.edu/~max/i-13-specialistMeeting.html).

In the end, Mark warns that interfaces for GIS may be very hard to design because care must be taken to prevent the user from confusing spatial aspects of a metaphor with spatial aspects of the data. This is similar to problems of map symbolisation in general due to the use of space in some parts of the map to represent space and in other parts to represent data values (e.g., symbols that are made larger or smaller based upon the data being depicted). Mark contends that metaphors based in haptic or pictorial space should be safe but that those based in transperceptual space should probably be avoided (Mark, 1992).

In answer to Mark’s cautions about spatial metaphors that may confuse a user, Egenhofer and Richards (1993) make a compelling case for the use of data cubes (cube-shaped icons representing the data sets available within the system) for visualisation of data and ‘map templates’ as a way to organise them into a coherent whole in a GIS. These could be used in a direct manipulation paradigm where users would be able to pick up individual data cubes in order to stack them or to place them on a map template. The map templates would contain standard formats for representing sets of data. Stacks of cubes and templates could be built. There would also be several viewing areas for the stacked sets. One would lead to a mapped output, another might offer the user a graph of some aspect of the data and a third might allow the user to look at the data in a tabular format. Egenhofer and Richards suggest this as a replacement or extension of the existing map-overlay metaphors used in some GIS.
This interface might allow a more intuitive grasp of data layers, map formats, and the types of analyses that are possible with a GIS. It may also encourage users to look at data in a variety of ways, since it would be easy to switch the map templates or move the stacks from the map viewer to the graph viewer, etc. Interfaces like this one offer exciting new metaphors to exploit when looking at geographic data. 41.5 CONCLUSION GIS and GVIS applications are being used more frequently in both professional and academic endeavours. Because of the larger and more varied audience that such applications are being designed for, their user interfaces need to be more flexible and easier to learn than when GIS was mainly a tool used by those who created it. Cartographers and GIS experts are now designing both GIS and GVIS systems as well as integrated GIS/GVIS systems and should continue to do so. In so doing, they should combine ideas that have been researched and used successfully by computer scientists for years, like the use of metaphors, with ideas that arise from their own expertise. Cartographic ideas that will increase the usability of user interfaces for geographic applications include theories about the appropriateness of spatial metaphors as well as map design and symbolisation experience. In this chapter I have suggested two interlocking approaches that should help GIS designers to accomplish these usability goals. First is the three-level interface design approach. The approach outlined offers a way to think about the creation of interfaces that focuses attention on the characteristics of the user and the goals of the system. It also tries to prevent having decisions about what the system is for and about operations it should do from being dictated by current hardware/software implementation constraints. However, although this approach focuses on the theory of interface design, there is often something more needed to keep the interface coherent and consistent. 
I suggest that metaphors can play this role. Metaphors
are useful in interface design by providing a cohesive way to structure controls so that users have an easy time understanding the capabilities and limitations of the system. Metaphors also allow users to learn interfaces more easily by reminding them of environments that they are already comfortable with.

REFERENCES

AMERICAN HERITAGE DICTIONARY, 1985. 2nd College Edition. Boston, MA: Houghton Mifflin Company.
CARROLL, J. and THOMAS, J. 1982. Metaphors and the cognitive representation of computing systems, Institute of Electrical and Electronics Engineers Transactions: Systems, Man, and Cybernetics, 12(2), pp. 107–116.
DOWNS, R. 1981. Maps and metaphors, The Professional Geographer, 33(3), pp. 287–293.
EASTMAN, J.R. 1985. Cognitive models and cartographic design research, Cartographic Journal, 22(2), pp. 95–101.
EGENHOFER, M. and RICHARDS, J. 1993. Exploratory access to geographic data based on the map-overlay metaphor, Journal of Visual Languages, 4(2), pp. 105–125.
FAMILANT, M.E. and DETWEILER, M.C. 1993. Iconic reference: evolving perspectives and an organizing framework, International Journal of Man-Machine Studies, 39, pp. 705–728.
FRANK, A.U. 1993. The use of geographical information system: the user interface is the system, in Medyckyj-Scott, D. and Hearnshaw, H. (Eds.), Human Factors in Geographical Information Systems. London: Belhaven Press, pp. 3–14.
GERSMEHL, P.J. 1990. Choosing tools: nine metaphors of four-dimensional cartography, Cartographic Perspectives, 5, pp. 3–17.
HALASZ, F. and MORAN, T. 1981. Analogy considered harmful, in Proceedings of Conference on Human Factors in Computer Systems, Washington, DC. Washington, DC: Association for Computing Machinery, pp. 383–386.
HOWARD, D. and MACEACHREN, A. 1996. Interface design for geographic visualization: tools for representing reliability, Cartography and Geographic Information Systems, 23(2), pp. 59–77.
JOHNSON, J. 1987. How faithfully should the electronic office simulate the real one?, Association for Computing Machinery Special Interest Group in Computer-Human Interaction Bulletin, 19(2), pp. 21–25.
KUHN, W. and FRANK, A. 1991. A formalization of metaphors and image-schemas in user interfaces, in Mark, D. and Frank, A. (Eds.), Cognitive and Linguistic Aspects of Geographic Space. Dordrecht: Kluwer Academic Publishers, pp. 419–434.
LANGER, S. 1951. Philosophy in a New Key. New York, NY: New American Library.
LINDHOLM, M. and SARJAKOSKI, T. 1994. Designing a visualization interface, in MacEachren, A.M. and Taylor, D.R.F. (Eds.), Visualization in Modern Cartography. Oxford: Elsevier, pp. 167–184.
MACEACHREN, A.M. 1995. How Maps Work: Issues in Representation and Design. New York, NY: Guilford Press.
MACK, R., LEWIS, C., and CARROLL, J. 1983. Learning to use office systems: problems and prospects, Association for Computing Machinery Transactions on Office Information Systems, 1, pp. 10–30.
MADSEN, K.H. 1994. A guide to metaphorical design, Communications of the Association for Computing Machinery, 37, pp. 57–62.
MARK, D. 1992. Spatial metaphors for human-computer interaction, in Proceedings of the 5th International Symposium on Spatial Data Handling, Charleston, SC. Charleston, SC: IGU Commission of GIS, pp. 104–112.
MARR, D. 1982. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. San Francisco, CA: W.H. Freeman.
NARDI, B. and ZARMER, C. 1993. Beyond models and metaphors: visual formalisms in user interface design, Journal of Visual Languages and Computing, 4, pp. 5–33.
NEISSER, U. 1976. Cognition and Reality: Principles and Implications of Cognitive Psychology. San Francisco, CA: W.H. Freeman.
NEVES, J.N., GONCALVES, P., MUCHAXO, J. and SELVA, J.P. 1996. Interfacing Spatial Information in Virtual Environments. http://virtual.dcea.fct.unl.pt/gasa/papers/vgis
PINKER, S. 1990. A theory of graph comprehension, in Freedle, R. (Ed.), Artificial Intelligence and the Future of Testing. Hillsdale, NJ: L. Erlbaum Associates, pp. 73–126.
POLANYI, M. 1964. Personal Knowledge. New York, NY: Harper and Row.
RUMELHART, D. and NORMAN, D. 1981. Analogical processes in learning, in Anderson, J.R. (Ed.), Cognitive Skills and their Acquisition. Hillsdale, NJ: L. Erlbaum Associates, pp. 335–359.
TILTON, D. and ANDREWS, S. 1993. Space, place, and interface, Cartographica, 30(4), pp. 61–72.
TOULMIN, S. 1960. The Philosophy of Science: An Introduction. New York, NY: Harper and Row.
Chapter Forty Two Universal Analytical GIS Operations—A Task-Oriented Systematisation of Data Structure-Independent GIS Functionality Jochen Albrecht
42.1 INTRODUCTION

Current geographic information systems (GIS) are so difficult to use that it takes some expertise to handle them, and it is not unusual for an operator to spend a year mastering the system. This is especially limiting for cursory users who employ GIS as one tool among many others. This chapter presents a universal framework of spatial analytical operations that can be applied regardless of the structure of the data being analysed. Since no such classification has existed up to now, this chapter contributes to the general development of geographical information science. This chapter shows that it is possible to perform most GIS analyses with a set of only 20 universal operations. Further analytical functionality can be achieved by combinations of these universal operations. All operations are defined from a user perspective rather than an abstract technical one. Their function is readily understood by any spatially aware person; they do not require any knowledge about abstract spatial concepts. The universal applicability of these operations is ensured by having them based on a universal data structure, i.e., the latest Open Geodata Interoperability Services (OGIS) and Spatial Archives Interchange Format (SAIF) technical references. Sample applications of the virtual GIS (VGIS), which is based on universal GIS operations, reveal the advantages for a spatial modelling environment. It provides the means to concentrate on the analytical process instead of having to cope with the intricacies of current GIS.

The next section provides the setting of this chapter. It contrasts traditional definitions of GIS with a process-oriented view. GIS usage, here, is described according to the level of user expertise rather than data structures. This section also provides an overview of prospective GIS developments based on trends in software engineering. This is followed by an overview of traditional taxonomies of GIS operations.
It reveals the minor role of analytical functionality in current systems and compares the delivered functionality with the results of a survey of user requirements. The latter are analysed and reduced to a small set of universal analytical GIS operations. Since the focus of this chapter is on the definition of GIS operations, only a simplified version of the Virtual Geodata Model (OGF, 1993) is introduced before describing the universal GIS operations in the last section.
42.2 BACKGROUND
One of the significant differences between GIS and an automated mapping system is the ability to perform analyses (Burrough, 1986; Goodchild, 1987b). Present GIS have sophisticated functions to capture, edit,
query and display spatial information, yet these systems have limited capabilities to perform spatial analysis and modelling (Goodchild, 1987a). With the growth of GIS applications, many GIS users are demanding more sophisticated analytical functions in GIS. To understand how GIS are used to achieve project objectives, it is necessary to examine both the functionality that has traditionally been offered by the software and the functionality that users expect. Although a number of vendors claim their GIS to be data structure-independent, none really is; they all show their origin as either raster (grid cell-based) or vector systems. This data structure distinction has dictated differences in analysis functionality. The available set of analytical procedures, and the names used for them, differ between individual GIS products. A surprisingly small portion of GIS research is devoted to the functionality of GIS. There are numerous committees (e.g., Comité Européen de Normalisation (CEN), Federal Geographic Data Committee (FGDC), International Hydrographic Organisation (IHO), International Standards Organisation (ISO), and Open GIS Foundation (OGF)) trying to standardise geodata formats. Object-oriented data models (Egenhofer and Frank, 1992) and the handling of metadata (FGDC) are hot topics, but no one seems to be interested in trying to define a universal framework for the methods that deal with the results of this research in data issues. Any development of new methods is focused on the extension of current application areas such as landscape ecology (McGarigal and Marks, 1994), transport and networking (Maguire et al., 1993), or climate modelling (Schimmel and Burke, 1993). The last attempt to analyse the underlying spatial principles of all GIS operations was Tomlin’s (1990) “Map Algebra”.
This is an impressive feat, and its applicability has been proven by the adoption of its functionality in most major GIS products (Microstation raster module, Arc/Info GRID, GRASS mapcalc). However, it has two serious drawbacks. First, it is restricted to raster GIS. Even though Tomlin himself claimed the “Map Algebra” to be universal, all his examples are in raster format, and implementations of cartographic modelling principles in a vector or object-based environment are yet to come. Second, word monsters consisting of combinations of range terms such as “focal”, “incremental”, and “zonal” with functions like “partitioning”, “proximity”, or “insularity” are inadequate for most non-experts. “Map Algebra”, as valuable as it is, requires (or at least deserves) special training and certainly is not the easy way to understand GIS.
42.3 USER’S VIEW OF GIS FUNCTIONALITY
Functional taxonomies, as they can be found in the relevant literature (Aronoff, 1991; Burrough, 1992; de Man, 1988; Goodchild, 1992; Rhind and Green, 1988; Unwin, 1990), represent a developer’s view of GIS functionality rather than that of non-expert users. All (in the broadest sense) analytic GIS operations mentioned in these works are given in Albrecht (1995) and will not be repeated here. The basis of the following discussion is the result of feedback from GIS students in Germany and Austria, and from colleagues at a number of international GIS conferences in 1993 and 1994, who were given a list of 144 GIS operations (Albrecht, 1995) and then asked to rate their importance. This group of users is certainly not representative; for example, the vast majority of them were Arc/Info users, with Erdas and Idrisi ranking second and third. Most of these users did not have experience with more than one GIS. With these limitations in mind, there are still a number of results in this informal survey that are applicable to the GIS community as a whole.
Each of the 144 operations was screened in relation to the following two questions: (1) How does an operation fit into a thematic context, i.e., is an operation similar to another one? and (2) How does an
operation fit into the workflow, i.e., what needs to be done before that operation can be executed, and what other operation does it lay the ground for? The result is a very complex net of relations that is comprehensible only if transformed to a net of more general tasks (Albrecht, 1995). “Task”, as it is used here, describes all actions that require human input or knowledge about the context, whereas the term “function” is used to describe singular actions or sequences that can be automated. Tasks are usually composed of functions. One of the insights gained was that operations occur at a multitude of conceptual levels, prohibiting reduction to the simple goal-task-function hierarchy described by Huxhold (1989). A parallel phenomenon is that a number of operations essential for some applications are superfluous for others. This reflects the heterogeneity of the user community and can be visualised by one of many possible views onto the semantic net of GIS operations. One finding from the answers to the questionnaire was that most of the people interviewed were surprised that this study concentrated on procedures rather than data. They had learned to conceptualise GIS in terms of data, which is reflected by the sheer number of data-related operations as opposed to those related to analysis. This is easy to understand since the majority of these users spend most of their time performing data capturing, transformation, and editing. The most common data-related operations are (in order of frequency):
• make a map
• enter data
• select items
• display a map
• classify attribute data
This corresponds with the findings of an analysis of lineage diagrams. Using the GeoLineus software (Lanter, 1994), it can be shown that up to 80 percent of all GIS commands issued within a usual GIS project are related to data management and not to analysis. Smith et al. (1994) reported similar results.
Conscious conceptualisation of tasks seems to be restricted to expert users who form their own strategies for aggregating elementary functions into work units that they deal with as if they were single units (Albrecht, 1995). These experienced users refer to tasks in a way that signals implicit knowledge; this corresponds with the findings of cognitive science (Rasmussen, 1986). Only the most experienced GIS users are able to abstract the technical procedures and refer to them in the form of concepts that are used to guide the application of domain knowledge. Normally, the analyst will start from the required or desired decision and work towards a representation of the problem. However, the design flow may return to any of the questions, for example, examining alternative proposed changes, expanding the set of processes under consideration, or redefining criteria for evaluation. Each of these design flows seems to be centred on a certain kind of meta-task or goal. These goals are almost too fuzzy to be categorised, but an attempt is made here to present a preliminary list of the goals. The list of meta-tasks shown in Table 42.1 corresponds closely to the last chapter of Tomlin’s GIS and Cartographic Modelling (1990).
42.3.1 Consequences for the design of future GIS user interfaces
Many GIS users possess expert-level knowledge in the application field in which the GIS is to be utilised, but have neither the time nor the desire to learn the technical intricacies of a specific system (Albrecht, 1994). The user’s overall goal should not be the mastery of a new system but more productive interaction with geographic information. An obvious response is to provide a user interface which alleviates the need for specialised training. This user interface should aim at enhancing user interaction with geographic information and with geographic problem solving rather than with systems.
Much of the user interface problem is therefore not a programming problem but a conceptual problem (Mark and Gould, 1991). Frank
(1993) describes the crucial importance of the user interface for the usability of a GIS: “The user interface is the part of the system with which the user interacts. It is the only part directly seen and thus is the system for the user” (p. 11).
Table 42.1: List of Meta-tasks
INVENTORY: Locating, counting, or recording items without concern for desirability
RESOURCE: Desirable, useful, or limited phenomenon to be conserved or protected
RESTRICTION: Constraint that limits the availability, desirability, or location of a resource
REFERENCE: Spatial control or anchor for locating features in other goals
PREDICTION: Attribute values correlated with the presence of target phenomena
THREAT: Phenomena that may injure, destroy, or have other negative effects
TARGET: Desired or valued phenomenon to be found or located
SOLUTION: Composite result of analysis, embodies the application of analysis criteria
Modern GIS software is a multidisciplinary tool that must allow for interdisciplinary support and is expected to be able to integrate a variety of data sources. These data sources will be used in many ways in a variety of decision support situations. To meet these demands, a user interface is required whose generic functional model consists of a small set of universal GIS operations that allow for the automatic construction of a domain- or task-specific derived model. It would act as a shell based on a high-level language consisting of spatial operators that have definable hierarchical constructs. These spatial operators can be organised following a programmable schema that allows them to generate the derived model. The core of such a shell, however, would be the generic functional model. Section 42.6 introduces the virtual GIS (VGIS) project, which aims at implementing the above-mentioned ideas.
42.3.2 Derivation of a set of universal GIS operations
The analysis of current user interfaces provides a good opportunity to study different approaches to the categorisation of GIS functionality. Again, most operations serve non-analytical ends and are therefore not of concern to this study. Using the meta-tasks depicted above as a guideline, a few functional groups can be derived. All systems that offer special operations for a particular application group them under labels such as “terrain analysis” or “neighbourhood”. A large number of these operations (such as “sliver removal” or “coordinate thinning”) do not make much sense to the average user or are regarded as a nuisance. Others, such as “line-of-sight” and “viewshed analysis”, are either synonymous or part of one another and therefore confuse an occasional user. This does not mean that those with more experience should have no access to their functionality, but rather that such operations are hidden from an entry-level menu and that the system has default values for the results of each of these operations.
Generalisation is a task that is closely related to “zoom” and scale change operations. As such, these tasks are auxiliary, and users may expect them to be performed automatically. This is not a trivial requirement and needs further research (Timpf and Frank, 1995). In a similar vein, abstraction procedures (for example, the reduction of an area to its centroid, or the regionalisation resulting from a Dirichlet (Voronoi/Thiessen) tessellation) are tasks that continue to appear in the GIS literature (Aronoff, 1991; Burrough, 1986;
Egenhofer and Frank, 1992; Laurini and Thompson, 1992); however, they were never mentioned by any user in the survey. Operations such as “clump/labelling” are relics of the underlying data structure (i.e., raster-based tessellation) and therefore need to be eliminated from the list of truly universal GIS operations. Purely geometric operations like “line intersection” or “point-in-polygon” are typical for the vector-based way that GIS functionality is currently implemented. These operations are too technical to support the average non-technical GIS user in solving tasks. Figure 42.1 lists all the user-oriented, analytical, universal GIS operations, as derived from the original 144 operations. Essentially, there are 20 operations, plus an indeterminate number of measurements and variations of these operations (see “interpolation”). The assignment of this reduced set of 20 operations to the seven functional headers is arguable; e.g., the “proximity” measure could just as well be assigned to “Spatial Analysis” or “Measurements”. The main achievement here, however, is the elimination of all operations that have no direct analytical purpose. Auxiliary functions such as “clump”/“labelling” in the raster domain or “topology building” in the vector world have been discarded, yet it is exactly this group of operations that makes up as much as 80% of all GIS operations in a regular session (Yuan and Albrecht, 1995). A particularly critical case is the group of operations that can be subsumed under interpolation and surface generation. With “Search” and its subsidiary “(re-)classification”, one functional group has already been introduced that is only vaguely analytical in character, but it is such an important predecessor to all analytical operations that it needs to be included. In a similar manner, interpolation is of critical importance in some applications, while users in other domains ardently expect the GIS itself to perform all necessary interpolations.
It is therefore arguable whether it should be included in a set of universal analytical GIS operations. Furthermore, tasks such as gravity modelling are at the core of many geographic and other human science applications and could hardly be performed automatically. Yet, they were not included here because they can be built using a combination of the given operations (Dodson, 1991). Similar considerations might have led Aronoff (1991, p. 196) to conceive his classification. Idiosyncrasies such as categorising the “buffer” operation as a proximity function within the “Connectivity” group are a result of its being based on Berry’s (1988) Map Analysis Package. Even more interesting, with respect to the user-centred compilation of universal analytical GIS operations, is the aggregation of “retrieval, classification, measurement” as a mixed bag similar to “search” and “classification” in Figure 42.1. A number of operations have different names in different domains, despite being essentially the same; “cost, diffusion, spread” is an example of such polymorphism. The operations listed in Figure 42.1 are claimed to be universal. A complete assessment, however, can only be given if it can be demonstrated that universal GIS operations work with an equally universal data structure. The definition of the latter is at the heart of the Open Geodata Interoperability Specification (OGIS) presented in the following section. If the universal GIS operations can be shown to work with the data structure defined in the virtual geodata model of the Open GIS Foundation (OGF), then all reservations against the true universality should cease.
42.4 THE OGIS DATA STRUCTURE—A FRAMEWORK FOR UNIVERSAL GIS OPERATIONS
All GIS operations need to operate on some data. They are universal only if they can operate on any conceivable set of spatial data (which is difficult to prove), or if they work with a definition of universal data. This section introduces such a specification.
Figure 42.1. Conclusive List of Universal Analytical GIS Operations (as implemented in Section 42.6)
Building on the works of Kucera and Sondheim (1992), members of the OGF developed the Virtual Geodata Model (VGM), which is at the heart of the Open GIS concept. It provides a consistent logical view of geographic information, independent of the underlying data model or format. Because it is a comprehensive geodata representation, it allows for the creation of a set of high-level functions or operations which can be used to access different data sets. Geographic information is collected and managed for numerous purposes, each of which has its own requirements for how data are most efficiently organised, what features of interest to include, what degree of precision and accuracy is necessary, how information is analysed and displayed, and so on. As a result, there are now many geodata models that are largely incompatible and therefore have limited utility for users. The objective of the VGM then is to create a single comprehensive model which embraces the range of existing models and their associated formats. That is, the VGM must be able to describe any datum held in any format developed to the parameters associated with any data model. From an application, rather than data, perspective, the VGM must provide methods by which a user can query geographic information
contained in the VGM. This framework was developed in close cooperation with the standardisation committees for spatial and temporal object-oriented extensions of the structured query language (SQL). The specification of the VGM is a mixture of plain English, mathematical and logical expressions, and programming pseudo-code. Attempts to express the full specification in a formal language are currently being undertaken at the Departments of Geoinformation in Münster and Vienna. These formalisation endeavours are based on the functional programming language Gofer (Frank et al., 1997). One of the features of this programming language is that it checks the functional dependencies and the consistency of the code. The complete code, compiled in Gofer, is given in Albrecht (1996). Since the compiler accepted this code, it has been proven to be syntactically correct and without contradictions.
42.5 UNIVERSAL ELEMENTARY GIS OPERATIONS
In this section, each operation is verbally described, with special emphasis on the inter-relationship and suitability for a user-oriented classification. For the algebraic specification of each operation, the reader is again referred to Albrecht (1996). Algebraic specifications are a conceptual aid that allows a non-contradictory definition of the specified operations. They cannot, however, assure completeness. Applications that are not yet standard, such as geophysics, are likely to require analytical operations that are not covered here. This section builds upon Figure 42.1 and describes the operations that were considered for the final list of universal analytical GIS operations. A number of operations are well-known, and there is no discussion about how to categorise them (e.g., the group of terrain analytic operations). The omission of “Network” functions is arguable; the author’s argument here is that they can all be substituted by “Neighbourhood” and “Measurement” operations. This statement will be substantiated below.
However, a GIS user interface needs to adjust to the field of application, and in a utility (network) application it makes sense to have network functionality as a separate group heading. Similarly, the “cost” operation could also be called “diffusion” or “spread”. The naming convention is a matter of domain; however, the algorithms are the same. Finally, there is a large overlap among the operations within the “Spatial Analysis” and the “Measurements” group. Those operations that result in a single value could justifiably be classed as measurements, although some of these operations are algorithmically so complex that they could also be categorised as “Spatial Analytic”. In the following, six groups of analytical GIS operations are identified (group headers will always be capitalised, whereas the individual operations begin with a small letter).
“Search” operations can be partitioned into search by theme and search under geometrical constraints. In most cases, the search operation is succeeded by a “select” and subsequent analysis of the selected object(s). As such, it is not a typical analytical operation in itself. It is included here because the “search-by-region” operation in particular can only be found in GIS. The search-by-region operator uses a user-defined (rectangular) search window, an arbitrarily shaped mask, or a filter that has some spatial properties. “(Re-)classification” is basically a database operation. In most cases, however, the filter that is used for a reclassification has a spatial determinant. Indeed, the whole concept of Map Algebra can be regarded as a form of reclassification. The unit of measurement does not need to be metric, and distances could be expressed either in time or in relative space, such as number of nodes in a network.
The “Locational Analysis” group comprises four operations that are among the best-known and most often used GIS operations.
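The "search" and "(re-)classification" operations described above can be illustrated with a minimal Python sketch. It assumes point features held in plain dictionaries and a rectangular search window; the function names, the well identifiers, and the nitrate class breaks are hypothetical and are not taken from the algebraic specification in Albrecht (1996).

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def search_by_region(features: Dict[str, Point],
                     window: Tuple[float, float, float, float]) -> List[str]:
    """Return the ids of point features inside a rectangular search window.

    window is (xmin, ymin, xmax, ymax); the boundary counts as inside.
    """
    xmin, ymin, xmax, ymax = window
    return sorted(fid for fid, (x, y) in features.items()
                  if xmin <= x <= xmax and ymin <= y <= ymax)


def reclassify(values: Dict[str, float],
               breaks: List[Tuple[float, str]],
               default: str) -> Dict[str, str]:
    """Assign each value the label of the first break it does not exceed."""
    result = {}
    for fid, value in values.items():
        label = default
        for upper, name in breaks:
            if value <= upper:
                label = name
                break
        result[fid] = label
    return result


# Hypothetical sample data: three wells with a measured attribute.
wells = {"w1": (2.0, 3.0), "w2": (8.0, 1.0), "w3": (4.0, 4.5)}
nitrate = {"w1": 12.0, "w2": 55.0, "w3": 30.0}

# Search first, then classify only the selected features.
selected = search_by_region(wells, (0.0, 0.0, 5.0, 5.0))
classes = reclassify({fid: nitrate[fid] for fid in selected},
                     breaks=[(25.0, "low"), (50.0, "moderate")],
                     default="high")
```

The sketch follows the order described in the text: the search yields a selection, and the subsequent operation works on the selected objects only. An arbitrarily shaped mask would replace the rectangle test with a point-in-polygon test.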
“Buffer” and “corridor” are quite similar, and it could be argued that a “corridor” is a “buffer” operation that uses two distances (the inner and the outer boundary) and can only be applied to
a group of two- or higher-dimensional features. However, that is a technical description of the operation and does not meet the requirement of user-orientation set forth as a prerequisite of this work. Probably the best-known analytical GIS operation is “overlay”. It comprises many other operations, such as “clip”, “erase”, “split”, “identity”, “union” or “intersect”, and can be applied to any combination of spatial features. The last operator in this group is the “Thiessen/Voronoi” operation. This is sometimes also categorised as a “Neighbourhood” operation. However, from a task-oriented perspective, it fits well with the three other operations of the “Locational Analysis” group. These four operations satisfy most needs in the large set of location/allocation problems. There are overlaps with functionalities in the “Neighbourhood” and “Terrain Analysis” groups, but for the sake of this categorisation, other operations suitable for location/allocation problems were kept with their prototypical group headings.
Three operations were determined to form a group called “Network/Flow”. These three operations are “connectivity”, “shortest path” and “flow between regions”. The latter became subsumed under the “Thiessen/Voronoi” operation. By assigning weights to each origin, the flow to neighbouring cells can easily be modelled. The “shortest path” operation can be substituted by the repeated application of the “nearest neighbour” operation, while “connectivity” was found to be a measure that extends beyond the scope of network or flow operations.
The operations of the next functional group deal with explicitly or implicitly three-dimensional data. They need no explanation beyond the description of parameters used in the algebraic specification. “Slope/aspect” requires an input file that contains height values. In the exceptional case of the input file being a TIN, the result of this operation is already implicitly recorded.
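For a raster of height values, the "slope/aspect" operation can be sketched as follows. This is a minimal illustration that assumes a regular grid whose row 0 is the northern edge, central differences for the gradient, and aspect reported as the compass bearing of the downslope direction; these conventions and the function name are assumptions, since the chapter does not prescribe an algorithm.

```python
import math
from typing import List, Optional, Tuple


def slope_aspect(heights: List[List[float]], row: int, col: int,
                 cell_size: float = 1.0) -> Tuple[float, Optional[float]]:
    """Slope (degrees) and aspect (compass bearing) of an interior grid cell.

    Assumes row 0 is the northern edge and columns increase eastwards.
    Aspect is the downslope direction, or None where the cell is flat.
    """
    # Central differences: height change per unit distance to the east/north.
    dz_dx = (heights[row][col + 1] - heights[row][col - 1]) / (2 * cell_size)
    dz_dy = (heights[row - 1][col] - heights[row + 1][col]) / (2 * cell_size)

    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    if dz_dx == 0 and dz_dy == 0:
        return slope, None

    # Bearing of the steepest descent, clockwise from north.
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect


# A ramp that rises by one height unit per cell towards the east.
ramp = [[0.0, 1.0, 2.0],
        [0.0, 1.0, 2.0],
        [0.0, 1.0, 2.0]]
slope, aspect = slope_aspect(ramp, 1, 1)  # 45 degree slope facing west
```

On a TIN, no such computation is needed for each query because every triangle already carries its own plane, which is the sense in which the text calls the result implicitly recorded.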
Although not a truly analytical operation, hill-shading can be accomplished within the same operation, if the sun azimuth and the viewer’s elevation are provided as additional parameters. The “catchment/basins” operation uses either a “height” or a “slope” file to calculate the extent of a single basin (if an additional selection point is provided) or an entire set of basins. Similarly, the “drainage/network” operation computes the flow either from a single source location or for the complete stream network. This includes the delineation of stream links, stream order, flow direction and upstream elements. The “viewshed” operation is the only other operation that requires an input file containing height values. In addition, a viewpoint must be designated (this could also be a route or an area), and the viewer’s height above ground needs to be specified. The search distance is an optional parameter. The “Distribution/Neighbourhood” group of operations is probably the most geographic (Tobler’s first law of geography). Non-statistical queries about the relationship between spatial features are usually answered with this set of functions. The “cost/diffusion/spread” operation takes one arbitrarily dimensioned feature and calculates the value of neighbouring attributes according to some spread function. The spread can be influenced by barriers which simulate spatial impedance (the “cost” character), accessibility, or the relative distance under anisotropic conditions. The spread is usually expressed by an equation, while the friction can be represented by either an equation or by special friction coverage. The “shortest path” functionality, described in the “Network” section above, can also be categorised as a “spread” along a given network. If the shortest path calculation is based on relative distances, “cost/diffusion/spread” might be more appropriate than the “nearest neighbour” operation. 
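The "cost/diffusion/spread" operation on a friction coverage can be sketched as an accumulated-cost computation. The example below assumes a raster friction grid, four-neighbour moves, a step cost equal to the mean friction of the two cells involved, and `None` for barrier cells; it uses Dijkstra's algorithm, which is one common choice and not necessarily the method of any particular GIS. Source cells are assumed not to be barriers.

```python
import heapq
import math
from typing import List, Optional, Tuple

Grid = List[List[Optional[float]]]


def cost_spread(friction: Grid,
                sources: List[Tuple[int, int]]) -> List[List[float]]:
    """Accumulated cost from the nearest source cell to every grid cell.

    friction[r][c] is the cost of crossing a cell; None marks a barrier.
    Unreachable cells and barriers keep the value math.inf.
    """
    rows, cols = len(friction), len(friction[0])
    cost = [[math.inf] * cols for _ in range(rows)]
    heap = []
    for r, c in sources:
        cost[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))

    while heap:
        acc, r, c = heapq.heappop(heap)
        if acc > cost[r][c]:
            continue  # stale queue entry, a cheaper route was found already
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if friction[nr][nc] is None:
                continue  # barriers cannot be entered
            step = (friction[r][c] + friction[nr][nc]) / 2.0
            if acc + step < cost[nr][nc]:
                cost[nr][nc] = acc + step
                heapq.heappush(heap, (acc + step, nr, nc))
    return cost


# Hypothetical friction coverage with one barrier and one costly cell.
friction = [[1.0, 1.0, 1.0],
            [1.0, None, 5.0],
            [1.0, 1.0, 1.0]]
spread = cost_spread(friction, sources=[(0, 0)])
```

Here the far corner is reached more cheaply around the western and southern side than through the high-friction cell, which shows how relative distance can differ from metric distance. Restricting the moves to the edges of a network would turn the same computation into a shortest-path search, in line with the remark above.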
“Proximity” is less a singular operation than a functional group of numerous technical operations that provide the same functionality. Proximity measures can be applied to all features of an input file or to selected features only. In the case of multi-dimensional features, the user needs to specify whether they should be measured from edge to edge or from centre to centre. Finally, a maximum distance may be specified to identify what is considered to be proximal. Similarly, the “nearest-neighbour” operation uses a number of different algorithms, depending on the mode, which usually is (but does not have to be)
conditional on the input data. Aside from the common specification of input and output files (the input can be one or several features of any type), this operation requires information about the unit of measurement (e.g., length or number of nodes) and the mode (e.g., along a path or as-the-crow-flies). It could be argued that “proximity” belongs in the “Measurement” group, while “nearest-neighbour” is a special case of the “cost/diffusion/spread” operation, which in itself is nothing but a complex “reclassification”. This would render the whole group obsolete. From a technical viewpoint, this argument is valid; however, it does not correspond with the requirements on the user’s side and is therefore not adhered to here.
All statistical measures that exhibit a certain degree of complexity are categorised as “Spatial Analysis”. This includes the landscape ecological “pattern and dispersion” measures (such as frequency, indices of similarity, relative richness, diversity, dominance, fragmentation, density, Shannon index, and degrees of freedom) as well as “centrality or connectedness”, “shape measures” (e.g., skewness, compactness), and the whole set of tools for “multivariate analysis”. All result in singular numbers, and can therefore be categorised as “Measurements”. Some of the computations, however, are so complex that users would be confused if they were grouped among measures like “perimeter” or “acreage”. “Pattern” and “dispersion” measures are possibly the most prototypical of all “Spatial analysis” operations, at least with respect to descriptive statistics. Geographers in particular, but also biologists, economists, historians or sociologists, try to explain the underlying processes by analysing the spatial patterns of their research objects. Although developed for applications in forestry, the FRAGSTATS package (McGarigal and Marks, 1994) sets the standards for this type of statistics that can be used in many other domains.
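As one example of a "pattern and dispersion" measure that yields a single number, the Shannon index mentioned above is H = -Σ p_i ln p_i, where p_i is the proportion of the landscape occupied by class i. The following minimal sketch assumes the landscape is given as a flat sequence of class labels, one per raster cell; the function name and the sample land-cover classes are hypothetical, and the code is not taken from FRAGSTATS.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_index(cells: Iterable[Hashable]) -> float:
    """Shannon diversity H = -sum(p_i * ln(p_i)) over the classes present."""
    counts = Counter(cells)
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Hypothetical landscape: four land-cover classes in equal proportions.
landscape = ["forest", "forest", "arable", "arable",
             "water", "water", "urban", "urban"]
diversity = shannon_index(landscape)  # equals ln(4) for four even classes
```

A landscape consisting of a single class yields zero, and the value grows with both the number of classes and the evenness of their proportions.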
“Centrality” yields either the centre of a point cluster or a measure of connectivity in a network. “Shape” measures are used in a wide array of applications, for example, in geomorphological, biogeographical, political (“gerrymandering”), or archaeological practices. A number of basic parameters that can be found in the “Measurements” group (“acreage”, “perimeter”, “centroid”, etc.) are used here to describe elongation, orientation, compactness, or fragmentation. The last operation in the “Spatial analysis” group is again a header for a whole bag of secondary operations. “Multivariate analysis” comprises a number of techniques to describe the relationships and dependencies among the spatial objects under scrutiny. Although these are definitely analytic in character, it can be argued that their functionality is covered by such well-established statistical software packages as SAS, SPSS or S-plus and therefore does not need to be classified as a universal GIS operation. On the other hand, operations like “regression”, “autocorrelation” and “cross-tabulation” are so often used in a GIS context that they are included here as well.
The “Measurements” group is virtually infinite. In its core, it consists of a number of simple geometric calculations (“distance”, “direction”, “perimeter”, “acreage”, “height”, “volume”, “surface”, “fractal dimension”); these are then extended with simple statistics (“number”, “histogram”, “mean”), and ultimately include a few topological measures (such as “adjacency” and “doughnuts/holes”).
42.6 IMPLEMENTATION
A user-friendly interface should allow users to concentrate on the task at hand by offering them a preferably small set of operations. Such a set is provided in the previous section.
Now it needs to be demonstrated that these operations are an efficient means to categorise GIS tasks, specifically those related to spatial problem solving (e.g., path, location, allocation, layout, districting) and predictive modelling (inference, simulation, modelling) (Aronoff, 1991). Section 42.6.1 describes the Virtual GIS, which is based on the universal GIS
operations identified earlier. Section 42.6.2 outlines the window of opportunities provided by this new GIS user interface.
42.6.1 VGIS
Currently available GIS technology is isolated from both the actual problem context and the users’ conceptualisation of the problem. It provides “powerful toolboxes in which many of the tools…are as strange to the user as a robot-driven assembly plant for cars is to the average home handyman” (Burrough, 1992, p. 9). The Virtual GIS (VGIS) project in progress at the Institute for Spatial Analysis and Planning in Areas of Intensive Agriculture (ISPA) is geared to overcome this problem. The VGIS project emphasises a user-oriented visualisation of tasks instead of the commonly-used, technically-oriented operations (Albrecht, 1994). These tasks can be dissected into the universal GIS analysis operations, which are independent of any data structures and thereby of any underlying GIS. The table of universal operations can be regarded as a metaphorical periodic chart of universal GIS functions which enables building typical applications, much like molecules are made up of atoms (Albrecht, 1994). Their difference from the generally available GIS functions is in their top-down approach to user needs (as they were derived in Section 42.3 from general tasks instead of technical conceptions of the manufacturer) and their independence from the underlying system. Almost any situation that requires the presentation of a series of processing steps, especially in the planning stage, is best visualised by flow charts. VGIS utilises flow charts as a graphical means to guide the user through the steps necessary to accomplish a given task. By selecting a task from the main menu, the first questions for input parameters are triggered. If the input data exists in the correct format, then the macro of universal GIS operations, stored in the base file, is executed.
Otherwise, the system tries to generate the necessary data based on the knowledge stored in the base files. This might require further input by the user. Data and all operations on them are displayed by icons and connectors. Detailed descriptions of the VGIS project are given in Albrecht (1995), Yuan and Albrecht (1995), and Albrecht et al. (1996). 42.6.2 Geographic modelling One motivation for the search for GIS usage simplifying universal GIS operations was the observation that, in spite of their name, geographic information systems are hardly ever used by geographers in their scientific work. One reason might be the fact that current GIS have little to offer to the scientist, who is interested in modelling spatial phenomena. Tomlin’s (1990) cartographic modelling language is the most sophisticated GIS modelling environment so far. The lack of analysis capabilities in GIS has been lamented by many in the modelling community and has resulted in a conference series devoted to overcoming this discrepancy between the GIS and the modelling community (Goodchild et al, 1993). The answers so far have been specialised applications, mainly in the hydrological domain (e.g., PCRaster, DYNAMO) (van Deursen and Kwadijk, 1993). Even these languages do not really support the creative process of model building. But they require an intricate knowledge of the model and the language and are harnessed to fine-tune a fixed-model run. VGIS, on the other hand, attempts to be a prototyping tool and development platform similar to STELLA (High Performance Systems Inc., 1994), while working with real GIS data and thereby graphically extending Map Algebra according to the concepts presented by Kirby and Pazner (1990).
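The control flow described in Section 42.6.1 (select a task, check the inputs, run the stored macro, or first derive the missing data) can be sketched roughly as follows. This is only an illustration under stated assumptions and not the VGIS implementation: the names `Step`, `Task`, `run_task`, `reclassify` and `overlay_and`, the toy list-of-lists raster, and the factory-siting layers are all hypothetical stand-ins for the universal operations and base-file knowledge mentioned in the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Layer = List[List[float]]  # toy raster: rows of cell values (assumption)


def reclassify(layer: Layer, threshold: float) -> Layer:
    """Hypothetical operation: 1.0 where the cell value <= threshold, else 0.0."""
    return [[1.0 if v <= threshold else 0.0 for v in row] for row in layer]


def overlay_and(a: Layer, b: Layer) -> Layer:
    """Hypothetical operation: cell-wise product of two 0/1 layers (logical AND)."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


@dataclass
class Step:
    output: str                      # name of the layer this step produces
    operation: Callable[..., Layer]  # one data-structure-independent operation
    inputs: List[str]                # names of the layers it consumes
    params: Dict[str, float] = field(default_factory=dict)


@dataclass
class Task:
    name: str
    required: List[str]              # layers the macro expects to exist
    macro: List[Step]                # stored sequence of operations (non-empty)
    derivations: Dict[str, Step] = field(default_factory=dict)  # how to build missing layers


def apply_step(step: Step, layers: Dict[str, Layer]) -> Layer:
    return step.operation(*(layers[n] for n in step.inputs), **step.params)


def run_task(task: Task, data: Dict[str, Layer]) -> Layer:
    layers = dict(data)
    # Generate any required layer that is absent, if a derivation rule is known.
    for name in task.required:
        if name in layers:
            continue
        rule = task.derivations.get(name)
        if rule is None or any(src not in layers for src in rule.inputs):
            raise KeyError(f"layer {name!r} is missing; further user input is needed")
        layers[name] = apply_step(rule, layers)
    # All inputs are present: execute the stored macro step by step.
    for step in task.macro:
        layers[step.output] = apply_step(step, layers)
    return layers[task.macro[-1].output]


# Hypothetical task: cells that are both flat and close to a road.
site_task = Task(
    name="locate factory site",
    required=["flat", "near_road"],
    macro=[Step("suitable", overlay_and, ["flat", "near_road"])],
    derivations={"flat": Step("flat", reclassify, ["slope"], {"threshold": 5.0})},
)

slope = [[2.0, 8.0], [4.0, 3.0]]
near_road = [[1.0, 1.0], [0.0, 1.0]]
suitable = run_task(site_task, {"slope": slope, "near_road": near_road})
print(suitable)  # [[1.0, 0.0], [0.0, 1.0]]
```

Here the "flat" layer is not supplied, so it is derived from "slope" before the macro runs. Keeping each step as a named operation with named inputs and outputs is also what allows the same structure to be drawn as icons and connectors in a flow chart.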
Figure 42.2: Sample Modelling Flow Chart
Flow charts are a standard process-oriented tool in visual programming (Glinert, 1988; Monmonier, 1989). Such visual programming can be displayed as a modelling flow chart that allows the user to “play” with the data flow. A sample exercise used in the first semester of the postgraduate course in environmental monitoring given at the University of Vechta illustrates the advantages of VGIS in a practical application. The task is to locate the optimal site for a factory. This task, which was previously solved using the software package Erdas/Imagine, is now done through the workflow technique of the VGIS environment in a fraction of the time previously required (see Figure 42.2).

42.7 CONCLUSION

During the past couple of years, emphasis has shifted from algorithms to data structures, data models and the means of communication between them. Data have been recognised as the single most expensive factor in an institutionalised GIS. Current efforts to deal with this factor more efficiently are understandable, yet it is difficult to understand why there has been so little progress in the development of new GIS analysis methods. Most, if not all, of the 144 functions extracted from GIS literature were developed during the 1970s. Functionality and user issues deserve more attention; the universal GIS operations presented in this chapter therefore represent a radical departure from the current paradigm in GIS. Some of the issues that are pertinent in current GIS research have not been addressed. There is no discussion of possible additional needs for true 3-D applications, nor for temporal issues. Although the data
structures given in Section 42.4 account for both, and the operations defined in Section 42.5 are “aware” of the third and the temporal dimension, the author does not rule out that there might be the need for additional operations to satisfy all requirements. The 20 operations describe the set of functionalities that is currently covered by the major GIS vendors. Based on this collection, future cognitive science research might try to extend this set to those operations that spatially aware professionals use implicitly in their daily work but that are difficult to elicit in questionnaires. Any attempt to shift the focus in GIS research from the computation to something as fuzzy as “use” is destined to meet resistance. The difficulty lies in the need to unite the user’s and the developer’s view, something that has not been attempted before. The author’s first efforts to contrive a taxonomy of elementary GIS functions were accompanied by recurrent relapses to the technical side, and he had to become a wanderer between two worlds. The results presented in this chapter are but a first step towards a user-centred GIS that transforms what has become a colossal system into a handy tool. Only then will GIS be able to fulfil Norman’s (1991) dictum: “Good tools do not just extend or amplify existing skills; they change the nature of the task itself, and with it the understanding of what it is we are doing” (p. 19).

REFERENCES

ALBRECHT, J. 1994. Universal elementary GIS tasks: beyond low-level commands, in Waugh, T. and Healey, R. (Eds.), Proceedings Sixth International Symposium on Spatial Data Handling, Edinburgh, 5–9 September. London: Taylor & Francis, pp. 209–222.
ALBRECHT, J. 1995. Semantic net of universal elementary GIS functions, in Peuquet, D. (Ed.), Proceedings ACSM/ASPRS Annual Convention and Exposition Technical Papers, Vol. 4 (Auto-Carto 12), 27 February–2 March. Charlotte, NC: ACSM/ASPRS, pp. 235–244.
ALBRECHT, J. 1996. Universal GIS operations: a task-oriented systematization of data structure-independent GIS functionality leading towards a geographic modeling language. PhD Thesis, University of Vechta: ISPA.
ALBRECHT, J., BRÖSAMLE, H., and EHLERS, M. 1996. A graphical front-end for user-oriented analytical GIS operations, in Proceedings XVIII ISPRS Congress, B2, Commission II, Vienna, pp. 78–88.
ARONOFF, S. 1991. Geographic Information Systems: A Management Perspective. Ottawa: WDL Publications.
BERRY, J. 1988. Computer-based map analysis: characterizing proximity and connectivity, in Proceedings of International Geographic Information Systems, IGIS Symposium. Washington, DC: NASA, pp. 11–22.
BURROUGH, P. 1986. Principles of Geographic Information Systems for Land Resources Assessment. Oxford: Clarendon Press.
BURROUGH, P. 1992. Development of intelligent geographical information systems, International Journal of Geographical Information Systems, 1, pp. 1–11.
DE MAN, E. 1988. Establishing a geographic information system in relation to its use, International Journal of Geographical Information Systems, 2(3), pp. 257–261.
DODSON, R. 1991. VT/GIS: the von Thünen GIS package. NCGIA Technical Report 91–27. Santa Barbara: National Center for Geographic Information and Analysis.
EGENHOFER, M. and FRANK, A. 1992. Object-oriented modeling for GIS, Urisa Journal, 2, pp. 3–19.
FRANK, A. 1993. The use of GIS: the user interface is the system, in Medyckyj-Scott, D. and Hearnshaw, H. (Eds.), Human Factors in Geographic Information Systems. London: Belhaven, pp. 11–12.
FRANK, A., KUHN, W., HÖLBLING, W., SCHACHINGER, H., and HAUNOLD, P. (Eds.) 1997. Gofer as Used at GeoInfo (TU Vienna), Vol. 12, GeoInfo Series. Vienna: Dept. of Geoinformation, Technical University.
GLINERT, E. (Ed.) 1988. Visual programming environments. Los Alamitos: IEEE Computer Society Press.
GOODCHILD, M. 1987a. A spatial analytical perspective on geographical information systems, International Journal of Geographical Information Systems, 4, pp. 327–334.
GOODCHILD, M. 1987b. Toward an enumeration and classification of GIS functions, in Proceedings of International Geographic Information System Symposium ’87: The Research Agenda. Washington: National Aeronautics and Space Administration, II, pp. 67–77.
GOODCHILD, M. 1992. Spatial analysis using GIS. 2. Santa Barbara: National Center for Geographic Information and Analysis.
GOODCHILD, M., PARKS, B. and STEYAERT, L. (Eds.) 1993. Environmental modeling with GIS. New York: Oxford University Press.
HIGH PERFORMANCE SYSTEMS INC. 1994. STELLA II: An Introduction To Systems Thinking. Hanover: High Performance Systems.
HUXHOLD, W. 1989. An Introduction to Urban Geographic Information Systems. New York: Oxford University Press.
KIRBY, K. and PAZNER, M. 1990. Graphic map algebra, in Brassel, K. and Kishimoto, H. (Eds.), Proceedings of the 4th International Symposium on Spatial Data Handling, Zürich, 23–27 July. Zurich: International Geographical Union, pp. 413–422.
KUCERA, H. 1994. Spatio-temporal databases: SAIF at any speed, in Proceedings of 90th Annual General Meeting of the Association of American Geographers. Washington: AAG, p. 201.
KUCERA, H. and SONDHEIM, M. 1992. Proceedings GIS’92: Working Smarter, Sixth Annual GIS Symposium, 10–13 February 1992, Vancouver, BC.
KUCERA, H., SONDHEIM, M. and FLAHERTY, M. 1992. SAIF: conquering space and time, in Proceedings of the 88th Annual General Meeting of the Association of American Geographers. San Diego, CA: AAG, p. 127.
KUCERA, H., CHIN, R. and JAMESON, C. 1993. SAIF: conceptualisation to realisation, in Proceedings GIS’93: Eyes on Future, 15–18 February, Vancouver, BC.
KUCERA, H., FRIESEN, P. and SONDHEIM, M. 1994. SQL/MM: The open systems approach to spatio-temporal databases, in Proceedings of GIS’94: Eighth Annual Symposium on Geographic Information Systems, 21–24 February, Vancouver, BC.
LANTER, D. 1994. A lineage metadata approach to removing redundancy and propagating updates in a GIS database, Cartography and Geographic Information Systems, 2, pp. 91–98.
LAURINI, R. and THOMPSON, D. 1992. Fundamentals of Spatial Information Systems. London: Academic Press.
MAGUIRE, D., SMITH, R. and JONES, S. 1993. GIS on the move: some transportation applications of GIS, in Proceedings of the 13th Annual ESRI User Conference, Vol. 3, Palm Springs, pp. 39–46.
MARK, D. and GOULD, M. 1991. Interacting with geographic information: a commentary, Photogrammetric Engineering & Remote Sensing, 11, pp. 1427–1430.
MCGARIGAL, K. and MARKS, B. 1994. FRAGSTATS: Spatial Pattern Analysis Program for Quantifying Landscape Structure, Version 2.0. Corvallis: Forest Science Department, Oregon State University.
MONMONIER, M. 1989. Graphic scripts for the sequenced visualization of geographic data, in Proceedings GIS/LIS’89, Orlando, FL, 26–30 November. Falls Church: ASPRS/ACSM, pp. 381–389.
NORMAN, D.A. 1991. Cognitive artefacts, in Carroll, J.M. (Ed.), Designing Interaction: the Psychology at the Human-Computer Interface. New York: Cambridge University Press, pp. 17–38.
OGF 1993. The Open Geodata Interoperability Specification, Version 1.0, Preliminary draft, 5 November 1993. Cambridge: Open GIS Foundation.
RASMUSSEN, J. 1986. Information Processing and Human-Machine Interaction. New York: North Holland.
RHIND, D. and GREEN, N. 1988. Design of a geographical information system for a heterogeneous scientific community, International Journal of Geographical Information Systems, 2(3), pp. 171–261.
SCHIMMEL, D. and BURKE, I. 1993. Spatial interactive models of atmosphere-ecosystem coupling, in Goodchild, M., Parks, B. and Steyaert, L. (Eds.), Environmental Modeling with GIS. New York: Oxford University Press, pp. 284–295.
SDTS 1992. Spatial Data Transfer Standard. Federal information processing standards publication 173. Gaithersburg: National Institute of Standards and Technology.
SMITH, T., SU, J., EL ABBADI, A., AGRAWAL, D., ALONSO, G. and SAMAN, A. 1994. Computational Modeling Systems. Technical Report TRCS 94–11, Santa Barbara: Department of Computer Science, University of California.
SNODGRASS, R. 1994. TSQL2 Language Specification. Tucson: University of Arizona.
TIMPF, S. and FRANK, A. 1995. A multi-scale DAG for cartographic objects, in Peuquet, D. (Ed.), Proceedings ACSM/ASPRS Annual Convention and Exposition Technical Papers, Vol. 4 (Auto-Carto 12), 27 February–2 March. Charlotte, NC: ACSM/ASPRS, pp. 157–163.
TOMLIN, D. 1990. Geographic Information Systems and Cartographic Modeling. Englewood Cliffs: Prentice Hall.
UNWIN, D. 1990. A syllabus for teaching geographical information systems, International Journal of Geographical Information Systems, 4, pp. 461–462.
VAN DEURSEN, W. and KWADIJK, J. 1993. RHINEFLOW: an integrated GIS water balance model for the river Rhine, in International Association of Hydrological Sciences (Ed.), Applications of Geographic Information Systems in Hydrology and Water Resources, Vol. 211. Wallingford: IAHS, pp. 507–518.
YUAN, M. and ALBRECHT, J. 1995. Structuring geographic information and GIS operations, in Frank, A. and Kuhn, W. (Eds.), Spatial Information Theory: A Theoretical Basis for GIS. Berlin: Springer, pp. 107–122.
POSTSCRIPT
Reflections on Past, Present and Future Geographic Information Research
Ian Masser
The two European Science Foundation/National Science Foundation Summer Institutes present a unique opportunity to reflect on the state of past, present and future geographic information research. This is because the Summer Institutes themselves are unique in several important ways. Firstly, the event itself is unique as a showcase for the findings of the work carried out in the two main geographic information research programmes on both sides of the Atlantic: the NCGIA programme funded by the National Science Foundation; and the GISDATA programme funded by the European Science Foundation. The main thrust of both these programmes is clearly evident in the structure of these Institutes, each of which covers six of the main research priority areas defined in both programmes. In this way the Institutes provide opportunities to reflect on the issues raised at the specialist meetings organised by these programmes to review the state of the art on these topics. Secondly, the procedures used to select the individual contributions for the Institute and subsequently for the book are also unique in several ways. In practice the participants of each of the two Summer Institutes fall into two categories. In the first place there are a number of position papers by individuals who have been involved in the organisation of the specialist meetings which took place prior to the Institutes on both sides of the Atlantic. Because of their direct involvement in the organisation of these meetings it is felt that they are in a privileged position to draw upon the collective experiences of a much larger group in the preparation of their contributions to the Institute and subsequently to the book. The other group of contributors are the early career scientists who were selected to participate in these Institutes through open competitions on both sides of the Atlantic. 
The main criterion for selection was the quality of the extended abstract that they submitted as part of their application and its relevance to one of the themes of the meeting. Given that applications from both sides of the Atlantic exceeded the number of places several times over, each of the successful applicants had not only to undergo the traditional review process for their contribution to this volume but also to pass a highly competitive selection process before being invited to participate in the Institute at all. As these early career scientists will be among the trendsetters in geographic information research in the future, it is felt that their contributions are particularly likely to point to future directions as well as past and current trends in geographic information research.

THE EVOLVING GEOGRAPHIC INFORMATION RESEARCH AGENDA

The topics discussed at the two Summer Institutes give a good picture of the evolving geographic information research agenda on both sides of the Atlantic. The main themes for the 1995 Summer Institute
at Wolfe’s Neck in Maine were Geographic Information Infrastructures; GIS Diffusion and Implementation; Generalisation; Concepts and Paradigms; Spatial Analysis; and GIS and Multimedia. These were the topics of the first six GISDATA specialist meetings and of six of NCGIA’s initiatives and related activities. Similarly, the main themes of the 1996 Summer Institute at Villa Borsig in Berlin are the topics from the second six GISDATA specialist meetings and six more NCGIA initiatives and related activities. When these topics are compared with earlier geographic information research agendas there are some interesting differences. As Goodchild (1997, p. 588) noted in his postscript to the volume of papers from the first Summer Institute, technical issues such as database structures and spatial analysis accounted for four out of the list of five key topics put forward by Abler ten years ago as the main research themes to be addressed in the original National Science Foundation solicitation for the new NCGIA. Technical issues of this kind also occupy a prominent place in the proceedings of both Summer Institutes. However, the wide range of questions that Abler grouped together under the heading of social, economic and institutional issues to create his fifth topic had become much more refined by 1995 to include matters relating to GIS diffusion and implementation as well as the public policy issues associated with the development of national spatial data infrastructures. This volume further develops this research agenda in the context of geographic information and society. The titles of the papers in this section contain terms such as societal problems, information ethics, environmental equity, cultural influences, community empowerment and the commodification of geographic information, which highlight the range of current research in this part of the geographic information field.
As noted above most of the contributions to this debate come not from established academics but from researchers who are in the early years of their academic careers. For this reason the diversity of contributions to this theme can be regarded as a good indicator of one way in which the research agenda will develop in the future.

With the past, present and future theme in mind, it is also useful to look again at the agendas contained in other benchmark documents in the geographic information field. The report of the committee chaired by Lord Chorley which was set up by the British government to consider the handling of geographic information (Department of the Environment, 1987) is particularly interesting in this respect, because, unlike the list prepared by Abler, it was primarily concerned with the policy dimension of the geographic information research agenda. The Chorley Report highlights the extent to which research and policy on political and organisational issues are entwined in this applications driven field. For this reason it has proved an important source of ideas for the geographic information research community as well as for policy makers in all parts of the world. The Chorley Report identified seven major barriers to the take-up of geographic information technologies:

• the lack of digital topographic mapping data;
• the limited availability of data held by government agencies;
• the inability to link data effectively;
• the lack of awareness of GIS especially among key decision makers;
• the need for more focused education and training;
• the importance of promoting GIS related research and development; and
• the extent to which recent technological developments make it necessary for governments to rethink their roles with respect to the coordination of geographic information collection and dissemination.
The current relevance of this agenda was considered at a symposium organised by the Association for Geographic Information (1997) to celebrate the tenth anniversary of the publication of the report. The discussion at the symposium highlighted the continuing importance of these issues, particularly as everyone involved had underestimated the length of time that would be required to overcome some of them. For example, while considerable progress has been made in creating a seamless digital topographic database for the whole of Britain, relatively little progress has been made in promoting greater availability of data collected by government sources. Similarly, the British geographic information community as a whole still suffers from the government’s failure to take a more active role in the development of national geographic information strategies, in contrast to the United States government, which has given considerable support to the development of the National Spatial Data Infrastructure. The symposium also compared the Chorley list of barriers with some current concerns of the geographic information community. It was argued that, whereas data availability and data access remain key issues, as does the need to facilitate the linking of data of all kinds, the interpretation of some other issues needs to be revised considerably. For example, given the growing mass market for geographic information products that has been created through the World Wide Web, it is increasingly necessary to see the main objective of education and training as creating a spatially literate population rather than meeting the specialist staffing requirements of the key players in the GIS industry. It was also argued that there is a growing transnational dimension to geographic information as a result of the globalisation of national economies and that it will be increasingly important to take this into account when considering local initiatives.
This is particularly important in the context of Europe where the European Union has put forward proposals for a European geographic information infrastructure in its GI2000 document (see Chapter 4, this volume).

This retrospective evaluation of the achievements of the Chorley Report provides a number of useful pointers to the way in which the policy agenda is currently developing and the research questions that may arise out of this agenda. It is clear from this evaluation that matters regarding the impacts of geographic information on society and the use of geographic information technologies to promote environmental and socially driven policy agendas will take up a greater proportion of the research effort in this field in the future than is currently the case and that concepts such as environmental equity and community empowerment will help to shape the next generation of the technology.

It also seems likely that the scope of the geographic information research agenda will continue to expand in the future. The report of a workshop on The Future of Spatial Data and Society organised by the United States National Academy of Sciences Mapping Science Committee (National Research Council, 1997) identifies five primary forces that are likely to shape the future of spatial data activities by the year 2010. These are:

• synergy of information, technology and access;
• expanding global interdependence;
• increasing emphasis on sustainability;
• emergence of community based governance;
• the individual: “As spatial information becomes embedded in widely applied information technologies and is increasingly accessible to the general public, new uses and demands will change many of the current practices related to these concerns” (p. 15).
Reading this list, one may wonder what is distinctive about geographic information research. In practice there is a marked shift away from the technical issues that formed the core of the geographic information research agenda ten years ago and still account for the majority of papers contained in this
volume. However, it can also be argued that such a list brings geographic information research into mainstream social and environmental science research and that this is a reflection of the extent to which geographic information research itself, like the technology it uses, is becoming increasingly pervasive throughout society as a whole.

REFERENCES

ASSOCIATION FOR GEOGRAPHIC INFORMATION 1997. Ten Years after Chorley. London: Association for Geographic Information.
DEPARTMENT OF THE ENVIRONMENT 1987. Handling Geographic Information: Report of the Committee Chaired by Lord Chorley. London: HMSO.
GOODCHILD, M. 1997. Postscript: new directions for GIS research, in Craglia, M. and Couclelis, H. (Eds.), Geographic Information Research: Bridging the Atlantic. London: Taylor & Francis, p. 596.
NATIONAL RESEARCH COUNCIL 1997. The Future of Spatial Data and Society: Summary of a Workshop. Washington: National Academy Press.
Index
accessibility of healthcare delivery 151–2 of spatial information technology 18–19 age in Swedish health study 168 aggregation in exploratory data analysis 392 AGNPS model 205 agrarian system analysis (Guinea) 323–37 area studied 325 clustering process 333–4 contiguous zone clustering 328 methods 326–7 optimal resolution 328–31 spatial generalisation 327, 332 spectral information 327 extraction of 332 synthesis of 331–2 Alentejo (Portugal), desertification of 345–6 American Congress of Surveying and Mapping (ACSM) 467 ANSWER model 205 Apple Desktop metaphor 570–1, 572 ARC/INFO GIS 412, 480, 578 areal unit boundary-dependency hypothesis 44 artificial neural networks (ANN) in non-linear spatial systems 176, 177–82 advantages 177–8 disadvantages 178 implementation 181–2 kriging 182–5 results 183–5 structure 178–80 testing network 181 theory 177 training network 181 in Swedish health study 164–5 atmospheric modelling 117 attribute accuracy of spatial data 470
automated topographic and cartographic information system (ATKIS) 59, 60 back-propagation neural networks 165 biological/ecological systems modelling 117 Boreal Ecosystem-Atmosphere Study (BOREAS) 280 Boston Redevelopment Authority 91 boundaries of spatial socio-economic units (SSEUs) 370 breast cancer screening 152 Bristol (UK), urban spatial data 308, 311–12 buffer as GIS operation 582, 584 Campeche state (Mexico), land cover changes 266–71 cartograms 145 cartography, visualisation for 537–40 and user interfaces 545–6 catchment/basins as GIS operation 582, 584 CATMOD procedure in landscape modelling 268 centrality as GIS operation 582, 585 Chihuahuan Desert (New Mexico), desertification in 339–40 China, secondary climate impacts 359–60 climate change, global integrated assessments 353–62 assessment models 357–9 data quality 358 multiple measurements 354–7 secondary climate impacts 359–60 specificity 357–8 role of GIS 277–95 data and process models 286–7 data for 285–6 data life cycle 289 data modelling continuum 289 generalisations of 288 hierarchical and aggregated structures 279–81 information management 289–90 549
550
INDEX
integrative modelling in 279 perspectives on 282–4 progress, impediments 281–90 research communities 284–5 specificity 287–8 visualisation in 281 clustering in agrarian system analysis 328 commercialisation of geographic information 73–4, 77–8 Committee on Global Change (NRC) 278 commodification of geographic information 69–85 commercialisation 73–4, 77–8 dissemination 74, 78 information exchange 75, 78–9 in metropolitan information bureaux 76–80 model of 72–6 value-added services 75, 79 communicative theory of society 44, 50–1 community use of GIS 87–102 adoption of GIS 88–9 benefits and constraints 96–8 context of GIS 90–2 homeless people, research on 94–6 research concepts 93–4 scope 88–93 community-based decision making development of 90 information, role of in 90–2 completeness of spatial data 470–1 complexity in map use 556 conceptual data schema, assessing geodata 511 conceptual consistency 511–12 consistency conditions 518 formal rules 513–18 minimum size 513 object overlap 515–16 objects, forming 516 required attributes 513–14 results and errors 518 testing 512–13 topological relations 517–18 valid values 514–15 conceptual models 116 confidentiality in GIS data 38 connectivity as GIS operation 582, 584 continuous fields in exploratory data analysis 391 corridor as GIS operation 582, 584 cost-benefit studies in earthquake hazard estimation 228–9 cost/diffusion/spread as GIS operation 582, 584, 585
cultural influences on GIS 55–68 differences 61–3 implementing regulations 61–2 negotiations in establishing GIS 59, 63–5 research design 58–60 culture, as shared beliefs 55 data handling capability in spatial information technology 18 data integration in GIS 38 data organisation in spatial models 119–21 data quality see spatial data quality decision actor unit in spatial decision making 133 decision aids in spatial decision making 135 decision making in forest management 487–91 multi-participant 109 see also spatial decision making; spatial decision support systems decision outcomes in spatial decision making 134 Delaunay-Triangulation in urban analysis 300–1 density estimation 145 desertification, spatial-time analysis 339–51 application 345–9 association patterns 347–8 desertification process assessment 341 land cover analysis 346–7 landscape analysis 341–2 landscape spatial structure 343–5 time in 344–5 methodology 342–3 spatio-temporal analysis 347–9 variability patterns 348–9 deterministic models 116 developing countries, urban analysis by remote sensing 304 digital elevation model (DEM) 286 digital information market in GIS 38 directional reasoning in GIS 435–47 disaster management, urban analysis by remote sensing 304 discretization in exploratory data analysis 391 disenfranchised groups and GIS 28 dispersion as GIS operation 582, 585 dissemination of geographic information 74, 78 distances, uncertain topological relations 449–59 distribution/neighbourhood as GIS operation 582, 584 drainage/network as GIS operation 582, 584 dynamic models 116
INDEX
earthquake hazard estimation 217–30 cost-benefit studies 228–9 GIS-based methodology 219–20 ground shake, maps of 220–2 monetary and non-monetary losses 227–8 overview 218–19 secondary seismic hazards 222–3 structural damage, maps of 223–7 ECHO database 32 ecological fallacies 42–3 ecological modelling of wetlands 204 economic modelling 118 education dissemination of geographic information to 74 in Swedish health study 168 employment in Swedish health study 168 enframing theory of technology 44, 50–1 environmental equity 41–54 areal unit boundary-dependency hypothesis 44 ecological fallacies 42–3 and environmental racism 42 Houston study 44–50 scale-dependency hypothesis 44 and society 43–4 Environmental Protection Agency (EPA) 44 environmental racism 42 environmental sciences in mountains see mountain environments spatial models for 117 epidemiology and GIS 144–50 exploratory analysis 145–9 and healthcare planning 153 modelling data 149–50 visualisation of data 144–5 ethical policies in information science 25–30 academic sector, roles for 28–9 advancements needed 29 legal conduct 26–7 unethical conduct in 25, 27–9 European Commission 31 European Geographic Information Infrastructure 70 European Geographic Information Resource Base 31–40 applications 38–9 databases in 37 emergence of policy 31–5 GI 2000 initiative 32–4 objectives 33–4 political context of 32 R&D developments 34–5
551
research agenda 35–9 fifth framework 36–7 fourth framework 36 Eurostat 32 expertise of map users 557–8 exploration 551–2 exploratory data analysis (EDA) 391–404 implementation 400–1 procedure 394–400 data characteristics 395–6 disturbing properties 399–400 test statistics 396–9 selection of model 392–3 role of 393–4 extended exploratory data analysis (EEDA) 391–404 implementation 400–1 procedure 394–400 data characteristics 395–6 disturbing properties 399–400 test statistics 396–9 selection of model 392–3 role of 393–4 extrapolation in exploratory data analysis 392 Federal Geographic Data Committee 70 field studies in spatial decision making 136, 137 filtering in spatio-temporal geostatistical kriging 379–81 First ISLSCP Field Experiment (FIFE) 280 Five Lakes Valley (Tatra mountains), mountain environments in geographical characteristics 245–7 geological map 249 map processing 247–51 slope map 248 vegetation map 250–1 flood protection planning 62–3 flow as GIS operation 582, 584 forest management 477–95 attribute error sensitivity analysis 486–7 testing 485–6 change maps 487 data analysis 481–2 data collection 480 data processing 480 data quality 478–9 attribute error 479 tracking 491–2 visualising 492–3
decision to cut stands 487–91 SDSS in 492 sensitivity analysis 493 spatial model for 482–5 study area 480 suitability map 486 uncertainty in 491–3 Foutas Djallon (Guinea), agrarian system analysis 323–37 fractal-based urban density functions 314–15 analysis 315–17 full cost recovery in commercialisation of geographic information 73, 82 geocomplexes in mountain environments 252–9 comparisons 259–61 diversity 255–6 dominance 256–7 likeness index 257–8 mean size 252–3 roundness index 254–5 shape dismemberment 254 shape indices 254–5 spatial patterning 255–8 type, frequency 253–4 geographic modelling 118, 587–8 geographic monitoring 38–9 geographic triangle 21–2 geographic visualisation 551 Geographic Visualisation Information Systems (GVIS) 537, 539, 565 Geographical Analysis Machine 147 Geographical Information Systems (GIS) academic sector, ethical roles for 28–9 accessibility 18–19 as commodity 18 contribution of 21–3 cultural influences on 55–68 data models 288 perspectives on 282–4 societal viewpoint on 17 geometric attributes of spatial data 465 geometrical quality, metadata on 502–6 choosing metadata 502 use of 503–4 geostatistical kriging, spatio-temporal 375–89 geostatistics 376 multi-dimensional issues 379–82 approximate variograms 382 direction and anisotropy 381–2
filtering 379–81 variogram 377–82 Ghana, erosion studies 521–32 and GIS 522 study area 522–3 GISDATA 37, 115, 143, 147, 367 global climate change see climate change, global Global Learning and Observations for a Better Environment (GLOBE) 285 Greater Manchester Research 78 greenhouse gases 277 GRID (UN Environment Program) 277 ground shake hazard 218, 219 maps of 220–2 health and GIS 143–58 epidemiology 144–50 exploratory analysis 145–9 and healthcare planning 153 modelling data 149–50 visualisation of data 144–5 healthcare delivery 150–3 accessibility 151–2 and epidemiology 153 outcome 153 planning 150–1 utilisation 152 health variations, neural nets in 159–74 artificial nets 164–5 data preparation 165–7 ill-health rates 162–4 linear regression in 167–71 analysis 167 comparisons 169–70 network analysis 167–9 prediction error maps 170–1 scale problems 171 Swedish study area 161–2 healthcare delivery 150–3 accessibility 151–2 and epidemiology 153 outcome 153 planning 150–1 utilisation 152 heterogeneous spatial reasoning 438 definition 440–1 need for 439–40 heteroscedasticity in exploratory data analysis 400 hierarchical spatial reasoning 420–1
algorithm for 425–7 ontology of 424–5 principles 421 hierarchical wayfinding 419–33 algorithm, hierarchical case 425–7 algorithm, non-hierarchical case 425 approach to 421–2 comparisons 428 formalisation method 429–30 and hierarchical spatial reasoning 420–1 ontology of 422–5 performance analysis 428–9 hierarchies, types of 421 homeless people, research on (Milwaukee) 94–7 Houston study on air quality 44–50 analytical procedures 45–7 deterministic aggregation 47–9 stochastic aggregation 49–50 hydrological flow processes in wetlands modelling 209–10 hydrological modelling 117, 189–202 initial model 195–6 models 192–3 on Morava River system 198–200 space in 190–1 time in 191–2 on Trkmanka catchment 196–8 IDRISI 578 in mountain environments 247 in soil erosion estimation 234 illegal conduct 25 illiteracy studies and MAUP 43 IMPACT-2 project 34 individualism as cultural dimension 55–6 INFO2000 project 34, 35 information exchange of geographic information 75, 78–9 information technology and spatial data quality 464–7 in spatial decision making 130–2 information transaction theory 57 integrated assessments of climate change 353–62 assessment models 357–9 data quality 358 multiple measurements 354–7 secondary climate impacts 359–60 specificity 357–8 Integrated Climate Assessment Model (ICAM) 355 integrated modelling 117, 118
integrated spatial reasoning 438 and combined reasoning 442–4 definition 440–1 need for 439–40 Intelligent Vehicle Highway Systems (IVHS) in SDSS 106, 110 interaction in spatial decision making 133–4 interaction matrices in spatial models 119 International Land Surface Climatology Project (ISLSCP) 280 internet 71 interpolation in exploratory data analysis 391, 392 Karlsruhe (Germany), urban analysis in 299 kernel estimation 145 King County (Washington, USA) cultural research 58 GIS design on 59–60 Kreis (County) Osnabrück (Germany) cultural research 58 GIS design on 59–60 kriging in non-linear spatial systems 175–6, 182–5 implementation 183 spatio-temporal geostatistical 375–89 approximate variograms 382 direction and anisotropy 381–2 filtering 379–81 geostatistics 376 multi-dimensional issues 379–82 variogram 377–82 laboratory experiments in spatial decision making 136, 137 land cover study, Mexico 263–74 change in, driving forces 267–8 classification 267, 269 integrated modelling 268–71 study area 266 land use modelling 122 and soil erosion 238–41, 530 urban change 307 LANDSAT TM 324–5, 346 landscape zone analysis (Guinea) 323–37 area studied 325 clustering process 333–4 contiguous zone clustering 328 methods 326–7 optimal resolution 328–31 spatial generalisation 327, 332
spectral information 327 extraction of 332 synthesis of 331–2 landscapes, desertification of spatial structure 343–5 statistical analysis 341–2 time analysis 344–5 land-surface-subsurface process modelling 117 legal conduct 25, 27–9 Leuven (Belgium), soil erosion estimation 235–8 linear programming in soil erosion studies 524–5 lists in spatial models 119, 120 local government commodification of geographic information 69–85 commercialisation 73–4, 77–8 dissemination 74, 78 information exchange 75, 78–9 in metropolitan information bureaux 76–80 model of 72–6 UK, USA compared 81–2 value-added services 75, 79 locational analysis as GIS operation 582, 584 location-allocation models in SDSS 105 Loch Coruisk (Skye), mountain environments in geographical characteristics 245–7 geological map 249 map processing 247–51 slope map 248 vegetation map 250–1 logical consistency of spatial data 471 London Research Centre 77–8, 81 long-term annual average rainfall, neural networks on 181, 184–5 malaria transmission modelling 149, 153 man-machine efficiency in spatial information technology 19 Map Algebra 578, 583 map scale orientated uniform spatial coordination, Germany (MERKIS) 59 map use and users 556–8 marginal cost recovery in commercialisation of geographic information 73 Markov chain models 263, 264–6 masculinity as cultural dimension 55–6 mathematical models 116 measurements as GIS operations 582, 583, 585 Merseyside Address Referencing System 78, 79, 81 metadata, probability assessments on 497–508
geometrical quality 502–6 choosing metadata 502 use of 503–4 nominal ground 497–8 positional uncertainty 501–2 quality control data 499–501 inaccuracy in 499 results 499–501 metaphors in user interface design 565–76 hierarchical approach to 566–70 conceptual level 567–8 implementational level 568–70 operational level 568 iterative design process 569 use of 570–4 designing 571–2 problems with 573 spatial metaphors 573–4 strategies for 572 metaquality of spatial data 467 metrics for spatial data 472–3 metropolitan information bureaux (UK) 71, 76–80 microsimulation in spatial models 121–4 Milwaukee (USA) housing stock research 87, 89 minority groups and GIS 28 model, definition 116 Modifiable Areal Unit Problem (MAUP) ecological fallacies and 42–3 and environmental equity 41 and SDSS 106 in spatial socio-economic units 369 monetary losses from earthquake hazard 218, 219, 227–8 monopoly control of information 29 Morava River system, hydrological modelling in 198–200 mountain environments 245–62 comparisons 259–61 database construction 247–51 statistical analyses 252–9 diversity 255–6 dominance 256–7 geocomplex size 252–3 geocomplex type, frequency 253–4 geocomplex vertical structure 258–9 likeness index 257–8 roundness index 254–5 shape dismemberment 254 shape indices 254–5 spatial patterning 255–8 vegetation types 251
multiple criteria decision (MCD) models 131 multivariate geographic visualisation 549–64 framework for graphic elements 553 graphic variables 553–5, 556 measurement levels 555–6 map use and users 556–8 techniques for 558–62 auxiliary senses 561–2 composite indices 559–60 cross-variable mapping 559 dynamic displays 561 multi-dimensional displays 560–1 multiple displays 560 segmented symbols 559 superimposition of features 558 NAACP v. American Family Mutual Insurance case 91 National Centre for Geographic Information and Analysis (NCGIA) 15, 281 databases 37 Initiative-19 92 National Committee on Digital Cartographic Data Standards (NCDCDS) 467 National Geospatial Database 70 National Severe Storms Centre 541 National Spatial Data Infrastructure 70 National Toxic Release Inventory (TRI, EPA) 44, 46 navigation, urban analysis by remote sensing 304 nearest-neighbour as GIS operation 582, 584 neighbourhood as GIS operations 582, 583, 584 network data structures 189–202 initial model 195–6 models 192–3 on Morava River system 198–200 space in 190–1 time in 191–2 on Trkmanka catchment 196–8 network/flow as GIS operation 582, 584 networks and hierarchical wayfinding 419–20 in spatial models 119, 120 neural nets, health variations in see health variations nominal ground and probability assessments on metadata 497–8 in spatial data quality 466–7 role of 466 non-hierarchical spatial reasoning algorithm for 425
ontology of 422–4 non-linear spatial systems 175–87 artificial neural networks in 176, 177–82 advantages 177–8 disadvantages 178 implementation 181–2 results 183–5 structure 178–80 testing network 181 theory 177 training network 181 kriging 182–5 implementation 183 non-monetary losses from earthquake hazard 218, 219, 227–8 non-normality in exploratory data analysis 399 North American Landscape Characterisation Project (NALC) 267 Norwich (UK), urban spatial data 311–12 Oakland Healthy Start program 87 oceanographic data, semantic modelling 405–17 formulation and implementation 407–11 insights 413 rationale 406–7 visualisation and modelling 413–14 ontology of spatial information technology 20 Open GIS (OGIS) 577 data structure 582–3 opportunistic behaviour in information sciences 27 Ordnance Survey 70, 81 outliers in exploratory data analysis 399 overlay as GIS operation 582, 584 Pacific Marine Environmental Laboratory (PMEL) 405 Palo Alto (California), earthquake hazard studies 217 parcellation, and soil erosion 234–8 and land use 236–41 participatory decision making and GIS 87–102 adoption of GIS 88–9 benefits and constraints 93–4, 96–8 context of GIS 90–2 homeless people, research on 94–6 research concepts 93–4 scope 88–93 pattern as GIS operation 582, 585 personal information privacy 29 Peterborough (UK), urban spatial data 311–12 PlanGraphics 59
planning for healthcare delivery 150–1 pollution, urban analysis by remote sensing 304 positional accuracy of spatial data 469–70 positional uncertainty of metadata 501–2 power distance as cultural dimension 55–6 Powys County Council 81 precipitation in upland areas 175–87 ANN in see artificial neural networks prediction error maps in Swedish Health study 170–1 probability assessments on metadata 497–508 geometrical quality 502–6 choosing metadata 502 use of 503–4 nominal ground 497–8 positional uncertainty 501–2 quality control data 499–501 inaccuracy in 499 results 499–501 problem solving in spatial information technology 18 property rights 29 proximity as GIS operation 582, 584, 585 public sector dissemination of geographic information to 74 qualitative data handling 37 quality of geodata, assessing 509–19 conceptual consistency 511–12 conceptual data schema 511 consistency conditions 518 formal rules 513–18 minimum size 513 object overlap 515–16 objects, forming 516 required attributes 513–14 results and errors 518 testing 512–13 topological relations 517–18 valid values 514–15 of spatial data see spatial data quality quality control data, and probability assessments 499–501 inaccuracy in 499 results 499–501 quasi-experiments in spatial decision making 136–7 raster-based GIS 119, 120, 124 readability in map use 556–7 reclassification as GIS operation 582, 583 regionalised variable theory 395
registry of property ownership, Germany (ALK) 59 remote sensing 297–306 applications 304 data integration in 298–9 integrated measurement 311–13 patterns and process 297–8 research in 297–9 tools for 303–4 of urban change 307–8 measurement 310–11 in urban dynamics 298 urban feature extraction 299–303 representation of world in spatial information technology 20 retail site location models in SDSS 105 Revised Universal Soil Loss Equation (RUSLE) 233, 238, 240–1 Ridge Inter-Disciplinary Global Experiments (RIDGE) 406 risk management 38 river networks 189 ROUTE programme 193 RouteSmart 103, 106 benefits 104 description of 104–5 Salt Lake County (Utah), earthquake hazards in 221–6 SASPAC (small-area statistics package) 77, 78 scale models 116 scale problems in Swedish health study 171 scale-dependency hypothesis 44 scientific visualisation 536–7 search as GIS operations 582, 583 seismic hazards 218, 219 secondary, maps of 222–3 semantic accuracy of spatial data 471 semantic attributes of spatial data 465 semantic modelling 405–17 formulation and implementation 407–11 insights 413 rationale 406–7 visualisation and modelling 413–14 sensitivity analysis in forest management 493 attribute error 486–7 on data quality 478–9 attribute error 479 shape as GIS operation 582, 585 shortest path as GIS operation 582, 584 slope/aspect as GIS operation 582, 584
social sciences, spatial models for 118 social structure outcomes in spatial decision making 134 society and environmental equity 43–4 and spatial information technology 15–24 fusion with GIS 19–20 geographic triangle in 21–2 issues 18–20 socio-economic units (SSEUs), spatio-temporal analysis 365–73 analysis of 370 boundaries 370 design, creation, maintenance 369 justification of 368–9 spatial units, status of 367–8 sociological modelling 118 soil erosion evaluation 231–44 and field size 236 in Ghana 521–32 methodology 232–6 benefits of 234–5 implementation 235–6 model input parameters 526–30 C-factor mapping 527–30 erodibility mapping 526–7 and land use planning 530 slope, length and steepness 526 parcellation, effect of 234–8 and land use 238–41 risks assessment of (ERA) 523 topography-based 236 space in hydrological modelling 190–1 Spatial Aggregation Machine (SAM) program 45 spatial analysis as GIS operations 582, 583 Spatial Analysis System (SPANS) 480, 482–3, 484 Spatial Archives Interchange Format (SAIF) 577 spatial autocorrelation in exploratory data analysis 400 spatial cognition 436–7 spatial data handling 37 spatial data quality 463–75 assessment in spatial-temporal analysis 521–32 availability of data 525–6 model input parameters 526–30 in forest management 478–9 attribute error 479 tracking 491–2 visualising 492–3
history of quality 464 information technology 464–7 metaquality 467 metrics for 472–3 nominal ground 466–7 structuring of 467–71 parameter set 469–71 visualisation of 473 spatial decision making 129–42 decision actor unit in 133 dynamics of 134 emergent structures 134 and GIS/SDSS use 135–6 information technology 130–2 inputs for 130–3 as interaction process 133–4 outcomes 134–5 research strategies 136–7 scope of 130–5 structural issues 132–3 spatial decision support systems (SDSSs) 103–12 computer scepticism in 109 data resolution in 106–7 description of 104–5 in epidemiology 153 extensions 108 improvements 109–10 labour relations in 109 management goals in 109 and manual methods 109 organisation 108–9 typologies 105 user feedback 107–8 spatial information in agrarian system analysis 328–31 generalisation 327, 332 for decision making 130–2 and society 15–24 issues 18–20 spatial metaphors in user interface design 573–4 spatial models 115–27 categories 116 data organisation 119–21 definition 116 in environmental sciences 117 and GIS 118–21 integrating 39 and microsimulation 121–4 in social sciences 118
zonal data, disaggregation of 123–4 spatial reasoning 436–7 combined and integrated reasoning 442–4 heterogeneous and integrated 438 definition 440–1 need for 439–40 qualitative 437 formalisms for 438–41 spatio-temporal analysis 339–51 application 345–9 association patterns 347–8 data quality assessment 521–32 availability of data 525–6 model input parameters 526–30 of desertification 347–9 desertification process assessment 341 land cover analysis 346–7 landscape analysis 341–2 landscape spatial structure 343–5 time in 344–5 methodology 342–3 socio-economic units (SSEUs) 365–73 analysis of 370 boundaries 370 design, creation, maintenance 369 justification of 368–9 spatial units, status of 367–8 variability patterns 348–9 spatio-temporal geostatistical kriging 375–89 geostatistics 376 multi-dimensional issues 379–82 approximate variograms 382 direction and anisotropy 381–2 filtering 379–81 variogram 377–82 static models 116 stock matrices in spatial models 119 structural damage hazard 218, 219 maps of 223–7 Swedish health study see Västerbotten (Sweden) study Swindon (UK), urban spatial data 311–12 temporal accuracy of spatial data 470 Thiessen/Voronoi as GIS operation 582, 584 time in hydrological modelling 191–2 topological reasoning in GIS 435–47 topological relations, uncertainty in 449–59 abstraction 455 classification of 455–6
clusters 452–3 combined observation 456 measurement 455 morphological distance functions 453–4 objects and relations 450 observations for 451–4 representation of 450–1 ternary skeletons 453 TOSCA, in soil erosion estimation 234 traffic analysis zones (TAZs) in SDSS 106 transformation in exploratory data analysis 392 transport, urban analysis by remote sensing 304 transport engineering modelling 118 triangulated irregular network (TIN) 286 Trkmanka catchment, hydrological modelling in 196–8 Tyne and Wear Research and Intelligence Unit 77, 79 uncertainty in forest management 477, 491–3 data quality 491–2 SDSS in 492 sensitivity analysis 493 visualising data 492–3 positional, of metadata 501–2 in SDSS 109–10 in topological relations 449–59 abstraction 455 classification of 455–6 clusters 452–3 combined observation 456 measurement 455 morphological distance functions 453–4 objects and relations 450 observations for 451–4 representation of 450–1 ternary skeletons 453 uncertainty avoidance as cultural dimension 55–6, 58 unethical conduct 26–7 universal analytical GIS operations 577–91 derivation of 580–2 elementary operations 583–5 implementation 585–7 Open GIS (OGIS) data structure 582–3 users’ view 578–82 Universal Soil Loss Equation (USLE) in soil erosion 231, 232–3, 234, 238, 523 in wetlands modelling 205 upland areas, precipitation in see precipitation in upland areas
urban analysis, remote sensing 297–306 applications 304 data integration in 298–9 dynamics of 298 research in 297–9 tools for 303–4 urban feature extraction 299–303 Urban and Regional Information Systems Association (URISA) 87 urban density monitoring 307–21 analysis 313–17 fractal-based analysis 315–17 fractal-based functions 314–15 traditional functions 313–14 measurement 309–13 GIS and RS 310–11 integrated 311–13 traditional 309–10 urban land use change 307 urban simulation models in SDSS 105 usability tests in spatial decision making 136, 137 user interfaces design of 579–80 metaphors in design of 565–76 hierarchical approach to 566–70 strategies for 572 use of 570–4 and visualisation 535–48 for cartography and GIS 545–6 similarities of 546–7 utilisation of healthcare delivery 152 value-added services in geographic information 75, 79 variogram in spatio-temporal geostatistical kriging 377–82 approximate variograms 382 Västerbotten (Sweden) study on health variations 159–74 ill-health rates 162–4 microregional approach 161–2 public health 163 welfare in Sweden 159–60 vector-based GIS 123 Vents Program 405 Vents Program Scientific Information Model (VPSIM) 406 data function table 408–9 description of 411–12 formulation and implementation 407–11 insights from 413 rationale for 406–7
Virtual Geodata Model (VGM) 582 Virtual GIS (VGIS) 577, 586 visualisation for cartography and GIS 537–40 definition 550–1 detection and measurement 541–4 of epidemiology data 144–5 and future of GIS 544–5 in global climate change 281 and human reasoning 540–1 of multivariate geographic data 549–64 graphic elements 553 graphic variables 553–5, 556 map use and users 556–8 measurement levels 555–6 techniques for 558–62 scientific 536–7 of spatial data quality 473 in spatial models 123 and user interfaces 535–48 for cartography and GIS 545–6 similarities of 546–7 and visual learning 540 in VPSIM 413–14 watershed analysis models in SDSS 105 wayfinding see hierarchical wayfinding wetlands dynamic modelling of 203–15 framework for 206–7 hydrological flow processes 209–10 implementation 207–8 in lumped domain 204–6 movement 208–9 simulation 206 spatial approaches 205–6 stream flow algorithm 211–12 sustainability of 203–4 ‘wicked problems’ in spatial decision making 132 World Wide Web 71, 289, 401, 412 zero charge in commercialisation of geographic information 73