INFORMATION MODELLING AND KNOWLEDGE BASES XIII
Frontiers in Artificial Intelligence and Applications
Series Editors: J. Breuker, R. López de Mántaras, M. Mohammadian, S. Ohsuga and W. Swartout
Volume 73

Previously published in this series:
Vol. 72, A. Namatame et al. (Eds.), Agent-Based Approaches in Economic and Social Complex Systems
Vol. 71, J.M. Abe and J.I. da Silva Filho (Eds.), Logic, Artificial Intelligence and Robotics
Vol. 70, B. Verheij et al. (Eds.), Legal Knowledge and Information Systems
Vol. 69, N. Baba et al. (Eds.), Knowledge-Based Intelligent Information Engineering Systems & Allied Technologies
Vol. 68, J.D. Moore et al. (Eds.), Artificial Intelligence in Education
Vol. 67, H. Jaakkola et al. (Eds.), Information Modelling and Knowledge Bases XII
Vol. 66, H.H. Lund et al. (Eds.), Seventh Scandinavian Conference on Artificial Intelligence
Vol. 65, In production
Vol. 64, J. Breuker et al. (Eds.), Legal Knowledge and Information Systems
Vol. 63, I. Gent et al. (Eds.), SAT2000
Vol. 62, T. Hruska and M. Hashimoto (Eds.), Knowledge-Based Software Engineering
Vol. 61, E. Kawaguchi et al. (Eds.), Information Modelling and Knowledge Bases XI
Vol. 60, P. Hoffman and D. Lemke (Eds.), Teaching and Learning in a Network World
Vol. 59, M. Mohammadian (Ed.), Advances in Intelligent Systems: Theory and Applications
Vol. 58, R. Dieng et al. (Eds.), Designing Cooperative Systems
Vol. 57, M. Mohammadian (Ed.), New Frontiers in Computational Intelligence and its Applications
Vol. 56, M.I. Torres and A. Sanfeliu (Eds.), Pattern Recognition and Applications
Vol. 55, G. Cumming et al. (Eds.), Advanced Research in Computers and Communications in Education
Vol. 54, W. Horn (Ed.), ECAI 2000
Vol. 53, E. Motta, Reusable Components for Knowledge Modelling
Vol. 52, In production
Vol. 51, H. Jaakkola et al. (Eds.), Information Modelling and Knowledge Bases X
Vol. 50, S.P. Lajoie and M. Vivet (Eds.), Artificial Intelligence in Education
Vol. 49, P. McNamara and H. Prakken (Eds.), Norms, Logics and Information Systems
Vol. 48, P. Navrat and H. Ueno (Eds.), Knowledge-Based Software Engineering
Vol. 47, M.T. Escrig and F. Toledo, Qualitative Spatial Reasoning: Theory and Practice
Vol. 46, N. Guarino (Ed.), Formal Ontology in Information Systems
Vol. 45, P.-J. Charrel et al. (Eds.), Information Modelling and Knowledge Bases IX
Vol. 44, K. de Koning, Model-Based Reasoning about Learner Behaviour
Vol. 43, M. Gams et al. (Eds.), Mind Versus Computer
Vol. 41, F.C. Morabito (Ed.), Advances in Intelligent Systems
Vol. 40, G. Grahne (Ed.), Sixth Scandinavian Conference on Artificial Intelligence
Vol. 39, B. du Boulay and R. Mizoguchi (Eds.), Artificial Intelligence in Education
Vol. 38, H. Kangassalo et al. (Eds.), Information Modelling and Knowledge Bases VIII
Vol. 37, F.L. Silva et al. (Eds.), Spatiotemporal Models in Biological and Artificial Systems
ISSN: 0922-6389
Information Modelling and Knowledge Bases XIII Edited by
Hannu Kangassalo University of Tampere, Finland
Hannu Jaakkola Tampere University of Technology, Finland
Eiji Kawaguchi Kyushu Institute of Technology, Japan
and
Tatjana Welzer University of Maribor, Slovenia
IOS Press
Ohmsha
Amsterdam • Berlin • Oxford • Tokyo • Washington, DC
© 2002, The authors mentioned in the Table of Contents All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior written permission from the publisher. ISBN 1 58603 234 8 (IOS Press) ISBN 4 274 90523 3 C3055 (Ohmsha) Library of Congress Control Number: 2002107673
Publisher IOS Press Nieuwe Hemweg 6B 1013 BG Amsterdam The Netherlands fax: +31 20 620 3419 e-mail:
[email protected]
Distributor in the UK and Ireland IOS Press/Lavis Marketing 73 Lime Walk Headington Oxford OX3 7AD England fax: +44 1865 75 0079
Distributor in the USA and Canada IOS Press, Inc. 5795-G Burke Centre Parkway Burke, VA 22015 USA fax: +1 703 323 3668 e-mail:
[email protected]
Distributor in Germany, Austria and Switzerland IOS Press/LSL.de Gerichtsweg 28 D-04103 Leipzig Germany fax: +49 341 995 4255
Distributor in Japan Ohmsha, Ltd. 3-1 Kanda Nishiki-cho Chiyoda-ku, Tokyo 101-8460 Japan fax: +81 3 3233 2426
LEGAL NOTICE The publisher is not responsible for the use which might be made of the following information. PRINTED IN THE NETHERLANDS
Preface

A recent trend in thinking about information systems, data bases, and knowledge bases is that information modelling forms a fundamental conceptual and methodological basis on which the information content of systems depends. Information modelling means structuring originally unstructured or ill-structured information by applying various types of abstract models, theories, and principles, for different purposes. Information modelling is the core of conceptualisation, system structuring, and justification for information system design. It supports and facilitates the understanding, explanation, prediction and reasoning on information, its manipulation in the systems, as well as understanding and designing the functions of the systems. It is applied in various areas, as in the theory of science, in scientific research, conceptual modelling, problem solving, organisational knowledge management, data and knowledge discovery, database design, software development, cognitive science, neurocomputing, as well as in other areas. Very often people do not recognise that they are, in fact, applying methods and principles of information modelling when they are constructing new knowledge. It also helps to understand, explain, predict, and reflect on information needed in the development of the methods and theories for the design. Some of these areas are briefly introduced in the following, and many of them are discussed in more detail in this volume.

Conceptual modelling is one of the most important subareas of information modelling. It is based on the result of the conceptualisation process, in which the concepts are constructed. A conceptual model will be created using these concepts. Conceptual modelling means creating those conceptual models which describe the abstract system of the Universe of Discourse (UoD) and its information content, and which are needed in defining and designing information systems, or other applications of the UoD.
A conceptual model can also in itself be regarded as a desired system. It consists of concepts and rules of the UoD, e.g. of concepts and rules of an enterprise.

There are many theories about the nature and origin of concepts, which are basic notions for scientific research and conceptual modelling. According to some theories, concepts are cognitive 'tools' necessary to human thinking. They result from human cognitive processes and indicate what kind of information a person has received, or formed for himself, concerning the subject matter corresponding to the UoD. The result of conceptual modelling, i.e. the content of concepts and conceptual models, depends on:
- information available about the UoD,
- information about the UoD, regarded as not relevant for the concept or conceptual model and therefore abandoned or renounced,
- ontology used as a basis of the conceptualisation process,
- epistemological theory which determines how the ontology is or should be applied in the process of recognising adequate conceptual models or theories,
- additional knowledge included by the modeller, e.g. some knowledge primitives, some conceptual 'components', mathematical structures, additional known theories, proposed new theories, etc.,
- the purpose of the conceptual modelling work,
- the process of the practical concept formation and modelling work,
- knowledge and skill of the person doing the modelling, as well as those of the people giving information for the modelling work.

By means of an externalised conceptual model the set of concepts and rules between them is made visible and communicable. Applications of conceptual modelling appear e.g. in scientific research, in information systems design, and in many other areas. Designers and users try to construct well-organised and consistent conceptual models which reflect the UoD as well as possible. These external models are then applied for understanding, explaining, predicting, and reasoning on information in the system by persons other than the inventor of the internal conceptual model.

In software development, users and designers often apply more technically oriented concepts than in conceptual modelling in general. They start the development process with quite general conceptual modelling approaches, but then they start to use more specific, computer-oriented concepts. However, information about the UoD should be the main concern of modelling, and not the implementation of it.

The topics of the articles in this volume cover a wide variety of themes in the domain of theory and practice of information modelling, conceptual modelling, design and specification of information systems, software engineering, databases and knowledge bases.

This book is the thirteenth volume in the series "Information Modelling and Knowledge Bases". The series dates back to the latter half of the 1980s, with annual publications now amounting to 340 reviewed articles. The articles introduce results of the work and collaboration in a wide researcher network originating from the Finnish-Japanese research initiative begun in 1988. Later (1991) the geographical scope expanded to cover the whole of Europe and other countries, too.
Each year, the papers are formally reviewed by an international program committee and selected for the annual conference, which serves as a forum for presentations, criticism and discussion; this feedback is taken into account in the final published versions. Each paper has been reviewed by three or four reviewers. The selected papers are printed in this volume.

This effort would not have been possible without support from many people and organisations. In the programme committee, there are 36 well-known researchers from the areas of information modelling, logic, philosophy, concept theories, conceptual modelling, ontology, data bases, knowledge bases, information systems, linguistics, and related fields important for information modelling. In addition, 22 external referees gave invaluable help and support in the reviewing process. We are very grateful for their careful work in reviewing the papers.

We also gratefully thank all the sponsors of this effort for their help and support. We hope that we have managed to show that this support has been productive and valuable in the advancement of research and practice of information modelling and knowledge bases.

The Editors
Hannu Kangassalo, Eiji Kawaguchi, Hannu Jaakkola, Tatjana Welzer
Program Committee
Eiji Kawaguchi (co-chairman), Kyushu Institute of Technology, Japan
Hannu Kangassalo (co-chairman), University of Tampere, Finland
Setsuo Arikawa, Kyushu University, Japan
Alfs Berztiss, University of Pittsburgh, USA
Pierre-Jean Charrel, Universite Toulouse 1, France
Valeria De Antonellis, Politecnico di Milano, Universita' di Brescia, Italy
Olga De Troyer, Vrije Universiteit Brussel, Belgium
Marie Duzi, Silesian University, Czech Republic
Maria Grazia Fugini, Politecnico di Milano, Italy
Nicola Guarino, National Research Council LADSEB-CNR, Italy
Jaak Henno, Tallinn Technical University, Estonia
Wolfgang Hesse, University of Marburg, Germany
Seiji Ishikawa, Kyushu Institute of Technology, Japan
Yukihiro Itoh, Shizuoka University, Japan
Manfred A. Jeusfeld, Tilburg University, The Netherlands
Yasushi Kiyoki, Keio University, Japan
Pavel Materna, Academy of Sciences of Czech Republic, Czech Republic
Isabelle Mirbel, Universite de Nice Sophia Antipolis, France
Yasuaki Nakano, Shinshu University, Japan
Bjorn Nilsson, Astrakan Strategic Development, Sweden
Setsuo Ohsuga, Waseda University, Japan
Antoni Olive, Universitat Politecnica Catalunya, Spain
Jari Palomaki, University of Tampere, Finland
Alain Pirotte, University of Louvain, Belgium
Veikko Rantala, University of Tampere, Finland
Colette Rolland, University of Paris I, France
Michael Schrefl, University of Linz, Austria
Klaus-Dieter Schewe, Massey University, New Zealand
Cristina Sernadas, Lisbon Institute of Technology (IST), Portugal
Arne Soelvberg, Norwegian University of Science and Technology, Norway
Yuzuru Tanaka, University of Hokkaido, Japan
Bernhard Thalheim, Brandenburgian Technical University, Germany
Takehiro Tokuda, Tokyo Institute of Technology, Japan
Benkt Wangler, University of Skovde, Sweden
Roel Wieringa, University of Twente, The Netherlands
Esteban Zimanyi, Universite Libre de Bruxelles (ULB), Belgium

Organising Committee
Tatjana Welzer, University of Maribor, Slovenia
Bostjan Brumen, University of Maribor, Slovenia
Hannu Jaakkola, Tampere University of Technology (Pori), Finland
Ulla Nevanranta (Publication), Tampere University of Technology (Pori), Finland
Permanent Steering Committee
Setsuo Ohsuga, Waseda University, Japan
Hannu Jaakkola, Tampere University of Technology (Pori), Finland
Hannu Kangassalo, University of Tampere, Finland

Additional Reviewers
Mina Akaishi, University of Hokkaido, Japan
Kazuhiro Asami, Tokyo Institute of Technology, Japan
Martin Bernauer, University of Linz, Austria
M. Dahchour, University of Louvain, Belgium
Paula Gouveia, Lisbon Institute of Technology (IST), Portugal
Noriaki Izumi, Shizuoka University, Japan
Kornkamol Jamroendararasame, Tokyo Institute of Technology, Japan
Tatsuhiro Konishi, Shizuoka University, Japan
Stephan Lechner, University of Linz, Austria
Herve Luga, Universite Toulouse I & IRTT, France
Paulo Mateus, Lisbon Institute of Technology (IST), Portugal
Tomohiro Matsuzaki, Tokyo Institute of Technology, Japan
Erkki Makinen, University of Tampere, Finland
Alessandro Oltramari, National Research Council LADSEB-CNR, Italy
Guenter Preuner, University of Linz, Austria
Heri Ramampiaro, Norwegian University of Science and Technology, Norway
Jaime Ramos, Lisbon Institute of Technology (IST), Portugal
Tetsuya Suzuki, Tokyo Institute of Technology, Japan
Thomas Thalhammer, University of Linz, Austria
Hallvard Traetteberg, Norwegian University of Science and Technology, Norway
Pascal van Eck, University of Twente, The Netherlands
Yoshimichi Watanabe, Yamanashi University, Japan
Contents

Preface
Conference Organisation
Topica Framework for Organizing and Accessing Intellectual Assets on Meme Media, Yuzuru Tanaka and Jun Fujima
On the Study of Data Modelling Languages using Chisholm's Ontology, Simon Milton, Ed. Kazmierczak and Chris Keen
Semantics and Conceptual Modelling: Explicating the Semantics of Concept Diagrams, Marko Niinimaki
Enlarging the Capability of Information System: Toward Autonomous Multi-Tasking Systems, Setsuo Ohsuga and Shigeaki Takaki
Learning in Multi-agent Systems, Jaak Henno
A Logic of Ontology for Object Oriented Software Components, Naoko Izumi and Naoki Yonezaki
Provability of Relevant Logic ER, Noriaki Yoshiura and Naoki Yonezaki
Implications of Standardisation Efforts for Business Process Modelling, Eva Soderstrom
Two Structural Axes for a Coordination Pattern Catalogue, Patrick Etcheverry, Philippe Lopisteguy and Pantxika Dagorret
Conceptual Modeling of Image's Content, Petteri Kerminen and Hannu Jaakkola
Modelling the Development of Children's Conceptual Models and Conceptual Change, Marjatta Kangassalo
Intensional Logic for Integrity Constraint Specification in Predesign Database Modeling, Thomas Feyer, Marcela Varas, Marta Fernandez and Bernhard Thalheim
Applying Intensional Concept Theory to OLAP Design and Queries, Tapio Niemi, Jyrki Nummenmaa and Peter Thanisch
Conceptual Modelling and Knowledge Management for Narrative Multimedia Documents, Gian Piero Zarri
Time in Modeling, A. T. Berztiss
A Practical Approach to Intelligent Multi-Task Systems: Structuring Knowledge Base and Generation of Problem Solving System, Setsuo Ohsuga and Hiroyoshi Ohshima
Autonomous Control Structure for Artificial Cognitive Agents, Roland Hausser
Modelling Variant Embedded Software Behaviour Using Structured Documentation, Pekka Savolainen
A Component-based Application Framework for Context-driven Information Access, Mina Akaishi, Nicolas Spyratos and Yuzuru Tanaka
Modelling the Boundaries of Workspace: A Business Process Perspective, Marite Kirikova
Designing Methods for Quality, Elvira Locuratolo
Concept Descriptions for Text Search, Jørgen Fischer Nilsson
Towards Cost-Effective Construction of Classification Models, Bostjan Brumen, Tatjana Welzer, Hannu Jaakkola, Izidor Golob and Ivan Rozman
Deriving Valid Expressions from Ontology Definitions, Yannis Tzitzikas, Nicolas Spyratos and Panos Constantopoulos
A Semantic Information Filtering and Clustering Method for Document Data with a Context Recognition Mechanism, Dai Sakai, Yasushi Kiyoki, Naofumi Yoshida and Takashi Kitagawa
Understanding Concepts and their Relationship to Reality, Bart-Jan Hommes and Jan Dietz
Two Generators of Secure Web-Based Transaction Systems, Kornkamol Jamroendararasame, Tomohiro Matsuzaki, Tetsuya Suzuki and Takehiro Tokuda
Ontology Engineering by Thesaurus Re-Engineering, Udo Hahn and Stefan Schulz
Medical Knowledge Extraction via Hybrid Decision Trees, Vili Podgorelec, Peter Kokol, Ryuichi Yamamoto, Gou Masuda and Norihiro Sakamoto
Concepts of Symbiotic Information System and Its Application to Robotics, Vuthichai Ampornaramveth and Haruki Ueno
Are there Ontological Grammars?, Jørgen Fischer Nilsson
Our Knowledge of Concepts, Jari Palomäki
Intensionality in Concept Theory, Thomas Feyer, Klaus-Dieter Schewe and Bernhard Thalheim
Author Index
Information Modelling and Knowledge Bases XIII H. Kangassalo et al (Eds.) IOS Press, 2002
Topica Framework for Organizing and Accessing Intellectual Assets on Meme Media

Yuzuru Tanaka and Jun Fujima
Meme Media Laboratory, Hokkaido University, Sapporo, 060-8628 Japan
{tanaka, fujima}@meme.hokudai.ac.jp

Abstract. With the growing need for interdisciplinary and international availability, distribution and exchange of intellectual assets including information, knowledge, ideas, pieces of work, and tools in re-editable and redistributable organic forms, we need new media technologies that externalize scientific, technological, and/or cultural knowledge fragments in an organic way, and promote their advanced use, international distribution, reuse, and re-editing. These media may be called meme media since they carry what R. Dawkins called "memes". An accumulation of memes in a society forms a meme pool that functions like a gene pool. Meme pools will bring about rapid accumulations of memes, and require new technologies for the management and retrieval of memes. This paper reviews our R&D on meme media and meme pools, and proposes a new framework called 'Topica' for organizing and accessing the huge accumulation of intellectual resources in our societies. Topica uses documents to contextually and/or spatially select and arrange mutually related intellectual assets. Each topica document stores relations among some other topica documents and/or meme media objects. Relations in a topica document are called 'topica tables', and may be defined by tables, or by queries that may access local or remote databases, or relations defined in other topica documents. A topica document has some areas through which users can store and retrieve other topica documents, meme media objects, or character strings; we call these areas on a topica document 'topoi'. Each topos is basically associated with an attribute of the topica tables stored in the topica document. Each attribute within a topica table may take as its value a character string, or a URI identifying a topica document or a meme media object stored in a local or remote file. The Topica framework provides a unified approach for organizing and accessing local and/or remote files, databases, conventional web documents, and topica documents over the Internet.
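The vocabulary introduced in the abstract (topica document, topica table, topos, attribute) can be made concrete with a small sketch. It is only an illustration under stated assumptions: the class names, the `retrieve` method, its matching rule, and the `topica://`, `pad://` and `box://` URI schemes are hypothetical, and only a single in-memory table is modelled, not tables defined by queries.

```python
from dataclasses import dataclass, field


@dataclass
class TopicaTable:
    """A relation stored in a topica document (in-memory only in this sketch)."""
    attributes: tuple[str, ...]
    rows: list[dict[str, str]] = field(default_factory=list)

    def insert(self, **values: str) -> None:
        # Each value is a character string or a URI naming another asset.
        unknown = set(values) - set(self.attributes)
        if unknown:
            raise KeyError(f"unknown attributes: {sorted(unknown)}")
        self.rows.append(dict(values))


@dataclass
class TopicaDocument:
    """A document whose topoi (areas) are each tied to one table attribute."""
    uri: str
    table: TopicaTable
    topoi: dict[str, str] = field(default_factory=dict)  # topos name -> attribute

    def retrieve(self, topos: str, **constraints: str) -> list[str]:
        # Assumed semantics: return the values visible through `topos` in the
        # rows that match the values given for other topoi.
        attribute = self.topoi[topos]
        return [
            row[attribute]
            for row in self.table.rows
            if attribute in row
            and all(row.get(self.topoi[t]) == v for t, v in constraints.items())
        ]


# Hypothetical catalog document relating titles to meme media objects.
catalog = TopicaDocument(
    uri="topica://example/catalog",
    table=TopicaTable(attributes=("title", "asset")),
    topoi={"Title": "title", "Asset": "asset"},
)
catalog.table.insert(title="Pulley demo", asset="pad://example/pulley")
catalog.table.insert(title="Car model", asset="box://example/car")
```

Calling `catalog.retrieve("Asset", Title="Car model")` looks up the asset URI through the "Asset" topos while constraining the "Title" topos, which is one plausible reading of storing and retrieving through topoi.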
1. Introduction
If we look back over the last three decades of computer systems, we can summarize them as follows. In the 1970s, we focused on the integrated management of enterprise or organization information, and developed database technologies. In the 1980s, we focused on providing an integrated environment for personal information processing and office information processing, based on the rapid development of personal computers and workstations that began in the late 1970s. The object-orientation paradigm played an essential role in developing graphical user interfaces and the unified treatment of data, texts, figures, images, movies, and programs. In the 1990s, we focused on the publication and browsing of documents and services, based on the rapid development of WWW and browser technologies. One possible scenario for the coming decade is a further growing need for interdisciplinary and international availability, distribution and exchange of intellectual assets including information, knowledge, ideas, pieces of work, and tools in re-editable and redistributable organic forms. We need new media technologies that externalize scientific,
technological, and/or cultural knowledge fragments in an organic way, and promote their advanced use, international distribution, reuse, and re-editing. These media can carry a variety of intellectual assets. A media object denotes such a medium with some intellectual asset as its contents. Such media objects can replicate themselves, recombine themselves, and be naturally selected by people reusing them. We call them 'meme media' since they carry what Richard Dawkins called 'memes'. In his book, "The Selfish Gene" [1], Dawkins suggested provocatively that ideas (he called them memes) are like genes and that societies have meme pools in just the same way as they have gene pools. Whereas genes exist in a physical environment, 'memes' exist within a society. A fundamental and necessary framework for the growth and distribution of 'memes' is a 'meme pool'. A 'meme pool' is an accumulation of 'memes' in a society and functions like a gene pool. 'Meme media', together with a 'meme pool', provide a framework for the farming of knowledge. When economic activities are introduced, a 'meme pool' becomes a 'meme market' where providers and distributors of 'memes' should be able to carry out their business without prohibiting the replication, re-editing and redistribution of 'memes' by users. Based on these predictions, we have been conducting research and development on 'meme media' and 'meme market' architectures since 1987. We developed the 2D and 3D meme media architectures 'IntelligentPad' and 'IntelligentBox' in 1989 and 1995, respectively [2-7], and have been working on their meme-pool and meme-market architectures [8, 9], as well as on their applications. IntelligentPad represents each object as a pad, i.e., a card-like visual object that you can directly manipulate on the display screen, whereas IntelligentBox represents each object as a box, i.e., a 3D visual polyhedron object with direct manipulability.
Both of them allow us to directly combine different objects on the screen to compose new objects. In the IntelligentPad architecture, you can paste a pad on another pad, whereas, in IntelligentBox, you can embed a box in the local coordinate system defined by another box. In each of these cases, the former object becomes a child of the latter. Each pad or box exports its functional linkage capability as a list of slots. When you combine an object with another, you can functionally connect the child object with one of the slots defined by the parent. Based on these meme media architectures, our group and our collaborators from academia and industry have developed a large variety of applications. They include PIM (Personal Information Management) systems, CAI systems, multimedia KIOSK systems, GISs (Geographical Information Systems), a digital archive of Kyoto cultural heritage with interactive access, international exchange and distribution of nuclear reaction data and their analysis tools, interactive visualization of a cDNA database, and interactive visualization and simulation of electromagnetic fields caused by cellular phones of different shapes. The meme media and meme pool architectures will bring about a rapid accumulation of memes in our societies, which will require a new way of organizing and accessing them. No conventional method of organizing information, whether table-based, hierarchical, or indexed, is suitable for organizing and allowing access to a huge number of heterogeneous intellectual assets. The situation here is similar to the management and access of commodities in our societies. While commodities of the same type can be managed by a single database, there are so many different types that consumers cannot tell either which commodity belongs to which type, or which database manages which type. To solve this problem, we have long used documents or spaces to arrange information about mutually related commodities.
Examples include catalogs, stores, department stores, malls, and towns. Here we propose a new framework for organizing and accessing intellectual assets. This framework uses documents to contextually and/or spatially select and arrange mutually related assets. Examples of such documents may include figures, images, movies, maps,
and any combinations of them. These documents, as well as their component assets, are all represented as meme media objects. Therefore, these documents, together with related assets, may also be arranged in other documents, which forms a complex web of such documents.
2. IntelligentPad and IntelligentBox as Meme Media Systems

2.1 Outline of IntelligentPad and IntelligentBox

In object-oriented component architectures, all types of knowledge fragments are defined as objects. IntelligentPad exploits both an object-oriented component architecture and a wrapper architecture. Instead of directly dealing with component objects, IntelligentPad wraps each object with a standard pad wrapper and treats it as a pad (Figure 1). Each pad has a card-like view on the screen and a standard set of operations like 'move', 'resize', 'copy', 'paste', and 'peel'. Users can easily replicate any pad, paste a pad onto another, and peel a pad off a composite pad. A pad can be pasted on another pad to define both a physical containment relationship and a functional linkage between them. When a pad P2 is pasted on another pad P1, the pad P2 becomes a child of P1, and P1 becomes the parent of P2. No pad may have more than one parent pad. Each pad provides a list of slots that work like the connection jacks of an AV-system component, and a single connection to a slot of another pad. You can functionally connect each child pad to one of the slots of its parent pad. Each pad uses a standard set of messages ('set' and 'gimme') to access a single slot of its parent pad, and another standard message, 'update', to propagate changes of state to its child pads. In their default definitions, a 'set' message sends its parameter value to its recipient slot, while a 'gimme' message requests a value from its recipient slot. Pads can be pasted together to define various multimedia documents and application tools. Unless otherwise specified, composite pads are always decomposable and re-editable. Figure 1 shows a set of pulleys and springs that are all represented as pads, and a composition with them. Each of these pulleys and springs is animated by a transparent pad.
Fig. 1 An example pad composition: (a) primitive pulley and spring pads; (b) a composite pad.
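The paste, peel, 'set', 'gimme' and 'update' behaviour described in Section 2.1 can be sketched in a few lines. This is a minimal illustration, not the IntelligentPad implementation: the class and method names are hypothetical, and it assumes that 'update' is sent to every child whenever any slot of the parent changes.

```python
class Pad:
    """Hypothetical pad: named slots, one parent connection, child pads."""

    def __init__(self, name, slots=None):
        self.name = name
        self.slots = dict(slots or {})
        self.parent = None
        self.parent_slot = None
        self.children = []

    def paste_on(self, parent, slot):
        # A pad may have at most one parent and connects to one parent slot.
        if self.parent is not None:
            raise ValueError(f"{self.name} already has a parent")
        if slot not in parent.slots:
            raise KeyError(slot)
        self.parent, self.parent_slot = parent, slot
        parent.children.append(self)

    def peel(self):
        # Composite pads stay decomposable: detach from the parent.
        if self.parent is None:
            raise ValueError(f"{self.name} has no parent")
        self.parent.children.remove(self)
        self.parent = None
        self.parent_slot = None

    def set(self, value):
        # 'set': send a value to the connected slot of the parent.
        self.parent.write_slot(self.parent_slot, value)

    def gimme(self):
        # 'gimme': request the value of the connected slot of the parent.
        return self.parent.slots[self.parent_slot]

    def write_slot(self, slot, value):
        self.slots[slot] = value
        for child in self.children:
            child.update()  # 'update': propagate the state change downward

    def update(self):
        pass  # default reaction: nothing


class DisplayPad(Pad):
    """Child pad that mirrors its parent slot whenever it is updated."""

    def __init__(self, name):
        super().__init__(name)
        self.shown = None

    def update(self):
        self.shown = self.gimme()


base = Pad("base", {"angle": 0})
knob = Pad("knob")
label = DisplayPad("label")
knob.paste_on(base, "angle")
label.paste_on(base, "angle")
knob.set(90)  # label.shown now mirrors the parent's 'angle' slot
```

Here two children share one slot of the same parent, so a 'set' from `knob` reaches `label` only through the parent's 'update', which matches the indirect coupling between sibling pads that the text describes.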
IntelligentBox is a 3D extension of IntelligentPad. Users can define a child-parent relationship between two boxes by embedding one of them into the local coordinate system defined by the other. A box may have any kind of 3D representation. Figure 2 shows a car composed of primitive boxes. When a user rotates its steering wheel, the steering shaft also rotates, and the rack-and-pinion converts the rotation to a linear motion. The cranks convert this linear motion to the steering of the front wheels. This composition requires no additional programming.
Fig. 2 An example box composition: (a) primitive boxes; (b) a composite box.
2.2 XML definition of pads

Each pad consists of a display object and a model object; the former defines the appearance and the reaction to events, while the latter defines the internal mechanism. These two parts are usually defined in C++ code, which makes it difficult for non-programmers to develop a new pad. Some pads have a very simple internal mechanism that requires no coding. They include multimedia documents with some parameters exported through their pad slots. For the definition of such a document, we may use XHTML, or a pair of XML and XSL to define its content and style, which requires no programming expertise. You may specify any of its phrases enclosed by a begin-tag and an end-tag to work as a slot value. An IEPad, when provided with a document content in XML and a style in XSL, generates the corresponding XHTML text to view on itself. It also generates a slot for each specified phrase in the original XML or XSL texts. For the development of the IEPad, we wrapped Microsoft Internet Explorer with a pad wrapper, and provided it with the slot-definition capability. Figure 3 shows a parameterized XHTML that displays any text string in the specified orientation. Two parameters, the angle and the caption, are enclosed in tags to specify that they work as slots. Figure 4 shows its viewing by an IEPad, which has two child pads; one is used to specify the caption, while the other is used to specify the angle. In addition to these functions, an IEPad allows us to embed any composite pad in an XHTML text using a special tag, and generates this pad on itself when viewing this XHTML text.

[XHTML listing not recoverable; surviving text: title "Example of XHTML documents with slots", caption "Meme Media Lab.", angle 90]
Fig. 3 A parameterized XHTML definition.
Fig. 4 The viewing of the parameterized XHTML by an IEPad connected with two child pads.
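The idea of turning tagged phrases into slots can be illustrated as follows. The tag names actually used by the IEPad are not recoverable from this text, so the `<slot name="...">` element below is purely hypothetical, as are the function names; only the two parameter values ("Meme Media Lab." and 90) come from Figure 3. The sketch uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: each phrase that should act as a slot is wrapped in
# a <slot name="..."> element. The real IEPad tag names are not known here.
SOURCE = (
    "<document>"
    "<caption><slot name='caption'>Meme Media Lab.</slot></caption>"
    "<angle><slot name='angle'>90</slot></angle>"
    "</document>"
)


def extract_slots(xml_text):
    """Return a mapping from slot name to the phrase currently enclosed."""
    root = ET.fromstring(xml_text)
    return {el.get("name"): (el.text or "") for el in root.iter("slot")}


def set_slot(xml_text, name, value):
    """Return a new document text with the named slot's phrase replaced."""
    root = ET.fromstring(xml_text)
    for el in root.iter("slot"):
        if el.get("name") == name:
            el.text = value
            return ET.tostring(root, encoding="unicode")
    raise KeyError(name)


slots = extract_slots(SOURCE)
rotated = set_slot(SOURCE, "angle", "45")
```

In this reading, a child pad connected to the `angle` slot would call something like `set_slot` on each 'set' message, after which the document is rendered again; the rendering step itself (XSL to XHTML) is outside the sketch.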
3. Intellectual Assets on Meme Media

IntelligentPad and IntelligentBox have versatile application fields. They can cover all kinds of client applications that use 2D and 3D graphical representations. Each application may require the development of new primitive pads or boxes. Figure 5 shows PIM (Personal Information Management) tools developed as composite pads, while Figure 6 shows a GIS (Geographical Information System) using IntelligentPad and a legacy GIS database engine. Nigel Waters, a professor in the Department of Geography at the University of Calgary, proposed a GIS application of IntelligentPad [10] in which a map display, a traffic simulation model, a video image of an intersection, and a display in graph form are all represented as mutually interacting pads. He mentioned that such a GIS is not only a great pedagogical device, but is also invaluable for planning.
Fig. 5 PIM (Personal Information Management) tools developed as composite pads.
Fig. 6 A GIS using IntelligentPad and a legacy GIS database engine.
Seigo Matsuoka of Editorial Engineering Laboratory (EEL) Inc. applied IntelligentPad to the production of 'The Miyako', a digital archive system for Kyoto cultural heritage (Figure 7). It stores all the multimedia contents in a relational DBMS, and uses IntelligentPad for its front-end interface to provide full interactivity so that users can navigate through the huge contents library using various types of association search based on Japanese culture. As to IntelligentBox, we have already developed two important generic application frameworks; one for interactive database visualization and the other for interactive scientific visualization. As an application of our database visualization framework, we have been collaborating with Takashi Gojobori's group at National Institute of Genetics to
develop an interactive animation interface for accessing a cDNA database on the cleavage process of a sea squirt egg from a single cell to 64 cells (Figure 8). The cDNA database stores, for each cell and for each gene, the expression intensity of this gene in this cell. Our system animates the cell division process from a single cell to 64 cells. When you click an arbitrary cell, the system shows the expression intensity of each of the genes specified in advance, as shown in the lower left area of this figure. You may also arbitrarily pick three different genes to observe their expression intensities in each cell. The expression intensities of these genes are associated with the intensities of the RGB color components to highlight each cell in the cleavage animation. Keeping this highlighting function active, you can advance or step back the cell-division animation. The cDNA database is stored in an Oracle DBMS, which IntelligentBox accesses using Java JDBC.
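The association between three gene expression intensities and the RGB components of a cell's highlight color can be sketched as follows. This is only an illustration under assumptions: the text does not state how intensities are scaled, so the linear scaling, the clamping, and the names `expression_to_rgb` and `max_intensity` are hypothetical.

```python
def expression_to_rgb(intensities, max_intensity):
    """Map three gene expression intensities to an (R, G, B) triple in 0..255.

    Assumption: intensities are scaled linearly against max_intensity and
    clamped to the valid range; the original system may scale differently.
    """
    if len(intensities) != 3:
        raise ValueError("exactly three genes are required")
    if max_intensity <= 0:
        raise ValueError("max_intensity must be positive")
    rgb = []
    for value in intensities:
        clamped = min(max(value, 0.0), max_intensity)
        rgb.append(round(255 * clamped / max_intensity))
    return tuple(rgb)


# Hypothetical intensities of three selected genes in one cell.
cell_color = expression_to_rgb((0, 20, 100), max_intensity=100)
```

Recomputing this color for every cell at every animation step would keep the highlighting active while the cell-division animation advances or steps back.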
Fig. 7 A digital archive system 'The Miyako' using IntelligentPad.
Fig. 8 An IntelligentBox application to a 'biosimulator' accessing a cDNA database.
For interactive scientific visualization, IntelligentBox provides a generic linkage mechanism with the AVS system. This allows us to define a box as a program module of AVS, so that combination of such boxes defines a composition of an AVS program, and the manipulation of such a box changes parameter values of its corresponding AVS program module. These allow us to define a virtual laboratory in which we can construct a scientific simulation world through direct manipulation of previously constructed components, directly manipulate objects in this world to change situations, and interactively observe the simulation result in this world. Figure 9 shows a virtual laboratory for experimenting with the antenna of a cellular phone. Users can directly change the
location and the length of the antenna to observe the changes of the radiation pattern, the electric field, and the surface current on the phone body. The system uses NEC2 as a numerical computation solver, which is invoked through AVS.
Fig. 9 Virtual Lab System using IntelligentBox technologies.
All these application pads and boxes are decomposable. Users can reuse some of their components, or customize them by replacing some components with others or adding new ones.
4. Meme Pool and Meme Market Architectures

In order to make pads and boxes work as memes in our societies, we need a worldwide publication repository that works as a meme pool. The Piazza architecture allows us to define such repositories of pads over the Internet (Figure 10). Each repository is called a piazza. You can drag-and-drop pads between any piazza and your own local environment. You can easily define and open your piazza over the Internet, and register its entry-gate pad to any other public piazza. People accessing the latter piazza can access your new piazza by double-clicking this entry-gate pad. Users of web browsers have to ask web page owners, say by e-mail, to include their information in those owners' web pages, or to add links from those pages to their own pages. This is similar to the situation between tenants and owners. The Piazza architecture, on the other hand, provides a large public marketplace for people to freely open their own stores or galleries of pads.

The Piazza architecture consists of a Piazza server and a Piazza browser. A Piazza browser is represented as a pad, and supports browsing among different piazzas. Each piazza is associated with a file managed by a remote Piazza server. Pads can be drag-and-dropped to and from the currently accessed piazza, to upload and download pads to and from the associated remote server file. When a piazza is opened within a Piazza browser, all the pads registered at the associated server file are immediately downloaded onto this piazza and become available. An entrance link to a piazza is also represented as a pad, and can be put on another piazza to define a link. Users are welcome to install their own Piazza servers anywhere, anytime, and to publish their piazzas. Piazza enables end users to open their own gallery of pads on the Internet, or to exhibit their pads in some other private or public space.
Such pad galleries work as flea markets, shops, shopping centers, community message boards, community halls, or plazas.
Fig. 10 A piazza with registered pads (top), and a piazza editor to define a new piazza (bottom).
Transportation of pads and boxes undefined at their destination platform requires their cross-platform migration; their execution on the destination platform requires that all the libraries necessary for their execution be available there in advance. These libraries include pad definition libraries, API libraries, and class libraries. These are defined as DLLs (Dynamic Link Libraries), and dynamically called when required. Migration of a new pad to a different platform requires migration of all the required DLLs that the destination lacks. Pads that someone has uploaded to a PiazzaPad can be downloaded from the same PiazzaPad and executed if and only if the destination platform has all the required DLLs. Each PiazzaPad allows privileged users to upload a new pad together with its required DLLs. When another user opens this PiazzaPad, it checks if the destination platform has all the required DLLs. If yes, this user can drag this pad out of the PiazzaPad. If not, the PiazzaPad asks the user if he or she wants to download the missing DLLs. Only after the required download can he or she drag this pad out of this PiazzaPad. The automatic DLL migration by Piazza systems simplifies the distribution of pads among users.

Ikuyoshi Kato's group at the Graduate School of Physics, Hokkaido University, applied IntelligentPad and Piazza to the international availability, distribution and exchange of nuclear reaction experimental data and their analysis tools (Figure 11).

The Meme Country Project, which started in 1999, will become the first large-scale field experiment of a meme pool. It started its service at the end of 1999. This is a joint project with a private research organization, Editorial Engineering Laboratory, directed by Seigo Matsuoka, and various contents provider companies including SONY Music Entertainment. Hokkaido University and Hitachi Software are participating respectively as a technical advisor and as a system developer.
The Meme Country Project aims to establish a virtual country with various social infrastructures for the publication, the finding, and the utilization of knowledge, talents and people, and for their matching with other knowledge, talents and people. The project used IntelligentPad technologies both for the construction of its infrastructures and for the representation of knowledge, talents and people in this virtual country. During its one-year experiment, more than 1,500 people accessed the system to attend courses such as rhetoric, industrial design, comics, animation, and Japanese poetry. Participants in each course were initially all nonprofessionals. More than a hundred of them were very active, and about 20 of them improved themselves to the professional level. Among them, five people received job offers. EEL's report says that meme media and meme pool technologies worked more effectively in finding new talents over the Internet than they expected.
Fig. 11 International distribution and reuse of nuclear reaction data and analysis tools.
5. Topica for Organizing and Accessing Intellectual Assets

5.1 Organization and access of intellectual assets

All the example composite pads and boxes in preceding chapters are subject to international distribution, exchange, and reuse. These data and tools, as well as their documents in pad or box forms, serve as intellectual assets in our societies. The meme media and meme pool architectures will rapidly increase their variety, and form their huge accumulation. Now we consider how to manage and access a huge accumulation of intellectual assets represented as pads or boxes. Here we consider only pads, but our conclusions are also applicable to boxes.

Let us first consider if databases can manage pads. If the platform has the pad definition code and all the necessary DLLs, the storage of a composite pad only needs to store its exchange format representation; no other information needs to be stored. The exchange format representation of a composite pad includes two kinds of information. One is the form information that describes what kinds and sizes of component pads are used, how they are geometrically pasted, and which slot is used in each connection between component pads. The other is the state information of this pad. The state information needs to be sufficient to specify the current values of all of its internal variables whose values are not specified by its form information. Composite pads with the same form information but with different states are said to share the same form. Without loss of generality, we can assume that the state information has a record type, i.e., it can be represented as a list of attribute-value pairs for the ordered attribute set that is determined by each form. If we only have to manage a large number of pads of a few different forms, we can keep the form information outside the databases; we only need to store the state information of pads in the databases. Such a database is called a form base.
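The split between form information and state information can be illustrated with a small sketch. It is not the IntelligentPad exchange format: the classes `Form` and `FormBase`, the form name, and the attribute names are hypothetical, chosen only to show that a form base stores state records while the form itself is kept outside.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Form:
    """Form information kept outside the database (hypothetical structure)."""
    name: str
    attributes: tuple  # ordered attribute set determined by the form


@dataclass
class FormBase:
    """Stores only state records for pads that share one form."""
    form: Form
    states: list = field(default_factory=list)

    def store(self, state: dict) -> int:
        # A state must give a value for exactly the attributes of the form.
        if set(state) != set(self.form.attributes):
            raise ValueError("state does not match the attributes of the form")
        self.states.append(tuple(state[a] for a in self.form.attributes))
        return len(self.states) - 1

    def load(self, index: int) -> dict:
        # Recombine the shared form with one stored state record.
        return dict(zip(self.form.attributes, self.states[index]))


rotated_text = Form("rotated_text", ("caption", "angle"))
base = FormBase(rotated_text)
pad_id = base.store({"angle": 90, "caption": "Meme Media Lab."})
```

Each stored record is a plain tuple ordered by the form's attribute set, which is why pads with only atomic state values fit naturally into a single relation.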
If the state information of a record type has only atomic and simple values for its attributes, we can use a relational database system to store these pads. If some attributes allow variable length data, stream data such as movies and sounds, or complex data such as compound documents and other relations, we can use an extended relational database system or a structural OODB system. In this case, we can even deal with a composite pad storing other composite pads in some of its state attributes. Pads representing various intellectual assets accumulated in our societies, however, have a large number of different forms. While we may store a group of pads of the same form in a single database relation, we have to manage a huge number of different relations together
with the same number of different forms. The situation here is similar to the management and access of commodities in our societies. Unlike standardized prefabricated parts, which are usually managed by databases, commodities in our societies have a huge variety and no common attributes to describe them, which makes it difficult to manage them with databases. While commodities of the same type can be managed by a single database, there are so many different types that consumers cannot tell either which commodity belongs to which type, or which database manages which type. Types are usually defined by producers, and are not always directly related to functions, uses, or appearances that consumers can identify. To solve this problem, we typically use documents or spaces to arrange information about mutually related commodities for ease of access. Examples of such documents are catalogs published by producers or independent publishers, advertising brochures from producers or stores, and books, periodicals, and newspaper articles referring to commodities. Catalogs adopt various different criteria in the selection and arrangement of commodities. Books, periodicals, and newspapers may refer to each other. Examples of commodity-organizing spaces include shops, department stores, malls, and towns. The first three use planned selections and arrangements, while the selections and arrangements in towns evolve emergently. Shops are nested in malls and department stores, which are again nested in towns.

Let us consider one more example. 'The Trinity' is one of the most popular themes of Christian paintings. Each painting with this theme includes the images of the Father, the Son, and the Holy Spirit. Suppose you have a collection of these three images extracted from a large number of paintings on the Trinity. Our question here is where to store this collection so that we, or even other people, can access it in the future.
You may think that we can define a relation in a database to store this collection. This relation has three attributes: the Father, the Son, and the Holy Spirit. Each tuple is a triple of file pointers, pointing to three images extracted from the same painting. This solution, however, does not tell us where to record the fact that this newly created relation represents the collection of three images from a large number of paintings on the Trinity. We have to deal with a huge number of different concepts as well as relations among them. 'The Trinity' is only one of them. A potential solution in this example may be to store this collection in association with the article on the Trinity in some encyclopedia.

5.2 Topica Framework

Based on the above observations, here we propose a new framework for the organization of and access to intellectual assets represented as pads. This framework uses documents to contextually and/or spatially select and arrange mutually related intellectual assets. Such documents may be texts, images, figures, movies, maps, or compound documents consisting of various multimedia components. These documents as well as these intellectual assets are all represented as pads. Therefore, these documents may also be arranged together with related assets on other documents. We call this framework 'Topica', named after Aristotle's Topica.

In the Topica framework, documents to arrange assets are called topica documents. Each topica document is a pad that displays a document and stores relations among some other topica documents and/or some pads. Such a document is represented by an XHTML text, with some slot definitions. Relations in a topica document are called 'topica tables', and may be defined by tables, or by queries that may access local or remote databases, XHTML texts defining other topica documents, or relations defined in other topica documents.
A topica document has some areas through which users can store and retrieve other topica documents, pads, or character strings; we call these areas on a topica document 'topoi'. Each topos is basically
associated with an attribute of the topica tables stored in the topica document. Each attribute within a topica table may take as its value a character string, an exchange format representation of a pad, or a URI identifying a topica document or a pad stored in a local or remote file. A topos of a topica document is either a geometrically specified area of this document or a tagged text string in the XHTML document that is viewed by this topica document.

Figure 12 shows an XHTML document on 'the Trinity' in Christianity, where a special kind of tag is used to specify that the three phrases 'the Father', 'the Son' and 'the Holy Spirit' in this article, together with the title 'Trinity, The', work as four topoi of this topica document. The document stores a relation among the images of the three Persons depicted within each of a number of paintings of the Trinity. Instead of directly storing images, the relation stores URIs of these image files. We can use a topica viewer pad to view the corresponding topica document as a pad. The topica viewer pad is basically Microsoft Internet Explorer (IE) wrapped by the pad wrapper, with IE extended to perform topoi functions. Topica documents may also provide some slots, which can be easily defined by using special tags in their XHTML definitions.

Figure 13 shows the topica document of the XHTML definition in Figure 12, a selector popped up by double-clicking the 'Father' topos, and the selection of one candidate within this selector, popping up the corresponding image. This selection automatically influences the information available through other topoi; clicking the 'Son' topos now pops up a selector showing only one candidate. All these images also work as topica documents. Each image of a whole Trinity painting includes three topoi respectively covering the Father, the Son, and the Holy Spirit, and a topica table that refers to the topica table in the original Trinity article in the way shown in Figure 14.
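The way a selection at one topos narrows the candidates offered by the other topoi can be sketched as a filter over the topica table. This is a minimal sketch under assumptions, not the authors' implementation: the class `TopicaTable`, its methods, and the second row of image URIs are hypothetical.

```python
class TopicaTable:
    """A relation whose attributes correspond to topoi (hypothetical sketch)."""

    def __init__(self, attributes, tuples):
        self.attributes = tuple(attributes)
        self.tuples = [tuple(t) for t in tuples]
        self.selection = {}  # topos name -> currently selected value

    def select(self, topos, value):
        if topos not in self.attributes:
            raise KeyError(topos)
        self.selection[topos] = value

    def candidates(self, topos):
        """Values a selector for `topos` would list, given the selections
        already made at the other topoi."""
        column = self.attributes.index(topos)
        result = []
        for row in self.tuples:
            consistent = all(
                row[self.attributes.index(name)] == value
                for name, value in self.selection.items()
                if name != topos
            )
            if consistent and row[column] not in result:
                result.append(row[column])
        return result


# Illustrative URIs only; each row holds images taken from one painting.
trinity = TopicaTable(
    ("Father", "Son", "Holy Spirit"),
    [
        ("file:///pub/trinity/father1.jpg", "file:///pub/trinity/son1.jpg",
         "file:///pub/trinity/spirit1.jpg"),
        ("file:///pub/trinity/father2.jpg", "file:///pub/trinity/son2.jpg",
         "file:///pub/trinity/spirit2.jpg"),
    ],
)
before = trinity.candidates("Son")
trinity.select("Father", "file:///pub/trinity/father1.jpg")
after = trinity.candidates("Son")
```

Before any selection the 'Son' selector lists one candidate per painting; after choosing a 'Father' image it lists only the image from the same painting, which matches the behavior described for Figure 13.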
[Listing of Fig. 12: markup lost in extraction; surviving text follows]
Trinity
file://C:/pub/trinity/father1.jpg
file://C:/pub/trinity/son1.jpg
file://C:/pub/trinity/spirit1.jpg
Trinity, The
The central ... in Three Persons, the Father, the Son, and the Holy Spirit. ...
Fig. 12 An XHTML definition of a topica document on the Trinity.
Topoi are different from XLinks [11] in the following two respects. First, topoi on the same topica document are related with each other by the topica table stored in this topica document. Second, you may drag-and-drop new topica documents into some topoi to update the topica table.
Fig. 13 A topica document on the Trinity defined by the XHTML text in Fig. 12.
[Listing of Fig. 14: markup lost in extraction; surviving text: Trinity1 $t IN "http://ca.nieine.hokudai.ac.jp/trinity.xml" CONSTRUCT $t]
Fig. 14 A topica document showing a painting on the Trinity shares the same topica table as the topica document in Fig. 12.
5.3 The application horizon of the topica framework

Figure 15 shows the management of invitation letters using a topica document. Invitation letters from the same person for the same category of purposes may share the
same letter template, which he or she can reuse repeatedly to generate such letters by filling in the blanks. This topica document, on one hand, works as such a template; underlined italicized strings may be rewritten for different letters. The same topica document, on the other hand, works to store and manage all the letters created using this template; the underlined italicized strings work as topoi. When clicked, each topos pops up a selector showing all the candidate strings for this placeholder; a selection of one of them replaces the current string, and rewrites the letter, replacing all other topoi with the appropriate strings. The figure shows the case in which we have selected 'Prof. Tanaka' for the topos just after 'Dear'. This topica document has another topos to store resumes sent from invitees; an instantiation to some specific invitation letter also instantiates this topos to pop up the resume of the selected invitee. Here the resume of 'Prof. Tanaka' is popped up. This resume also works as a topica document with some topoi. When you send invitation letters, you may also send the template for a resume, and ask the invitee to fill in and send back this form. All the returned resumes can be stored in the same single topica document.
Fig. 15 A topica document for the management of invitation letters together with invitees' resumes.
Topoi of a topica document may be defined in terms of the transposed image of the stored relation. Figure 16 shows a topica document that stores a relation between filenames and files. The transposed image of this relation has filenames as its attributes; it has a single tuple specifying for each filename the URI of the corresponding file. The topica document in this figure can switch views between the default and its transposition, and vice versa. Its XHTML definition is generated by processing its XML content using one or other of its two stored XSL styles. These two styles access the stored relation in different ways to define different sets of topoi. Figure 17 (a) shows the XML definition of the contents, while Figure 17 (b) shows the XSL definition of the transposed file directory view in the right hand side of Figure 16.
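The transposed image of a two-attribute relation can be sketched in a few lines. The sketch assumes the relation is held as a list of records and that filenames are unique; the function name `transpose_relation`, the field names, and the second file entry are hypothetical.

```python
def transpose_relation(rows, key, value):
    """Turn a (key, value) relation into a single tuple whose attributes
    are the key values, as in the transposed file directory view."""
    transposed = {}
    for row in rows:
        name = row[key]
        if name in transposed:
            # Assumption: each filename occurs once, so it can act as an attribute.
            raise ValueError(f"duplicate attribute name: {name}")
        transposed[name] = row[value]
    return transposed


# Illustrative directory contents (the second entry is made up).
directory = [
    {"filename": "trinity1.jpg", "file": "file:///pub/topica/folder/trinity1.jpg"},
    {"filename": "notes.txt", "file": "file:///pub/topica/folder/notes.txt"},
]
default_topoi = ("filename", "file")            # topoi in the default view
single_tuple = transpose_relation(directory, "filename", "file")
transposed_topoi = tuple(single_tuple)          # one topos per filename
```

In the default view the two attributes act as topoi over many tuples; in the transposed view every filename becomes a topos over a single tuple of URIs, which is what lets the document look like a file directory.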
Fig. 16 This topica document storing a relation between filenames and files can switch its view from one to its transposition, and vice versa.
(a) XML definition of the file directory contents [markup lost in extraction; surviving text: trinity1.jpg, file://c:/pub/topica/folder/trinity1.jpg, 40KB, text/plain]
(b) XSL definition of the right directory view in Figure 16 [markup lost in extraction; surviving text: File]
Fig. 17 The XML contents definition and the XSL definition of the transposed view in Figure 16.
Suppose you have presented talks at many conferences in the past. For each conference, you have files of the call-for-papers mail, the submitted paper manuscript, the letter of acceptance with reviewers' comments, the camera-ready manuscript, the conference program, and the PowerPoint presentation. With our conventional file directory system, you have two alternative ways to store these files. You may either define an independent folder for each conference to store all the related files, or define six different folders for the six different categories of files. In the first case, you cannot scan through all the files of the same category. In the second case, you cannot jump from one file to another of a different file category but of the same conference. The topica document in Figure 18 solves this problem. Each folder in this directory corresponds to one of the six file categories, and works as a topos. Double-clicking it pops up a selector that looks like another file directory
listing all the files of this category. A selection of one file in this selector determines the corresponding conference, and restricts every other topos to show only one file of the same conference.
Fig. 18 A topica document that works as a file directory with 6 categories of files.
Since the definition of a topica document exploits XHTML, and is viewed by an extension of the IEPad (Internet Explorer pad), it is not difficult to extend an arbitrary web page document to work as a topica document without losing any of its functionality. While, for example, home pages of Holiday Inn hotels at different locations use different documentation styles, they provide information on several common entities including the hotel name, the location, the pictures, the address, the telephone number, and the fax number. Each of these home pages can be easily translated to a topica document with these entities working as topoi. These topica documents share the same topica table, which could be a view relation defined over, for example, the Holiday Inn headquarters database. You may also define the Holiday Inn logo in each of these topica documents to work as another topos, which lists the home-page topica document URIs of the selected hotels on its corresponding selector.

Each topica document may play three different roles. First, it works as it is. Second, it may work as a template in the way shown in the invitation-letter example. Third, it works as a schema of the stored topica table; you may use a topica document to specify a query to the topica table. The last role will be detailed in the following. To distinguish these three different roles of the same topica document, we have introduced three modes for each topica document: the document mode, the template mode, and the schema mode. You can change the mode of a topica document by popping up the right-button menu of this document. Unless otherwise specified, each topica document is in its document mode.

If we restrict every topica document to store a single relation with a single tuple, then each topos works as an anchor to another topica document. In this way one could replicate the standard linking behavior of a WWW page.
However, topoi can provide the additional behavior of allowing update of their underlying values by users other than the topica document owner. Update of topica document relations through direct manipulation operations will be reported elsewhere.

5.4 Queries over the web of topica documents

The Topica framework provides a unified approach for organizing and accessing local and/or remote files, databases, conventional web documents, and topica documents over the
Internet. In addition, the framework allows us to describe queries in XML-QL that, by navigating through these different types of information, quantifying properties of some documents on the navigation path, and picking up selected assets on the way, can construct the XHTML documents and relations of new topica documents. Figure 19 shows an example XML-QL query.

[Listing of Fig. 19, truncated in extraction: CONSTRUCT This is a collection of ...]

... $\tau, \omega\}$, where $E$ is a set of entity sorts (given solely by a property, like "being a person"), $D$ is a set of descriptive sorts (any recursive class, like "printable" in IFO [1]), $\tau$ is the set of time points, and $\omega$ the set of possible worlds as above.
M. Niinimaki / Semantics and Conceptual Modelling

It is not a logical necessity that a person, say N.N., has a certain address. Therefore, the (actual) world and the time point must be considered when expressing an attribute "Address of a person", leading to the form of construction

$(\omega\tau) \to (T_1 \to T_2)$

where $T_1$ and $T_2$ are sorts, in this case PERSON and ADDRESS, respectively. The logical distinction between intensional and extensional T-objects is based on the form of the construction by which it is achieved:
• A T-object that is NOT of the form $\omega \to T$ is an extension (a.k.a. intension of the 0th degree). Examples of this are, naturally, time points and individuals, but also analytical functions, that is, functions that are not dependent on possible worlds or points of time.
• Let a T-object be an intension of the nth degree. Then an $(\omega \to T')$-object, where $T'$ is either $T$ or $(\tau \to T)$, is an intension of the (n+1)th degree.
Intensions of the 1st degree or higher are called, briefly, intensions. Specifically, $((\omega\tau) \to (T_1 \to T_2))$ is called an empirical function. Because of its form, the function "Address of a person" is an empirical function and thus an intension. HIT-attributes are, indeed, empirical functions and of the form:
• $(\omega\tau) \to (T_1 \to T_2)$ (singular attributes) or
• $(\omega\tau) \to (T_1 \to (T_2 \to o))$ (multivalued attributes)

As stated earlier, a HIT-database schema consists of HIT-attributes and consistency constraints. Consistency constraints rule out impossible or undesired states of affairs in the world (see [9]). These impossible states can be logically impossible (like a person's age decreasing over the years) and ruled out by analytical consistency constraints. Undesired states of affairs violate the (empirical) business rules of the organisation. Therefore there needs to be a way to enforce rules like "for each material there is always a supplier". Consistency constraints are expressed using constructions, but we suppress the details here.

The conceptual schema in HIT-semantics is, however, not equal to a CONCEPT D diagram. In CONCEPT D diagrams, some of the relationships between concepts cannot be seen as empirical functions, especially some forms of intensional containment and functions.
Thus, we consider concepts and intensional containment on the basis of constructions, following [9] and [8]:
• A concept is a closed construction, i.e. a construction without free variables.
• A simple concept is a construction $^0X$, where $X$ is a variable (of any type) or an object that is not a construction.
• The content of a concept C is the set of subconstructions of C that are themselves concepts.
Some forms of intensional containment in CONCEPT D correspond very apparently to the last item: in HIT-semantics, concept C intensionally contains concept C' iff C' is a member of the content of C. On the basis of this, the role of CONCEPT D diagrams will be considered in section 4.1.
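The three definitions above can be made concrete with a toy encoding of constructions. This is a simplified sketch under stated assumptions, not the formal apparatus of HIT-semantics: the classes `Var`, `Triv`, `Comp` and `Lam` are hypothetical, trivialization is applied only to named non-construction objects, a concept is counted as a subconstruction of itself, and types are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:            # a variable
    name: str


@dataclass(frozen=True)
class Triv:           # trivialization of a non-construction object
    obj: str


@dataclass(frozen=True)
class Comp:           # composition: apply parts[0] to parts[1:]
    parts: tuple


@dataclass(frozen=True)
class Lam:            # closure: abstraction over one variable
    var: str
    body: object


def free_vars(c):
    if isinstance(c, Var):
        return frozenset({c.name})
    if isinstance(c, Triv):
        return frozenset()
    if isinstance(c, Comp):
        result = frozenset()
        for part in c.parts:
            result |= free_vars(part)
        return result
    if isinstance(c, Lam):
        return free_vars(c.body) - {c.var}
    raise TypeError(f"not a construction: {c!r}")


def subconstructions(c):
    yield c
    if isinstance(c, Comp):
        for part in c.parts:
            yield from subconstructions(part)
    elif isinstance(c, Lam):
        yield from subconstructions(c.body)


def is_concept(c):
    return not free_vars(c)          # closed construction


def content(c):
    return {s for s in subconstructions(c) if is_concept(s)}


def intensionally_contains(c, c_prime):
    return is_concept(c) and c_prime in content(c)


# Hypothetical concept "employed person": lambda x . and(Person(x), Employed(x))
person_x = Comp((Triv("Person"), Var("x")))
employed_x = Comp((Triv("Employed"), Var("x")))
employed_person = Lam("x", Comp((Triv("and"), person_x, employed_x)))
```

Here `person_x` is not a concept because x occurs free in it, so the content of `employed_person` consists of the concept itself and the three simple concepts it is built from.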
3.3 Theories of predication
Theories of predication are based on the idea of equating propositional functions, functions whose value range is a proposition, with concepts. Propositional functions can be logically analysed using second order predicate logic (2OPL). As Palomaki has demonstrated, different ontological views concerning concepts (nominalism, conceptualism, realism) can be studied this way as well. The views mainly differ with respect to the CONCEPTUALISATION PRINCIPLE that is interpreted in different ways in different views, but whose basic formulation is:
$$(\exists F^n)(\forall x_1)\ldots(\forall x_n)\,\bigl(F^n(x_1,\ldots,x_n) \leftrightarrow \varphi\bigr)$$

where $F^n$ is an n-place predicate variable, $n > 0$, $F^n$ does not occur free in $\varphi$, and $\varphi$ is a well-formed formula (wff) of 2OPL with distinct individual variables $x_1, \ldots, x_n$. The main benefit of this approach is that it is a thorough formalisation of different views and the formalisation contains semantics as well. As an example, we shall briefly examine logical realistic semantics, which can be seen as a fruitful background to conceptual modelling. This study has been adopted from Palomaki's presentation [28] with only minor editing.

First, we need to introduce the syntax of 2OPL as follows:
• The alphabet of language $L_{2OPL}$ consists of logical constants $\neg, \to, \forall, =$, a countably infinite number of individual variables $x_i$, a countably infinite number of n-place predicate variables $F^n$, a countably infinite number of m-place predicate constants $P^m$, the parentheses and the comma.
• The grammar can be expressed as follows: Wffs are valid expressions. Atomic wffs are of form $(x = y)$, where $x$ and $y$ are individual variables. Other wffs are of form $\neg\varphi$, $(\varphi \to \theta)$, $(\forall x)\varphi$, $(\forall F^n)\varphi$, where $\varphi$ and $\theta$ are wffs, $x$ is an individual variable and $F^n$ is an n-place predicate variable.

To define the semantics, two auxiliary tools will be needed, an interpretation function and a completely referential assignment. Let $L_{2OPL}$ be a language and $D$ a non-empty set. Let $f$ be an interpretation function such that its domain is $L_{2OPL}$ and its range (depending on the argument) an n-place tuple of elements of $D$. Now $U = \langle D, f \rangle$ is a model for $L_{2OPL}$. A completely referential assignment in $D$ is a function $A$, whose domain is the union of the sets of individual variables and predicate variables, such that $A(x) \in D$ for each individual variable $x$ and $A(F^n) \subseteq D^n$ for each n-place predicate variable $F^n$ ($n > 0$, $n \in \mathbb{N}$). When $d \in D$, let $A(d/x) = (A - \{\langle x, A(x) \rangle\}) \cup \{\langle x, d \rangle\}$, and when $X \subseteq D^n$, let $A(X/F^n) = (A - \{\langle F^n, A(F^n) \rangle\}) \cup \{\langle F^n, X \rangle\}$.
The semantics can be defined as follows, when $L_{2OPL}$ is a language, $U = \langle D, f \rangle$ is its model, and $A$ is a completely referential assignment in $D$:
• $A$ satisfies $(x = y)$ in $U$ iff $A(x) = A(y)$,
• $A$ satisfies $P^n(x_1, \ldots, x_n)$ in $U$ iff $\langle A(x_1), \ldots, A(x_n) \rangle \in f(P^n)$,
• $A$ satisfies $\neg\varphi$ in $U$ iff $A$ does not satisfy $\varphi$ in $U$,
• $A$ satisfies $(\varphi \to \theta)$ in $U$ iff $A$ does not satisfy $\varphi$ in $U$ or $A$ satisfies $\theta$ in $U$,
• $A$ satisfies $(\forall x)$ ...

... $(A \to ((B_1, B_2, \ldots, B_n) \to o))$.

4.3.3: Iteration structures: Respectively, an iteration structure is an attribute in HIT-semantics as well. This attribute constructs multiple instances and, consequently, it is multi-valued. Given the example above, we express it: $(\omega\tau) \to (A \to (B \to o))$.

5 Discussion and conclusions
Previously, in [25], we tried to explicate the semantics of intensional containment by normal modal logic. According to this theory, concepts are predicates, there is a notion of logical necessity, and intensional containment is based on that. This account can be challenged in many ways; here we only state that it is profitable to seek alternatives that contribute to conceptual modelling from the point of view of semantics. We have identified the requirements of a suitable semantic theory to be as follows:
• A clear distinction of the language, occurrences and conceptual levels.
• A capability to express what is intensional and what is extensional.
• A capability to explain subconcept/superconcept relationships (IS-A, intensional containment).
• Sufficient means to map the different uses of intensional containment into different relations in the domain of application.
• A possibility to include concept theoretical aspects into the theory.

We have considered the following alternatives: situation semantics in section 3.1, HIT-semantics in section 3.2 and theories of predication in section 3.3. Among the alternatives, theories of predication focus on the philosophical background, and situation semantics deals with the semantics of utterances in daily life situations. However, HIT-semantics has features by which it can, among the candidates, be best applied to conceptual modelling. This is because the theory clearly postulates concepts (as constructions) and defines criteria for intensionality (intension as an empirical function, intensional containment as a containment between constructions). HIT-semantics is, furthermore, flexible enough to incorporate other concept theoretical aspects. For instance, the structural limitations imposed by Kauppi's theory and discussed in [14] can be expressed as consistency constraints.

In section 4 we discuss the other main theme of this paper, i.e. the semantics of CONCEPT D diagrams.
A limited version of CONCEPT D is introduced by means of a definition of its syntax in section 4.2 and informal semantics in section 4.3. The semantics of the language is then discussed using the tools of HIT-semantics. As the discussion indicates, HIT-semantics is applicable for this purpose and could be applied to explain the semantic background of other formalisms as well.
M. Niinimaki / Semantics and Conceptual Modelling
References
[1] S. Abiteboul and R. Hull. IFO: A formal semantic database model. ACM Transactions on Database Systems, 12(4), 1987.
[2] J. Barwise and J. Perry. Situations and Attitudes. MIT Press, 1985.
[3] A. Borgida. Description logics in data management. IEEE Transactions on Knowledge and Data Engineering, 7(5):671–682, 1995.
[4] A. Borgida et al. Classic: A structural data model for objects. In Proceedings of the 1989 ACM SIGMOD International Conference on Management of Data, Portland, Oregon, May 31 - June 2, 1989. SIGMOD, 1989.
[5] P. Chen. The entity-relationship model - towards a unified view of data. ACM Transactions on Database Systems, 1(1), 1976.
[6] F.M. Donini et al. The complexity of existential quantification in concept languages. Artificial Intelligence, 53, 1992.
[7] M. Duzi. Logic and Data Semantics. PhD thesis, Dept. of Logic, Institute of Philosophy, Czechoslovak Academy of Sciences, 1992.
[8] M. Duzi. A contribution to the discussion on concept theory. Discussion paper in the 10th European-Japanese Conference on Information Modelling and Knowledge Bases, May 2000.
[9] M. Duzi. Logical foundations of conceptual modelling. A manuscript, 2000.
[10] M. Duzi. Two approaches to conceptual data modelling. In Studies in Logic and Philosophy, Topics in Conceptual Analysis and Modeling. Academy of Sciences of the Czech Republic, 2000.
[11] R. Elmasri and S. Navathe. Fundamentals of Database Systems. Benjamin/Cummings, 2nd edition, 1994.
[12] S. Even. Graph Algorithms. Computer Science Press, 1979.
[13] M. Hammer and D. McLeod. Database description with SDM: A semantic database model. ACM Transactions on Database Systems, 6(3):351–386, 1981.
[14] M. Junkkari and M. Niinimaki. An algebraic approach to Kauppi's concept theory. In H. Jaakkola, H. Kangassalo, and E. Kawaguchi, editors, Information Modelling and Knowledge Bases X. IOS Press, 1999.
[15] H. Kangassalo. Concept D - a graphical formalism for representing concept structures. In H. Kangassalo, editor, Second Scandinavian Research Seminar on Information Modelling and Database Management, volume 19 of B. University of Tampere, 1983.
[16] H. Kangassalo. On the concept of concept for conceptual modelling and concept detection. In S. Ohsuga et al., editors, Information Modelling and Knowledge Bases III. IOS Press, 1992.
[17] H. Kangassalo. Comic: A system and methodology for conceptual modelling and information construction. Data and Knowledge Engineering, 12(4), 1993.
[18] H. Kangassalo. Conceptual description for information modelling based on intensional containment relation. In F. Baader, M. Buchheit, M. A. Jeusfeld, and W. Nutt, editors, Knowledge Representation Meets Databases, Proceedings of the 3rd Workshop KRDB '96, Budapest, Hungary, Aug 13, 1996.
[19] R. Kauppi. Einführung in die Theorie der Begriffssysteme. Acta Universitatis Tamperensis, Ser. A, Vol. 15, 1967.
[20] P. Lambrix. Part-Whole Reasoning in Description Logics. PhD thesis, Linkoping Studies in Science and Technology No. 448, 1996.
[21] D. Mac Randal. Semantic networks. In G.A. Ringland and D.A. Duce, editors, Approaches to Knowledge Representation. Research Studies Press, 1988.
[22] P. Materna. Two notions of concepts. In O. Majer, editor, Topics in Conceptual Analysis and Modelling. The Institute of Philosophy, Academy of Sciences of the Czech Republic, 2000.
[23] T. Niemi. New approaches to intensional concept theory. In E. Kawaguchi, H. Kangassalo, H. Jaakkola, and I.A. Hamid, editors, Information Modelling and Knowledge Bases XI. IOS Press, 2000.
[24] T. Niemi. A query method based on intensional concept definition. In H. Jaakkola, H. Kangassalo, and E. Kawaguchi, editors, Information Modelling and Knowledge Bases XII. IOS Press, 2001.
[25] M. Niinimaki. Kasitteellinen mallintaminen, ekstensionaaliset ja intensionaaliset tesitekielet. Licentiate Thesis, University of Tampere, 1998.
[26] M. Niinimaki. Concepts and semantic data models - a comparison of comic and ifo. In Studies in Logic and Philosophy, Topics in Conceptual Analysis and Modeling. Academy of Sciences of the Czech Republic, 2000.
[27] M. Niinimaki. Intensional and extensional languages in conceptual modelling. In H. Jaakkola, H. Kangassalo, and E. Kawaguchi, editors, Information Modelling and Knowledge Bases XII. IOS Press, 2001.
[28] J. Palomaki. From Concepts to Concept Theory. Ph.D. dissertation, Acta Universitatis Tamperensis, Ser. A, Vol. 416, University of Tampere, 1994.
[29] P. Tichy. The Foundations of Frege's Logic. De Gruyter, 1988.
[30] W. A. Woods. Understanding subsumption and taxonomy: A framework for progress. In J. F. Sowa, editor, Principles of Semantic Networks, pages 45-94. Morgan Kaufmann Publishers, 1991.
Information Modelling and Knowledge Bases XIII H. Kangassalo et al, (Eds.) IOS Press, 2002
Enlarging the Capability of Information System - Toward Autonomous Multi-Tasking Systems
Setsuo Ohsuga and Shigeaki Takaki
Department of Information and Computer Science, Waseda University, 3-4-1 Ohkubo, Shinjyuku-ku, Tokyo 169-8555, Japan
Abstract. This paper discusses a way to make computers more intelligent and to expand the scope of information processing. This does not mean that computers can become more intelligent than human beings, but that computers can help humans make the best decisions by undertaking the task of managing the decision-making environment. The paper proposes a way to develop new information technology that can back up and aid human activity. The major topics are a new modeling method with which different types of problems can be dealt with, and its application. One important characteristic of the system discussed here is that it can serve as a tool of experimentation that supports humans in developing new methods of problem solving. Some special problem types, including the design type and the evolution type, are discussed as examples. After a brief overview of an intelligent system on which the author's group is conducting research, a way of applying it to these types of problems is discussed.
1. Introduction
Computers share a large part of human activity today. In some fields, such as applications around the Internet, computers cope with many tasks very well and are accelerating the emergence of many new human activities. In other fields, e.g. system design, computer technology remains at a rather low level compared with what is needed. Roughly speaking, the difference comes from whether an activity can be represented in an algorithmic way (the first class activity hereafter) or needs trial-and-error operations (the second class activity). As social activity grows, the need for second class activities such as design and development tasks increases. The growing inequality in computer support among human activities therefore causes many troubles in society, because people have to carry out these tasks without adequate supporting methods. There is a concern that the required tasks might go beyond human capability. For example, it was pointed out in [2] that a new type of problem was arising in software development because of the increased size of software systems. In other fields, too, unexpected troubles are increasing, such as accidents in nuclear plants and failures of rocket vehicles. There have been many discussions of the causes of these accidents. Apart from those considered the primary causes, the inability of people to keep up with the increasing scale and complexity of tasks lies at the root of the troubles in the conventional human-centered development style. In other words, human beings create these troubles and yet can no longer solve them by themselves. The author thinks that if anything can resolve these troubles in place of human beings, it must be a computer system; not a currently available one, however, but a novel one.
S. Ohsuga and S. Takaki / Enlarging the Capability of Information Systems
The notable characteristic of the large-scale problems arising recently is that a single problem involves various tasks with different characteristics. What is required in the future, therefore, is an information system that can accept and process many tasks autonomously. For this purpose it is necessary to make computers more intelligent and to change the work style by introducing them into system development. This needs AI technology, but with wider applicability than whatever has been developed so far [14]. The objective of this paper, therefore, is to discuss a way to extend the capability of information systems in the areas in which current technology is considered unsatisfactory. A novel information technology is discussed that can aid multi-task problem solving and cover a wide range of human activities. One important characteristic of the system discussed in this paper is that it can serve as a tool of experimentation that supports humans in developing new methods of problem solving.
2. An Approach toward Enlarging the Capability of Information System
Many decisions have to be made in system development. Human beings can make the best decisions if they are in an environment suited to their decision-making. Very often, however, the real environment becomes unsuited, and they are forced to make decisions in it anyway. For example, the problem scale becomes too large to fit within the view of an individual, the time allowed is too short for a human to act rationally, or the scope of knowledge required for problem solving is too wide to be understood by an individual. A possible way of solving this problem is to make computers intelligent enough to manage the environment of human decision-making.
2.1. Computer-led System
One of the practical problems that arose in large-scale system development was that no one could check the decisions afterward, because they were distributed among different persons and became invisible. The minimum requirement for such computers is therefore to make the decisions visible by recording the history of decisions made by human beings. This leads us to change the style of human-computer interaction from the conventional human-led interactive systems to computer-led interactive systems, where an X-led interactive system means that X has the initiative in the interaction and is able to manage the problem-solving process. The computer can then record the individual decisions made by human beings in this process. Generally speaking, problem solving is composed of a number of stages such as problem generation, problem representation, problem understanding, solution planning, searching for and deciding on a solution method, execution of the method, and displaying the solution. Currently computers take part in only a portion of this process and humans have to do most of it, i.e. everything from the beginning until the solution method is decided and, in addition, the programming needed thereafter in order to use computers. Many decisions are made in this process. In order to record these decisions together with the background information on which they are based, the process has to be managed by computers. Automating problem solving to a large extent is necessary for this purpose. This does not mean letting computers do everything, but letting them manage the problem solving process. Hence autonomy is of primary importance for achieving the goal of computer-led interactive systems. If autonomy can be achieved to a large extent, the system can not only record the decisions but also more actively provide a proper environment for human decision making. For example, the system can help humans decompose a large-scale problem into smaller-scale problems and also prepare and use multi-disciplinary knowledge in various domains.
2.2. Autonomous system
Autonomy is defined in [1], in relation to agency, as operation without direct human intervention. It is also said there that an autonomous system should have some kind of control over its actions and internal state. But this definition is too broad for those who are actually going to realize autonomy. Problem solving methods differ from problem to problem, and it is not easy to find a strategy for controlling the operations that has to be determined for each problem.
Figure 1. Problem solving structure: (a) Analysis type, (b) Design type, (c) Diagnosis type
Therefore, it is necessary to make the definition more concrete. Instead of this broad definition, autonomy is defined in this paper as the capability of a computer to represent and process the problem solving structure required by the problem. The problem-solving structure means the structure of the operations for arriving at the solution of a given problem. It differs for each specific problem, but problems of the same type have the same skeletal problem solving structure. For example, design as non-deterministic problem solving is represented as the repetition of model analysis and model modification (Figure 1). Its structure is not algorithm-based as in ordinary computerized methods. The details of the design operations differ with the domain of the problem and the extent to which the problem area is matured. But if the method of using the domain knowledge base for problem solving is made the same for many problems, the differences in detail can be absorbed by the content of the domain-specific knowledge base. A unified problem solving structure is then obtained for this type of problem. Conversely, a problem type is defined as a representation of a set of problems with the same problem solving structure. The necessary condition for a computer to be autonomous for wide classes of problems is therefore that it be provided with various problem-solving structures as well as a mechanism that dynamically generates a proper problem-solving structure from them for each type of problem. When a problem is large, problem decomposition is necessary before going into the detailed problem solving procedure. Problem decomposition is problem-dependent to a large extent, but the decomposition method can be made common to many types of problems. A common method of autonomous problem decomposition covering a wide range of problems is one of the key issues in solving large-scale problems.
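The idea of a skeletal problem solving structure whose details are absorbed by domain knowledge can be illustrated with a minimal sketch. It is not the authors' system: the function design_structure, the registry PROBLEM_SOLVING_STRUCTURES and the toy beam example are all assumptions introduced here for illustration. The skeleton only fixes the repetition of model analysis and model modification; everything domain-specific is passed in as functions.

```python
from typing import Callable, Dict, TypeVar

M = TypeVar("M")  # a model
R = TypeVar("R")  # the result of analyzing a model


def design_structure(
    model: M,
    analyze: Callable[[M], R],
    satisfied: Callable[[R], bool],
    modify: Callable[[M, R], M],
    max_rounds: int = 100,
) -> M:
    """Design-type skeleton: repeat model analysis and model modification."""
    for _ in range(max_rounds):
        result = analyze(model)
        if satisfied(result):
            return model
        model = modify(model, result)
    raise RuntimeError("no model met the requirement within max_rounds")


# Hypothetical registry from problem type to skeletal structure; other
# problem types (analysis, diagnosis, ...) would be registered the same way.
PROBLEM_SOLVING_STRUCTURES: Dict[str, Callable] = {"design": design_structure}

# Toy domain knowledge (invented for illustration): widen a beam until its
# capacity reaches the required load.
solve = PROBLEM_SOLVING_STRUCTURES["design"]
beam_width = solve(
    1,
    analyze=lambda width: width * 10,           # capacity of the current model
    satisfied=lambda capacity: capacity >= 75,  # the design requirement
    modify=lambda width, capacity: width + 1,   # model modification step
)
print(beam_width)
```

Swapping the three functions changes the domain without touching the skeleton, which is the sense in which differences in detail are left to the knowledge base.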
3. New Modeling Method
The difference between the conventional style and the new style of problem solving lies in the point, within the problem solving process described above, at which a problem is transferred from human to computer, and accordingly in the representation of the problem. In the old style the problem is represented in the form of a program, but in the new style problems must be represented in a form close to that in which they are generated in the human brain, transferred to computers as soon as they are created, and processed there. The formal representation of a problem is called here a problem model. A problem is a concept created in a person's brain and a problem model is its explicit representation. The principle of problem modeling is to represent explicitly everything that relates to the problem solving, so that no information remains in the human brain in a form invisible to others. A method of problem model representation must be chosen such that a variety of problems can be represented in the same framework. A problem model is represented as a compound of predicates and the structures of the conceptual entities that are also related to the language. Here lies a big issue of ontology [3] [6]. In order to assure a common understanding of the model by many people, the structuring rules must first of all be standardized so that people arrive at the same understanding of the meaning of a structure. It is also necessary that people have a common understanding of the same language expressions [5]. In the author's research program, this problem is to be dealt with in a human-computer interface by a new method of matching the meaning that the user intends to express with the meaning of the expression of the problem solving structure provided to the computer system. However, this is outside the scope of this paper. A problem model is a formal representation of a user's problem. It must be comprehensible to persons because it is created and represented by a person.
It must also be comprehensible to computers because it is to be manipulated by computers. It plays a key role in this new information technology. It is assumed in this paper that every problem is created in relation to some object in which a user has an interest. If a representation of an object is not complete but lacks some part, then problems arise from it. Therefore to represent a problem is to represent the object with some part lacking, and to solve the problem is to fill in the lacking part. This representation is named an object model. The basis of the object model is to represent the relation between the structure of the object, as constructed from a set of components, and the functionality of every conceptual object (the object itself and its components). This is the definition of an object model in the problem representation of this paper. In practice only the limited aspects of the object that are within the scope of the user's interest are represented. Different problems can therefore arise from the same object depending on the users' different views. This means that representations not only of an object but also of a person's view of the object must be included in the problem model in order to represent the problem correctly. A person may have an interest in anything in the world, even in another person's activity. This latter person, in whom the former is interested, may in turn have an interest in yet another person. It follows that, if the persons are represented explicitly as subjects, a problem model forms a nested structure of the subjects' interests. This is illustrated in Figure 2. For example, the problem of making programs needs this scheme. A program is a special type of automatic problem solver, and three subjects at different levels are involved in defining automatic programming. Subject S1 is responsible for executing a task in some field, for example in a business. Subject S2 makes a program for the task of S1.
For this purpose subject S2 observes S1's activity and makes a model of it as an object of interest before programming. Subject S3 observes subject S2's activity as an object and automates S2's programming task. The activities of these subjects can be represented by predicates such as processTrans(S1, Task), makeProgram(S2, processTrans(S1, Task), Program) and automateActivity(S3, makeProgram(S2, processTrans(S1, Task), Program), System), respectively. The upper activities need higher-order predicates. This kind of stratified objects/activities is called the Multi-Strata Structure of Objects and Activities. A model of Multi-Strata Activities is called a Multi-Strata Model [8][11]. A multi-strata model is composed of three different sub-models: the Pure Object Model, the User Subject Model and the Non-User Subject Model. In the above example, S3 is the user who intends to solve the problem of making an automatic programming system. Seen from the user, the subjects S2 and S1 are within the objects being considered by the subjects S3 and S2 respectively and are, therefore, non-user subjects. The subject S1 performs a task as work on a pure object. These models are related as follows (Figure 2).
Problem Model = Subject Model + Pure Object Model
Subject Model = User Subject Model + Non-User Subject Model
Figure 2. Multi-strata object and model
Various types of problems are defined depending on the lacking part of a model. For example, if some functionality of an entity in a pure object model is lacking, an analytic problem is generated. If some functionality is given as the requirement but the structure of entities in the pure object model is lacking, then a design type problem arises. If the structure of activities is lacking, then a scheduling type problem arises. By representing problems explicitly in this form, information is made visible.
4. Shared Model Building
It is not an easy task to build a problem model for a large-scale problem. The user has to externalize his/her idea and represent it in the form of a model, and the computer system must be able to support this operation. Let this support be called externalization. It is the computer's support for (1) the human cognitive process, to help the user clarify his/her idea, and (2) model building, in order to represent the idea in the form of a model. It requires a human-computer interface in a much broader sense than the ordinary ones. Very often novice users are not accustomed to representing their ideas formally. Sometimes their ideas are quite nebulous and they cannot express them in correct sentences. How can the system help these users? This issue belongs to cognitive science. What the system can do is to stimulate the users to notice what they intend. Some research has been done on this issue [4] [15], but it is not discussed further in this paper. In the following it is assumed that the users have clear ideas about their problems. The system aids them in building models to represent the ideas. Shared modeling is discussed. The model of a large-scale problem is also large.
It is difficult for a single person to build it alone, so model building has to be shared by many people. In Figure 3 the user represents his/her intention in the form of the user model. At the same time he/she can make the other parts of the problem model that he/she wants to decide by himself/herself. Figure 4 shows an example of modeling an enterprise. The user specifies a part of the model, enclosed by the thick line, other than the user model, including the subjects with their activities. An externalization subsystem is provided to every human subject as an agent of the human. The activity of this subject is defined as an interactive operation. The other subjects are computers and their activities are automated. The user activity is performed first. After
that, the operation proceeds downward to the lower nodes. When it reaches a lower human subject, that subject behaves as a user. This continues successively. That a human subject behaves as a user means that the externalization subsystem begins to work for the subject, aiding the human in building the still lower part of the model. But the activity of some lower human subject may already have been decided by an upper human subject and given to the subject. In that case the upper decision has priority and the lower subject is obliged to work to achieve the activity. In this way the subjects SubjectA, SubjectB, SubjectC and so on in Figure 3 behave as users and extend the model downward. This model building is closely related to object decomposition, and this decomposition in turn concerns the activity of the related subject. For example, suppose the subject is given a design task. The design-type problem-solving structure is prepared. According to this structure the object model is built downward from the top. Following this object model building, a subject model is formed.
Figure 3. Overview of problem model
5. Autonomous Problem Decomposition and Solving
If a problem is large, it is decomposed into a set of smaller problems before or together with the execution of the problem solving structure. This problem decomposition and sharing is also achieved semi-automatically based on the problem model. By combining the decomposition methods with multi-strata modeling, a variety of interesting problems can be represented and solved. Because of the limited space, only a few examples are shown. In the following it is shown that design-type and evolution-type problem solving are represented in the same way, except for whether the decomposition is achieved based on knowledge backed up by human decision or automatically by a fixed rule in the computer. Autonomous problem solving follows the decomposition. In principle, it executes the problem solving structure for the given type of problem. The details are omitted here; refer to [9] [12] [13]. In fact decomposition is necessary not only for the convenience of executing problem solving but also for defining special types of problems, as shown later in 5.2 and 5.3. The objective of this section is to show that, by means of the new modeling method and the various operations defined on its basis, different types of problems can be represented and solved in the same framework. In particular it is shown that autonomous decomposition is useful for defining various new types of problem solving methods. Decomposition starts after a conceptual object entity is defined. The entity is then decomposed top-down to form an object model structure. The way of decomposing the object depends on the problem type, the problem domain, and the extent to which the problem area is matured. However, after new objects have been generated by decomposition in whatever way, a new subject is assigned to each new lower-level object, resulting in a new subject structure corresponding to the object structure. The same activity representation as was given to the parent subject is copied to every new subject. Each pair of a new object and a new subject created in this way forms an agent in the system. That is, problem decomposition creates a new multi-agent system. Each agent shares a part of the workload given to the system.
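The pairing of objects and subjects produced by decomposition can be sketched as a small data structure. This is only an illustration under assumed names: the classes ObjectEntity, Subject and Agent, the function decompose and the naming scheme for new subjects are hypothetical and do not come from the authors' system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ObjectEntity:
    name: str
    parts: List["ObjectEntity"] = field(default_factory=list)


@dataclass
class Subject:
    name: str
    activity: str  # activity representation, e.g. "design"


@dataclass
class Agent:
    """A pair of an object and the subject assigned to it."""
    obj: ObjectEntity
    subject: Subject


def decompose(agent: Agent, part_names: List[str]) -> List[Agent]:
    """Split the agent's object into parts and create one new agent per part.

    Each new subject receives a copy of the parent's activity representation.
    """
    children = []
    for part_name in part_names:
        part = ObjectEntity(part_name)
        agent.obj.parts.append(part)
        subject = Subject(f"subject-of-{part_name}", agent.subject.activity)
        children.append(Agent(part, subject))
    return children


root = Agent(ObjectEntity("Object"), Subject("Subject", "design"))
agents = decompose(root, ["Object1", "Object2"])
for agent in agents:
    print(agent.obj.name, agent.subject.name, agent.subject.activity)
```

Applying decompose again to any child extends the object structure and the subject structure together, so the set of agents grows as the model is built downward.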
Figure 5. Problem decomposition (object model decomposition, Object -> {Object1, Object2, ..., ObjectN}; object-subject correspondence; subject model formation)
In a design type problem, for example, the object model should be built in such a way that the required functionality of the object is satisfied. The user subject creates a human subject at the lower level and defines the subject's activity as design. The design activity starts from this subject and decomposes the object entity using the knowledge base. Occasionally the human subject modifies the result. Then a new human subject is assigned to each new object. In the other case the user subject specifies a different activity for the lower subject, for example an evolutionary rule for decomposing an object automatically, as well as a condition for the created object to survive in a given environment. This rule decomposes an object autonomously. A global object is formed as the aggregation of the resulting objects. If the object thus created cannot meet the survival condition, then the object dies. An object adapted to the environment is created by evolution after a number of trials. This is a problem type with a special problem solving structure. Design and evolution are compared in the following.
5.1. Design as interactive object model building
Design is an activity of building an object model. In many cases models are built top-down. In a matured domain, experience of similar designs and knowledge of decomposition have been accumulated. For example, if the problem is to design an airplane, there is a commonly used decomposition rule for an airplane such as Aircraft -> {Engine(s), Mainwing, Control-surfaces, Fuselage, Landing-gear, Wire-harness, Electronic-System, ...}. Using this rule the system tentatively builds a part of an aircraft model as composed of these assemblies. After that, the human can change this structure according to his/her idea. He/she also decides the functional requirement for every assembly. This stage is called conceptual design. The object is evaluated as to whether the design requirement for the aircraft can be satisfied with this structure and the functional requirements for these assemblies, within the scope of this object structure. After that, the design work moves downward to each assembly. The functional requirement decided above for each assembly becomes the design requirement for the assembly design, and the assembly is decomposed further.
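The conceptual design step described above (a tentative structure proposed from a rule, revised by a person, then annotated with functional requirements) might be sketched as follows. The names DECOMPOSITION_RULES, propose_assemblies and conceptual_design are assumptions made for this illustration, the requirement text is invented, and the rule lists only the assemblies named in the text, whose list is left open there.

```python
from typing import Callable, Dict, List, Optional

# Decomposition knowledge; only the assemblies named in the text are listed.
DECOMPOSITION_RULES: Dict[str, List[str]] = {
    "Aircraft": [
        "Engine(s)", "Mainwing", "Control-surfaces", "Fuselage",
        "Landing-gear", "Wire-harness", "Electronic-System",
    ],
}


def propose_assemblies(entity: str, rules: Dict[str, List[str]]) -> List[str]:
    """Tentative structure suggested by the knowledge base (empty if no rule)."""
    return list(rules.get(entity, []))


def conceptual_design(
    entity: str,
    rules: Dict[str, List[str]],
    revise: Optional[Callable[[List[str]], List[str]]] = None,
    requirements: Optional[Dict[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Propose a structure, let a person revise it, attach requirements.

    Returns a mapping from assembly to its functional requirement; None marks
    a requirement that has not been decided yet.
    """
    assemblies = propose_assemblies(entity, rules)
    if revise is not None:
        assemblies = revise(assemblies)  # the human has the final say
    requirements = requirements or {}
    return {name: requirements.get(name) for name in assemblies}


# Hypothetical session: the designer drops one assembly and fixes one requirement.
design = conceptual_design(
    "Aircraft",
    DECOMPOSITION_RULES,
    revise=lambda parts: [p for p in parts if p != "Wire-harness"],
    requirements={"Fuselage": "hypothetical requirement text"},
)
print(design)
```

Each entry with a decided requirement would then serve as the design requirement when the same procedure is applied one level down to that assembly.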
Figure 5. Design process (Object -> {Object1, Object2, ..., ObjectN}; Subject -> {Subject1, Subject2, ..., SubjectN}; Person1, Person2, ..., PersonN)
Figure 6. Evolution process (Object1, Object2; Subject1, Subject2; Cell 1, Cell 2, ..., Cell N)
Generating a new structure is important in creative design. As a matter of course human creativity plays an important role. Apart from this, automatic generation of new structures is possible in some cases; an example is seen in genetic algorithms. But the human makes the final decision. This is an important characteristic of design. The author has discussed knowledge-based design elsewhere and further details are omitted here; refer to [10].
5.2. Evolution as autonomous object model building
Evolution is a process in which a system increases its complexity by autonomous expansion of its structure so as to adapt to an environment. The expansion is achieved by decomposing object entities in the system. The decomposition is performed based on an internal rule given to every entity. In the case of living things, a system grows by cell division. The rule of division exists inside each cell and is inherited from the parent by copying DNA. Various attempts have been made to simulate this process in the study of artificial life. This process is different from, but partly similar to, design. The differences and similarities become clear when both are represented using the multi-strata model. The similarity is that both design and evolution proceed top-down based on the decomposition of object entities. The difference is that in design the subject entity is defined independently of the object model and a different method of decomposition is defined for each subject, while in evolution a subject entity is included in an object entity. When an object is decomposed into sub-objects, a subject is created for every sub-object. Every sub-object is the same. Accordingly every subject is also the same. Every subject has the same activity, i.e. the decomposition rule. Each subject is also the same as the parent subject that generated it through the decomposition of the object. In spite of this difference, both cases can be modeled by means of similar multi-strata models, and the same basic decomposition scheme is used. The difference is that, in design, the final decision on the decomposition of an object is left to a person, even though a computer system presents the possible ways of decomposition, and that a different subject is assigned to each sub-object. In an evolution system simulated in a computer system, on the other hand, every subject is the same software and decomposition is performed autonomously. The software is represented in the form of a rule.
This rule is copied every time an object is decomposed and new subjects are created. The behavior of the evolution system depends on the generation rule and on the environment with respect to which an object is evaluated. As the supervisor, the human user designs a generation rule and makes it an activity of the user subject, together with the environment and the condition under which the evolutional system survives in that environment. The user can then simulate its behavior. An example is shown below. To define a generation rule and a survival condition for the evolutional system in a given environment is to define a new (artificial) creature. There are many possible ways of defining them, possibly more than the number of creatures that really exist in the world. Only a very simple case is presented as an example. Example: An artificial creature is considered. An object is a cell that includes a subject. The subject decomposes the object when evoked. An object is divided into two objects, and new subjects are created correspondingly, as discussed above. Every object has two arms with which to make connections with other objects. Each subject is given an activity that either decomposes the object or extends two arms from the object in two arbitrary directions. For simplicity, the direction in which an arm extends is selected randomly from eight equally separated directions (Figure 7(a)). An arm is extended by a fixed unit of length at a time. The real length differs with the direction, but this is ignored for simplicity. The arm extension is repeated three times. If an arm hits the arm of another object during this extension, the two objects are connected together. The remaining free arm of each of these objects continues to extend and may hit yet another object. Thus a chain of connected objects is formed, in which only the two objects at the ends have a single free arm each.
If a free arm of an object fails to connect with another object after three extensions, the arm is pruned and the object is decomposed. If both arms of an object are pruned, i.e. no hit has ever occurred, then two new objects are created by decomposition. If an arm of the object at one end of a connected chain is pruned, the object is decomposed but its other arm, which is connected with another object, keeps this connection. That is, the connected chain is copied and one chain is connected to one of the newly created objects (Figure 7(b)). Thus the probability of forming a connected chain increases.
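The arm-extension step of this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the grid encoding of the eight directions, the names `DIRECTIONS`, `arm_cells` and `arms_hit`, and the rule that two arms "hit" when they share a grid cell are all hypothetical and are not taken from the system described in the paper.

```python
# Eight equally separated directions on a grid (hypothetical encoding:
# index 0 = east, then counter-clockwise in 45-degree steps).
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1),
              (-1, 0), (-1, -1), (0, -1), (1, -1)]


def arm_cells(origin, direction, extensions=3):
    """Grid cells covered by an arm after `extensions` unit steps.

    Diagonal and axis-aligned steps count as the same unit length,
    matching the simplification made in the text.
    """
    x, y = origin
    dx, dy = DIRECTIONS[direction]
    return [(x + dx * k, y + dy * k) for k in range(1, extensions + 1)]


def arms_hit(arm_a, arm_b):
    """Assumption: two arms 'hit' when they share at least one grid cell."""
    return bool(set(arm_a) & set(arm_b))


east_arm = arm_cells((0, 0), 0)   # arm of a cell at (0, 0), pointing east
south_arm = arm_cells((3, 3), 6)  # arm of a cell at (3, 3), pointing south
print(arms_hit(east_arm, south_arm))
```

In a fuller simulation the direction index would be drawn at random for each arm, and an arm that hits nothing after three extensions would be pruned, triggering decomposition of its object.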
S. Ohsuga and S. Takaki / Enlarging the Capability of Information Systems
Figure 7. An example of generating an evolutional system: (a) selection of direction; (b) decomposition of objects; (c) a connection path between sensor and actuator.
On the other hand, the user provides a sensor block and an actuator block. Each block has eight arms corresponding to the eight directions. A sensor arm, say the k-th arm, generates information when there is food in the k-th direction. An actuator arm, say the j-th arm, corresponds to a move of the whole system in the j-th direction. Thus if the free arms at both ends of a connected chain of objects are connected to the k-th sensor arm and the j-th actuator arm respectively, then information flows from the k-th sensor arm to the j-th actuator arm. That is, if the evolutional system finds food in the k-th direction, it moves in the j-th direction and, when j differs from k, fails to get the food. The survival condition is that the creature survives when the connection from the j-th sensor arm to the j-th actuator arm is formed for j = 1, 2, ..., 8. This is the simplest case, but more complicated cases can be represented in this framework. For example, if the number of arms of an object is more than three, not a sequential path but a more complicated network is generated. A more elaborate system can also be made. For example, a connection that fails to meet the survival condition may be killed automatically. Such details are omitted here because this example is meant only to show that the system discussed so far is applicable to this kind of application. The system can be used as a tool for studying evolution systems because, by changing the generation rule and the survival condition in the given environment, the user can define various evolutional systems and study their characteristics.
5.3. Evolution type problems
There are some application problems that belong to this evolution type. Many of them are newly developed methods. For example, an agent-based finite element method has been proposed recently [7]. It is a modification of the conventional finite element method for analyzing the structure of a complex object that enables distributed processing. The principle of the finite element method is to divide an object into a set of small cells and to analyze each of these cells, taking the inter-relations between adjacent cells into account. In the new approach an agent is assigned to each cell and these agents work cooperatively. In this case the same
method of generating subjects according to the object decomposition can be used. At present the object decomposition is performed independently of and prior to the analysis. Alternatively, it is worth trying to decompose an object together with the analysis. That is, the analysis starts after a rough division of the object. If an agent judges that the analysis of its object is not accurate enough, it decomposes the object and assigns a new agent to each smaller object. This process is repeated until cells are reached for which the analysis is accurate enough. A proper criterion is needed for evaluating the correctness of the analysis. As yet another example, an agent-based manufacturing system is considered as a new concept in Intelligent Manufacturing Systems (IMS). To every part coming into a manufacturing system an agent is assigned, which controls the transport, manufacture and storage of the part. When two or more parts are assembled, their agents are merged into a new agent. As a similar concept, Holonic Manufacturing Systems (HMS) have also been proposed within IMS. Holons are the building blocks of a manufacturing system, which can transport, manufacture, store and control parts in a factory. Holons are agents and cooperate with each other in order to restructure a manufacturing system, adapting to rapid changes in the manufacturing environment. In this case, however, the agents are not generated dynamically but specified in advance. Therefore it is not an evolution type but a design type problem. Many evolution type problems have appeared only recently. This type came into the scope of consideration because computers became fast and large and distributed processing became possible. Further evolution type problems may well appear from now on. The computer system discussed so far in this paper enables humans to define new evolutionary methods experimentally. That is, it can be a system for creating new problem solving methods.
6. Conclusion
The author has discussed in this paper a way to make computers more intelligent and to expand the scope of information processing. This does not mean that computers could be more intelligent than human beings, but that the computer could help humans make the best decision by undertaking the task of managing the decision-making environment. This requires autonomy and multi-tasking of computer systems. This paper proposed a way to achieve this goal. The major topics discussed were a new modeling method with which different types of problems can be dealt with, and its application. Some special problems, including the design type and the evolution type, were discussed as examples. After a brief overview of an intelligent system on which the author's group is conducting research, a way of applying it to these types of problems was discussed. One of the important characteristics of the system discussed in this paper is that it can be a tool of experimentation that supports humans in developing new methods of problem solving. To show this, a method of representing evolution type problems was discussed in detail. As future work, it is necessary to apply this method not only to a single type of problem but also to complex problems that include different types of problems.
Acknowledgement
The author expresses his sincere thanks to the Science and Technology Agency of the Japanese Government for its support of the research. Without this support the research could not have been continued.
References
[1] C. Castelfranchi, Intelligence Agents: Theories, Architectures, and Languages, in Guarantees for autonomy in cognitive agent architecture (edited by M. Wooldridge and N. R. Jennings). Springer, 1995.
[2] W. Wayt Gibbs, Software Chronic Crisis. Scientific American, Volume 18, No. 2, 1994.
[3] T. Gruber, What-is-an-ontology? http://www-ksl.stanford.edu/kst/what-is-ontology.html
[4] K. Hori, A system for aiding creative concept formation. IEEE Transactions on Systems, Man and Cybernetics, Vol. 24, No. 6, 1994.
[5] R. Mizoguchi, Knowledge Acquisition and Ontology. Proc. of the KB & KS, Tokyo, pp. 121–128, 1993.
[6] R. Mizoguchi et al., Ontology for Modeling the World from Problem Solving Perspectives. Proc. of IJCAI-95 Workshop on Basic Ontological Issues in Knowledge Sharing, pp. 1–12, 1995.
[7] Y. Nishi and G. Ben, FEM Preprocessor by using Multiple Agents. International Association for Computational Mechanics, 4th World Congress on Computational Mechanics, 1998.
[8] S. Ohsuga, Multi-Strata Modeling to Automate Problem Solving Including Human Activity. Proc. Sixth European-Japanese Seminar on Information Modelling and Knowledge Bases, 1996.
[9] S. Ohsuga, Toward Truly Intelligent Information System - From Expert Systems To Automatic Programming. Knowledge Based Systems, Vol. 10, 1998.
[10] S. Ohsuga, A Modeling Scheme for New Information Systems - An Application to Enterprise Modeling and Program Specification. IEEE International Conference on Systems, Man and Cybernetics, 1999.
[11] S. Ohsuga, How Can AI Systems Deal with Large and Complex Problems? (to appear in) International Journal of Pattern Recognition and Artificial Intelligence, 2001.
[12] S. Ohsuga, To What Extent Can Computers Aid Human Activity? - Toward Second Phase Information Technology. (to appear in) Lecture Note on Computer Science, Springer, 2001.
[13] S. Ohsuga and Hiroyoshi Ohshima, A Practical Approach to Intelligent Multi-Task Systems Structuring Knowledge Base and Generation of Problem Solving System. (submitted to) E-J Conference on Information Modelling and Knowledge Bases, 2001.
[14] F. H. Ross, Artificial Intelligence: What Works and What Doesn't? AI Magazine, Vol. 18, 1997.
[15] Y. Sumi et al., Computer Aided Communications by Visualizing Thought Space Structure. Electronics and Communications in Japan, Part 3, Vol. 79, No. 10, pp. 11–22, 1996.
Information Modelling and Knowledge Bases XIII H. Kangassalo et al. (Eds.) IOS Press, 2002
Learning in Multi-agent Systems
Jaak Henno
Tallinn Technical University
Tampere University of Technology, Pori
[email protected]
Abstract. The ever-increasing flow of new information has made learning one of the most common human activities. More and more people spend more and more time learning new things, but our understanding of the learning process is still very limited. There are more than 50 learning theories [4], but they use terms that are difficult or impossible to measure, and none of them is accepted by all researchers. Learning studies emphasize the importance of social interactions and communication, but the proposed definitions and classifications of interactivity are vague and based on totally different phenomena, and thus cannot (yet) be used as a basis for research on interactivity and on the significance of interactions and communication in learning. It may well be that the main role of interactivity and communication lies in the ability to follow what the majority does (the basic principle of politics and democracy). Experiments with multi-agent systems and studies of ecological, social, economic, organizational and other systems show that individual learning and interactions between individuals can be (almost) eliminated and the system can still exhibit quite high-level behaviors, which can justly be called learning. Multi-agent communities can exhibit learning capabilities even when agents have no memory or only a very limited memory and do not interact, but only follow each other's behavior. In many systems, minimal capabilities of individuals, which make their behavior "a bit better" than totally random, sum up to create behavior of the whole community that is already close to optimal. This paper considers non-supervised reinforcement learning of sequential tasks, where reinforcement (information about progress) becomes available only at the end (which can be arbitrarily far away); the problem of finding a path in a maze is considered as a concrete example of such a task. It is investigated how even small improvements in the abilities of individual agents influence the resulting (learned) behavior of the whole multi-agent system.
1. Introduction
As all jobs become more and more intelligent, learning is becoming an everyday activity for more and more people. Already in the sixties Marshall McLuhan noticed: "...under the conditions of electric circuitry, all the fragmented job patterns tend to blend ... into involving roles or forms of work that more and more resemble teaching and learning..." [1], i.e. the more people are able to interact and communicate, the more they become involved in learning. This (at that time rather surprising) prophecy seems to be very true. Many professionals are nowadays called "knowledge workers"; these people operate in complex environments that require broad skills and knowledge that are not easily explained or demonstrated. It is difficult or impossible to measure or observe someone knowing or understanding. It is even more difficult to understand how people actually manage to learn something. In spite of active research in the areas of human knowledge acquisition and learning, most countries constantly complain about the inadequacy of their current education systems. All technically developed countries are nowadays interested in importing well-educated brains (first of all, programmers). Educated brains have become
an internationally traded and sought-after product. But the production methods of this product are still wrapped in a mystery of words dealing with quantities that are hard or impossible to measure and mean different things to different people: "cognitive processes" and "cognitive development", "observable behaviors", "reality perception" and "reality of perception", "deep understanding", "deep cognition", "learning environment", "social setting" etc. In discussions of learning and instruction theory even "cognitive theories" sometimes seem to be "primitive materialism", and explanations are sought in a "whole-person perspective, which includes emotions and intentions" [2]. However, it may well be that learning is just a natural part and consequence of the modern development of communication facilities and of the whole world becoming "one big village": the more people interact, the more they can follow each other's activities, and the more they (as a multi-agent society) learn.
2. Learning and Intelligence
Human intelligence and especially human learning have remained a rather poorly understood topic, a kind of "Ding an sich" [3]. Minsky: "...we humans know less about the insides of our minds than we know about the outside world..." [8]. There are tens of theories of human learning: constructivism, situated learning, operant conditioning, observational learning, behaviorism, self-regulated learning, cognitive apprenticeship, distributed cognition, cognitive flexibility theory, sociocultural theory, zone of proximal development, inquisitivism and so on; for instance, 50 (!) learning theories are presented in [4]. However, none of those theories has won overall recognition, and practitioners (acting teachers) try to combine them, e.g. as "distinctive axes of field of learning" [5]. The first AI models of intelligence and learning were based on some kind of "internal model" of the environment, and the main problem was seen to be that the models were not big enough, e.g. did not cover so-called "common knowledge"; researchers (Simon) tried to overcome the difficulties by building bigger and bigger models. This "model-building" has long been questioned (Dreyfus, Penrose). More recently a new approach has been proposed, so-called "intelligence without internal models" or "intelligence without representation" [6], [7], which proposes that even high-level behaviors can result from some simple basic principles. The main idea of this approach was already expressed by Marvin Minsky [8]: "The power of intelligence stems from our vast diversity". The behavior of single agents can be based on very simple algorithms; however, in collectives of multiple agents, behaviors can emerge that are of an essentially higher level than the behavior of individuals [29], [33], [9].
3. Interactivity studies
Researchers of human intelligence and learning (e.g. Vygotsky [10], [11], Piaget [12] etc.) have emphasized the importance of interactions and society, social interactions and social communication in developing constructs, and believed that cognition is created by interaction in social groups and cannot be isolated from social life. "Interactivity in learning is...a necessary and fundamental mechanism for knowledge acquisition and the development of both cognitive and physical skills" [13].
However, they mostly approach learning and cognition from a psychological viewpoint, and thus interaction still remains a very controversial and little understood topic. There is no common definition of interaction, and researchers have proposed rather contradictory classifications of interactions and interactivity even in comparatively restricted areas. For instance, E. J. Downes and S. J. McMillan [14] used interviews with distinguished media and communication experts and came up with six dimensions/characteristics of interactivity in computer-mediated communication. The first three dimensions are message-based:
- the nature and direction of messages - (usually) two-way and recursive, sender and receiver roles are interchangeable; even in mostly one-way interaction (web browsing) the user has many choices;
- close to 'real time', participant(s) have some control over the timing of messages;
- interactive computer-mediated communication can create a kind of virtual place.
The other three dimensions relate to individuals who participate in interactive media:
- control - the participant controls the timing, the content and to whom the message will be addressed, i.e. the participant controls with whom he/she wants to have exchange and interaction;
- responsiveness - the participant must respond to control options that are provided in the medium; "interactivity requires all messages in a sequence to relate to each other" [15];
- perceived goals - the goal of communication may be determined by the creator of the message; often participants perceive that the goal of communication is more oriented to exchanging information than to attempting to persuade.
Unfortunately these characteristics are rather vague; these dimensions describe individual feelings more than objective, controllable features. Quite different dimensions are used to characterize interactions between a learner and multimedia courseware [16]:
- Passive, 2-way flow control (i.e. moving to the next and previous display/page, selections from linearly presented material);
- Choices from a hierarchy (i.e. selection from hierarchically designed material);
- Information update control;
- Construction with components;
- Participation in simulation (simulating/modeling some real-world process/device);
- Navigation of hyperlinked information;
- Operation in a microworld;
- Multimedia creation.
Simms [17] presented a different classification of interactions with multimedia computer courseware:
- Object Interactivity - (proactive inquiry) ... objects (buttons, people, things) are activated by using a mouse or other pointing device (i.e. serve as hyperlink anchors)
- Linear Interactivity - (reactive pacing) refers to applications in which the user is able to move (forwards or backwards) through a predetermined linear sequence of instructional material. Often termed electronic page-turning
- Support Interactivity - (reactive inquiry) the facility for the user to receive performance support, which may range from simple help messages to complex tutorial systems.
- Update Interactivity - application components or events in which a dialogue is initiated between the learner and computer-generated content ... the application presents or generates problems (either from a database or as a function of individual performance levels) to which the learner must respond; the analysis of the response results in computer-generated update or feedback
- Construct Interactivity - (proactive elaboration) is an extension to update interactivity, and requires the creation of an instructional environment in which the learner is required to manipulate component objects to achieve specific goals. A classic example of this form of interaction is a lesson created for the original PLATO system (Odistill), which required the learner to construct distillation apparatus from component parts
- Reflective Interactivity - (proactive elaboration) covers... situations in which instructional designers wish to include text responses to prompts or questions. Usually, if N correct alternatives are provided to a text response, the user will enter the (N+1)th correct response, which will be judged "incorrect". To prevent this, reflective interactivity records each response entered by users of the application and allows the current user to compare their response to that of other users as well as recognized "experts". In this way, learners can reflect on their response and make their own judgement as to its accuracy or correctness.
- Simulation Interactivity - (which ranges from reactive elaboration to mutual elaboration, depending on its complexity) extends the role of the learner to that of controller or operator, where individual selections determine the training sequence
- Hyperlinked Interactivity - (proactive navigation), the learner has access to a wealth of information, and may "travel" at will through that knowledge base
- Non-Immersive Contextual Interactivity - combines and extends the various interactive levels into a complete virtual training environment (mutual elaboration) in which the trainee is able to work in a meaningful, job-related context
All the above types of interactivity are based on the topology, i.e. the structure/design, of the presented knowledge. This is usually different from the domain model, i.e. the schema of the domain [18]. Knowledge presentation is more and more seen as a design problem [19], i.e. essentially an art problem: "Design refers to the human endeavor of shaping objects to purposes"; "I construct knowledge broadly, including facts, concepts, principles, skills, and their intelligent, insightful, and sensitive use" [19]. Unfortunately, art is a very subjective topic, which is nearly impossible to formalize and study, and moving interactivity classifications into the domain of art does not clarify the situation substantially. There are even fewer classifications of interactivity between learners (social interactivity). When analyzing online conferencing [20] and other computer-mediated asynchronous communications [21], T. Duffy distinguished two interaction styles:
- conversation - "interaction where the emphasis is on moving the conversation forward...the pace is quick, the topic meanders around with some frequency, and the focus of the participants is always on the last one or two things that were said. For archival of issues and on-line support of this type of interactions the most natural is a linear, time-based system..."
- issue-based discussion - grouped by topic, "putting the emphasis on in-depth examination of specific assertions; this kind of discussions are best supported by topically organized, hierarchical conferencing systems".
This classification can help in designing on-line conferencing systems, but it does not provide much help for understanding learning. One can only agree with the basic conclusion of [22]: "... despite attempts to provide a context for interactivity through taxonomies, levels and dimensions, there remains a level of mystery about its function and purpose".
4. Learning studies in Computer Science
Machine learning research has been formalized in several ways. The best-studied approach is supervised learning, the so-called Gold-style [23] learning, where a "teacher" provides the learning system with a set of training examples. In this model, learning can be considered e.g. as a formal language recognition task: the learner is presented with a sequence of strings (examples) from the target language and has to construct a description (a recognizing program) for the target language. For the sequence of examples the learner produces a sequence of hypotheses H1, H2, ... such that the limit of this sequence is a correct description (accepting program) for the target language. The limit approach makes it possible to study what is learnable (which language classes can be learned) and to compare learnability, but it does not tell much about the practice of human learning, which is mostly non-supervised. Much closer to real-world situations is reinforcement learning [24], [25], where the learning system is not provided with a "correct" output for every input, but must itself generate an output (hypothesis) and receives an evaluation of its output, which the system uses as reinforcement to improve its behavior. This approach requires exploration, i.e. the system has to find the best output for any input over several runs. The evaluation/reinforcement can be the satisfaction of some built-in goals, e.g. finding food, reaching a state of pleasure or reducing pain. Usually the process is sequential and the evaluation is not provided at each step, but only after some sequence of actions (steps). Here this process is modeled as a search for a path in a maze (graph). With every new example (in every new room) the learner has to check whether its last hypothesis agrees with the new situation, i.e. whether it is possible to continue. If it is, the process continues and the search moves to the next room in the maze; if it is not, the learner has to backtrack to the state where the last decision was made (the last hypothesis was generated) and generate a new one that allows it to continue. The maze search metaphor is used to describe hypertext and WWW browsing and social communication (see e.g. [26], [27], [28]) and to study the development of evolving multi-agent systems [29], [30].
5. Learning and multi-agent systems
If the evaluation (reinforcement) is provided only after a sequence of steps (arbitrarily long, i.e. the system has no information about how far away the goal may be), simple agents (agents without any memory or with very limited memory) somehow have to store their previous experience. In biological and social systems this kind of memory device is so-called social memory, an external memory that all agents can use: "cooperative behavior in people is shaped by the accumulated cultural-historical knowledge of the community" [31]. Use of this external memory can substantially reduce the importance of communication and interactions for improving collective performance, i.e. learning [33], [29].
In many multi-agent systems individuals, pursuing their own interests without intentional cooperation, nonetheless generate collective behavior with substantially better characteristics, corresponding better to the needs of the whole community. A simple example is the formation of walkways in a new park. Planned pathways nearly always fail, since planners can never capture the needs of all walkers and all the essential factors: start areas and destinations, terrain, security, modes of travel etc. The best system of pathways (the one allowing most walkers to reach their destinations quickly) is the one that emerges from the walkways people themselves tread. Collecting and classifying the actions/preferences of individual people makes it possible to discover deep and useful links between topics that are not (explicitly) linked. This approach has been used e.g. in the Fluswat [32] system, which "lets you click on any word on your screen and get a choice of links to related info". The book linking system of Amazon.com ("... who bought this book also bought...") has likewise created an extremely useful way of finding new books on closely related topics; there are many other examples from biology [33], [34], the social sciences etc. The emergence of such higher-level linkings/classifications can be seen as learning by the whole collective, which in turn also helps individual members to improve their performance. The emerging higher-level capabilities are a cumulative effect of better-than-average performances of individuals. If individuals (on average) solve a task (e.g. searching for a path in a maze) even "a bit better" than a totally random search, then for the whole community these pieces of knowledge sum up to create a higher level of group performance. For instance, suppose a group of tourists is searching for a hotel or restaurant in an unfamiliar city.
Nobody knows the exact route, but everyone has some ideas ("this can't be on the other side of this highway", "I saw this coffee shop yesterday" etc.), so that their search is not totally random. Even if they do not negotiate, but (when uncertain which way to turn) follow those who (using their own criteria) make a decision, the group will eventually reach the destination: the "positive bits" of individual knowledge sum up, often to a nearly deterministic route. What matters here is just the presence of multiple independent, diverse agents. In most systems, individual agents are never totally random. For instance, humans tend to repeat (unconsciously) their previous choices, even if there is no obvious reason for this (our "free will" is rather restricted), and thus soon find themselves running in a circle (cycle). Only a community of independent agents can and will (if it is big enough) search all the possibilities and as a result develop better (learned) behavior. The bigger the community, the quicker the learning.
6. How does collective behavior depend on the abilities of individuals?
In the following, the emergence of more efficient behavior (learning) is studied using multi-agent search of a maze. The maze consists of n rooms (nodes), some of which are connected by links; i.e. the maze is a connected undirected graph. One room is the entry (denoted 1 in the following examples) and another is the exit (denoted E). The only room that agents recognize is the exit E; all other rooms look alike to them. A community of agents searches for the exit. Agents do not have any previous information (an arbitrary, randomly selected maze is considered) and do not get any global information about the maze; the only information they get is what they see in rooms. Rooms all are for
them indistinguishable, except for the number of links (doors) connecting the room with other rooms and (possibly) the marks that they left on a previous visit. All agents have similar capabilities (i.e. use the same basic search algorithm). In situations where the collective search algorithm does not determine which link should be selected next, every agent "uses its own logic", i.e. makes its own decision. Decisions of different agents are independent and the links they select are uniformly distributed, i.e. the proportion of agents taking a particular link is the same for all possible links. For instance, if there are two possible links from a room and the search algorithm does not determine which one should be selected, then each link is taken by (approximately) half of the agents (the number of agents is assumed to be sufficiently large). In computer simulations the randomness of individuals and of the whole community was modeled in the following way: all agents used the same algorithm, which was sensitive to the order in which connections in the maze were listed (Prolog), but for every agent random permutations were applied to the lists of connections (the maze was represented by lists of rooms connected to the current room). If the agents' search algorithm allowed cycles, cycling agents were "stopped" by counting room visits: if the number of visits to some room became larger than a constant (1000 in the examples below), the agent was stopped. Agents do not communicate; the only interaction is that they can see the information stored by them earlier (marks in rooms or on links). The setup of the learning/search procedure follows the one used in [29] and consists of three sequential phases. In the first, the Learning Phase, individuals use their search rules to find a path to the exit. During the search they can access only local information (i.e. what they see in a room) and may store some bits of information, marking the room or a link from one room to another (this is an analogue of insects marking with pheromones or of Ariadne's red thread). The individual search rules should allow all (or most) of them to find the exit, i.e. the probability of infinite loops should be arbitrarily small, so that all (or most) of them finish the search in finite time. In the second, the Application Phase, agents repeat the search using the information stored in the learning phase; the results of the second phase thus show how much their individual behavior has improved, i.e. what they have learned. In the third, the Collective Phase, the maze is traversed using all the learned information (marks) of all individuals; this phase is started only when all agents have finished their individual Learning and Application Phases. The search rules for the collective phase are the same for all individuals, and the search uses the information collected earlier by individual agents. The main problem considered here is how the emerging group behavior (learning) depends on the agents' individual abilities: whether they can mark rooms and/or links, use memory, share their markings and exchange their knowledge. It is shown that even minimal capabilities (the ability to store 1 bit) can make group behavior much better than that of any individual, and that group performance increases (sometimes rapidly) with the increasing abilities of individuals. We were mostly interested in qualitative features, e.g. how (arbitrarily long) cycles are eliminated, but sometimes quantitative measures, e.g. the number of marks or the length of the path, are also considered. The idea was to compare different search methods, especially methods that are similar to human search. For instance, humans do not usually backtrack (return to the previous room), but always try to continue; they always try to visit the places they know least, etc.
The resulting behavior (search in the collective phase) uses information collected by all the agents. It is therefore essential that all possibilities (possible routes from entrance to exit) are covered, i.e. the overall number of agents should be greater than the total number of different cycle-free paths from the entrance (room 1) to the exit (room E).
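The number of cycle-free paths that the community has to cover can be counted directly for a small maze. The following is only a sketch in Python (the paper's own simulations were written in Prolog); the function name `count_simple_paths` and the example maze `diamond` are assumptions made for illustration, not taken from the paper.

```python
def count_simple_paths(maze, room, exit_room, seen=frozenset()):
    """Count cycle-free paths from `room` to `exit_room`.

    `maze` maps every room to the list of rooms it is linked to.
    """
    seen = seen | {room}          # rooms already on the current path
    if room == exit_room:
        return 1
    return sum(
        count_simple_paths(maze, nxt, exit_room, seen)
        for nxt in maze[room]
        if nxt not in seen        # never revisit a room: no cycles
    )

# Hypothetical maze: two symmetric routes 1-2-4-E and 1-3-4-E.
diamond = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, "E"], "E": [4]}
paths_to_cover = count_simple_paths(diamond, 1, "E")
```

For this maze the community would need more agents than `paths_to_cover` for every route to be tried by someone; the count grows quickly with the number of links, so exhaustive counting is practical only for small mazes.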
7. Search rules
In the following we consider how the emerging collective capability depends on the agents' individual abilities.

Random walk

Here agents select links totally randomly, do not make any marks in the maze and do not have any memory:

    % the walk stops as soon as the current room is the exit
    move(Room) :- exit(Room), !.
    % otherwise step to a connected room (each agent sees the connections in its own random order)
    move(Room) :- connected(Room, Room1), move(Room1).

Lemma. In a random walk the exit will be found with probability 1; however, the exit path can be arbitrarily long.
The two simplest 3-node mazes are L1(2) and L2(2). Suppose n agents start to traverse a maze and every agent traverses one link per unit of time. Because all possible continuations are equally probable, the number of agents at each node of the mazes L1(2) and L2(2) at time T will be:

      |       L1(2)        |       L2(2)
   T  |   1     2     E    |   1     2     E
   1  |   n     0     0    |   n     0     0
   2  |   0     n     0    |   0     n/2   n/2
   3  |   n/2   0     n/2  |   n/2   0     0
   4  |   0     n/2   0    |   0     n/4   n/4
   5  |   n/4   0     n/4  |   n/4   0     0
   6  |   0     n/4   0    |   0     n/8   n/8
   7  |   n/8   0     n/8  |   n/8   0     0
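The occupation numbers of the two 3-node mazes can be recomputed mechanically. The sketch below is in Python rather than the paper's Prolog and works with fractions of the community (multiply by n to get numbers of agents). The link structure of `L1_2` and `L2_2` is an assumption read off the tabulated values: L1(2) is taken to be the chain 1-2-E, and L2(2) to link room 1 both to the dead-end room 2 and to the exit.

```python
from fractions import Fraction

def spread(maze, start, exit_room, steps):
    """Expected fraction of agents in each room at times 1..steps.

    Agents that reach the exit leave the walk, so the exit entry shows
    only the agents arriving at that time step.
    """
    current = {room: Fraction(0) for room in maze}
    current[start] = Fraction(1)
    history = [dict(current)]
    for _ in range(steps - 1):
        nxt = {room: Fraction(0) for room in maze}
        for room, share in current.items():
            if room == exit_room:
                continue                      # finished agents do not move
            for neighbour in maze[room]:      # uniform choice among links
                nxt[neighbour] += share / len(maze[room])
        current = nxt
        history.append(dict(current))
    return history

# Assumed link structure of the two mazes (see the lead-in).
L1_2 = {1: [2], 2: [1, "E"], "E": [2]}
L2_2 = {1: [2, "E"], 2: [1], "E": [1]}

h1 = spread(L1_2, 1, "E", 7)
h2 = spread(L2_2, 1, "E", 7)
exited_L2 = sum(step["E"] for step in h2)     # share that has left L2(2)
```

Each entry of `h1` and `h2` corresponds to one time step T, and `exited_L2` accumulates the share of the community that has already left the maze.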
For instance, in L2(2), for the (arbitrarily long, odd) exit path w = 1(21)*E (* denotes iteration, i.e. the iterated path is repeated 0, 1, 2, ... times), the probability that the exit has been reached after at most k visits to room 1 is $p_k(E) = \sum_{i=1}^{k} (1/2)^i = 1 - (1/2)^k$, so the probability of not yet having exited tends to 0; because of connectivity the claim holds for an arbitrary maze. This means that the main assumption (the number of unfinished searches can be made arbitrarily small) holds for the random walk; it is easy to check that the assumption also holds for all the more complicated search rules considered in the following. If agents use the random walk method, make no marks and have no memory, then their behavior does not improve in the application phase, and the collective behavior will also be the same. Therefore we next consider setups where agents can store a minimal amount of information, e.g. they can mark either rooms or links.

No-backstep random walk

Here agents select randomly, always trying to take a link that has not yet been used and excluding (if possible) the previous node. For this they mark the links they use (only once, i.e. the marking is not cumulative):

    :- dynamic link/2, last_link/2.
    move(Room) :- exit(Room), !.
    % prefer a link that is not yet marked and does not lead straight back
    move(Room) :-
        connected(Room, Room1),
        not(clause(link(Room, Room1), true)),
        not(last_link(Room1, Room)),
        retractall(link(Room, _)),
        assert(link(Room, Room1)),
        move(Room1).
    % otherwise take any connected room
    move(Room) :- connected(Room, Room1), move(Room1).

In the application phase agents follow the previous search, i.e. they take only links they have used before. Here the individual learning phase can consist of several runs, since (if they use different marks the second time) their performance can improve with every new run. This method will generally improve individual performance (exclude some cycles); for instance, in the above maze L1(2) the worst search path would be 121E. It can still create arbitrarily long cycles in the individual search phase, e.g. 1(234)*15E in the maze L1(4):
The maze L1(4)

The collective behavior rule is: from all possible links select the link that has the maximal number of marks of individual agents. The collective behavior will already be much better; e.g. in the above maze the collective path will be the shortest path 15E. The links of (15E) will carry the marks of all the agents who reached the exit in the learning phase, but the link (12) will carry the marks of only half of the agents, since the other half selected the link (15): at the beginning of the search both links have the same probability (both are unmarked). In the same way, even in a rather "symmetrical" maze, the collective phase finds one of the two optimal (cycle-free) routes (124E) or (134E). Computer simulations on the randomly created maze Labyr1 show that marking only links is a rather weak ability, and with this method agents start cycling rather frequently.
Example maze Labyr1
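The collective rule for link marks ("take the link with the maximal number of marks") can be sketched as follows. This is a Python illustration, not the author's Prolog implementation; the function `collective_path` and the mark counts in `marks_L1_4` are assumptions chosen to resemble the maze L1(4) discussion (100 agents, all of them marking 1-5 and 5-E, half of them marking 1-2).

```python
def collective_path(marks, start, exit_room, max_steps=100):
    """Follow, from each room, the outgoing link carrying the most marks.

    `marks` maps directed links (room, next_room) to the number of agents
    that left a mark on the link.
    """
    path = [start]
    room = start
    while room != exit_room and len(path) <= max_steps:
        candidates = {link: n for link, n in marks.items() if link[0] == room}
        if not candidates:
            break                              # no marked link leaves this room
        room = max(candidates, key=candidates.get)[1]
        path.append(room)
    return path

# Illustrative (assumed) mark counts for a community of 100 agents.
marks_L1_4 = {
    (1, 5): 100, (5, "E"): 100,    # every agent finally leaves via 1-5-E
    (1, 2): 50,                    # half of the agents tried room 2 first
    (2, 3): 25, (2, 4): 25, (3, 4): 25, (4, 3): 25, (3, 2): 25, (4, 2): 25,
}
route = collective_path(marks_L1_4, 1, "E")
```

With these counts the collective route avoids the cycle 2-3-4 entirely, because no link into the cycle outweighs the link 1-5.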
Curiosity-based random walk
Here agents are rather limited: they are able only to mark rooms and to remember the last link they used (the link by which they came to the current room). They always try to go to a room where they have not been earlier and, if this is not possible, try not to turn back, i.e. not to return to the room they came from. To be able to follow this rule, agents mark (before leaving) the node where they currently are (if this node was not already marked by the agent) and also mark the link they take.

    :- dynamic visited/1, last_link/2.
    move(Room) :- exit(Room), !.
    % prefer an unvisited room that is not the room the agent came from
    move(Room) :-
        connected(Room, Room1),
        not(clause(visited(Room1), true)),
        not(clause(last_link(Room1, Room), true)),
        assert(visited(Room)),
        retractall(last_link(_, _)),
        assert(last_link(Room, Room1)),
        move(Room1).
    % fallback: any connected room (the connected/2 goal binds Room1)
    move(Room) :-
        connected(Room, Room1),
        retractall(last_link(_, _)),
        assert(last_link(Room, Room1)),
        move(Room1).
This method will exclude some cycles (e.g. in the above maze L1(2) the worst search path would be 121E), but it can still generate arbitrarily long cycles in the learning phase, e.g. 1(234)*15E in the maze L1(4). The collective behavior rule is: select the node that has the maximal number of marks. Let us first consider the example L1(4). When n agents start the search, they all mark room 1, so room 1 gets n marks. Links (12) and (15) have equal probabilities, so half of the agents move along (15E) and contribute no further marks. The other half goes to node 2; this half will eventually return to node 1 (and continue from there along 15E), so the total number of marks on node 1 will be 3n/2. Half of the agents at node 2 continue along (2342) and the other half along (2432), so all n/2 agents (whichever direction they take around the cycle) return after the first cycle to node 2, and the number of marks on this node becomes n/2 + n/2 = n. Now nodes 1, 3 and 4 are all marked and will be used with equal probability, so (n/2)/3 agents leave for node 1, and 2((n/2)/3) agents enter the cycle and after a while arrive at node 2 again. Thus the total number of marks at node 2 becomes n + n/3 < 3n/2, the total final number of marks on node 1, and the collective behavior will be 15E.

New-searching walk (with cumulative marking)
In many mazes cycles (i.e. re-visiting already visited rooms) are unavoidable. To eliminate the influence of cycles (cycling could raise the number of visits arbitrarily high and in this way trap the whole community), agents should be able to tell how many times they have already been in a room. Therefore marking was made cumulative: agents add a new mark every time they visit a room. They always try to go to the room with the minimal number of marks. To speed up the search, a simple cut-off condition was added: agents try to avoid rooms where the number of visits is higher than a pre-determined constant. In computer simulations a maximum of 6 visits turned out to be enough to avoid cycles (i.e. after 6 visits they already "remember" that this room is useless). This simple rule turned out to be rather effective. For instance, in the maze Labyr1 a small community of 10 agents was able to find the shortest path 1-4(5)-5(6)-E(10); numbers in brackets show how many agents used the previous link (in the application phase):
Links used by a community of 10 agents; frequency of link use is indicated by line thickness

However, the routes of individual agents were quite long: 16, 19, 22, 12, 12, 9, 9, 9, 18, 12 links (average 13.8, standard deviation about 4.4). In another search in the maze Labyr1, 20 agents produced the (non-optimal) path 1-6(9)-7(7)-2(7)-5(13)-E(20), i.e. because of the randomness present in the search algorithm, a small number of agents is not always able to find the shortest path.
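The summary statistics of the ten individual route lengths listed above can be recomputed with Python's standard `statistics` module. This is only a check on the arithmetic; whether the paper intended the population or the sample standard deviation is not stated, so both are computed.

```python
from statistics import mean, pstdev, stdev

# Individual route lengths (in links) of the 10 agents, as listed in the text.
lengths = [16, 19, 22, 12, 12, 9, 9, 9, 18, 12]

average = mean(lengths)          # arithmetic mean of the route lengths
population_sd = pstdev(lengths)  # divides by N
sample_sd = stdev(lengths)       # divides by N - 1

print(f"mean={average:.1f}  pstdev={population_sd:.2f}  stdev={sample_sd:.2f}")
```

Either convention gives a spread of roughly a third of the mean, i.e. individual routes vary considerably even in a 10-agent community.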
With local markings (on nodes and/or links only), the collective behavior may sometimes generate a path that is not the shortest, and there may be several equally probable collective paths.

Maze L2(4).

In L2(4) all the paths 14E, 1E and 135E have probability 1/3. Agents can find a shortest path only if they either have unlimited memory (so that they can build an internal "model" of the maze) or (with limited memory, using only marks) use some kind of "back-propagating" strategy to store in nodes the information "there is still (at least) one node ahead".

Using nodal path preferences
This rather complicated method was used in [29]. The main idea is to store not only "I was here" but also which direction (link) was used when the node was last left. For this, links are considered directed and agents store on links so-called nodal path preferences, which can be 0, 0.1 or 1; the initial value is 0. Let p_miq be the nodal path preference of agent m for link q from node i to node j. The next node j, when agent m is at node i, is selected according to the rules of [29]:

"If there exists any connecting node with p_miq = 0.0:
- Choose a link randomly from the set of links with a value of p_miq = 0.0,
- Set p_miq for this node to unity and p_miq for the reciprocal link from j to i to 0.1,
- Set all other links with p_miq of unity to zero.
If all links have p_miq greater than zero:
- Choose a link with the maximum value of p_miq."

This method always tries to avoid the last choice: for the previously taken choices p_miq is set to zero, so that these can be repeated, but for the last choice p_miq = 1, so that if the agent visits the node again, the earlier choices are available but the last one is not. This method can create arbitrarily long cycles (although in [29] it is called robust, i.e. not creating cycles), as in the maze:
Maze L1(5).

The above method allows paths 12(324)*5 in the above maze; however, the probability of the cycle becomes smaller all the time, so at some moment the agent will exit the cycle. Still, the possibility of infinite cycles makes the whole method impractical: e.g. when should the collective learning phase start, if some agents are still walking in a cycle?
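The quoted selection rule for nodal path preferences can be written down compactly. The sketch below is one possible reading in Python, not the implementation of [29]: the function name `choose_link`, the dictionary `pref` keyed by directed links, and the interpretation of "all other links" as the other links leaving the current node are assumptions.

```python
import random

def choose_link(pref, room, neighbours, rng=random):
    """Pick the next room for one agent and update its preferences.

    `pref` maps directed links (i, j) to 0, 0.1 or 1; links that are not
    stored yet count as 0 (the initial value).
    """
    fresh = [j for j in neighbours if pref.get((room, j), 0) == 0]
    if fresh:
        nxt = rng.choice(fresh)               # random link with preference 0
        for j in neighbours:
            if pref.get((room, j), 0) == 1:
                pref[(room, j)] = 0           # older last choice becomes free again
        pref[(room, nxt)] = 1                 # remember the new last choice
        pref[(nxt, room)] = 0.1               # reciprocal link from j back to i
        return nxt
    # all links have a preference greater than zero: take a maximal one
    return max(neighbours, key=lambda j: pref[(room, j)])

prefs = {}
rng = random.Random(0)
first = choose_link(prefs, 1, [2, 5], rng)    # both links are still at 0
second = choose_link(prefs, 1, [2, 5], rng)   # only the other link is at 0
```

After the two calls the first choice has been reset to 0 and the second choice holds the value 1, which is exactly the situation in which an earlier choice can be repeated on a later visit.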
Marking visited and dead-end nodes
The previous methods could all create cycles in the learning and/or application phases. In order to avoid cycles, agents can mark rooms that cannot contribute anything new: every link from such a dead-end room leads to an already visited room. This search algorithm is essentially the classical depth-first search, and here agents are allowed to use two types of marks, for visited and for dead-end nodes. Search first for an unvisited room; if this is not possible, select a room that has already been visited and mark the current room both as visited and as a dead end; never enter rooms that are marked as dead ends.

    :- dynamic visited/1, dead_end/1.
    move(Room) :- exit(Room), !.
    % first choice: a connected room that has not been visited yet
    move(Room) :-
        connected(Room, Room1),
        not(visited(Room1)),
        assert(visited(Room)),
        move(Room1).
    % otherwise: a visited room that is not a dead end; the current room becomes a dead end
    move(Room) :-
        connected(Room, Room1),
        not(dead_end(Room1)),
        assert(visited(Room)),
        assert(dead_end(Room)),
        move(Room1).
In some mazes this algorithm is not able to find the exit, as e.g. in the following simple maze:
Maze Labyr2
If an agent passes node 4 without using the exit link 4-E, then later it will be trapped at node 2: all the connecting nodes will be marked as dead ends. To avoid this, the marking of visited rooms was made cumulative (a new mark was added on every visit), and in a trapped situation (all connected rooms are dead ends) the agent selects a room with the minimal number of visiting marks. With a small number of agents, the randomness present in the search algorithm still has significant influence. For instance, in one simulation on the example maze Labyr1, 20 agents were not able to discover the shortest path, since after the first link 1-2 (taken by 11, i.e. approximately half, of the agents) there were too many distracting possibilities to investigate, and only one (links 2-5-E) contributed to the shortest path.
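The search with visited and dead-end marks, including the cumulative fallback for trapped agents, can be sketched as follows. This is a Python illustration rather than the paper's Prolog program; the function `dead_end_search` and the small tree-shaped maze `tree` are assumptions (the structure of Labyr1 and Labyr2 is not recoverable from the text).

```python
import random

def dead_end_search(maze, start, exit_room, rng, max_steps=10_000):
    """One agent's search with visited marks and dead-end marks."""
    visits = {room: 0 for room in maze}     # cumulative visit marks
    dead_end = set()
    room, path = start, [start]
    while room != exit_room and len(path) < max_steps:
        visits[room] += 1
        neighbours = maze[room]
        unvisited = [r for r in neighbours if visits[r] == 0]
        open_rooms = [r for r in neighbours if r not in dead_end]
        if unvisited:
            nxt = rng.choice(unvisited)     # first choice: a new room
        elif open_rooms:
            dead_end.add(room)              # nothing new here: mark and go back
            nxt = rng.choice(open_rooms)
        else:
            dead_end.add(room)              # trapped: take the least visited room
            nxt = min(neighbours, key=lambda r: visits[r])
        room = nxt
        path.append(room)
    return path, dead_end

# Hypothetical maze without cycles: a side branch 1-2-4 and the exit behind room 3.
tree = {1: [2, 3], 2: [1, 4], 3: [1, "E"], 4: [2], "E": [3]}
path, marked = dead_end_search(tree, 1, "E", random.Random(1))
```

In this maze an agent either walks 1-3-E directly or first explores the side branch and marks rooms 4 and 2 as dead ends, so a second run by the same agent would skip that branch.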
Links used by a community of 20 agents

A community of 50 agents (using the same rules) was already able to find the shortest path 1-2(27)-5(25)-E(50); the frequency of the "useless" links 1-6, 6-7 and 7-2 was reduced. However, the frequency of another "useless" link, 3-4, increased, which indicates that the community is still not big enough:
Links used by a community of 50 agents

For this algorithm it is easy to estimate the complexity of the search and the speed-up of the search in the application phase. Suppose the maze has m rooms. As complexity measures we consider the number of steps t(m) (the time complexity, i.e. the number of invocations of the predicate move) and the number of marks s(m) (the space complexity, i.e. the total number of clauses visited(Room) and dead_end(Room)); for both measures we are interested in estimates of the upper bound (i.e. the worst case). Clearly every room will be entered not more than twice (after that it will be marked as dead_end), so t(m) ≤ 2m; also s(m) ≤ 2m (every room marked as visited and as dead_end), and both these bounds can be attained. Using this method an agent will already in the second search find the exit without entering any cycles: the agent now "knows" all the dead ends.

Suppose now that there are n agents (learners) searching the same maze and that they can see each other's marks. Whenever a choice occurs (i.e. there are multiple possible paths to consider), it is supposed that there are enough agents so that some agent can handle every possibility. Now

    t(m) = length of a cycle-free path from start to exit
    s(m) < t(m)*n

i.e. collective search (learning) will be at least twice as quick.
Cumulative Curiosity Search

Humans are usually curious to see the places they know least (where the number of visits is minimal). With cumulative curiosity search, agents add a mark every time they visit a room (cumulative marking). This way they can tell how many times they have already been in the room. They always try to continue to a room that has the minimal number of (their own) marks (i.e. first of all they try to find an unvisited room). When leaving a room they also mark the link they use. These link marks should indicate which way they left the room the last time they were in it, so if they have already been in that room, the earlier marked links from this room to other rooms are all deleted. This makes it possible to avoid cycles in the application phase. In the application phase they use the link they used the last time they were in the room; the collective search follows links with the maximal number of marks (of different agents). With this method even 20 agents were able to find the minimal path 1-2(10)-5(11)-E(20):
Links used by 20 agents using the cumulative curiosity search

Example maze Labyr3

For the 35-room maze Labyr3 used in [29], this method allows 50 agents to find a minimal route: 1-6(27)-11(18)-16(16)-21(24)-22(24)-27(24)-26(26)-34(13)-35(26)
Frequency of used links in 50-agent search
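The three phases of the cumulative curiosity search can be sketched in a few lines. This is a Python illustration, not the author's Prolog code; the names `learn`, `follow` and `collective_links` are assumptions, and the link structure of `L1_4` (links 1-2, 2-3, 3-4, 4-2, 1-5, 5-E) is read from the paths quoted earlier for the maze L1(4).

```python
import random
from collections import Counter

def learn(maze, start, exit_room, rng, max_steps=1000):
    """Learning phase of one agent: returns its last-used link per room."""
    marks = Counter()                  # the agent's own cumulative room marks
    last_link = {}
    room = start
    for _ in range(max_steps):
        if room == exit_room:
            break
        marks[room] += 1
        fewest = min(marks[r] for r in maze[room])
        nxt = rng.choice([r for r in maze[room] if marks[r] == fewest])
        last_link[room] = nxt          # earlier link marks of this room are dropped
        room = nxt
    return last_link

def follow(links, start, exit_room):
    """Application phase: repeat the link used when each room was last left."""
    path = [start]
    while path[-1] != exit_room and len(path) <= len(links):
        path.append(links[path[-1]])
    return path

def collective_links(agents_links):
    """Collective phase: per room, keep the link marked by the most agents."""
    votes = Counter((room, nxt) for links in agents_links for room, nxt in links.items())
    best = {}
    for (room, nxt), count in votes.items():
        if room not in best or count > votes[(room, best[room])]:
            best[room] = nxt
    return best

L1_4 = {1: [2, 5], 2: [1, 3, 4], 3: [2, 4], 4: [2, 3], 5: [1, "E"], "E": [5]}
agents = [learn(L1_4, 1, "E", random.Random(seed)) for seed in range(20)]
collective_route = follow(collective_links(agents), 1, "E")
```

Because a room's last-used link is always recorded later than the last-used link of the room before it on the application path, following these links cannot revisit a room, which is why the application phase is cycle-free.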
8. Agents with memory
In all the previous methods agents did not "remember" anything; the only memory was the marks left in rooms. Humans usually remember at least some places where they have been. To resemble "human" search better, agents were given the ability to remember some rooms where they had been lately (e.g. the last 4 visited rooms), together with the number of connections (links) from each of these rooms. When searching for the next room, they first try to find a room where they have not yet been; if this is impossible, then a room where they have not been lately (i.e. a room that is not in their "memory"). If this is also impossible, they continue to the room with the maximum number of links. As seen below, these human-like features made the search somewhat more effective, but not substantially.

9. Comparing different methods
To compare the methods described above, the maze Labyr3 was searched by a community of 100 agents using four different methods:
- Visited and dead-end nodes marking, dead-end rooms forbidden (go back if only dead-end rooms are available): the average length of individual routes was 32.1 (standard deviation 15.3); agents passed altogether 1757 links; the collective route length was 18;
- The Cumulative Curiosity Search: the average individual route length was 34.3 (standard deviation 17.4); agents passed altogether 1541 links; the collective route length was 12;
- Visited and dead-end nodes marking; when only dead-end rooms are available, select a room with the minimal number of previous visits: the average length of individual routes was 19.8 (standard deviation 3.9); the collective route length was 13; the total number of links traversed was 1476;
- Using memory to avoid lately visited rooms and, if this is not possible, selecting a room with the maximal number of connections: the average length of individual routes was 17.4 (standard deviation 8.6); the total number of links passed was 1452, and agents found one of the two optimal routes:
Method                                    | Average individual route length (std. dev.) | Total number of links passed | Collective route length
Dead-end nodes forbidden                  | 32.1 (15.3)                                  | 1757                         | 18
Cumulative curiosity                      | 34.3 (17.4)                                  | 1541                         | 12
From dead-end rooms select least visited  | 19.8 (3.9)                                   | 1476                         | 13
Using memory                              | 17.4 (8.6)                                   | 1452                         | 9
Links used in a search where dead ends were forbidden

Used links and collective route (dotted line) using the Cumulative Curiosity search
Links used in a search where, from dead-end rooms, a least-visited room was selected
Links used and minimal route found by agents with memory

These results agree (more or less) with common sense. In the first method, agents use a minimal amount of information when searching for the next room (only two bits: is a room already visited, and is it a dead end), and since dead-end rooms should never be entered again, the total number of visited links grows with backtracking. Backtracking was not considered a "human-like" feature, therefore it was not used in the other three methods. In the second method agents use more information: they can compare the numbers of visits of all rooms connected to the current one. Here they never backtrack, so the individual paths tend to be rather long, but the collective path is already close to optimal. In the third method they first try to avoid dead-end rooms, which seems to reduce the overall length of individual paths. If nevertheless all possible links lead to a dead-end room, agents search for the least visited one (they do not backtrack), i.e. again use more information to select the next room. This method gives short individual routes, and the collective route is also nearly optimal. Agents with memory did best: their individual paths were the shortest (on average) and they were able to find one of the optimal routes. However, the differences from e.g. the third method were not very substantial.
10. Conclusions
Above, a model describing learning in multi-agent systems was presented. If every agent behaves "a bit smarter" than totally randomly, then the community of agents can learn just by following what the majority does, without any interactions. The "smarter" the agents are (i.e. the more complicated the algorithms they use), the quicker the community learns. This "majority rules" principle is very similar to what we see e.g. in politics (politicians try to please most people). A similar mechanism works in many areas, e.g. in concept formation: concepts are the way the majority of us use words. Linguistic/semantic concepts emerge from the use of language by the whole community. We use concepts for communication; if somebody is successful in communicating his or her ideas to other people, we follow his or her use of words, i.e. accommodate our concept system to his or hers, following the communication path that has proven to be successful. When a child is first shown a ball and told "This is a ball", he next points to the Sun and claims: "Ball!" It takes several iterations before he understands what (most of the) people around him mean by the word "ball", but the concept will still change when he first sees e.g. a pushball.

References
[1] Marshall McLuhan, Quentin Fiore. The Medium is the Massage: An Inventory of Effects. New York: Bantam Books, 1967
[2] http://training.trainingplace.com/newsletter/Jan2001.htm
[3] Immanuel Kant. Kritik der reinen Vernunft. http://gutenberg.aol.de/kant/krvb/krvb.htm
[4] http://www.gwu.edu/~tip/theories.html
[5] Johannes Cronje. Paradigms Lost. Towards integrating Objectivism and Constructivism. http://it.coe.uga.edu/itforum/paper48/paper48.htm
[6] Hubert L. Dreyfus. Intelligence Without Representation. http://www.hfac.uh.edu/cogsci/dreyfus.html
[7] R.A. Brooks. Intelligence without Representation. Artificial Intelligence, Vol. 47, 1991, pp. 139–159
[8] Marvin Minsky. The Society of Mind. Simon and Schuster, New York, NY, 1986
[9] http://www.alife.org
[10] Vygotsky, L. S. (1978). Mind in Society. Cambridge, MA: Harvard University Press
[11] Vygotsky, L. S. (1962). Thought and Language. Cambridge, MA: MIT Press
[12] Piaget, J. (1929). The Child's Conception of the World. NY: Harcourt Brace Jovanovich
[13] Barker, P. (1994). Designing Interactive Learning, in T. de Jong & L. Sarti (Eds), Design and Production of Multimedia and Simulation-based Learning Material. Dordrecht: Kluwer Academic Publishers
[14] Edward J. Downes, Sally J. McMillan. Defining interactivity. New Media & Society 2(2), 2000, 157–179
[15] Rafaeli, S. and Sudweeks, F. Networked Interactivity. Journal of Computer-Mediated Communication 2(4), 1997, http://www.usc.edu/dept/annenberg/vol2/issue4/rafaeli.sudweeks.html
[16] Juhani E. Tuovinen. Multimedia Distance Education Interactions. Education Media International 37:1, 2000, 17–23
[17] Rod Sims. Interactivity: A Forgotten Art? Available online at http://intro.base.org/docs/interact/
[18] Jaak Henno. Design of Intelligent Learning Support Environments. Nordic Conference on Computer Aided Higher Education, Proceedings, Helsinki, August 21–23, 1991, 119–137
[19] David Perkins. Knowledge as Design. Hillsdale, NJ: Erlbaum, 1986
[20] Thomas Duffy. Both types of interaction for critical thinking. Social and Instructional Dynamics Online, http://www.umuc.edu/cgi-bin/HyperNewsl_9_5-readonly/get/dia_l/17/10.html
[21] Thomas Duffy. Both types of interaction for critical thinking. Social and Instructional Dynamics Online, http://www.umuc.edu/cgi-bin/HyperNewsl_9_5-readonly/get/dia_l/l2/33.html
[22] Rod Sims (2000). An interactive conundrum: Constructs of interactivity and learning theory. Australian Journal of Educational Technology, 16(1), 45–57
[23] Gold, E.M. Language identification in the limit. Information and Control 10, 1967, 447–474
[24] Narendra, K.S. and Thathachar, M.A.L. Learning Automata: An Introduction. Prentice-Hall, Englewood Cliffs, NJ, 1989
[25] Kaelbling, L.P.; Littman, M.L. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research 4 (1998), 237–285
[26] ARIADNE Project, http://tina.lancs.ac.uk/computing/research/cseg/projects/ariadne/ariadne.htm
[27] Vladimir Dimitrov. Communication As Interaction In Synergy With Uncertainty. http://www.pnc.com.au/~lfell/vladimir.html
[28] Ryder, M., Wilson, B. From Center to Periphery: Shifting Agency in Complex Technical Learning Environments. American Educational Research Association, March 27, 1997; http://carbon.cudenver.edu/~mryder/coss.html
[29] Norman L. Johnson. Collective Problem Solving: Functionality Beyond the Individual. http://ishi.lang.gov/Documents1.html
[30] Norman L. Johnson. Developmental Insights into Evolving Systems: Rules of Diversity, Non-Selection, Self-Organization, Symbiosis. Los Alamos National Laboratory report LA-UR-00-844, http://
[31] Andrew Garland. Overview of Collective Memory. http://www.cs.brandeis.edu/~aeg/aipsw98/node2.html
[32] http://www.flyswat.com/
[33] C.K. Hemelrijk (1997). Cooperation without Genes, Games or Cognition. I. Harvey (Ed), Fourth European Conference on Artificial Life, pp. 511–520, Cambridge, MIT Press
[34] E.M. Bonabeau, M. Dorigo, G. Theraulaz (1999). Swarm Intelligence: From Natural to Artificial Systems. New York, Oxford University Press
Information Modelling and Knowledge Bases XIII H. Kangassalo et al. (Eds.) IOS Press, 2002
A Logic of Ontology for Object Oriented Software Components
Naoko Izumi*, Naoki Yonezaki†
* Faculty of Social Science and Computer Science, Jumonji University
† Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology
Abstract. Lesniewski's Mereology is based on the 'is a' relation and the 'part-whole' relation. A software component has a basic structure built on the 'refinement' relation and the relation between 'products and parts'. The former is a kind of 'is a' relation and the latter is a kind of 'part-whole' relation; however, it is not possible to check consistency among software object constructions using the mereology. In this paper we present axiomatic systems which formalize object construction based on the 'is a' relation and the 'has a' relation. These systems have two types of model-theoretic semantics for the 'has a' relation, i.e. 'strict semantics' and 'abstract semantics'. The axiomatic system is shown to be sound and complete for 'strict semantics' by giving a method for extracting an 'is-has' model from a set of Henkin theories.
1 Introduction
Software component structure is basically constructed from the 'refinement' relation and the relation between 'products and parts', which correspond to the 'is a' relation and the 'has a' relation respectively. Software relations concerning 'is' and 'has' are intricate. Classical mereology [11] is the theory of the 'part-whole' relation and Ontology [12][2] is the theory of the 'is-a' relation. Lesniewski amalgamated these theories and presented a theory consisting of axioms that describe the interrelationship between the 'is a' and 'part-whole' relations [1]; however, Lesniewski's Mereology is not suited to describing software component structure. First, it has no means of describing complex component structure. Second, the 'is-a' relation of Lesniewski's Mereology is too poor to express the refinement relation among software objects. Third, it is not possible to check consistency among software object constructions, because Lesniewski's Mereology does not have a rigorous semantics. Recently Clarke proposed a mereology, RCC (Region Connection Calculus) [1], [13], which is applied to geometrical information systems. RCC provides qualitative reasoning on the properties of spatial data and has a semantics based on topological structures. But again, it is not suitable for describing software component structure. In [7], we presented a theory to formalize the 'component' relation, the 'refinement' relation and the 'version' relation between software objects. It is based on Object Logic, which allows us to express concepts of object-oriented databases [3] [4] [5] [6]. We gave axiomatic systems for the refinement relation, which corresponds to the 'is a' relation of this paper. The relation of 'having parts' presented in [7] was involved in the refinement relation; however, the theory is too poor to prove component consistency.
N. Izumi and N. Yonezaki/A Logic of Ontology
Our new mereology, presented in this paper, formalizes the refinement relation and the component structure, which correspond to the 'is a' relation and the 'has a' relation on software objects respectively. We call it 'is-has theory'. With this theory we can infer whether or not software objects can be combined to produce a product. There are two views concerning the relationship between 'is' and 'has'. We present two axiomatic systems, 'strict' and 'abstract'; each has its own set of axioms and semantics. As a software model structure, we use a term which represents an object constituted of the parts of its components, so that the instance relation or class relation, and the part relation, are defined on a set of terms. The 'is a' relation represents the class hierarchy on instances and classes, and its concept is similar to the is-a concept in mereology, in which the is-a relation stands for both 'is one of' and 'is one among' [1]. The inheritance relation of object-oriented structures is derived from the axioms of the abstract system only, but the strict system is suited to describing the software configuration relation. For example, if we describe the fact that a car has tires, on the premise that both 'car tires' and 'airplane tires' are types of tires, we can interpret the fact in the following two ways:
(a) A car has tires of a certain type.
(b) A car has tires in the class of car tires.
The former is called 'abstract semantics' and the latter 'strict semantics'. We can also interpret the part relation in two different ways. If a product 'A' has a component part 'B', then
(1) There is at least one object described as 'B' which is a part of the object described as 'A'.
(2) Any object described as 'B' is a part of an object described as 'A', and any object described as 'A' has an object described as 'B' as a part of it.
In this paper we define the semantic functions M(A) of abstract semantics and M(S) of strict semantics in section 2.
Axiomatic systems for each semantics and the theorems derived from them are given in section 3. In section 4, we show that these axiomatic systems are sound. In section 5 we give the method for extracting an 'is-has' model from Henkin theories of the axiomatic system of strict semantics so that it can be shown that it is complete. In section 6 we compare is-has theory with Lesniewski's Mereology. In section 7 we show an example of deduction and conclude in section 8.
2 Is-has theory In this section we present is-has theory based on first order logic. 2.1
Syntax
In this section we introduce a formal language. Definition 1 The following sets are prepared for the definition of formulae. Oa a set of atomic object constants O a set of denumerable object constants V a set of object variables L a finite set of attribute labels (L = \J ^ La) VL a set of label variables D Definition 2 For objects a, b EOa U 6 UV, a is b, a isr 6, a /iasj 6, atom(a) and tna(a) are atomic formulae, 1 where I £ (LuVi)+. D Definition 3 Formulae are recursively defined with the atomic formulae and A, V, -«,=*• or V as in the case of first order formulae. Especially, for a formula (f and an variable v € (Fu VL), is a formula and we define 3v(y>) by ->Vv(-xp). D ll
'a isr b' means that 'a is b' or 'a equals b'.
N. Izumi and N. Yonezaki / A Logic of Ontology
In the above, parentheses () are omitted in obvious cases and (0 =>• t/>) A (tp => 0) is abbreviated to i/). Let Language Lang be (Oa U 6, {La}a Q , V U VI,,{A, V, -i,=>, V, 3},{hast \ I 6 L} U {is, atom, isr, iris}).
2. 2 Semantics In this section we will define a model based semantics. We prepare a set Oa of auxiliary object constants for constructing object terms which constitute a semantic domain of objects. Semantic domain (U, • /,• ^ lj and for ao[^i = cri[...|7n = "„]...]], i*j=*li *lj. n The restriction on labels in definition 4 assures that the length of any term in U_ is finite, because L is & finite set. Definition 5 ^0 is defined as follows. • a ~0 a
.
a
~0 ft =» a[l =T][/' = S] ^0 0(1' = 6}(l = 7]
• a ~0 /3,7 ~0 • 3v'(u' hasi v' A v' isr v)) V«V«'VuV/(-'(ii has/ v) V -i(u' is u) V 3v'(u' /iast v' A v' isr v)) VuVw'WV/Hu hasi v A Vw'(-i(u' /ias, u') V ^(v' isr v))) V -(u' is u) VuVu'WV/((u /ias, u A Vu'(-(u' /ias/ v1) V -(u' isr v))) => -(u' is u) VuVu'Vu(3/(ti /ias/ v A Vu'(-1(u/ ftasj v1) V -.(u' isr v))) => ^(u' is u))
D
Theorem 5 H0 VuVv(u is v =$ u isr v) (Proof by lemma 5, axiom 5, axiom 10 and axiom 17.) D Theorem 5 and corollary 1 are the inter-relationship between is and isr. Theorem 6 \-e VuVvVv'(((v' isr w) A (v isr v1)) =>• ((u is v • v isr u) (By axiom 13) Theorem 8 \-g VuVvViu(u is v A v is w => « is to) (By axiom 10, theorem 5)
D D
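Theorem 9 VuVvVt/((f7/ isr u) A (v isr v')) =>• ((ins(v) & tns(u)) A(atom(u) •» atom(w')))) (By axiom 8, axiom 16, axiom 22 and theorem 6) □
Theorem 10 $\forall u\, \neg(u\ \mathit{is}\ u)$
(proof)
$\forall u\,(u\ \mathit{isr}\ u)$ (by axiom 12)
$\forall u\,(u\ \mathit{isr}\ u \Rightarrow \neg(u\ \mathit{is}\ u))$ (by axiom 14)
Then $\forall u\, \neg(u\ \mathit{is}\ u)$. □
Theorem 11 $\forall u \forall u'\,(u'\ \mathit{is}\ u \Rightarrow \neg(u\ \mathit{is}\ u'))$
(proof)
$\forall u \forall u'\,(u\ \mathit{is}\ u' \Rightarrow u\ \mathit{isr}\ u')$ (by theorem 5)
Then $\forall u \forall u'\,(\neg(u\ \mathit{isr}\ u') \Rightarrow \neg(u\ \mathit{is}\ u'))$.
$\forall u \forall u'\,(u'\ \mathit{is}\ u \Rightarrow \neg(u\ \mathit{isr}\ u'))$ (by axiom 14)
Then $\forall u \forall u'\,(u'\ \mathit{is}\ u \Rightarrow \neg(u\ \mathit{is}\ u'))$. □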
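Theorems 8, 10 and 11 together say that `is` behaves like a strict partial order: it is transitive, irreflexive and asymmetric. As a rough illustration only, the properties can be checked on a small finite relation. The class names and the use of a transitive closure to build the relation are assumptions of this sketch, not part of the axiomatic system.

```python
from itertools import product


def transitive_closure(pairs):
    """Smallest transitive relation containing `pairs`."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def is_irreflexive(rel):
    return all(a != b for a, b in rel)


def is_asymmetric(rel):
    return all((b, a) not in rel for a, b in rel)


def is_transitive(rel):
    return all(
        (a, d) in rel for (a, b), (c, d) in product(rel, repeat=2) if b == c
    )


# Hypothetical direct refinements: (refined object, more general object).
DIRECT = {("car_tire", "tire"), ("airplane_tire", "tire"), ("tire", "part")}
IS = transitive_closure(DIRECT)

# isr is read as "is or equals": add the identity pairs.
OBJECTS = {x for pair in DIRECT for x in pair}
ISR = IS | {(x, x) for x in OBJECTS}

print(is_transitive(IS), is_irreflexive(IS), is_asymmetric(IS))
print(IS <= ISR)  # every `is` pair is also an `isr` pair (cf. Theorem 5)
```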
4 Soundness
In this section we enumerate lemmata for soundness theorem together with the proof of it. Lemma 6 (/ \=9 a' isr a, M(9)9(a) = A4(0)?(a0)[/i = M(6)aI(al)]...[lmi = M(9}gI(ami}}) =» 3ni3a'1..3a'n(mI < n/ A (/ (=, a( isr a,) A M(B)9I(af) =' M(9}9I(aQ}[ll = M(9)9 (ai)]...[/n/ = M0)/(a'n,)D for any i s.t. 1 < j < m/) Lemma 7 (I f=e ma(o), Ai(tf)?(a) = .M(0)?(a 0 )Ri = M(e)aI(a1)]...[lmj = / (= ins(oj) for any i s.t. 1 < z < m/ Lemma 8 ForM(0)?(o) € U and Ai(0)?(o) = .M(0)?(a0)[/i = .M(0)?(ai )]...[/„, = .M(0)?(an/)], I |=0 a /iasj. aj for any i s.t. 1 < i < n/ (Proof) ' 1. In the case that .M(#)f(a) is an instance, M(9)9i(a.i) is an instance for any i s.t. 1 < i < n/ by lemma 7. Since .M(0)j(a) is an instance, Vx(x e 5(Ai(^)f (a)) ==> x = M(tf)J(o)). A =» A Hs ¥>) ^=> (Fs U A U {-¥>} t/Foi -I =»> A u {-•¥>} \£s -L) (proof) (A |=s v ^^ A ^s V3) ^^ (A 1=5 V ^^ TS U A I-FOZ, ) (by definition 12) •$=»
(AU{-^} 1=5 ± =» F5uAu{^} I-FOI J-) (rsuAu{-^} \/FOL j_ ==> AU{-^} \£s i)
D
This lemma means that the completeness of the system can be established by showing the fact that if FS U A U {$} is consistent then there exists an is-has model which satisfies . Let FS U A be a set of sentences in language Lang d= (Obj(Obja C Obj), {Label a}a€Obja, Variables, {A, V, =», V, 3}, {hasi | / 6 Lafee/ + } U {is, atom, isr, ins}). In the following, if we assume FS U A U { (c3I^)(I)) | 3x afom(u)). There also exists an equivalence witness for axiom 17 to one. Then, it is sufficient to take witnesses of atomic object constants for axiom 17 into consideration. 2 For each sentence 3xp(x) there is a constant c such that (3x(p(x) => (3w'(w isr w' A atom(to')) A u isr w A v isr to A u tsr a A v tsr 6 A atom(a) A atom(b)) ( by axiom 17) Tm h 3to(u tsr to A v isr w A u tsr a A u tsr 6 A a£om(a) A atom(b)) =>• 3to'(« tsr w' A aiom(to') A u tsr a A v tsr to' A v tsr 6 A atom(a) A atom(b)) (by axiom 11) Tm h 3w'((u isr w1 A otom(to') A u tsr a A v tsr u;' A u tsr 6 A atom(a) A a£om(b)) => (a tsr to' A w' isr a A b isr to' A w1 isr b)) (by axiom 20) Tm I- 3to((u isr to A u tsr to A u tsr a A u tsr 6 A atom(a) A atom(b)) =>• (a tsr 6 A 6 tsr a)) (by axiom 11) When Tm r- 3to(u tsr to A v tsr u; A u tsr a A u tsr b A atom(a) A atom(6)), we can define a~6. Then ~ becomes an equivalence relation in Ob fa. For c1, let c1 be a representative of c1. Ai = {c1 I [c1] 6 (Ol U Obja)/~} where [c1] is an equivarence set. 4. When atam(b) e Tm, we define m s.t. m(6) ^ m(61) ^/ b1 = b 5. If a is in Aa then Label'a is a set of label constants of a. Any label / in Labe/* can be decomposed into some labels with the length 1 by axiom 9. For any a, ->atom(a) => 3l3b(a hasi b) A 3al(a is a1 A atom(a1)) (by axiom 16, 17) When a hasi b e Tm and |/| = 1, m/ and m are defined as follows: m,(a) < ^ / m(a 1 )[/ = m(61)]. If ->atom(b) e Tm, then Tm I- 3x3/'(6 hasp z) (By axiom 16). Then mt'(6) can be recursively defined but by axiom 15 it is finitely defined. 
a tsr a' A a' tsr a 6 Tm ==> mj(a) = mj(o') for some label / (By axiom 8). 6. Labell ^/ {/ | a hast b e Tm, |/| = 1} 7. For a such that a e Obj* and ->atom(a) 6 Tm, m(: OT -> A) is inductively defined as follows. 771.4(0) = ir^a1) for a 6 Aa AQ = mA(al) = Aa for \li\ = 1, i; e (La6e/*)+, m;,(6i) 6 A0, a hastibi 6 T 7
* >lo U ma
a € Ob*
for |/i| = 1, I'i € (La6e/*)+, m*,(6i) € An_!, a /las^bi 6 Tm
For any a in Obj*, m^ ,fc (a) is recursively defined until {/i, ..., Ik} = {fi|a /iasj; 6 6 Tm}. m x (o) *; m,* ...,fc(a) iff {llt ...,lk] = {li\a haSli b 6 Tm}.
can be recursively but finitely defined because n is finite by axiom 15 and axiom 16.
8. Ad=f {mA(a) \ a € Obj*} 9. TH d=f (Aa U A, Aa) U ((Obj* - Obj*a) -> A)) from Tm. Theorem 13 TH becomes a model, i.e. 1. Tm \~s a hasi b TH\=s a hasi b
(Proof Appendix.)
2. Tm hs a is b m\=s a is b Tm h5 a isr 6 TH\=s a isr b For is or isr, proof is omitted but it is easy to prove. For l~H\=s ais b => Tm h a is 6, it is inductively prove by the construction of (3u;(o /ias, tw A 6 isr w) V Vtu(->a /ias, «;))) (By corollary 1) Tm l~s 3w(a /ias, tw A 6 isr w) V V«;(-ia /ias, w) V a /ias, 6 (by (C) and (D)) Tm t-5 3tw(a hasi w A 6 isr w) V ofcas/6 (By (B)) Tm l~s 3tw(a /iasj w A 6 isr w;) A 3y(a /ias/ yAy isr 6) (by (B) and (E)) Tm ^5 3u;3y(o hasi w A 6 isr u; A o /iasj yAy isr 6) V a /iasj 6 Tm l-5 a hasi b (by axiom 8 and (F))
(C) (D) (E) -(F)
Theorem 13 TH\=S a has/ 6 Tm hs a /ios/ 6 1. in\=s a hasi b =>• Tm h5 a /ias/ 6 (a) When myi(o) is an instance, • in the case that 01.4(0) = a1 [I = m.a(&)], it is easy to derived by the construction of in, • in the case that m,i(b) Tm H5 6 /ws/. c. Then Tm 1-5 a /ias/ 6 A b has\ c. By axiom 9, we get Tm hs a /ias//> c. (b) When 771.4(0) is not an instance, m\=s a /ias, 6 «
Vx(x 6 5(myl(a)) => 3y(y 6 5(mx(6)) A m A (y)
3x(x e 5(mx(o)) A mA(y) 3y(y € 5(m A (fr)) A m A (y) ^^ m A (x)))) and I^Ns Vy(y e 5(mA(6)) => 3x(x 6 5(mA(o)) A m A (y) m A (x))) => Tm h5 Vy(y € 5(mA(6)) =>• 3x(x e 5(mA(o)) A m A (y) ^^ m A (x))). Then we prove that Tm hs a /ias/ 6 by lemma 12. 2. Tm hs a hasi b =*TH\=s o hasi b (proof) If |/| > 1, then it is reduced to the case where |/| = 1 by axiom 9. Now |/| = 1;
In the case of that Tm 1-5 ins(a), if Tm (-5 a hasi b then m,i(a) = mA.(al)[l = m A(&)]['i =TOA(&I)]-"[^»=TOA(&n)]for some li, ...,{„, 61,..., 6n by construction of a model TH, Then it is trivial that TH\=s o. has\ b. In the case of that Tm [-5 -i(ins(a)), - When Tm (-5 a hasi b, for all x, Tm HS x is a => Tm \~s 3y(y is b/\x hasi y) (by axiom 1). Tm t~5 x is o A ins(x) ==> Tm (-5 3y(y is b A x /iasj y A ins(x)) T"m (-5 x is a A ins(x) ==> Tm \~s 3y(y is 6 A x hasi y A ins(x) A ins(y)) (by axiom 23). There exists a witness c3y(j, is frAz ^4, j,Am»( z )Ains(y)) in A*Then, Tm 1-5 Vx(x is a A ins(x) => 3y(y is 6 A x has; y A ins(x) A ins(y)). So that J?i [=s Vx(x 6 S(mA(a)) => 3y(y € S(mA(b) A x hasiy))) - When Tm 1-5 a hasi b, for all y, Tm hg y is 6 =» Tm 1-5 3x(x is a A x hasi y) (by axiom 3). Tm t~s y is b A ins(y) ==*• Tm (-5 3x(x is a A x /ias; y A ins(y)) I'm l~5 y »s 6 A ins(y) =» Tm 1-5 3x3x'(x is a A x ftasj y A ins(y) A x' isr x A ins(x')) ( by axiom 21). Tm \-s VxVx'Vy(x hasi yAx' is xAins(y) => 3y"(x' /iasz y"Ains(y")Ay" isr y)) (by axiom 1, 13, 22, 8) Then when Tm r-$ a /iasj 6, for all y, Tm hg y is 6 =>• 3x3x'3y"(x' isr x A ins(x') A x is a A x' has\ y") So that J"H |=5 Vy(y € 5(m^(6)) => 3x(x 6 5(m A (a) A x hasi y))). - Hence, Tm (-5 a /msj 6 =>Z'H|=5 a hasi b.
Information Modelling and Knowledge Bases XIII H. Kangassalo et al. (Eds.) IOS Press, 2002
Provability of Relevant Logic ER
Noriaki YOSHIURA¹ and Naoki YONEZAKI²
¹ Gunma University, 1-5-1 Tenjin-cho, Kiryu City, Gunma, Japan
[email protected]
² Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo, Japan
[email protected]
Abstract. Relevant logic has been studied from the viewpoints of philosophy and artificial intelligence. The aim of relevant logic is to remove the fallacies of implication from classical logic. Many relevant logics have been proposed; however, these logics are weak as a side effect of removing the fallacies. In fact, some theorems of classical logic that do not involve the fallacies of implication cannot be proved in relevant logic. Thus, it is an interesting problem to construct a new relevant logic in which the fallacies of implication are removed and which is not so weak. In earlier work, we proposed the relevant logic ER and proved that it has two properties: first, the fallacies of implication are removed from ER as much as from the typical relevant logic R; second, ER is decidable. However, we have not yet discussed the provability of ER. In this paper, we prove that ER is strictly stronger than R. As a result, ER is shown to be a relevant logic in which the fallacies of implication are removed and which is not so weak. We also discuss several properties of ER in this paper.
1 Introduction
Formalization of knowledge reasoning is one of the main issues of philosophy and artificial intelligence. Classical logic, however, is not sufficient for this purpose, because the meaning of implication in classical logic differs from that in daily speech [2]. For example, for an arbitrary formula A, $A \to B$ can be inferred from B in classical logic, even if there is no relation between A and B. From the viewpoint of daily speech, however, some relation is necessary for the inference of $A \to B$. In classical logic "Snow is black implies 3 + 5 = 8" can be inferred from "3 + 5 = 8", although this seems strange from the point of view of daily speech. The fallacies of implication are mainly classified into three kinds: fallacies of relevance, validity and necessity (modality) [2, 7]. The study of relevant logic was initiated by Orlov and continued deeply by Lewis [5]. Church [4] and Moh [6] independently proposed the implicational fragment system $R_\to$. The fallacies of relevance and validity are removed from $R_\to$. Ackermann proposed entailment [1], from which all of the above fallacies are removed. Anderson and Belnap proposed the relevant logic R, which includes $R_\to$, and the relevant logic E, which includes entailment [2]. Many other logics have
N. Yoshiura and N. Yonezaki / Provability of Relevant Logic ER
been proposed. Since the fallacies of relevance and validity are considered to be strong fallacies, they are removed from almost all relevant logics. R is a typical relevant logic excluding the fallacies of relevance and validity [2]. Variable-sharing was introduced as a necessary condition of relevant logic [2]. Variable-sharing means that if $A \to B$ is a theorem, then A and B share at least one atomic proposition. Almost all relevant logics, including R, satisfy this condition. As a method of formalization of knowledge reasoning, relevant logic is more suitable than classical logic in the sense that the fallacies of implication are removed [3]. Removing the fallacies, however, makes relevant logics too weak with respect to provability. For example, $((\neg A \lor B) \land A) \to B$ is not a theorem of R, that is, B cannot be inferred from $\neg A \lor B$ and A in R. This formula, however, does not include the fallacies of implication. Thus, it is an interesting problem to construct a new relevant logic in which the fallacies of implication are removed and which is not so weak. In [10], we proposed the relevant logic ER and proved that ER has two properties: first, the fallacies of implication are removed from ER as much as from the typical relevant logic R; second, ER is decidable. ER is a sequent style natural deduction system. By attaching attributes to formulas and using these attributes to restrict the applicability of inference rules, the fallacies of implication are removed from ER. Since ER uses attributes of formulas in its inference rules, ER is a kind of labeled deduction system [9]. In [10], we also showed that several formulas such as $((\neg A \lor B) \land A) \to B$ can be inferred in ER but not in R. In this paper, we prove that ER is strictly stronger than R. This result and the previous results in [10] show that ER is a relevant logic in which the fallacies of implication are removed and which is not so weak.
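The contrast drawn above between classical validity and variable-sharing can be checked mechanically for small formulas. The sketch below is an illustration under assumptions of its own: formulas are encoded as nested tuples, and the helper names are hypothetical. It tests classical tautologyhood by truth tables and variable-sharing by comparing sets of atoms. Variable-sharing is only a necessary condition, so passing the check does not make a formula a theorem of R or ER.

```python
from itertools import product


def atoms(f):
    """Set of atomic propositions occurring in formula f."""
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(sub) for sub in f[1:]))


def evaluate(f, env):
    """Classical truth value of f under the assignment env."""
    if isinstance(f, str):
        return env[f]
    op = f[0]
    if op == "not":
        return not evaluate(f[1], env)
    left, right = evaluate(f[1], env), evaluate(f[2], env)
    if op == "and":
        return left and right
    if op == "or":
        return left or right
    if op == "imp":
        return (not left) or right
    raise ValueError(f"unknown connective: {op}")


def is_classical_tautology(f):
    names = sorted(atoms(f))
    return all(
        evaluate(f, dict(zip(names, values)))
        for values in product([False, True], repeat=len(names))
    )


def shares_variable(implication):
    """Variable-sharing test for a formula of the form A -> B."""
    op, antecedent, consequent = implication
    if op != "imp":
        raise ValueError("expected an implication")
    return bool(atoms(antecedent) & atoms(consequent))


# ((not A or B) and A) -> B: classically valid, and it shares variables.
disjunctive_syllogism = ("imp", ("and", ("or", ("not", "A"), "B"), "A"), "B")
# A -> (B or not B): classically valid, but antecedent and consequent
# have no atom in common.
validity_fallacy = ("imp", "A", ("or", "B", ("not", "B")))

for f in (disjunctive_syllogism, validity_fallacy):
    print(is_classical_tautology(f), shares_variable(f))
```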
It follows that ER is one of the logics suitable for the formalization of knowledge reasoning. Since ER is a labeled deduction system, it is not obvious that some logical properties hold in ER. We also discuss some properties, such as consistency, in this paper. Although these properties include some negative ones, it is important to make the properties of ER clear. This paper is organized as follows. Section 2 recalls ER. Section 3 proves that ER is strictly stronger than R. Section 4 discusses some properties of ER. Section 5 concludes this paper.
2 Relevant Logic ER
This section defines the relevant logic ER. As shown in Fig. 1, ER is defined as a sequent style natural deduction system. The difference from usual natural deduction is that a formula has an attribute value. Attribute values show how a formula is inferred and which kind of rule can be applied to the formula. By using attribute values, we restrict the applicability of inference rules in order to remove the fallacies of implication from ER.
Definition 2.1 (Formula) An atomic proposition is a formula. If A and B are formulas, $\neg A$, $A \land B$, $A \lor B$ and $A \to B$ are formulas.
Definition 2.2 (Attribute values of formula) We define e, i and r as attribute values of formulas. The attribute value e indicates that a formula with this attribute value can be the major premise of elimination rules. The attribute value i indicates that a formula with this attribute value cannot be the major premise of elimination rules. The
[Figure 1: Inference rules of ER, including Axiom, RAA, ∨I1, ∨I2 and ∨E2.]
: . . . ($B_n \to A$) . . .) is not a theorem in ER.
Theorem 2.3 (Removal of fallacy of validity) Suppose that A and B are different atomic propositions. $A \to (B \lor \neg B)$ and $(A \land \neg A) \to B$ are not theorems in ER.
In contrast to most relevant logics, the following has been shown for ER [10].
Theorem 2.4 (Decidability of ER) ER is decidable.
3 Comparison of ER and R
This section proves that ER is strictly stronger than R. In this proof, we use FR, a Fitch style natural deduction system of R [2]. The proof is difficult because of the difference between the system styles of ER and FR: ER is a sequent style natural deduction system while FR is a Fitch style one. In this section, we use the auxiliary logics FR, FR' and ER' to prove that ER is strictly stronger than R. FR' and ER' are sequent style natural deduction systems. The reason why we use FR' and ER' is that it is difficult to compare FR and ER directly. The outline of the proof is as follows. First, we prove that FR' is stronger than FR. Next, we prove that all theorems of FR' can be inferred by a normalized proof of FR'. Further, we prove that ER' is weaker than ER and that ER' is stronger than FR' by showing the correspondence between proofs of ER' and normalized proofs of FR'. In addition, we show that several formulas can be inferred in ER but not in R. As a result, we obtain that ER is strictly stronger than R.
3.1 System FR
This subsection considers FR [2]. FR is a Fitch style natural deduction system in which a set of natural numbers is attached to each formula used in a proof. This set decides the applicability of inference rules, and we can prevent the fallacies of implication by using it. The set of natural numbers represents the hypotheses which are used for the inference of the formula carrying it. In the following, $a \uplus b$ is the union of sets $a$ and $b$ such that $a \cap b = \emptyset$.
Definition 3.1 Suppose that A is a formula and x is a set of natural numbers. Then $A_x$ stands for a rank formula.
Definition 3.2 A proof of FR is a finite sequence $F_1, F_2, \ldots, F_n$ of rank formulas such that $F_i$ ($1 \le i \le n$) is inferred by one of the following rules with some of $F_1, F_2, \ldots, F_{i-1}$ as premises. A natural number called the "rank" is attached to each $F_i$ ($1 \le i \le n$).
Inference rule The rank of the premises of an inference rule must be the same as that of the previous rank formula of the conclusion formula. In a proof $F_1, F_2, \ldots, F_n$, we call $F_{i-1}$ the previous rank formula of $F_i$.
Hyp: $A_{\{k\}}$ may be introduced as a hypothesis. If the formula is at the top of the proof, its rank is 1. Otherwise, the rank is obtained by incrementing the rank of the previous formula.
Rep: If Aa occurs in the proof, Aa may be introduced. The rank of the introduced formula is the same as that of the immediately previous formula. -> I: From Ba&{k} to infer A—*Ba, where k is the same as the rank of BaW{k} and A{k} is the nearest formula introduced by the rule Hyp. The rank of A —> Ba is that of Bam{k} minus 1. A{k} is called discharged formula. > E: From Aa and A —> Bb to infer Baub, whose rank is the same as that Aa. ->I: From A —> ->Aa to infer ~>Aa, whose rank is the same as that of A —> ->Aa. ->E: From ~Ba\jb, whose rank is the same as that of ->Aa->->/: From ->->Aa to infer Aa, whose rank is the same as that of->->Aa. i->E: From Aa to infer ->-^Aa, whose rank is the same as Aa. A/: From Aa and Ba to infer A/\Ba, whose rank is the same as that of A A Ba. /\E: From A/\Ba
to infer Aa or Ba, whose rank is the same as that of A A Ba.
VI: From Aa or Ba to infer A V Ba, whose rank is the same as that of Aa or Ba. VE: From A V Ba, A —> Cb, and B —> Cb, to infer CaUb, whose rank is the same as that of A->Ca or B -*Ca. AV: From A A (B V C)a to infer (A A B) V Ca, whose rank is the same as that of A A(B v C)a. Definition 3.3 (Theorem of FR) A is a theorem of FR if and only if A0 can be inferred. 3.2 System FR' This subsection defines a sequent style natural deduction system FR'. FR' is a complicated system, especially some inference rules of FR' such as P, VE2 or V/3 are non-standard. This is because we require that all of theorems in FR' can be inferred in a normalized proof and that a proof of FR' corresponds to that of ER' to be defined later. Sequent of FR' consists of rank formulas and package formulas. Definition 3.4 (Package formula) We define a package formula recursively as follows. • If 7» ^i and 62 are multisets of rank formulas, then (j\Si\82) is a package formula. • If j, 6\ and 63 are multisets of rank formulas and package formulas, then is a package formula. Definition 3.5 (Rank set) We define rank set as follows. 1. The rank set of a rank formula Ax is x. 2. The rank set of a package formula (yltiifa) is the rank set 0/7.
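The role of the index sets in the rules Hyp, →I and →E of FR can be illustrated with a small sketch. It is a simplification under stated assumptions: the rank (nesting depth) bookkeeping of FR is ignored, only the implicational rules are modelled, and the class and function names are hypothetical. The point it shows is that →I may discharge a hypothesis only if the index of that hypothesis occurs in the index set of the formula being generalized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankFormula:
    formula: object       # an atom name (str) or a tuple ("imp", A, B)
    indices: frozenset    # indices of the hypotheses actually used


def hyp(formula, k):
    """Hyp: introduce A with index set {k}."""
    return RankFormula(formula, frozenset({k}))


def imp_elim(minor, major):
    """->E: from A_a and (A -> B)_b infer B with index set a union b."""
    f = major.formula
    if not (isinstance(f, tuple) and f[0] == "imp" and f[1] == minor.formula):
        raise ValueError("major premise must be an implication from the minor")
    return RankFormula(f[2], minor.indices | major.indices)


def imp_intro(hypothesis, body):
    """->I: discharge hypothesis A_{k}; k must occur in the indices of B."""
    (k,) = hypothesis.indices
    if k not in body.indices:
        raise ValueError("hypothesis was not used in deriving the body")
    return RankFormula(
        ("imp", hypothesis.formula, body.formula), body.indices - {k}
    )


# A -> A: the hypothesis is used, so the result has an empty index set.
a = hyp("A", 1)
print(imp_intro(a, a))

# B -> (A -> B): discharging A fails because index 2 is not in {1}.
b, a2 = hyp("B", 1), hyp("A", 2)
try:
    imp_intro(a2, b)
except ValueError as err:
    print("rejected:", err)
```

A formula counts as a theorem in this sketch only when it is derived with an empty index set, mirroring Definition 3.3.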
[Figure: Inference rules of FR', including Axiom, RAA and C.]
3.3 Normalization theorem of FR'
This subsection proves that every proof of FR' can be transformed into a normalized proof. Since sets of natural numbers and some special inference rules are used in FR', it is difficult to normalize a proof of FR' by the normalization methods used in classical logic. In addition, we require that a normalized proof of FR' should correspond to a proof of ER. Thus, the normalization procedure proposed in this section consists of two conversion procedures. The first one is a conversion for RAA of FR', because the restriction on using RAA in ER is unique and because a normalized proof of FR' should correspond to a proof of ER. The second one is the usual normalization conversion given in [8]. We begin with some definitions needed to prove the normalization theorem of FR'.
Definition 3.9 (Cut) A cut of a proof of FR' is defined to be an occurrence of a finite sequence of sequents $F_1, F_2, \ldots, F_n$ ($n \ge 1$) satisfying the following conditions.
1. $F_1$ is a conclusion of an I-rule¹.
2. $F_i$ ($2 \le i \le n$) is a conclusion of the inference rule C with premise $F_{i-1}$.
3. $F_n$ is a major premise² of an E-rule³.
¹ An I-rule of FR' is ∧I, ∨I1, ∨I2, ∨I3, →I or ¬I.
The length of a cut $F_1, \ldots, F_n$ is n. By definition, the right side formulas of the members of a cut are the same formula. The complexity of a cut is defined to be the complexity⁴ of the right side formula of the members of the cut.
Definition 3.10 (Normalized proof) A proof is a normalized proof if and only if it satisfies the following.
• Formulas discharged⁵ by RAA are only negative literals⁶.
• No cut exists.
In the following, we prove that every theorem can be inferred by a normalized proof.
Lemma 3.2 If $\vdash B_\emptyset$ can be proved from $\gamma, \delta_1, \delta_2 \vdash A_a$, where $\delta_1$ and $\delta_2$ have the same rank set, then $\vdash B_\emptyset$ can be proved from $\gamma, \delta_1 \vdash A_a$.
Proof of Lemma 3.2: We prove this lemma by induction on the number of applications of ∨E1, ∨E2 and ∨I3 between $\gamma, \delta_1, \delta_2 \vdash A_a$ and $\vdash B_\emptyset$.
[Proof diagrams: Proof A and Proof B, each with end sequent $\vdash B_\emptyset$.]
Suppose that VE1, VE2 and V/3 are not used between 7,^1,^2 ^~ Aa and h B0. Since only the inference rule C can remove the elements of E, vE; or ->E. 4 The complexity of a formula A is the number of occurrences of atomic propositions and connectives in A. 5 A formula discharged by RAA is -iAa in RAA. 6 A negative literal is of the form ->A where A is an atomic proposition.
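The notions of formula complexity and negative literal used in the footnotes above are simple enough to state as code. This is only an illustrative sketch: the nested-tuple encoding of formulas and the function names are assumptions, not notation from the paper.

```python
def complexity(f):
    """Number of occurrences of atomic propositions and connectives in f.

    Atoms are strings; compound formulas are tuples such as
    ("not", A), ("and", A, B), ("or", A, B) or ("imp", A, B).
    """
    if isinstance(f, str):
        return 1
    return 1 + sum(complexity(sub) for sub in f[1:])


def is_negative_literal(f):
    """True if f has the form not-A with A an atomic proposition."""
    return (
        isinstance(f, tuple)
        and len(f) == 2
        and f[0] == "not"
        and isinstance(f[1], str)
    )


# ((not A or B) and A) -> B has 4 atom occurrences and 4 connectives.
example = ("imp", ("and", ("or", ("not", "A"), "B"), "A"), "B")
print(complexity(example))
print(is_negative_literal(("not", "A")))
print(is_negative_literal(("not", ("not", "A"))))
```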
Proof of Lemma 3.3: Let x be the maximum complexity of cut in a proof, y the sum of the lengths of all maximum complexity cuts, s the maximum complexity of formulas discharged by RAA, and t the number of maximum complexity formulas discharged by RAA. C is defined to be (x + s) • u + (y +t), where u> is a cardinal number. We repeatedly apply the following conversion to a proof of a theorem of FR' if possible. First select the cut or discharged formula that has the highest complexity. If a cut is selected, convert it by using rules in Fig.3. By applying Conversion rule 1 or 4, the end sequent of cut changes. For example, by applying Conversion rule 4, an end sequent (f>, ,6i haut or 0,82 I~0u6- Lemma 3.2, however, guarantees that there exists a proof with the same conclusion. If a formula A discharged by RAA is selected, then convert the proof as follows: 1. A is of the form - < CONCEPTUAL_ANNOTATION> Template2.31 Exist hospital_l begin 02/06/1997
Figure 4. Final results of the application of hypothesis h1 of Table 6.
The values associated with the variables z and v in the first condition schema will then be used to create the search patterns derived from the second condition schema. It will then be possible to retrieve an occurrence corresponding to the information: "In the framework of the agreement previously mentioned, Pharmacopeia has actually produced the new
G.P. Zarri / Conceptual Modelling and Knowledge Management
compound", see again Figure 4. The global information retrieved through the execution of the hypothesis can then supply a sort of 'plausible explanation' of Schering's payment: Pharmacopeia and Schering have concluded some agreements for the production of a given compound, and this compound has effectively been produced by Pharmacopeia. We must note that, obviously, the example developed here has only a didactic purpose, and that this particular hypothesis is certainly not the only one that we could test to explain the case examined.
7. Conclusion
Sticking strictly to the domain concerning the representation (and processing) of narratives in computer-understandable form, we can now examine some 'other' possible solutions that, explicitly or implicitly, have been proposed for this task. Because of space constraints, we will limit ourselves to the 'pure ontology' approach; for some remarks about conceptual graphs see, e.g., [14]. For simplicity's sake, we will assume here that the concepts of an ontology are defined as frames, i.e., represented according to two basic principles. The first is a hierarchical one, and it is materialised by the IsA link: it relates the concept to be defined to all the other concepts of the ontology through the 'generic' (or 'subsumes'), 'specific' (or 'is-subsumed') and 'disjoint' relationships. The second is a relational principle and, via the 'attribute (property)-value' mechanism, relates the concept to be defined to some of the other concepts already present in the ontology. We will not dwell here on secondary aspects of the structure of a frame like facets (used to describe properties of slots) and axioms (used to specify additional constraints). It is now evident that an organisation in terms of frames (or an equivalent one) is largely sufficient to provide a static definition of the concepts, i.e., a definition a priori of each concept considered in itself. We can, on the contrary, wonder whether this sort of organisation can be sufficient to define the dynamic behaviour of the concepts, i.e., to describe the mutual relationships affecting a posteriori the concepts and their instances when they take part in some concrete action, situation etc. ('events'). If we want to represent a narrative fragment like "NMTV (a European media company) ... will develop a lap top computer system...", asserting that nmtv_ is an instance of the concept company_ and that we must introduce an instance of a concept like lap_top_pc will not be sufficient.
We must, in this case, have recourse to a more complex way of structuring the concepts that, as in NKRL, also includes a 'predicate' and the associated 'roles', the temporal co-ordinates, etc. In the literature, we sometimes find descriptions of frame-based systems that try to extend the attribute-value mechanism to produce some representations of 'events' in the NKRL sense. To code some simple sell/purchase events, in fact, it is possible to add, in the frame for, e.g., company_, slots in the style of HasAcquired or AcquiredBy or, better, it is possible to define a new concept like company_acquisition with slots like NameOfTheCompany, Buyer, DateOfAcquisition, Price etc. In this way, the instances of company_acquisition could be sufficient to describe in a complete way a (simple) sell/purchase event for a company.
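The frame-based encoding just described can be sketched in a few lines. This is a hedged illustration, not an implementation of any particular frame system: the `Concept` and `Instance` classes, the `media_company` sub-concept and all slot fillers are assumptions, while company_, nmtv_, company_acquisition and the slot names are taken from the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """A frame: a name, an optional IsA parent and a tuple of slot names."""
    name: str
    is_a: Optional["Concept"] = None
    slots: tuple = ()

    def all_slots(self):
        # Slots are inherited along the IsA link.
        inherited = self.is_a.all_slots() if self.is_a else ()
        return inherited + self.slots


@dataclass
class Instance:
    name: str
    concept: Concept
    values: dict = field(default_factory=dict)

    def __post_init__(self):
        unknown = set(self.values) - set(self.concept.all_slots())
        if unknown:
            raise KeyError(f"slots not declared: {sorted(unknown)}")


company_ = Concept("company_", slots=("HasAcquired", "AcquiredBy"))
media_company = Concept("media_company", is_a=company_)  # hypothetical
company_acquisition = Concept(
    "company_acquisition",
    slots=("NameOfTheCompany", "Buyer", "DateOfAcquisition", "Price"),
)

nmtv_ = Instance("nmtv_", media_company)
# Hypothetical fillers; nothing here records why the sale took place.
sale = Instance(
    "acquisition_1",
    company_acquisition,
    {"NameOfTheCompany": "company_y", "Buyer": "company_z"},
)
print(media_company.all_slots())
print(sale.values)
```

The instance records who bought what, but it has no place for the reason or goal of the transaction, which is the limitation discussed next.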
The limits of this approach are however evident. Restricting the description of sell/purchase events to the sole relationships between the buyer, the seller and the 'object' exchanged, with some additional information about, e.g., the date of the transaction and the price is, normally, only a very rough approximation of the original event, and a lot of useful information is lost. It is very likely, in fact, that the original information about a company's sale was something in the style of: "Company X has sold its subsidiary Y to Z because the profits of Y had fallen dangerously these last years due to a lack of investments" or, returning to a previous example, "NMTV will develop a lap top computer system to put controlled circulation magazines out of business" or, to take a last example from CONCERTO's biotechnology domain, "Berlex made a milestone payment to Pharmacopeia because they decided to pursue an in vivo evaluation of the candidate compound identified by Pharmacopeia". In Computational Linguistics terms, we are here in the domain of 'Discourse Analysis' that deals, in short, with the two following problems: i) determining the nature of the information that, in a sequence of statements, goes beyond the simple addition of the information conveyed by a single statement; ii) determining the influence of the context in which a statement is used on the meaning of this individual statement, or part of it. Being able to supply an even approximate representation of the 'meaning' of real events then implies being able to deal with all sorts of 'connectivity phenomena' like causality, goal, indirect speech, co-ordination and subordination etc., i.e., precisely the phenomena taken into account by the NKRL second order 'binding' structures.
It is now easy to imagine, on the contrary, the awkward proliferation of totally ad-hoc slots that, sticking to the attribute-value paradigm, it would be necessary to introduce in order to approximate the connectivity phenomena in the above examples. This is, moreover, in contradiction with the most recent tendencies of the theory of frame systems, which postulate the creation a priori of systems of slots defined independently of any specific frame, see the Open Knowledge Base Connectivity (OKBC) protocol [5]. Trying to reduce the description of events to the description of concepts is then nothing but a further manifestation of the 'uniqueness syndrome' that affects some of the Artificial Intelligence and Knowledge Representation milieus. In NKRL, we make use in an integrated way of several sorts of representational principles, and several years of successful experimentation with the most different narrative situations are there to testify that this seems to be a reasonable approach.
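The difference between slot-based descriptions and second order 'binding' structures can be suggested by a toy data model. The sketch below is not NKRL syntax: the `Event` and `Binding` classes, the predicate and role names, and the operator label "GOAL" are all hypothetical choices made for illustration. It only shows the idea that a connectivity phenomenon is recorded as a relation between whole events rather than as one more slot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    label: str
    predicate: str
    roles: tuple  # (role, filler) pairs


@dataclass(frozen=True)
class Binding:
    operator: str  # hypothetical labels such as "GOAL" or "CAUSE"
    source: str    # label of the event being explained
    target: str    # label of the event giving the goal or cause


def related(bindings, label, operator):
    """Labels of the events linked to `label` by `operator`."""
    return [
        b.target
        for b in bindings
        if b.source == label and b.operator == operator
    ]


# "NMTV will develop a lap top computer system to put controlled
# circulation magazines out of business", as two events plus a binding.
e1 = Event("e1", "develop", (("subject", "nmtv_"), ("object", "lap_top_pc")))
e2 = Event(
    "e2",
    "put_out_of_business",
    (("subject", "nmtv_"), ("object", "controlled_circulation_magazines")),
)
bindings = [Binding("GOAL", "e1", "e2")]

print(related(bindings, "e1", "GOAL"))
```

Adding a new kind of connection in this model means adding a binding, not a new ad-hoc slot on every frame that might need it.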
References
[1] Zarri, G.P., NKRL, a Knowledge Representation Tool for Encoding the 'Meaning' of Complex Narrative Texts, Natural Language Engineering - Special Issue on Knowledge Representation for Natural Language Processing in Implemented Systems, 3 (1997) 231-253.
[2] Zarri, G.P., Representation of Temporal Knowledge in Events: The Formalism, and Its Potential for Legal Narratives, Information & Communications Technology Law - Special Issue on Models of Time, Action, and Situations, 7 (1998) 213-241.
[3] Zarri, G.P., et al., CONCERTO, An Environment for the 'Intelligent' Indexing, Querying and Retrieval of Digital Documents. In: Foundations of Intelligent Systems - Proceedings of 11th International Symposium on Methodologies for Intelligent Systems, ISMIS'99. Springer-Verlag, Berlin, 1999.
[4] Fensel, D., et al., OIL in a Nutshell. In: Knowledge Acquisition, Modeling, and Management - Proceedings of the European Knowledge Acquisition Conference, EKAW'2000, Dieng, R., et al., eds. Springer-Verlag, Berlin, 2000.
[5] Chaudhri, V.K., et al., Open Knowledge Base Connectivity 2.0.3 - Proposed. SRI International, Menlo Park (CA), 1998 (http://www.ai.sri.com/~okbc/spec/okbc2.html).
[6] Buchheit, M., Donini, F.M., and Schaerf, A., Decidable Reasoning in Terminological Knowledge Representation Systems, Journal of Artificial Intelligence Research, 1 (1993) 109-138.
[7] Sowa, J.F., Knowledge Representation: Logical, Philosophical, and Computational Foundations. Brooks Cole Publishing Co., Pacific Grove (CA), 1999.
[8] Zarri, G.P., and Gilardoni, Structuring and Retrieval of the Complex Predicate Arguments Proper to the NKRL Conceptual Language. In: Foundations of Intelligent Systems - Proceedings of 9th International Symposium on Methodologies for Intelligent Systems, ISMIS'96. Springer-Verlag, Berlin, 1996.
[9] Franconi, E., A Treatment of Plurals and Plural Quantifications Based on a Theory of Collections, Minds and Machines, 3 (1993) 453-474.
[10] Lassila, O., Swick, R.R., eds., Resource Description Framework (RDF) Model and Syntax Specification. W3C, 1999 (http://www.w3.org/TR/REC-rdf-syntax/).
[11] Brickley, D., Guha, R.V., eds., Resource Description Framework (RDF) Schema Specification. W3C, 1999 (http://www.w3.org/TR/WD-rdf-schema/).
[12] Zarri, G.P., A Conceptual Model for Capturing and Reusing Knowledge in Business-Oriented Domains. In: Industrial Knowledge Management: A Micro-level Approach, Roy, R., ed. Springer-Verlag, London, 2000.
[13] Zarri, G.P., and Azzam, S., Building up and Making Use of Corporate Knowledge Repositories. In: Knowledge Acquisition, Modeling and Management - Proceedings of the European Knowledge Acquisition Workshop, EKAW'97. Springer-Verlag, Berlin, 1997.
[14] Lukose, D., et al., Conceptual Structures for Knowledge Engineering and Knowledge Modelling. In: Supplementary Proceedings of the 3rd International Conference on Conceptual Structures: Applications, Implementation and Theory. Department of Computer and Information Sciences of the University of California, Santa Cruz (CA), 1995.
Information Modelling and Knowledge Bases XIII H. Kangassalo et al. (Eds.) IOS Press, 2002
Time in Modeling
A. T. BERZTISS
Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15260, USA (e-mail: [email protected]; fax: +412-624-8854) and SYSLAB, University of Stockholm
Abstract. We clarify some of the terminology of temporal modeling, in particular with regard to real time and time intervals. A notation for temporal quantities is introduced that indicates the resolution of the clock to be used. It is shown that temporal expressions can have multiple interpretations, and we introduce constructs that allow a different representation to be developed for each such interpretation. These constructs are provided with semantics in terms of time Petri nets. Our primary purpose is to present a notation for the modeling of real-time applications that is in principle executable. The uniting theme of our work is an interest in uncertainty aspects of temporal modeling.
1. Introduction
We are much concerned with time, but the concept is difficult to understand. This has resulted in philosophers giving considerable attention to time. For example, Harris [1] distinguishes between physical time, biological time, psychological time, and historical time. In the context of information modeling historical time appears to be of greatest value, but as a version that considers events in the future as well as those in the past. Actually, here we will not be interested in the time concept itself. Instead we shall be concerned with artifacts known as clocks, which present us with readings that we interpret as time measurements. In general, our clock readings are assumed to be composed of two parts, a date and a time-of-day, but often we will be discussing the two components separately. Section 2 deals with measurements of time, and with concepts that depend on such measurements. We refer to their values as temporal quantities. As the so-called Y2K problem has shown, representations of clock readings have to be very carefully selected. One such representation is introduced in Section 3. The novel aspect of this notation is that it explicitly indicates the required resolution of clocks, allowing for different granularities in different applications. Section 4 addresses problems relating to resolution and granularity. In Section 5 we consider operations with temporal quantities. In Sections 2 through 5 our main contribution is an investigation of ways of accounting for uncertainties in temporal modeling. The main new results are presented in Sections 6 through 8. Section 6 deals with constructs for temporal modeling. We consider an expression, which, while very simple, can be interpreted in six different ways. The need to find expressions for all six of these
expressions has led to the introduction of the constructs. Section 7 introduces the basic concepts of a modeling language, with special attention to components of the system that relate to time. In Section 8 semantics for these time-related components are provided in terms of slightly modified time Petri nets. Section 9 consists of a few examples. Section 10 summarizes our work. Throughout the presentation we address primarily the needs of software engineers developing information systems. We shall therefore not be concerned with some of the finer distinctions of mainly academic interest. For example, if a specific time point that lies on the boundary between two time intervals could belong to either of the intervals or to both, we do not consider this to have major practical significance. In most applications it does not matter whether a clock shows 09:01 or 09:02. Moreover, when it does matter, our notation does allow a boundary point to be placed precisely where needed.
2. Times, intervals, durations
There is no general agreement on the meaning of time. While Newton considered time to be independent of events, Leibnitz found time to be formed by events and relations between them. As regards the modeling of information systems, we do not have to be concerned with deep philosophical interpretations. We can talk of the passing of time, of past, present, and the future, and be reasonably certain that our listeners will intuitively understand these terms in the same way as we do. As already noted, we shall not be concerned with time as such, but with clocks. Now, a clock can be considered equally well as measuring time without regard for events or as constituting a sequence of events, e.g., the oscillations of an atomic clock. We consider a clock reading as composed of a date and a time-of-day. Clock readings are non-repetitive values. They are linearly ordered, i.e., they form a transitive relation. There is a current reading, whose value increases uniformly. The past is defined by times smaller than the current reading; the future by times greater than the current reading. Because of the increase in the value of the current reading the boundary between past and future is constantly shifting. Time in this sense can be represented by a time axis along which the current time point keeps moving. Clock readings are points on this axis. For simplicity, when there is no danger of confusion, in what follows we shall refer to clock readings simply as times. There have been suggestions that the interval should be the basic time concept, notably by Allen [2]. Allen does not define an interval by time points, i.e., by our clock readings, noting that time points can create logical difficulties. For example, given intervals (t1, t2) and (t2, t3), to which interval does the time point t2 belong?
Instead, an interval is an undefined basic concept whose meaning derives from the relations in which it stands to other intervals, relations such as "overlaps", "contains", "comes before". These relations are qualitative, which makes Allen's approach useful for natural language processing and in dealing with imprecise information. We, however, are assuming that clocks are available, and that events are to be timed by the clocks. Then an interval is defined by the expression (t1, t2), where t1 and t2 denote, respectively, the start and end of the interval. There is a difference between an interval and a duration. For example, when somebody works on March 12 from 9:00 till 16:00, we are talking of an interval of seven hours that starts at nine o'clock on this date. However, when somebody is said to have worked for 42 hours in a particular week, this is a duration. The distinguishing feature of an interval is that it is represented by a line segment on a discrete time axis. If t2 < t1, then t2 - t1 is negative. Negative intervals do not exist, but negative values of subtractions of times are
sometimes useful. An instance arises in Section 6. Durations, in contrast to intervals, are not associated with time points, i.e., they cannot be represented on a time axis, but the same metric is used for both intervals and durations. In computer applications the term real time is often used. It has been noted that there are misconceptions in the interpretation of this term [3, 4], which are primarily related to properties of processes, such as safety and liveness: safety ensures that nothing bad happens, e.g., no deadlock arises; liveness ensures that something good eventually happens. We consider real time to be of two kinds.
Relative real time. Given input x at time t1, output f(x) is to be generated within (t2, t3), where t1 ≤ t2 ≤ t3. From this we can derive several variants. (a) t1 = t2: Output f(x) may be generated at once, but must be generated before t3. (b) t2 = t3: Output f(x) is to be generated at precisely time t2. For example, in automated package routing, a bar-code reader identifies a package at time t1, and a gate is opened to receive this specific package at time t2, but the time of closing of the gate is not specified. (c) t3 > t2: This is the general case, where, in terms of the package routing example, (t2, t3) denotes the time during which the gate is to remain open.
Absolute real time. An event is to take place within (t1, t2), where t1 ...
... 11am
3. Postman => comes_at => 11am
2. Postman => comes_at => 11am
1. Postman => comes_at => 11am
The generalization is based on matching absolute proplets with episodic ones. This type of matching is called complete matching because a whole episodic proposition is matched by a corresponding absolute one. Thereby, not only the concepts but also the feature structures of the proplets involved must be compatible. One function of absolute propositions is episodic inferencing, i.e. inferencing over episodic propositions using absolute propositions. Consider the following example:
R. Hausser / Autonomous Control Structure
6.4 EPISODIC INFERENCE
2. [abs: bread] => [abs: is] => [abs: food]
        ⇑                            ⇓
1. [epi: John] => [epi: has] => [epi: bread]
3. [epi: John] => [epi: has] => [epi: food]
Episodic proposition 1 is the premise, absolute proposition 2 serves as a rule of inference,13 and episodic proposition 3 is the consequent. The inference is based on matching the M-concept bread of the absolute proposition 2 with the corresponding I-concept_loc of the episodic proposition 1. Using the absolute proposition 2 as a kind of bridge, the episodic proposition 3 is derived as the conclusion. A second function of absolute propositions is absolute inferencing, i.e. inferencing over absolute propositions. Consider the following example:
6.5 ABSOLUTE INFERENCE
2. [abs: drink] => [abs: is] => [abs: liquid]
        ⇑                            ⇓
1. [abs: milk] => [abs: is] => [abs: drink]
3. [abs: milk] => [abs: is] => [abs: liquid]
The matching in episodic as well as absolute inferencing is called partial matching because only part of the premise is matched by the inference proposition and only part of inference proposition is matched by the consequent. Thereby, the compatibility between the proplets involved is limited to their respective concepts.
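The two inference patterns can be illustrated with a minimal Python sketch. It is not the LA-grammar formalization referred to in the text: propositions are flattened to four fields, the names `Proposition`, `episodic_inference` and `absolute_inference` are hypothetical, and only the shared concept is compared, which is one simple reading of partial matching.

```python
from typing import NamedTuple, Optional


class Proposition(NamedTuple):
    """A flattened three-concept proposition (hypothetical simplification)."""
    kind: str   # "epi" for episodic, "abs" for absolute
    arg1: str
    func: str
    arg2: str


def episodic_inference(premise: Proposition, rule: Proposition) -> Optional[Proposition]:
    """Apply an absolute 'X is Y' rule to an episodic premise that ends in X."""
    if premise.kind != "epi" or rule.kind != "abs" or rule.func != "is":
        return None
    # Partial matching: only the shared concept is compared.
    if premise.arg2 != rule.arg1:
        return None
    return Proposition("epi", premise.arg1, premise.func, rule.arg2)


def absolute_inference(premise: Proposition, rule: Proposition) -> Optional[Proposition]:
    """Chain two absolute 'is' propositions: A is B, B is C, hence A is C."""
    if premise.kind != "abs" or rule.kind != "abs":
        return None
    if premise.func != "is" or rule.func != "is":
        return None
    if premise.arg2 != rule.arg1:
        return None
    return Proposition("abs", premise.arg1, "is", rule.arg2)


# Example 6.4: premise "John has bread", rule "bread is food".
john_has_food = episodic_inference(
    Proposition("epi", "John", "has", "bread"),
    Proposition("abs", "bread", "is", "food"),
)
```

Both functions return `None` when the concepts do not line up, so a caller can try several absolute propositions against one premise and keep only the consequents that are actually derived.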
7 Language propositions The third type of proposition is language propositions. They are also defined as an unordered set of proplets related by intra- and extrapropositional continuation predicates and held together by a common proposition number.
7.1 LANGUAGE PROPOSITION
Feld enthält kleines Quadrat
[sur: enthalten, syn: verb, sem: [ARG: field square, MODR: , cnj: 23 and 24, prn: 24, func: contain]]
[sur: klein, syn: adn, sem: [MODD: square, id: 9, prn: 24, modr: small]]
[sur: Feld, syn: noun, sem: [FUNC: contain, MODR: , id: 7, prn: 24, arg: field]]
[sur: Quadrat, syn: noun, sem: [FUNC: contain, MODR: , id: 9, prn: 24, arg: square]]
This language proposition expresses the content of the episodic proposition 4.2. The sur-attributes contain the German word surfaces {enthalten Feld klein Quadrat}.
13 A formalized example of such an inference based on an LA-grammar is given in Hausser 1999, p. 494.
Natural language production (speaker mode) is based on navigating through episodic or absolute propositions. Thereby, the proplets traversed are matched with the corresponding sem-attributes of language proplets.
7.2 SCHEMA OF LANGUAGE PRODUCTION
language proposition: blue truck belongs_to milkman.
                      ⇑ ⇑ ⇑ ⇑
episodic or absolute proposition: blue => truck => belongs_to => milkman.
For communicating the propositional content, only the resulting sequence of surfaces is used. It must be adapted to the word order properties of the natural language in question, equipped with proper inflection for tense, number, and agreement, provided with function words, etc. Natural language interpretation (hearer mode) is based on supplying the sequence of incoming surfaces with language proplet schemata via lexical lookup. The analysis of their syntactic composition is interpreted semantically by filling the attributes of the proplet schemata with appropriate values (analogous to the transition from 4.1 to 4.2).
7.3 SCHEMA OF LANGUAGE INTERPRETATION
language proposition: blue => truck => belongs_to => milkman.
episodic or absolute proposition: blue truck belongs_to milkman.
For communication, only the sem-attributes of the language proplets are used. They are stored in the word bank using the concepts as the keys.
8 Proplet matching Matching between proplets is always based on an upper and a lower level (cf. 3.1,3.2, 6.3, 6.4, 6.5, 7.2, 7.3). The different types of matching may be combined, however, into the following four level structure:
8.1 MATCHING RELATIONS IN DATABASE SEMANTICS
level 4: language proposition, language proposition
level 3: absolute proposition
level 2: absolute proposition, absolute proposition
level 1: episodic proposition (several)
[Figure: the levels are connected by arrows for the matching relations, labelled (c), (d) and (e) among others.]
Episodic propositions appear at level 1, absolute propositions resulting from (a) generalization and used for (b) episodic inference at level 2, absolute propositions used for
(c) absolute inference at level 3, and language propositions used for coding or decoding (d) absolute or (e) episodic propositions at level 4. The five matching relations are based on the four correlations between (i) absolute and episodic proplets (a, b), (ii) absolute and absolute proplets (c), (iii) language and absolute proplets (d), and (iv) language and episodic proplets (e): 8.2
FOUR CORRELATION TYPES BETWEEN PROPLETS
(i) absolute proplet / episodic proplet
absolute: [arg: square, FUNC: have, MODR: , id: y, prn: abs-3, arg: square]
episodic: [arg: square, FUNC: contain, MODR: small, id: 9, prn: 24]
(ii) absolute proplet / absolute proplet
absolute: [arg: square, FUNC: have, MODR: , id: y, prn: abs-3, arg: square]
absolute: [arg: square, FUNC: contain, MODR: , id: y, prn: abs-3, arg: square]
(iii) language proplet / episodic proplet
language: [sur: Quadrat, syn: noun, sem: [FUNC: contain, MODR: small, id: 9, prn: 24, arg: square]]
episodic: [arg: square, FUNC: contain, MODR: small, id: 9, prn: 24]
(iv) language proplet / absolute proplet
language: [sur: Quadrat, syn: noun, sem: [FUNC: contain, MODR: small, id: 9, prn: 24, arg: square]]
absolute: [arg: square, FUNC: have, MODR: , id: y, prn: abs-3, arg: square]
Correlation (i) between an absolute and an episodic proplet constitutes a complete matching in generalization and a partial one in episodic inference. Correlation (ii) between two absolute proplets is limited to absolute inference, where it constitutes a partial matching. Correlations (iii) and (iv) between the sem-attribute of a language proplet and an episodic or absolute proplet constitute a complete matching. Partial and complete matching equally require that the concepts involved must be compatible. There are only two basic constellations:
8.3 MATCHING BETWEEN M-CONCEPTS AND I-CONCEPTS_loc
(1) M-CONCEPT / I-CONCEPT_loc
(2) M-CONCEPT / M-CONCEPT
[Figure: concept structures with attributes such as edge 1 and angle 1/2.]
The recognition, action, and inference LA-grammars start running if and when they receive input. In recognition, the input is provided by events in the external and internal environment. But what provides input for action and inference? The answer is a tenth LA-grammar, called LA-MOTOR (cf. 10 in 9.1). It is special in that it is not activated by input. Instead, it moves through the railroad system of the word bank, choosing possible continuations either at random (free association) or by following a highlighting of continuation proplets provided by a control structure. LA-MOTOR is like a greyhound racing after an artificial rabbit provided by highlighting. The navigation driven by LA-MOTOR may be one-level or two-level. In one-level navigation, the proplets are simply traversed. In two-level navigation the episodic proplets traversed are matched by corresponding absolute or language proplets. Depending on their content, these matching proplets are passed on to the LA-grammars for action (4-6) and for inference (7-9), activating them by providing them with input.
9.2 DEFINITION OF LA-MOTOR
ST_S: {([func: α] {1 F+A=F, 2 F+A=A})}
F+A=F: [func: α, ARG: x β y, prn: m] [arg: β, FUNC: α, prn: m] => [func: α] {3 F+A=F, 4 F+A=A, 5 F+cnj=F}
F+A=A: [func: α, ARG: x β y, prn: m] [arg: β, FUNC: α, prn: m] => [arg: β] {6 A+id=F}
A+id=F: [arg: α, FUNC: β, id: m, prn: k] [arg: α, FUNC: γ, id: m, prn: l] [func: γ, ARG: x α y, prn: l] => [func: γ] {7 F+A=F, 8 F+A=A}
F+cnj=F: [func: α, ARG: x, cnj: m Cn n, prn: m] [func: β, ARG: y, cnj: m Cn n, prn: n] => [func: β] {9 F+A=F, 10 F+A=A}
ST_F: {([func: x] rp F+A=F)}
The rules of LA-MOTOR specify patterns which are applied to episodic proplets. The first proplet provides the continuation predicate for moving to the next.16 The rule F+A=F is for intrapropositional navigation; it moves from a functor proplet to an argument proplet and back to the functor, using the intrapropositional continuation predicate of the functor to find the argument. F+A=A initiates an extrapropositional navigation based on identity. A+id=F continues this extrapropositional navigation, as indicated by the following continuation structure:
9.3 EXTRAPROPOSITIONAL id-NAVIGATION
[Figure: continuation structure with functor FUNC1, arguments ARG1, ARG2, ARG3 and ARG1', ARG2', ARG3', and transitions labelled F+A=F, F+A=A and A+id=F.]
The fourth rule F+cnj=F continues an intrapropositional navigation by moving to the next proposition based on the conjunction of the functor, as indicated below:
9.4 EXTRAPROPOSITIONAL cnj-NAVIGATION
[Figure: continuation structure with arguments ARG1, ARG2, ARG3 and ARG1', ARG2', ARG3', and transitions labelled F+A=F.]
As an algorithm, LA-MOTOR executes legal transitions from one episodic proplet to the next. However, given that each proplet may provide several possible continuations, there is the question of how to provide the highlighting to guide the navigation. The first obvious step is to connect LA-MOTOR to the cognitive agent's recognition procedures. When an episodic proposition is read into the word bank by means of LA-grammars for external, internal, or language17 recognition, the new proplets are highlighted, causing LA-MOTOR to jump to the first proplet and to follow the incoming propositions from there. LA-MOTOR's tracing of incoming propositions is important because it positions the focus point in preparation for subsequent action.
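The idea of a navigation guided by highlighting can be sketched as follows. This is a minimal Python illustration under strong assumptions, not the LA-MOTOR rule system of 9.2: proplets are reduced to a concept, a proposition number and a list of continuation keys, and the names `Proplet`, `WordBank` and `navigate` are hypothetical. Highlighted continuations are preferred; without highlighting the choice is random, corresponding to free association.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, int]  # (concept, proposition number)


@dataclass
class Proplet:
    concept: str
    prn: int
    continuations: List[Key] = field(default_factory=list)
    highlighted: bool = False


class WordBank:
    """Proplets stored under their concept as the key (simplified)."""

    def __init__(self) -> None:
        self.by_concept: Dict[str, List[Proplet]] = {}

    def add(self, proplet: Proplet) -> None:
        self.by_concept.setdefault(proplet.concept, []).append(proplet)

    def get(self, key: Key) -> Optional[Proplet]:
        concept, prn = key
        for proplet in self.by_concept.get(concept, []):
            if proplet.prn == prn:
                return proplet
        return None


def navigate(bank: WordBank, start: Key, max_steps: int,
             rng: random.Random) -> List[Key]:
    """Traverse proplets from `start`, preferring highlighted continuations."""
    path = [start]
    current = bank.get(start)
    for _ in range(max_steps):  # bound the walk, since continuations may cycle
        if current is None or not current.continuations:
            break
        candidates = [bank.get(key) for key in current.continuations]
        candidates = [p for p in candidates if p is not None]
        if not candidates:
            break
        highlighted = [p for p in candidates if p.highlighted]
        # Highlighting guides the choice; without it, associate freely.
        current = rng.choice(highlighted or candidates)
        path.append((current.concept, current.prn))
    return path


# Two stored continuations after "feel"; only one of them is highlighted.
bank = WordBank()
bank.add(Proplet("feel", 54, [("search", 55), ("search", 60)]))
bank.add(Proplet("search", 55, [("find", 56)], highlighted=True))
bank.add(Proplet("search", 60, [("find", 61)]))
bank.add(Proplet("find", 56, highlighted=True))
bank.add(Proplet("find", 61))
path = navigate(bank, ("feel", 54), max_steps=5, rng=random.Random(0))
```

Passing the random generator as a parameter keeps the free-association case reproducible in tests, while the highlighted case is deterministic whenever exactly one continuation is highlighted.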
10 Autonomous control structure In order to guide LA-MOTOR's navigation from recognition to action, the concatenated propositions in a word bank are individuated into recognition-action-recognition (rac) sequences. These store past experiences relating to, for example, feeling hungry. When internal recognition reads CA feels hungry into the word bank once more, this proposition is used to retrieve and highlight all earlier rac sequences beginning with it:
10.1 RECOGNITION HIGHLIGHTING ASSOCIATED RAC SEQUENCES
CA feels hungry
| CA feels hungry-CA searches at place X-CA finds no food
| CA feels hungry-CA searches at place Y-CA finds a little food
| CA feels hungry-CA searches at place Z-CA finds lots of food
The retrieval of rac sequences beginning with CA feels hungry is based on the word bank's data structure, using the concepts as the key. LA-MOTOR navigates through these rac sequences, jumping throughout the word bank from proplet to proplet by following their continuation predicates, ignoring those which are not highlighted.18
16 For simplicity, the treatment of modifiers is omitted in LA-MOTOR.
17 Absolute propositions are normally processed by LA-grammars for inference, and episodic propositions by LA-MOTOR. However, as a special case of natural language interpretation and production, there are also absolute propositions which may be read into and out of the word bank. This requires some straightforward extensions.
The question raised by this simplified example is: within which rac sequence's action proposition should LA-MOTOR switch from one-level to two-level navigation, thus activating a suitable LA-grammar for action? The solution requires that the actions initiated by LA-MOTOR have a purpose for the cognitive agent. For the implementation of purpose it is useful to distinguish between long and short term purposes. Because CA's survival from one situation to the next is a primary concern, the remainder of this paper will be devoted to the short term purpose. In database semantics, short term purpose is driven by need parameters in combination with the balance principle. Examples of need parameters are energy supply and temperature. The current value of the former may leave normal range because of consumption, of the latter because of external changes. Other short term need parameters are rest, sleep, fear, but also shelter, social acceptance, intellectual stimulation, etc. According to the balance principle, CA strives to maintain the values of the need parameters within normal range. For this, rac sequences originating as a real action are evaluated as to whether they result in raising or lowering any of the need parameters, or leaving them unchanged. The evaluation is expressed by means of need vectors, pointing up or down at various degrees, whereby → expresses 'no change.' When a stored rac sequence is activated later by an incoming recognition, its need vector is related to the current parameter values. If one of the parameters is out of normal range, the rac sequence with a need vector most likely to regain balance for that parameter is highlighted most. In this way, LA-MOTOR is enticed (i) to choose the rac sequence most appropriate to regain balance, and (ii) to switch to two-level navigation when traversing the action proposition of that rac sequence.
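One way to make the balance principle concrete is the following Python sketch. It is an illustration under assumptions that are not part of the paper: need vectors are represented as a numeric expected change per need parameter, the normal range is a closed numeric interval, and all values are invented. The names `RacSequence`, `distance_from_range` and `rank_by_balance` are hypothetical. Sequences are ranked by how close their expected change brings the parameter to its normal range, so the first one corresponds to the one highlighted most.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RacSequence:
    recognition: str
    action: str
    result: str
    need_vector: Dict[str, float]  # assumed: expected change per need parameter


def distance_from_range(value: float, normal: Tuple[float, float]) -> float:
    """How far `value` lies outside the normal range (0.0 if inside)."""
    low, high = normal
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0


def rank_by_balance(sequences: List[RacSequence], parameter: str,
                    value: float, normal: Tuple[float, float]) -> List[RacSequence]:
    """Order rac sequences by the imbalance expected to remain after acting."""
    def remaining_imbalance(seq: RacSequence) -> float:
        change = seq.need_vector.get(parameter, 0.0)  # missing entry: no change
        return distance_from_range(value + change, normal)
    return sorted(sequences, key=remaining_imbalance)


# The three sequences of 10.1, with invented need vectors for energy supply.
hungry = [
    RacSequence("CA feels hungry", "CA searches at place X", "CA finds no food", {"energy": 0.0}),
    RacSequence("CA feels hungry", "CA searches at place Y", "CA finds a little food", {"energy": 0.2}),
    RacSequence("CA feels hungry", "CA searches at place Z", "CA finds lots of food", {"energy": 0.6}),
]
# Energy has dropped a little below an assumed normal range of 0.5 to 0.8.
ranked = rank_by_balance(hungry, "energy", value=0.4, normal=(0.5, 0.8))
```

Returning the full ranking rather than a single winner keeps the remaining sequences available as fallbacks if the first action fails.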
For example, if CA's need parameter for energy supply has dropped a little out of range, the proposition CA feels hungry is read into the word bank. Because the state of the energy parameter corresponds best to the second rac sequence in 10.1, it would be highlighted most, causing CA to search at place Y. The first and the third rac sequence in 10.1 would come into play only if CA's search at place Y is unsuccessful. Similarly, if CA's need parameter for temperature has risen out of range a lot, CA feels hot is read into the word bank. This results in highlighting rac sequences beginning with that proposition, e.g. (1) CA feels hot-CA moves into shade-CA cools down a little, (2) CA feels hot-CA moves into the basement-CA cools down a lot, and (3) CA feels hot—CA moves into the sun-CA feels hotter. Here the need vector of the second rac sequence is suited best to regain balance, resulting in the corresponding action. But what about situations in which there are no rac sequences in the word bank to be activated by the current recognition? In this case, predefined ('innate') open rac sequences provide options for action. Consider the following example: 10.2
OPEN RAC SEQUENCES FOR UNKNOWN SITUATIONS
CA meets unknown animal
unknown situation-CA approaches-?
unknown situation-CA waits-?
unknown situation-CA flees-?
18 In memory, the propositions of a rac sequence are naturally connected by subsequent proposition numbers, e.g. 54, 55, 56, reflecting the temporal order of their occurrence (cf. Hausser 2001b). This allows their traversal by means of LA-MOTOR based on the cnj-values of the respective functor-proplets.
These rac sequences are open because due to lack of experience the result proposition is represented as unknown ('?'). However, they have predefined need vectors relating to the fear parameter, the first being low, the second medium, and the third high. Appearance of the unknown animal as small and cuddly will lower the fear parameter, resulting in agreement with the first rac sequence. Appearance as big and ferocious, on the other hand, will increase the fear parameter, resulting in agreement with the third rac sequence. And accordingly if the animal's appearance is uncertain. Provided that CA survives the encounter, a new complete rac sequence evaluating CA's action towards this particular animal will be stored in CA's word bank. An indirect way to mediate between known (cf. 10.1) and unknown (cf. 10.2) situations is inference. The LA-grammars for generalization (cf. 6.3), episodic inference (cf. 6.4), and absolute inference (cf. 6.5) are activated by incoming absolute proplets resulting from two-level navigation of LA-MOTOR. Thereby, the proplets for generalization are based on complete matching, while those for episodic and absolute inference are based on partial matching. The crucial question is when LA-MOTOR should switch to which kind of two-level navigation to activate an LA-grammar for inference. Finally, there is language production (cf. 7.2). It is also activated by LA-MOTOR switching to two-level navigation, though the proplets used to match the path traversed are language rather than absolute proplets. As in non-verbal action, the control of language production is based on rac sequences and need parameters. The main difference to the nonverbal examples described above is that the need parameters for language interaction are of a social rather than a physiological nature. 
As an example, consider the social rac sequences (1) CA meets colleague in the morning-CA greets colleague- Colleague greets CA and (2) CA meets colleague in the morning-CA looks the other way-Colleague doesn't greet CA. The need vectors of these rac sequences relate to the need parameter of social acceptance, whereby the first need vector increases the current value of that parameter, while the second one does the opposite.
Conclusion
The control structure presented in this paper is driven by physiological and social need parameters. They are used to evaluate recognition-action-recognition (rac) sequences in terms of need vectors related to those parameters. A detailed computational mechanism for selecting actions to maintain the cognitive agent's need parameters within normal range is described. This treatment of control resembles the treatment of coherence (Hausser 1999, p. 473f.) and spatio-temporal indexing (Hausser 2001a) in database semantics: All three are driven by the structure of the external world. However, while coherence and spatio-temporal indexing are merely reflected in the cognitive agent's memory, the control structure uses such memories as models for actions to continuously maintain balance. The proper modeling of rac sequences in relation to physiological and social need parameters is an empirical matter of considerable interest. In combination with the balance principle, it can be used to reconstruct the notion of intention computationally. Whether CA's automatic assignment of need vectors to rac sequences has been implemented correctly or not may be verified objectively by testing CA's behavior. The same holds for the derivation of inferences and the implementation of long term purposes. For this methodology to become reality, however, robotics and procedural semantics have to work together much more closely than is currently the case.
References
Anderson, J.R. (1990) Cognitive Psychology and its Implications, 3rd ed., W.H. Freeman and Company, New York.
Gärdenfors, P. (2000) Conceptual Spaces, MIT Press, Cambridge, Massachusetts.
Fauconnier, G. (1997) Mappings in Thought and Language, Cambridge University Press, Cambridge, England.
Hausser, R. (1989) Computation of Language, An Essay on Syntax, Semantics and Pragmatics in Natural Man-Machine Communication, Symbolic Computation: Artificial Intelligence, Springer-Verlag, Berlin-New York.
Hausser, R. (1999) Foundations of Computational Linguistics, Springer-Verlag, Berlin, New York.
Hausser, R. (2001a) "The Four Basic Ontologies of Semantic Interpretation," in H. Kangassalo et al. (eds) Information Modeling and Knowledge Bases XII, IOS Press Ohmsha, Amsterdam.
Hausser, R. (2001b) "Spatio-temporal indexing in Database Semantics," in A. Gelbukh (ed.) Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science Vol. 2004. Springer-Verlag, Berlin, New York.
Hausser, R. (2001c) "Reconstructing predicate calculus in database semantics," in preparation.
Hurford, J., M. Studdert-Kennedy, & C. Knight (1998) Approaches to the Evolution of Language, Cambridge University Press, Cambridge, England.
Indurkhya, B. (1992) Metaphor and Cognition, Kluwer, Dordrecht.
Kosslyn, S. & D. Osherson (eds.) (1995) Visual Cognition, MIT Press, Cambridge, Massachusetts.
Langacker, R. (1987/1991) Foundations of Cognitive Grammar, Vol. 1/2, Stanford University Press, Stanford, CA.
Lakoff, G. (1987) Women, Fire, and Dangerous Things, University of Chicago Press, Chicago, Illinois.
Lakoff, G. & M. Johnson (1980) Metaphors We Live By, University of Chicago Press, Chicago & London.
Marr, D. (1982) Vision, W.H. Freeman and Company, New York.
Nakayama, K., H. Zijiang, & Shinsuke Shimojo (1995) "Visual Surface Representation: A Critical Link between Lower-level and Higher-level Vision," in S. Kosslyn & D. Osherson (eds.).
Peirce, C.S. (1931–1958) Collected Papers of Charles Sanders Peirce, edited by C. Hartshorne and P. Weiss, 6 vols. Harvard University Press, Cambridge, Massachusetts.
Sowa, J.F. (1984) Conceptual Structures, Addison-Wesley, Reading, Massachusetts.
Sowa, J.F. (2000) Conceptual Graph Standard, revised version of December 6, 2000. http://www.bestweb.net/sowa/cg/cgstand.htm
Tarr, M.J. & H.H. Bülthoff (eds.) (1998) Image-based object recognition in man, monkey and machine, special issue of Cognition, Vol. 67.1&2:1–208.
Information Modelling and Knowledge Bases XIII H. Kangassalo et al. (Eds.) IOS Press, 2002
Modelling Variant Embedded Software Behaviour Using Structured Documentation
Pekka SAVOLAINEN
VTT Electronics, Kaitoväylä 1, P.O. Box 1100, FIN-90571 Oulu, Finland
Abstract. The amount of software in embedded software products is increasing. Embedded software products usually form product families where the members of the family differ slightly in their form or function. Software forms a large part of product functionality, and is thus a subject of customisation, as it is furthermore considered to be the most flexible part of the product. This paper focuses on a contemporary problem in the development of embedded software products: how to model the commonality and variability of software that is to be used over a family of products. The proposed solution is based on a structured documentation paradigm, and provides common ground for the work of the various stakeholders in product concept development and requirements engineering. The resulting model also supports automatic product configuration during the subsequent steps of the software design cycle.
1. Introduction
Embedded software products are products that are dominated by software. Stevens et al. [1] identify two types of systems that are highly dependent on software. The first type denotes software-intensive systems (SISs), which are large systems that are essentially software-based. Examples include information systems, command and control systems, and financial systems. The second type denotes software-shaped systems (SSSs), which are complex systems containing mixtures of software, hardware, and people. Software is the critical element in terms of cost, added value, and risk. Examples include cameras, cars, planes, and cellular phones. The embedded software products populate the second, SSS, category of these software-dominated systems. The current development trends characterise the embedded software products well [2]:
• The volume of products increases.
• Software belonging to the same product family may be used in a number of products.
• There may be several versions of the same product, representing e.g. subsequent corrective versions of embedded software and possessing slightly different features.
• The products are integrated as parts of various other systems.
• The volume of changes in the products has increased.
The list above shows the central role of software as a part of a constantly evolving electronics product, and emphasises the need to manage variations created during product and software development. The objective of variation management is to benefit by using the same designs, and their resulting artefacts, over and over again in the developed products, and thus propagate savings in the per product development costs. Basically, the earlier in the design process the common and generic parts of products can be detected, the greater is the potential gain from reuse.
This creates a challenge to develop a specification method, where genericness and uniqueness of product behaviour is tackled early in the concept-modelling phase of product development.
P. Savolainen /Modelling Variant Embedded Software Behaviour
Figure 1: Conceptualisation process of an embedded product (diagram labels: "Concept model", "Specifications of the designed and manufactured products")

Products based on embedded software are typically used for a long time and require several modernisation cycles, during which new features may be added and outdated features removed. Figure 1 illustrates the design steps of one round of the modernisation cycle. The steps are a modified summary of the phases of new concept development described in detail in [3]; the modification ties the generic description more closely to SSS development. As concept design advances, it generates a flow of information, depicted by arrows in the illustration, which ends in an update of the concept-model database.

Consideration of the renewal of a product repertoire launches the actions of new concept generation. These actions include studies of technology and user preferences as well as market and competitor surveys. Once new concepts are drafted, the technical and managerial staff assess them. At this point, a decision is made on whether to accept a new concept or to reject it. The accepted concepts are developed further, either to improve an existing product or to create a new product. Accordingly, new product specifications are developed and stored to complete the existing design data.

Stakeholders in concept design typically include domain experts, both technical and marketing oriented, their managers, and customers as representatives of the prospective users [3, 4]. Some stakeholders require a detailed specification, while others simply want a high-level description and do not want to read pages and pages of text, tables, or diagrams.

The iterative nature of design and analysis is particularly relevant in the development of embedded software product families and their construction platforms [5]. Each iteration round creates a new member in the product family.
Adding customisation features, which are the foundation for forming product families, to the product design process requires controlled management of product commonalities and variabilities. This modifiability information is used at the product-realisation level, where the platform is exploited in creating products that meet customers' needs. This is the configuration step of family-based product development [21].

The Feature Oriented Domain Analysis method, FODA [6], aims at identification and faithful representation of commonality and variability in a domain. Though published a decade ago, feature modelling is still a rather new practice, without a commonly accepted notation to represent the model. Some notations have been proposed in the original method description [6], and recently some efforts have been made to accommodate the idea of feature modelling to the object-oriented design model and UML [7, 8, 9].

This paper presents a novel application of FODA to modelling variant software behaviour, which is particularly suited to concurrent development of embedded software. By using the presented method in developing a family product, i.e. a family of products differing slightly from each other in form or function, we can better focus the product development effort on new features, utilising the common parts from earlier developed products.

The next chapter introduces domain analysis and feature modelling, and presents some recent approaches to implement feature modelling as well as their drawbacks that, in our
opinion, prevent their effective use in modelling embedded software behaviour. Chapter 3 presents our alternative approach to modelling commonalities and variabilities, and chapter 4 discusses use of this approach in the development of software embedded in electronics products. Experience gained in using the proposed feature management model is presented in chapter 5. Chapter 6 analyses our approach with respect to the related research. Chapter 7 concludes and outlines our future work on feature modelling.

2. Feature oriented modelling of software specifications

The motivation for creating program families was already identified two decades ago by Parnas [10]: "We consider a set of programs to constitute a family whenever it is worthwhile to study programs from the set by first studying the common properties of the set and then determining the special properties of the individual family members." Later on, the set of current and future applications which share a set of common capabilities and data has been called an application domain (a domain, in brief) [6] and, accordingly, the study of that domain has been called domain analysis. This chapter discusses domain modelling, especially its most popular form, feature oriented domain modelling, from the viewpoint of embedded software development.

2.1. Domain analysis and feature modelling

Domain analysis, according to Arango and Prieto-Diaz [11], is an engineering activity that aims at communicating the understanding and intuitions shared by the reuse community. This definition includes both the existing and future systems analysis facets of domain analysis, as illustrated in Figure 2 [26]. These facets are also called the bottom-up and top-down processes of domain analysis. Domain analysis consists of the tasks of identifying, collecting, organising, and representing the relevant information in that domain.
Domain analysis is based on the study of existing systems, knowledge captured from domain experts, and emerging new technology within the domain. The most effective way to perform domain analysis is an iterative top-down / bottom-up process [26]. The narrowest definition of domain analysis is that it is a process to study commonalities and variabilities across a set of systems in a domain. Accordingly, the major deliverable of the domain analysis is a domain model that describes the essential concepts and their relationships that represent (or the essential components and their relationships that occur in) systems in the domain [26].

Figure 2: Top-down / bottom-up domain analysis [26] (diagram labels: "Top-down", "Strategic", "Identifying commonalities in the future", "DOMAIN ANALYSIS", "Identifying commonalities in existing systems")

In the 1990s, the Feature Oriented Domain Analysis method, FODA, became the most popular domain analysis method [7, 8, 12, 9]. The distinctive facet of FODA is that it strives to discover domain genericness (i.e. general knowledge) and to formalise it with respect to the factors that make one application different from another. Both the genericness and those factors that make each application unique are an important part of domain knowledge and are captured in the domain products. In the FODA method, domain products are not ends unto themselves, but evolve through applications [6]. The FODA method covers the application design cycle
from context analysis to architectural modelling. The following models are generated during the development: a context model (as a result of context analysis); a feature model, an entity-relationship model, a data flow model, and a finite state machine model (all these are results of domain analysis); and finally a process interaction model and module structure charts (during architectural modelling). Creation of the models does not follow a strict waterfall process, but is accomplished iteratively. Of these models, the feature model describes domain product behaviour as visible outside of the domain, and this model is of interest here.

2.2. Elements of the feature model

A feature model describes system behaviour as features. A feature is "a prominent or distinctive user-visible aspect, quality, or characteristic of a software system or systems" [6]. The connections between features can be presented as a tree diagram. Figure 3 illustrates a feature diagram, showing features of an imaginary taximeter.

In the diagram, the and/or hierarchy of features is shown. The root node represents a family of systems (i.e. an application domain), and the rest of the nodes symbolise standard features of the system family in that domain. Structural "consists-of" relationships, which represent logical groupings of features, are lined up in the diagram to form a tree of features. The parent-child connections of the tree represent features consisting of some other features, and the children are called sub-features of the parent feature. An arc across the connections denotes alternative features; of the alternative features only one can be selected. A dashed line connection symbolises optionality of a feature.

According to the diagram, every taximeter is used to collect cash payments. Optionally, invoicing and credit card payments are also supported. A taximeter can contain a credit card reader and a printer, and a taximeter can produce reports. A printer may have graphics support.
There are two types of non-graphics printers, normal and wide, but only one may be selected. This is a common case in the feature model: in the case of connected features, the parent feature controls when a sub-feature can be selected. The alternative features, for their part, can be thought of as specialisations of a more general feature.

Besides the hierarchical tree connections described above, there are two other types of connections that violate the tree structure. These connections, depicted as dotted lines in the diagram, are called composition rules. The first rule is "mutually excludes" (mutex). All optional and alternative features that cannot be selected when the named feature is selected must be stated using the mutually excludes rule. As an example, in our taximeter, a credit card reader cannot be selected if a graphics printer has been selected (maybe because they use the same slot).
Figure 3: Diagram of feature connections (recoverable diagram labels: "CardReader", "Printer", "Reports", "req")
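
The diagram semantics described so far (mandatory, optional, and alternative sub-features, plus the mutex rule) can be sketched as a small data structure. The following Python sketch is illustrative only. The exact tree shape of Figure 3 is not recoverable here, so the hierarchy below is an assumption that loosely follows the prose; in particular, the three printer kinds are simplified into one alternative group. The names `Feature`, `MUTEX`, and `violations` are hypothetical and not part of FODA.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Feature:
    name: str
    optional: bool = False       # dashed connection in the diagram
    alternative: bool = False    # children are alternatives: exactly one is chosen
    children: list[Feature] = field(default_factory=list)


# Assumed taximeter hierarchy (the real Figure 3 may differ).
TAXIMETER = Feature("Taximeter", children=[
    Feature("Payment", children=[
        Feature("Cash"),                         # every taximeter collects cash
        Feature("Invoicing", optional=True),
        Feature("CreditCard", optional=True),
    ]),
    Feature("CardReader", optional=True),
    Feature("Printer", optional=True, alternative=True, children=[
        Feature("GraphicsPrinter"),
        Feature("NormalPrinter"),
        Feature("WidePrinter"),
    ]),
    Feature("Reports", optional=True),
])

# Mutex rule from the text: card reader and graphics printer exclude each other.
MUTEX = {("CardReader", "GraphicsPrinter")}


def violations(root: Feature, selected: set[str]) -> list[str]:
    """Return the rule violations of a selection of feature names."""
    problems: list[str] = []

    def visit(node: Feature, parent_chosen: bool, in_alternatives: bool) -> None:
        chosen = node.name in selected
        if chosen and not parent_chosen:
            problems.append(f"{node.name} selected without its parent feature")
        mandatory = not node.optional and not in_alternatives
        if parent_chosen and mandatory and not chosen:
            problems.append(f"mandatory feature {node.name} is missing")
        if node.alternative and chosen:
            picked = [c.name for c in node.children if c.name in selected]
            if len(picked) != 1:
                problems.append(f"{node.name} needs exactly one alternative, got {picked}")
        for child in node.children:
            visit(child, chosen, node.alternative)

    visit(root, True, False)
    for a, b in sorted(MUTEX):
        if a in selected and b in selected:
            problems.append(f"{a} and {b} mutually exclude each other")
    return problems


print(violations(TAXIMETER, {"Taximeter", "Payment", "Cash", "Printer", "NormalPrinter"}))
```

A selection that picks two printer kinds, or combines `CardReader` with `GraphicsPrinter`, is reported as a violation, while the selection printed above is accepted.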
The second composition rule is "requires" (req). All the optional and alternative features that must be selected when the named feature is selected must be stated with the requires rule. In our taximeter, the reporting feature requires a printer to be available as well. In fact, the connections between features, when completed with the composition rules, form a directed acyclic graph [9].

One further aspect of feature connections is when to bind, i.e. fix, the value of an alternative or optional feature [6]. This issue is related to the design of software architecture. Detailed knowledge of the connections supports decisions on the software architecture and the parameterisation of generalised components. According to the binding time, features can be divided into three categories: compile-time features, load-time features, and runtime features [6]. Those alternative or optional features that result in different packaging of the software should be processed at compile time. Compile-time features usually account for the differences between products of the same product family. Unlike the compile-time features, load-time features are selected at the beginning of the execution, and remain stable during it. Finally, in contrast to the first two categories, runtime features can be changed automatically or interactively during the execution. The binding time is not shown in the feature diagram. One example of binding time in the taximeter domain would be load-time binding of the printer feature: when switched on, the system could detect whether there is a portable printer plugged in, and load the driver software accordingly.

Detection of features is not a linear process. In FODA [6], several models are built concurrently to support each other in modelling the domain, its commonalities and variabilities.
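
The requires rule and the three binding-time categories discussed above can be sketched as follows. This is a minimal illustration under stated assumptions: only "Reports requires Printer" and the load-time binding of the printer come from the text, whereas the binding times given to `CardReader` and `Reports`, and the names `BindingTime`, `REQUIRES`, `BINDING`, `missing_requirements`, and `features_bound_at`, are hypothetical.

```python
from enum import Enum


class BindingTime(Enum):
    COMPILE_TIME = "compile-time"  # leads to different packaging of the software
    LOAD_TIME = "load-time"        # fixed when execution starts, stable afterwards
    RUNTIME = "runtime"            # may change during execution


# Requires rule from the text: reporting needs a printer.
REQUIRES = {"Reports": {"Printer"}}

BINDING = {
    "Printer": BindingTime.LOAD_TIME,        # example given in the text
    "CardReader": BindingTime.COMPILE_TIME,  # assumption for illustration
    "Reports": BindingTime.RUNTIME,          # assumption for illustration
}


def missing_requirements(selected: set[str]) -> dict[str, set[str]]:
    """Map each selected feature to the required features that are not selected."""
    missing = {}
    for feature in selected:
        lacking = REQUIRES.get(feature, set()) - selected
        if lacking:
            missing[feature] = lacking
    return missing


def features_bound_at(selected: set[str], when: BindingTime) -> list[str]:
    """List the selected features whose value is fixed at the given binding time."""
    return sorted(f for f in selected if BINDING.get(f) is when)


print(missing_requirements({"Reports"}))
print(features_bound_at({"Printer", "CardReader", "Reports"}, BindingTime.LOAD_TIME))
```

Keeping the binding time in a separate table mirrors the remark that binding is not shown in the feature diagram itself.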
To avoid contradictions in the model, feature modelling consists of many iterations and regroupings before the model has an understandable form for designers, sales managers and customers. One approach to detect features is outlined in the review of related research in the following section.

2.3. Applying feature modelling in software design

Feature oriented modelling has traditionally been used primarily to support the design of software architecture and the components therein, as can be seen in e.g. [8, 12]. However, some proposals exist for using feature modelling in the early-phase activities of the development of software-intensive systems. FODAcom [7] and FeatuRSEB [8] relate feature modelling to use case modelling and requirements engineering.

FODAcom introduces the use of feature modelling in requirements analysis, where the feature model is augmented in parallel with a domain use case model. Variation in the individual systems of the domain is presented in both of the models; here the use case model is used in the extended form of RSEB [13]. Subsequently, a requirements template is composed and filled based on the domain use case model, to describe the fixed and variable behaviour that corresponds to domain and single system requirements, respectively. A UML-like notation is used to depict the hierarchical relation "consists-of", with variations of "optional" and "alternative" existence. No composition rules (i.e. requires or mutually excludes) are represented in the model, nor included in the requirements template.

Likewise, FeatuRSEB outlines a design process where a separate feature model is constructed and used along with the use case model and other models of object-oriented software engineering. The feature model is proposed to be presented using UML modelling tools and notations.

Hein & al. [9] build on these approaches, and elaborate them in an experiment in an industrial setting.
They use a similar approach based on requirements templates, though apparently without domain-level use cases. First, the requirements of the individual applications comprising a domain are recorded, and subsequently they are unified to derive a domain requirement model. In the domain model, variabilities of applications are captured in parameter documents, which contain application-selected values for all the parameters that
have been covered during derivation. The domain requirements and parameters are then used to construct a separate feature model. In the work of Hein & al., feature modelling, representing variability among a series of applications, completes the requirements modelling. The parameter documents essentially bind the features and requirements, which are primarily concerned with stating commonality. The feature model's hierarchy is divided into several sub-trees, according to the optionality of the nodes (features). Each sub-tree is represented as a UML package, containing the feature descriptions and possible references to other sub-trees, represented as classes themselves.

In the work of Hein & al. [9], a representation of all the FODA feature model relations is depicted; in fact, some concepts beyond the original feature model are even introduced in the UML counterpart, to work around the limitations of the available UML standard constructs. The concluding comment of Hein & al. is worth noting: "a UML representation is not appropriate for feature models". Also [7] mentions awkward adaptation of a standard CASE tool. Thus, it can be stated that these approaches cannot provide a generally applicable method for embedded software concept modelling, as defined in chapter 2.

State charts have also been proposed as a notation to represent feature behaviour [14, 15]. However, the variant behaviour, communication of which is the essence of feature modelling, is hard to represent in a graphical form [6, 14]. With graphic notations, a complementary text representation is usually required [6, 4], which presumably also holds true in this type of case. Furthermore, feature models of reasonable real-world size are likely to produce large diagrams which would be hard to handle, both on display and in printed form, even by expert engineers, not to mention other stakeholders of the concept definition process.
The challenge of concept modelling is closely related to engineering and management of requirements [4]. Plenty of requirements engineering tools are commercially available. These tools are primarily intended to support elicitation and storage of stakeholder requirements, grouping the requirements for further analysis (into system requirements), and tracking the origins of requirements and the effects of changes on related requirements. Tool support in these activities is essential to handle the amount of documentation, which increases with time. However, these tools as such are not likely to include domain-modelling capability [28]. Some of the tools, like Telelogic DOORS (formerly QSS DOORS) [27], contain APIs for extension of the tools' capabilities, which could be used to construct tool support for further development of requirements.

3. Application of feature modelling to product concept modelling

Our proposal for managing family products during the specification phase is based on applying feature modelling to manage the complexity involved in modelling variant software behaviour. Structured documentation is used to provide the formal descriptions needed to define the features and their connections, presented in chapter 2. Otherwise, the contents are presented as natural language text, supplemented with pictures and diagrams as usual in technical documentation. The document structure can be implemented using either of the structured text languages, SGML [16] or XML [17].

This chapter describes the principle of our solution. Some implementation details are presented in chapter 5, in the description of a practical application. In this description, the definition language is assumed to be SGML. There is a short discussion about SGML and XML and their differences at the end of this chapter. For detailed information on these languages please refer to [18].

3.1. Structured text approach to feature description

In structured documentation, the entire document processing automation is established via the document's structure. This logical structure of the document must correspond to the actions
required for producing the intended result document, which in this case is a valid product specification. Thus, in constructing a feature model, the features and their connections that form the basis of configuration have to be modelled in the document structure. In our structured text modelling scheme, each feature is basically presented as one structured document describing the feature's behaviour. The approach is illustrated in Figure 4 with the taximeter features.

The purpose of a feature description is to capture the user's understanding of the general capabilities of applications in a domain [6]. The description of a feature includes a detailed description of the user's intentions and the system's services, with the anticipated exceptions, related to that feature. The system's states and the user events that change them are described in domain terms, as well as how this activity is reflected to the user. A detailed feature specification may contain all the visual components of user interaction and all the terminology a user (and a developer) must know to operate the product. Dependency on the environment (e.g. hardware, standards) or on optional parts of the product or its software (i.e. other features) must be clearly expressed throughout the description of a feature.

The content structure of a structured document routinely reflects the layout to be produced, i.e. the titles, subtitles, paragraphs, and lists are located in the content structure. Above and beyond these, the content structure must reflect the logic actually comprising a feature, to simultaneously model both the generic product family platform and the specifications of individual products. Accordingly, the structures needed to represent feature connections also require special consideration. These aspects are dealt with in detail in the sections below.

3.2. Structured text implementation of the feature model

In structured documentation, a document's logical structure is defined by a document type definition, DTD, which must be defined for each document. (In XML, the DTD, though it usually exists, is optional.) The structure defines the elements that build up the document, in terms of other elements and basic data types. Elements may have attributes that describe their contents; the attributes themselves are also part of the document content.
Figure 4: Features as structured documents
Figure 5: Feature model representation as a structured document (recoverable diagram labels: "SubFeatureGroup", "Feature *")

The DTD, affiliated to each document, provides a syntactical definition of document contents and its meta-data. The basic idea is that the document's hierarchically organised elements form the content, and the meta-data is presented as attributes attached to these elements [19]. In practice, tools support attribute processing, and thus it usually pays off to define the parts of content requiring special treatment as attributes. (From this viewpoint, the attributes act like table fields in the relational data model.)

The document structure used to implement the feature model sketched above is demonstrated in Figure 5, with one change compared to the initial model. The features are not implemented here as separate documents; instead, they are wrapped inside grouping elements. With this enhancement, the whole feature model can be treated as a single structured document. This clarifies the description of the model's management, as the conceptually similar parts of the model can also be treated equally.

Figure 5 shows how the body of the feature model is accommodated hierarchically in a structured document DTD. Named boxes in the structure tree are structured-text container elements and in this case contain the feature description data. Rectangular connections to child elements denote that the children appear sequentially in the given order. Dimmed element borders denote optionality, and an asterisk denotes that the element may recur. Ovals next to the elements stand for attributes, i.e. meta-data of the elements.

The features of a product family are grouped, and one of the group's attributes expresses whether the features are alternative to each other or whether several of them can be selected. Each feature has a unique identifier and possible composition rules as its attributes.
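
The grouping structure described above can be illustrated with a small XML instance. The sketch below is not the DTD used in the paper: apart from `Feature` and `SubFeatureGroup`, which appear as labels in Figure 5, all element names (`FeatureModel`, `FeatureGroup`, `Description`) and attribute names (`id`, `optional`, `selection`, `requires`) are assumptions, as are the description texts. XML is used instead of SGML only because the Python standard library can parse it.

```python
import xml.etree.ElementTree as ET

# Hypothetical feature model instance following the grouping idea of Figure 5.
MODEL = """\
<FeatureModel>
  <FeatureGroup selection="any">
    <Feature id="Printer" optional="yes">
      <Description>Printer description text.</Description>
      <SubFeatureGroup selection="alternative">
        <Feature id="NormalPrinter">
          <Description>Normal printer description text.</Description>
        </Feature>
        <Feature id="WidePrinter">
          <Description>Wide printer description text.</Description>
        </Feature>
      </SubFeatureGroup>
    </Feature>
    <Feature id="Reports" optional="yes" requires="Printer">
      <Description>Reports description text.</Description>
    </Feature>
  </FeatureGroup>
</FeatureModel>
"""


def outline(element, depth=0):
    """Yield (nesting depth, feature id, attributes) in document order."""
    for child in element:
        if child.tag == "Feature":
            yield depth, child.get("id"), dict(child.attrib)
            yield from outline(child, depth + 1)   # nested sub-features
        else:
            yield from outline(child, depth)       # groups do not add depth


root = ET.fromstring(MODEL)
for depth, feature_id, attributes in outline(root):
    print("  " * depth + feature_id, attributes)
```

Because the meta-data lives in attributes, a generic tool can list or filter features without understanding the descriptive text inside them.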
The composition-rule attributes express whether the feature requires or excludes other features; these references to other features are expressed by listing the unique identifiers of the other features. One of the feature attributes is also used to express the optionality of the feature. The description can contain other features, to allow nesting of features. This structure implements the specialisation of a more general-level feature, described in chapter 2. The sub-features may also be alternative to each other, and therefore a wrapper group is needed here as well. Below the sub-feature group, the whole feature structure may be repeated recursively.

3.3. Comparison with the theoretical model

The relationship "consists-of", which in essence forms the feature hierarchy, is represented by including the child feature's description in the parent feature's structure. This is implemented by defining one (optionally iterative) element in the structure to contain the whole feature structure. The recursion thus formed allows nesting of documents that all obey the same generic structure; accordingly, the depth of nesting does not have to be decided on when the structure is designed, but only when the model is specified.

The and/or hierarchy of features can be represented with the nested feature structure. While
optionality of a feature can be defined in an attribute of its root element, alternative features require a common parent element that encloses the (sub-) features that are alternative to each other. This is a rational representation, as the alternative features can be thought of as specialisations of a more general category. In the parent feature, the general meaning and behaviour are specified, but the specialities of behaviour are included in the sub-feature descriptions. Whether a connection is to be established at compile time, load time, or runtime is specified along with the connection in another attribute.

Each feature must be named distinctively, and the name is positioned in an attribute of its root element. The composition rules are defined using these unique names. These connections are also attached to the root elements. To handle mutex rules, one attribute of the root element is reserved to list the unique names of other features that this feature excludes. Likewise, to handle requires rules, another attribute lists the features that are required to exist together with this feature. The mutex rule is bi-directional, and consequently this relationship must be recorded in the features at both ends of the connection.

To summarise this comparison, the reference feature model, with all the inter-feature relationships, can be represented with the proposed structured text feature model, as described in this section.

3.4. Technical basis of the solution

The feature structure described above can be presented using either of the structured text languages SGML or XML. SGML (Standard Generalized Markup Language, International Organization for Standardization, ISO 8879) [16] is an international standard for the definition of device-independent and system-independent methods of representing texts in electronic form. A common use of SGML is to construct user guides and reference manuals.
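
The naming and composition-rule conventions of section 3.3 lend themselves to mechanical checking. The sketch below assumes an XML rendering in which each `Feature` element carries hypothetical `id`, `requires`, and `excludes` attributes holding whitespace-separated feature names; these names and the function `check_composition_rules` are illustrative assumptions, not the attribute set of the DTD used in the paper.

```python
import xml.etree.ElementTree as ET

# Deliberately inconsistent example: the mutex is recorded at one end only,
# and Reports refers to a feature that is not in the model.
INCONSISTENT = """\
<FeatureModel>
  <Feature id="CardReader" excludes="GraphicsPrinter"/>
  <Feature id="GraphicsPrinter"/>
  <Feature id="Reports" requires="Printer"/>
</FeatureModel>
"""


def check_composition_rules(xml_text):
    """Return a list of naming and composition-rule problems in the model."""
    root = ET.fromstring(xml_text)
    features = {}
    problems = []
    for feature in root.iter("Feature"):
        name = feature.get("id")
        if name in features:
            problems.append(f"duplicate feature name: {name}")
        features[name] = feature
    for name, feature in features.items():
        # Every referenced feature must exist in the model.
        for rule in ("requires", "excludes"):
            for target in feature.get(rule, "").split():
                if target not in features:
                    problems.append(f"{name} {rule} unknown feature {target}")
        # The mutex rule is bi-directional: it must appear at both ends.
        for other in feature.get("excludes", "").split():
            if other in features and name not in features[other].get("excludes", "").split():
                problems.append(f"mutex {name} -> {other} is not recorded at {other}")
    return problems


for problem in check_composition_rules(INCONSISTENT):
    print(problem)
```

A check of this kind could run whenever the model is updated, so that a one-sided mutex entry is caught before a configuration is derived from it.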
Using SGML, the documentation can be authored in parts, possibly on subcontractors' premises, and then assembled into a complete product description. Utilising the document's logical content model, the automatic assembly can produce several forms of presentation from a single source, e.g. a product tutorial slideshow, cue cards to support service operations, or a product reference manual. The results of assembly can be automatically transformed, using style sheets that attach formatting to the logical content, to be distributed in various media, usually as printed documentation and in hypertext format on CD-ROM.

SGML is a widely used standard in technical documentation. Thus, plenty of tools are available. As an example, the SGML Buyer's Guide [20] lists more than a dozen tools in each of the following categories: structured text editors, transformation tools, and document database systems. Building the feature manipulation and storage system on top of an international standard thus gives options in selecting the implementation tools, thereby decreasing the risks involved in relying on only one tool supplier.

XML [17], on the other hand, is a subset of SGML, with the most hard-to-implement features removed, but retaining the elementary document structure capabilities. Consequently, what is said about SGML in this paper holds largely true for XML as well. Though XML is a new and still evolving standard, due to the simplification of the grammar, practically all SGML tools already support XML as well.

4. Use of the feature model in embedded product development

The feature model notation presented in chapter 3 is formal enough to satisfy the needs of automatic configuration processing [21], and thus it can support many tasks in the SSS family design process. Construction of a feature model is an iterative process, taking place in parallel to other system modelling. The concurrently executed modelling tasks support each other, and the results, i.e.
models, provide complementary views of the system from different perspectives. Our modelling scheme is designed to suit the needs of concurrent software development in the development of embedded products [22]. The feature model is used to
record the results of domain analysis into an integrated domain specification, and it is aimed at supporting concept modelling and requirements analysis of the individual products.

Figure 6: Feedback loop in the incremental domain modelling (diagram label: "Specifications of all the designed products")

Real-life domains are not absolutely stable, but in most cases changes in a domain tend to be gradual, and to a large extent, monotonic [11]. This implies that changes in products are largely extensions and variations that are consistent with the existing product knowledge. If the model were not updated according to these changes, it would decay over time. Thus, a practical reuse system must support a feedback loop from the domain model to new product design and back to the model.

The proposed feature model provides this feedback loop for continuous product family development, as depicted in Figure 6. The existing domain knowledge is available to all the stakeholders in concept design. Updating the domain model is straightforward when using a structured text authoring tool that is capable of operating on the feature-document structures. In incremental product family construction, the emerging product features can be recorded in the feature model either as new features in their own right, or by developing the existing features, whichever the designers find most appropriate in each case. A new feature's interactions with an existing feature can be recorded as properties in either of the features, as long as they are marked with the other feature by introducing the composition rule "requires", to keep a bond between the features and their common property.

The structure serves as a basis for automatic processing, and the model can be compiled into a list, table, or graphical presentation to provide variant views of the features and their relations. An SGML-aware document database system, set up on a server host and interfacing with the authoring tools on network clients, can be used to build up a shared feature model database.
In addition to the individual feature descriptions, information can be extracted from the database either in the form of a general product platform presentation, or as projected to the user-selected individual products. The latter form is a detailed specification of an individual product, with all the parts of the specification not related to the product removed.

During the concept generation, knowledge of existing products helps in focusing the survey and analysis effort (e.g. whether to aim at complementing the features of an existing product, or to create a completely new product). The model provides a database that can deliver an individual feature's description for inspection, to be compared with the proposed new features. The same database can be used as a source for ad-hoc queries, which can be used to assess which of the existing features the proposed new feature will be related to, and how the new feature interferes with the current features.
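
The projection of the platform model onto an individual product, mentioned above, can be sketched as a pruning of the document tree. The example assumes a hypothetical XML rendering of the model with `Feature` elements identified by an `id` attribute; the function name `project` and the sample platform are illustrative assumptions, and the paper's own tooling is SGML-based.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical platform model for the taximeter family.
PLATFORM = """\
<FeatureModel>
  <Feature id="Printer">
    <SubFeatureGroup>
      <Feature id="NormalPrinter"/>
      <Feature id="WidePrinter"/>
    </SubFeatureGroup>
  </Feature>
  <Feature id="CardReader"/>
  <Feature id="Reports"/>
</FeatureModel>
"""


def project(platform_root, selected_ids):
    """Return a copy of the platform that keeps only the selected features."""
    product = copy.deepcopy(platform_root)  # the platform itself stays untouched

    def prune(element):
        for child in list(element):
            if child.tag == "Feature" and child.get("id") not in selected_ids:
                element.remove(child)  # drops the feature and everything nested in it
            else:
                prune(child)

    prune(product)
    return product


platform = ET.fromstring(PLATFORM)
product = project(platform, {"Printer", "WidePrinter", "Reports"})
print([feature.get("id") for feature in product.iter("Feature")])
```

The sketch leaves emptied grouping elements in place; a fuller version would probably remove those as well and validate the selection against the composition rules first.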

5. Practical experiences in using the feature model

We have applied the presented feature modelling approach in an industrial setting. The next section describes the application environment, the role of feature modelling therein, and the benefits gained from using the structured-text presentation of the feature model. Thereafter, experience in using the structured text feature format is discussed.

5.1. Case implementation

The constructed feature model describes the features of a family of electronics products. The model consists of approximately five hundred features and sub-features and is managed by about fifty authors. When printed, the model produces about eighteen hundred A4 pages (the page count varies considerably depending on the parts of the descriptions included and the output format applied to them). Figure 7 illustrates the case's objectives in modelling the domain.

Application specifiers and graphics designers are responsible for feature authoring, and they have the feature model available on a network server. The model's conceptual integrity is managed via baselining: each feature update is targeted to a baseline, against which it will also be reviewed. References, either explicit or implicit, to other features are checked inside the baseline, to conform with the other feature descriptions (their versions) belonging to the same baseline. After the features belonging to a baseline have been accepted, the baseline is released and becomes available to all the stakeholders.

A feature specification release is produced first in SGML format, and subsequently translated to other office formats like PDF and HTML. Readers with SGML browsers available can benefit from the document structure while viewing the documentation. They can, for example, search for all the occurrences of a specific layout element based on its parameters, or view all those parts of the model that are specific to a particular hardware setting.
Predefined queries can be provided for common information requests. For those users who receive only the non-structured format, ready-made product specific views are produced from the platform release. As illustrated in Figure 7, the formal presentation of structured documentation can also serve the downstream tasks of a development process. During the specification of a product, when new features are recorded in the specification platform of the product family, a feature template guides the author to describe the correct things with the required accuracy. The effort made during specification pays back in the subsequent design steps. For software developers, the model provides a database of software capabilities - partially at an "as required" level, partially at an "as designed" level - with the commonalities and variabilities exactly expressed. Queries about feature availability and legal combinations can be made and the results used in subsequent design and product configuration activities. For automatic software configuration, the model provides a database of software capabilities and composition rules. From the software product line point of view, the featured requirements specification provides a good starting point to detect which parts of the software occur in most products - or are even compulsory in all products - and which parts are optional, occurring only on occasion. (In fact, the latter parts may not even be detected in the initial feature model of the family product, as the focus should thereby be on standard features of a family of systems, not on the rare exceptions.) For testing purposes, the feature model provides product-specific specifications of the product's intended behaviour. The points of common behaviour, which should look similar over the whole product family, or individual behaviour of the product (or the hardware or standard) can be emphasised to focus the testing effort on the selected aspect.
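The commonality query described above can be illustrated with a minimal sketch. It assumes that the feature model has already been projected to a plain mapping from products to feature names; the product and feature names below are hypothetical and are not taken from the case.

```python
from collections import Counter

# Hypothetical product-to-feature mapping, used for illustration only.
PRODUCTS = {
    "product_a": {"clock", "menu", "backlight"},
    "product_b": {"clock", "menu", "voice_dial"},
    "product_c": {"clock", "menu", "backlight", "games"},
}


def classify_features(products):
    """Split features into those present in every product and the rest.

    Returns (compulsory, optional), where optional maps each remaining
    feature to the number of products in which it occurs.
    """
    counts = Counter(f for features in products.values() for f in features)
    total = len(products)
    compulsory = {f for f, n in counts.items() if n == total}
    optional = {f: n for f, n in counts.items() if n < total}
    return compulsory, optional


compulsory, optional = classify_features(PRODUCTS)
print(sorted(compulsory))
print(sorted(optional.items()))
```

With this sample data, "clock" and "menu" are classified as compulsory, while the remaining features are reported together with the number of products that include them.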
Figure 7: Management and use of the specifications (diagram labels: Management, Customer documentation, Testing, Product projects, Software implementation, Usability; SGML, Other office formats; Centralised management: Baselining, Change control, Access control; Document database)

In user manual production, the specifications can be used as input as follows. The structured presentation of specifications provides a good starting point for detecting differences between the new product and the already existing members of the same product family. This information is in turn used to reproduce a starting point for the new product's user documentation from the already existing documentation. If reusable parts of the existing documentation can be detected, a considerable effort will be saved in translation, as only the actually changed parts of user documentation have to be translated. A documented composition of the end-product's functionality from atomic parts, provided by the structured text feature model, also assists in modifying the user documentation structure to favour this kind of text reuse. (The specifications in the feature model, produced to guide product development, are very detailed and technical in nature, and, based on our experience, must be rewritten to provide future users the information they actually need in using the product.)

The user-visible details of behaviour are finely marked (i.e. presented as parameterised structured text elements), and can be automatically pointed out in the specification - as projected to a particular product, if that is desired. This benefits the internationalisation and localisation work related to SSS-type products' development [23], in respect to both the product's software and its user documentation.

5.2. Discussion on the feature structure

The basic feature document structure used to implement the described behaviour is presented in Figure 8. The document structure is modified from the domain model presentation of chapter 3.
The sharp-angular boxes are unnamed wrapper groups, to allow arbitrary nesting of the description container elements, and the bifurcate connection to their child elements denotes "or", meaning any one of the containers may appear.
Figure 8: Augmented feature structure of the case (diagram element labels recoverable from the original: DocumentID, Graphics, ListOrdered, ListUnordered)

The grouping of features is removed from the document structure, and the features are represented as separate documents. This design decision has been made for technological reasons concerning implementation: the separate feature documents are easier to manage in our document database implementation.

There are also extensions in the feature structure. As presented earlier, each feature has a body that describes its behaviour. In addition, the structure contains common document header and footer information, with change histories, glossaries and indexes. This per-feature information can be merged during the release generation. As an example, by merging the glossaries included in the feature specifications we can create a data dictionary of the domain. To complete the data dictionary, also introductory parts of the feature descriptions may be included there.

The actual descriptions of features are organised as sections, which may be nested. The leaf level elements stand for the section content, consisting of standard structured text blocks, graphics entities, lists and tables. The detailed descriptions of these are omitted from the diagram. A real-life feature structure is likely to contain several dozens of elements. The presented structure is simplified to focus on the elementary idea of presenting the feature as a structured text. (The extra elements would express various details, e.g. the change history typically consists of a list of change entries, each entry consisting of the name of the author, description of the change, and dates of committing and accepting the change.)

Specialisation of a feature is presented as sub-features in the feature model. This always creates another full-fledged feature, with its own relationships to other features. In practice, however, a substantial part of the software variation is due to accommodation of changes in the environment.
The environmental changes include: various existing standards, of which some may be selected to be followed; various user environments; or variation in the hardware, e.g. different display types and input devices. Representing all variation as features and sub-features would increase the total amount of features considerably. This can be avoided by allowing exceptional behaviour to take place inside a feature. For this purpose, the document structure contains attributes used to represent constraints, as depicted in the diagram adjacent e.g. to the section element. The constraint attributes, when present, restrict the described behaviour to the environment given as the constraint. Constraints can be nested: a particular behaviour may be specific to a particular standard applied in a particular hardware. It is advantageous to categorise the constraints, as thereafter the reliability of the model can be enhanced with simple rules in writing, controlled automatically by the supporting system.
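The effect of nested constraint attributes can be illustrated with a minimal sketch. It makes several assumptions that are not part of the case implementation: XML stands in for SGML so that the Python standard library parser can be used, and the element names (feature, section, para), the constraint attribute names (hardware, standard) and all sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical feature document; XML is used here in place of SGML.
FEATURE_XML = """
<feature id="call-log">
  <section title="Basic behaviour">
    <para>The last calls are listed, newest first.</para>
    <section title="Small display" hardware="display-small">
      <para>One call is shown per screen.</para>
      <section title="Standard X" standard="std-x">
        <para>The call type is shown as an icon.</para>
      </section>
    </section>
    <section title="Large display" hardware="display-large">
      <para>Five calls are shown per screen.</para>
    </section>
  </section>
</feature>
"""

# Assumed constraint categories; a real DTD would define its own set.
CONSTRAINT_ATTRIBUTES = ("hardware", "standard")


def applies(element, environment):
    """True if every constraint attribute on the element matches the environment."""
    return all(
        element.get(name) is None or element.get(name) == environment.get(name)
        for name in CONSTRAINT_ATTRIBUTES
    )


def project(element, environment):
    """Remove, in place, every descendant whose constraints exclude the environment."""
    for child in list(element):
        if applies(child, environment):
            project(child, environment)  # nested constraints are checked in turn
        else:
            element.remove(child)  # the whole constrained subtree is dropped
    return element


def section_titles(root):
    """Titles of the remaining sections, in document order."""
    return [s.get("title") for s in root.iter("section")]


root = project(
    ET.fromstring(FEATURE_XML),
    {"hardware": "display-small", "standard": "std-y"},
)
titles = section_titles(root)
print(titles)
```

Because a constrained section is removed together with its children, a nested constraint only has to state what it adds to the constraints of its ancestors, which corresponds to the nesting described above.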
The same mechanism can be used to represent feature collaboration. In this case the constraints are not specific to the environment, but to other features whose existence interferes with the behaviour of this feature.

For explanatory purposes, a specific design rationale element can be defined. SGML allows freely locatable elements to be defined, i.e. elements that are outside the feature model but still belong to the document. In this way, the design rationale can be included in arbitrary parts of the domain model. From there, the rationale can be viewed and queried like any part of the model, but filtered out as necessary, for example from the published product specification documents.

It is also rational to integrate the data dictionary into this same model, as we have done in the case study. The data dictionary defines the vocabulary shared by all the stakeholders and needed to understand the domain.

6. Analysis of the approach

Compared to the related research presented in chapter 2, there are two elementary differences, besides the different notation. First, contrary to the FODAcom type approaches [7, 8, 9] described in chapter 2, our approach uses a single model to represent both the features and feature interrelationships. Second, compared to [9], in our solution all the product features, be they compulsory or optional to every product, are presented in the same model, the structured text feature model, at the level of detailed software specification.

As the work of Hein & al. [9] is the most elaborated of these similar approaches, it is reasonable to use it as the basis to further analyse our approach. In our solution, the ability to constrain a feature description is used to express parts common to two features that are in separate branches of the feature tree. An equivalent to our constraint construct, which imposes connections conflicting with the tree hierarchy, cannot be located in their approach.
This may be due to the separation of the composition rules, called a secondary structure, which are presented apart from the feature tree hierarchy. Hein & al. present two alternative ways to include these inter-feature relationships in the model: a table or rule presentation, and inclusion as variants in the feature tree. Their solution adheres to the former, while our solution adopts the latter alternative. Yet we have encountered neither the mentioned problem of subtree explosion nor the subsequent difficulties in managing implicit feature combinations. This may be attributable to our straightforward modelling scheme, which supports finding decompositions that reduce complexity, or to the option to elaborate the constrained feature parts into sub-features, enhancing their distinct relationships with the remaining feature tree.

The ability to constrain the feature description, either with environmental aspects or in relation to other features, is also an essential part of our model. Hein & al. attend to this otherwise missing aspect by extending their composition rule management with optional "consists of" relations. This solves the original problem of omission in the model implementation, but may present new ones originating from the thus defined, apparently redundant, new feature-hierarchy representation.

As already mentioned, despite their explicit effort, Hein & al. [9] could not exploit UML modelling to represent feature models. They ended up using a requirement management tool, Telelogic (QSS) DOORS, which supports tree presentations of elicited requirements, to implement the feature hierarchy. Judging from the result, it seems that both the feature tree hierarchy and the composition rules can be presented using the requirement management working method of DOORS. However, this approach ends up in a split representation of the model, with shortcomings in implementing some characteristics of the model, as discussed above.
In the related research referred to above, the focus is on the software architecture design. Even though our focus is on providing a common ground for all the stakeholders of concept and product specification and the design team, our solution can also provide configuration information which is directly usable in the software production line.
As a consequence, the modelling scheme presented here can be said to "carve the world at its joints" [24], as it conceptualises abstract domain entities and their interactions in products' behaviour, to produce a homogeneous feature specification of a family of SSS products. It is important to note the relationship between the DTD and the feature model. The DTD is a definition of the syntactical relationships of parts of features. The actual feature model is presented as contents of elements and attributes, i.e. as the semantic contents of the document structure. Accordingly, SGML as such does not control the validity of the feature model; this is a task an application must do. The method presented to model embedded software products only identifies components, but falls short in creating the components ready for reuse. Constructing the reusable components, as well as the other parts of software production line architecture, are left to be done in the subsequent phases of software architecture development. For example, [29] lists three other views, module, execution, and code view, in addition to the conceptual view provided by our method, which all ought to be worked off in software architecture development. Each of the views describes a different kind of structure. Iterative and incremental development of all these views is required to achieve a software architecture that can guide the implementation task, including detailed design, coding, integration, and testing. Making use of these views can be seen as a continuum to the divide-and-conquer approach practised in our conceptual modelling approach. The first results of the study show that the approach works in an industrial setting. They also show that some stumbling stones can be found in the pursuit of getting the authoring and dissemination of the feature model to run smoothly. Of the two structured text languages, SGML was selected as the implementation language. 
The primary reason was that at the time when the decision was made, XML was taking its first steps. SGML, on the other hand, had already been available for technical writing for one decade. It was considered that a known standard is better than a new one, no matter how promising it is. Basic SGML authoring services are provided by a structured text editor. Structured document management systems (DMS), when integrated with the editing tool, can provide extra services. The DMS can manage baselining and record version histories. Together they can establish an effective working environment to manage the domain model, with version and access control handled transparently to the system's user. As the two types of tools are basically designed to complement each other and are based on the same standard, it is easy to underestimate the effort required to integrate them, first together, and subsequently the result combination to an engineering environment. Sometimes a part of the required behaviour could be implemented on either end, server or client, and in these cases a decision has to be made considering e.g. user preferences, response times over a network, scalability, the implementation effort now and future expandability. Time-consuming experimental arrangements may be needed to argue for a design decision. Furthermore, in practice, selecting a tool from one category, e.g. an editor, limits the alternatives available in the other category. Once the feature model is stored into the DMS, a selected baseline can be exported and translated into PDF format, for print delivery, or into HTML format, for online hypertext presentation. These more or less standard representations excluded, proprietary interface formats may be needed when adapting to a preset production environment. At least the proprietary interfaces require application design and making oneself familiar with the tools' application programming interfaces. 
In the hypertext presentation, links between related parts in the documentation are desired. SGML has one shortcoming related to link management: it does not support linking across document borders, i.e. references whose target element is in another document are not allowed. Consequently, not all the tools support inter-document references, though there are workarounds to this deficiency. Inter-document links would be needed in our solution, as the domain model consists of several documents. Besides the ordinary references to
supplementary material in other documents, the links could also be used to express the composition rules of the feature model. Links would be defined during document authoring, and written references should be seen when the documents are printed, whereas active links should be produced for the hypertext version of a release. A DMS with good linking support would guide the authors in link creation, to rule out deficient links. The validity of links should also be checked in the delivery of a release, particularly when the release is projected to describe only one product of the family.

One workaround to the link management problem could be to implement the domain model as one huge document, where all the links would point inside the same document. Some structured document database systems support document management at the element level, which could be used to divide the document into smaller pieces as necessary, e.g. for concurrent authoring purposes. Here, a compromise must be made between the overall clarity of the model and its technical feasibility.

There is also an SGML extension called HyTime [25], which implements inter-document references. Regrettably, this extension is readily available in only some of the SGML tools. A practical contemporary approach would be to use XML instead of SGML. Even that would not solve the basic challenge of link management: how to refer to another document in the database in such a way that a guarantee can be given of there being no deficient references in the releases. In releasing, it must be checked that each link points to material belonging to this release. If either the link source or the target is missing, i.e. is not a part of this particular baseline, the designer responsible for the release should be notified of an error in the baseline composition. Managing links in the product-specific views, where parts of the domain model are not visible, presents a particular challenge.

7. Conclusions

The foundation of software platform development, or software product line design, consists of the detection of domain entities and the analysis of their variability and commonality. Our approach is intended to particularly support the upstream activities, concept design and requirement specification, of platform development for SSS products. Concerning the use of a software platform and its incremental development, the method proposed provides a conceptual model and terminology (i.e. named features and variables) that are compatible with the contemporary software assets. This helps in detecting available reusable components and in accommodating new features (of incoming products) among the existing reusable assets.

Our method is based on applying feature modelling to manage the complexity involved in modelling variant software behaviour. Structured text presentation, implemented by SGML or XML, is used to provide the formal descriptions needed to define the features and their connections. Based on the structured text paradigm, we can build a feature model that is easy to keep up to date technically but is still flexible in services. The other benefits of using these standard representations are the freedom to pick the most appropriate tools from a wide variety of options, and commonly, a fairly easy integration of tools with the other tools used in the software development environment.

An essential part of our future work will be to harness a structured text management system to support the potential of the feature model's structured document presentation. The additional value of a document management system (DMS), besides version management and access control, would be the delivery-in-place of the domain model in the network. As Kang & al. [6] have already foreseen, sophisticated automated support for the interactive display of the feature model, accomplished by hypertext techniques, can make the feature diagram redundant and hence unnecessary. A DMS can also support the authoring of the specifications, e.g. the specification of the feature interrelationships and the validation of a feature model baseline. An alternative development direction is to integrate the feature model
with a CASE environment, in a way that feature descriptions and the corresponding software components could be developed synchronously. This would support the close matching of feature description and the corresponding software modules in the incremental application development. The presentation format constructed combining feature modelling with structured text is particularly suitable for families of usability-intensive electronics products, as in their design there is a strong emphasis on communication in the early stages of development. But, as this paper explains, the proposed model to handle commonality and variability can benefit also the later stages of the development cycle in designing software for families of electronics products.
Acknowledgements

I would like to thank my colleagues Susanna Karinen and Jari Ensomaa, in our Product Data Management research group, for their effort in building the case application. I would also like to thank the fellow members of the groups of Software Architectures and Knowledge Engineering for their advice; thank you, Mr. Tuomas Ihme, Dr. Eila Niemela, Mr. Jarmo Kalaoja, and Mr. Johan Plomp, for the numerous discussions that have helped me in conceptualising my ideas.
References

[1] R. Stevens, P. Brook, K. Jackson, S. Arnold, Systems Engineering: Coping with Complexity. ISBN 0-13095085-8. Prentice Hall, Great Britain, 1998.
[2] J. Taramaa, Practical development of software configuration management for embedded systems. ISBN 951-38-5344-6. VTT Publications 366, Espoo, Finland, 1998.
[3] K. T. Ulrich, S. D. Eppinger, Product design and development. ISBN 0-07-065811-0; 0-07-113742-4. McGraw-Hill, New York, 1995.
[4] I. Sommerville, P. Sawyer, Requirements engineering: a good practice guide. ISBN 0-471-97444-7. John Wiley & Sons, Chichester, England, 1997.
[5] H. Perunka, E. Niemela, J. Kalaoja, Feature-Oriented Approach to Design Reusable Software Architectures and Components in Embedded Systems. ESI, European Reuse Workshop '97 Position papers and presentations, Brussels, 26–27 Nov 1997, pp. 74–77.
[6] K. C. Kang & al., Feature-oriented domain analysis (FODA) feasibility study. Technical Report CMU/SEI-90-TR-21, ESD-90-TR-21. Software Engineering Institute, Carnegie-Mellon University, PA, 1990.
[7] A. D. Vici, N. Argentieri, A. Mansour, M. d'Alessandro, J. Favaro, FODAcom: An Experience with Domain Analysis in the Italian Telecom Industry. Proceedings of the 5th International Conference on Software Reuse, IEEE Computer Society Press, Los Alamitos, USA, 1998, pp. 166–175.
[8] M. L. Griss, J. Favaro, M. d'Alessandro, Integrating Feature Modeling with the RSEB. Proceedings of the 5th International Conference on Software Reuse, IEEE Computer Society Press, Los Alamitos, USA, 1998, pp. 76–85.
[9] A. Hein, M. Schlick, R. Vinga-Martins, Applying Feature Models in Industrial Settings. In P. Donohue (ed.), Software Product Lines - Experience and Research Directions, Kluwer Academic Publishers, Dordrecht, The Netherlands, 2000, pp. 47–70.
[10] D. Parnas, On the Design and Development of Program Families. IEEE Trans. Software Eng., 2 (1976), No. 1, pp. 1–9.
[11] G. Arango, R. Prieto-Diaz, Domain Analysis Concepts and Research Directions. In R. Prieto-Diaz, G. Arango (eds.), Domain Analysis and Software System Modelling, ISBN 0-8186-8996-X. IEEE Computer Society Press, Los Alamitos, USA, 1991, pp. 3–26.
[12] K. C. Kang, S. Kim, J. Lee, K. Lee, Feature-Oriented Engineering of PBX Software for Adaptability and Reusability. Software - Practice and Experience, 10 (1999), pp. 875–896.
[13] I. Jacobson, M. Griss, P. Jonsson, Software Reuse: Architecture, Process and Organization for Business Success. ISBN 0-201-92476-5. ACM Press, New York, 1997.
[14] T. Jokela, Modelling the External Behaviour of Electronic Products. ISBN 951-38-47748. Technical Research Centre of Finland, VTT Publications 236, Espoo, Finland, 1995.
[15] B. W. Choi, K. B. Jang, C. H. Kim, K. S. Wang, K. C. Kang, Development of Software for the Hard Real-Time Controller Using Feature-Oriented Reuse Method and CASE Tools. Proceedings of the IEEE International Symposium on Computer-Aided Control System Design, 22–27 Aug 1999. IEEE Control Systems Society, pp. 126–131.
[16] C. F. Goldfarb, Y. Rubinsky, The SGML handbook. ISBN 0-19-853737-9. Clarendon Press, Oxford, 1990.
[17] R. Cover, Extensible Markup Language (XML). Organization for the Advancement of Structured Information Standards, http://www.oasis-open.org/cover/xml.html. Last edited 15 Jan 2001.
[18] R. Cover, The XML Cover Pages. Organization for the Advancement of Structured Information Standards, http://www.oasis-open.org/cover/. Last edited 18 Jan 2001.
[19] E. Maler, J. E. Andaloussi, Developing SGML DTDs: From Text to Model to Markup. ISBN 0-13309881-8. Prentice Hall, Upper Saddle River, NJ, 1996.
[20] C. F. Goldfarb, S. Pepper, C. Ensign, SGML Buyer's Guide: A Unique Guide to Determining Your Requirements and Choosing the Right SGML and XML Products and Services. ISBN 0-13-681511-1. Prentice Hall, New Jersey, 1998.
[21] D. Sabin, R. Weigel, Product Configuration Frameworks - A Survey. IEEE Intelligent Systems & Their Applications, 13 (1998), No. 4, pp. 42–49.
[22] P. Savolainen, Model Based Management of Product's Design Specifications. In P. Ghodous, D. Vandorpe (eds.), Advances in Concurrent Engineering - CE2000. ISBN 1-58716-033-1. Technomic Publishing Company, Lancaster, PA, 2000, pp. 187–194.
[23] S. Kokkotos, C. D. Spyropoulos, An Architecture for designing internationalized software. Proceedings of the 1997 8th IEEE International Workshop on Software Technology and Engineering Practice, STEP, 14–18 Jul 1997, pp. 13–21.
[24] B. Chandrasekaran, J. R. Josephson, V. R. Benjamins, What are ontologies, and why do we need them? IEEE Intelligent Systems & Their Applications, 14 (1999), No. 1, pp. 20–26.
[25] HyTime. ISO 10744:1997 - Hypermedia/Time-based Structuring Language (HyTime), 2nd Edition, R. Cover (ed.), http://www.oasis-open.org/cover/hytime.html. Last edited 29 Jan 1999.
[26] C. McClure, Software Reuse Techniques: Adding Reuse to the System Development Process. ISBN 0-13661000-5. Prentice Hall PTR, Upper Saddle River, NJ, 1997.
[27] Telelogic DOORS, Info Center, http://www2.telelogic.com/doors/infocenter/.
[28] O. Gotel, Systems Requirements Engineering. 7th International Summer School in Novel Computing, Joensuu, Finland, 14–18 Aug 2000. (Dr. Olly Gotel, Department of Computer Science, University College London, Gower Street, London WC1E 6BT. http://www.cs.ucl.ac.uk/staff/o.gotel/)
[29] C. Hofmeister, R. Nord, D. Soni, Applied Software Architecture. ISBN 0-201-32571-3. Addison Wesley Longman, Inc, Reading, MA, USA, 1999.
Information Modelling and Knowledge Bases XIII H. Kangassalo et al. (Eds.) IOS Press, 2002
A Component-based Application Framework for Context-driven Information Access
Mina AKAISHI*, Nicolas SPYRATOS** and Yuzuru TANAKA*
*Meme Media Laboratory, Graduate School of Engineering, Hokkaido University, 060-8628 Sapporo, Japan
**Laboratoire de Recherche en Informatique, Universite de Paris-Sud, LRI-Bat 490, 91405 Orsay Cedex, France
E-mail: {mina|tanaka}@meme.hokudai.ac.jp, [email protected]

Abstract. We propose an application framework in which the contents of an information base are organized into contexts [1,2] and information is accessed using a path language for context traversal. The notion of context [1,2] serves as a conceptual modeling mechanism for organizing and managing very large information bases. The framework is component-based, and we provide an implementation of contexts based on the IntelligentPad system.
Keywords: Information Base, Context, Query, Path Language, Information Retrieval
1. Introduction

Computers and networks are rapidly expanding into many fields of application, and users can both share the knowledge of others and publish their own knowledge through the Internet. As a consequence, vast amounts of information are accumulating at an accelerating pace. To cope with such volumes of information, users need efficient management methods and search mechanisms. Traditional database management systems do provide efficient methods for managing information, as well as expressive query languages for accessing desired information subsets [3–8]. However, these systems are designed for managing structured data, i.e., data conforming to a pre-defined schema, and provide little help in organizing and managing semi-structured or unstructured data, such as the data typically found on the Internet.

In this paper, we introduce the notion of context proposed by M. Theodorakis et al. [1,2] as a conceptual modeling mechanism for organizing and managing very large information bases, together with a path language for context traversal. In computer science, notions of context have appeared in several areas, such as artificial intelligence [9,10], software development [11–14], databases [15–18], machine learning [19], and knowledge representation [20–22]. These notions are very diverse and serve different purposes. In information bases, a context is seen as a higher order conceptual entity that groups together other conceptual entities from a particular standpoint. Contexts allow us to focus on the objects of interest, as well as to name each of these objects using
M. Akaishi et al. /A Component-based Application Framework
one or more convenient names. According to [1,2], a context is defined as a set of objects within which each object has a set of names and possibly a reference to some other context. An object may belong to more than one context, with arbitrary names in each context. Users can arbitrarily group together various objects in a single context.

Accessing information in a contextualized information base is performed through reference and name paths. These paths provide a means to traverse object references from one object to another. Primitive and macro functions are provided for querying such information bases. The macro operations allow users (i) to focus on the context of interest, (ii) to search accessible contexts for objects with specified names, (iii) to switch to an alternative path of contexts based on cross reference information, and (iv) to compare contexts based on similarity of paths.

We propose a component-based application framework for context-driven information access, and provide an implementation of contexts based on the IntelligentPad system [23,24]. The IntelligentPad system provides its users with a toolkit for the construction of various interactive media objects, including multimedia documents, desktop tools, and application systems. Every component is presented as a media object, called a pad. Users can easily compose any document or any tool by directly pasting pads on top of other pads. The pads can carry a variety of knowledge resources, replicate and recombine themselves, and adapt to various already existing pads. Components in this framework are realized as pads, such as the context management pads and the context retrieval pads. The context management pads access an Oracle database [25] to register or update the context information. The context retrieval pads issue queries to the contextualized information base and return results that may be represented by arbitrary pads, such as an image pad, a movie pad, a text pad and so on. Users can arbitrarily combine such context management pads and context retrieval pads with other existing pads to create new applications. Moreover, pads can be published worldwide through the Internet by embedding them in an arbitrary web page, from which users can easily download those pads for local reuse.

The remainder of the paper is organized as follows. In section 2, we introduce the notion of context and the path language for context traversal. In section 3, we present an application of contextualized information bases, derived from the World Wide Web. In section 4, we present the application framework of the context. Finally, in section 5, we make some concluding remarks.
2. The Concept of context
In this section, we explain the context model and the query language proposed by M. Theodorakis et al. [1,2]. A context is a set of objects of interest, each object having a set of names and possibly a reference to another context. A context is regarded as a modular representation of information. It can model an object under different perspectives. This contextualization can be used orthogonally to the usual abstraction mechanisms.
2.1 The notion of context
A context c is a set of objects, denoted by objs(c), such that each object o is associated with (1) a set of names, called the names of o in c, denoted by nam(o, c);
M. Akaishi et al. /A Component-based Application Framework
(2) zero or one context, called the reference of o in c, denoted by ref(o, c). An object can belong to different contexts and may have different names and/or references in each context. That is, names and references are context-dependent. An element of a context's contents is represented as a triplet consisting of an object id, a set of names and a reference. The notion of context supports a simple and straightforward way of referencing objects at any level of detail. An object can belong to one or more different contexts. The same object can have different names in different contexts. Two different objects can have the same name within the same context. The same object can have different references within different contexts.
2.2 Accessing information through paths
Accessing information in an information base often involves navigating from one object to another by following links. From within a given context, we can reach any object that belongs to the reference of an object within that context and, recursively, any object that lies on a path. Navigation is based on the notion of path. There are two kinds of paths: (i) an object path, which is a sequence of dot-separated objects, and (ii) a name path, which is a sequence of dot-separated names. As the reference of an object within a context is itself a context, references provide a means to traverse from an object o of a context c to objects of another context via the reference of o in c. The sequence of traversed objects constitutes a kind of path, which is called a reference path. Reference paths form the basis for reaching objects in a context by navigating through the references of objects.
2.3 Querying Contextualized Information Bases
The paper [1] formally describes operations for querying contextualized information bases. There are three groups of functions: primitive operations, fundamental operations and macro functions. Access to information is achieved using a path language for context traversal.
(1) Primitive operations
Primitive operations are classified into three kinds, namely context primitives, path primitives, and matching primitives. (i) Context primitives are the functions used in the definition of context. (ii) Path primitives are the functions used to extract parts of a path or to derive information about a path. (iii) Matching primitives are the functions that match name paths with reference paths.
(2) Fundamental operations
The select operation extracts the contents of a context that satisfy a given predicate. The output of this operation is a new context that contains the selected triplets. The project operation isolates certain elements of a context's contents, such as the objects of the context, the sets of names bound to the objects, or the references of the objects. The generate operation is provided to create new triplets by combining the contents of given contexts. There are two path selection operations: the reference path selection, which returns reference paths, and the name path selection, which returns name paths. Each takes as input a context c and a predicate, and returns those reference paths or name paths in c that satisfy the input predicate. Often we are interested not only in selecting objects satisfying some criteria but also in being aware of the path we can follow to reach these objects. This is useful in order (i) to choose the desired path to reach objects of interest, (ii) to see in which paths the desired information is embedded and to choose an object in the path to explore relevant information, and (iii) to find different representations of the same object.
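The context model of Section 2.1 and the fundamental operations above can be illustrated with a minimal Python sketch. It is not the formal definition given in [1]: the class name Triplet, the encoding of an information base as a dictionary of contexts, the function names and the depth bound max_depth are all assumptions made for illustration, and the sample data is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Triplet:
    """One element of a context: object id, names in the context, reference."""
    oid: str
    names: FrozenSet[str]
    ref: Optional[str] = None  # id of the referenced context, or None


# Assumed encoding: a context is a list of triplets, and an information
# base maps context ids to contexts.
Context = List[Triplet]
InformationBase = Dict[str, Context]
Predicate = Callable[[Triplet], bool]


def select(context: Context, pred: Predicate) -> Context:
    """Return a new context holding the triplets that satisfy pred."""
    return [t for t in context if pred(t)]


def project_objects(context: Context) -> List[str]:
    """Isolate the object ids of a context."""
    return [t.oid for t in context]


def reference_paths(ib: InformationBase, cid: str, pred: Predicate,
                    max_depth: int = 4) -> List[Tuple[str, ...]]:
    """Collect object paths that start in context cid and end at a triplet
    satisfying pred. max_depth bounds the traversal so that cyclic
    references terminate (an assumption of this sketch)."""
    found: List[Tuple[str, ...]] = []

    def walk(current: str, path: Tuple[str, ...], depth: int) -> None:
        for t in ib.get(current, []):
            new_path = path + (t.oid,)
            if pred(t):
                found.append(new_path)
            if t.ref is not None and depth < max_depth:
                walk(t.ref, new_path, depth + 1)

    walk(cid, (), 1)
    return found


# Hypothetical sample data: object o2 appears in two contexts.
ib: InformationBase = {
    "c1": [Triplet("o1", frozenset({"English"}), "c2"),
           Triplet("o2", frozenset({"Japanese"}), "c3")],
    "c2": [Triplet("o2", frozenset({"Japanese"}), "c3"),
           Triplet("o3", frozenset({"Webmaster"}), None)],
    "c3": [],
}

selected = select(ib["c1"], lambda t: t.ref == "c2")
paths = reference_paths(ib, "c1", lambda t: "Webmaster" in t.names)
```

In this sketch the predicate is applied to the last triplet of each path only; the path selection operations of [1] accept more general predicates.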
(3) Macro functions
The function look-up(c, n) is a useful high-level operation. It takes as its input a name n and returns the set of name paths ni from the specified context c such that ni ends with the name n. This operation is especially useful in environments such as the Web, where users seeking information follow a three-step approach: (i) they focus on a context c, (ii) they give a name n (that is, a keyword they know), and (iii) they consult look-up(c, n) to decide which context to focus on next. Roughly speaking, the name n is a keyword that the user associates with the information being sought, and the result of the operation is the set of all name paths related to that information. Each of these name paths describes a different information path in which the desired information is embedded. The function cross-ref(np, c) is another high-level operation. Given a name path np of an object in the current context c, we often need to know the name paths through which we can reach the same object if we start our navigation from a different context. This is useful because we can find alternative representations of the same object in different contexts, as well as the paths to reach these representations. We will refer to this different context as the cross-reference context.
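The two macro functions can be sketched in Python as follows. This is an illustrative reading of the description above rather than the definition in [1]: the tuple encoding of context entries, the helper name_paths, the depth bound, exact matching on the last name, and the explicit other_cid argument for the cross-reference context are assumptions, and the sample data is hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

# Assumed encoding: context id -> list of (object id, names, referenced context id).
Entry = Tuple[str, Set[str], Optional[str]]
InformationBase = Dict[str, List[Entry]]
NamePath = Tuple[str, ...]


def name_paths(ib: InformationBase, cid: str,
               max_depth: int = 5) -> Iterator[Tuple[NamePath, str]]:
    """Yield (name path, object id) pairs reachable from context cid."""
    def walk(current: str, prefix: NamePath, depth: int):
        for oid, names, ref in ib.get(current, []):
            for name in sorted(names):
                path = prefix + (name,)
                yield path, oid
                if ref is not None and depth < max_depth:
                    yield from walk(ref, path, depth + 1)
    return walk(cid, (), 1)


def look_up(ib: InformationBase, cid: str, name: str) -> Set[NamePath]:
    """Name paths from context cid whose last name equals the given name."""
    return {path for path, _ in name_paths(ib, cid) if path[-1] == name}


def cross_ref(ib: InformationBase, np: NamePath, cid: str,
              other_cid: str) -> Set[NamePath]:
    """Name paths from other_cid that reach the object denoted by np in cid."""
    targets = {oid for path, oid in name_paths(ib, cid) if path == tuple(np)}
    return {path for path, oid in name_paths(ib, other_cid) if oid in targets}


# Hypothetical sample data: the object o_lab has different names in c2 and c3.
ib: InformationBase = {
    "c1": [("o_en", {"English"}, "c2"), ("o_ja", {"Japanese"}, "c3")],
    "c2": [("o_ja", {"Japanese"}, "c3"),
           ("o_lab", {"Meme Media Laboratory"}, None)],
    "c3": [("o_lab", {"Lab"}, None)],
}

found = look_up(ib, "c1", "Japanese")
alternatives = cross_ref(ib, ("English", "Meme Media Laboratory"), "c1", "c3")
```

Name paths are kept as tuples here and would be joined with dots only for display, which avoids ambiguity when a name itself contains a dot.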
3. Contextualized Information Base for WWW
The WWW forms a large contextualized information base. We can regard a web page as an object. A set of objects is embedded in the web page. Such objects are grouped together under the viewpoint of the embedding page, so the set of these objects can be regarded as a context. The names of each object are shown in the page as hyperlinked text. The reference of an object is another context, derived from the web page that is associated with the object. The result is a large contextualized information base. Existing WWW browsers and hyperlink systems already provide the function of accessing information through references. In addition, query functions for contextualized information bases allow the user to find interesting information within a context. Users can compare or analyze the objects from different viewpoints, namely from different contexts.
3.1 Context derived from HTML document
In the World Wide Web, the location of each object is represented as a URL. URLs are regarded as pointers to objects. Users can browse pages by following the hyperlinks that are embedded in each page. A hyperlink in the WWW is tagged by text or an image in an HTML document. When a hyperlink is represented by text, the text is regarded as the name of the object that is pointed to by the associated URL. The name of the object depends on the page, that is, on the context. For example, let us look at our laboratory's top page (http://www.meme.hokudai.ac.jp). Figure 1(a) shows a screen capture of our home page. Two anchors are embedded in this page. One anchor is shown as the text "English", which is linked to the English page of our laboratory. So "English" is the name of the English page in the context of the top page. According to this interpretation, we derive contexts from HTML documents as follows. In HTML documents, an anchor tag is used to specify the pointer to any URL. The anchor tag starts with "<A ...>" and ends with "</A>". The HREF attribute of an anchor tag must be given to specify the pointer to the object as a URL. The text that is written between
the start and end tags of an anchor is regarded as the name of the object in that HTML document. Let us look at our laboratory's top page again. Figure 1(b) shows the source file of this web page. The parser of the web page for context (PWPC) finds the first anchor tag. Then it takes the object ('../index-e.html') and the name ('English') that is associated with the object in this page. The next anchor tag is processed in the same way. Finally, Figure 1(c) shows the context that is derived from the source file of Figure 1(b). In the same way, the context c2 is derived from the URL (http://www.meme.hokudai.ac.jp/index-e.html) that is specified by the object in the context c1. The context c2 shows more details about the Meme Media Laboratory in English.

Fig. 1 (a) The top page of the Meme Media Laboratory (Meme Media Lab, Hokkaido Univ.).
Fig. 1 (b) The source file of the web page in (a), annotated with the anchor tags, the pointer to the object, and the name of the object.
Fig. 1 (c) The context c1 that is derived from the top page (http://www.meme.hokudai.ac.jp)
Names: object -> reference
English: http://www.meme.hokudai.ac.jp/index-e.html -> c2
Japanese: http://www.meme.hokudai.ac.jp/index-j.html -> c3
Fig. 1 (d) The context c2 that is derived from http://www.meme.hokudai.ac.jp/index-e.html
Names: object -> reference
Japanese: http://www.meme.hokudai.ac.jp/index-j.html -> c3
Meme Media Laboratory: http://www.meme.hokudai.ac.jp/map.html -> c4
Laboratory of Computer Architecture Engineering: http://ca.meme.hokudai.ac.jp/index.html -> c5
Laboratory of Database Engineering: http://db-ei.eng.hokudai.ac.jp/index_e.html -> c6
Laboratory of Visual-Information Science and Engineering: http://indigo.media.eng.hokudai.ac.jp/index-e.html -> c7
Laboratory of Information Mathematical: http://aurora.elsip.hokudai.ac.jp/index.html -> c8
Sekiguchi Seminar: http://adam.econ.hokudai.ac.jp:8008/index.html ->
Webmaster: mailto: [email protected] -> NIL

3.2 Operations for querying contextualized information bases
We are collecting context information in the WWW by tracing the paths from our home page. Users can access the contextualized WWW information base using path queries. Let us consider the function look-up. The function look-up(c, n) returns all paths from context c to the object named n. For example, the function look-up(c1, 'Greece') has so far returned the following three paths:
(i) Japanese.Library of information and electronics.Library in the world.Technical University of Crete (Greece)
(ii) Japanese.Library of information and electronics.Library in the world.Aristotle University of Thessaloniki (Greece)
(iii) Japanese.Library of information and electronics.SwetScan.Greece and Rome
The underlined words in the result paths are actually presented in Japanese. The result paths are narrowed down to the viewpoint of our home page. Users can select the path of interest and get detailed information by following it. If the user chooses another context as the current context c, the function look-up(c, 'Greece') returns other paths to 'Greece'. These are different views of Greece from different points of view.
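The derivation of a context from an HTML document described in Section 3.1 can be sketched with Python's standard html.parser module. This is not the PWPC implementation, whose details the paper does not give: the class name AnchorContextParser, the sample markup and the choice to resolve relative links against a base URL are assumptions made for illustration.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class AnchorContextParser(HTMLParser):
    """Collect (name, object URL) pairs from the anchor tags of a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.entries: List[Tuple[str, str]] = []
        self._href: Optional[str] = None   # URL of the anchor being read
        self._text: List[str] = []         # text seen inside that anchor

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <A HREF=...>
        # and <a href=...> are handled alike.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # The text between the start and end tags is the object's name.
            name = " ".join("".join(self._text).split())
            self.entries.append((name, self._href))
            self._href = None


# Hypothetical markup modelled on the two anchors of the top page.
page = ('<HTML><BODY><A HREF="index-e.html">English</A> '
        '<A HREF="index-j.html">Japanese</A></BODY></HTML>')

parser = AnchorContextParser("http://www.meme.hokudai.ac.jp/")
parser.feed(page)
parser.close()
context_c1 = parser.entries
```

Each pair in context_c1 corresponds to one name and object entry of a derived context; the reference of each entry would be the context obtained by parsing the page at the object's URL in the same way.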
4 Component-based Application Framework
This section presents the application framework for accessing information bases. We implement the notion of context using the IntelligentPad system. The IntelligentPad system provides its users with a toolkit for the construction of various interactive media objects, called pads. It allows users to access contextualized information bases through pads and to obtain various kinds of information as appropriate pads. The advantage of using IntelligentPad is that users can arbitrarily combine such pads with other existing pads, or with pads developed in the future, to create new applications.
4.1 IntelligentPad System
The IntelligentPad system provides its users with a toolkit for the construction of various interactive media objects, including multimedia documents, desktop tools, and application systems. Every component is presented as a media object, called a pad. The basic pads provided in such a system are called primitive pads, while a pad constructed by combining primitive pads and/or other pads is called a compound pad or a composite pad. In the IntelligentPad architecture, container media are separated from their contents. Figure 2 shows the logical structure of a pad. Each primitive pad consists of its shell and its contents. The shell defines the pad's standard media structure and interface. It is up to the developer of each pad how the contents are implemented in the standard shell. The application-linkage interface of each pad is defined as a list of slots. Each slot can be accessed by either a 'set' or a 'gimme' message. Each of these two messages invokes the respective procedure attached to the slot. The slots and their attached procedures, which are defined by the pad's developer, define the internal mechanism of each pad.
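The slot mechanism described above can be illustrated with a small Python sketch. It does not reproduce the IntelligentPad API: the classes Slot and Pad, the method names set and gimme, and the example slots #text and #length are hypothetical and only mirror the 'set' and 'gimme' messages and their attached procedures.

```python
from typing import Any, Callable, Dict, Optional


class Slot:
    """A slot holds a value and optional procedures for 'set' and 'gimme'."""

    def __init__(self,
                 on_set: Optional[Callable[..., Any]] = None,
                 on_gimme: Optional[Callable[..., Any]] = None) -> None:
        self.value: Any = None
        self.on_set = on_set
        self.on_gimme = on_gimme


class Pad:
    """A pad exposes its application-linkage interface as a list of slots."""

    def __init__(self) -> None:
        self.slots: Dict[str, Slot] = {}

    def set(self, name: str, value: Any) -> None:
        """Deliver a 'set' message to the named slot."""
        slot = self.slots[name]
        if slot.on_set is not None:
            slot.on_set(self, value)   # invoke the attached procedure
        else:
            slot.value = value         # default: store the value

    def gimme(self, name: str) -> Any:
        """Deliver a 'gimme' message to the named slot."""
        slot = self.slots[name]
        if slot.on_gimme is not None:
            return slot.on_gimme(self)  # invoke the attached procedure
        return slot.value


def make_text_pad() -> Pad:
    """Build a pad with a plain data slot and a derived, read-only slot."""
    pad = Pad()
    pad.slots["#text"] = Slot()
    pad.slots["#length"] = Slot(
        on_gimme=lambda p: len(p.gimme("#text") or ""))
    return pad


text_pad = make_text_pad()
text_pad.set("#text", "English")
```

Here the 'gimme' procedure attached to #length computes its result from another slot, which is one simple way in which attached procedures can define the internal mechanism of a pad.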
Fig. 2 The logical structure of a pad that allows its generic definition (figure labels: Display; Connection jack (Slot list); Knowledge Resource).
Users can easily compose any document or any tool by directly pasting pads on top of one another. Such a paste operation simultaneously defines both the layout of the components in the composed pad and the functional linkage among the component pads. Users can also easily replicate any pad and peel a pad off a composite pad. These operations can be applied equally to primitive pads and composite pads.
4.2 Framework for context-driven information access
In this section, we explain the component-based framework for accessing contextualized information bases. The environment for the management and retrieval of contextualized information is provided as component tools. It allows users to construct interface tools for accessing contextualized information bases and to combine such functions with other existing component tools to create new applications. We use the IntelligentPad system to implement the tools of this environment. Components in this framework are realized as pads, such as the context management pads and the context retrieval pads. The context management pads access an Oracle database to register, update, or delete context information. The context retrieval pads issue queries to the contextualized information base and return results that may be represented as various pads, such as an image pad, a movie pad, or a text pad. Users can arbitrarily combine such context management pads and context retrieval pads with other existing pads to create new applications. Moreover, those pads can be published worldwide through the Internet by embedding them in an arbitrary web page.
4.3 Structure of Information Base environment Pads
The information base consists of an IB-environment and the contents of the information base. An IB-environment includes the means for defining names, object ids and context ids, as well as an administration context.
Fig. 3 Accessing Contextualized Information Bases from an IntelligentPad system (figure labels: Proxy of Information Base Manager; Information Base Manager; Information Base; IB Environment; IB-context; Contents of the Information Base).
Information base environments are constructed by combining primitive pads. Figure 3 shows the mechanism for accessing contextualized information bases from IntelligentPad. IntelligentPad provides the functions of the IB-environment as pads through a proxy object of an information base manager. In the following, we explain the details of the IB-environment pads. These tools provide the primitive functions for accessing information bases. Combining such tools with other existing pads provides more complex functions. It is also easy to change the interface to the information bases by composing or decomposing pads.
(1) Visualization of context and traversal paths
The C-triplet pad allows users to access an object in a context and to traverse a path through the references. It represents one element of a context. The C-triplet pad has three data slots, which keep the object identifier (#oid), the names of the object in the context (#names), and the identifier of the referenced context (#ref). According to the contents of a context, C-triplet pads are automatically created and connected with a Context pad. When a Context pad receives data in the slot #cid, the C-triplet pads appear on the Context pad. Figure 4(a) shows the pad structure that represents a context, and Figures 4(b) and 4(d) are screen captures of a context. To traverse the paths, Pop-up pads and button pads are pasted on the C-triplet pad. A Pop-up pad keeps a pad identifier in its slot #pid. When the slot #pop-up of a Pop-up pad is accessed, the attached procedure is invoked, and the pad specified by the pad identifier pops up on the screen. When one of the left button pads in Figure 4(b) is clicked, the Pop-up pad that is connected to the slot #oid calls the pads that correspond to the specified object (see Figure 4(c)). In the same way, when one of the right button pads in Figure 4(b) is clicked, the referenced context appears on the screen as a Context pad whose structure is the same as in Figure 4(a).
Fig. 4 Materialization of context by IntelligentPad
Fig. 5 A structure of Information Base access pad
(2) Accessing the information base
Figure 5 shows the structure of an Information Base access pad. The Information Base pad includes a proxy of an Information Base Manager, which manages the contents of the Information Base. Through an Information Base pad, users of the IntelligentPad system can use a set of database functions as a pad. The Information Base pad has command slots, such as #enter, #delete, #update and #commit. The command slots have attached procedures. When a slot is accessed, the attached procedure is invoked. The arguments of these commands are taken from the data slots, such as #cid, #context, #triplet and #index. Figure 6 shows the pad component structure for collecting context information through the Internet. In the IntelligentPad system, a WWW browser is provided as an HTML viewer pad. The context-collecting pad includes a parser that analyzes the source files of web pages. The extracted context information is sent to an Information Base pad, which stores the context data in the IB-context of the Information Base (see Figs. 3 and 5).
Fig. 6 Collecting contextualized information from the World Wide Web
(3) Querying a contextualized information base
The main operations for querying contexts are selecting paths, and selecting and assembling parts of contexts. These functions are provided as pads. Figure 7 shows a look-up pad. A look-up function takes as input a name n and a context c, and returns the set of name/object paths from the specified context c to the object whose name is n. The slots #cid and #name of a look-up pad keep the values of the arguments of the function look-up. When the slot #look-up is accessed, the attached procedure is invoked. Then the result of the look-up function is returned in the slot #result. Appropriate pads represent the result. In Figure 7, name paths are represented by string pads and list pads, and object paths are represented by lists of button pads. When the retrieval result consists of contexts, the Context pad visualizes it. When the retrieval result consists of name/reference paths, it is visualized using Pop-up pads and button pads.
Fig. 7 Context-based Query pad
5 Conclusion
In this paper, we proposed a component-based application framework for accessing contextualized information bases. The environment for the management and retrieval of contextualized information is provided as component tools in the IntelligentPad system. It allows users to construct interface tools, to access contextualized information bases, and to combine such functions with other existing components to build new applications. We applied this framework to the management of information in the WWW. Contexts model each object under different perspectives, and provide a modular representation for each different kind of information and/or intellectual resource. Users can access the information through path languages.
References
[1] Manos Theodorakis, Anastasia Analyti, Panos Constantopoulos and Nicolas Spyratos: Querying Contextualized Information Bases, Proc. 24th Intern. Conference on Information and Communication Technologies and Programming (ICT&P '99), Plovdiv, Bulgaria, June 1999
[2] Manos Theodorakis, Anastasia Analyti, Panos Constantopoulos and Nicolas Spyratos: Context in Information Bases, Proc. 3rd International Conference on Cooperative Information Systems (CoopIS '98), pp.260–270 (1998)
[3] Date, C.J.: An Introduction to Database Systems, Addison Wesley (1990)
[4] Cattell, R. G.: Object Data Management: Object-Oriented and Extended Relational Database Systems, Addison Wesley, Reading, Massachusetts (1991)
[5] Bancilhon, F., Delobel, C. and Kanellakis, P.: Building an Object-Oriented Database System, Morgan Kaufmann, San Mateo, California (1992)
[6] Copeland, G. and Maier, D.: Making Smalltalk a Database System, Proc. ACM SIGMOD International Conference on Management of Data, pp.316–482 (1986)
[7] Stonebraker, M.: Object-Relational DBMSs, Morgan Kaufmann (1996)
[8] Bertino, E. and Kim, W.: Indexing Techniques for Queries on Nested Objects, IEEE Trans. Knowledge and Data Engineering, Vol.1, No.2, pp.296–213 (1989)
[9] J. McCarthy: Notes on Formalizing Context, Proc. IJCAI-93, pp.555–560, Chambery, France, 1993
[10] R. Guha: Contexts: A Formalization and Some Applications, PhD thesis, Stanford University, 1991
[11] G. Gottlob, M. Schrefl and B. Rock: Extending Object-Oriented Systems with Roles, ACM Trans. Inf. Syst., 14(3), pp.268–296, July 1996
[12] Y. Shyy and S. Su: K: A High-level Knowledge Base Programming Language for Advanced Database Applications, Proc. ACM-SIGMOD conference, pp.338–347, Denver, Colorado, May 1991
[13] R. Katz: Towards a Unified Framework for Version Modeling in Engineering Databases, ACM Comput. Surv., 22(4), pp.375–408, Dec. 1990
[14] G. Kotonya and I. Sommerville: Requirements Engineering with Viewpoints, Software Engineering Journal, pp.5–19, Jan. 1996
[15] F. Bancilhon and N. Spyratos: Update Semantics of Relational Views, ACM Trans. Database Syst., 6(4), pp.557–575, Dec. 1981
[16] S. J. Hegner: Unique Complements and Decompositions of Database Schemata, Journal of Computer and System Sciences, 48(1), pp.9–57, Feb. 1994
[17] S. Abiteboul and A. Bonner: Objects and Views, Proc. ACM-SIGMOD conference, pp.238–247, Feb. 1991
[18] A. Ouksel and C. Naiman: Coordinating Context Building in Heterogeneous Information Systems, Journal of Intelligent Inf. Systems, 3(2), pp.151–183, 1994
[19] S. Matwin and M. Kubat: The Role of Context in Concept Learning, Proc. ICML-96, Workshop on Learning in Context-Sensitive Domains, pp.1–5, Bari, Italy, July 1996
[20] J. Mylopoulos and R. Motschnig-Pitrik: Partitioning Information Bases with Contexts, Proc. CoopIS'95, pp.44–55, Vienna, Austria, 1995
[21] B. Czejdo and D. Embley: View Specification and Manipulation for a Semantic Data Model, IS, 16(6), pp.585–612, 1991
[22] L. Campbell, T. Halpin and H. Proper: Conceptual Schemas with Abstractions: Making Flat Conceptual Schemas More Comprehensible, DKE, 20(1), pp.39–85, June 1996
[23] Yuzuru Tanaka: Meme Media and a World-Wide Meme Pool, Proc. The Fourth ACM International Multimedia Conference, MULTIMEDIA'96, pp.175–186 (1996)
[24] Yuzuru Tanaka: A Toolkit System for the Synthesis and the Management of Active Media Objects, Proc. 1st International Conference on Deductive and Object-Oriented Databases, Kyoto, pp.269–277 (1989)
[25] Dave Ensor, Ian Stevenson: Oracle8 Design Tips, O'Reilly (1997)
Information Modelling and Knowledge Bases XIII, H. Kangassalo et al. (Eds.), IOS Press, 2002
Modelling the Boundaries of Workspace: A Business Process Perspective Marite Kirikova Systems Theory Professor's Group, Riga Technical University, 1 Kalku, Riga, LV-1658, LATVIA, e-mail:
[email protected]
Abstract. Effectiveness of humans' performance depends on knowledge concerning boundaries of their workspace where they can freely choose their way of performance. In the context of business process modelling a boundary of workspace may be constructed as a boundary process. The boundary process is a procedural representation of information that is provided by official documents used in the organisation. Boundary processes do not impose on their users unnecessary details that could hinder creativity of employees and do not restrict flexibility in performance of individual activities. Modelling of boundary processes is useful in requirements acquisition, business process reengineering, knowledge management and other situations where transparency and shared understanding of things are important.
1. Introduction Research in Cognitive Engineering [1] has shown that the best results of human performance in adaptive organisations are achieved if people need not follow prescribed sequences of processes, but are equipped with knowledge concerning boundaries of their workspace where they can freely choose their style and method of performance. In Cognitive Engineering the boundaries of workspace have been defined in terms of functionally acceptable state of affairs, acceptable cost-effectiveness, and acceptable workload [1]. The purpose of this paper is to interpret the boundary of adaptive systems workspace in terms of business process models. If the boundary of workspace is represented as a business process model, then it may be regarded as a source of explicit knowledge concerning workspace. The explicit knowledge concerning the boundaries of workspace then may be used for various purposes, such as requirements engineering, knowledge management, strategic planning, etc. Business Process Modelling is already a common tool in those areas. Different business process modelling languages are applied and business modelling software tools have been developed. However, success of business process modelling has not always fulfilled expectations [2]. There have been situations where models were too vague to be useful [3] or so prescriptive that they hindered the creativeness of their accomplishers [4]. Therefore it is important to find out whether business process models are the appropriate tools for modelling boundaries of workspace in adaptive organisations. The discussion of the topic is structured as follows. In Section 2 the boundaries of workspace are defined in terms of business process modelling. Appropriateness of different business modelling approaches for modelling of workspace boundaries is discussed in Section 3. The methodology for modelling of boundaries and communicating the models to affected actors of the organisation is considered in Section 4. 
An experimental application of boundary modelling is described in Section 5. Section 6 consists of brief conclusions.
M. Kirikova /Modelling the Boundaries of Workspace
2. Bounded Work Space in several contexts
This section deals with the notion of bounded workspace or, in other words, the boundaries of the workspace, in three contexts. In Section 2.1 the notion is examined in the context of adaptive systems. This context provides information about requirements concerning the contents of a modelling language that would be useful for the representation of workspace boundaries. In Section 2.2 the bounded workspace is examined in the context of the business processes of an organisation. This context allows consideration of the workspace as a scope of processes at different levels of abstraction and detail. Actors of the organisation take part in the accomplishment of those processes. An organisation itself is also an actor. The boundary of the workspace is defined with respect to a particular actor. Boundary processes (or parts of processes) are those that are relevant to at least two actors at the highest level of abstraction of the representation that defines a particular process. Special attention is paid to the boundaries of workspace defined by an information system's processes. In Section 2.3 the modelling of boundaries is discussed in the context of information systems development.
2.1 Bounded workspace in terms of adaptive systems
The notion of bounded workspace has been discussed in the context of understanding the behaviour of adaptive systems [1]. Due to the fact that humans do not have stable input-output characteristics that can be studied in isolation, the following two principles are important for understanding system behaviour when adaptation has taken place [1]:
• It is necessary to consider the entire system, and, "instead of decomposing functions to the structural elements, we have to abstract from these elements and, at a purely functional level, to identify and to separate the relevant functional relations".
• "The design must not result in a system that constrains the behaviour of human actors to only one possible work process. Instead, the design should define for the actors a space bounded by the goal and resource constraints".
The necessity to model at a purely functional level requires the use of appropriate modelling tools that are capable of capturing the functionality of the entire system. The most common tools for modelling functionality are Data Flow Diagrams (DFD) [5, 6, 7]. A DFD can be considered as a higher level of abstraction of different business process models, i.e., in general, business process models comprise more information about the system than the DFD. In the context of adaptive systems the bounded workspace is defined by the following three boundaries [1]:
• Boundary of functionally acceptable state of affairs
• Boundary of acceptable work load
• Boundary of acceptable cost-effectiveness
To be able to model all three boundaries, business process models are evidently more suitable than DFDs, because the analysis of workload and cost-effectiveness is beyond the scope of what DFDs represent. On the other hand, many business process modelling languages include either elements suitable for the analysis of workload [8], or elements for the analysis of cost-effectiveness [9], or both [10]. The suitability of business process modelling languages for modelling boundaries of workspace is discussed in more detail in Section 3.
2.2 Bounded workspace in the context of business process modelling
In the context of business process modelling, the boundary of the workspace of an actor can be defined by the processes that are accomplished together with other actors. So one and the same process can be regarded as a boundary process for several actors (Fig. 1). Actually, a boundary process may be defined as a sequence of actions, as a data flow or just as separate tasks.
The elements of the boundary process are those that are relevant for at least two actors at a particular level of abstraction and detail. Theoretically, in many cases boundary processes are interactive processes. In general, business modelling languages are not the best tools for modelling interactive activities [11]. However, when
boundary processes are considered, those languages may be used as tools of representation of boundaries, because, in this case, the main focus is on the result of the cooperation of actors, but not on the particular sequences of states or activities. Appropriateness of those languages depends on (1) potential richness of detail that can be included in the model and (2) the possibility of considering a process at several levels of abstraction (Section 3). In the case when an organisation as a whole is considered as an actor, all processes relevant to modelling of its workspace boundaries may be divided into the following groups (Fig. 2): • Environmental processes • Organisational processes • Information systems processes Environmental processes are those that do not depend on the organisation, such as governmental laws, particular international marketing procedures and standards, etc. Organisational processes are those processes that are invoked on the basis of different contracts and agreements that are officially signed within the organisation (internal rules, standards, guidelines, procedures) or with environmental partners (competitors, providers, etc.). Special attention must be paid to the information system as a subsystem of an organisation. The information system as a system that processes information must be considered as an actor. However, this actor has a very limited capability to adapt in changing situations. Therefore constraints imposed by an information system must be well understood and documented. The need to document the impact of an information system on business processes is especially important in situations where complicated commercial software packages, e.g., Enterprise Planning Systems are introduced [12]. In other cases, requirements specifications may be considered as a contract between the organisation and the information system. 
The information system is not the only subsystem that may have a particular impact on the organisational processes. Theoretically each subsystem can impose particular constraints on other parts of the organisation by specific requirements or procedures. Thus each process relevant to the definition of the system's workspace boundary can be defined either by the environment, or by the system, or by the subsystem (Fig. 2).
In the context of adaptive systems bounded workspace is defined by goals and resource restrictions [1]. In the context of business process modelling, the goals are represented by those processes that are ultimately defined in terms of organisational strategy and policy, and documented as organisational standards or rules. The restrictions of the workspace are those organisational processes that overlap with or are dependent on the environmental processes and particular subsystems' (e.g., information system's) processes. Therefore modelling of boundary processes is based on knowledge about relationships between environment and system; system and its subsystems; as well as particular commitments inside the system.
Fig. 1 AB as a boundary process for actors A and B (figure labels: a process of actor A; a process of actor B)
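The definition used for Fig. 1 (a boundary process consists of the elements relevant for at least two actors) can be illustrated with a small sketch. It is only an illustration under stated assumptions: the function name `boundary_elements`, the actor names and the element names are hypothetical and do not come from the paper.

```python
from collections import Counter


def boundary_elements(relevance):
    """Return the process elements that are relevant to at least two actors.

    `relevance` maps an actor name to the set of process elements that
    matter to that actor at the chosen level of abstraction and detail.
    """
    counts = Counter()
    for elements in relevance.values():
        # set() guards against an element being listed twice for one actor
        counts.update(set(elements))
    return {element for element, n in counts.items() if n >= 2}


# Hypothetical actors A and B: "ab1" and "ab2" are relevant to both,
# so together they form the boundary process AB.
relevance = {
    "A": {"a1", "a2", "ab1", "ab2"},
    "B": {"b1", "ab1", "ab2"},
}
print(sorted(boundary_elements(relevance)))
```

With more than two actors the same function returns every element shared by any pair of actors, which matches the "at least two actors" wording rather than a strict intersection of all actors.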
Fig. 2 Processes to be considered for definition of the bounded workspace of the organisation (figure labels: process defined by an environment; process defined by a system; process defined by a subsystem)
2.3 Bounded workspace in the context of information systems development
In the context of information systems development, the boundary between an organisation and an information system is an object of particular interest (Fig. 3). To construct this boundary, the workspace itself must be well understood. Knowledge concerning boundaries defined by the environment and the organisation itself is important in understanding the workspace. In a case where those boundaries are defined in terms of business process models, part of information systems requirements can be derived directly from those models. Therefore, from the point of view of information systems development, modelling of boundary processes in terms of business process models is relevant for the following reasons:
• Clearly defined boundaries help organisational workspace (i.e., organisational processes) to be understood
• Part of information systems requirements are defined by the boundary processes
• Definition of the boundary between an information system and an organisation in terms of business process models allows assessing of the impact of the information system on organisational processes before actual implementation or installation of the system.
Fig. 3 Bounded workspace of the organisation in the context of information systems development (figure labels: boundary defined by an organisation (goals); boundary defined by an environment; boundary defined by an information system)
3. Using business process models for boundary definition
This section deals with details concerning the contents of the models used for representation of boundary processes. In Section 3.1 the theoretical requirements concerning the contents of the models, which are based on the theory of bounded workspace [1], are presented. In Section 3.2 several business modelling tools are analysed with respect to the requirements stated in Section 3.1.
3.1. Constituents of the model
Requirements concerning the contents of the business process model used for representation of boundary processes are stated on the basis of an analysis of bounded workspace in three contexts (Section 2). From the context of adaptive systems it follows that the model must contain elements that permit it to represent a functional state of affairs, workload, and cost-effectiveness. The functional state of affairs may be represented by the use of elements of DFD, such as Subprocesses, Flows, Stores, and External entities [5, 6, 7] (Fig. 4). Additionally, for the analysis of workload, a Performer and Duration of the subprocesses become necessary parts of the model. Control flows, Timer events and Triggering conditions are other important elements for the workload analysis. Those three elements are necessary also for the analysis of cost-effectiveness. Evidently, Cost of the subprocesses is also required for cost-effectiveness analysis.
The context of business process modelling does not impose new requirements on the elements of the process. However, it confirms the necessity to represent control flows and triggering conditions in the business process model. These elements usually are not presented in DFDs. On the other hand, this context suggests the following requirements for the model as a whole:
• Multilevel representation of the processes
• Simulation possibilities
• Full and optional representation of all elements of the model
Fig. 4 Elements of a business process model (column on the right) that are needed for boundary process representation in different contexts (column on the left). Left column, boundary process in three contexts: context of adaptive systems (functional state of affairs, cost-effectiveness); context of business process modelling; context of information systems development. Right column, elements of business process model: External entity, Subprocess, Information flow, Material flow, Control flow, Timer, Triggering condition, Store, Performer, Cost, Duration.
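The requirement that elements such as Performer, Cost and Duration can be represented either fully or optionally may be easier to picture as a data structure. The following is a minimal sketch, not the representation used by any of the tools discussed in the paper: the class names, the field names, the method `known_total` and all sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Subprocess:
    name: str
    # Optional elements: None means that no document prescribes a value.
    performer: Optional[str] = None
    duration_hours: Optional[float] = None
    cost: Optional[float] = None
    triggering_condition: Optional[str] = None


@dataclass
class BoundaryProcess:
    name: str
    subprocesses: List[Subprocess] = field(default_factory=list)

    def known_total(self, attribute: str):
        """Sum an optional numeric attribute over the subprocesses that define it.

        Returns (total, number of subprocesses where the value is missing),
        so that a workload or cost estimate also shows how partial it is.
        """
        values = [getattr(s, attribute) for s in self.subprocesses]
        known = [v for v in values if v is not None]
        return sum(known), len(values) - len(known)


# Hypothetical process with invented durations; only some elements are known.
exam = BoundaryProcess("Examination", [
    Subprocess("check the payment", duration_hours=0.25),
    Subprocess("take the examination", performer="student", duration_hours=2.0),
    Subprocess("evaluate the examination", performer="teacher"),
])
total_hours, missing = exam.known_total("duration_hours")
print(total_hours, missing)
```

Keeping missing values as `None` instead of zero avoids presenting a partial workload or cost figure as if it were complete.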
The multilevel representation of the model is necessary because boundary processes can be quite complicated. As the model has to be communicated to the people involved in the process, it is necessary to keep parts of the model at a level of complexity convenient for human comprehension [13] and to represent the relationships between the parts clearly. Simulation is the most common tool for estimating workload and cost-effectiveness. Therefore it is relevant not only in the context of business process modelling but also in the context of adaptive systems. Full and optional representation of all elements of the model is one of the main characteristics for successful modelling of boundary processes. Boundary processes usually are those processes that require several performers and can be derived from particular organisational or environmental documents, e.g., contracts. Those documents may contain information that allows representation of the process in full detail. However, in many cases only partial information is available, which permits the construction of a business process model that contains only part of its elements [14]. Therefore it is important that the optional as well as the full scope of elements represented in Fig. 4 can be depicted and seen in the boundary process model.
The context of information systems development does not impose new requirements on the contents of the business process model. Nevertheless, this context confirms that stores must be presented in the business process model and that it is necessary to distinguish between information, data and control flows (Fig. 4).
3.2 Availability of tools for boundary process modelling
Many business modelling tools are currently available [15]. They differ in their functionality, complexity of use, and interfaces. In this section features of several tools are illustrated to give an insight into their suitability for modelling of boundary processes. 
The choice of those tools depended only on availability of the information about the tools. No other criteria were used for the selection of the tools. Features of the selected tools are illustrated in Table 1 and Table 2. Table 1. Features of selected business modelling tools
No.  Name of the tool       Business modelling language(s) represented by the tool
1.   Axiom-SYS [16]         Process model by Yourdon and DeMarco
2.   BPWin [17]             IDEF, IDEF0, IDEF3 [17]
3.   COSA Workflow [18]     Based on Petri nets
4.   GRADE [19]             GRAPES BM [8]
5.   Live Model [20]        Event-driven Process Chain Diagram
6.   Oracle Designer [21]   Oracle CASE
7.   Scitor Process [22]    DFD
8.   Silverrun-BM [9]       Process models by Gane and Sarson, Merise, Ward-Mellor, Datarun, Yourdon and DeMarco
9.   Workflow BPR [10]      Activity Decision Flow
[Columns 4 to 6 (Multilevel representation; Simulation; Full and optional representation of elements): the extracted entries "+", "-", "Partly", "By linking to the BP Simulator" and "+ (except of Cost)" cannot be assigned to rows.]
In Table 1 the business modelling language represented by the tool is shown in the third column. The fourth column illustrates whether the multilevel representation of the processes is possible, the fifth column shows whether the tool has a simulation function and the sixth column characterizes the possibility of full and optional representation of all available elements. The sign "+" means that the requirement that corresponds to the feature (Section 3.1) is satisfied; "-" means that the requirement is not satisfied. The cell is left empty in cases where information corresponding to it was not available to the author of the paper. Table 2 illustrates the coverage of necessary constituents of the model by particular tools. The constituents are those depicted in Fig. 4, i.e., Information Flow, Material Flow, Control Flow, Timer, Triggering Condition, Store, Performer of the process, Cost and Duration of the process. External Entity and Subprocess are omitted in Table 2 because they are present in all nine selected tools. Those elements can be named differently in various tools. So, External Entity may be named as External Object, External Process, or Terminator; and Subprocess may be named as Function, Action, Activity, or Task. If the tool represents a particular business process element the sign "+" is in the corresponding cell; if the element is not represented by the tool, the sign "-" is used. If appropriate information about the particular element with respect to the particular tool was not found, the cell is left empty. Tables 1 and 2 show that no single tool among the selected ones fully satisfies the requirements stated in Section 3.1. However, several tools, such as GRADE, Silverrun-BM and Workflow BPR, satisfy almost all requirements. This fact indicates that currently available business modelling tools support (at least partly) modelling of boundary processes. 
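The cell conventions described for Tables 1 and 2 ("+", "-", "Partly" and an empty cell for missing information) distinguish "not satisfied" from "not known". A minimal sketch of that distinction follows. The names `Cell`, `parse_cell` and `assess`, the requirement keys and the sample row are hypothetical; the row does not reproduce any row of the tables.

```python
from enum import Enum


class Cell(Enum):
    YES = "+"          # requirement satisfied
    NO = "-"           # requirement not satisfied
    PARTLY = "Partly"  # requirement partly satisfied
    UNKNOWN = ""       # empty cell: no information was available


def parse_cell(text):
    """Map one of the four plain marks to a Cell; other text raises ValueError."""
    return Cell(text.strip())


def assess(row):
    """Group the requirement names of one tool by the state of their cell."""
    summary = {cell: [] for cell in Cell}
    for requirement, cell in row.items():
        summary[cell].append(requirement)
    return summary


# Hypothetical tool, for illustration only.
hypothetical_tool = {
    "multilevel": parse_cell("+"),
    "simulation": parse_cell(""),
    "full_and_optional": parse_cell("Partly"),
}
result = assess(hypothetical_tool)
print(result[Cell.YES], result[Cell.UNKNOWN])
```

Treating an empty cell as its own state, rather than as "-", keeps a tool from being ruled out merely because information about it was not found.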
In real life situations there are additional requirements for boundary process modelling tools. Those additional requirements are more subjective and can be called the representational "culture" of the tool. Because boundary process diagrams must be communicated to employees, it is important to represent diagrams in a way that satisfies cognitive principles of comprehension [13]. This means that diagrams must be represented in a clear manner where elements of the diagram may be freely grouped so that, in each particular case, they convey meaning not only through the contents of the elements but also through their location.
Table 2. Model elements covered by the selected business modelling tools
Columns: No.; Name of the tool; Infor. Flow; Mat. Flow; Cont. Flow; Timer; Trig. Cond.; Store; Performer; Cost; Duration
Rows: 1. Axiom-SYS [16]; 2. BPWin [17]; 3. COSA Workflow [18]; 4. GRADE [19]; 5. Live Model [20]; 6. Oracle Designer [21]; 7. Scitor Process [22]; 8. Silverrun-BM [9]; 9. Workflow BPR [10]
Footnotes:
1 A sub-process may be named as a performer
2 Indirectly or by related model (diagram)
3 May be presented as an event
[The extracted cell entries ("+", "-" and the footnote marks 1, 2, 3) cannot be assigned to rows and columns.]
4. Towards boundary process modelling methodology
Most business process modelling methodologies do not consider boundary processes of the workspace directly. However, several methodologies indirectly refer to these processes. These methodologies are discussed in Section 4.1.
In boundary process modelling the main sources of information are documents that are relevant in the performance of particular activities. Those documents may be contracts, documented internal rules, governmental laws, etc. Such documents are at the centre of interest in boundary process modelling because they represent those issues of performance that can be regarded as goals or restrictions of the organisation's or the actor's workspace. When processes are modelled on the basis of this information, they show the boundaries of the workspace. A boundary process consists only of those issues of performance that are officially regulated and therefore does not impose on the actors unnecessary restrictions that would press an employee to follow a course of action that is not appropriate to his or her education, skills or habits. Such an approach prevents over-modelling, where detailed models of human performance are produced but never followed by their performers [10]. On the other hand, the modelling of boundary processes helps to avoid omissions of those issues that are relevant at organisational or individual levels with respect to cooperation with other actors in the environment or inside the organisation. Boundary processes usually are based on several documents that add information to each other. Therefore those processes have quite a high level of complexity (an example is given in Section 5). Guidelines for boundary process modelling are given in Section 4.2.
4.1 Related work
There are several systems modelling methodologies that indirectly consider issues related to boundaries of workspace. 
Some of them, namely, Strategic Modelling for Enterprise Integration, Semantic Object Modelling Approach, Object Oriented Approach for Business Process Modelling, Use Cases, and TEMPORA, are discussed in this section.
Strategic Modelling for Enterprise Integration [23, 24] considers strategic relationships between actors of the system in terms of soft goals, goals, tasks and resources. This methodology supports the identification of the boundary processes but does not describe them. The methodology is focused on responsibilities of each particular actor rather than on goals or restrictions of participatory performance that are relevant in boundary process description.
In Semantic Object Modelling Approach [25] it is claimed that "both requirements engineering and business process reengineering must start with a model of communication and contracts among the participants in the business and other stakeholders, customers, suppliers and so on". The model of communication in this methodology is based on speech-act theory. In some aspects this starting point is similar to Strategic Modelling for Enterprise Integration described above. Each speech-act includes the goal of communication and the task to be performed by a destination object. The starting point seems to be similar to that of boundary process modelling. However, Semantic Object Modelling Approach further considers decomposition of those tasks. Boundary process modelling requires the opposite - functional generalisation with respect to structural units of the organisation (Section 2.1).
Object Oriented Approach for Business Process Modelling [26] is based on analysis of the customer-supplier relationship. The methodology proceeds from the definition of the customer-supplier interaction model to the identification of the customer-supplier chain and further to the definition of process flow. 
Again, the analysis proceeds in the direction of functional decomposition where the role of each participant is one of the most important aspects. Therefore this approach differs from boundary process modelling in a way similar to Strategic Modelling for Enterprise Integration and Semantic Object Modelling Approach.
Use Cases [27, 28] is also an approach that takes into consideration borders of communication. However, this approach does not abstract from use cases of defined structural components to the system as a whole, in terms of functions. Abstraction is made,
but in terms of objects, which does not permit the definition of a border of workspace of an adaptive system as prescribed by the theory of Cognitive Engineering [1] (Section 2.1).
TEMPORA [29] is one more methodology relevant for boundary process modelling. This methodology does not start with the analysis of communications or relevant organisational documents. However, it contributes to boundary process modelling because it clearly shows correspondence between business rules and processes. The existence of TEMPORA is additional evidence that boundary processes can be defined in terms of process models. Since documents mainly contain rules, boundary processes may be derived from the rules represented in the documents.
4.2. Guidelines for boundary process modelling
Modelling of boundary processes gives a clear picture of those processes or elements of processes that are prescribed by particular rules, laws, agreements or guidelines. Analysis and representation of those processes is worthwhile because they clarify, in general, how the activities that require participation of several performers proceed. Boundary processes do not specify what exactly each actor must do; they just picture what must be accomplished by mutual cooperation and how this must be done. Particular activities of each participant are either clear from the context or, in other cases, several alternative actors can perform these activities. If, for example, the process is "examination in a university" then it is clear that the student is the one who must take it and the teacher is the one who evaluates it; if the subprocess is "check the payment before the examination", then it is not important whether it is checked by a secretary, teacher or any other person of the administrative staff. In each situation checking can be done in the most suitable and effective way. 
As discussed in previous sections, the main feature of boundary process models is their correspondence to organisational goals and internal rules in such a way that they do not impose more restrictions on employees than are regulated by relevant official documents generated inside and outside the organisation. It is suggested that boundary processes are modelled by a professional systems analyst using an appropriate modelling tool (Section 3.2). A simplified flow of actions in boundary process modelling is shown in Fig. 5.
Modelling of boundary processes starts with analysis of documents and integration of information included in them. On the basis of this integrated information boundary processes are constructed. During construction, people who created the documents may be interviewed (if possible) to make sure that the documents are not misunderstood. Actual performers also are interviewed to identify what information concerning processes is relevant in their work and must be included in the visual representation of the processes. When the process model is ready it must be distributed to those employees who are affected by the particular workspace boundary represented by the model. The systems analyst must check whether the model is understood properly. Face to face communication is preferable for model distribution.
It should be noted here that the boundary process model will not always look like a DFD. It is possible that the model consists of particular abstractions of the processes (e.g., only the sequence of sub-processes is identified) or even unrelated elements. In any case, such a boundary process model contains information that must be known by particular employees and included in their personal processes in the most suitable way. 
If boundary process modelling becomes a part of organisational knowledge management it is preferable to relate process modelling to particular change management tools (such as DOORS [30]) that can handle document management in an object oriented way. Each document can then be considered as a scope of objects that are related to particular elements of boundary processes. Thus the changes in the documents can be directly reflected in boundary process models, too.
Modelling of boundary processes is suggested in the following situations:
• Early stages of requirements definition
• Handling of time consuming processes and conflict situations
• Analysis of organisational strategy
• Process reengineering
• Contract analysis and establishment
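The idea stated before the list above, that each document is a scope of objects related to particular elements of boundary processes, amounts to keeping trace links from document parts to model elements. The sketch below illustrates this in plain Python under stated assumptions: it does not use or imitate the DOORS interface, and the class `TraceIndex`, its methods and the document and clause names are hypothetical.

```python
from collections import defaultdict


class TraceIndex:
    """Links clauses of regulating documents to boundary process elements."""

    def __init__(self):
        self._elements_by_clause = defaultdict(set)

    def link(self, document, clause, element):
        """Record that `element` of a process model is derived from a clause."""
        self._elements_by_clause[(document, clause)].add(element)

    def affected_by(self, document, clause):
        """Return the model elements to review when the given clause changes."""
        return sorted(self._elements_by_clause.get((document, clause), set()))


# Hypothetical documents and clauses, for illustration only.
index = TraceIndex()
index.link("payment regulations", "clause 2", "check the payment")
index.link("payment regulations", "clause 2", "pay for re-examination")
index.link("examination regulations", "clause 5", "take the examination")

print(index.affected_by("payment regulations", "clause 2"))
```

A lookup for a clause with no recorded links returns an empty list, which signals that a change to that clause does not touch the current boundary process model.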
Boundary process modelling in early stages of requirements definition permits identification of part of the requirements for information systems development. In addition, boundary process models promote better understanding of needed innovations and serve as a background for decision making concerning the granularity of analysis of other processes in the organisation's workspace.
Fig. 5 Simplified flow of actions in boundary process modelling (recoverable figure label: analysis and distribution of diagrams)
Lengthy processes and conflict situations in an organisation may be caused by several fuzzy interpretations of documents circulated in the organisation. If several documents refer to one and the same decision making situation, the complexity of decision making can be higher than the cognitive capabilities of the personnel. In such cases even structured decision-making situations can be handled by employees as semi-structured or unstructured ones, and several different interpretations of the situations are held [6]. Careful analysis of information included in the documents and representation of the contents of the documents in terms of processes may explicitly show relationships of different aspects regulated by the documents and so contribute to the shared understanding of the situation and realistic expectations of the employees.
The analysis of boundary processes can contribute also to the analysis of organisational strategy. The procedural interpretation of organisational rules can uncover potential opportunities or point to those issues of the strategy where changes or improvement are necessary. The same refers also to business process reengineering and contract analysis and establishment.
5. Case study - Process of examinations in Riga Technical University
Examinations in Riga Technical University are regulated by three main documents: regulations concerning handling the examinations, regulations concerning payment for re-examination and regulations concerning particular dates for examinations and re-examinations. The first two documents have recently been changed several times and are quite complicated. In cases of re-examination the number of issues to be considered during decision making was more than five. Quite often there were long discussions and misunderstandings between students and secretaries due to different interpretations of the documents. Construction of the boundary process "Examination" revealed the complexity of the process. It was much higher than expected by the creators of the documents regulating the examination process. The complexity of the process is illustrated in Fig. 6 by the bird's eye view of the process model.
The process in Fig. 6 shows a roadmap a student can go through if he or she fails to pass one or more examinations one or more times. The process model was made using the business modelling tool GRADE [31]. Originally it was coloured. The use of colours and grouping of elements were chosen in a way that supported understandability of the picture. The model in this paper is presented only to demonstrate the appearance of the particular boundary process and not for analysis of its semantics. During construction of the model, administrators of the university were interviewed. Preliminary versions of the model were discussed in the interviews. Secretaries, students and professors also were interviewed to find the main points of misinterpretation and misunderstanding. The final version of the boundary process model was communicated to one secretary and put up on the notice board for students. Other secretaries requested their personal copies of the model. The existence of the boundary process model "Examination" not only shortened discussions between secretaries and students and saved students from wrong expectations, but also revealed specific requirements to be analysed and incorporated in the requirements specification of the new information system currently under development. It also gave rise to a number of discussions concerning possible improvements in the documents regarding the examination process.
Fig. 6 Complexity of the boundary process "Examination"
6. Conclusions
The theory of Cognitive Engineering [1] suggests that the best results of human performance are achieved in situations where people are equipped with knowledge concerning the boundaries of their workspace. In the paper this idea is illustrated in the context of business process modelling. In business process modelling a boundary of workspace may be regarded as a boundary process. Business modelling languages in general are appropriate tools for boundary process modelling. Nevertheless, there are particular requirements concerning the elements of languages and software tools that are used for representation of boundary processes. Boundary processes are constructed on the basis of information that is provided by official documents used in an organisation; therefore those processes do not impose unnecessary details that could hinder creativity of employees and do not restrict flexibility in performance of individual activities. A specific feature of boundary modelling methodology is functional generalisation. An experiment of boundary process modelling in Riga Technical University has been successful. However, more experiments are needed to develop a finer definition of the boundary process modelling methodology. Further investigations are intended concerning the use of boundary process models in information systems development and knowledge management fields.
Acknowledgements
I acknowledge Janis Grundspenkis (Riga Technical University) and Janis Stirna (Stockholm University and Royal Institute of Technology) for valuable discussions concerning the draft of the paper, and Maria Roba and Jeremy Theaker for checking of the use of English.
References
[1] J. Rasmussen, A. Pejtersen and L. P. Goodstein, Cognitive Systems Engineering, John Wiley & Sons, Inc., USA, 1994.
[2] D. Brash, Participation in Enterprise Modelling: Empirical Studies in Support of a Requirements Identification and Formulation Team, Department of Computer and Systems Sciences, Stockholm University and Royal Institute of Technology, Report series No. 99-013, 1999.
[3] A. Patel, et al., Stakeholder Experiences with Conceptual Modelling: An Empirical Investigation. Proceedings of the 19th International Conference on Information Systems, December 13-16, 1998, Helsinki, Finland. R. Hirschheim et al. (Eds). Omnipress, 1998, P. 370-375.
[4] Workflow Handbook 1997, Lawrence, P. (Ed.), John Wiley & Sons Ltd., Great Britain, 1997.
[5] A. M. Langer, The Art of Analysis. Springer-Verlag, New York, Inc., 1997.
[6] K. E. Kendall and J. E. Kendall, Systems Analysis and Design, Prentice Hall, Inc., 1995.
[7] A. M. Davis, Software Requirements Analysis and Specification, Prentice Hall, 1990.
[8] GRADE Business Modeling Language Guide, INFOLOGISTIC GmbH, 1998.
[9] Silverrun-BPM: A Software tool designing Business Process Models. Reference Manual. CSA Computer Systems Advisers, 1996.
[10] Workflow BPR, available at Http://www.holosoft.com
[11] J. Barzdins et al., Business Modelling Language GRAPES BM - 4.0 and Its Use, Riga, DATI, 1998 (in Latvian).
[12] R. Baskerville, S. Pawlowski and E. McLean, Enterprise Resource Planning and Organisational Knowledge: Patterns of Convergence and Divergence. Proceedings of the ICIS'2000, P. 396-406.
[13] J. R. Anderson, Cognitive Psychology and Its Implications. Freeman and Company, 1995.
[14] R. D. Banker, J. Kalvenes, and R. A. Patterson, Information Technology, Contract Completeness, and Buyer-Supplier Relationships. Proceedings of the ICIS'2000, P. 218-228.
[15] J. Stirna, Choosing Strategy for Enterprise Modelling Tool Acquisition. Department of Computer and System Sciences, Stockholm University and Royal Institute of Technology, Report series No. 99-102, 1999.
[16] Axiom-SYS, available at Http://www.stgcase.com.
[17] BPWin Feature Guide, Logic Works, Inc., 1997.
[18] COSA Workflow, available at Http://www.cosa.de.
[19] GRADE User Guide, INFOLOGISTIC GmbH, Germany, 2000.
[20] Live Model, available at Http://www.intellicorp.com.
[21] Oracle Designer, available at Http://www.oracle.com/ip/develop/ids/index.html/designer.html.
[22] Scitor Process, available at Http://www.scitor.com.
[23] E. S. K. Yu, Modelling Strategic Relationships for Process Reengineering, University of Toronto, 1994.
[24] E. Yu, Strategic Modelling for Enterprise Integration. In: Proceedings of the 14th IFAC World Congress (H-F. Chen, D-Zh Cheng and J-F. Zhang, Eds.), Vol. A, Elsevier Science, 1999.
[25] I. Graham, Requirements Engineering and Rapid Development: An Object Oriented Approach. ISBN: 0 201 36047 0. Addison-Wesley, 1998.
[26] M. Rohloff, A Framework for Organisational Design and Information Systems Development, in Proceedings of the 5th European Conference on Information Systems, Cork Publishing Limited, 1997.
[27] C. Larman, Applying UML and Patterns: Introduction to Object Oriented Analysis and Design. ISBN: 0 13 748880 7. Prentice Hall, Inc., 1998.
[28] D. Leffingwell and D. Widrig, Managing Software Requirements: A Unified Approach, Addison-Wesley Longman, Inc., 2000.
[29] B. Wangler, Business Rule Capture in TEMPORA, SISU, Sweden, 1993.
[30] DOORS, available at Http://www2.telelogic.com/doors/.
[31] GRADE, available at Http://gradetools.org.com.
Information Modelling and Knowledge Bases XIII H. Kangassalo et al. (Eds.) IOS Press, 2002
DESIGNING METHODS FOR QUALITY
Elvira LOCURATOLO
IEI, Consiglio Nazionale delle Ricerche, Via Alfieri, 1 Ghezzano - San Cataldo - Pisa
Telephone: +39 50 3152895. Fax: +39 50 3152810
E-mail: locuratolo@iei.pi.cnr.it
Abstract A metamethod for designing methods that achieve quality requirements is provided and MetaASSO, a step-wise approach to the design of a conceptual database design methodology, named ASSO, is described. MetaASSO starts from the proposal of a method for meeting two conflicting quality requirements: flexibility in modifying a database schema and efficiency in accessing and storing information. More concrete proposals that increase the quality requirements are then given until a modular design is achieved. The provided approach makes it easy to describe ASSO and the modules, called methodological tools. Furthermore it favours the achievement of practical results. Features of ASSO can thus be reused to design an ASSO-toolkit that performs the proof process at a low cost. Categories and Subject Descriptors: H.2.1 [Information System]: Conceptual Design - Data Models Schema and Subschema; H.2.3. [Information System]: Languages; D.2.10 [Software Engineering]: Design - Methodologies; D.2.2 [Software Engineering]: Tools and Techniques; F.4.3 [Mathematical Logic and Formal Languages]: Formal languages.
1. Introduction
The approaches for quality proposed in literature [22] are mainly concerned with models of software quality; these consist of informal attributes classified by means of suitable criteria. Measures of quality are defined exploiting metrics that represent the system's ability to satisfy the desired quality requirements. The quality requirements change during the software life cycle, which makes it appropriate to consider the following aspects of quality:
• goal quality, which is the quality related to the real needs of the users;
• design quality, which is the quality related to the main parts of a software project such as software architecture, program structure and strategies to define the user interface;
• quality in use, which is the quality perceived when the software is executed in the user environment.
The goal quality is a conceptual entity which cannot be completely defined at the beginning of a design since the user is not really aware of his needs. The design quality reflects the philosophy and the design strategies. Details improving software quality can be introduced during implementation and testing, but the fundamental nature of software quality, represented by the design quality, is substantially unchanged. The goal quality affects the design quality, whereas the requirements for quality in use can optionally be included in the goal quality specification. Measures of quality in use performed when the product is completed can be exploited to evaluate some of the goal quality requirements [23]. Goal quality, design quality and quality in use are aspects of quality which can be introduced to design software as well as to design methods. In this paper, an approach to
E. Locuratolo / Designing Methods for Quality
the design of methods for the achievement of quality requirements is provided and, as an example, MetaASSO, the metamethod employed to design a methodology of conceptual database design, named ASSO, is described. Starting from an initial proposal of a method to reconcile flexibility in modifying a database schema with the efficiency of the object oriented methodologies, two conflicting quality requirements, more concrete proposals that increase the quality requirements are given until a modularised design results. The modules, called methodological tools, have been designed exploiting the same approach. This makes it easy to describe ASSO and the methodological tools. Features of ASSO can thus be reused to design an ASSO-toolkit that performs the proof process at a low cost.
ASSO ensures easiness in specifying the conceptual schema, flexibility in reflecting the changes occurring in real life, consistency between static and dynamic modelling, correctness of the logical schemas and efficiency in accessing and storing information. Methodologies currently in use for database design do not guarantee the conceptual schema consistency, do not ensure that the schema supported by the database system satisfies the requirements specified at the conceptual level and, above all, do not ensure the achievement of both flexibility and efficiency, two conflicting quality requirements; in this respect ASSO is a novelty. The achievement of all the listed quality requirements is what distinguishes ASSO from the traditional methodologies of database design [4],[5],[11],[17],[18],[25]. An overview of these methodologies with respect to the achievement of quality has been given in [14]. In order to achieve quality in ASSO, the following methodological tools have been designed:
• Structured Database Schema, a conceptual model designed to specify information with flexibility while guaranteeing consistency;
• Revisited Partitioning, a method designed to refine the Structured Database Schema towards correct and efficient implementations;
• Relations between ASSO and B, an approach proposed to link a methodology designed at conceptual level with a formal method designed at a lower abstraction level.
The paper is organised as follows: Section 2 proposes the metamethod and MetaASSO. Section 3 describes the methodological tools and their effects in designing an ASSO-toolkit. Conclusions and further developments are included in Section 4.
2. The Metamethod
The metamethod is a stepwise approach to the design of methods that meet goal quality requirements. Each step is characterised by an objective, a solution and a demonstration. The objective requires the design of a method that meets quality requirements; the solution is the proposal of a design; and the demonstration is the means to establish whether the solution is a good solution, i.e., whether the solution satisfies the objective correctly. A good solution enriched with new quality requirements defines the new objective of the sequence. The metamethod is represented in Fig. 1, where each step is composed of two linked points: the former is the objective, the latter is the solution, and the directed arrow is the demonstration which establishes the goodness of the solution.
Obj1 requires the design of a method that meets a non-empty subset of the goal quality requirements, say the QR1 requirements. Solution1 proposes a method at a first definitional level, i.e. it only describes the features which are sufficient to demonstrate the initial goodness of the solution in a simple way. If solution1 is a good solution, a new objective, i.e. Obj2, is defined; otherwise a new solution1 can be proposed. Obj2 is defined by adding to solution1 some goal quality requirements not included in QR1, say the QR2 requirements. As solution1 is not a detailed solution, QR2 may also be empty; in this case, Obj2 is defined by solution1 alone. Once Obj2 has been defined, a solution2 is proposed and so on.
Starting from a good solution1, in order to demonstrate that the subsequent solutions are good, it is sufficient that each solution is a quality oriented refinement of the previous solution, i.e. a solution which refines the previous one while meeting the added quality requirements and preserving the goodness of the solution. Step by step, the original solution becomes a more concrete solution until a final solution is obtained. As each goal quality requirement belongs to some QRi and as the correct objective satisfaction is a transitive relation, the final solution is a good solution that satisfies the goal quality requirements.
[Figure 1: a chain of steps. Obj1 (QR1) -> Sol1; Obj2 (Sol1 + QR2) -> Sol2; ...; Objn (Soln-1 + QRn) -> Final Solution.]
Fig. 1: Metamethod
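The objective, solution and demonstration cycle of the metamethod can be read as a simple loop. The following is only a minimal sketch under stated assumptions: the names `Solution`, `run_metamethod`, `toy_propose` and `toy_is_good` are hypothetical and do not come from the paper, requirements are modelled as plain strings, and the "demonstration" is reduced to a boolean check supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Optional, Sequence

Requirement = str


@dataclass(frozen=True)
class Solution:
    """A proposed design and the quality requirements it is shown to meet."""
    description: str
    meets: FrozenSet[Requirement]


def run_metamethod(
    requirement_batches: Sequence[Iterable[Requirement]],
    propose: Callable[[Optional[Solution], FrozenSet[Requirement]], Iterable[Solution]],
    is_good: Callable[[Solution, FrozenSet[Requirement]], bool],
) -> List[Solution]:
    """Iterate objective -> solution -> demonstration, one step per batch.

    requirement_batches[i] plays the role of QR(i+1) and may be empty.
    """
    previous: Optional[Solution] = None
    objective: FrozenSet[Requirement] = frozenset()
    accepted: List[Solution] = []
    for batch in requirement_batches:
        # Obj(i) is Sol(i-1) enriched with the requirements QR(i).
        objective = objective | frozenset(batch)
        for candidate in propose(previous, objective):
            if is_good(candidate, objective):  # the "demonstration"
                previous = candidate
                break
        else:
            # No candidate passed the demonstration for this objective.
            raise ValueError(f"no good solution for {sorted(objective)}")
        accepted.append(previous)
    return accepted


# Hypothetical toy run with the first three requirement batches of MetaASSO.
def toy_propose(previous, objective):
    step = 1 if previous is None else int(previous.description[-1]) + 1
    yield Solution(f"solution{step}", objective)


def toy_is_good(candidate, objective):
    return objective <= candidate.meets


steps = run_metamethod(
    [{"flexibility", "efficiency"}, set(), {"easiness of use", "economy"}],
    toy_propose,
    toy_is_good,
)
```

In this toy run the second batch is empty, mirroring the case in which Obj2 is defined by solution1 alone.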
The provided metamethod is the evolution of the approach given under the assumption that all the goal quality requirements are established [16]; however, as the goal quality is a conceptual entity that cannot be completely defined at the beginning of a design, the described metamethod removes this constraint. The metamethod links the aspects of quality discussed in Section 1, since it implicitly includes the goal quality, whereas the quality in use requirements can be included in the goal quality specification. Further, it guarantees the goodness of the solution, while giving the opportunity to reuse previous experience and to agree on a new solution. As an example of the given metamethod, the next section proposes MetaASSO, a step-sequence resulting in a modular design of ASSO.
2.1. MetaASSO
This section describes MetaASSO, the sequence of steps resulting in ASSO [15], [3], [26], [27], [31], a methodology of conceptual database design which achieves the following goal quality requirements:
• Easiness of use, the method's ability to provide a conceptual schema that is easy to use.
• Flexibility, the method's ability to provide a conceptual schema that is easy to modify.
• Reliability, the method's ability to guarantee the conceptual schema consistency and the logical schema correctness.
• Economy, the method's ability to keep the costs of the proof processes low.
• Efficiency, the method's ability to access and store information using a limited amount of time and storage.
The initial objective of MetaASSO requires proposing a methodology for the achievement of flexibility and efficiency, two conflicting quality requirements.
The proposed solution, solution1, consists of two schemas linked by a transformation:
• a consistent conceptual schema, i.e., a consistent high level specification of the database structure and behaviour;
• a logical schema, i.e., a database schema supported by an efficient database management system;
• a correct transformation, i.e., a transformation from the conceptual to the logical schema, preserving the conceptual schema semantics.
Solution1 is represented in Figure 2. Solution1 is a good solution since the required transformation from the conceptual to the logical schema is the link that achieves both the easy modifiability of the conceptual schema and the implementation on an efficient database system. Solution1 describes only the idea for overcoming the conflict between flexibility and efficiency and thus it is not a detailed solution; in order to define the new objective of the sequence, i.e. obj2, we choose to add no further quality requirements to
this solution, so that a solution2 must be proposed to define a consistent conceptual schema correctly linked with a logical schema.
[Figure 2: Consistent Conceptual Schema -> Correct Transformation -> Logical Schema.]
Fig. 2: Solution 1
Solution2 proposes to support the conceptual schema by means of a formal semantic data model extended to handle behaviour, to support the logical schema by means of a formal object model and to link the two schemas by means of a formal transformation. Solution2 is represented in Figure 3. In order to demonstrate that solution2 is a good solution, it is sufficient to demonstrate that solution2 is a quality oriented refinement of solution1. Since semantic data models are promising models to specify database applications in an easy and flexible way, object models are promising models to obtain efficient implementations, and formality is the means to achieve both the conceptual schema consistency and the design correctness, solution2 is a good solution. Solution2 with the added easiness of use and economy becomes the new objective of the sequence, i.e. obj3; however, the added requirements conflict with formality, since formality requires high costs of the proof processes while degrading easiness of use. In order to overcome these new conflicts, a quality oriented refinement of solution2 meeting easiness of use and economy is required.
[Figure 3: Formal Extended Semantic Data Model, linked by a Formal Transformation to a formal object model.]
Fig. 3: Solution 2
Solution3 proposes to use the Revisited Partitioning [28],[34] as a quality oriented refinement of solution2. The Revisited Partitioning is a formal method that maps semantic data models into object models, improving the results achieved with the Partitioning Method [26]. This method consists of recursive decompositions of specialisation hierarchies supported by semantic data models until all disjoint classes are obtained. The Revisited Partitioning makes the original Partitioning easier to understand. Furthermore, it recomposes the disjoint classes into a specialisation hierarchy supported by object systems. The specialisation hierarchy supported by semantic data models has the property that each object instance can belong to any class of the specialisation hierarchy, thus enhancing flexibility, whereas the specialisation hierarchy supported by object models has the property that each object instance must belong to one and only one class, thus ensuring efficiency. The Partitioning works on structural aspects of the database; solution3 proposes to extend the specialisation hierarchies on which the partitioning works in order to support
descriptions of databases in which both structural and behavioural aspects of modelling are specified. Similarly to attributes, the operations of the extended semantic data model are inherited and preserve both the class and the specialisation constraints. The Revisited Partitioning can be applied to a consistent Extended Semantic Data Model. Step by step, the models generated by the Revisited Partitioning preserve equivalence, thus ensuring correctness. Finally, the idea of linking formal and informal notations is exploited in order to achieve easiness of schema specification. Figure 4 represents solution3: M1, ..., Mn are sets of extended semantic data models generated by the decompositions of the Revisited Partitioning, whereas the bi-directed arrows represent the equivalence between these sets of models.
[Figure 4: Formal Extended Semantic Data Model, decomposed by the Revisited Partitioning into sets of models M1, ..., Mn linked by bi-directed arrows, ending in an object model.]
Fig. 4: Solution 3
Solution4 proposes to enrich the operations specified in the extended semantic data model with the constructs of preconditioning, non-determinism, and partiality, defined in B, a formal method of software engineering appropriate for commercial use. The Enriched Extended Semantic Data Model is called Structured Database Schema. Solution4 is a quality oriented refinement of solution3 since a formal relation can be established between ASSO and B. As a consequence, the Structured Database Schema can be translated into B-Machines; these define the semantics of the model supported by ASSO at a lower abstraction level, where more formal details must be made explicit than in the conceptual model supported by ASSO. Tools supporting B can be used to prove the model consistency [30]; however, practical benefits can be achieved by constructing an ASSO toolkit which exploits the conceptual features of ASSO to reduce the complexity of the consistency obligations as well as to improve easiness of the schema specifications. Steps of behavioural refinement must be applied to the Structured Database Schema before applying the Revisited Partitioning. Figure 5 represents the Final Solution: SDBS1, ..., SDBSn are schemas supported by Structured Database Schemas, the directed arrows represent steps of behavioural refinement, the bi-directed arrows represent the equivalence between the models generated by the Revisited Partitioning, and Object Schema is a specialisation hierarchy supported by an object system.
[Figure 5: Structured Database Schema; SDBS1 -> ... -> SDBSn by steps of Behavioural Refinement (directed arrows); the Revisited Partitioning then generates equivalent models (bi-directed arrows), ending in the Object Schema.]
Fig. 5: Final Solution
The design of ASSO resulting from this step-sequence includes the following two phases:
• The conceptual design, which consists of constructing a conceptual schema supported by the Structured Database Schema. Both structural and behavioural aspects of modelling are specified at a high abstraction level, whereas simple first order formulas allow the conceptual schema consistency to be proved.
• The refinement, which consists of a sequence of schema transformations and comprises two sub-phases:
> the behavioural refinement, which is a stepwise approach that leaves the state unchanged while reducing details in the operation specifications. Similarly to the B refinement, at each step a model supported by ASSO, with more implementation details than the previous one, is proposed. Simple first order formulas are proved to guarantee the behavioural correctness.
> the revisited partitioning, which is a decomposition process that starts from a schema supported by a Structured Database Schema and ends with a logical model to be implemented on an object system.
The decomposition process generates sets of schemas equivalent to the behaviourally refined Structured Database Schema, thus ensuring the correctness of the Revisited Partitioning without any need of proof. As in semantic data models [24], each object instance of the conceptual schema can belong simultaneously to any class of a specialisation hierarchy, whereas, as in most object systems, each object instance of the logical schema belongs to one and only one class [1]. These two properties, coupled with the design correctness, guarantee the coexistence of both flexibility and efficiency [27], whereas both the design of the Structured Database Schema and the design of the Revisited Partitioning guarantee economy [24]. Finally, the link between formal and informal notations has been exploited in order to ensure easiness of use. The quality requirements achieved by ASSO result from the initial need to reconcile the flexibility in reflecting the changes occurring in real life with the efficiency of the object-oriented methodologies [33], and from the subsequent need to overcome the conflict of the formality requirement, introduced to ensure reliability, with both easiness of use and economy [14].
3. Methodological Tools
This section describes the innovative methodological tools designed for achieving quality in ASSO. They are: the Structured Database Schema, a formal extended semantic data model able to guarantee a consistent integration between static and dynamic aspects of modelling while permitting the easy modifiability of the schema; the Revisited Partitioning, a method which decomposes a behaviourally refined Structured Database Schema until a logical schema is obtained which can be supported by an efficient object system; and the relations between ASSO and B, formal links established to achieve practical results in designing an ASSO-toolkit based on the B-toolkit [6]. Each of them is described in the following.
3.1. Structured Database Schema
The model supported by ASSO [3], [26], called Structured Database Schema, is based on the concepts of class and specialisation. The former makes it possible to model both structural and behavioural aspects of a set of objects in the database, whereas the latter makes it possible to extend the classic specialisation hierarchy with aspects of behavioural modelling. Their definitions are given in the following.
Definition (class) A class is a tuple (name, Att, Const, Op) where name is a term connoting the class name and denoting the class objects. The term name, called class extension, represents a subset of a given set. Att is a finite set of terms called attributes; each of them is defined
as a function from the extension name to either a given set or the extension of another class. Both extension and attributes define the class state variables, whereas the predicate which formalises their definitions is called class constraints. Const is a predicate on the class state variables, which formalises a set of properties, called class application constraints. Op is a finite set of operations defined as functions from predicates establishing the class constraints to predicates establishing the class constraints. A special operation, called initialisation, belongs to Op. This associates initial values establishing the class constraints with the state variables. The class operations are formalised using a notation based on the generalised substitution language [2]. Basic operations having the semantics of the simplest substitution preserving the class constraints have been defined as building blocks [12] and a set of constructors coinciding with those of the generalised substitution language have been recursively applied to the basic operations. The axiomatic semantics of basic operations and constructors is given in Appendix A.
Definition (is-a* relationship) If class name2 is in is-a* relationship with class name1 then:
• the objects of class name2 are a subset of the objects of class name1;
• class name2 inherits both attributes and operations from class name1, has specific operations and may have specific attributes.
Definition (inherited operation) If op1 is an operation of class name1 and class name2 is in is-a* relationship with class name1, then the inherited operation op1 on name2 is the operation op1 instantiated on the class name2. In order to define operations which preserve the constraints of the specialisation hierarchy, the inherited initialisation (resp. the inherited operation which inserts objects) must be composed through the parallel composition [21] with a specific operation on the class name2 corresponding to the inherited operation.
Definition (specialisation of operation) If op1 is the initialisation (resp. an operation which inserts objects) of class name1 and class name2 is in is-a* relationship with class name1, then the specialisation of op1 on name2 is the specific operation on the class name2 corresponding to the inherited op1.
Definition (Structured Database Schema) A Structured Database Schema is a connected acyclic graph whose nodes are classes and whose links are is-a* relationships between classes. Application constraints may be associated with the classes and/or with the sub-hierarchies of a Structured Database Schema; however, for the sake of simplicity, in this paper the application constraints will be associated only with the classes.
Property The Revisited Partitioning can be applied to the Structured Database Schema after steps of behavioural refinement [3].
Definition (specialised class) If class name2 is in is-a* relationship with class name1 then the specialised class name2 can be defined as follows:
• the objects of the specialised class name2 are those of class name2;
• the attributes are both the attributes of the class name1 and the specific attributes of the class name2;
• the initialisation (resp. an operation which inserts objects) is the parallel composition of the inherited initialisation (resp. the inherited operation which inserts objects) with the corresponding specialisation;
• for each remaining operation on the class name1, the inherited operation belongs to the specialised class name2;
• the specific operations of class name2 belong to the specialised class name2.
Property Each specialised class is a class.
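The definitions above can be pictured with a small data structure. This is a minimal sketch under stated assumptions, not ASSO's formalisation: `ClassDef`, `check_schema` and `specialised_class` are hypothetical names, each class has at most one is-a* parent (multiple inheritance is not modelled), operations are represented only by their kind ("init", "new", "del") with no semantics, and any operation of a child class that has the same kind as a parent operation is treated as its specialisation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ClassDef:
    name: str
    parent: Optional[str]          # target of the is-a* link, None for the root
    attributes: Tuple[str, ...]    # specific attributes only
    operations: Tuple[str, ...]    # operation kinds, e.g. "init", "new", "del"


def check_schema(schema: Dict[str, ClassDef]) -> str:
    """Check that the is-a* links form a connected acyclic graph; return the root."""
    roots = [c.name for c in schema.values() if c.parent is None]
    if len(roots) != 1:
        raise ValueError("a schema needs exactly one root class")
    for cls in schema.values():
        seen = set()
        current = cls
        while current.parent is not None:
            if current.name in seen or current.parent not in schema:
                raise ValueError(f"broken is-a* chain at {current.name}")
            seen.add(current.name)
            current = schema[current.parent]
    return roots[0]


def specialised_class(
    schema: Dict[str, ClassDef], name: str
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return the attributes and operations of the specialised class `name`.

    Each operation maps to the list of components composed in parallel:
    a single component means the operation is inherited or specific.
    """
    cls = schema[name]
    if cls.parent is None:
        return list(cls.attributes), {op: [f"{op}.{name}"] for op in cls.operations}
    attrs, parent_ops = specialised_class(schema, cls.parent)
    attrs = attrs + list(cls.attributes)
    ops: Dict[str, List[str]] = {}
    for op, parts in parent_ops.items():
        # Inherited operation, composed with its specialisation when one exists.
        ops[op] = parts + [f"{op}.{name}"] if op in cls.operations else list(parts)
    for op in cls.operations:
        ops.setdefault(op, [f"{op}.{name}"])  # purely specific operations
    return attrs, ops


schema = {
    "person": ClassDef("person", None, ("income",), ("init", "new", "del")),
    "employee": ClassDef("employee", "person", ("salary",), ("init", "new")),
    "student": ClassDef("student", "person", ("identifier",), ("init", "new")),
}
```

With this schema, `specialised_class(schema, "employee")` lists the attributes income and salary, composes init and new with their specialisations, and keeps del as a plain inherited operation.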
Definition (Structured Database Schema) A Structured Database Schema is a set composed of a class, called root class, and of a finite number of specialised classes.
The above definition of Structured Database Schema is equivalent to the first one; it has been taken into consideration since it allows the model supported by ASSO to be seen as a set of independent classes, which permits the decomposition of large consistency obligations into a set of small obligations. In fact, as the operations of the Structured Database Schema have been defined to preserve the constraints which formalise the specialisation hierarchy, in order to guarantee a consistent integration between static and dynamic aspects of modelling it is sufficient to prove the consistency only with respect to the application constraints. Further, the proofs of inherited operations can be avoided.
Definition (class consistency) A class is consistent if the initialisation establishes the application constraints and each of the remaining operations preserves the application constraints.
Definition (model consistency) A Structured Database Schema is consistent if each of its classes is consistent.
The consistency proof of a specialised class can be reduced to the correctness proof of specialisations and specific operations.
3.1.1 An Example of Conceptual Schema
The following syntactic forms are used to specify a schema supported by the Structured Database Schema:
class name1 of GIVEN-SET with (att-list; const; op-list)
class name2 is-a* name1 with (att-list; const; op-list)
The former is the basic constructor used to specify the root class of a Structured Database Schema. The latter is used to specify the remaining classes. Within these forms, name1 and name2 denote the class names, whereas att-list, const and op-list denote respectively the attributes, the application constraints and the operation list of the class name1 in the former specification and of the class name2 in the latter specification.
The class constraints are implicitly specified with the class constructor. To specify application constraints and other operations on the specialised class, and further to specify classes in multiple inheritance, enriched syntactic forms not provided in this paper must be introduced. The specification of an ASSO conceptual schema is provided in Figure 6. This figure describes an example where the database maintains information about:
> a set of persons and their income,
> a subset of working persons and their salary,
> a subset of students and their identifier.
• The income of each person is greater than or equal to 1000; the salary of each employee is greater than or equal to 500; each student has a unique identifier.
• Information is added when a new person is inserted in the database. This is specialised both when the person is employed and when the person becomes a student.
• Information is removed from the database when a person dies.
Structured Database Schema Database
class person of PERSON with
  (income: N;
   person ∀p (p ∈ person ⇒ income(p) ≥ 1000)
   init.person() = person, income := ∅, ∅ ;
   new.person(pers, i) = PRE pers ∈ PERSON - person ∧ i ≥ 1000 THEN ADD person(pers, i) END
   del.person(pers) = PRE pers ∈ person THEN REM person(pers) END)
class employee is-a* person with
  (salary: N;
   employee ∀e (e ∈ employee ⇒ salary(e) ≥ 500)
   init.employee() = employee, salary := ∅, ∅ ;
   new.employee(pers, sal) = PRE sal ≥ 500 THEN ADD employee(pers, sal) END)
class student is-a* person with
  (identifier: N;
   student ∀s1, s2 (s1 ∈ student ∧ s2 ∈ student ∧ s1 ≠ s2 ⇒ identifier(s1) ≠ identifier(s2));
   init.student() = student, identifier := ∅, ∅ ;
   new.student(pers) = ANY m WHERE m ∈ N ∧ m ∉ ran(identifier) THEN ADD student(pers, m) END END)
Fig. 6: A Conceptual Schema
The employee class (resp. student class) includes the init.employee (resp. init.student) and the new.employee (resp. new.student) specialisations. No other specific operation has been specified. The given conceptual schema shows that the specifications supported by the Structured Database Schema provide the advantages of both formal and informal notations while exhibiting the quality features of the object oriented style. Similarly to the informal specifications written using friendly database conceptual languages, ASSO allows the implicit specification of many details that formal notations specify explicitly; unlike the informal notations, however, the ASSO specifications permit properties of the model, such as its consistency, to be proved. Further, the ASSO notation is powerful enough to represent pre-conditioned, partial and non-deterministic operations that are given at an abstraction level higher than the level provided by database conceptual languages. As the application constraints in our example involve only variables of single classes, the consistency proof of the whole schema can be reduced to the consistency proof of the class person, of the class employee and of the class student.
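To make the consistency notion concrete, the example schema can be animated and its application constraints checked after every operation. This is only a runtime check on sample data, not the proof the paper refers to, and it rests on several assumptions: the dictionary-based state and all function names are hypothetical, the specialised insertions take the parent's parameters as well, new_student resolves the non-deterministic choice by picking the smallest unused natural number, and del_person is assumed to remove the object from the subclasses too.

```python
def init():
    """Initialisation of the three classes: every extension and attribute is empty."""
    return {"person": set(), "income": {}, "employee": set(), "salary": {},
            "student": set(), "identifier": {}}


def new_person(db, pers, i):
    # PRE pers in PERSON - person AND i >= 1000
    if pers in db["person"] or i < 1000:
        raise ValueError("precondition of new.person violated")
    db["person"].add(pers)
    db["income"][pers] = i


def new_employee(db, pers, i, sal):
    # Specialised insertion: new.person composed with new.employee (PRE sal >= 500).
    if sal < 500:
        raise ValueError("precondition of new.employee violated")
    new_person(db, pers, i)
    db["employee"].add(pers)
    db["salary"][pers] = sal


def new_student(db, pers, i):
    # ANY m WHERE m in N AND m not in ran(identifier); here: smallest unused natural.
    used = set(db["identifier"].values())
    m = next(n for n in range(len(used) + 1) if n not in used)
    new_person(db, pers, i)
    db["student"].add(pers)
    db["identifier"][pers] = m


def del_person(db, pers):
    # PRE pers in person; removal is propagated to the subclasses (assumption).
    if pers not in db["person"]:
        raise ValueError("precondition of del.person violated")
    for extension, attribute in (("person", "income"), ("employee", "salary"),
                                 ("student", "identifier")):
        db[extension].discard(pers)
        db[attribute].pop(pers, None)


def consistent(db):
    """Application constraints of Fig. 6 plus the specialisation (subset) constraints."""
    ids = list(db["identifier"].values())
    return (all(db["income"][p] >= 1000 for p in db["person"])
            and all(db["salary"][e] >= 500 for e in db["employee"])
            and len(ids) == len(set(ids))
            and db["employee"] <= db["person"]
            and db["student"] <= db["person"])
```

A call whose precondition fails raises an error before any state is changed, so the constraints still hold afterwards.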
Unlike other approaches to information systems working at conceptual level [20], [24], [19], the Structured Database Schema can be refined to provide logical schemas that can be implemented efficiently. A graphical representation of the proposed conceptual schema and of the specialised class employee is given in Appendix B. Similarly to attributes, the inherited operations are not specified on the conceptual schema, whereas both inherited attributes and operations on the specialised class employee have been represented by enclosing the operation name between round brackets and using an index.
3.1.2 Behavioural refinement
The behavioural refinement of a class name1 with operation op1, precondition Pre_op1 and application constraint const is a stepwise approach which weakens the precondition Pre_op1, and/or reduces non-determinism and/or partiality. In order to guarantee correctness, at each step a class with more implementation details is proposed and the simple proof obligations are proved for the operation op1 that is refined into operation op2 [13]. The class behavioural refinement is correct if each step of behavioural refinement is correct. As an example, let us consider the class student in Fig. 6 with the non-deterministic operation new.student. This associates with each new student one and only one identifier that has not been used before. A possible step of behavioural refinement consists in associating with each new student the maximum of all existing identifiers incremented by one [31]. A Structured Database Schema behavioural refinement is a modular refinement of classes, i.e., if a class is refined, the entire Structured Database Schema is refined.
3.2 Revisited Partitioning
The Revisited Partitioning can be interpreted as the decomposition process of a behaviourally refined Structured Database Schema. This is recursively decomposed into a Structured Database Schema1 and a Structured Database Schema2 [28], [34].
The root classes of these two Structured Database Schemas define a partition of the original root class and the following properties are satisfied:
• the names of the root classes represent the revisited partitioning. The former root class is defined by the set difference between the objects of the original root class and the objects of the first specialised class. The latter root class is defined by the intersection of the original root class objects with the objects of the first specialised class;
• only attributes, application constraints and operations of root class name1 are taken by the root class name1-name2, whereas the attributes, the application constraints, and the operations of both the classes are taken by the root class name1 • name2. All this information can be specified implicitly;
• the intermediate stages of revisited partitioning are not required;
• no proof obligation needs to be generated and proved in order to guarantee the process correctness.
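On class extensions alone, the recursive split into a set difference and an intersection can be sketched as follows. This is a minimal illustration under stated assumptions: `revisited_partitioning` is a hypothetical function name, the specialised classes are given as a flat list of (name, extension) pairs, attributes, constraints and operations are not modelled, and the intersection is written `&` in the generated labels, after Python's set operator.

```python
from typing import FrozenSet, Iterable, List, Sequence, Tuple


def revisited_partitioning(
    label: str,
    objects: Iterable[str],
    specialisations: Sequence[Tuple[str, Iterable[str]]],
) -> List[Tuple[str, FrozenSet[str]]]:
    """Recursively split a root class into pairwise disjoint classes.

    At each step the root is split by the first specialised class into a
    set difference and an intersection; the remaining specialised classes
    are carried into both halves, which implicitly splits them as well.
    """
    root = frozenset(objects)
    if not specialisations:
        return [(label, root)]
    (name, extension), rest = specialisations[0], specialisations[1:]
    extension = frozenset(extension)
    difference = revisited_partitioning(f"({label} - {name})", root - extension, rest)
    intersection = revisited_partitioning(f"({label} & {name})", root & extension, rest)
    return difference + intersection


classes = revisited_partitioning(
    "person",
    {"ann", "bob", "cy", "di", "ed"},
    [("employee", {"bob", "cy"}), ("student", {"cy", "di"})],
)
```

For a root class with two specialised classes, two steps of decomposition yield four disjoint classes whose union is the original root class, so every object belongs to exactly one of them.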
After a step of behavioural refinement for the class student, the conceptual schema in Figure 6 has been decomposed into two: the Structured Database Schema1 with root class person-employee and the Structured Database Schema2 with root class person • employee. The former root class takes only attributes, application constraints and operations of the class person, whereas the latter root class person • employee takes attributes, initialisation, and constraints of both the classes person and employee. The operations are parallel compositions of the corresponding operations on the classes person and employee. Each structured database schema also takes a copy of the class student, implicitly splitting this class between the two parts of the partition of person. With a further step of decomposition, four disjoint classes are obtained. These are recomposed to define a specialisation hierarchy in which each object instance belongs to one and only one class. The object schema specifies more information with respect to the
conceptual schema since the Revisited Partitioning makes explicit all the class intersections implicitly specified in semantic data models.
Structured Database Schema1 Database
class person-employee of PERSON with
  (income: N;
   person ∀p (p ∈ person-employee ⇒ income(p) ≥ 1000)
   init.person-employee() = person, income := ∅, ∅ ;
   new.person-employee(pers, i) = PRE pers ∈ PERSON - (person-employee) ∧ i ≥ 1000 THEN ADD person-employee(pers, i) END
   del.person-employee(pers) = PRE pers ∈ person-employee THEN REM person-employee(pers) END)
class student is-a* person-employee with
  (identifier: N;
   student ∀s1, s2 (s1 ∈ student ∧ s2 ∈ student ∧ s1 ≠ s2 ⇒ identifier(s1) ≠ identifier(s2));
   init.student() = student, identifier := ∅, ∅ ;
   new.student(pers) = ADD student(s, max(ran(identifier))+1) END)
Fig. 7: A revisited partitioning decomposition - first schema
Structured Database Schema2 Database
class person • employee of PERSON with
  (income: N, salary: N;
   person ∀p (p ∈ person • employee ⇒ income(p) ≥ 1000),
   employee ∀e (e ∈ person • employee ⇒ salary(e) ≥ 500);
   init.person • employee() = person, income, employee, salary := ∅, ∅, ∅, ∅ ;
   new.person • employee(pers, i) = PRE pers ∈ PERSON - (person • employee) ∧ i ≥ 1000 ∧ sal ≥ 500 THEN ADD person • employee(pers, i, sal) END
   del.person • employee(pers) = PRE pers ∈ person • employee THEN REM person • employee(pers) END)
class student is-a* person • employee with
  (identifier: N;
   student ∀s1, s2 (s1 ∈ student ∧ s2 ∈ student ∧ s1 ≠ s2 ⇒ identifier(s1) ≠ identifier(s2));
   init.student() = student, identifier := ∅, ∅ ;
   new.student(pers) = ADD student(s, max(ran(identifier))+1) END)
Fig. 8: A revisited partitioning decomposition - second schema
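The refined new.student of Figures 7 and 8 replaces the non-deterministic choice of Figure 6 with the maximum existing identifier plus one. The sketch below checks on sample data that the refined choice is always one of the values the abstract specification allows; it is a test on examples, not a proof obligation. The function names are hypothetical, and treating the maximum of an empty set of identifiers as 0 is an assumption, since the paper does not say what max(ran(identifier)) is for an empty class.

```python
def abstract_choices(used, bound):
    """Identifiers allowed by ANY m WHERE m in N and m not in ran(identifier).

    The naturals are infinite, so the set is enumerated only up to `bound`.
    """
    return {m for m in range(bound + 1) if m not in used}


def refined_choice(used):
    """Refined new.student: maximum existing identifier incremented by one.

    Assumption: an empty set of identifiers is treated as having maximum 0.
    """
    return max(used, default=0) + 1


def is_refinement_step(used):
    """The refined choice must be one of the choices of the abstract operation."""
    m = refined_choice(used)
    return m in abstract_choices(used, bound=m)


samples = [set(), {0}, {1, 2, 3}, {5, 9}]
all_ok = all(is_refinement_step(used) for used in samples)
```

The refinement reduces non-determinism: every behaviour of the refined operation is a behaviour of the abstract one, while the converse does not hold (for used = {5, 9} the abstract operation may also return 0, which the refined one never does).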
3.3 ASSO-B relations
The B-Method [2] represents one of the most comprehensive formal methods currently being promoted as appropriate for commercial use. It is based on a model, in the following called B-Machine, which permits both the static and the dynamic aspects of modelling to be specified within the same formal framework. A B-Machine is defined as a state and a set of operations, including an initialisation, which specify declarative state transformations. Dynamic modelling is specified through the generalised substitution language [2], which is powerful enough to represent pre-conditioned, partial and non-deterministic operations. The B-Method consists of two phases: specification and refinement. A development process starts from a consistent specification, i.e., a B-Machine that expresses the application requirements without implementation details, and applies steps of B refinement to reach a B-Machine which is near to executable code. At each step, the designer proposes a new B-Machine with more details than the previous one and proves first order logic formulas, called proof obligations, to guarantee the step correctness.
The relations between ASSO and B [13], [29] have been established in order to define both an ASSO toolkit which reuses the B-toolkit [31] and a formal theory of ASSO which reuses the B theory [32]. In this paper, the relations between ASSO and B will be captured through the following properties:
Property (B-Machine and Class) An Abstract Machine whose initialisation establishes the class constraints and whose operations preserve the class constraints can be associated with each class, and a class can be associated with each B-Machine whose initialisation establishes the class constraints and whose operations preserve the class constraints. If the initialisation establishes the class constraints and the operations preserve the class constraints, the B-Machine state can be seen as restricted to the class state.
Property (Class) A class can be identified with a B-Machine whose state variables are constrained to satisfy the class constraints. As the class state is included in the B-Machine state, not all B state transformations are class operations, but only those preserving the class constraints. The class formalisation can be given by exploiting any model used to formalise the B-Machine, restricted to capture all and only those B-Machines that are classes. In the following, we call a B-Machine that identifies a class a class-machine. This definition makes it possible to identify the notions of is-a* relationship and specialisation with those of is-a* relationship between class-machines and of specialised class-machine.

Property (Structured Database Schema: first view) A Structured Database Schema can be identified with a set composed of a root class-machine and a finite number of specialised class-machines.

The Structured Database Schema provides mechanisms of classification and specialisation which are not available in B [29], [31]. The specifications which can be constructed in ASSO are only those which satisfy the class and the specialisation constraints. This restriction is the key to reducing the consistency proof of the whole schema to the consistency proofs of the class-machines. The Structured Database Schema can be interpreted as a model at a higher abstraction level than a B-Machine, since formal details which must be specified in the B-Machine are implicitly specified in ASSO, whereas large obligations which must be proved in B can be reduced to a set of small obligations for class-machines.
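The idea of a class-machine can be illustrated with a minimal Python sketch. It is not B notation and not part of ASSO: the names ClassMachine, add_employee and remove_employee, and the choice of a natural-number salary attribute, are hypothetical and serve only to show an initialisation that establishes the class constraints and operations that preserve them.

```python
class ClassMachine:
    """Hypothetical class-machine: a state plus operations preserving the class constraints."""

    def __init__(self, given_set):
        # Initialisation: it must establish the class constraints.
        self.given_set = frozenset(given_set)  # the given set, e.g. PERSON
        self.objects = set()                   # objects of the class
        self.salary = {}                       # attribute: function from objects to naturals
        assert self.constraints_hold()

    def constraints_hold(self):
        """Class constraints: objects lie in the given set, the attribute is total on them."""
        return (self.objects <= self.given_set
                and set(self.salary) == self.objects
                and all(isinstance(v, int) and v >= 0 for v in self.salary.values()))

    def add_employee(self, pers, sal):
        # Pre-conditioned operation: rejected when the constraints would be violated.
        if pers not in self.given_set or pers in self.objects:
            raise ValueError("pers must be a new object of the given set")
        if not isinstance(sal, int) or sal < 0:
            raise ValueError("sal must be a natural number")
        self.objects.add(pers)
        self.salary[pers] = sal
        assert self.constraints_hold()

    def remove_employee(self, pers):
        # Removing an object also removes its attribute value.
        self.objects.discard(pers)
        self.salary.pop(pers, None)
        assert self.constraints_hold()
```

In this sketch the constraints are checked at run time with assertions, whereas in B the corresponding statements are proof obligations discharged before execution.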
Property (B-Machine and Structured Database Schema) A B-Machine whose operations preserve the class and the specialisation constraints can be associated with each Structured Database Schema. It can be proved that not every B-Machine whose operations preserve both the class and the specialisation constraints can be decomposed by exploiting the Revisited Partitioning. For this purpose, it is sufficient to show that there are operations allowed in the B-Machine which preserve the class and the specialisation constraints but are not allowed in the Structured Database Schema [3], [32]. ASSO performs steps of behavioural refinement in order to obtain a Structured Database Schema which can be partitioned and implemented efficiently on an object system.

Property (Structured Database Schema: second view) A Structured Database Schema can be identified with a specialisation hierarchy whose nodes are class-machines and whose links are is-a* relationships between class-machines.

Property (Behavioural Refinement) A Structured Database Schema behavioural refinement is a B refinement of class-machines that leaves the state unchanged while reducing details in the operation specifications.

3.3.1 Tools for ASSO

In order to apply ASSO to practical situations, a set of tools that check the notation, prove the consistency of specifications and assist the refinement into application code would be necessary. It would be desirable to design an ASSO-toolkit for the achievement of quality by adopting the same approach employed to design ASSO. The initially desired quality requirements for the ASSO-toolkit are reusability and proof complexity reduction: reusability, since the ASSO-toolkit is not to be designed from scratch, and proof complexity reduction, since complex obligations are not to be proved. As relations hold between ASSO and B, it would be desirable to use the existing support tools for B, such as the B-toolkit [6], as a basis for the supporting tools of ASSO.
Further, as ASSO is a methodology designed at a higher abstraction level than B, it would be desirable to design the ASSO-toolkit at a higher abstraction level than the B-toolkit as well. From the reusability point of view, this means that the design of the ASSO-toolkit does not exploit all the B-toolkit functionality, but only specialised functionality. From the proof complexity reduction point of view, it means that the ASSO-toolkit must be able to avoid the generation of inessential obligations which must be proved with the B-toolkit, while permitting the reduction of the essential obligations to small obligations.

Translation tool A process of translation from ASSO classes to class-machines can be proposed that generates the application constraint obligations while avoiding the generation of the class constraint obligations. To reach this objective, the translation [28] from an ASSO class to a B-Machine is performed by associating both a base machine and a class machine with each class. The base machine is the means of constraining the state variables to satisfy the class constraints. Once this has been modelled, a class can be identified with a class machine. The base machine contains the class variables, base operations and initialisation, without invariant, thus generating no proof obligations. The base machine is embedded within a class machine that declares the class operations and asserts the application constraints in the form of an invariant. The encapsulation principle of B ensures that only the base operations can modify the class variables. As the base operations preserve the class constraints, this principle ensures that operations on classes preserve the class constraints. Features of ASSO can be reused in the translation process of specialised classes. If application constraints involve only class variables, the generation of the consistency obligations for the inherited operations is avoided. The B-Toolkit can further be exploited to support the behavioural refinement of ASSO. Again, the known properties of ASSO mean that a more restricted set of checks is required than that provided by the full generality of the B-Toolkit.

4. Conclusions and Further Developments

A metamethod for designing methods for the achievement of quality requirements has been provided, and the approach employed to design a methodology of conceptual database design named ASSO has been described. Starting from an initial proposal of a method for achieving minimal conflicting quality requirements, more concrete proposals that increase quality are given. A modular design is finally achieved. In order to achieve quality in ASSO, the following modules, called methodological tools, have been designed:

• Structured Database Schema, a conceptual model able to specify information with flexibility while guaranteeing consistency;
• Revisited Partitioning, a formal method able to refine the ASSO model towards correct and efficient object implementations;
• Relations between ASSO and B, a translation approach proposed to link a methodology designed at the conceptual level with a formal method designed at a lower abstraction level.
The provided approach makes it easy to describe ASSO and the methodological tools; furthermore, it favours the achievement of practical results. As an example, the methodological tools can affect the design of support tools for ASSO. Future investigations concerned with quality in ASSO may address the privacy requirement [7], [8], [9], [10].

Acknowledgements

The author would like to thank her husband Antonio Canonico, her brother-in-law Vittorio Pizzi, the colleagues at IEI of Pisa and the colleagues at IASI of Rome for their help and advice.

REFERENCES

[1] S. Abitebul, R. Hull, V. Vianu: "Foundations of Databases", Addison-Wesley Publishing Company.
[2] J. R. Abrial: "The B-Book", Cambridge University Press, 1996.
[3] R. Andolina, E. Locuratolo: "ASSO: Behavioural Specialisation Modelling", Information Modelling and Knowledge Bases VIII, H. Kangassalo (Ed.), IOS Press, pp. 241–259, ISSN 0922–6389.
[4] C. Batini, S. Ceri, S.B. Navathe: "Conceptual Database Design: An Entity-Relationship Approach", Redwood City, California: Benjamin Cummings, 1992.
[5] E. V. Berard: "Essays on Object-Oriented Software Engineering", Volume I, Englewood Cliffs, New Jersey: Prentice-Hall, 1993.
[6] B-Core (UK) Ltd.: "B-Technology, Technical Overview".
[7] E. Bertino, S. Jajodia, P. Samarati: "An Extended Authorization Model", IEEE Trans. on Knowledge and Data Engineering, Vol. 9, No. 1, 85–101.
[8] E. Bertino, C. Bettini, E. Ferrari, P. Samarati: "Decentralized Administration for a Temporal Access Control Model", Information Systems, Vol. 22, No. 4, 223–248.
[9] E. Bertino, A. Ciampichetti, S. Jajodia, P. Samarati: "Information Flow Control in Object-Oriented Systems", IEEE Trans. on Knowledge and Data Engineering, Vol. 9, No. 4.
[10] E. Bertino, S. Jajodia, P. Samarati, V. S. Subrahmanian: "A Unified Framework for Enforcing Multiple Access Control Policies", Proc. of ACM-SIGMOD International Conference on Management of Data, Tucson (Ariz.).
[11] G. Booch: "Object Oriented Design with Applications", Redwood City, California: Benjamin Cummings, 1991.
[12] D. Castelli, E. Locuratolo: "A Formal Notation for Database Conceptual Schema Specifications", Information Modelling and Knowledge Bases VI, H. Jaakkola (Ed.), IOS Press, 1994.
[13] D. Castelli, E. Locuratolo: "Abstract machine and database schema", Nota Interna B4 34, settembre 1994.
[14] D. Castelli, E. Locuratolo: "Enhancing Database System Quality through Formal Design", 4th Software Quality Conference, University of Abertay Dundee & Napier University: 366–359, 1995.
[15] D. Castelli, E. Locuratolo: "ASSO - A Formal Database Design Methodology", Information Modelling and Knowledge Bases VI, H. Jaakkola (Ed.), IOS Press, 145–158, 1995.
[16] D. Castelli, E. Locuratolo: "Database Design for Quality", Achieving Quality in Software: 238–246, 1995.
[17] P. P. Chen: "The Entity-Relationship Model: Towards a Unified View of Data", ACM Transactions on Database Systems, 1(1): 76–84, 1976.
[18] P. Coad, E. Yourdon: "Object-Oriented Design", Yourdon Press, 1991.
[19] Comic: A system and methodology for conceptual modelling and information construction. Data & Knowledge Engineering, 9: 287–319.
[20] E. Compantangelo, G. Rumolo: "An Engineering Framework for Domain Knowledge Modelling", IOS Press, 1998.
[21] E. W. Dijkstra, S. Scholten: "Predicate Calculus and Program Semantics", Springer-Verlag.
[22] R.L. Glass: "Building Quality Software", Prentice Hall.
[23] Information Technology - Software quality characteristics and metrics, 1997.
[24] R. Hull, R. King: "Semantic Database Modelling: Survey, Applications and Research Issues", ACM Computing Surveys 19(3): 201–259, 1987.
[25] M. Jarke, J. Mylopoulos, J.W. Schmidt, Y. Vassiliou: "DAIDA: An Environment for Evolving Information Systems", ACM Transactions on Information Systems, Vol. 10, No. 1, January 1992, pp. 1–50.
[26] E. Locuratolo: "ASSO: Evolution of a formal Database Design Methodology", Proceedings of Symposium on Software Technology (SoST'97), Buenos Aires, August 12-13, 1997.
[27] E. Locuratolo, F. Rabitti: "Conceptual Classes and System Classes in Object Databases", Acta Informatica 35(3): 181–210, 1998.
[28] E. Locuratolo: "Portability as a Methodological Goal", IEI Report B4-98.
[29] E. Locuratolo, B.M. Matthews: "On the relation between ASSO and B", Information Modelling and Knowledge Bases, Vammala, Finland, May 1998.
[30] B.M. Matthews, E. Locuratolo: "Translating Structured Database Schemas into Abstract Machines", Proceedings of the 2nd Irish Workshop on Formal Methods, Cork, Ireland, July 1998.
[31] E. Locuratolo, B.M. Matthews: "ASSO: A Formal Methodology of Conceptual Database Design", 4th International ERCIM Workshop on Formal Methods for Industrial Critical Systems.
[32] B.M. Matthews, E. Locuratolo: "Formal development of Databases in ASSO and B", FM 99, J. Wing, J. Woodcock, J. Davies (Eds.), LNCS 1708, pp. 388–410, Springer-Verlag, Berlin Heidelberg.
[33] J. Rumbaugh, M. Blaha, W. Premerlani, F. Eddy, W. Lorensen: "Object-Oriented Modelling and Design", Prentice-Hall, Englewood Cliffs, 1991.
[34] A.M. Spagnolo: "Incrementare la qualita' in ambito Basi di Dati", Tesi di Laurea, Universita' degli Studi di Pisa, Corso di Laurea in Scienze dell'Informazione. Relatore: Dott.ssa Elvira Locuratolo. Anno Accademico 1999-2000.
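APPENDIX A

Basic operations: operations defined to add objects, remove objects, modify attributes or leave the class unchanged. In the following, the axiomatic semantics of basic operations is given:

$[\mathit{Op\ name}(\mathit{par\_list})]R \Leftrightarrow (\text{class-name constraint} \Rightarrow \text{class-name constraint}') \wedge R'$

where name is the class name, par_list a list of parameters, constraint the class constraint, R a predicate on the class variables, $[\mathit{Op\ name}(\mathit{par\_list})]R$ the result of applying operation Op to the predicate R, and the primed predicates are the predicates after the operation Op.

An example: Add employee(pers, sal):

$[\mathit{Add\ employee}(\mathit{pers}, \mathit{sal})]R \Leftrightarrow (\mathit{employee} \subseteq \mathit{PERSON} \wedge \mathit{income} \in \mathit{employee} \to \mathbb{N})$
$\Rightarrow ((\mathit{employee} \cup \{\mathit{pers}\}) \subseteq \mathit{PERSON} \wedge \ldots$
$\wedge R((\mathit{employee} \cup \{\mathit{pers}\}), (\mathit{salary} \ldots$

$\ldots\ [\mathit{op}(\mathit{par\_list})]R$

$[\mathrm{CHOICE}\ \mathit{op}(\mathit{par\_list})\ \mathrm{ORELSE}\ \mathit{op}^{*}(\mathit{par\_list})\ \mathrm{END}]R \Leftrightarrow [\mathit{op}(\mathit{par\_list})]R \wedge [\mathit{op}^{*}(\mathit{par\_list})]R$

$[\mathrm{ANY}\ y\ \mathrm{WHERE}\ P\ \mathrm{THEN}\ \mathit{op}(\mathit{par\_list})\ \mathrm{END}]R \Leftrightarrow \forall y\,(P \Rightarrow [\mathit{op}(\mathit{par\_list})]R)$

An example of correctness proof:

$\mathit{pre} \wedge \mathit{const} \Rightarrow [\mathit{new.emp}(\mathit{pers}, \mathit{sal})]\,\mathit{const}'$

$\mathit{pers} \in \mathit{PERSON} - \mathit{employee} \wedge \mathit{sal} > 500 \wedge \forall e\,(e \in \mathit{employee} \Rightarrow \mathit{salary}(e) > 500)$
$\Rightarrow \forall e\,(e \in (\mathit{employee} \cup \{\mathit{pers}\}) \Rightarrow \mathit{salary}(e) > 500) \Leftrightarrow \mathit{TRUE}$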
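The correctness proof above can be mirrored, for a small finite model only, by an exhaustive check. The sketch below is an illustration under stated assumptions, not the B proof itself: the three-person PERSON set, the candidate salaries and the function names are hypothetical, and a finite enumeration is no substitute for the general proof.

```python
from itertools import combinations, product

# Hypothetical finite model: a tiny given set and a few candidate salaries.
PERSON = ("ann", "bob", "eve")
SALARIES = (400, 600, 700)

def constraint(salary):
    """const: every employee earns more than 500."""
    return all(value > 500 for value in salary.values())

def precondition(salary, pers, sal):
    """pre: pers is in PERSON - employee and sal > 500."""
    return pers in PERSON and pers not in salary and sal > 500

def weak_precondition(salary, pers, sal):
    """A deliberately weakened pre-condition that drops sal > 500."""
    return pers in PERSON and pers not in salary

def new_emp(salary, pers, sal):
    """State after the operation: employee and salary extended with pers."""
    after = dict(salary)
    after[pers] = sal
    return after

def all_states():
    """Every employee subset of PERSON with every salary assignment."""
    for size in range(len(PERSON) + 1):
        for employees in combinations(PERSON, size):
            for values in product(SALARIES, repeat=size):
                yield dict(zip(employees, values))

def obligation_holds(pre=precondition):
    """Check pre and const imply const' over the whole finite model."""
    return all(constraint(new_emp(state, pers, sal))
               for state in all_states()
               for pers in PERSON
               for sal in SALARIES
               if pre(state, pers, sal) and constraint(state))
```

With the stated pre-condition the obligation holds in this model; with the weakened pre-condition it fails, which shows why sal > 500 is needed.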
APPENDIX B

Structured Database Schema: a specialisation hierarchy whose nodes represent classes. A label connoting the class name and denoting the class objects is associated with each node, and a list of attribute names enclosed in brackets is associated with each label. Operations are represented as small ellipses on the nodes, containing the operation name. A link between formal and informal notation has been established to specify information. The representation implicitly specifies that the set of objects of a class is a subset of a given set, that each attribute is a function defined on the specified set of objects and taking values in an implicitly specified given set, and that each operation is a predicate transformer, i.e., a function from predicates to predicates. Like attributes, operations are inherited in order to satisfy the constraints that define the specialisation hierarchy. The graphical representation of the Structured Database Schema has also been suggested by the B-machine representation. The specialised class employee represented below shows the inherited operations which must be composed with the specialisations.
[Figure: Conceptual Schema; Specialised Class Employee]
Information Modelling and Knowledge Bases XIII H. Kangassalo et al. (Eds.) IOS Press, 2002
Concept Descriptions for Text Search

Jørgen Fischer Nilsson
Informatics and Mathematical Modelling, Technical University of Denmark

Abstract: This paper summarises the notion of concept descriptions introduced in the ONTOQUERY project as a means to conduct content-based search in text data bases. Phrases of the text source are compiled into descriptive terms, which in turn are organised according to a formal ontology. Relevant sections of the text can then be retrieved by indices to the text in the formal ontology, which constitutes a conceptual model for the text domain.

Keywords: Conceptual models, formal ontology, text search, relation-algebraic logic.
1 Introduction
Search in text sources such as text data bases and web pages is commonly done by retrieval based on the occurrence of words or logical patterns of words. Such a primitive search based on comparison of strings of characters suffers from a number of drawbacks. A more sophisticated search directed towards understanding of the meaning of words has to take into account synonyms and morphology, since, say, "vitamins" and "vitamin" possess the same conceptual content. Meaning-oriented search should preferably also take into account conceptual relationships between terms, such as "vitaminC" being a kind of "vitamin". Ideally the meaning of the text should be extracted by means of linguistic analysis supported by an appropriate representation of domain-specific knowledge as well as common world-knowledge.

In order to approach this goal, in the research project ONTOQUERY [1, 10] text files are preprocessed in order to extract descriptions of the conceptual content of the text. Focus is on noun phrases, which are identified and analysed linguistically in order to compile concept descriptions. A mapping noun phrase ↦ concept description is established through a linguistic analysis assisted by a lexicon and an ontology. Ambiguities (lexical and structural ones) may give rise to multiple descriptions for a phrase. During the processing of the text source, moreover, an index is built:
concept description ↦ links into information source.
With this mapping an index search in the source text can be carried out by stating a noun phrase as a query, which is processed linguistically in the same way as the text source. The concept descriptions are organised in a structure constituting a formal ontology for the domain, functioning as a high-level conceptual model of the application as indicated below. Thus search can also be carried out by retrieval via the ontology of conceptually similar adjacent descriptions. The ONTOQUERY project currently focusses on the domain of nutrition as described in the Danish National Lexicon (i.e., the Danish national encyclopaedia). This encyclopaedic source comprises around 100 articles on this subject, ranging in length from a few lines to a few pages. As usual, the articles are available in the source in alphabetical order by header key word.
2 Concept Descriptions

As an example consider the noun phrase Mangel på Dvitamin om vinteren (i.e., Deficiency of vitamin D in winter). In Danish it can also be paraphrased as Dvitaminmangel om vinteren using a compound noun. These phrases can be translated into the concept description

$\mathit{lack}\begin{bmatrix} \mathrm{WRT} : \mathit{vitaminD} \\ \mathrm{TMP} : \mathit{winter} \end{bmatrix}$

which is meant to capture the essential conceptual content of the phrase. It is stated here in the form of a so-called (typed) feature structure, cf. [3].
2.1 Skeleton Ontology
A formal ontology comprising the terms of the application domain is established as basis for formation and comparison of concept descriptions. The formal ontology forms a taxonomy in which application terms are organised according to the ISA ordering relationship. The ISA conceptual inclusion relationship forms a partial order. The diagram
shows a concept c with inclusion arcs to its immediate sub-concepts (specialisations) and its immediate superconcepts. For instance vitaminD is a sub-concept of vitamin. In [6, 7] we describe a relation-algebraic logic (concept algebra) in which the skeleton ontology becomes a distributive algebraic lattice transcending hierarchical classifications.
2.2 Core Ontology
The top of the ontology comprises major general categories such as the material concept CONCR and abstract concepts such as EVENT and STATE. The CONCR category is divided into substances (linguistically corresponding to the use of mass terms) and (countable) physical objects:

        CONCR
       /     \
   STUFF     OBJECT
       \     /
       PORTION
As it appears, these two material categories overlap, giving rise to a cross-category of "portions" in the lattice, jointly possessing the qualities of the superior categories. The application-specific ontology comes about by extending such a core ontology through further sub-division of the core categories with application-specific terms. For instance, the concept of vitamin enters below STUFF, with subconcepts such as vitaminB, which further specialises into vitaminB1, vitaminB2, etc. Individual concepts are situated at the bottom of the core structure.
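The ISA ordering and the cross-category PORTION can be illustrated with a minimal Python sketch. It is only an illustration, not the ONTOQUERY implementation: the dictionary ISA and the helpers ancestors, isa and meet are hypothetical names, and the fragment contains only the concepts mentioned in the text.

```python
# Immediate superconcepts of each concept (a fragment of the ontology in the text).
ISA = {
    "CONCR": [],
    "STUFF": ["CONCR"],
    "OBJECT": ["CONCR"],
    "PORTION": ["STUFF", "OBJECT"],
    "vitamin": ["STUFF"],
    "vitaminB": ["vitamin"],
    "vitaminB1": ["vitaminB"],
    "vitaminB2": ["vitaminB"],
    "vitaminD": ["vitamin"],
}

def ancestors(concept):
    """Reflexive and transitive closure of the immediate ISA arcs."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in ISA[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def isa(sub, sup):
    """True when sub is conceptually included in sup (partial order)."""
    return sup in ancestors(sub)

def meet(a, b):
    """Most general concepts included in both a and b within this fragment."""
    common = [c for c in ISA if isa(c, a) and isa(c, b)]
    return {c for c in common
            if not any(c != d and isa(c, d) for d in common)}
```

In this fragment meet("STUFF", "OBJECT") yields the single concept PORTION, matching the diamond shape of the core ontology; a finite dictionary of this kind does not capture the full lattice of compound descriptions.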
2.3 Conceptual Relationships
In addition to the conceptual inclusion relationship, a number of other universal relationships between concepts are identified, cf. also conceptual graphs [11]. Below is shown a selection of binary semantic relations (roles) pertaining to the conceptual analysis of noun phrases with prepositional subphrases.

Role   Legend
TMP    temporal aspects (generic role)
LOC    location, position
FOR    (inverse of BMO) purpose, function
BMO    (inverse of FOR) by means of, instrument, via
WRT    with respect to
CHR    characteristic (property ascription)
CBY    (inverse of CAU) caused by
CAU    (inverse of CBY) causes
CMP    (inverse of POF) comprising, has part
POF    (inverse of CMP) part of
3 Combining of Concepts
It is a key point that compound concept descriptions can be formed by means of the above binary roles used as attributes in

$c[r : \varphi]$

which attributes a concept $c$ with relation $r$ to concept description $\varphi$. Multiple attribute-value entries may be attached to a concept term, as already exemplified and conforming with the notion of feature structures. Attributions may also be nested as in $c[r : c'[r' : c'']]$. The theoretical foundation for these concept descriptions is extended relation-algebraic logic, cf. [2], which combines an algebra of concepts with an algebra of binary relations. The attribution $\varphi[r : \psi]$ is understood as the lattice meet (infimum) of
3.1 Derivation of Concepts
It is natural to understand generation of concept descriptions as a syntactical derivation process (cf. [9]) in which a general category is stepwise specialised, as shown below for the above example:

top of ontology
↓ state
↓ lack
↓ lack[WRT : stuff]
↓ lack[WRT : vitamin]
↓ lack[WRT : vitaminD]
↓ lack[WRT : vitaminD, TMP : winter]

Each of the descriptions also represents a node in the transfinite ontology.
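The derivation above can be sketched in Python by representing a concept description as a concept term with nested attribute-value entries and checking that each step specialises the previous one. This is a minimal illustration under assumptions, not the concept algebra of [6, 7]: the helper names desc, show and specialises are hypothetical, and the small PARENT table (including the placement of stuff and winter directly under top) is a simplification made only for the example.

```python
# Hypothetical fragment of the skeleton ontology: concept -> immediate superconcept.
PARENT = {
    "top": None, "state": "top", "lack": "state",
    "stuff": "top", "vitamin": "stuff", "vitaminD": "vitamin",
    "winter": "top",
}

def isa(sub, sup):
    """Follow the ISA arcs upwards from sub until sup or the top is reached."""
    while sub is not None:
        if sub == sup:
            return True
        sub = PARENT[sub]
    return False

def desc(concept, **attributes):
    """A concept description: a concept term plus role : description entries."""
    return (concept, attributes)

def show(description):
    """Render a description in the c[r : c'] notation used in the text."""
    concept, attributes = description
    if not attributes:
        return concept
    entries = ", ".join(f"{role} : {show(value)}" for role, value in attributes.items())
    return f"{concept}[{entries}]"

def specialises(lower, upper):
    """lower specialises upper: its term is included, and every attribute of
    upper is present in lower with a value that specialises upper's value."""
    (c1, a1), (c2, a2) = lower, upper
    return isa(c1, c2) and all(
        role in a1 and specialises(a1[role], value) for role, value in a2.items())

DERIVATION = [
    desc("top"), desc("state"), desc("lack"),
    desc("lack", WRT=desc("stuff")),
    desc("lack", WRT=desc("vitamin")),
    desc("lack", WRT=desc("vitaminD")),
    desc("lack", WRT=desc("vitaminD"), TMP=desc("winter")),
]
```

Each element of DERIVATION specialises its predecessor, either by descending in the ISA ordering or by adding an attribute, which is the stepwise specialisation described above.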
4 Restrictions on Combination
The combining of terms into descriptions is constrained by ontological sortal and structural restrictions as addressed in [4, 8], giving rise to a notion of ontotypology.
Acknowledgment

The ONTOQUERY project is funded 1999 - 2004 by the Danish National Science Foundation under the Informatics programme. I express my gratitude to the members of the ONTOQUERY team for contributing to a challenging interdisciplinary research environment.

References

[1] Andreasen, T., Nilsson, J. Fischer & Thomsen, H. Erdman: Ontology-based Querying, in H.L. Larsen et al. (eds.) Flexible Query Answering Systems, Recent Advances, Proceedings of the FQAS'2000 conference, Physica-Verlag (Springer-Verlag), 2000.
[2] Brink, C., Britz, K., and Schmidt, R.A.: Peirce Algebras, Formal Aspects of Computing, Vol. 6, 1994, pp. 339–358.
[3] Carpenter, B.: The Logic of Typed Feature Structures, Cambridge University Press, 1992.
[4] Jensen, P. Anker, Nilsson, J. Fischer, & Vikner, C.: Towards an Ontology-based Interpretation of NP's, in [5].
[5] Jensen, P. Anker and Skadhauge, P. (eds.): Ontology-Based Interpretation of Noun Phrases. Proceedings of the First International OntoQuery Workshop. Department of Business Communication and Information Science, University of Southern Denmark - Kolding, forthcoming.
[6] Nilsson, J. Fischer: An Algebraic Logic for Concept Structures, Information Modelling and Knowledge Bases V, IOS Press, Amsterdam, 1994, pp. 75-84.
[7] Nilsson, J. Fischer: A Conceptual Space Logic, in Proceedings of the European-Japanese Conference on Information Modeling and Knowledge Bases, Iwate, Japan, May 1999, republished in E. Kawaguchi et al. (eds.) Information Modelling and Knowledge Bases XI, IOS Press/Ohmsha, Amsterdam, 2000, pp. 26–40.
[8] Nilsson, J. Fischer: A Logico-Algebraic Framework for Ontologies ONTOLOG, in [5].
[9] Nilsson, J. Fischer: Are there Ontological Grammars?, this volume.
[10] ONTOQUERY project: http://www.ontoquery.dk
[11] Sowa, J.F.: Knowledge Representation, Logical, Philosophical, and Computational Foundations, Brooks/Cole Thomson Learning, 2000.
Information Modelling and Knowledge Bases XIII H. Kangassalo et al. (Eds.) IOS Press, 2002
Towards Cost-Effective Construction of Classification Models

Bostjan Brumen 1,2, Tatjana Welzer 1, Hannu Jaakkola 2, Izidor Golob 1, Ivan Rozman 1
1 University of Maribor, Faculty of Electrical Eng. and Computer Science, Slovenia
2 Tampere University of Technology, Pori School of Technology, Finland
e-mail: 1 {brumen, welzer, izidor.golob, i.rozman}@uni-mb.si, 2 {hj, brumen}@pori.tut.fi

Abstract. Data hide important knowledge, and this fact has been recognized as a very important motivating factor for data mining, which is a part of a knowledge discovery process. It can be conducted using several paradigms. One of the tasks of data mining is classification, where the goal is to build a model based on existing data and use it on new data. Classification models make errors when used, but the error rate tends to decrease as the amount of data used to build the model increases. A plot of error rate versus amount of data used is called a learning curve. Since classification is regarded as a form of supervised learning, the training data used to build the model need to be manually prepared; the preparation is a costly process. It is not possible to tell the amount of data needed or the error rate in advance. In the paper, an approach is presented where the error rate can be estimated by observing the learning curve.
1. Introduction

Data hide important knowledge. The amount of data available today has increased to such an extent that new techniques are required for knowledge extraction. Data mining (DM) is a relatively young discipline and its results are very promising. It is a part of a comprehensive process, knowledge discovery in databases, defined in [1] as a "nontrivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data". To accomplish a data-mining task, several techniques were devised (or reintroduced to the field). They are a mixture of machine learning, artificial intelligence, statistics, and many other approaches.

[Figure 1: Knowledge discovery in databases [1]. Stages shown: Data, Target Data, Preprocessed Data, Transformed Data, Patterns, Knowledge; steps shown include Preprocessing and Transformation.]
One of the techniques used is classification, which provides a mapping from attributes (observations) to pre-specified groupings or classes. Classification is considered to be supervised learning. The records (also called examples) must belong to a small set of classes that an expert has predefined. These examples are also called a training set. The induced model consists of patterns, essentially generalizations over the records from the training set that are useful for distinguishing the classes. Once a model is induced, it can be
used to automatically predict the class of other unclassified records. The techniques often create a set of IF-THEN rules or decision trees.

A generated model can classify a record into a wrong class. The number of misclassified records divided by the total number of records is the classification error of the induced model. If the induction algorithm used to build a model is capable of capturing the patterns that underlie the data, then the error rate tends to decrease as the number of examples used to build the model increases. A plot of error rate versus sample size (number of records used) is called a learning curve. In general, we cannot foretell whether the chosen approach will be able to capture the patterns or not - in other words, whether a learning curve will be decreasing or not.

We continue the work outlined in [2]. The approach we developed tries to use the data prepared by the expert in a cost-effective way. Since there is no guarantee that the model will have or reach a certain level of error rate, we need to be able to estimate it early in the process by observing the learning curve. The problems that arise are:

• When to terminate the process of building the learning curve,
• How to construct the full learning curve based on partial results.

In our contribution, we give a method of constructing the full learning curve based on partial results. Additionally, we outline the conditions to be met for the process of building the learning curve to terminate.

2. Adaptive incremental approach

Current research has shown that most data mining algorithms learn fast at the beginning of the learning process and much more slowly later in the process [3], [4]. This resembles the way humans learn [5]. Decision tree performance, for example, was successfully modeled by the power law [3]. We too decided to model the learning curve with a power law, but have modified the original power law form in Equation 1:

$y = b \cdot x^{c}$    (Equation 1)

and propose a new one (Equation 2):

$y = a + b \cdot x^{c}$    (Equation 2)

We added the constant a to the equation because we believe that the learning algorithm never acquires 100% accuracy (0% error) on the data. We believe the performance of the learning algorithm asymptotically approaches a certain (but unknown) value for several reasons (e.g. noisy data). Additionally, the form we use is still valid even in the case when the error rate actually reaches 0%. The form of an ideal learning curve based on the power law is presented in Figure 2.
Figure 2: An ideal learning curve
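As a small worked illustration of Equation 2, the following Python sketch evaluates the modified power law for a few sample sizes. The parameter values a = 0.1, b = 0.5 and c = -0.5 are hypothetical and chosen only to produce a decreasing curve that levels off at a; they are not results from the paper.

```python
def power_law_error(x, a, b, c):
    """Equation 2: error rate y = a + b * x**c for sample size x > 0."""
    return a + b * x ** c

# Hypothetical parameters: asymptotic error 0.1, negative exponent for a decreasing curve.
A, B, C = 0.1, 0.5, -0.5

sizes = [1, 10, 100, 1000, 10000]
curve = [(x, power_law_error(x, A, B, C)) for x in sizes]

for size, error in curve:
    print(f"{size:>6} records -> estimated error rate {error:.4f}")
```

With a negative exponent c the term b * x**c shrinks as x grows, so the estimated error approaches the constant a from above; setting a = 0 recovers the original form of Equation 1.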
Several other models (such as linear, log-linear or exponential) can be used as well, and the selection of the appropriate model can be made during the process based on the Chi-squared fitness. But since the power law is the most appropriate and successful [3], we decided to use it as the default model.

Now that the model for the learning curve is known, we need to obtain the data for constructing the learning curve. To chart a learning curve it is desirable to measure the performance repeatedly as the amount of training (i.e. sample size) is increased. For this purpose, Lehnert and McCarthy [6] introduced a variant of k-fold cross-validation, called incremental k-fold cross-validation. Unlike the regular k-fold cross-validation procedure, theirs tests an algorithm a fixed number of times before the training is completed and does not clear memory until then, so the cumulative effects of training can be observed. Their procedure trains on just one item, then tests the model on fifty items from the test set; then it trains on a second item and tests again, and so on, until it has trained on ten items. After that, the procedure adopts increments of 10 training items until it has trained on 100 items, and increments of 50 thereafter (until 450; the total database size was 500 items). Finally, after 26 trainings, the procedure clears memory, forgetting everything learned during training, and repeats itself k times. At the end of the procedure, the average of the k performance measures is taken.

We modify McCarthy's and Lehnert's procedure to be able to cope with real-life requirements. First, the database size (i.e. how many manually classified examples are available) is generally not known in advance; only the current sample size is known. Thus, we cannot foretell how many iterations will be needed. Additionally, the learning phase, and especially the testing phase, should use as many items as possible, not a fixed amount.
Finally, the procedure should stop under several conditions: if the costs are exceeded, if the model's performance is satisfactory, or if an additional effort (in preparation of samples, execution time) will not contribute significantly to the model's performance. For this reason we developed an adaptive incremental k-fold cross-validation method (Procedure 1), which takes the given requirements into account.

Procedure 1: Adaptive incremental k-fold Cross-Validation
Repeat
  1. Shuffle the items in the set
  2. Divide the set into k equal parts of size n
  3. Do i = 1 to k times:
     a. Call the i-th set of n samples the test set and put it aside; the remaining k - 1 sets form the training set.
     b. Train the system on the training set; test the system on the test set, record the performance.
     c. Clear memory, that is, forget everything learned during training.
  4. Calculate the performance of training averaged over the k test sets.
  5. Increment the size of the training set (i.e. add additional samples to the existing training set).
Until (performance is satisfactory) and/or (appropriate learning curve)
Steps 1-4 are the same as in regular incremental k-fold cross-validation. Step 5 is added in our approach and follows the selected sampling schedule, as opposed to the fixed one used by Lehnert and McCarthy. The final exit conditions are based on the user's expectations regarding the performance of the model built and on the properties of the learning curve built during the procedure.
B. Brumen et al. / Towards Cost-Effective Construction
If the model has a reasonable (acceptable) error rate, the procedure stops. The acceptance level is set by the user (e.g. 15% of the items may be misclassified). The exit condition based on the properties of the learning curve is not as trivial, because the learning curve does not decrease at every schedule point. Rather, the error can increase and decrease for various reasons, the most important being local variance. Thus, the distance between the points in the schedule must be large enough. For this reason we propose the following schedule:

$$S = \{\, k \cdot 10^{i} \;;\; i \ge 0 \,\wedge\, 0 < k \le 10 \,\}$$

In this schedule, the distance between points is increased by a factor of 10 after every 10 points. Such a setting avoids spending too much time on sample sizes which are simply too small for a learning algorithm to build a good model. Once the distance is such that the error rate is decreasing, the exit condition is met. We require that the error rate is decreasing over the last three schedule points, or formally:

$$e_{i-2} > e_{i-1} > e_{i} \qquad \text{(Equation 3)}$$

Moreover, we require that the plot of the error rate is concave up (as it is in an ideal learning curve). Formally, Equation 4 must be fulfilled:

$$\frac{e_{i-1} - e_{i-2}}{x_{i-1} - x_{i-2}} \;\le\; \frac{e_{i} - e_{i-1}}{x_{i} - x_{i-1}} \qquad \text{(Equation 4)}$$
In Equation 3 and Equation 4, e_k denotes the error rate in step k and x_k the corresponding sample size.

Now that we have answered the first identified problem (see section 1), we need to answer the second one - construction of the full learning curve based on partial results. This can be solved by using the very same adaptive incremental approach. The error rates and the corresponding sample sizes obtained during the adaptive incremental approach can be used for the construction of the full learning curve. The error rates e_k and the corresponding sample sizes x_k are the y and x in Equation 2, respectively. The unknown parameters to be fitted to the equation are the constants a, b, and c. This can be done by using one of the non-linear least squares methods [7]. Once the parameters are obtained, the full learning curve can be built by arbitrarily selecting a sample size (x) and calculating the hypothetical error rate (y) at this point.

3. The results of adaptive incremental approach

We have built a prototype and tested the adaptive incremental approach presented in Section 2 on several datasets from the UCI Knowledge Discovery in Databases Repository [8] and from the UCI Machine Learning Repository [9]. For the classification algorithm, we have used C5.0 [10]. Due to lack of space, we present the results of the approach on the "Forest" dataset only. The task with this dataset is to model the type of forest that covers a certain area. The data are classified into seven classes - seven different tree types that prevail in the area - based on 12 measures, encoded as 54 attributes (10 quantitative variables, 4 binary wilderness areas and 40 binary soil type variables). The size of this dataset is 581012 records, or 72.2 megabytes. The results of the process are shown in Table 1. All error rates are in percent. The error rates are shown for only up to 300000 records, due to prohibitive run times at x_i > 300000.
The exit criterion was met at xi=80000 (shaded area of the table). At this point, the error rate ei was 6.3%; at the two previous points, the error rates were 6.85% and 6.48%, respectively. Thus, the conditions in Equation 3 and Equation 4 were met.
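The schedule and the curve-based exit condition can be sketched in Python as follows. This is an illustration under assumptions: the function names are hypothetical, k is taken from 1 to 9 (which yields the same set of sample sizes as the schedule S), and Equation 4 is read as a comparison of successive slopes with a non-strict inequality.

```python
from itertools import count, islice

def schedule():
    """Yield the sample sizes k * 10**i in increasing order (k = 1..9, i >= 0)."""
    for i in count():
        for k in range(1, 10):
            yield k * 10 ** i

def curve_exit(points):
    """points: list of (sample size x, error rate e), oldest first.

    True when the last three points satisfy Equation 3 (strictly decreasing
    error) and Equation 4 as assumed here (non-decreasing slope, concave up).
    """
    if len(points) < 3:
        return False
    (x2, e2), (x1, e1), (x0, e0) = points[-3:]
    decreasing = e2 > e1 > e0
    concave_up = (e1 - e2) / (x1 - x2) <= (e0 - e1) / (x0 - x1)
    return decreasing and concave_up

first_points = list(islice(schedule(), 12))   # 1..9, then 10, 20, 30
# The three points quoted above for sample sizes 60000, 70000 and 80000:
met = curve_exit([(60000, 6.85), (70000, 6.48), (80000, 6.3)])
```

For the three quoted points both conditions hold, in agreement with the statement that Equation 3 and Equation 4 were met at x_i = 80000.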
Table 1: Results for the "Forest" dataset (error rates in percent; the row for sample size 80000, shaded in the original, is where the exit criterion was met)

Sample size   Error rate
80            20.5400
90            21.3200
100           18.8000
200           19.3500
300           21.9000
400           21.7500
500           18.3600
600           19.4500
700           20.8506
800           21.6200
900           21.2300
1000          21.7600
2000          21.2400
3000          19.2500
4000          17.2300
5000          17.6700
6000          19.1000
7000          19.8300
8000          19.8500
9000          20.4400
10000         20.2500
20000         14.8600
30000         11.3100
40000         8.9000
50000         7.7200
60000         6.8500
70000         6.4800
80000         6.3000
90000         6.1400
100000        5.7800
200000        5.3900
300000        5.3700
One can notice that the error rate at this point is only 6.3%, so the user's requirements could have been met earlier if she had set the goal at 10% or lower. But if the limit were lowered to 5%, the condition would not be met. The parameters a, b, and c of the learning curve obtained from its model were 6.3, 425 and -0.4728, respectively. Thus, the full learning curve can be built.

4. Conclusion and future work

We have presented two problems that arise when building classification models, specifically the problems of constructing a learning curve: when to stop building it and how to construct the full learning curve based on partial results. We have used an adaptive incremental k-fold cross-validation to obtain and construct the learning curve. The stopping criteria are divided into two groups: user-defined and learning-curve-specific. In the former, the criteria are the costs of the process and the performance of the model built. In the latter, the curve needs to be decreasing and it has to be concave up. The construction of the full learning curve is based on the points obtained during the early phases of the process. The parameters of a modified power law are calculated from these points and, based on the parameters, the full learning curve can be constructed.

The ideas presented are a part of an approach which needs to be extended with additional features: the selection of the appropriate learning curve model during the process, and the estimation of costs, both algorithm run time (time to build a model on a given sample size) and the costs of preparing the samples (manual classification). We will test our approach on several datasets using different paradigms. Since the model-building algorithm to be used in the approach is arbitrary, a special challenge is to use algorithms capable of classifying textual data.

Acknowledgements

This work was supported by CIMO Finland and the SOCRATES/ERASMUS program of the EU.

References

[1] Fayyad, Usama; Piatetsky-Shapiro, Gregory; Smyth, Padhraic; Uthurusamy, Ramasamy (Eds.): Advances in Knowledge Discovery and Data Mining, AAAI Press, 1996.
[2] Brumen, Bostjan; Jaakkola, Hannu; Welzer, Tatjana; Rozman, Ivan: Predicting Minimum Sample Size in Data Mining Tasks: Additive Approach. In Jaakkola, Hannu; Kangassalo, Hannu (Eds.): Proceedings of the 10th European-Japanese Conference on Information Modelling and Knowledge Bases, Saariselkä, Finland, pp. 264-270, 2000.
[3] Frey, Lewis J.; Fisher, Douglas H. Jr.: Modeling Decision Tree Performance with the Power Law. In Heckerman, David; Whittaker, Joe (Eds.): Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics, San Francisco, USA, Morgan Kaufmann, Inc., 1999.
[4] Harris-Jones, Chris; Haines, Troy L.: Sample size and misclassification: Is more always better?, Working paper AMSCAT-WP-97-118, AMS Center for Advanced Technologies, 1997.
[5] Anderson, John R.; Schooler, Lael J.: Reflections of the environment in memory, Psychological Science, Vol. 2, No. 6, pp. 396-408, 1991.
[6] Lehnert, Wendy G.; McCarthy, Joseph; Soderland, Stephen; Riloff, Ellen; Cardie, Claire; Peterson, Jonathan; Feng, Fang Fang; Dolan, Charles; Goldman, Seth: UMass/Hughes: Description of the CIRCUS system as used for MUC-5. In Proceedings of the Fifth Message Understanding Conference, Morgan Kaufmann Publishers Inc., San Mateo, USA, pp. 277-291, 1993.
[7] Mathworks: Optimization Toolbox for use with Matlab, Version 2, Mathworks Inc., USA, 1999.
[8] Bay, Stephen D.: The UCI KDD Archive (http://kdd.ics.uci.edu, last visited 19-Jan-2001). Irvine, CA: University of California, Department of Information and Computer Science, 1999.
[9] Blake, Catherine L.; Merz, Christopher J.: UCI Repository of machine learning databases (http://www.ics.uci.edu/~mlearn/MLRepository.html, last visited 19-Jan-2001). Irvine, CA: University of California, Department of Information and Computer Science, USA, 1998.
[10] Quinlan, J. Ross: C5.0, http://www.rulequest.com (last visited 19-Jan-2001), 2000.
Information Modelling and Knowledge Bases XIII H. Kangassalo et al. (Eds.) IOS Press, 2002
Deriving Valid Expressions from Ontology Definitions

Yannis Tzitzikas 1,2, Nicolas Spyratos 3, Panos Constantopoulos 1,2
1 Department of Computer Science, University of Crete, Greece
2 Institute of Computer Science, ICS-FORTH
3 Laboratoire de Recherche en Informatique, Université de Paris-Sud, France
Email: {tzitzik, panos}@ics.forth.gr, [email protected]

Abstract. In this paper, we consider an ontology as a set of terms together with three binary relations on terms, called synonymy, subsumption and cross-reference. We present a model-theoretic interpretation of ontologies and we show that this approach, although appropriate for deciding the soundness of an ontology, is not sufficient for providing a sound and complete inference procedure for checking the validity of expressions in an ontology. Therefore a different "proof-theoretic" approach, which allows checking the validity of expressions, is also presented.

1 Introduction
Research on ontologies is becoming increasingly widespread in the computer science community and its importance is being recognized in many diverse research fields and application areas, such as conceptual analysis, conceptual modeling, information retrieval, information integration, agent communication and semantic annotation (see [11] for a review). There are numerous definitions of what an ontology is, revolving around the basic idea that "an ontology is a consensual and formal specification of a vocabulary used to describe a specific domain" (see [10] for a review). In this paper, we consider an ontology as a set of terms together with three binary relations on terms, called synonymy, subsumption and cross-reference. These ontologies resemble the structure of those linguistic ontologies (such as thesauri) which can accept a clear semantic interpretation. They also resemble the organizational structures employed by web site providers in order to organize their contents and to provide browsing and retrieval services, e.g. the subject hierarchy of Yahoo!. We present a model-theoretic interpretation of ontologies, inspired by the approach of [25], [26], which allows checking whether a linguistic ontology is appropriate for tasks requiring a semantic interpretation of ontologies. We show that the model-theoretic interpretation, although appropriate for deciding the soundness of an ontology, is not sufficient for providing a sound and complete inference procedure for checking the validity of expressions in an ontology. Therefore a different "proof-theoretic" approach is also presented. In section 2 we define our ontologies and in section 3 we discuss their implementation. In section 4 we discuss the soundness of an ontology and in section 5 we present the query expressions that we consider. In section 6 we investigate the model-theoretic interpretation as an inference procedure, while in section 7 we present an alternative inference procedure.
In section 8 we compare our approach with other knowledge representation and reasoning approaches, and finally, in section 9 we review related work and conclude the paper.
Y. Tzitzikas et al. / Deriving Valid Expressions from Ontology Definitions
2 Ontologies

Intuitively, an ontology consists of a set of words, or terms, and a set of relationships between the terms. Each term describes some aspect of a set of objects of interest and the relationships between terms reflect corresponding relationships between the objects. The assignment of meaning to terms and the recording of relationships are the outcome of a formal process and enjoy the consensus of some community. We conceptualize the world as a set of objects, that is, we assume an arbitrary, but fixed, domain of discourse and a corresponding set of objects Obj. The only constraint that we impose on the set Obj is that it must be a denumerable set.

Def 2.1 A terminology is a finite set of words pertaining to the objects in a specific domain of discourse. The elements of a terminology are called terms. ◇

For example, the following set is a terminology for describing the content of university courses:
T = { mathematics, history, computer science, humanities, ... }
Here the underlying set Obj is the set of course identifiers, such as M315, H213, CS265 and so on. The set of objects described by a term is the interpretation of that term.

Def 2.2 Given a terminology T, we call interpretation of T over Obj any function I : T → 2^Obj. ◇

Thus each term t denotes certain objects in Obj and its interpretation I(t) is the set of objects to which the term t is correctly applied. In our discussion the set Obj will usually be understood from the context, so we shall often say simply "an interpretation" instead of "an interpretation over Obj". In the previous example, an interpretation of T by a given agent might assign to the term mathematics all course identifiers beginning with M, to history all course identifiers beginning with H, to computer science all course identifiers beginning with CS, and so on. However, note that different agents may attach different interpretations to the same term.
Strictly speaking, an interpretation, as defined above, assigns to a term a denotational or extensional meaning ([30]). As we explained earlier, an ontology comprises not only a terminology, i.e. a set of terms, but also relationships between those terms. For the purposes of this paper, we shall consider three kinds of relationships, as stated in the following definition:

Def 2.3 An ontology is a quadruple A = (T, ~, ≺, •) where T is a terminology and ~, ≺, • are binary relations over T such that:
"~" is reflexive, symmetric and transitive (i.e. an equivalence relation). Conceptually, "~" means "synonym of", e.g. math ~ mathematics, computer science ~ informatics.
"≺" is irreflexive, transitive, and asymmetric. Conceptually, "≺" means "subsumed by" or "isA" or "covered by", e.g. math ≺ sciences, canaries ≺ birds.
"•" is reflexive and symmetric. Conceptually, "•" means "cross-reference" or "related to" in the sense of non-disjoint interpretations, e.g. math • computer science, computers • electronics, pets • parrots.
◇

Figure 1.(A) shows graphically an ontology for describing the content of university courses, where terms are represented as nodes and term relationships as labeled edges. Observe that edges labeled by "≺" are oriented (because "≺" is asymmetric), while edges labeled by "~" or "•" are not oriented (because the relations "~" and "•" are symmetric). Many web catalogs, such as Yahoo!, employ an ontology in order to organize their contents (for more see [21]), and Figure 1.(B) shows graphically an ontology for describing the contents of web pages advertising electronic products. Now, let A = (T, ~, ≺, •) be an ontology and let I be an interpretation of T. Clearly, in order for I to make sense in the ontology, it must also reflect all relationships that exist between terms. This is precisely what is stated in the following definition of a model.
Figure 1: Graphical representation of two ontology definitions

Def 2.4 Let A = (T, ~, ≺, •) be an ontology. An interpretation I of T is called a model of A if the following hold, for any terms t, t' in T:
(i) I(t) ≠ ∅, i.e. every term is associated with at least one object (i.e. with a "witness")
(ii) if t ~ t' then I(t) = I(t'), i.e. synonymy is interpreted as set equality
(iii) if t ≺ t' then I(t) ⊂ I(t'), i.e. subsumption is interpreted as strict set inclusion
(iv) if t • t' then I(t) ∩ I(t') ≠ ∅, i.e. term cross-reference is interpreted as nonempty intersection

Let us now discuss these constraints. Recall that an interpretation, as defined in Def. 2.2, assigns to a term an extensional meaning¹, and an ontology definition actually specifies a number of constraints that must hold between these meanings. However, we must clarify that the interpretation of a term does not refer to its "extension" in a particular database; it refers to its extension in the whole domain Obj, and this is the reason for requiring I(t) ≠ ∅ for each term t and model I. Conversely, I(t) = ∅ would mean that term t cannot be applied to any of the objects in Obj, but in this case t would be useless and should not be included in the ontology definition. Another remark concerns the subsumption relation "≺". We interpret subsumption by strict set inclusion (⊂) and not by set inclusion (⊆) as is commonly done (see extensional subsumption [17]). Roughly, if subsumption were interpreted by set inclusion (⊆), then a "cycle" (cycles will be defined formally in section 4) might imply that a term t is a synonym of a term t', although t may have been declared to subsume t'. This phenomenon does not fit well with the "axiomatic nature" of the recorded subsumption relationships of an ontology. Nevertheless, strict set inclusion (⊂) introduces problems when we want to extend a stored interpretation I to a model I' (which is needed in query answering in a materialized ontology).
This issue is discussed in a subsequent article on materialized ontologies.

Def 2.5 An ontology A is called sound if there is a model of A, otherwise it is called unsound. ◇

If A is a sound ontology we will write A ⊨, while if A is an unsound ontology we will write A ⊭. We will return to the issue of soundness in section 4.
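The four conditions of Def. 2.4 translate directly into set operations. The following Python sketch is only an illustration: the function name `is_model`, the dictionary representation of an interpretation and the small example ontology (a fragment in the spirit of the university-course example, with made-up object numbers) are assumptions introduced here.

```python
def is_model(interp, synonym, isa, related):
    """Check Def. 2.4 for an interpretation given as dict term -> set of objects.

    synonym, isa and related are iterables of (t, t') pairs.
    """
    return (all(interp[t] for t in interp)                       # (i) non-empty
            and all(interp[a] == interp[b] for a, b in synonym)  # (ii) equality
            and all(interp[a] < interp[b] for a, b in isa)       # (iii) strict subset
            and all(interp[a] & interp[b] for a, b in related))  # (iv) overlap

# Hypothetical fragment of an ontology and an interpretation of it.
synonym = [("Math", "Mathematics")]
isa = [("Mathematics", "Sciences"), ("Computer Science", "Sciences")]
related = [("Math", "Computer Science")]
interp = {
    "Math": {2, 3, 11},
    "Mathematics": {2, 3, 11},
    "Computer Science": {4, 11},
    "Sciences": {1, 2, 3, 4, 11},
}
ok = is_model(interp, synonym, isa, related)

# Shrinking Sciences to the extension of Mathematics breaks condition (iii).
broken = dict(interp, Sciences={2, 3, 11})
not_ok = is_model(broken, synonym, isa, related)
```

Python's `<` on sets is the strict-subset test, which matches the strict inclusion required by condition (iii).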
3 Implementing an Ontology

The definition of an ontology can be implemented using any of a number of data models. For example, using the relational model [7], we can implement the definition of an ontology as a database schema consisting of four tables, one for storing the terminology and the others for storing the three relationships of the ontology:

TERMINOLOGY(term-id: Int, term-name: Str)
SYNONYM(term1: Int, term2: Int)
ISA(term1: Int, term2: Int)
RELATED(term1: Int, term2: Int)

¹ In contrast to the intensional meaning of terms (i.e. [14], [5]).
Note that each term of the terminology is stored in the form of a pair (term-id, term-name), where "term-id" is an internal identifier. For the purposes of this paper we assume that term identifiers are integers and term names are strings. Each instance of this schema is called an ontology base. The rows of the tables shown in Figure 3 that are written in boldface correspond to the ontology base of the ontology shown in Figure 1.(A). We call an ontology base consistent if it can be completed (using the properties of the relations from Def. 2.3) so as to satisfy Def. 2.3. The tables of Figure 3, including the rows in italics, constitute the completion of our ontology base. Note that two different ontology bases may have the same completion. However, an ontology definition may be inconsistent: for example, the ontology of Figure 2.(I) is inconsistent since it violates the irreflexivity of "≺", while the ontology of Figure 2.(II) is inconsistent since it violates the asymmetry of "≺". Finally, the ontology of Figure 2.(III) is inconsistent since the transitive closure of "≺" violates the asymmetry (and the irreflexivity) of "≺".

Figure 2: Three inconsistent ontologies, (I), (II) and (III)
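The consistency test for the stored ISA pairs can be sketched in Python as follows. This is an illustration under assumptions: the function names are hypothetical, and the three inputs are made-up ISA tables in the spirit of the three cases described above (a self-loop, a symmetric pair, and a longer loop), not the exact graphs of Figure 2.

```python
def transitive_closure(pairs):
    """Return the transitive closure of a binary relation given as pairs."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

def is_consistent(isa):
    """True if the completed ISA relation is irreflexive.

    Asymmetry then follows: a < b and b < a would give a < a by transitivity.
    """
    return all(a != b for a, b in transitive_closure(isa))

case_1 = is_consistent([("a", "a")])                          # self-loop
case_2 = is_consistent([("a", "b"), ("b", "a")])              # symmetric pair
case_3 = is_consistent([("a", "b"), ("b", "c"), ("c", "a")])  # longer loop
fine = is_consistent([("a", "b"), ("b", "c")])
```

Only the subsumption table needs this test, because completing "~" and "•" by their closure properties can never violate Def. 2.3.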
TERMINOLOGY
term-id   term-name
1         Sciences
2         Mathematics
3         Math
4         Computer Science
5         Databases
6         Artificial Intelligence
7         AI
8         Philosophy
9         History
10        Humanities

[Tables SYNONYM, ISA and RELATED, each with columns term1 and term2: rows not recoverable]

Figure 3: An ontology definition as a relational schema
4 Sound and Unsound Ontologies
The definition of an ontology for a given application domain involves collecting terms that are appropriate for that domain and reaching an agreement as to the relationships that hold among the terms. It is therefore natural to expect that some ontologies may not be sound. For example, Figure 4.(A) shows three unsound ontologies. The ontology of Figure 4.(A).(I) is unsound, because there is no interpretation I such that I(a) ⊂ I(b) and I(a) = I(b). Similarly, the ontology of Figure 4.(A).(II) is unsound, because there is no interpretation I such that I(a) ⊂ I(b) ⊂ I(c) = I(a). Finally, the ontology of Figure 4.(A).(III) is unsound because there is no interpretation I such that I(a) = I(b) ⊂ I(c) = I(d) ⊂ I(a). However, note that these ontologies are consistent, that is, they can be completed so as to satisfy Def. 2.3. In fact, the basic reason why these three ontologies are unsound is the presence of cycles. Indeed, if we lump together the terms of Figure 4.(A) that are synonyms (as shown in Figure 4.(B)), then in each case we obtain a cycle. The notion of cycle plays a central role in our treatment of ontologies. Indeed, as we show next, an ontology is sound if and only if there is no cycle between the equivalence classes induced by "~". Let A = (T, ~, ≺, •) be an ontology and let T/~ be the set of equivalence classes induced by "~" over T (recall from Def. 2.3 that "~" is an equivalence relation). We can extend the relation "≺" over the set T/~ as follows: for all c, c' in T/~, c ≺ c' iff there are t ∈ c and t' ∈ c' such that t ≺ t'. We shall use the same symbol for both "≺" and its extension over T/~, as the distinction will be clear from the context.
Figure 4: Three unsound ontologies

Def 4.1 An ontology A = (T, ~, ≺, •) is acyclic if "≺" is acyclic over T/~. Otherwise the ontology A is cyclic. ◇
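Def. 4.1 can be checked mechanically by first collapsing synonyms into equivalence classes and then searching the lifted subsumption relation for a cycle. The Python sketch below is an illustration only: the function name and the union-find representation are choices made here, and the three test inputs encode the unsound ontologies described in the text for Figure 4.(A).

```python
def is_acyclic(terms, synonym, isa):
    """True if subsumption, lifted to the synonymy classes T/~, has no cycle."""
    # Union-find: map every term to a representative of its synonymy class.
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    for a, b in synonym:
        parent[find(a)] = find(b)

    # Subsumption edges between class representatives.
    edges = {}
    for a, b in isa:
        edges.setdefault(find(a), set()).add(find(b))

    # Depth-first search; reaching a class that is still "open" closes a cycle.
    state = {}

    def visit(c):
        if state.get(c) == "open":
            return False
        if state.get(c) == "done":
            return True
        state[c] = "open"
        ok = all(visit(n) for n in edges.get(c, ()))
        state[c] = "done"
        return ok

    return all(visit(c) for c in list(edges))

fig4_I = is_acyclic("ab", [("a", "b")], [("a", "b")])
fig4_II = is_acyclic("abc", [("c", "a")], [("a", "b"), ("b", "c")])
fig4_III = is_acyclic("abcd", [("a", "b"), ("c", "d")], [("b", "c"), ("d", "a")])
chain = is_acyclic("abc", [], [("a", "b"), ("b", "c")])
```

All three Figure 4 inputs are reported as cyclic, while the plain chain a ≺ b ≺ c is acyclic.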
Recall that a binary relation R over a set N is acyclic if, for every element n1 of N, there is no sequence n1, ..., nk with k > 1 such that n1 = nk and ni R ni+1 for all i = 1, ..., k-1. For example, the ontologies of Figure 4.(A) are cyclic, because in each graph of Figure 4.(B) the set of nodes is the set T/~ and "≺" is cyclic over T/~.

Proposition 4.1 Every cyclic ontology is unsound.
Proof: Let A = (T, ~, ≺, •) be a cyclic ontology, and assume that there exists a model I of A. It follows from Def. 2.4(ii) that all terms of an equivalence class c ∈ T/~ have the same interpretation; let us denote this interpretation by I(c). It follows from Def. 2.4(iii) that if c ≺ c' then I(c) ⊂ I(c'). Since A is cyclic, there exists a sequence of equivalence classes c1, ..., ck with k > 1 such that c = c1 ≺ ... ≺ ck = c. This implies that I(c) = I(c1) ⊂ ... ⊂ I(ck) = I(c). Obviously, this is impossible, hence A is unsound. ◇

Proposition 4.2 Every acyclic ontology A is sound. ◇

We will prove this proposition by providing below an algorithm (Algorithm 4.1) which takes as input an acyclic ontology A and produces a model m of A. In this algorithm, we assume a function witness that takes as argument either a term t or a pair t • t' of different but related terms and returns a single object from Obj. We assume that witness is injective everywhere except on symmetric pairs, i.e. pairs of the form t • t' and t' • t with t ≠ t'. For such pairs witness returns the same object, i.e. witness(t • t') = witness(t' • t). Note that the existence of such a function requires that the set Obj is "adequately" large. Specifically, we must have:
card(Obj) ≥ card(T) + 1/2 * card({ t • t' | t ≠ t' }),
where "card" stands for "cardinality". The correctness of the algorithm that follows, and therefore the proof of Proposition 4.2, presupposes satisfaction of the above constraint.
Let us apply the algorithm to the ontology shown in Figure 5, assuming Obj to be the set of all positive integers, and the function witness to be defined as follows:
witness(a) = 1, witness(b) = 2, witness(c) = 3, witness(d) = 4, witness(e) = 5, witness(f) = 6, witness(b • d) = 7
Below we describe each step of the algorithm, and Figure 6 shows the interpretations of the terms as they are being updated in each step of the algorithm.
Step 1: In this step a distinct integer is assigned to the interpretation of each term: m1(a) = {1}, m1(b) = {2}, m1(c) = {3}, m1(d) = {4}, m1(e) = {5}, m1(f) = {6}
Step 2: In this step, the relationship b • d causes the following assignments:
m2(b) = m1(b) ∪ {7} = {2} ∪ {7} = {2,7}
m2(d) = m1(d) ∪ {7} = {4} ∪ {7} = {4,7}
Step 3: In this step, the relationship c ~ b causes the following assignments:
m3(c) = m2(c) ∪ m2(b) = {3} ∪ {2,7} = {2,3,7}
m3(b) = m3(c) = {2,3,7}
Step 4: In this step, the relationships b ≺ a, e ≺ f and d ≺ e cause the assignments:
m4(a) = m3(a) ∪ m3(b) = {1,2,3,7}
Algorithm 4.1 Ontology Model
Input: An acyclic ontology A = (T, ~, ≺, •)
Output: A model m of A
Step 1: For each t, set m(t) := {witness(t)}
Step 2: For each t • t'
          If m(t) ∩ m(t') = ∅ then
            m(t) := m(t) ∪ {witness(t • t')}
            m(t') := m(t') ∪ {witness(t • t')}
Step 3: For each t ~ t'
          If m(t) ≠ m(t') then
            m(t) := m(t) ∪ m(t')
            m(t') := m(t)
Step 4: For each t ≺ t'
          If m(t) ⊄ m(t') then m(t') := m(t) ∪ m(t')
Step 5: If changes in Step 3 or in Step 4 then Goto Step 3 else return m
m4(f) = m3(e) ∪ m3(f) = {5,6}
m4(e) = m3(e) ∪ m3(d) = {4,5,7}
Step 5: The control goes back to Step 3.
Step 3: No changes happen in this step.
Step 4: In this step, the relationship e ≺ f causes the assignment: m5(f) = m4(e) ∪ m4(f) = {4,5,6,7}
Step 5: The control goes back to Step 3 and, since no other changes are made in Steps 3 and 4, the algorithm terminates, returning the produced model.
ISA: b ≺ a, e ≺ f, d ≺ e
SYNONYM: c ~ b
RELATED: b • d

Figure 5:

T    Step 1   Step 2   Step 3    Step 4      Step 3   Step 4      OUTPUT
     m1       m2       m3        m4                   m5
a    {1}      -        -         {1,2,3,7}   -        -           {1,2,3,7}
b    {2}      {2,7}    {2,3,7}   -           -        -           {2,3,7}
c    {3}      -        {2,3,7}   -           -        -           {2,3,7}
d    {4}      {4,7}    -         -           -        -           {4,7}
e    {5}      -        -         {4,5,7}     -        -           {4,5,7}
f    {6}      -        -         {5,6}       -        {4,5,6,7}   {4,5,6,7}

Relationships that cause the assignments: b • d (Step 2); c ~ b (Step 3); b ≺ a, e ≺ f, d ≺ e (Step 4); e ≺ f (second Step 4)

Figure 6: Run of Algorithm 4.1 on the ontology base of Figure 5
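The run summarized in Figure 6 can be reproduced with a direct Python transcription of Algorithm 4.1. This is a sketch under assumptions: the function name, the dictionary-based `witness` table (terms keyed by name, related pairs keyed by a frozenset) and the reading of the Step 4 test as "not a strict subset" are choices made here, not part of the original algorithm text.

```python
def ontology_model(terms, synonym, isa, related, witness):
    """Transcription of Algorithm 4.1 for an acyclic ontology."""
    m = {t: {witness[t]} for t in terms}              # Step 1
    for a, b in related:                              # Step 2
        if not m[a] & m[b]:
            w = witness[frozenset((a, b))]
            m[a].add(w)
            m[b].add(w)
    changed = True
    while changed:                                    # Step 5: repeat until stable
        changed = False
        for a, b in synonym:                          # Step 3
            if m[a] != m[b]:
                m[a] = m[a] | m[b]
                m[b] = set(m[a])
                changed = True
        for a, b in isa:                              # Step 4
            if not m[a] < m[b]:
                merged = m[a] | m[b]
                changed = changed or merged != m[b]
                m[b] = merged
    return m

# The ontology of Figure 5 with the witness function used in the text.
witness = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6,
           frozenset(("b", "d")): 7}
model = ontology_model(
    terms="abcdef",
    synonym=[("c", "b")],
    isa=[("b", "a"), ("e", "f"), ("d", "e")],
    related=[("b", "d")],
    witness=witness,
)
```

With the relationships processed in the order listed, `model` equals the OUTPUT column of Figure 6.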
Proposition 4.3 Given an acyclic ontology A, Algorithm 4.1 terminates and produces a model of A.
Proof: Let m be the interpretation returned by Algorithm 4.1. According to Def. 2.4 the interpretation m is a model of A if for each t, t' of T:
(i) m(t) ≠ ∅. This holds due to Step 1.
(ii) if t ~ t' then m(t) = m(t'). This holds due to Step 3.
(iii) if t ≺ t' then m(t) ⊂ m(t'). This holds due to Step 4. If t ≺ t' the algorithm performs the assignment m(t') := m(t') ∪ m(t). This assignment guarantees only that m(t) ⊆ m(t'). However, m(t) ⊂ m(t') actually holds because m(t) does not contain the element witness(t'), which belongs to m(t') due to Step 1. If the element witness(t') were also a member of m(t), it could have been assigned to m(t) only due to the existence of a sequence of equivalence classes c1 ≺ ... ≺ ck (k ≥ 1) such that t' ∈ c1 and t ∈ ck. But in that case, the relationship t ≺ t' and the above sequence would form a cycle, which is a contradiction since the input ontology is acyclic.
(iv) if t • t' then m(t) ∩ m(t') ≠ ∅. This holds due to Step 2; note that this property is preserved because the subsequent steps may only enlarge the interpretation of a term.
Algorithm 4.1 terminates. Suppose it does not. Then, since the terminology T, and therefore T/~, is finite, this is possible only if the loop between Step 3 and Step 5 does not terminate, which is possible only if there is a cycle in T/~. But this is impossible because the ontology is acyclic. ◇

Theorem 4.1 An ontology A is sound iff it is acyclic.
Proof: Follows immediately from Propositions 4.1 and 4.2. ◇

Two important remarks are in order here. The first is that Algorithm 4.1 produces a model of an acyclic ontology even when starting with an ontology base which contains an "incomplete" ontology definition. The second remark is that by slightly modifying Algorithm 4.1 we can obtain an algorithm which takes as input any ontology A (cyclic or acyclic) and produces a model m of A if A is sound, or returns "UNSOUND" if A is unsound. This is Algorithm 4.2 which, in comparison with Algorithm 4.1, has one extra if-then statement in Step 4, which is shown in boldface.
It can easily be proved (similarly to the proof of Theorem 4.1) that an ontology is unsound iff Algorithm 4.2 returns "UNSOUND". Figure 7 shows the application of this algorithm to the ontology of Figure 4.(III).

Algorithm 4.2 Ontology Model 2
Input: An ontology A = (T, ~, ≺, •)
Output: A model m of A, if A is sound, otherwise "UNSOUND"
Step 4: For each t ≺ t'
          If m(t) ⊄ m(t') then m(t') := m(t) ∪ m(t')
          If m(t) ⊄ m(t') then return "UNSOUND"
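A compact Python sketch of Algorithm 4.2 follows. It is an illustration under assumptions: the function name is hypothetical, witnesses are drawn from a simple integer counter instead of a given witness function, and the Step 4 union is applied unconditionally (which has the same effect as the guarded assignment) before the extra strict-inclusion test.

```python
from itertools import count

def model_or_unsound(terms, synonym, isa, related):
    """Return a model (dict term -> set) or the string "UNSOUND"."""
    fresh = count(1)                            # stand-in witness generator
    m = {t: {next(fresh)} for t in terms}       # Step 1
    for a, b in related:                        # Step 2
        if not m[a] & m[b]:
            w = next(fresh)
            m[a].add(w)
            m[b].add(w)
    while True:
        before = {t: set(s) for t, s in m.items()}
        for a, b in synonym:                    # Step 3
            merged = m[a] | m[b]
            m[a], m[b] = merged, set(merged)
        for a, b in isa:                        # Step 4
            m[b] = m[a] | m[b]
            if not m[a] < m[b]:                 # extra test of Algorithm 4.2
                return "UNSOUND"
        if m == before:                         # Step 5: no changes, done
            return m

# Figure 4.(III): a ~ b, c ~ d, b subsumed by c, d subsumed by a.
fig4_III = model_or_unsound("abcd", [("a", "b"), ("d", "c")],
                            [("d", "a"), ("b", "c")], [])
# The acyclic ontology of Figure 5, for comparison.
fig5 = model_or_unsound("abcdef", [("c", "b")],
                        [("b", "a"), ("e", "f"), ("d", "e")], [("b", "d")])
```

For the first input the second pass finds m(d) = m(a) and the function returns "UNSOUND", as in Figure 7; for the second it returns a model.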
Returning to the constraints of Def. 2.4, we can now say that if subsumption were interpreted by set inclusion (⊆) then all ontologies would be sound, even the cyclic ones. Allowing cyclic ontologies has the following consequence. If a subset C = {c1, ..., ck} ⊆ T/~ forms a cycle, then all terms which are elements of an equivalence class in C are essentially synonyms, although they may not have been declared as synonyms.

T    Step 1   Step 2   Step 3   Step 4      Step 3
     m1       m2       m3       m4
a    {1}      -        {1,2}    {1,2,3,4}   -
b    {2}      -        {1,2}    -           {1,2,3,4}
c    {3}      -        {3,4}    {1,2,3,4}   -
d    {4}      -        {3,4}    -           {1,2,3,4}

Relationships that cause the assignments: a ~ b, d ~ c (Step 3); d ≺ a, b ≺ c (Step 4); a ~ b, d ~ c (second Step 3); d ≺ a (second Step 4)
Second Step 4 (m5): UNSOUND
OUTPUT: UNSOUND

Figure 7: Run of Algorithm 4.2 on the ontology base of Figure 4.(III)

5 Querying an Ontology

The relationships contained in the definition of an ontology can be seen as expressions that combine terms of the terminology using the connectors "~", "≺", "•". There is an infinite set of expressions that one can form in this way, and the relationships contained in the definition of the ontology can be seen as the given valid expressions of the ontology. New valid expressions can be inferred from given ones, and the purpose of this section is to define the inference mechanism for doing so. For instance, in Figure 1 the expression Math ≺ Sciences is not a member of the completed definition of the ontology. However, this expression can be characterized as valid, because one can easily see that I(Math) ⊂ I(Sciences) holds for every model I. The same holds for the expressions Mathematics • Computer Science and Computer Science • Humanities. On the other hand, the expression Math • AI cannot be characterized as valid, since in the model I of the ontology which is given below, I(Math) ∩ I(AI) = ∅ (actually, this is the model produced by Algorithm 4.1).

I(Sciences)  = {1,2,3,4,5,6,7,11,12,13}    I(Mathematics)             = {2,3,11}
I(Math)      = {2,3,11}                    I(Computer Science)        = {4,5,6,7,11,12,13}
I(Databases) = {5,12}                      I(Artificial Intelligence) = {6,7,12,13}
I(AI)        = {6,7,12,13}                 I(Philosophy)              = {8,13}
I(History)   = {9}                         I(Humanities)              = {8,9,10,13}
This will lead us to define validity of expressions with respect to an interpretation. Before doing so, however, we must specify precisely what an expression is.

Def 5.1 Let A = (T, ~, ≺, •) be an ontology. An expression over A is a description, a strict inclusion, an inclusion or a synonymy, defined as follows (where t is a term):
description       d  ::= ε | t • d, where ε is the empty description
strict inclusion  si ::= d ≺ d', where d, d' are descriptions
inclusion         i  ::= d ≼ d', where d, d' are descriptions
synonymy          s  ::= d ~ d', where d, d' are descriptions
expression        e  ::= d | si | i | s
Note that a description is either empty, or it has the form t1 • ... • tn, where n ≥ 1 and the ti's are terms; we shall call each ti a subterm of d. Thus every term of T, and every cross-reference t • t' of an ontology definition, is a description. Also note that the notion of expression generalizes the notion of term and the relations between terms. Some examples of expressions over the ontology definition of Figure 1 follow:
Math • AI
Databases • AI • Sciences
Math ≺ Sciences
Math ≼ AI
Databases • AI ≺ Computer Science
Math ~ AI
Mathematics • AI ~ Math • Artificial Intelligence
Let A = (T, ~, ≺, ·) be an ontology. Any interpretation I of T can be extended to an interpretation Î over descriptions as follows: for any description d = t_1 · t_2 · ... · t_k over A, we define Î(d) = I(t_1) ∩ I(t_2) ∩ ... ∩ I(t_k) if d ≠ ε, and Î(d) = ∅ otherwise. For simplicity we shall use the symbol I to denote both the interpretation and its extension over descriptions.

Def 5.2 Let A = (T, ~, ≺, ·) be an ontology. We define validity of an expression in an interpretation I of T as follows:
Y. Tzitzikas el al. / Deriving Valid Expressions from Ontology Definitions
a description d is valid in I, denoted I ⊨ d, if I(d) ≠ ∅;
a strict inclusion d ≺ d' is valid in I, denoted I ⊨ d ≺ d', if I(d) ⊂ I(d');
an inclusion d ⪯ d' is valid in I, denoted I ⊨ d ⪯ d', if I(d) ⊆ I(d');
a synonymy d ~ d' is valid in I, denoted I ⊨ d ~ d', if I(d) = I(d').
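The four validity conditions above can be checked mechanically for a single interpretation. The following is a minimal sketch in Python, assuming descriptions are represented as tuples of term names and interpretations as dictionaries of sets. The function names are hypothetical, and the sets are a subset of the model listed earlier in this section. Note that validity in one interpretation is not yet validity in A.

```python
from functools import reduce

# A subset of the model listed earlier in this section (term -> set of objects).
I = {
    "Sciences": {1, 2, 3, 4, 5, 6, 7, 11, 12, 13},
    "Math": {2, 3, 11},
    "Mathematics": {2, 3, 11},
    "Computer Science": {4, 5, 6, 7, 11, 12, 13},
    "AI": {6, 7, 12, 13},
    "Humanities": {8, 9, 10, 13},
}

def interpret(description, interp):
    """Extend interp to a description given as a tuple of terms.

    The empty description (empty tuple) is interpreted as the empty set,
    otherwise the interpretations of the subterms are intersected.
    """
    if not description:
        return set()
    return reduce(set.intersection, (interp[t] for t in description))

def valid_description(d, interp):
    return bool(interpret(d, interp))

def valid_strict_inclusion(d1, d2, interp):
    return interpret(d1, interp) < interpret(d2, interp)   # proper subset

def valid_inclusion(d1, d2, interp):
    return interpret(d1, interp) <= interpret(d2, interp)  # subset or equal

def valid_synonymy(d1, d2, interp):
    return interpret(d1, interp) == interpret(d2, interp)

print(valid_description(("Math", "AI"), I))                 # False
print(valid_strict_inclusion(("Math",), ("Sciences",), I))  # True
```

The two printed results agree with the discussion of Math · AI and Math ≺ Sciences above.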
Def 5.3 An expression e is valid in A, denoted A ⊨ e, if I ⊨ e for all models I of A. □

It follows immediately from Def. 2.4 that the expressions contained in the definition of an ontology A are valid in A. The main problem that we solve in this paper is the following: given an ontology A and an expression e, decide whether A ⊨ e. Given a model m of A, let E(m) denote the set of all expressions over A which are valid in m, that is, E(m) = { e | m ⊨ e }. Let E(A) denote the set of all expressions which are valid in A; we shall call E(A) the closure of A. According to Def. 5.3 this set consists of the expressions which are valid in every model of A, that is, E(A) = ⋂ { E(m) | m is a model of A }.

6 Investigating a Model-theoretic Method for Deriving Valid Expressions
In this section we investigate a model-theoretic method for checking the validity of expressions. In particular we look for a special model of A, denoted by m_A, such that E(m_A) = E(A), that is, m_A ⊨ e iff A ⊨ e for all e. Note that by definition E(A) ⊆ E(m) holds for any model m of A. This means that the model-theoretic approach, seen as an inference procedure, is always complete (that is, E(A) cannot contain an expression which is not an element of E(m), for any model m). Thus the difficulty lies in the soundness of the approach, that is, in finding a specific model m_A and proving that E(m_A) ⊆ E(A). For example, Figure 8.(A) shows graphically an ontology A and a model m of that ontology in which m ⊨ b · c, although obviously A ⊭ b · c.
Let us first investigate the descriptions. Let E_d(m) denote the set of descriptions which are valid in a model m, and E_d(A) the set of descriptions which are valid in every model of A. We now look for a model m such that E_d(m) = E_d(A). Below we prove that all descriptions that are valid in m_alg (the model produced by Algorithm 4.1) are also valid in every model of A.

Proposition 6.1 E_d(m_alg) = E_d(A)
Proof: (see appendix A) □

Thus for checking the validity of descriptions in A, we run Algorithm 4.1 once, which produces the model m_alg, and then it suffices to check the validity of descriptions in this model. However, the counter-example that follows proves that we cannot derive the validity of strict inclusions, or of synonymies, by the model-theoretic approach.

Example 6.1 Consider the ontology A shown in Figure 8.(B). Clearly, in every model m of A, we have:

m(a) ⊆ m(b) and m(a) ⊆ m(c), hence m(a) ⊆ m(b) ∩ m(c)

This means that in some of the models of A it holds that m(a) ⊂ m(b) ∩ m(c) (see the model m_x), while in the rest of the models it holds that m(a) = m(b) ∩ m(c) (see the model m_y). We conclude that the model-theoretic approach is not appropriate for checking the validity of strict inclusions and synonymies, since in the model m_A that we are looking for it will hold either m_A(a) ⊂ m_A(b) ∩ m_A(c) or m_A(a) = m_A(b) ∩ m_A(c), but neither of these expressions is valid in every model of A. □
Moreover, we cannot check the validity of inclusions by the model-theoretic approach: if we could derive the validity of inclusions, then we would be able to derive the validity of synonymies (since A ⊨ d ~ d' ⇔ A ⊨ d ⪯ d' and A ⊨ d' ⪯ d), but we have already shown that the latter is impossible in the model-theoretic approach. Another counter-example follows.

Example 6.2 Consider an ontology A = ({a, b}, ∅, ∅, {a · b}). Clearly, in every model m, we have: m(a) ≠ ∅, m(b) ≠ ∅, and m(a) ∩ m(b) ≠ ∅. The top part of the table shown in Figure 9 presents four models, m_1, m_2, m_3, m_4, of A. These models demonstrate all possible relations that can hold between the interpretations of the terms a and b. In the bottom part of the table, each row corresponds to an expression over A, while each column corresponds to one of the four models. The presence of a bullet indicates that the corresponding expression is valid in the corresponding model. Observe that in each model there is at least one expression which is not valid in some or all of the other models. One can easily see that it is impossible to find a model m_A such that E(m_A) = E(A) (it will always be E(m) ⊃ E(A)). □
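The observation of Example 6.2 can be reproduced with a small enumeration. The sketch below is only illustrative: the object identifiers in the four models are assumed values chosen to realise the overlapping, equal and two subset cases, the expression set is restricted to the descriptions a, b, a · b and the relations between them, and ASCII `<`, `<=` stand for ≺, ⪯. It computes the valid expressions of each model and shows that the expressions common to all four are a proper subset of every single E(m).

```python
from itertools import permutations

# Four illustrative models of A = ({a, b}, {}, {}, {a.b}); object ids are assumed.
MODELS = {
    "m1": {"a": {1, 3}, "b": {2, 3}},  # overlapping
    "m2": {"a": {1}, "b": {1}},        # equal
    "m3": {"a": {1}, "b": {1, 2}},     # a strictly inside b
    "m4": {"a": {1, 2}, "b": {1}},     # b strictly inside a
}

DESCRIPTIONS = [("a",), ("b",), ("a", "b")]

def interpret(d, m):
    """Intersect the interpretations of all subterms of a non-empty description."""
    result = set(m[d[0]])
    for term in d[1:]:
        result &= m[term]
    return result

def show(d):
    return ".".join(d)

def valid_expressions(m):
    """Return the expressions (as strings) over DESCRIPTIONS that are valid in m."""
    valid = set()
    for d in DESCRIPTIONS:
        if interpret(d, m):
            valid.add(show(d))
    for d1, d2 in permutations(DESCRIPTIONS, 2):
        s1, s2 = interpret(d1, m), interpret(d2, m)
        if s1 < s2:
            valid.add(f"{show(d1)} < {show(d2)}")
        if s1 <= s2:
            valid.add(f"{show(d1)} <= {show(d2)}")
        if s1 == s2:
            valid.add(f"{show(d1)} ~ {show(d2)}")
    return valid

E = {name: valid_expressions(m) for name, m in MODELS.items()}
common = set.intersection(*E.values())

# Each model validates something that is not valid in all four models.
for name in sorted(E):
    print(name, sorted(E[name] - common))
```

Under these assumptions, none of the four models can play the role of m_A, since each one validates at least one expression (for instance a.b < a in m1, or a ~ b in m2) that some other model rejects.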
[Figure 9: table whose top part gives the models m_1 (overlapping), m_2 (equal), m_3 (subset) and m_4 (subset) of A, and whose bottom part marks with a bullet the expressions over A that are valid in each model]

Thus in this section we proved that the model-theoretic approach is a sound and complete inference procedure for checking the validity of descriptions. However, it is unsound for inferring synonymies, inclusions and strict inclusions.

7 An Alternative Approach for Checking Expressions
In the previous section we showed that the model-theoretic approach is not appropriate for deriving the validity of synonymies, inclusions, or strict inclusions. In this section we provide an alternative inference procedure for synonymies and inclusions, which is sound and complete. Let A = (T, ...
According to our method, A ⊨ a ⪯ b · c iff A ⊨ a · b · c ~ a. The latter is valid if r(a · b · c) = r(a), which is true, since r(a · b · c) = {a}. Thus our method also derives that A ⊨ a ⪯ b · c. Note that our method also derives that A ⊭ b · c ⪯ a, since A ⊨ b · c ⪯ a ⇔ A ⊨ a · b · c ~ b · c, which is false, since r(a · b · c) = {a}, while r(b · c) = {b, c}. □

Concerning strict inclusions, Figure 10 shows graphically some ontology definitions and some indicative strict inclusions which are valid (or invalid) in these ontologies. Checking the validity of strict inclusions is a more complex reasoning task, and we shall present a sound and complete inference procedure, based on the construction of a graph, in a subsequent paper. However, in applications which employ ontologies in order to store descriptions of concrete objects, that is, in materialized ontologies, we only need to check the validity of inclusions (i.e. for answering queries and checking the containment of queries).
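The reduction of inclusion to synonymy used in the example above can be sketched as follows. This is a hedged illustration, not the authors' procedure: it assumes that r(d) is the set of subterms of d that have no other subterm of d strictly below them, which is one reading consistent with r(a · b · c) = {a} and r(b · c) = {b, c}, and it ignores synonyms and cross-references. The function names and the representation of the ontology as (narrower, broader) pairs are hypothetical.

```python
def transitive_closure(pairs):
    """Return the transitive closure of a set of (narrower, broader) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(closure):
            for (y2, z) in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure

def r(description, below):
    """Assumed reading of r(d): subterms with no other subterm strictly below them."""
    terms = set(description)
    return {t for t in terms if not any((u, t) in below for u in terms)}

def is_synonym(d1, d2, below):
    # d1 ~ d2 is derived when the reduced forms coincide.
    return r(d1, below) == r(d2, below)

def is_included(d1, d2, below):
    # d1 <= d2 is derived iff d1 . d2 ~ d1 is derived.
    return is_synonym(tuple(d1) + tuple(d2), d1, below)

# Ontology of the example: a is below b and below c.
below = transitive_closure({("a", "b"), ("a", "c")})

print(is_included(("a",), ("b", "c"), below))  # True
print(is_included(("b", "c"), ("a",), below))  # False
```

The two results match the derivations A ⊨ a ⪯ b · c and A ⊭ b · c ⪯ a given in the text.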
8 Comparison with other Knowledge Representation and Reasoning Approaches

In this section we try to represent our ontologies in some logic-based languages, in particular Propositional Calculus, First Order Predicate Calculus, Horn Clauses and Description Logics. We study each language in the following manner. At first we look for a method for translating an ontology definition A to a set of well-formed formulas Δ_A of that language, and investigate whether the semantics and the corresponding inference rules of the language allow checking the soundness of A (that is, A ⊨ ⇔ Δ_A ⊨). If this holds, then we investigate whether an expression e (description, synonymy, inclusion, strict inclusion) can be written as a wff φ_e, and whether the wffs that are inferred from Δ_A correspond to expressions which logically follow from A and vice versa (that is, A ⊨ e ⇔ Δ_A ⊨ ...

... ≻ b_t (1)

Notice that this formula also implies that r(b) \ r(a) ≠ ∅, that is, r(b) ⊈ r(a). If formula (1) is satisfied, then the enlargement process will result in a model m' such that m'(a) = m'(b), which means that we failed to construct the model m_x that we are looking for. However, in this case we can try the opposite direction, that is, we can add a new object o_a to each m(a_i). Certainly o_a ∈ m(a) while o_a ∉ m(b) (since r(b) ⊈ r(a)). Now we enlarge m for making it a model. Again, what remains to show is that this step did not add o_a to each m(b_i) too. By a similar analysis we reach the conclusion that this step would update each m(b_i) only if

∀ b_r ∈ r(b) \ r(a) ∃ a_r ∈ r(a) \ r(b) such that b_r ≻ a_r (2)

From this analysis we conclude that we cannot construct the model m_x that we are looking for only if formulas (1) and (2) both hold. Formula (1) implies that there exist a_z ∈ r(a) \ r(b) and b_z ∈ r(b) \ r(a) such that a_z ≻ b_z. Formula (2) implies that there exists a_z' ∈ r(a) \ r(b) such that b_z ≻ a_z'.
Notice that if z = z' then A would be cyclic (since it would be a_z ≻ a_z), while if z ≠ z' then this would contradict the definition of r(a) (since it would be a_z ≻ a_z', therefore a_z should not be an element of r(a)). Thus we conclude that formulas (1) and (2) cannot both be true. This means that we can always construct a model m_x such that m_x(a) ≠ m_x(b). This implies that if r(a) ≠ r(b) then A ⊭ a ~ b, that is, A ⊨ d ~ d' ⇒ r(d) = r(d'). □
Information Modelling and Knowledge Bases XIII H. Kangassalo et al. (Eds.) IOS Press, 2002
A Semantic Information Filtering and Clustering Method for Document Data with a Context Recognition Mechanism

Dai Sakai†, Yasushi Kiyoki††, Naofumi Yoshida†, Takashi Kitagawa†††
† Graduate School of Media and Governance, Keio University
†† Faculty of Environmental Information, Keio University
††† Institute of Information Sciences and Electronics, Univ. of Tsukuba
5322 Endoh, Fujisawa, Kanagawa 252-8520, Japan

Abstract. In this paper we present a semantic information filtering method with a context recognition mechanism and its application to document clustering. The filtering method is able to filter out irrelevant data that are less correlated in meaning with given context words. Our method is based on the idea that the meaning of document data varies with contexts or viewpoints. By filtering out retrieval candidate data items with low semantic correlation with the given context words, semantic information retrieval and data mining become effective, because analysis of data is only performed on data items with high correlation with the given context words. This filtering is realized by the use of the Semantic Associative Search Method. Dynamic semantic analysis of document data according to the given context words is made possible by a semantic projection operation that selects a semantic subspace from an orthogonal multidimensional space onto which document data are mapped. We apply this filtering method to semantic clustering. The meaning of the document data obtained after the semantic information filtering is analyzed according to the given context words. In the clustering process, the semantic correlation of document data with respect to each other is calculated in the semantic subspace to form a distance matrix. A semantic distance calculation formula, designed on the basis of the context-dependent semantic subspace selection, is applied to stress the characteristics of document data as the distances are computed. By this process, relevant clusters can be obtained. The feasibility of our semantic information filtering method and its application to semantic clustering are shown in two experiments. The application of the context-dependent semantic information filtering to document mining enables efficient knowledge acquisition from document data according to given context words.
1 Introduction

Document data have spread widely across computer networks. As searchers gain more opportunities to acquire information, efficient methodology for acquiring document data and extracting knowledge from them has become a very important research issue [1, 7]. Filtering out irrelevant document data is an important issue in document mining. Clustering a huge amount of document data, such as that found on the Internet, causes a large computational overhead. Removing trivial data from the calculation greatly improves efficiency. This becomes crucial when the analyzer of the data uses a semantic clustering method with semantic interpretation of data, which needs a heavy amount of calculation. To realize efficient clustering, an effective method to filter out irrelevant data should be introduced.

There have been many research results aimed at realizing effective information filtering systems [8, 9]. One filtering method filters data by comparing attributes of the data with the keywords registered in the searcher's profile. There is also a filtering technique based on evaluating the relationship between the data's creator or sender and the receiver. Those filtering methods may be carried out by pattern-matching of words in the data against given keywords. There are other ways of analyzing the data, such as a method that filters out irrelevant information by analyzing the syntax of the data [2].

In this paper we present a context-dependent semantic information filtering method and its application to semantic clustering with context recognition. The filtering method is defined using the Semantic Associative Search Method, which has been proposed in [3, 4, 5, 6]. The semantic filtering algorithm works by providing the analyzer with a way to express his/her viewpoint. The searcher gives context words to express his/her point of view to the system implemented with our filtering method. By this process, our system can filter out those document data that are semantically unrelated to the given context words. The searcher can dynamically obtain relevant document data according to various contexts.

In this paper, we apply the filtering method to the semantic clustering method. Analyses of the meaning of document data in accordance with the given context words may need a heavy amount of calculation. Filtering out irrelevant data may improve the efficiency of this clustering algorithm. We examine this potential in this paper. The feasibility of our semantic information filtering method and its application to semantic clustering are shown in two experiments.

2 Context-Dependent Information Filtering
When handling a large number of document data, analyses of irrelevant data cause a heavy computational overhead. We introduce an information filtering method with context recognition, using the Semantic Associative Search Method [3, 4, 5, 6], to filter out irrelevant data before analyzing data. By using this filter, we enable efficient information retrieval and document clustering.

2.1 Brief Outline

In this section, the steps to realize our semantic filtering method are briefly described.

Step 1: Creation of a Semantic Space and Mapping Data
An orthogonal multidimensional space (semantic space) is prepared using the Semantic Associative Search Method [3, 4, 5, 6]. Document data are vectorized based on their metadata expressed in vectors and mapped onto the semantic space.

Step 2: Context Recognition
The dynamic interpretation of the meaning of data according to the given context words is realized through the selection of a semantic subspace from the entire semantic space [3, 4, 5, 6] (in the current implementation, the entire semantic space consists of approximately 2000 orthogonal vectors [6]). A subspace is extracted by the semantic projection operator when context words, or the searcher's viewpoint, are given. Thus, vectors of document data in the semantic subspace have norms adjusted according to the given context words. An example of a semantic subspace and document vectors is shown in Figure 1.

Figure 1: An example of a semantic subspace (q_1, q_2, q_3) extracted from an entire semantic space, where x_ij represents the projected value of a document vector j on an axis q_i, and norm_j represents the norm of document vector j in this semantic subspace.

Step 3: Filtering out Irrelevant Data
In the semantic subspace selected in accordance with the given context words, the norms of the document vectors are calculated as a measurement of the amount of relevant information contained. Then, the irrelevant document data whose vectors have norms less than a given threshold T are filtered out from retrieval for the searcher.
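The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implemented system: document vectors are assumed to be given already as coordinates with respect to the orthonormal semantic elements, so that projecting onto the selected subspace amounts to keeping the selected coordinates; vectors are 2-normalized before projection, as the paper describes later for filtering; and all names and toy values are hypothetical.

```python
import math

def norm_in_subspace(vector, selected_axes):
    """Norm of the projection of `vector` onto the selected semantic axes."""
    return math.sqrt(sum(vector[k] ** 2 for k in selected_axes))

def filter_documents(doc_vectors, selected_axes, threshold):
    """Keep documents whose 2-normalized vectors have subspace norm >= threshold."""
    kept = {}
    for title, vec in doc_vectors.items():
        length = math.sqrt(sum(x * x for x in vec))
        if length == 0:
            continue  # a zero vector carries no semantic information
        unit = [x / length for x in vec]  # 2-normalization: norm 1.0 in the whole space
        n = norm_in_subspace(unit, selected_axes)
        if n >= threshold:
            kept[title] = n
    return kept

# Toy data: 4 semantic axes, of which axes 0 and 1 are selected by the context.
docs = {
    "cancer-doc": [3.0, 4.0, 0.0, 0.0],
    "mental-doc": [0.0, 0.0, 1.0, 0.0],
    "mixed-doc": [1.0, 0.0, 1.0, 0.0],
}
kept = filter_documents(docs, selected_axes=[0, 1], threshold=0.5)
print(sorted(kept))
```

With the threshold T = 0.5, the document lying entirely outside the selected subspace is dropped, while the partially related document is kept with a norm of about 0.71.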
2.2 Step 1: Creation of a Semantic Space and Mapping Data

The filtering method and clustering algorithm described in this paper are realized through the use of the Semantic Associative Search Method [3, 4, 5, 6]. In this section, the outline of the Semantic Associative Search Method is briefly reviewed.

2.2.1 Creation of Semantic Space I

1. Matrix of basic words: A set of basic words which characterizes the data items (document data) to be used is given in the form of an m by n matrix. Each of the given m words is characterized by n features.

2. Eigenvalue decomposition of the correlation matrix: First we construct the correlation matrix with respect to the features. Then we execute the eigenvalue decomposition of the correlation matrix and normalize the eigenvectors. We define the semantic space I as the span of the eigenvectors which correspond to nonzero eigenvalues. We call such eigenvectors semantic elements hereafter. We note that since the correlation matrix is symmetric, the semantic elements form an orthonormal basis for I. The dimension v of the semantic space I is identical to the rank of the data matrix A. Since I is a v-dimensional Euclidean space, various norms can be defined and a metric is naturally introduced.
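The construction above can be written compactly as follows. This is a sketch under one assumption that the text does not spell out: that the correlation matrix with respect to the features is formed as the product of the transposed data matrix with itself. The symbols M, Q and λ_k are introduced here only for illustration.

```latex
\begin{align*}
  M &= A^{\mathsf{T}} A \in \mathbb{R}^{n \times n}
      && \text{(assumed form of the correlation matrix)} \\
  M &= Q \,\operatorname{diag}(\lambda_1, \dots, \lambda_v, 0, \dots, 0)\, Q^{\mathsf{T}},
      \quad Q = (q_1, \dots, q_n), \quad Q^{\mathsf{T}} Q = \mathbf{1}_n \\
  \mathcal{I} &= \operatorname{span}(q_1, \dots, q_v),
      \qquad v = \operatorname{rank}(A)
\end{align*}
```

Here the columns q_1, ..., q_v are the semantic elements, that is, the normalized eigenvectors belonging to the nonzero eigenvalues λ_1, ..., λ_v.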
2.2.2 Mapping Data

1. Creating Metadata for Document Data: Each of the document data to be analyzed by the Semantic Associative Search Method is given a series of basic words as metadata. Each basic word is expressed as a vector in the method [6]. When mapping a document datum onto the semantic space I, it is vectorized by applying the union operator ⊕ to the vectors of the basic words contained in the datum's metadata.

2. 2-normalization (for filtering): By 2-normalization of document vectors in the semantic space I as described in [6], and by calculating their norms in the semantic subspace that corresponds to the given context words, we can obtain norms as a measurement of the semantic correlation between document data and the context words.

2.3 Step 2: Extraction of a Semantic Subspace

The dynamic semantic interpretation of document data according to the given context words is realized by selecting a semantic subspace from the semantic space in accordance with the given context words. This process has also been presented in [3, 4, 5, 6].

2.3.1 Defining a Semantic Projection

1. Defining a set of the semantic projections Π_v: We consider the set of all the projections from the semantic space I to the invariant subspaces (eigenspaces). We refer to such a projection as a semantic projection and to the corresponding projected space as a semantic subspace. Since the number of i-dimensional invariant subspaces is (v(v − 1) ··· (v − i + 1))/i!, the total number of semantic projections is 2^v. That is, this model can express 2^v different phases of meaning.

2. Constructing the Semantic Operator S_p: Suppose a sequence s_l of l words which determines the context is given. We construct an operator S_p to determine the semantic projection according to the context. We call the operator a semantic operator.

(a) First we map the l context words in databases to the semantic space I. This mathematically means that we execute the Fourier expansion of the sequence s_l in I and seek the Fourier coefficients of the words with respect to the semantic elements. This corresponds to seeking the correlation between each context word of s_l and each semantic element.

(b) Then we sum up the values of the Fourier coefficients for each semantic element. (We call this sum the corresponding axis' weight.) This corresponds to finding the correlation between the sequence s_l and each semantic element. Since we have v semantic elements, we can constitute a v-dimensional vector. We call the vector normalized in the infinity norm the semantic center of the sequence s_l.

(c) If the sum obtained in (b) for a semantic element is greater than a given threshold ε, we employ the semantic element to form the projected semantic subspace. We define the semantic projection by the sum of such projections.
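Steps (a) to (c) can be sketched as follows. The sketch assumes that each context word is already given by its Fourier coefficients, that is, its coordinates with respect to the v semantic elements, and that the threshold ε is compared against the components of the semantic center (the weights after normalization in the infinity norm). The function names and the numeric values are hypothetical.

```python
def semantic_center(context_vectors):
    """Sum the Fourier coefficients per axis and normalize in the infinity norm."""
    v = len(context_vectors[0])
    weights = [sum(vec[k] for vec in context_vectors) for k in range(v)]
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return weights  # no correlation with any semantic element
    return [w / max_abs for w in weights]

def select_axes(center, epsilon):
    """Indices of the semantic elements whose weight exceeds the threshold."""
    return [k for k, w in enumerate(center) if w > epsilon]

# Two context words in a toy semantic space with v = 3 semantic elements.
context_vectors = [
    [0.8, 0.1, -0.2],
    [0.4, 0.3, 0.0],
]
center = semantic_center(context_vectors)
axes = select_axes(center, epsilon=0.3)
print(axes)  # [0, 1]
```

The selected indices identify the semantic elements that span the semantic subspace; a document vector is then projected by keeping only those coordinates.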
Table 1: Metadata and norms of their vectors in the semantic subspace that corresponds to a given context "cancer"

Norm   Document Title
0.25   medicineMixed9-1
0.3    mental-diseases5-3
0.52   cancer-related2-9
0.63   cancer-related3-2
0.86   cancer-related5-7

Metadata (in document order): anti-ulcer-drug, anti-virus-drug, antitussive-drug, blood-drug, herbal-medicine, emotional-disturbance, manic-depressive-psychosis, mental-disease, lung-cancer, cancer-phobia, cancer, brain-tumor, esophageal-cancer, uterine-cancer, malignant-tumor, lung-cancer, cancer-phobia, breast-cancer
This operator automatically selects the semantic subspace which is highly correlated with the sequence sl of the l context words which determines the context. This model makes dynamic semantic interpretation possible. We emphasize here that, in our model, the "meaning" is the selection of the semantic subspace, namely, the selection of the semantic projection and the "interpretation" is the best approximation in the selected subspace.
2.4 Step 3: Filtering Out Irrelevant Data

In this section, the semantic information filtering method with the Semantic Associative Search Method is introduced. First, we define what relevant data are and then describe the reasons for 2-normalization of document data. Finally, the setting of a threshold T is explained.

2.4.1 Relevancy of Document Data

In our filtering method, we assess the relevancy of document data against the given context words by calculating the data's correlation with the given context words. A relevant datum is defined as a datum which has high correlation in meaning with the given context words. On the contrary, data with small semantic correlation are defined to be irrelevant. The correlation is measured by the size of the norm of the datum's vector in a subspace selected in accordance with the given context words, as shown in Table 1. Data whose norms are higher than a threshold (threshold T) are considered relevant.

2.4.2 2-normalization of Document Vectors

2-normalization of document vectors in the semantic space I as described in [6] enables efficient calculation of the semantic correlation between document data and the given context words. 2-normalization is applied to fix the norms of all document vectors to a value of 1.0, assuming that all document data have an equal amount of semantic information. Without this process, each document datum contains a different amount of information. By 2-normalization, we can obtain norms as a measurement of the semantic correlation between document data and the context words in the semantic subspace that corresponds to the given context words.

2.4.3 Adjustment of Document Vectors

The vectors in the semantic space are adjusted as they are mapped onto a semantic subspace that corresponds to the given context words. Since axes whose semantic elements are unrelated to the context are not used in the selection of the subspace, vectors of document data semantically unrelated to the given context words have smaller norms in the subspace. As all the vectors are 2-normalized, it is assumed that each vector contains the same amount of semantic information in the whole space, and the vectors with high correlation to the selected subspace, which corresponds to the given context, have high norms in the subspace. That is, the norm of a vector in the selected subspace represents the ratio of the elements semantically related to the given context to the whole set of elements. An example of sets of metadata and corresponding norms in the semantic subspace is shown in Table 1.

2.4.4 Filtering of Irrelevant Data

The norms of the vectors in the subspace are compared with a threshold T, which is a reference point for the relevancy of document data against the given context words. The vectors with norms less than the threshold T are considered unnecessary and are filtered out from the retrieval or clustering candidates. In this context-dependent semantic information filtering, the selection of the threshold T is case-dependent. The searcher can decide whether to filter out a relatively large amount of data and obtain only data closely related to his/her viewpoint, or to set the threshold at a lower value so as to obtain most data for thorough analysis. The effect of the threshold selection on the semantic information filtering is examined in the experiments later on.

3 An Application of the Context-Dependent Semantic Information Filtering to a Semantic Clustering Method
We apply the semantic information filtering method with context recognition to a semantic clustering method. The practicability of this application is seen in the formation of relevant clusters.

3.1 Brief Outline

The semantic clustering method works by the following steps.

Step 1: Mapping of document data onto the selected subspace without 2-normalization of their vectors.

Step 2: Standardizing the document vectors' projected values on all axes of the semantic subspace so that the meaning of distances on all axes is unified.

Here, steps 3 and 4 are repeated until the number of clusters reaches the previously defined number.

Step 3: Calculating distances among data in the semantic subspace. We employ a semantic distance calculation formula to stress the characteristics of document data as distances are calculated.

Step 4: Forming clusters using UPGMA (unweighted pair-group method using arithmetic averages [10]), which is a well-known clustering algorithm and is described in Section 3.5.

Each step is explained in detail in the following sections.

3.2 Step 1: Mapping of Document Data

3.2.1 The Context Dependency of the Clustering Method

After the semantic information filtering, we apply a semantic clustering method to document data in the semantic subspace extracted from the semantic space in the filtering process. The use of the same subspace as in the filtering step enables the same context words to be reflected in the formation of clusters.

3.2.2 Mapping without 2-normalization

The document data remaining after filtering are mapped without 2-normalization onto the semantic subspace. The vectors are not 2-normalized since we examine the characteristics of each document datum. The size and the direction of a vector are equally important, and the size must not be lost through 2-normalization.

3.3 Step 2: Standardizing Projected Values

In the semantic subspace, the projected values of the vectors on each axis are standardized so that the meaning of distance is the same on any axis. Here a projected value refers to a continuous value on an axis made by projecting a document vector onto the axis (Figure 1). Thus when x vectors are mapped onto a subspace, there are x projected values on each axis.

3.3.1 Calculation of Relative Correlation

Before data are classified by clustering, a measurement to evaluate similarity or dissimilarity among data should be defined. In this clustering algorithm, the Euclidean distance is used to represent the semantic correlation among the document data in a selected subspace. Distance must represent relative similarity or dissimilarity among data. Distances do not represent relative similarity or dissimilarity when document data are simply mapped onto the semantic subspace. We therefore standardize the projected values on all axes in the semantic subspace so that a distance represents relative similarity or dissimilarity among all data. For example, by this process a distance of 0.1 has the same meaning on any axis. The clustering is performed by appropriately calculating relative similarity among document data through the standardization.

3.3.2 Standardization Function

To standardize the projected values on all axes, the following standardizing function is used:

$$ z_{ij} = \frac{x_{ij} - \bar{x}_i}{s_i} $$

Here t is the number of documents, i is the identifier of an axis, and j is the identifier of a document. A projected value is denoted by x_{ij} and the standardized value by z_{ij}; \bar{x}_i is the average of all projected values on axis i and s_i is the standard deviation on that axis.
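The standardization can be sketched as follows. The text does not say whether s_i is the population or the sample standard deviation; this sketch assumes the population form (division by the number of documents t). The function name and the input layout are hypothetical.

```python
import math

def standardize(projected):
    """Standardize projected values per axis.

    projected[i][j] is the projected value x_ij of document j on axis i.
    Returns z with z[i][j] = (x_ij - mean_i) / s_i, using the population
    standard deviation (an assumption); a constant axis is mapped to zeros.
    """
    standardized = []
    for row in projected:
        t = len(row)
        mean = sum(row) / t
        std = math.sqrt(sum((x - mean) ** 2 for x in row) / t)
        if std == 0:
            standardized.append([0.0] * t)
        else:
            standardized.append([(x - mean) / std for x in row])
    return standardized

# Two axes, three documents.
z = standardize([[1.0, 2.0, 3.0], [10.0, 10.0, 10.0]])
print(z[0])
```

After this step every axis has mean 0 and, unless it is constant, standard deviation 1, so a given distance has the same meaning on any axis.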
By this conversion of projected values, the sizes of projected values on all axes are mapped onto a single scale, and the distance among data represents relative similarity or dissimilarity.

3.4 Step 3: Calculation of Distance among Document Vectors

After the standardization, distances among vectors are calculated in the semantic subspace extracted from the semantic space in the filtering process. To realize a precise semantic analysis of the meaning of document data in the semantic subspace, we use a semantic distance calculation formula that can clarify and stress the characteristics of document data as distances are calculated. This formula is based on the Euclidean distance formula. The weight of an axis, that is, the axis' correlation with the context words as explained in Section 2.2, and the size of the projected values on all axes are emphasized.

[Equation: semantic distance d_ij, a weighted Euclidean form taken to the power 1/2]

Here, d_ij denotes the distance between document vectors i and j, k is the identifier of an axis, and p is the total number of axes. The weight (w_k) and the projected value (x) are raised to the powers α and β respectively. The parameter α decides the degree to which the weight of an axis is emphasized. The parameter β stresses the projected value of a vector, thereby clarifying its characteristics. Both parameters can take continuous values above 0.0, but effective values have not exceeded 3.0 in many experiments. The semantic correlation between a document datum and the given context words in the semantic subspace is proportional to the norm of the document vector and the weight of the axes on which the document vector lies. To cluster highly correlated data in detail, we can stress the projected values and/or the weights of axes while distances among document data are calculated with this semantic distance calculation formula.

3.5 Step 4: Forming Clusters

After the calculation of distances among document vectors, a clustering algorithm is applied to form clusters of document data in the semantic subspace. This algorithm puts the two document vectors that have the closest distance into the same cluster. Choosing the pair of vectors or clusters with the smallest distance becomes complex when a cluster contains more than one document datum. Whether a document datum A belongs to cluster 1 or cluster 2 depends on how the distance from the datum to each cluster is measured. The following methods are used for solving this problem [10].

SLINK (single linkage clustering method): The distance between two clusters is equal to the smallest distance between two document vectors, each from a different cluster [10].

CLINK (complete linkage clustering method): The distance between two clusters is the distance between the two document vectors, one from each cluster, that are furthest apart [10].

UPGMA (unweighted pair-group method using arithmetic averages): The distance between two clusters is the average distance between all document vectors in the first cluster and those in the second [10].

SLINK and CLINK are said to have the smallest overhead in system performance. UPGMA needs a heavy amount of calculation, but is known to give the most balanced results.
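The three cluster-distance definitions and the merging loop can be sketched as follows. This is a naive illustration, not the experimental system: it takes a precomputed distance matrix (here plain absolute differences between points on a line, an assumed toy input rather than the semantic distance), merges the closest pair of clusters, and stops when a previously defined number of clusters is reached. All names are hypothetical.

```python
from itertools import product

def pairwise(cluster_a, cluster_b, dist):
    """All distances between a member of cluster_a and a member of cluster_b."""
    return [dist[i][j] for i, j in product(cluster_a, cluster_b)]

def slink(cluster_a, cluster_b, dist):
    return min(pairwise(cluster_a, cluster_b, dist))   # single linkage

def clink(cluster_a, cluster_b, dist):
    return max(pairwise(cluster_a, cluster_b, dist))   # complete linkage

def upgma(cluster_a, cluster_b, dist):
    values = pairwise(cluster_a, cluster_b, dist)
    return sum(values) / len(values)                   # average linkage

def agglomerate(dist, target, linkage):
    """Merge the closest pair of clusters until `target` clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > target:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q], dist)
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters

# Toy input: four documents placed at positions 0, 1, 4 and 6 on a line.
positions = [0.0, 1.0, 4.0, 6.0]
dist = [[abs(a - b) for b in positions] for a in positions]

print(agglomerate(dist, target=2, linkage=upgma))  # [[0, 1], [2, 3]]
```

For the two resulting clusters, the three definitions give different cluster distances (3.0 for SLINK, 6.0 for CLINK, 4.5 for UPGMA), which is why the choice of linkage affects later merges.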
4 Experiment
In this section, we show the feasibility of our semantic information filtering method with context recognition and its application to the described clustering method.
4.1 Specific Goals
We prepare two experiments with the following goals:
• Evaluating the precision of semantic filtering on document data that contain irrelevant data, i.e., data less correlated in meaning with the given context words.
• Evaluating the efficiency of the described clustering method with and without our semantic information filtering applied beforehand.
• Specifying the situations in which the filtering and its application to the semantic clustering method work efficiently.
We give different contexts and parameter values to the system implemented with the Semantic Associative Search Method [3, 4, 5, 6] to evaluate their feasibility in different cases. The two experiments examine the semantic filtering method with semantic interpretation and its effect on the semantic clustering method.
4.2 Experimental System and Data
For the two experiments, we use a system implemented with the Semantic Associative Search Method, as explained below.
Semantic Space: We have implemented an experimental system with the Semantic Associative Search Method. 1001 basic words and 309 features concerning the medical field are prepared to define the m by n matrix explained in Section 2.2. In the 1001 x 309 matrix used to create the semantic space, the entry for a basic word and a feature is set to 1 if the feature describes the meaning of the basic word, as in Figure 2. All other entries, for features not related to the basic word, are set to 0. Using this matrix, the medical semantic space, which consists of 265 axes, is defined by the methodology explained in Section 2.2.
[Figure 2: a 0/1 matrix whose rows are basic words (heart, virus, liver-disease, cancer, ..., d1001) and whose columns are features (heart-disorder, liver-disorder, cancer, lung-cancer, ..., f309); the individual entries are not recoverable from the source.]
Figure 2: 1001 x 309 matrix used to create the medical semantic space. d represents a basic word; f represents a feature.
Table 2: Four word-groups, each containing 10 related words
cancer-related: breast-cancer, brain-tumor, liver-cancer, stomach-cancer, uterine-cancer (5 more words)
treatments: electronic-therapy, immunotherapy, physical-therapy, kidney-transplant, liver-transplant (5 more words)
medicine: herbal-medicine, hormone-drug, internal-medicine, sleeping-drug, tranquilizer (5 more words)
mental-diseases: depression, neurosis, autism, mania, schizophrenia (5 more words)
Metadata of Document Data: In preparing sets of metadata for the medical document data, we defined four word-groups, each containing 10 related basic words, as in Table 2. Metadata of document data are prepared using the basic words in the four word-groups. We prepared 120 sets of metadata using words from each word-group, for a total of 480 sets over the four word-groups, as depicted in Figure 3. The specific procedure is as follows.
1. The number (x) of basic words that form a set of metadata for a document datum is selected from x = 9, 7, 5, 3, 2, 1. A word-group from which the basic words are extracted is also chosen.
2. 10 different combinations of x basic words are selected from the chosen word-group. Each combination becomes the set of metadata for one document datum, so 10 sets of metadata are defined.
3. 10 sets of metadata are made in this way for each of the six values of x (= 9, 7, 5, 3, 2, 1). Thus 60 sets of metadata are defined for a word-group. We name these 60 sets "[group-name] metadata", as in "60 sets of medicine metadata".
4. In the same manner, 60 more sets of metadata are defined using the basic words that belong to the same word-group used in the previous step. They are identical with the 60 sets created in the previous step, except that the new sets contain random basic words: one to five basic words not found in any word-group are attached randomly to each set. We call these sets "[group-name] mixed-metadata", as in "60 sets of medicine mixed-metadata".
5. We repeat procedures 1-4 for all word-groups to obtain a total of 480 sets of metadata of medical document data.
[Figure 3 diagram. Labels recoverable from the figure: x is the number of words used to form a set of metadata; 10 sets of metadata are built per value of x from the 10 words A-J of a word-group; example mixed-metadata sets are cancer-relatedMixed5-1 (A B C D E + random), cancer-relatedMixed5-2 (B C D E F + random), ..., cancer-relatedMixed5-10 (J A B C D + random), where "+ random" adds a random number of randomly generated unrelated words to the prepared metadata.]
Figure 3: This figure shows how 60 sets of metadata and 60 sets of mixed-metadata are prepared using the 10 words in the cancer-related word-group. A total of 480 sets of metadata are prepared with respect to the 4 word-groups.
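The preparation procedure in steps 1-5 can be sketched in Python as follows. This is an illustration under assumptions, not the authors' tooling: the word lists are hypothetical placeholders (Table 2 shows only five of the ten words per group), the ten combinations per value of x are taken as cyclic windows over the ten words, as the mixed-metadata example in Figure 3 suggests, and the naming scheme and `build_metadata` function are invented here.

```python
import random

X_VALUES = (9, 7, 5, 3, 2, 1)  # numbers of basic words per metadata set
GROUP_NAMES = ("cancer-related", "treatments", "medicine", "mental-diseases")

# Placeholder vocabularies (assumption): ten words per group, twenty unrelated words.
GROUPS = {g: [f"{g}-word{i}" for i in range(10)] for g in GROUP_NAMES}
UNRELATED = [f"unrelated-{i}" for i in range(20)]


def cyclic_combinations(words, x):
    """Ten distinct combinations of x words: one cyclic window per start index."""
    n = len(words)
    return [[words[(start + k) % n] for k in range(x)] for start in range(n)]


def build_metadata(groups, unrelated, seed=0):
    """Return a dict mapping a set name to its list of metadata words."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    metadata = {}
    for group, words in groups.items():
        for x in X_VALUES:
            combos = cyclic_combinations(words, x)
            for idx, combo in enumerate(combos, start=1):
                # Steps 1-3: plain metadata built only from the word-group.
                metadata[f"{group}{x}-{idx}"] = list(combo)
                # Step 4: the same set plus one to five unrelated words.
                noise = rng.sample(unrelated, rng.randint(1, 5))
                metadata[f"{group}Mixed{x}-{idx}"] = list(combo) + noise
    return metadata


METADATA = build_metadata(GROUPS, UNRELATED)
```

Each word-group contributes 6 x 10 = 60 plain sets and 60 mixed sets, so four groups yield 4 x 120 = 480 sets, matching the total stated in step 5.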
[Figure: filtering result listing with the columns "Relevant document data" and "Irrelevant document data for removal"; the individual entries and correlation values are not recoverable from the source.]
Figure 12: Input fields and output fields of a filter (legend: fields corresponding to a visible parameter; fields corresponding to a hidden parameter).
K. Jamroendararasame et al. / Web-based Transaction Systems
4.2 Process Description
A process description is a set of equations and functions which define the output of filters. Each equation defines the value of an output field using the values of input fields. Output fields of a filter are
• fields of a pipe on the output side of the filter, excluding fields which correspond to a control, and
• fields corresponding to a database table written by the filter (Fig. 12).
Input fields of a filter are
• fields of a pipe on the input side of the filter, and
• fields corresponding to a database table read by the filter.
All output field values must be defined in the process description.
4.3 An Example of Process Descriptions
We show a process description of a processing program ADD1 of our seminar room booking system in Fig. 13. The program ADD1 gets an ID, a password, and an email address from a Web page Register, and adds the new user to a table USER_LIST. The program is aborted if the ID, the password, or the email address is empty, or if there is already a user who has the same ID.
 1: filters add1 {
 2:   prev=db_select("ID",i.USER_LIST,"WHERE ID='%s'",i.ID)
 3:
 4:   error1 if i.ID eq "" || i.PW eq "" || i.EM eq "" || db_ntuples(prev)>0 with {
 5:     o.USER_LIST=i.USER_LIST;
 6:   }
 7:
 8:   confirm1 otherwise with {
 9:     pin=generatePIN(i.ID,i.PW,i.EM)
10:
11:     o.EM=i.EM;
12:     o.PIN=pin;
13:     o.USER_LIST=db_insert(i.USER_LIST,"VALUES('%s','%s','%s',%d)",i.ID,i.PW,i.EM,pin)
14:   }
15: }
Figure 13: A process description of a processing program ADD1
The program ADD1 consists of two filters. One is attached to a pipe Error1 (Fig. 13, ll. 4-6) and the other is attached to a pipe Confirm1 (Fig. 13, ll. 8-14). In the process description, we add the prefixes "i." and "o." to an input field name and an output field name, respectively, so that we can distinguish an input field from an output field. The condition of the former filter is the expression defined on the fourth line. If the value of the input field i.ID is empty, the value of the expression is true; the former filter is therefore selected and the program ADD1 sends an Error1 page to the Web browser. The equation on the thirteenth line describes the registration of a new user. It defines the new value of the table USER_LIST as the table obtained by adding the new user to the current value of the USER_LIST table. The text "VALUES ('%s','%s','%s',%d)" represents the new record in the same way as in SQL. The placeholders %s and %d are replaced with the corresponding parameters of the function db_insert.
4.4 Generation of CGI Programs
Our PF-Web generator generates CGI programs written in Perl. We show a part of a CGI program ADD1 generated by the PF-Web system in Fig. 14. This program has the following 3 main parts.
1. A routine to input form data (Fig. 14, ll. 32-39), which decodes the form data and assigns them to variables corresponding to input fields.
2. A routine to compute output field values (Fig. 14, ll. 6-20).
3. A routine to generate a Web page (Fig. 14, ll. 24-31), which reads a Web page template, replaces visible parameter names and hidden parameter names with parameter values, and sends an HTML document to a Web browser.
The PF-Web system adds the following code to CGI programs for a standard level of security management.
• A routine to check consistency between a Web page and input data: We may give a maximum length of input text to text controls and password controls using our Web transition diagram editor. We may also give menu items to menus. However, the length of the input data may still exceed the maximum length, or the input data may not match any of the menu items, because a cracker who knows the HTTP protocol and HTML can give values of any content and any length to CGI programs.
• A routine to replace HTML special characters in field values with character references such as '&gt;', '&lt;', and '&amp;' in the Web page generation routine: The characters '>', '<', and '&' have special meanings in an HTML document, so a Web browser cannot show a Web page properly if the page contains these special characters in a text or in an attribute value of an HTML tag.
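The two security routines described above can be sketched as follows. The programs generated by PF-Web are written in Perl; this Python sketch only illustrates the idea and is not the generated code. The function names, the `{NAME}` placeholder syntax, and the shape of the `max_lengths` and `menu_items` arguments are assumptions made for this example; `html.escape` is the Python standard-library function.

```python
import html
import re


def check_consistency(form, max_lengths, menu_items):
    """Return a list of problems found in the submitted form data.

    form        -- dict of field name -> submitted string
    max_lengths -- dict of field name -> maximum length set on the control
    menu_items  -- dict of field name -> set of values offered by the menu
    """
    problems = []
    for field, limit in max_lengths.items():
        if len(form.get(field, "")) > limit:
            problems.append(f"{field}: longer than {limit} characters")
    for field, allowed in menu_items.items():
        if form.get(field) not in allowed:
            problems.append(f"{field}: not a menu item")
    return problems


def fill_template(template, values):
    """Replace {NAME} placeholders with HTML-escaped field values."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            return match.group(0)  # leave unknown placeholders untouched
        # quote=True also escapes quotes, which matters in attribute values.
        return html.escape(str(values[name]), quote=True)

    # A single pass, so an inserted value is never re-scanned for placeholders.
    return re.sub(r"\{(\w+)\}", substitute, template)
```

A request would be rejected when `check_consistency` returns a non-empty list, and every field value passes through `fill_template` before it reaches the browser, so characters such as '<' arrive as character references rather than as markup.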
 1: ...
 2: &readFormData;
 3: &openDB;
 4: ...
 5: $v_prev=&db_select( "ID", $i_USER_LIST, "WHERE ID='%s'", $i_ID );
 6: if($i_ID eq "" || $i_PW eq "" || $i_EM eq "" || &db_ntuples( $v_prev ) > 0) {
 7:   $o_USER_LIST = $i_USER_LIST;
 8:   ...
 9:   &_gen_error1_page;
10: }
11: else {
12:   $v_pin=&generatePIN( $i_ID, $i_PW, $i_EM );
13:
14:   $o_EM = $i_EM;
15:   $o_PIN = $v_pin;
16:   $o_USER_LIST = &db_insert( $i_USER_LIST,
17:     "VALUES ('%s',
T. Feyer et al. / Intensionality in Concept Theory
Intensional junctors, where the interpretation of composed formulae depends on the context, are not considered at all.
The approach by Ganter and Wille [7] formalizes intensions by the common attributes of a set C of objects. Conversely, the extension of a set of attributes is formalized by the set of objects having these attributes. This theory naturally leads to a lattice of concepts, each of which is defined by an extension and an intension. Of course, Kauppi's axioms hold in these lattices, but intensional containment (expressed on the intensions of concepts, i.e., sets of attributes) is equivalent to the superset relation for the extensions of concepts (expressed by sets of objects). Thus, though the term "intension" is used in this theory, it is debatable whether this approach to Concept Theory should be called intensional.
The approach taken by Duzi [4], which she characterizes as being "Neo-Frege", starts from the point of view that concepts are objective entities which cannot be created, but only discovered. Leaving aside this aspect of "existential theory", the major difference from the work by Kauppi and successors is that it takes into account the dependence of extensions on "worlds" and "time points". From a logical point of view this suggests considering a possible-worlds semantics.
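The remark on the Ganter and Wille approach [7], that containment of intensions coincides with the superset relation on extensions, can be checked on a toy example. The sketch below assumes a hypothetical three-object context and uses function names invented here. It enumerates concepts by brute force, which is only feasible for very small contexts.

```python
from itertools import combinations

# Hypothetical formal context: object -> set of attributes it has.
CONTEXT = {
    "sparrow": {"animal", "bird", "flies"},
    "penguin": {"animal", "bird"},
    "dog": {"animal"},
}


def intent(objects, context):
    """Attributes common to all given objects (all attributes if none given)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def extent(attributes, context):
    """Objects that have every one of the given attributes."""
    wanted = set(attributes)
    return frozenset(o for o, attrs in context.items() if wanted <= attrs)


def formal_concepts(context):
    """All (extent, intent) pairs, found by closing every subset of objects."""
    found = set()
    objects = sorted(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            common = intent(subset, context)
            found.add((extent(common, context), common))
    return found


CONCEPTS = formal_concepts(CONTEXT)

# Intension containment holds exactly when the extensions are in the
# reverse (superset) relation, for every pair of concepts.
ANTITONE = all((int_a <= int_b) == (ext_b <= ext_a)
               for ext_a, int_a in CONCEPTS
               for ext_b, int_b in CONCEPTS)
```

In this context the concept with intension {animal} has the largest extension and the concept with intension {animal, bird, flies} the smallest, so ordering by intensions carries no information beyond ordering by extensions.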
3 Concepts as Theories
Following [12, 20], the intension of a concept is the information content required to recognize a thing as belonging to the extension of the concept in question. This is far from being a clear mathematical definition; maybe it is not intended to be one. In particular, the term "information content" remains undefined. However, the underlying assumption is that concepts are used to characterize things. In other words, there is a fundamental relation denoted as "falls under": a thing falls under a concept. The extension of a concept C is then the set of all things falling under C.
Characterizing things can be done by using logic (of any kind). A set of formulae in a logic is called a theory. Thus, the intension of a concept can be defined as a logical theory. As a consequence, the falls-under relation becomes the satisfaction relation. Forgetting about the things, the extension becomes a model of the theory. In order not to run into anomalies we should not presume that models are always sets. Also, for the sake of flexibility in Concept Theory, we can dispense with fixing the logic a priori.
For a given logical signature Σ, a concept is a triple (C, int(C), ext(C)), where C is just a name for the concept, int(C) is a theory over Σ and ext(C) is a model for int(C). This leaves enough room for classical and possible-worlds semantics. In particular, theories may be equivalent, i.e., have the same models, but nevertheless be distinct, i.e., define different concepts.
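The triple (C, int(C), ext(C)) and the closing remark about equivalent but distinct theories can be illustrated with a small sketch. The text deliberately leaves the logic open, so classical propositional logic is an assumption made here purely for illustration, as are the tuple encoding of formulae and all names in the code.

```python
from dataclasses import dataclass
from itertools import combinations


def holds(formula, world):
    """Satisfaction: does the world (a set of true atoms) satisfy the formula?"""
    op = formula[0]
    if op == "atom":
        return formula[1] in world
    if op == "not":
        return not holds(formula[1], world)
    if op == "and":
        return holds(formula[1], world) and holds(formula[2], world)
    if op == "implies":
        return (not holds(formula[1], world)) or holds(formula[2], world)
    raise ValueError(f"unknown connective: {op}")


def models(theory, signature):
    """All worlds over the signature that satisfy every formula of the theory."""
    atoms = sorted(signature)
    worlds = (frozenset(c) for r in range(len(atoms) + 1)
              for c in combinations(atoms, r))
    return frozenset(w for w in worlds if all(holds(f, w) for f in theory))


@dataclass(frozen=True)
class Concept:
    name: str
    intension: frozenset  # a theory: a set of formulae over the signature
    extension: frozenset  # one model of that theory

    def __post_init__(self):
        # The extension must be a model of the intension.
        if not all(holds(f, self.extension) for f in self.intension):
            raise ValueError("extension is not a model of the intension")


SIGNATURE = {"p", "q"}
P, Q = ("atom", "p"), ("atom", "q")
T1 = frozenset({("implies", P, Q), P})  # {p -> q, p}
T2 = frozenset({("and", P, Q)})         # {p and q}
WORLD = frozenset({"p", "q"})
C1 = Concept("C1", T1, WORLD)
C2 = Concept("C2", T2, WORLD)
```

Here `T1` and `T2` have exactly the same models, yet they are different sets of formulae, so `C1` and `C2` are distinct concepts even though an extensional comparison could not separate them.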
4 Conclusion
We argued that the intension of a concept has to be defined as a logical theory and the extension as a model of this theory. We then have a choice of logic, which results in various different Concept Theories. The underlying logic can be classical first-order logic or a fragment of it. Among others, this is the case for database theory. However, as shown in [21], it also makes sense to look at non-classical logics, in that case higher-order intuitionistic logic. Given the goal of capturing intensionality rather than extensionality, it seems worthwhile to investigate Montague's intensional logic [6, 3, 23]. This logic has been used in Artificial Intelligence to study natural language approaches [22]. It has also been used for database predesign [5].
References
1. Alfs T. Berztiss. Concepts, objects and domains. In H. Jaakkola, H. Kangassalo, and E. Kawaguchi, editors, Information Modelling and Knowledge Bases, volume X, pages 80-89. IOS Press, 1999.
2. Aart Bijl. What's in a concept? In H. Jaakkola, H. Kangassalo, T. Kitahashi, and A. Markus, editors, Information Modelling and Knowledge Bases, volume V, pages 34-49. IOS Press, 1994.
3. Sasa Buvac, Vanja Buvac, and Ian A. Mason. Metamathematics of contexts. In Fundamenta Informaticae, 23(2/3/4). IOS Press, 1995.
4. Marie Duzi. A contribution to the discussion on concept theory. In H. Jaakkola, H. Kangassalo, and E. Kawaguchi, editors, Information Modelling and Knowledge Bases, volume XII, pages 346-350. IOS Press, 2001.
5. Thomas Feyer, Marcela Varas, Marta Fernandez, and Bernhard Thalheim. Intensional logic for integrity constraint specification in predesign database modeling. In H. Kangassalo, E. Kawaguchi, H. Jaakkola, and T. Welzer, editors, Information Modelling and Knowledge Bases, volume XIII. IOS Press, 2002.
6. Dov Gabbay and Franz Guenthner. Handbook of Philosophical Logic, Volume II: Extensions of Classical Logic. D. Reidel Publishing Co., Dordrecht, 1984.
7. B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer-Verlag, Berlin, 1999.
8. Roland Hausser. Modeling everyday thought in database semantics. In H. Jaakkola, H. Kangassalo, and E. Kawaguchi, editors, Information Modelling and Knowledge Bases, volume XII, pages 351-352. IOS Press, 2001.
9. Marko Junkari and Marko Niinimaki. An algebraic approach to Kauppi's concept theory. In H. Jaakkola, H. Kangassalo, and E. Kawaguchi, editors, Information Modelling and Knowledge Bases, volume X, pages 90-102. IOS Press, 1999.
10. Marko Junkari and Marko Niinimaki. An algebraic approach to Kauppi's concept theory II: Functional representation of concept operations and concept associations. In E. Kawaguchi, H. Kangassalo, H. Jaakkola, and Issam A. Hamid, editors, Information Modelling and Knowledge Bases, volume XI, pages 175-187. IOS Press, 2000.
11. Hannu Kangassalo. On the concept of concept for conceptual modelling and concept detection. In S. Ohsuga, H. Kangassalo, H. Jaakkola, K. Hori, and N. Yonezaki, editors, Information Modelling and Knowledge Bases, volume III, pages 17-58. IOS Press, 1992.
12. Hannu Kangassalo. COMIC: A system and methodology for conceptual modelling and information construction. Data & Knowledge Engineering, 9:287-319, 1993.
13. Raili Kauppi. Einführung in die Theorie der Begriffssysteme. Acta Universitatis Tamperensis, Ser. A, 15, 1967.
14. P. Materna. Meanings are concepts. From the Logical Point of View, 2:76-89, 1992.
15. Tapio Niemi. New approaches to intensional concept theory. In E. Kawaguchi, H. Kangassalo, H. Jaakkola, and Issam A. Hamid, editors, Information Modelling and Knowledge Bases, volume XI, pages 188-204. IOS Press, 2000.
16. Jørgen Fischer Nilsson. A concept object algebra CA+x. In H. Kangassalo, H. Jaakkola, K. Hori, and T. Kitahashi, editors, Information Modelling and Knowledge Bases, volume IV, pages 42-55. IOS Press, 1993.
17. Jari Palomaki. From the theory of concepts to concept theory. In S. Ohsuga, H. Kangassalo, H. Jaakkola, K. Hori, and N. Yonezaki, editors, Information Modelling and Knowledge Bases, volume III, pages 107-122. IOS Press, 1992.
18. Jari Palomaki. Towards the foundations of concept theory. In H. Jaakkola, H. Kangassalo, T. Kitahashi, and A. Markus, editors, Information Modelling and Knowledge Bases, volume V, pages 139-154. IOS Press, 1994.
19. Jari Palomaki. Three kinds of containment relations of concepts. In H. Kangassalo, J.F. Nilsson, H. Jaakkola, and S. Ohsuga, editors, Information Modelling and Knowledge Bases, volume VIII, pages 261-277. IOS Press, 1997.
20. Jari Palomaki. Concept theory vs. set theory. In H. Jaakkola, H. Kangassalo, and E. Kawaguchi, editors, Information Modelling and Knowledge Bases, volume XII. IOS Press, 2001.
21. Klaus-Dieter Schewe. The type concept in OODB modelling and its logical implications. In E. Kawaguchi, H. Kangassalo, H. Jaakkola, and Issam A. Hamid, editors, Information Modelling and Knowledge Bases, volume XI, pages 256-274. IOS Press, 2000.
22. A. Thayse. From Natural Language Processing to Logic for Expert Systems: A Logic Based Approach to Artificial Intelligence. John Wiley & Sons, New York, 1991.
23. Richmond H. Thomason. Formal Philosophy: Selected Papers of Richard Montague. Yale University Press, New Haven, 1974.
24. Hiroyuki Yamauchi and Setsuo Ohsuga. Modelling objects by extensions and intensions - a theoretical background of KAUS. In S. Ohsuga, H. Kangassalo, H. Jaakkola, K. Hori, and N. Yonezaki, editors, Information Modelling and Knowledge Bases, volume III, pages 160-173. IOS Press, 1992.
Author Index
Akaishi, M. 254
Ampornaramveth, V. 398
Berztiss, A.T. 184
Brumen, B. 301
Constantopoulos, P. 307
Dagorret, P. 123
Dietz, J. 344
Etcheverry, P. 123
Fernandez, M. 138
Feyer, T. 138, 422
Fischer Nilsson, J. 296, 412
Fujima, J. 1
Golob, I. 301
Hahn, U. 375
Hausser, R. 215
Henno, J. 63
Hommes, B.-J. 344
Izumi, N. 83
Jaakkola, H. 131, 301
Jamroendararasame, K. 361
Kangassalo, M. 135
Kazmierczak, E. 19
Keen, C. 19
Kerminen, P. 131
Kirikova, M. 266
Kitagawa, T. 325
Kiyoki, Y. 325
Kokol, P. 390
Locuratolo, E. 279
Lopisteguy, P. 123
Masuda, G. 390
Matsuzaki, T. 361
Milton, S. 19
Niemi, T. 152
Niinimaki, M. 37
Nummenmaa, J. 152
Ohshima, H. 201
Ohsuga, S. 51, 201
Palomaki, J. 419
Podgorelec, V. 390
Rozman, I. 301
Sakai, D. 325
Sakamoto, N. 390
Savolainen, P. 236
Schewe, K.-D. 422
Schulz, S. 375
Soderstrom, E. 115
Spyratos, N. 254, 307
Suzuki, T. 361
Takaki, S. 51
Tanaka, Y. 1, 254
Thalheim, B. 138, 422
Thanisch, P. 152
Tokuda, T. 361
Tzitzikas, Y. 307
Ueno, H. 398
Varas, M. 138
Welzer, T. 301
Yamamoto, R. 390
Yonezaki, N. 83, 100
Yoshida, N. 325
Yoshiura, N. 100
Zarri, G.P. 164