Andreas Tolk and Lakhmi C. Jain (Eds.) Complex Systems in Knowledge-based Environments: Theory, Models and Applications
Studies in Computational Intelligence, Volume 168

Editor-in-Chief
Prof. Janusz Kacprzyk
Systems Research Institute
Polish Academy of Sciences
ul. Newelska 6
01-447 Warsaw
Poland
E-mail: [email protected]

Further volumes of this series can be found on our homepage: springer.com

Vol. 147. Oliver Kramer
Self-Adaptive Heuristics for Evolutionary Computation, 2008
ISBN 978-3-540-69280-5

Vol. 148. Philipp Limbourg
Dependability Modelling under Uncertainty, 2008
ISBN 978-3-540-69286-7

Vol. 149. Roger Lee (Ed.)
Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, 2008
ISBN 978-3-540-70559-8

Vol. 150. Roger Lee (Ed.)
Software Engineering Research, Management and Applications, 2008
ISBN 978-3-540-70774-5

Vol. 151. Tomasz G. Smolinski, Mariofanna G. Milanova and Aboul-Ella Hassanien (Eds.)
Computational Intelligence in Biomedicine and Bioinformatics, 2008
ISBN 978-3-540-70776-9

Vol. 152. Jaroslaw Stepaniuk
Rough – Granular Computing in Knowledge Discovery and Data Mining, 2008
ISBN 978-3-540-70800-1

Vol. 153. Carlos Cotta and Jano van Hemert (Eds.)
Recent Advances in Evolutionary Computation for Combinatorial Optimization, 2008
ISBN 978-3-540-70806-3

Vol. 154. Oscar Castillo, Patricia Melin, Janusz Kacprzyk and Witold Pedrycz (Eds.)
Soft Computing for Hybrid Intelligent Systems, 2008
ISBN 978-3-540-70811-7

Vol. 155. Hamid R. Tizhoosh and M. Ventresca (Eds.)
Oppositional Concepts in Computational Intelligence, 2008
ISBN 978-3-540-70826-1

Vol. 156. Dawn E. Holmes and Lakhmi C. Jain (Eds.)
Innovations in Bayesian Networks, 2008
ISBN 978-3-540-85065-6

Vol. 157. Ying-ping Chen and Meng-Hiot Lim (Eds.)
Linkage in Evolutionary Computation, 2008
ISBN 978-3-540-85067-0
Vol. 158. Marina Gavrilova (Ed.)
Generalized Voronoi Diagram: A Geometry-Based Approach to Computational Intelligence, 2009
ISBN 978-3-540-85125-7

Vol. 159. Dimitri Plemenos and Georgios Miaoulis (Eds.)
Artificial Intelligence Techniques for Computer Graphics, 2009
ISBN 978-3-540-85127-1

Vol. 160. P. Rajasekaran and Vasantha Kalyani David
Pattern Recognition using Neural and Functional Networks, 2009
ISBN 978-3-540-85129-5

Vol. 161. Francisco Baptista Pereira and Jorge Tavares (Eds.)
Bio-inspired Algorithms for the Vehicle Routing Problem, 2009
ISBN 978-3-540-85151-6

Vol. 162. Costin Badica, Giuseppe Mangioni, Vincenza Carchiolo and Dumitru Dan Burdescu (Eds.)
Intelligent Distributed Computing, Systems and Applications, 2008
ISBN 978-3-540-85256-8

Vol. 163. Pawel Delimata, Mikhail Ju. Moshkov, Andrzej Skowron and Zbigniew Suraj
Inhibitory Rules in Data Analysis, 2009
ISBN 978-3-540-85637-5

Vol. 164. Nadia Nedjah, Luiza de Macedo Mourelle, Janusz Kacprzyk, Felipe M.G. França and Alberto Ferreira de Souza (Eds.)
Intelligent Text Categorization and Clustering, 2009
ISBN 978-3-540-85643-6

Vol. 165. Djamel A. Zighed, Shusaku Tsumoto, Zbigniew W. Ras and Hakim Hacid (Eds.)
Mining Complex Data, 2009
ISBN 978-3-540-88066-0

Vol. 166. Constantinos Koutsojannis and Spiros Sirmakessis (Eds.)
Tools and Applications with Artificial Intelligence, 2009
ISBN 978-3-540-88068-4

Vol. 167. Ngoc Thanh Nguyen and Lakhmi C. Jain (Eds.)
Intelligent Agents in the Evolution of Web and Applications, 2009
ISBN 978-3-540-88070-7

Vol. 168. Andreas Tolk and Lakhmi C. Jain (Eds.)
Complex Systems in Knowledge-based Environments: Theory, Models and Applications, 2009
ISBN 978-3-540-88074-5
Andreas Tolk Lakhmi C. Jain (Eds.)
Complex Systems in Knowledge-based Environments: Theory, Models and Applications
123
Professor Dr. Andreas Tolk Engineering Management & Systems Engineering 242B Kaufman Hall Old Dominion University Norfolk, VA 23529 USA Email: [email protected]
Professor Dr. Lakhmi C. Jain School of Electrical and Information Engineering University of South Australia Mawson Lakes Campus Adelaide, South Australia SA 5095 Australia Email: [email protected]
ISBN 978-3-540-88074-5
e-ISBN 978-3-540-88075-2
DOI 10.1007/978-3-540-88075-2 Studies in Computational Intelligence
ISSN 1860-949X
Library of Congress Control Number: 2008935501

© 2009 Springer-Verlag Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India.

Printed on acid-free paper

9 8 7 6 5 4 3 2 1

springer.com
This Book Is Dedicated to Our Students
Preface
The tremendous growth in the availability of inexpensive computing power and the easy availability of computers have generated great interest in the design and implementation of Complex Systems. Computer-based solutions offer strong support in the design of Complex Systems, and Complex Systems themselves are becoming increasingly complex. This research book comprises a selection of state-of-the-art contributions to topics dealing with Complex Systems in a Knowledge-based Environment. Complex systems are ubiquitous. Examples include, but are not limited to, Systems of Systems, Service-oriented Approaches, Agent-based Systems, and Complex Distributed Virtual Systems. These are application domains that require knowledge of engineering and management methods beyond the scope of traditional systems. The chapters in this book deal with a selection of topics which range from uncertainty representation and management to the use of ontological means in support of large-scale business integration. All contributions were invited and are based on the recognition of the expertise of the contributing authors in the field. By collecting these sources together in one volume, our intention was to present the reader with a variety of tools to assist in both study and work. The second intention was to show how the different facets presented in the chapters are complementary and contribute towards this emerging discipline, which is designed to aid in the analysis of complex systems. The common denominator of all of the chapters is the use of knowledge-based methods, and in particular ontological means. The chapters are organized into two parts: Theoretical Contributions and Practical Applications. We believe that this volume will help researchers, students, and practitioners in dealing with the challenges encountered in the integration, operation, and evaluation of Complex Systems. We are grateful to the contributors and the reviewers for their time, efforts, and vision. We would like to express our sincere thanks to the editorial staff of Springer-Verlag and Scientific Publishing Services Private Limited for their excellent support.

Editors
Andreas Tolk Lakhmi C. Jain
Contents
1 An Introduction to Complex Systems in the Knowledge-Based Environment
Andreas Tolk, Lakhmi C. Jain

2 Uncertainty Representation and Reasoning in Complex Systems
Kathryn Blackmond Laskey, Paulo Cesar G. Costa

3 A Layered Approach to Composition and Interoperation in Complex Systems
Andreas Tolk, Saikou Y. Diallo, Robert D. King, Charles D. Turnitsa
1 An Introduction to Complex Systems in the Knowledge-Based Environment

Andreas Tolk and Lakhmi C. Jain

Abstract. This chapter gives a brief introduction to complex systems in an intelligent environment. It presents the chapters included in the book and places them into a common context. It also contains a number of resources for interested readers who wish to pursue their particular interests.
can be represented by allowing intelligent software solutions to draw conclusions and support decisions. Theoretical contributions and practical applications are also documented. The idea behind this book was to assemble these different experiences and gather them into a common context of knowledge-based support that enables readers to deal better with complex systems. The target audiences are master-level students of science and engineering and practitioners in the field. All face the challenge that the necessary solutions steadily increase in complexity. This is true for both real and virtual systems. Efficient methods and tools are needed to manage this complexity. In particular, when new functionality needs to be added (and this functionality is often included in legacy systems which are themselves already complex), effective and efficient support for this task needs to be provided. A first step is a new way to document complex systems that focuses on their two characteristic aspects: the multiple interfaces of complex systems and the non-linear interconnections that most of these interfaces expose. Intelligent agents shall be able to read this documentation in order to support the engineer. Intelligent decision and decision support technologies are applied in these domains and have been used successfully in various applications. The requirement of addressing the challenges of transferring explicit knowledge, information, and data in large heterogeneous organizations is well known. This can include networked solutions, federations of systems, or engineered systems composed mainly of existing systems. It may even support the merging of large businesses and their business processes. There may be a need to integrate the formerly separated solutions into a common infrastructure. Knowledge management applied in this context must be mature and lead with certainty to feasible solutions. Using these ideas means that current legacy systems must evolve from their existing state of organization-specific variants to a heterogeneous, but aligned, single common architecture. These federated systems must provide consistent support of the common business processes and at the same time provide service-unique applications and services.
1.2 Chapters Included in the Book

To adequately cover both aspects, the book is divided into two parts. The first four chapters are related to the theory and foundations of complex systems. The last four chapters focus on practical applications. It should be noted that all chapters both contribute to the theory and are applicable in practice. Laskey and Costa describe how to deal with uncertainty in complex systems, covering not only the representation of uncertainty but also reasoning in uncertain environments. The ability to deal with uncertain and incomplete data is a vital requirement for all real-world applications. Ignoring uncertainty will lead to wrong decisions and to failure of the solution. Bayesian probability theory and traditional knowledge representation languages provide powerful logical foundations, but neither alone is sufficient. On this account, the authors introduce Multi-Entity Bayesian Networks (MEBN) and the PR-OWL language. This enables probabilistic ontologies to be expressed as an efficient support for complex system engineering. A layered approach to composition and interoperation in complex systems is introduced by Tolk and others in the third chapter. The Levels of Conceptual Interoperability Model (LCIM) is used to identify means of knowledge representation
support. The first aspect is the composition of many systems into one system of systems. The second aspect is to understand the interoperation of the many systems that are part of a system of systems. The engineer must understand the data, the processes, and the supported operational context provided by the business use cases. The chapter introduces data engineering, process engineering, and constraint engineering. These are used as methods to deal with the challenges of composition and interoperation by use of the ontological spectrum. The work of Cruz and Xiao is a complementary effort and uses ontology-driven data integration in heterogeneous networks for complex system integration. They focus on building networks of data sources that are both syntactically and schematically different. To this end, they first show how to define a common vocabulary that forms the basis of semantic integration. The common vocabulary results in an application global ontology onto which the various data schemas can be mapped. The chapter introduces the necessary metadata and schema conceptualization, and it shows how to mediate between solutions. Finally, Chen et al. address the challenges which must be solved to manage complexity and emergent behavior in the resulting engineering systems. Emergent properties are exposed by complex systems on higher levels than the originally engineered level. Such higher-level emergent properties are typically more difficult to predict and harder to manage, because the underlying functions are both more complex and non-linear. Even small changes on the engineering level can result in tremendous differences in the system's behavior on the users' level. The chapter gives an overview of the key concepts currently available for coping with complexity and emergence and shows how they can be applied to support complex system engineering using knowledge-based means. In the best-practice section of this volume, Thoern and Sandkuhl contribute the first chapter, introducing the principles of feature modeling. Their goal is to provide the support necessary to manage the variability in complex systems. The underlying problem is that providing users with the features offered by these complex systems often causes serious management issues for the developers. This chapter describes feature modeling as an important contribution to help solve this problem. This is done by capturing and visualizing common features and the interdependencies between these features and the system components that take part in the implementation. The fundamentals of feature modeling are introduced, and examples from real-world applications in the automotive industry are given. The use of knowledge by robots is a new discipline. It is no longer applied only in laboratories but also in real systems. An important example is the use of robots for emergency operations. Elci and Rahnama document recent developments in the field of semantic robotics and use cooperative labyrinth discovery as an example. Each agent is an autonomous complex system and acts on its sensory input. Information is retrieved from other agents. This requires agent ontologies and domain ontologies. Communication is based on semantic web services. The resulting solution is in the conceptual phase, but the chapter gives examples of the use of these robots, which include traffic support and homeland security.
An application-driven solution that uses the idea of a virtual environment to address the problems observed in real-world systems is then introduced in the chapter by Maxwell and Carley. Many real-world systems can only be observed
by engineers; manipulation of the system is too dangerous or otherwise infeasible. Socio-cultural systems cannot be subjected to manipulation. The authors use multi-agent simulations to represent heterogeneous populations with the objective of developing policy in an environment suitable for experimentation. Multi-agent simulations are stochastic, and they illuminate significant uncertainties that exist in the environment. The chapter highlights the primary development concerns, and the approach is suitable for use in a wide range of application domains. The last chapter of this book is based on large-scale applications in support of business information systems. West shows in this contribution that ontology is no longer an academic toy. Serious applications are not only possible, but are already in use. The chapter summarizes best practices available to support the development of business information systems. These focus on the conceptual data modeling phase by introducing an ontological framework. This framework takes spatio-temporal aspects into account, which means that four dimensions need to be captured. Why this is important and how it can be applied are demonstrated by real-world examples. The chapter also introduces the underlying theory, which is based on data models, set theory, and the use of properties. Every chapter includes references. In addition, all authors submitted contributions to a bibliography (comprising breakthrough and milestone papers as well as good survey and review articles on the selected topics) and to a resource list (enumerating resources relevant to the topic, such as websites, software, and organizations). These contributions have been compiled into the Bibliography and Resources for Additional Studies, which will support the interested reader in the initiation of further studies on complex systems in knowledge-based environments.
1.3 Overview on Journals, Conferences, and Workshops

Although there is no conference or journal on complex systems in knowledge-based environments (at least not yet), several conferences and journals of interest exist that the interested reader may use to keep track of current developments. Table 1.1 enumerates journals, conferences and workshops, and book series that may be of interest to a reader of this book. This list is neither complete nor exclusive; it is a hub enabling further study and contributions of new solutions. Several conference proceedings have contributed significantly to books that establish an initial body of knowledge for systems engineering and system of systems engineering utilizing the means of knowledge management and knowledge representation. The reader is referred to the bibliography at the end of this book for examples. As pointed out previously, these contributions are just examples and by no means complete or exclusive. In summary, the eight chapters of this book give an overview of the theory and the best practices applicable to a broad range of challenges encountered when developing or managing complex systems and systems of systems. These systems can be either real or virtual. The use of knowledge representation means, in particular ontology applications, should be paired with engineering methods.
Table 1.1. Selected Journals, Conferences and Workshops, and Book Series

Journals
- IEEE Intelligent Systems, IEEE Press, USA
- IEEE Transactions on Systems, Man and Cybernetics, Parts A, B, C, IEEE Press, USA
- Intelligent Decision Technologies: An International Journal, IOS Press, The Netherlands
- International Journal of Hybrid Intelligent Systems, IOS Press, The Netherlands
- International Journal of Knowledge-Based Intelligent Engineering Systems, IOS Press, The Netherlands
- Journal of Systems Engineering, Wiley InterScience

Conferences & Workshops
- AAAI Conference on Artificial Intelligence
- KES International Conference Series
- International Conference on Complex Systems (ICCS)
- International Conference on Knowledge Systems Science and Engineering (KSSE)
- Australian World Wide Web Conferences
- European Conferences on Artificial Intelligence (ECAI)

Book Series
- Advanced Intelligence and Knowledge Processing, Springer-Verlag, Germany
- Computational Intelligence and its Applications Series, Idea Group Publishing, USA
- International Series on Natural and Artificial Intelligence, AKI
- Knowledge-Based Intelligent Engineering Systems Series, IOS Press, The Netherlands
- Advanced Information Processing, Springer-Verlag, Germany
- The CRC Press International Series on Computational Intelligence, The CRC Press, USA
They may be required to deal with data, processes, requirements, uncertainties, and the many other concerns inherent in real-world applications. This common approach connects all chapters, providing support for the student and the practitioner in the field. The book provides a variety of ideas and recommends best practices, and the bibliography and resource list help the interested reader to focus on particular topics.
2 Uncertainty Representation and Reasoning in Complex Systems

Kathryn Blackmond Laskey and Paulo Cesar G. Costa

Kathryn Blackmond Laskey: Department of Systems Engineering and Operations Research, MS 4A5, George Mason University, Fairfax, VA 22030-4444, USA, 1-703-993-1644, [email protected]

Paulo Cesar G. Costa: Center of Excellence in C4I, MSN 4B5, George Mason University, Fairfax, VA 22030-4444, USA, 1-703-879-6687, [email protected]
Bayesian networks have been applied to a wide variety of problems including medical diagnosis, classification systems, multi-sensor fusion, and legal analysis for trials. However, Bayesian networks are insufficiently expressive to cope with many real-world reasoning challenges. For example, a standard Bayesian network can represent the relationship between the type of an object, the object’s features, and sensor reports that provide information about the features, but cannot cope with reports from a large number of sensors reporting on an unknown number of objects, with uncertain associations of reports to objects. Traditional knowledge representation languages based on classical logic are well suited to reasoning about multiple interrelated entities of different types, but until recently have lacked support for reasoning under uncertainty. To address this issue, we introduce Multi-Entity Bayesian Networks (MEBN), which combine the simplicity and inferential power of BNs with the expressive power of First-Order Logic. The last section closes the loop on applying Bayesian techniques to complex systems by presenting the concept of a probabilistic ontology and the PR-OWL language for expressing probabilistic ontologies. Because PR-OWL is an OWL upper ontology for probabilistic knowledge, PR-OWL ontologies can draw upon other OWL ontologies, and can be processed by OWL-compliant ontology editors and reasoners. Because PR-OWL is based on MEBN, it has all the advantages of a highly expressive Bayesian language, including logical coherency, a built-in learning theory, and efficient inference. The chapter concludes with a discussion of the role of probabilistic ontologies in the design of complex systems.
2.1 Bayesian Networks

A Bayesian Network is a compact and computationally efficient representation for a joint probability distribution over a potentially large number of interrelated hypotheses. Probabilistic knowledge is represented in the form of a directed graph and a set of local probability distributions. Each node of the graph represents a random variable (RV), that is, a mutually exclusive and collectively exhaustive set of hypotheses. The edges of the graph represent direct dependence relationships. With each node is associated a local distribution, which specifies probabilities for its possible values as a function of the values of its parents. There is a large and growing literature on Bayesian network theory and algorithms (e.g., Charniak 1991; Jensen 2006; Neapolitan 2003; Pearl 1988). Bayesian networks have been applied to represent uncertain knowledge in diverse fields such as medical diagnosis (Spiegelhalter et al., 1989), image recognition (Booker and Hota 1988), search algorithms (Hansson and Mayer, 1989), and many others. Heckerman et al. (1995) provide a comprehensive survey of applications of Bayesian Networks ca. 1995. As a running illustration, we will use the case study presented in Costa (2005), which was based on the popular Paramount series Star Trek™. Our examples have been constructed to be accessible to anyone having some familiarity with space-based science fiction. This example has structural features that are similar to the more down-to-earth problems today’s complex systems are designed to address.

A Simple BN Model. Figure 2.1 illustrates the operation of a 24th Century decision support system tasked with helping Captain Picard to assimilate reports, assess their significance, and choose an optimal response. Of course, present-day systems are
Fig. 2.1. Decision Support Systems in the 24th Century
much less sophisticated than the system of Figure 2.1. We therefore begin our exposition by describing a highly simplified problem of detecting enemy starships. In this simplified problem, the main task of a decision system is to detect Romulan starships (here considered as hostile by the United Federation of Planets) and assess the level of danger they bring to our own starship, the Enterprise. Starships other than Romulans are considered either friendly or neutral. Starship detection is performed by the Enterprise’s suite of sensors, which can correctly detect and discriminate starships with an accuracy of 95%. However, Romulan starships may be in “cloak mode,” which would make them invisible to the Enterprise’s sensors. Even for the most current sensor technology, the only hint of a nearby starship in cloak mode is a slight magnetic disturbance caused by the enormous amount of energy required for cloaking. The Enterprise has a magnetic disturbance sensor, but it is very hard to distinguish background magnetic disturbance from that generated by a nearby starship in cloak mode. This simplified situation is modeled by the BN in Figure 2.2, which also considers the characteristics of the zone of space where the action takes place. Each node in our BN has a finite number of mutually exclusive, collectively exhaustive states. The node Zone Nature (Z) is a root node, and its prior probability distribution can be read directly from Figure 2.2 (e.g., 80% for deep space). The probability distribution for Magnetic Disturbance Report (M) depends on the values of its parents Z and Cloak Mode (C). The strength of this influence is quantified via the conditional probability table (CPT) for node M, shown in Table 2.1. Similarly, Operator Species (O) depends on Z, and the two report nodes depend on C and the hypothesis on which they are reporting.
Fig. 2.2. The Basic Starship Bayesian Network

Table 2.1. Conditional probability table for node Magnetic Disturbance Report

Zone Nature                      Deep Space        Planetary Systems    Black Hole Boundary
Cloak Mode                       True     False    True     False       True     False
Magnetic Disturbance Report
  Low                            80.0     85.0     20.0     25.0         5.0      6.9
  Medium                         13.0     10.0     32.0     30.0        10.0     10.6
  High                            7.0      5.0     48.0     45.0        85.0     82.5
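To make the table concrete, the short sketch below encodes Table 2.1 as a plain Python dictionary and checks that every column is a proper distribution. The node and state names are shortened for readability and do not come from any particular Bayesian network library; this is only an illustration of how a CPT can be stored and read.

    # Table 2.1 encoded as P(M | Z, C); values are percentages, exactly as printed.
    CPT_M = {
        "DeepSpace":         {True:  {"Low": 80.0, "Medium": 13.0, "High": 7.0},
                              False: {"Low": 85.0, "Medium": 10.0, "High": 5.0}},
        "PlanetarySystems":  {True:  {"Low": 20.0, "Medium": 32.0, "High": 48.0},
                              False: {"Low": 25.0, "Medium": 30.0, "High": 45.0}},
        "BlackHoleBoundary": {True:  {"Low": 5.0,  "Medium": 10.0, "High": 85.0},
                              False: {"Low": 6.9,  "Medium": 10.6, "High": 82.5}},
    }

    # Each column of the table is a distribution over M and must sum to 100%.
    for zone, by_cloak in CPT_M.items():
        for cloak, dist in by_cloak.items():
            assert abs(sum(dist.values()) - 100.0) < 1e-9, (zone, cloak)

    # Reading off one entry: P(M = High | Z = Deep Space, C = True) = 7.0%.
    print(CPT_M["DeepSpace"][True]["High"])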
A Bayesian Network provides both an elegant mathematical structure for representing relationships among random variables and a simple visualization of these relationships. For example, from the graph of Figure 2.2, we can see that M depends directly on Z and C, but only indirectly on O. The influence of O operates through C. From this Bayesian network, we can write the following expression for the joint distribution of the five random variables: P(Z,O,C,S,M) = P(Z)P(O|Z)P(C|O)P(S|O,C)P(M|Z,C)
(2.1)
This expression can be used to find the joint probability of a configuration of states of the random variables. In addition to their power as a communication device, BNs also exploit independence assumptions to simplify specification and inference. To illustrate, consider the BN of Figure 2.2. There are 3×3×2×3×3 = 162 possible configurations of the five existing random variables (the product of the number of states of each of the random variables). To specify a joint distribution for the five random variables, one must specify a probability for each of these configurations, or 162 probabilities in total. One of these can be obtained from the other 161 by applying the constraint that the probabilities must sum to 1. Thus, specifying a general probability distribution requires 161
independent parameters. In contrast, we can specify the Bayesian network of Figure 2.2 by specifying the local distributions for each of the nodes and combining them according to Equation (2.1). For a root node, we need to specify a single probability distribution; for non-root nodes, we specify a distribution for each configuration of the node’s parents. Each distribution requires one fewer probability than the number of states; the last probability can be obtained from the constraint that probabilities sum to 100%. Thus, the number of independent parameters is 2 for Z; 3×2 for O; 3×1 for C; 6×2 for S and 6×2 for M. Therefore, the number of independent parameters required to specify the BN of Figure 2.2 is the sum of these, or 35. Even for this small BN, this is a considerable reduction in specification burden. For larger networks, the effect is much more dramatic. In general probability models, specification scales exponentially in the number of random variables. In Bayesian networks with a bounded number of states and parents per random variable, specification scales linearly in the number of nodes. For example, consider the BN of Figure 2.5, which models a situation in which the Enterprise encounters four other starships. A general probability distribution for these random variables would require 944,783 parameters, whereas a BN of the structure of Figure 2.5 would require only 182 parameters. For a situation with 10 starships, a general probability distribution requires 3.2×10^13 parameters, whereas extending the model of Figure 2.5 to 10 starships would require 6,356 parameters. This example demonstrates the power of Bayesian networks to enable parsimonious specification of joint probability distributions over large numbers of interrelated hypotheses.

Belief Propagation in BNs. Because a Bayesian network represents a full joint distribution over the random variables it represents, it can be used to reason about how evidence about some random variables affects the probabilities of unobserved random variables. Mathematically, the impact of evidence is assessed by applying Bayes Rule. Suppose T denotes a random variable that is the target of a query – that is, we wish to assess its probability distribution. Let P(T) denote the probability distribution for T. This distribution assigns probability P(T=t) to each of the possible values t for T. Now, suppose we learn that another random variable E has value e. After receiving this evidence, Bayes rule tells us how to obtain a new probability distribution for the target random variable:
P(T = t | E = e) = P(E = e | T = t) P(T = t) / P(E = e)
                 = P(E = e | T = t) P(T = t) / Σ_t' P(E = e | T = t') P(T = t')     (2.2)

The left-hand side of this equation is the conditional probability that T has value t, given that E has value e. It is called the posterior probability, because it reflects our knowledge about T after receiving the evidence about E. Equation (2.2) shows how to compute the posterior probability distribution for T as a function of the prior probability distribution and the likelihood of the evidence conditional on T. We saw above that specifying a joint probability distribution over many variables is unmanageable in general, but for many interesting classes of problem, specifying a Bayesian network is quite manageable. Similarly, the general task of Bayesian updating is intractable, but efficient inference algorithms make the task tractable for a wide
variety of interesting and important applications. Inference algorithms for Bayesian networks exploit the independence relationships encoded in the graph for efficient computation. Some algorithms work by local message passing; others work by efficiently integrating or summing over variables other than the evidence and target variables. Despite the efficiencies achieved by the Bayesian network representation, exact computation becomes intractable for larger and more complex Bayesian networks. A variety of approximation algorithms have been developed. Common approaches include local message passing, efficient integration and/or summation, and stochastic simulation. Many inference methods require specifying in advance which variables will be observed and which will be the target of a query. In contrast, a Bayesian network represents knowledge in the form of a joint distribution, which implies that a mathematically well-defined conditional distribution exists for each random variable given evidence on any of the other random variables. Figure 2.3 shows an example in which evidence about causes is used to predict effects of those causes. In this example, evidence is received about the operator species (Romulan) and the zone nature (deep space). Given this evidence, Bayes rule has been applied to update beliefs for whether the ship is in cloak mode and the contents of the sensor and magnetic disturbance reports. Comparing these results with Figure 2.2, it can be seen that the probability that there is a starship nearby in cloak mode has increased more than sevenfold (from 12.2% to 90%), while the chance of the sensor perceiving a Romulan starship (which would be in cloak mode) remained practically unchanged. In this case, the relationships between causes and effects captured by this simple BN indicate that the proximity of a Romulan starship would not induce a major change in the magnetic disturbance report in a deep space zone.
Fig. 2.3. Predicting Observations from State of the World
Figure 2.4 illustrates that Bayesian networks can also be used to reason from evidence about effects to the likely causes of those effects. In this example, the same Bayesian network is used to infer the starship type and zone nature, given evidence about the sensor report (Romulan) and the magnetic disturbance report (high). Notice that although evidence of a high magnetic disturbance report would have increased
the probability of Cloak Mode, its impact was easily overcome by that of a sensor report indicating the presence of a Romulan starship, which strongly corroborates the hypothesis that this starship is not in cloak mode. The combination of this conflicting evidence resulted in a decrease in the probability of a starship in cloak mode to less than half of its previous figure (i.e., from 12.2% to 5.1%), while also increasing the probability of a nearby Romulan starship from 12.8% to 65% (a five-fold increase). This ability to capture the subtleties of even the most complex relationships is a strong point of Bayesian networks.
Fig. 2.4. Inferring State of the World from Observations
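To show how Equations (2.1) and (2.2) work together mechanically, the following sketch performs this kind of update by enumeration on a reduced three-node version of the model (Zone Nature, Cloak Mode, Magnetic Disturbance Report). Only the CPT from Table 2.1 and the 80% deep-space prior are taken from the chapter; the prior on Cloak Mode is an assumed stand-in for the 12.2% marginal mentioned above, and Operator Species and Sensor Report are omitted, so the resulting posteriors are illustrative and will not reproduce the numbers quoted for Figures 2.3 and 2.4.

    # Exact inference by enumeration on a reduced model {Z, C, M}.
    P_Z = {"DeepSpace": 0.80, "PlanetarySystems": 0.15, "BlackHoleBoundary": 0.05}  # 0.80 given; rest assumed
    P_C = {True: 0.122, False: 0.878}   # assumed stand-in for the 12.2% cloak-mode marginal
    P_M_given_ZC = {                    # Table 2.1, converted from percentages to probabilities
        ("DeepSpace", True):  {"Low": .80, "Medium": .13, "High": .07},
        ("DeepSpace", False): {"Low": .85, "Medium": .10, "High": .05},
        ("PlanetarySystems", True):  {"Low": .20, "Medium": .32, "High": .48},
        ("PlanetarySystems", False): {"Low": .25, "Medium": .30, "High": .45},
        ("BlackHoleBoundary", True):  {"Low": .050, "Medium": .100, "High": .850},
        ("BlackHoleBoundary", False): {"Low": .069, "Medium": .106, "High": .825},
    }

    def joint(z, c, m):
        """Chain-rule factorization of the reduced model, in the spirit of Equation (2.1)."""
        return P_Z[z] * P_C[c] * P_M_given_ZC[(z, c)][m]

    def posterior_cloak(z_obs, m_obs):
        """P(C | Z = z_obs, M = m_obs) obtained by normalizing the joint, as in Equation (2.2)."""
        unnormalized = {c: joint(z_obs, c, m_obs) for c in P_C}
        total = sum(unnormalized.values())
        return {c: p / total for c, p in unnormalized.items()}

    # A high disturbance in deep space multiplies the prior odds of cloak mode
    # by the likelihood ratio 0.07 / 0.05 = 1.4, yielding roughly a 16% posterior here.
    print(posterior_cloak("DeepSpace", "High"))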
Specifying the Structure and Probabilities in BNs. As noted above, specifying a Bayesian network to represent a problem requires defining a set of random variables, a graph to represent conditional dependence relationships, and a local distribution for each random variable that defines its probability distribution as a function of the states of its parents. The task of specification can be performed by eliciting knowledge from domain experts (e.g., Mahoney and Laskey, 2000; Korb and Nicholson, 2003), learning from observations (e.g., Korb and Nicholson, 2003; Neapolitan, 2003), or some combination of expert elicitation and data analysis. The problem of specifying a Bayesian network is usually broken into two components – specifying the structure and specifying the parameters. The structure of a Bayesian network consists of the random variables, their possible values, the dependence relationships, and the functional form of the local distributions. The parameters are numerical variables that determine a specific local distribution from a family of local distributions. It has been argued that specifying structure is more natural for experts than specifying parameters (e.g., Pearl, 1988). A robust literature exists on elicitation of numerical probabilities from experts (cf. Druzdzel and van der Gaag, 2000). When data are plentiful and expertise is scarce, then learning from data is the preferred option. Standard textbook methods and widely available software exist for parameter and structure learning for unconstrained discrete local distributions (cf. Neapolitan, 2003). However, these methods can provide very imprecise parameter estimates,
because when a node has many parent configurations (due to a large number of parents and/or large parent state spaces), there may be very few observations for some parent configurations. This may also result in failure of the learning algorithm to infer dependence relationships that could be found by more powerful methods. For this reason, if local distributions can be defined by specifying a functional form and a few parameters, the resulting estimates may be much more precise. However, this requires much more sophisticated parameter estimation methods, typically implemented as custom software.

Limitations of BNs. Although a powerful representation formalism, BNs are not expressive enough for many real-world applications. More specifically, Bayesian Networks assume a simple attribute-value representation – that is, each problem instance involves reasoning about the same fixed number of attributes, with only the evidence values changing from problem instance to problem instance. This type of representation is inadequate for many problems of practical importance. Many domains require reasoning about varying numbers of related entities of different types, where the numbers, types and relationships among entities cannot be specified in advance and may themselves be uncertain.

Stretching the expressiveness of BNs. The model depicted above is of limited use in a “real life” starship environment. After all, hostile starships cannot be expected to approach Enterprise one at a time so as to render this simple BN model usable. If four starships were closing in on the Enterprise, we would need to replace the BN of Figure 2.2 with the one shown in Figure 2.5. But even if we had a BN for each possible number of nearby starships, we still would not know which BN to use at any given time, because we cannot know in advance how many starships the Enterprise is going to encounter. In short, BNs lack the expressive power to represent entity types (e.g., starships) that can be instantiated as many times as required for the situation at hand. In spite of its naiveté, let us briefly hold on to the premise that only one starship can be approaching the Enterprise at a time, so that the model of Figure 2.2 is valid.
Fig. 2.5. The BN for Four Starships
Furthermore, suppose we are traveling in deep space, our sensor report says there is no trace of a nearby starship (i.e., the state of node Sensor Report is Nothing), and we receive a report of a strong magnetic disturbance (i.e., the state of node Magnetic Disturbance Report is High). Table 2.1 shows that the likelihood ratio for a high MDR is 7/5 = 1.4 in favor of a starship in cloak mode. Although this favors a cloaked starship in the vicinity, the evidence is not overwhelming. Repetition is a powerful way to boost the discriminatory power of weak signals. As an example from airport terminal radars, a single pulse reflected from an aircraft usually arrives back at the radar receiver very weakened, making it hard to tell apart from background noise. However, a steady sequence of reflected radar pulses is easily distinguishable from background noise. Following the same logic, it is reasonable to assume that ordinary background disturbance will show random fluctuation, whereas a disturbance caused by a starship in cloak mode would show a characteristic temporal pattern. Thus, when there is a cloaked starship nearby, the magnetic disturbance at any time depends on its previous state. A BN similar to the one in Figure 2.6 could capitalize on this for pattern recognition purposes. Dynamic Bayesian Networks (DBNs) allow nodes to be repeated over time (Murphy 1998). The model of Figure 2.6 has both static and dynamic nodes, and thus is a partially dynamic Bayesian network (PDBN), also known as a temporal Bayesian network (e.g., Takikawa et al. 2001). While DBNs and PDBNs are useful for temporal recursion, a more general recursion capability is needed, as well as a parsimonious syntax for expressing recursive relationships.
Fig. 2.6. BN for One Starship with Temporal Recursion
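The compounding effect of repetition can be illustrated with a few lines of code. Assuming, for simplicity, that successive high magnetic disturbance reports in deep space are conditionally independent given Cloak Mode (a simplification of the temporally correlated model in Figure 2.6), each report multiplies the odds of a cloaked starship by the likelihood ratio 7/5 = 1.4 from Table 2.1; the 12.2% prior is reused from the earlier discussion purely for illustration.

    def posterior_after_reports(prior=0.122, likelihood_ratio=1.4, n_reports=1):
        """Posterior probability of a cloaked starship after n_reports independent
        high-MDR readings, using the odds form of Bayes rule."""
        odds = prior / (1.0 - prior)
        odds *= likelihood_ratio ** n_reports
        return odds / (1.0 + odds)

    for n in (1, 5, 10, 20):
        print(n, round(posterior_after_reports(n_reports=n), 3))
    # One report barely moves the belief (about 0.16), but twenty consecutive
    # high readings push it above 0.99.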
More expressive languages. The above represents just a glimpse of the issues that confront an engineer attempting to apply Bayesian networks to realistically complex problems. To cope with these and other challenges, a number of languages have appeared that extend the expressiveness of standard BNs in various ways. Examples include plates (Gilks et al., 1994; Buntine, 1994; Spiegelhalter et al., 1996), object-oriented Bayesian networks (Koller and Pfeffer, 1997; Bangsø and Wuillemin,
2000; Langseth and Nielsen, 2003), probabilistic relational models (Pfeffer, 2000), relational Bayesian networks (Jaeger, 1997), Bayesian logic programs (Kersting and De Raedt, 2001; De Raedt and Kersting, 2003), and multi-entity Bayesian networks (Laskey, 2007). The expressive power of these languages varies, but all extend the expressivity of Bayesian networks beyond propositional power. As probabilistic languages became increasingly expressive, the need grew for a fuller characterization of their theoretical properties. Different communities appear to be converging around certain fundamental approaches to representing uncertain information about the attributes, behavior, and interrelationships of structured entities (cf. Heckerman et al. 2004). Systems based on first-order logic (FOL) have the ability to represent entities of different types interacting with each other in varied ways. Sowa states that first-order logic “has enough expressive power to define all of mathematics, every digital computer that has ever been built, and the semantics of every version of logic, including itself” (Sowa 2000, page 41). For this reason, FOL has become the de facto standard for logical systems from both a theoretical and practical standpoint. However, systems based on classical first-order logic lack a theoretically principled, widely accepted, logically coherent methodology for reasoning under uncertainty. In classical first-order logic, the most that can be said about a hypothesis that can be neither proven nor disproven is that its truth-value is unknown. Practical reasoning demands more. In our example, the Enterprise crew’s lives depend on the Captain’s assessment of the plausibility of many hypotheses he can neither prove nor disprove. Yet, he also needs first-order logic’s ability to express generalizations about properties of and relationships among entities. In short, he needs a probabilistic logic with first-order expressive power. Multi-Entity Bayesian Networks (MEBN) integrates first-order logic with Bayesian probability theory (Laskey, 2005). MEBN logic can assign probabilities in a logically coherent manner to any set of sentences in first-order logic, and can assign a conditional probability distribution given any consistent set of finitely many first-order sentences. That is, anything that can be expressed in first-order logic can be assigned a probability by MEBN logic. Achieving full first-order expressive power in a Bayesian logic is non-trivial. This requires the ability to represent an unbounded or possibly infinite number of random variables, some of which may have an unbounded or possibly infinite number of possible values. We also need to be able to represent recursive definitions and random variables that may have an unbounded or possibly infinite number of parents. Random variables taking values in uncountable sets such as the real numbers present additional difficulties. The next section presents how MEBN addresses these issues and can be used as the logical underpinning of the Enterprise’s complex decision system.
2.2 MEBN

Like present-day Earth, 24th Century outer space is not a politically trivial environment. Our first extension introduces different alien species with diverse profiles. Although MEBN logic can represent the full range of species inhabiting the Universe in the 24th century, for purposes of this paper we prefer to use a simpler model. We
therefore limit the explicitly modeled species to Friends [1], Cardassians, Romulans, and Klingons while addressing encounters with other possible races using the general label Unknown. Cardassians are constantly at war with the Federation, so any encounter with them is considered a hostile event. Fortunately, they do not possess cloaking technology, which makes it easier to detect and discriminate them. Romulans are more ambiguous, behaving in a hostile manner in roughly half their encounters with Federation starships. Klingons, who also possess cloaking technology, have a peace agreement with the United Federation of Planets, but their treacherous and aggressive behavior makes them less reliable than friends. Finally, when facing an unknown species, the historical log of such events shows that out of every ten new encounters, only one was hostile. Apart from the species of its operators, a truly “realistic” model would consider each starship’s type, offensive power, ability to inflict harm on the Enterprise given its range, and numerous other features pertinent to the model’s purpose. We will address these issues as we present the basic constructs of MEBN logic.

Understanding MFrags. MEBN logic represents the world as composed of entities that have attributes and are related to other entities. Random variables represent features of entities and relationships among entities. Knowledge about attributes and relationships is expressed as a collection of MEBN fragments (MFrags) organized into MEBN Theories (MTheories). An MFrag represents a conditional probability distribution for instances of its resident RVs given their parents in the fragment graph and the context nodes. An MTheory is a set of MFrags that collectively satisfies consistency constraints ensuring the existence of a unique joint probability distribution over instances of the RVs represented in each of the MFrags within the set. Like a BN, an MFrag contains nodes, which represent RVs, arranged in a directed graph whose edges represent direct dependence relationships. An isolated MFrag can be roughly compared with a standard BN with known values for its root nodes and known local distributions for its non-root nodes. For example, the MFrag of Figure 2.7 represents knowledge about the degree of danger to which our own starship is exposed. The fragment graph has seven nodes. The four nodes at the top of the figure are context nodes; the two darker nodes below the context nodes are the input nodes; and the bottom node is a resident node. A node in an MFrag may have a parenthesized list of arguments. These arguments are placeholders for entities in the domain. For example, the argument st to HarmPotential(st, t) is a placeholder for an entity that might harm us, while the argument t is a placeholder for the time step this instance represents. To refer to an actual entity in the domain, the argument is replaced with a unique identifier. By convention, unique identifiers begin with an exclamation point, and no two distinct entities can have the same unique identifier. By substituting unique identifiers for a RV’s arguments, we can make instances of the RV. For example, HarmPotential(!ST1, !T1) and HarmPotential(!ST2, !T1) are two instances of HarmPotential(st, t) that both occur in the time step !T1.
[1] The interested reader can find further information on the Star Trek series in a plethora of websites dedicated to preserving or extending the history of the series, such as www.startrek.com, www.ex-astris-scientia.org, or techspecs.acalltoduty.com.
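The notion of a random variable template and its instances can be illustrated with a small, purely hypothetical data structure. It is not an actual MEBN or PR-OWL API, only a sketch of how substituting unique identifiers for the ordinary-variable placeholders turns the template HarmPotential(st, t) into concrete random variable instances.

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass(frozen=True)
    class RVTemplate:
        """A random variable template such as HarmPotential(st, t)."""
        name: str
        args: Tuple[str, ...]   # ordinary-variable placeholders, e.g. ("st", "t")

        def instantiate(self, *uids: str) -> str:
            # Every placeholder must be bound to a unique identifier, which by
            # convention starts with an exclamation point.
            assert len(uids) == len(self.args)
            assert all(uid.startswith("!") for uid in uids)
            return f"{self.name}({', '.join(uids)})"

    harm_potential = RVTemplate("HarmPotential", ("st", "t"))
    print(harm_potential.instantiate("!ST1", "!T1"))   # HarmPotential(!ST1, !T1)
    print(harm_potential.instantiate("!ST2", "!T1"))   # HarmPotential(!ST2, !T1)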
Fig. 2.7. The DangerToSelf MFrag
The resident nodes of an MFrag have local distributions that define how their probabilities depend on the values of their parents in the fragment graph. In a complete MTheory, each random variable has exactly one home MFrag, where its local distribution is defined.[2] Input and context nodes (e.g., OpSpec(st) or IsOwnStarship(s)) influence the distribution of the resident nodes, but their distributions are defined in their own home MFrags. Context nodes represent conditions that must be satisfied for the influences and local distributions of the fragment graph to apply. Context nodes are Boolean nodes: that is, they may have value True, False, or Absurd.[3] Context nodes having value True are said to be satisfied. As an example, if we substitute the unique identifier for the Enterprise (i.e., !ST0) for the variable s in IsOwnStarship(s), the resulting hypothesis will be true. If, instead, we substitute a different starship unique identifier (say, !ST1), then this hypothesis will be false. Finally, if we substitute the unique identifier of a non-starship (say, !Z1), then this statement is absurd (i.e., it is absurd to ask whether or not a zone in space is one’s own starship). To avoid cluttering the fragment graph, we do not show the states of context nodes as we do with input and resident nodes, because they are Boolean nodes whose values are relevant only for deciding whether to use a resident random variable’s local distribution or its default distribution. No probability values are shown for the states of the nodes of the fragment graph in Figure 2.7. This is because nodes in a fragment graph do not represent individual random variables with well-defined probability distributions. Instead, a node in an MFrag represents a generic class of random variables. To draw inferences or declare evidence, we must create instances of the random variable classes.
[2] Although standard MEBN logic does not support polymorphism, it could be extended to a typed polymorphic version that would permit a random variable to be resident in more than one MFrag.

[3] State names in this paper are alphanumeric strings beginning with a letter, including True and False. However, Laskey (2005) uses the symbols T for True, F for False, and ⊥ for Absurd, and requires other state names to begin with an exclamation point (because they are unique identifiers).
To find the probability distribution for an instance of DangerToSelf(s, t), we first identify all instances of HarmPotential(st, t) and OpSpec(st) for which the context constraints are satisfied. If there are none, we use the default distribution that assigns value Absurd with probability 1. Otherwise, to complete the definition of the MFrag of Figure 2.7, we must specify a local distribution for its lone resident node, DangerToSelf(s, t). The pseudo-code of Figure 2.7 defines a local distribution for the danger to a starship due to all starships that influence its danger level. Local distributions in standard BNs are typically represented by static tables, which limits each node to a fixed number of parents. On the other hand, an instance of a node in an MTheory might have any number of parents. Thus, MEBN implementations (i.e., languages based on MEBN logic) must provide an expressive language for defining local distributions. We use pseudo-code to convey the idea of using local expressions to specify probability distributions, while not committing to a particular syntax. Lines 3 to 5 cover the case in which there is at least one nearby starship operated by Cardassians and having the ability to harm the Enterprise. In this uncomfortable situation for our starship, the probability of an unacceptable danger to self is 0.90 plus the minimum of 0.10 and the result of multiplying 0.025 by the total number of starships that are harmful and operated by Cardassians. The remaining belief (i.e., the difference between 100% and the belief in state Unacceptable) is divided between High (80% of the remainder) and Medium (20% of the remainder), whereas the belief in Low is zero. The remaining lines use similar formulas to cover the other possible configurations in which there exist starships with potential to harm the Enterprise (i.e., HarmPotential(st, t) = True). The last conditional statement of the local expression covers the case in which no nearby starships can inflict harm upon the Enterprise (i.e., all nodes HarmPotential(st, t) have value False). In this case, the value for DangerToSelf(s, t) is Low with probability 1. Figure 2.8 depicts an instantiation of the DangerToSelf MFrag for which we have four starships nearby, three of them operated by Cardassians and one by Romulans. Also, the Romulan and two of the Cardassian starships are within a range at which they can harm the Enterprise, whereas the other Cardassian starship is too far away to inflict any harm. Following the procedure described in Figure 2.7, the belief for state Unacceptable is .975 (.90 + .025*3) and the beliefs for states High, Medium, and Low are .02 ((1 - .975)*.8), .005 ((1 - .975)*.2), and zero, respectively. In short, the pseudo-code covers all possible input node configurations by linking the danger level to the number of nearby starships that have the potential to harm our own starship. The formulas state that if there are any Cardassians nearby, then the distribution for danger level given the number of Cardassians will be: 1 Cardassian ship - [.925, .06, .015, 0]; 2 Cardassian ships - [.95, .04, .01, 0]; 3 Cardassian ships - [.975, .02, .005, 0]; 4 or more Cardassian ships - [1, 0, 0, 0]
Fig. 2.8. An Instance of the DangerToSelf MFrag
Also, if there are only Romulans with HarmPot(s) = True, then the distribution becomes: 1 Romulan ship - [.73, .162, .081, .027]; 2 Romulan ships - [.76, .144, .072, .024]; ... 10 or more Romulan ships - [1, 0, 0, 0]
For a situation in which only starships operated by unknown species can harm the Enterprise, the probability distribution is more evenly spread: 1 Unknown ship - [.02, .48, .48, .02]; 2 Unknown ships - [.04, .46, .46, .04]; ... 10 or more Unknown ships - [.20, .30, .30, .20]. Finally, if there are only friendly starships nearby with the ability to harm the Enterprise, then the distribution becomes [0, 0, 0.01, .99]. The last line indicates that if no starship can harm the Enterprise, then the danger level will be Low for sure. As noted previously, a powerful representational formalism is needed to represent complex scenarios at a reasonable level of fidelity. In our example, we could have added additional detail and explored many nuances. For example, a large number of nearby Romulan ships might indicate a coordinated attack and therefore indicate greater danger than an isolated Cardassian ship. Our example was purposely kept simple in order to clarify the basic capabilities of the logic. It is clear that more complex knowledge patterns could be accommodated as needed to suit the requirements of the application. MEBN logic has built-in logical MFrags that provide the ability to express anything that can be expressed in first-order logic. Laskey (2005) proves that MEBN logic can implicitly express a probability distribution over interpretations of any consistent, finitely axiomatizable first-order theory. This provides MEBN with sufficient expressive power to represent virtually any scientific hypothesis.
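A rough executable rendering of these homogeneous cases (all harmful starships operated by a single species) is sketched below in Python rather than in the chapter's local-expression pseudo-code. The Cardassian branch follows the formula stated above; the Romulan and Unknown branches are reverse-engineered from the listed example values and are therefore assumptions about the original expression.

    def danger_to_self(species, n_harmful):
        """Distribution [Unacceptable, High, Medium, Low] when every starship able
        to harm the Enterprise is operated by the same species."""
        if n_harmful == 0:                                   # no threat: Low for sure
            return [0.0, 0.0, 0.0, 1.0]
        if species == "Cardassian":                          # formula stated in the text
            u = 0.90 + min(0.10, 0.025 * n_harmful)
            return [u, 0.8 * (1 - u), 0.2 * (1 - u), 0.0]
        if species == "Romulan":                             # fitted to the listed values (assumption)
            u = min(1.0, 0.70 + 0.03 * n_harmful)
            return [u, 0.6 * (1 - u), 0.3 * (1 - u), 0.1 * (1 - u)]
        if species == "Unknown":                             # fitted to the listed values (assumption)
            u = 0.02 * min(n_harmful, 10)
            return [u, (1 - 2 * u) / 2, (1 - 2 * u) / 2, u]
        if species == "Friend":
            return [0.0, 0.0, 0.01, 0.99]
        raise ValueError(f"unknown species: {species}")

    print(danger_to_self("Cardassian", 3))   # approximately [0.975, 0.02, 0.005, 0.0]
    print(danger_to_self("Romulan", 1))      # approximately [0.73, 0.162, 0.081, 0.027]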
Fig. 2.9. The Zone MFrag
Recursive MFrags. One of the main limitations of BNs is their lack of support for recursion. Extensions such as dynamic Bayesian networks provide the ability to define certain kinds of recursive relationships. MEBN provides theoretically grounded support for very general recursive definitions of local distributions. Figure 2.9 depicts an example of how an MFrag can represent temporal recursion. As we can see from the context nodes, in order for the local distribution to apply, z has to be a zone and st has to be a starship that has z as its current position. In addition, tprev and t must be TimeStep entities, and tprev is the step preceding t. Other varieties of recursion can also be represented in MEBN logic by means of MFrags that allow influences between instances of the same random variable. Allowable recursive definitions must ensure that no random variable instance can influence its own probability distribution. As in non-recursive MFrags, the input nodes in a recursive MFrag include nodes whose local distributions are defined in another MFrag (i.e., CloakMode(st)). In addition, the input nodes may include instances of recursively-defined nodes in the MFrag itself. For example, the input node ZoneMD(z, tprev) represents the magnetic disturbance in zone z at the previous time step, which influences the current magnetic disturbance ZoneMD(z, t). The recursion is grounded by specifying an initial distribution at time !T0 that does not depend on a previous magnetic disturbance. Figure 2.10 illustrates how recursive definitions can be applied to construct a situation-specific Bayesian Network (SSBN) to answer a query. Our query concerns the magnetic disturbance at time !T3 in zone !Z0, where !Z0 is known to contain our own uncloaked starship !ST0 and exactly one other starship !ST1, which is known to be cloaked. To build the graph shown in this picture, we begin by creating an instance of the home MFrag of the query node ZoneMD(!Z0,!T3). That is, we substitute !Z0 for z and !T3 for t, and then create all instances of the remaining random variables that meet the context constraints. Next, we build any CPTs we can already build. CPTs for ZoneMD(!Z0,!T3), ZoneNature(!Z0), ZoneEShips(!Z0), and ZoneFShips(!Z0) can be constructed because they are resident in the retrieved MFrag. Single-valued CPTs for CloakMode(!ST0), CloakMode(!ST1), and !T3=!T0 can be specified because the values of these random variables are known.
This leaves us with one node, ZoneMD(!Z0,!T2), for which we have no CPT. To construct its CPT, we must retrieve its home MFrag, and instantiate any random variables that meet its context constraints and have not already been instantiated. The new random variables created in this step are ZoneMD(!Z0,!T1) and !T2=!T0. We know the value of the latter, and we retrieve the home MFrag of the former. This process continues until we have added all the nodes of Figure 2.10. At this point we can construct CPTs for all random variables, and the SSBN is complete.4 The MFrag depicted in Figure 2.9 defines the local distribution that applies to all these instances, even though for brevity we only displayed the probability distributions (local and default) for node ZoneMD(z, t). Note that when there is no starship with cloak mode activated, the probability distribution for magnetic disturbance given the zone nature does not change with time. When there is at least one starship with cloak mode activated, then the magnetic disturbance tends to fluctuate regularly with time in the manner described by the local expression. For the sake of simplicity, we assumed that the local distribution depends only on whether there is a cloaked starship nearby.
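To make the grounding of the temporal recursion explicit, the sketch below walks the ZoneMD chain back from !T3 to the initial step !T0, mirroring the instantiation order just described. It is an illustrative reconstruction, not the authors' algorithm; the parent instances resident in other MFrags (such as CloakMode(st)) are only hinted at in a comment.

# Illustrative sketch of how the temporal recursion in the Zone MFrag grounds out
# when building the SSBN for the query node ZoneMD(!Z0,!T3). Data structures and
# helper names are assumptions; only the entity labels follow the example.

def prev_step(t):
    """Predecessor of a time step label such as '!T3'; None for the initial step !T0."""
    index = int(t.lstrip("!T"))
    return None if index == 0 else "!T%d" % (index - 1)

def instantiate_zone_md(zone, t, nodes):
    """Create ZoneMD(zone, t) and, recursively, the earlier instances it depends on."""
    name = "ZoneMD(%s,%s)" % (zone, t)
    if name in nodes:
        return                      # already instantiated for this SSBN
    nodes.add(name)
    tprev = prev_step(t)
    if tprev is not None:           # recursive case: depends on the previous step
        instantiate_zone_md(zone, tprev, nodes)
    # Context and input instances resident in other MFrags (e.g. CloakMode(st),
    # ZoneNature(z)) would be retrieved and instantiated here in the same way.

ssbn_nodes = set()
instantiate_zone_md("!Z0", "!T3", ssbn_nodes)
print(sorted(ssbn_nodes))
# ['ZoneMD(!Z0,!T0)', 'ZoneMD(!Z0,!T1)', 'ZoneMD(!Z0,!T2)', 'ZoneMD(!Z0,!T3)']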
Fig. 2.10. SSBN Constructed from Zone MFrag
4 For efficiency reasons, most knowledge-based model construction systems would not explicitly represent root evidence nodes such as CloakMode(!ST0) or !T1=!T0, or barren nodes such as ZoneEShips(!Z0) and ZoneFShips(!Z0). For expository purposes, we have taken the logically equivalent, although less computationally efficient, approach of including all these nodes explicitly.
We also assumed that the initial distribution for the magnetic disturbance when there are cloaked starships is equal to the stationary distribution given the zone nature and the number of cloaked starships present initially. Of course, it would be possible to write different local expressions expressing a dependence on the number of starships, their size, their distance from the Enterprise, etc. MFrags provide a flexible means to represent knowledge about specific subjects within the domain of discourse, but the true gain in expressive power is revealed when we aggregate these "knowledge patterns" to form a coherent model of the domain of discourse that can be instantiated to reason about specific situations and refined through learning. It is important to note that just collecting a set of MFrags that represent specific parts of a domain is not enough to ensure a coherent representation of that domain. For example, it would be easy to specify a set of MFrags with cyclic influences, or one having multiple conflicting distributions for a random variable in different MFrags. The following section describes how to define complete and coherent domain models as collections of MFrags.

Building MEBN models with MTheories. In order to build a coherent model we have to make sure that our set of MFrags collectively satisfies consistency constraints ensuring the existence of a unique joint probability distribution over instances of the random variables mentioned in the MFrags. Such a coherent collection of MFrags is called an MTheory. An MTheory represents a joint probability distribution for an unbounded, possibly infinite number of instances of its random variables. This joint distribution is specified by the local and default distributions within each MFrag together with the conditional independence relationships implied by the fragment graphs. The MFrags described above are part of a generative MTheory for the intergalactic conflict domain. A generative MTheory summarizes statistical regularities that characterize a domain. These regularities are captured and encoded in a knowledge base using some combination of expert judgment and learning from observation. To apply a generative MTheory to reason about particular scenarios, we need to provide the system with specific information about the individual entity instances involved in the scenario. On receipt of this information, we can use Bayesian inference both to answer specific questions of interest (e.g., how high is the current level of danger to the Enterprise?) and to refine the MTheory (e.g., each encounter with a new species gives us additional statistical data about the level of danger to the Enterprise from a starship operated by an unknown species). Bayesian inference is used to perform both problem-specific inference and learning in a sound, logically coherent manner.

Findings are the basic mechanism for incorporating observations into MTheories. A finding is represented as a special 2-node MFrag containing a node from the generative MTheory and a node declaring one of its states to have a given value. From a logical point of view, inserting a finding into an MTheory corresponds to asserting a new axiom in a first-order theory. In other words, MEBN logic is inherently open, having the ability to incorporate new axioms as evidence and update the probabilities of all random variables in a logically consistent way.
In addition to the requirement that each random variable must have a unique home MFrag, a valid MTheory must ensure that all recursive definitions terminate in finitely many steps and contain no circular influences. Finally, as we saw above, random variable instances may have a large, and possibly unbounded number of parents.
A valid MTheory must satisfy an additional condition to ensure that the local distributions have reasonable limiting behavior as more and more parents are added. Laskey (2005) proved that when an MTheory satisfies these conditions (as well as other technical conditions that are unimportant to our example), then there exists a joint probability distribution on the set of instances of its random variables that is consistent with the local distributions assigned within its MFrags. Furthermore, any consistent, finitely axiomatizable FOL theory can be translated to infinitely many MTheories, all having the same purely logical consequences, that assign different probabilities to statements whose truth-value is not determined by the axioms of the FOL theory. MEBN logic contains a set of built-in logical MFrags (including quantifier, indirect reference, and Boolean connective MFrags) that provide the ability to represent any sentence in first-order logic. If the MTheory satisfies additional conditions, then a conditional distribution exists given any finite sequence of findings that does not logically contradict the logical constraints of the generative MTheory. MEBN logic thus provides a logical foundation for systems that reason in an open world and incorporate observed evidence in a mathematically sound, logically coherent manner.
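Two of the consistency conditions mentioned above, unique home MFrags and the absence of circular influences, can be checked mechanically. The sketch below is a minimal illustration of such checks on an abstract influence graph; it is our own simplification and does not cover the recursion-termination and limiting-behavior conditions treated formally in Laskey (2005).

# Minimal sketch of two mechanical MTheory consistency checks: each random
# variable is resident in exactly one home MFrag, and the combined influence
# graph contains no cycles. The input structures are assumed for illustration.

def unique_home_mfrags(resident_in):
    """resident_in maps each random variable name to the MFrags where it is resident."""
    return all(len(mfrags) == 1 for mfrags in resident_in.values())

def no_circular_influences(parents):
    """parents maps each random variable to its parents; returns True if acyclic."""
    visiting, done = set(), set()

    def visit(rv):
        if rv in done:
            return True
        if rv in visiting:
            return False            # a cycle of influences was found
        visiting.add(rv)
        ok = all(visit(p) for p in parents.get(rv, ()))
        visiting.discard(rv)
        done.add(rv)
        return ok

    return all(visit(rv) for rv in parents)

print(no_circular_influences({"A": ["B"], "B": ["A"]}))                       # False
print(no_circular_influences({"ZoneMD": ["ZoneNature"], "ZoneNature": []}))   # True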
Fig. 2.11. The Star Trek Generative MTheory
Figure 2.11 shows an example of a generative MTheory for our Star Trek domain. For the sake of conciseness, the local distribution formulas and the default distributions are not shown here. The Entity Type MFrag, at the right side of Figure 2.11, is meant to formally declare the possible types of entity that can be found in the model. This is a generic MFrag that allows the creation of domain-oriented types (which are represented by TypeLabel entities) and forms the basis for a typed system. In our simple model we did not address the creation of, or explicit support for, entity types. Standard MEBN logic as defined in Laskey (2005) is untyped, meaning that a knowledge engineer who wishes to represent types must explicitly define the necessary logical machinery. The Entity Type MFrag of Figure 2.11 defines an extremely simple kind of type structure. MEBN can be extended with MFrags to accommodate any flavor of typed system, including more complex capabilities such as sub-typing, polymorphism, multiple inheritance, etc.
Fig. 2.12. Equivalent MFrag Representations of Knowledge
It is important to understand the power and flexibility that MEBN logic gives to knowledge base designers by allowing multiple, equivalent ways of portraying the same knowledge. Indeed, the generative MTheory of Figure 2.11 is just one of the many possible (consistent) sets of MFrags that can be used to represent a given joint distribution. There, we attempted to cluster the random variables in a way that naturally reflects the structure of the objects in that scenario (i.e., we adopted an object oriented approach to modeling), but this was only one design option among the many allowed by the logic. As an example of such flexibility, Figure 2.12 depicts the same knowledge contained in the Starship MFrag of Figure 2.11 (right side) using three different MFrags. In this case, the modeler might have opted for decomposing an MFrag in order to get the extra flexibility of smaller, more specific MFrags that can be combined in different ways. Another knowledge engineer might prefer the more concise approach of having all knowledge in just one MFrag. Ultimately, the approach to be taken when building an MTheory will depend on many factors, including the model’s purpose, the background and preferences of the model’s stakeholders, the need to interface with external systems, etc. First Order Logic (or one of its subsets) provides the theoretical foundation for the type systems used in popular object-oriented and relational languages. MEBN logic provides the basis for extending the capability of these systems by introducing a sound mathematical basis for representing and reasoning under uncertainty. Among the advantages of a MEBN-based typed system is the ability to represent type uncertainty. As an example, suppose we had two different types of space traveling entities, starships and comets, and we are not sure about the type of a given entity. In this case, the result of a query that depends on the entity type will be a weighted average of the result given that the entity is a comet and the result given that it is a starship. Further advantages of a MEBN-based type system include the ability to refine type-specific probability distributions using Bayesian learning, assign probabilities to possible
values of unknown attributes, reason coherently at multiple levels of resolution, and other features related to representing and reasoning with incomplete and/or uncertain information. Another powerful aspect of MEBN, the ability to support finite or countably infinite recursion, is illustrated in the Sensor Report and Zone MFrags, both of which involve temporal recursion. The Time Step MFrag includes a formal specification of the local distribution for the initial step of the time recursion (i.e., when t=!T0) and of its recursive steps (i.e., when t does not refer to the initial step). Other kinds of recursion can be represented in a similar manner.

MEBN logic also has the ability to represent and reason about hypothetical entities. Uncertainty about whether a hypothesized entity actually exists is called existence uncertainty. In our example model, the random variable Exists(st) is used to reason about whether its argument is an actual starship. For example, we might be unsure whether a sensor report corresponds to one of the starships we already know about, a starship of which we were previously unaware, or a spurious sensor report. In this case, we can create a starship instance, say !ST4, and assign a probability of less than 1.0 that Exists(!ST4) has value True. Then, any queries involving !ST4 will return results weighted appropriately by our belief in the existence of !ST4. Furthermore, our belief in Exists(!ST4) is updated by Bayesian conditioning as we obtain more evidence relevant to whether !ST4 denotes a previously unknown starship. Representing existence uncertainty is particularly useful for counterfactual reasoning and reasoning about causality (Druzdzel & Simon 1993, Pearl 2000).

Because the Star Trek model was designed to demonstrate the capabilities of MEBN logic, we avoided issues that can be handled by the logic but would make the model too complex. As an example, one aspect that our model does not consider is association uncertainty, a very common problem in multi-sensor data fusion systems. Association uncertainty means that we are not sure about the source of a given report (e.g., whether a given report refers to starship !ST4, !ST2, or !ST1). Many weakly discriminatory reports coming from possibly many starships produce an exponential set of combinations that require special hypothesis management methods (cf. Stone et al. 1999). In the Star Trek model we avoided these problems by assuming our sensor suite can achieve perfect discrimination. However, the logic can represent and reason with association uncertainty, and thus provides a sound logical foundation for hypothesis management in multi-source fusion.

Making Decisions with MEBN Logic. Captain Picard has more than an academic interest in the danger from nearby starships. He must make decisions with life-and-death consequences. Multi-Entity Decision Graphs (MEDGs, or "medges") extend MEBN logic to support decision making under uncertainty. MEDGs are related to MEBNs in the same way influence diagrams are related to Bayesian Networks. A MEDG can be applied to any problem that involves optimal choice from a set of alternatives subject to given constraints. When a decision MFrag (i.e., one that has decision and utility nodes) is added to a generative MTheory such as the one portrayed in Figure 2.11, the result is a MEDG. As an example, Figure 2.13 depicts a decision MFrag representing Captain Picard's choice of which defensive action to take.
Fig. 2.13. The Star Trek Decision MFrag
The decision node DefenseAction(s) represents the set of defensive actions available to the Captain (in this case, to fire the ship's weapons, to retreat, or to do nothing). The value nodes capture Picard's objectives, which in this case are to protect Enterprise while also avoiding harm to innocent people as a consequence of his defensive actions. Both objectives depend upon Picard's decision, while ProtectSelf(s) is influenced by the perceived danger to Enterprise and ProtectOthers(s) depends on the level of danger to other starships in the vicinity. The model described here is clearly an oversimplification of any "real" scenario a Captain would face. Its purpose is to convey the core idea of extending MEBN logic to support decision-making. Indeed, a more common situation is to have multiple, mutually influencing, often conflicting factors that together form a very complex decision problem and require trading off different attributes of value. For example, a decision to attack would mean that little power would be left for the defense shields; a retreat would require aborting a very important mission. MEDGs provide the necessary foundation to address all the above issues.

Readers familiar with influence diagrams will appreciate that the main concepts required for a first-order extension of decision theory are all present in Figure 2.13. In other words, MEDGs have the same core functionality and characteristics as common MFrags. Thus, the utility table in Survivability(s) refers to the entity whose unique identifier substitutes for the variable s, which according to the context nodes should be our own starship (Enterprise in this case). Likewise, the states of input node DangerToSelf(s, t) and the decision options listed in DefenseAction(s) should also refer to the same entity. Of course, this confers on MEDGs the expressive power of MEBN models, which includes the ability to use this same decision MFrag to model the decision process of the Captain of another starship. Notice that a MEDG Theory should also comply with the same consistency rules of standard MTheories, along with additional rules required for influence diagrams (e.g., value nodes are deterministic and must be leaf nodes or have only value nodes as children). In our example, adding the Star Trek Decision MFrag of Figure 2.13 to the generative MTheory of Figure 2.11 will maintain the consistency of the latter, and therefore the result will be a valid generative MEDG Theory. Our simple example can be extended to more elaborate decision constructions, providing the flexibility to model decision problems in many different applications spanning diverse domains.
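The computation implied by the decision MFrag is ordinary expected-utility maximization once the decision graph has been built and a posterior over DangerToSelf is available. The numbers, state names, and the single combined utility table in the sketch below are invented solely to illustrate that step; the chapter does not provide Picard's utilities.

# Illustrative expected-utility maximization for the defense decision. The
# posterior, the utility values, and the collapsing of the two value nodes into
# one table are assumptions made only for this sketch.

danger_posterior = {"Unacceptable": 0.10, "High": 0.40, "Medium": 0.40, "Low": 0.10}

utility = {
    "FireWeapons": {"Unacceptable": 60, "High": 55, "Medium": 20, "Low": -40},
    "Retreat":     {"Unacceptable": 70, "High": 50, "Medium": 30, "Low": -10},
    "DoNothing":   {"Unacceptable": -80, "High": -30, "Medium": 40, "Low": 80},
}

def expected_utility(action):
    return sum(p * utility[action][state] for state, p in danger_posterior.items())

best = max(utility, key=expected_utility)
for action in utility:
    print("%-12s EU = %6.1f" % (action, expected_utility(action)))
print("Recommended DefenseAction:", best)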
Inference in MEBN Logic. A generative MTheory provides prior knowledge that can be updated upon receipt of evidence represented as finding MFrags. We now describe the process used to obtain posterior knowledge from a generative MTheory and a set of findings. In a BN model such as the ones shown in Figures 2.2 through 2.6, assessing the impact of new evidence involves conditioning on the values of evidence nodes and applying a belief propagation algorithm. When the algorithm terminates, beliefs of all nodes, including the node(s) of interest, reflect the impact of all evidence entered thus far. This process of entering evidence, propagating beliefs, and inspecting the posterior beliefs of one or more nodes of interest is called a query. MEBN inference works in a similar way (after all, MEBN is a Bayesian logic), but following a more complex yet more flexible process. Whereas BNs are static models that must be changed whenever the situation changes (e.g., number of starships, time recursion, etc.), an MTheory implicitly represents an infinity of possible scenarios. In other words, the MTheory represented in Figure 2.11 (as well as the MEDG obtained by aggregating the MFrag in Figure 2.13) is a model that can be used for as many starships as we want, for as many time steps as we are interested in, and for as many situations as we face from the 24th Century into the future.

That said, the obvious question is how to perform queries within such a model. A simple example of query processing was given above in the section on temporal recursion. Here, we describe the general algorithm for constructing a situation-specific Bayesian network (SSBN). To do so, we need an initial generative MTheory (or MEDG Theory), a Finding set (which conveys particular information about the situation), and a Target set (which indicates the nodes of interest to us). For comparison, let's suppose we have a situation that is similar to the one in Figure 2.5, where four starships are within the Enterprise's range. In that particular case, a BN was used to represent the situation at hand, which means we have a model that is "hardwired" to a known number (four) of starships, and any other number would require a different model. A standard Bayesian inference algorithm applied to that model would involve entering the available information about these four starships (i.e., the four sensor reports), propagating the beliefs, and obtaining posterior probabilities for the hypotheses of interest (e.g., the four Starship Type nodes).

Similarly, MEBN inference begins when a query is posed to assess the degree of belief in a target random variable given a set of evidence random variables. We start with a generative MTheory, add a set of finding MFrags representing problem-specific information, and specify the target nodes for our query. The first step in MEBN inference is to construct the SSBN, which can be seen as an ordinary Bayesian network constructed by creating and combining instances of the MFrags in the generative MTheory. Next, a standard Bayesian network inference algorithm is applied. Finally, the answer to the query is obtained by inspecting the posterior probabilities of the target nodes. A MEBN inference algorithm is provided in Laskey (2005). The algorithm presented there does not handle decision graphs, and so we will extend it slightly for purposes of illustrating how our MEDG Theory can be used to support the Captain's decision.
In our example, the finding MFrags will convey information that we have five starships (!ST0 through !ST4) and that the first is our own starship. For the sake of illustration, let’s assume that our Finding set also includes data regarding the nature of the
space zone we are in (!Z0), its magnetic disturbance for the first time step (!T0), and sensor reports for starships !ST1 to !ST4 for the first two time steps. We assume that the Target set for our illustrative query includes an assessment of the level of danger experienced by the Enterprise and the best decision to take given this level of danger. Figure 2.14 shows a situation-specific decision graph for our query5.

To construct the decision graph, we begin by creating instances of the random variables in the Target set and the random variables for which we have findings. The target random variables are DangerLevel(!ST0) and DefenseAction(!ST0). The finding random variables are the eight SRDistance nodes (2 time steps for each of four starships) and the two ZoneMD reports (one for each time step). Although each finding MFrag contains two nodes, the random variable on which we have a finding and a node indicating the value to which it is set, we include only the first of these in our situation-specific Bayesian network, and declare as evidence that its value is equal to the observed value indicated in the finding MFrag. The next step is to retrieve and instantiate the home MFrags of the finding and target random variables. When each MFrag is instantiated, instances of its random variables are created to represent known background information, observed evidence, and queries of interest to the decision maker. If there are any random variables with undefined distributions, then the algorithm proceeds by instantiating their respective home MFrags. The process of retrieving and instantiating MFrags continues until there are no remaining random variables having either undefined distributions or unknown values. The result, if this process terminates, is a SSBN or, in our case, a situation-specific decision graph (SSDG). In some cases the SSBN can be infinite, but under conditions given in Laskey (2005), the algorithm produces a sequence of approximate SSBNs for which the posterior distribution of the target nodes converges to their posterior distribution given the findings. Mahoney and Laskey (1998) define a SSBN as a minimal Bayesian network sufficient to compute the response to a query. A SSBN may contain any number of instances of each MFrag, depending on the number of entities and their interrelationships. The SSDG in Figure 2.14 is the result of applying this process to the MEDG Theory in Figures 2.11 and 2.13 with the Finding and Target set we just defined. Another important use for the SSBN algorithm is to help in the task of performing Bayesian learning, which is treated in MEBN logic as a sequence of MTheories.

Learning from Data. Learning graphical models from observations is usually decomposed into two sub-problems: inferring the parameters of the local distributions when the structure is known, and inferring the structure itself. In MEBN, by structure we mean the possible values of the random variables, their organization into MFrags, the fragment graphs, and the functional forms of the local distributions. Figure 2.15 shows an example of parameter learning in MEBN logic in which we adopt the assumption that one can infer the length of a starship on the basis of the average length of all starships. This generic domain knowledge is captured by the generative MFrag, which specifies a prior distribution based on what we know about starship lengths.
5 The alert reader may notice that root evidence nodes and barren nodes that were included in the constructed network of Figure 2.10 are not included here. As noted above, explicitly representing these nodes is not necessary.
Fig. 2.14. SSBN for the Star Trek MTheory with Four Starships within Enterprise’s Range
One strong point about using Bayesian models in general and MEBN logic in particular is the ability to refine prior knowledge as new information becomes available. In our example, let's suppose that we receive precise information on the length of starships !ST2, !ST3, and !ST5, but have no information regarding the incoming starship !ST8. The first step of this simple parameter learning example is to enter the available information into the model in the form of findings (see the box StarshipLengthInd Findings). Then, we pose a query on the length of !ST8. The SSBN algorithm will instantiate all the random variables that are related to the query at hand until it finishes with the SSBN depicted in Figure 2.15 (box SSBN with Findings). In this example, the MFrags satisfy graph-theoretic conditions under which a re-structuring operation called finding absorption (Buntine 1994b) can be applied without changing the structure of the MFrags. Therefore, the prior distribution of the random variable GlobalAvgLength can be replaced by the posterior distribution obtained when adding evidence in the form of findings. As a result of this learning process, the probability distribution for GlobalAvgLength has been refined in light of the new information conveyed by the findings. The resulting, more precise distribution can now be used not only to predict the length of !ST8 but also for future queries. In our specific example, the same query would retrieve the SSBN in the lower right corner of Figure 2.15 (box SSBN with Findings Absorbed). One of the major advantages of the finding absorption operation is that it greatly improves the tractability of both learning and SSBN inference. We can also apply finding absorption to modify the generative MFrags themselves, thus creating a new generative MTheory that has the same conditional distribution given its findings as our original MTheory. In this new MTheory, the distribution of GlobalAvgLength has been modified to incorporate the observations, and the finding random variables are set with probability 1 to their observed values. Restructuring MTheories via finding absorption can increase the efficiency of SSBN construction and of inference.
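As a concrete, if heavily simplified, illustration of this refinement step, the sketch below performs a conjugate Normal update of a global average length from three observed lengths and then predicts the length of the unobserved starship !ST8. The distributional form, the variances, and all numbers are our own assumptions; the chapter does not specify the local distributions used in Figure 2.15.

# Purely illustrative model and numbers: StarshipLength ~ Normal(mu, obs_var)
# with a Normal prior on the global average mu. This only mimics how the
# findings on !ST2, !ST3, and !ST5 refine GlobalAvgLength.

prior_mean, prior_var = 1000.0, 200.0 ** 2     # assumed prior on GlobalAvgLength
obs_var = 50.0 ** 2                            # assumed variability around the average
observed_lengths = {"!ST2": 640.0, "!ST3": 685.0, "!ST5": 1350.0}  # made-up findings

n = len(observed_lengths)
sample_mean = sum(observed_lengths.values()) / n

# Conjugate Normal-Normal update. Finding absorption replaces the prior with this
# posterior, so later queries (e.g. the length of !ST8) reuse it directly.
post_var = 1.0 / (1.0 / prior_var + n / obs_var)
post_mean = post_var * (prior_mean / prior_var + n * sample_mean / obs_var)

print("GlobalAvgLength posterior: mean=%.1f, sd=%.1f" % (post_mean, post_var ** 0.5))
print("!ST8 predictive: mean=%.1f, sd=%.1f" % (post_mean, (post_var + obs_var) ** 0.5))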
Fig. 2.15. Parameter Learning in MEBN
Structure learning in MEBN works in a similar fashion. As an example, let's suppose that when analyzing the data that was acquired in the parameter learning process above, a domain expert raises the hypothesis that the length of a given starship might depend on its class. To put it into a "real-life" perspective, let's consider two classes: Explorers and Warbirds. The first are usually vessels crafted for long-distance journeys with a relatively small crew and payload. Warbirds, on the other hand, are heavily armed vessels designed to be flagships of a combatant fleet, usually carrying lots of ammunition, equipped with many advanced technology systems and a large crew. Therefore, our expert thinks it likely that the average length of Warbirds may be greater than the average length of Explorers. In short, the general idea of this simple example is to mimic the more general situation in which we have a potential link between two attributes (i.e., starship length and class) but at best weak evidence to support the hypothesized correlation. This is a typical situation in which Bayesian models can use incoming data to learn both structure and parameters of a domain model. Generally speaking, the solution for this class of situations is to build two different structures and apply Bayesian inference to evaluate which structure is more consistent with the data as it becomes available.

The initial setup of the structure learning process for this specific problem is depicted in Figure 2.16. Each of the two possible structures is represented by its own generative MFrag. The first MFrag is the same as before: the length of a starship depends only on a global average length that applies to starships of all classes. The upper left MFrag of Figure 2.16, the StarshipLengthInd MFrag, conveys this hypothesis. The second possible structure, represented by the ClassAvgLength and StarshipLengthDep MFrags, covers the case in which a starship's class influences its length. The two structures are then connected by the Starship Length MFrag, which has the format of a multiplexor MFrag. The distribution of a multiplexor node such as StarshipLength(st) always has one parent selector node defining which of the other parents is influencing the distribution in a given situation. In this example, where we have only two possible structures, the selector parent will be a two-state node. Here, the selector parent is the Boolean LengthDependsOnClass(!Starship).
Fig. 2.16. Structure Learning in MEBN
When this node has value False, StarshipLength(st) will be equal to StarshipLengthInd(st), the distribution of which does not depend on the starship's class. Conversely, if the selector parent has value True, then StarshipLength(st) will be equal to StarshipLengthDep(st), which is directly influenced by ClassAvgLength(StarshipClass(st)).

Figure 2.17 shows the result of applying the SSBN algorithm to the generative MFrags in Figure 2.16. The SSBN on the left does not include the findings, only information about the existence of four starships. It can be noted that we chose our prior for the selector parent (the Boolean node at the top of the SSBN) to be the uniform distribution, which means we assumed that both structures (i.e., class affecting length or not) have the same prior probability. For the SSBN on the right side we included the known facts that !ST2 and !ST3 belong to the class of starships !Explorer, and that !ST5 and !ST8 are Warbird vessels. Further, we included the lengths of the three ships for which we have length reports. The result of the inference process was not only an estimate of the length of !ST8 but also a clear confirmation that the available data strongly supports the hypothesis that the class of a starship directly influences its length.
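The sketch below mimics this structure-learning step as a comparison of the two hypotheses on a handful of made-up length findings, using a uniform prior on the selector node as in the example. The Normal likelihoods, class averages, and all numbers are illustrative assumptions; only the entity and node names follow the example.

# Illustrative model comparison for the selector node LengthDependsOnClass.
# Distributional forms, variances, and all numeric values are assumptions.

from math import log, pi, exp

def log_normal_pdf(x, mean, var):
    return -0.5 * (log(2 * pi * var) + (x - mean) ** 2 / var)

lengths = {"!ST2": 640.0, "!ST3": 685.0, "!ST5": 1350.0}           # assumed findings
klass = {"!ST2": "Explorer", "!ST3": "Explorer", "!ST5": "Warbird", "!ST8": "Warbird"}
var = 100.0 ** 2

# Hypothesis False: one global average length for all starships.
global_avg = 900.0                                                  # assumed
ll_false = sum(log_normal_pdf(x, global_avg, var) for x in lengths.values())

# Hypothesis True: a separate average length per class.
class_avg = {"Explorer": 660.0, "Warbird": 1350.0}                  # assumed
ll_true = sum(log_normal_pdf(x, class_avg[klass[s]], var) for s, x in lengths.items())

# Uniform prior over the two-state selector node, as in the example.
post_true = 1.0 / (1.0 + exp(ll_false - ll_true))
print("P(LengthDependsOnClass = True | findings) = %.3f" % post_true)
# -> close to 1: the data strongly favor class-dependent lengths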
Fig. 2.17. SSBNs for the Parameter Learning Example
It may seem cumbersome to define different random variables, StarshipLengthInd and StarshipLengthDep, for each hypothesis about the influences on a starship’s length. As the number of structural hypotheses becomes large, this can become quite unwieldy. Fortunately, we can circumvent this difficulty by introducing a typed version of MEBN and allowing the distributions of random variables to depend on the type of their argument. A detailed presentation of typed MEBN is beyond the scope of this chapter, and the interested reader is directed to Laskey (2007) for further information on the logic.
2.3 Probabilistic Ontologies

This section closes the cycle of Bayesian technologies that can be applied to complex systems. We explain the concept of probabilistic ontologies (POs) and introduce PR-OWL, a MEBN-based language for representing them, along with UnBBayes-MEBN, a reasoner and GUI that can be used to design complex models and save them as POs, improving their chances of being interoperable, reusable, and extensible.

Ontologies. Since its adoption in the field of Information Systems, the term ontology has been given many different definitions. A common underlying assumption is that classical logic would provide the formal foundation for knowledge representation and reasoning. Until recently, theory and methods for representing and reasoning with uncertain and incomplete knowledge have been neglected almost entirely. However, as research on knowledge engineering and applications of ontologies matures, the ubiquity and importance of uncertainty across a wide array of application areas has generated consumer demand for ontology formalisms that can capture uncertainty. Although interest in probabilistic ontologies has been growing, there is as yet no commonly accepted formal definition of the term. Augmenting an ontology to carry numerical and/or structural information about probabilistic relationships is not enough to deem it a probabilistic ontology, as too much information is lost to the lack of a good representational scheme that captures structural constraints and dependencies among probabilities. A true probabilistic ontology must be capable of properly representing those nuances. More formally:

Definition 1 (from Costa, 2005): A probabilistic ontology is an explicit, formal knowledge representation that expresses knowledge about a domain of application. This includes:
• Types of entities that exist in the domain;
• Properties of those entities;
• Relationships among entities;
• Processes and events that happen with those entities;
• Statistical regularities that characterize the domain;
• Inconclusive, ambiguous, incomplete, unreliable, and dissonant knowledge related to entities of the domain; and
• Uncertainty about all the above forms of knowledge;
where the term entity refers to any concept (real or fictitious, concrete or abstract) that can be described and reasoned about within the domain of application. Probabilistic Ontologies are used for the purpose of comprehensively describing
knowledge about a domain and the uncertainty associated with that knowledge in a principled, structured and sharable way, ideally in a format that can be read and processed by a computer. They also expand the possibilities of standard ontologies by introducing the requirement of a proper representation of the statistical regularities and the uncertain evidence about entities in a domain of application.

Probabilistic OWL. PR-OWL was developed as an extension enabling OWL ontologies to represent complex Bayesian probabilistic models in a way that is flexible enough to be used by diverse Bayesian probabilistic tools (e.g., Netica, Hugin, Quiddity*Suite, JavaBayes, etc.) based on different probabilistic technologies (e.g., PRMs, BNs, etc.). More specifically, PR-OWL is an upper ontology for probabilistic systems that can be used as a framework for developing probabilistic ontologies (as defined above) that are expressive enough to represent even the most complex probabilistic models. DaConta et al. define an upper ontology as a set of integrated ontologies that characterizes a set of basic commonsense knowledge notions (2003, page 230). In PR-OWL, these basic commonsense notions are related to representing uncertainty in a principled way using OWL syntax (itself a specialization of XML syntax), providing a set of constructs that can be employed to build probabilistic ontologies. Figure 2.18 shows the main concepts involved in defining an MTheory in PR-OWL.
Fig. 2.18. Main Elements of PR-OWL
In the diagram, ellipses represent general classes while arrows represent the main relationships between these classes. A probabilistic ontology (PO) has to have at least one individual of class MTheory, which is basically a label linking a group of MFrags that collectively form a valid MTheory. In actual PR-OWL syntax, that link is expressed via the object property hasMFrag (which is the inverse of object property isMFragIn). Individuals of class MFrag are comprised of nodes, which can be resident, input, or context nodes (not shown in the picture). Each individual of class Node is a random variable (RV) and thus has a mutually exclusive, collectively exhaustive set of possible states. In PR-OWL, the object property hasPossibleValues links each node with its possible states, which are individuals of class Entity. Finally, random variables (represented by the class Nodes in PR-OWL) have unconditional or conditional probability distributions, which are represented by class ProbabilityDistribution and linked to their respective nodes via the object property hasProbDist.

Figure 2.19 depicts the main elements of the PR-OWL language, its subclasses, and the secondary elements necessary for representing an MTheory. The relations necessary to express the complex structure of MEBN probabilistic models using the OWL syntax are also depicted.
Fig. 2.19. PR-OWL Elements
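For readers more comfortable with code than with OWL class diagrams, the following schematic mirrors the relationships just described using plain Python classes. The field names echo the object properties mentioned in the text (hasMFrag, hasPossibleValues, hasProbDist); the rest of the structure is an illustrative assumption, not actual PR-OWL syntax.

# Schematic of the PR-OWL object model described above, using Python dataclasses
# instead of OWL syntax. Field names mirror the cited object properties; the
# concrete structure is an assumption for illustration only.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Entity:                        # possible values (states) of a node
    name: str

@dataclass
class ProbabilityDistribution:       # conditional or unconditional distribution
    table: dict

@dataclass
class Node:                          # a random variable (resident, input, or context)
    name: str
    has_possible_values: List[Entity] = field(default_factory=list)
    has_prob_dist: Optional[ProbabilityDistribution] = None

@dataclass
class MFrag:
    name: str
    nodes: List[Node] = field(default_factory=list)

@dataclass
class MTheory:                       # a probabilistic ontology needs at least one
    name: str
    has_mfrag: List[MFrag] = field(default_factory=list)

# A toy instance: one MTheory holding a Zone MFrag with a single resident node.
zone_md = Node("ZoneMD", [Entity("Low"), Entity("Medium"), Entity("High")])
star_trek = MTheory("StarTrekMTheory", [MFrag("ZoneMFrag", [zone_md])])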
In addition to (Carvalho et al., 2007), the prospective reader will find more information on the PR-OWL language at http://www.pr-owl.org.

At its current stage of development, PR-OWL contains only the basic representation elements that provide a means of representing any MEBN theory. Such a representation could be used by a Bayesian tool (acting as a probabilistic ontology reasoner) to perform inferences to answer queries and/or to learn from newly incoming evidence via Bayesian learning. However, building MFrags in a probabilistic ontology is a manual, error-prone, and tedious process. Avoiding errors or inconsistencies requires deep knowledge of the logic and of the data structures of PR-OWL, since the user would have to know all technical terms such as hasPossibleValues, isNodeFrom, isResidentNodeIn, etc. In an ideal scenario, many of these terms could be omitted and filled in automatically by a software application designed to enforce the consistency of a MEBN model. The development of UnBBayes-MEBN, an open source, Java-based application that is currently in alpha phase, is an important step towards this scenario, as it provides both a GUI for building probabilistic ontologies and a reasoner based on the PR-OWL/MEBN framework. UnBBayes-MEBN was designed to allow building POs in an intuitive way without having to rely on a deep knowledge of the PR-OWL specification.

Figure 2.20 shows a snapshot of the UnBBayes-MEBN user interface. In the figure, a click on the "R" icon and another click anywhere in the editing panel will create a resident node, for which a description can be inserted in the text area at the lower left part of the screen. Clicking on the arrow icon would allow one to graphically define the probabilistic relations of that resident node with other nodes, much as it would be done in current Bayesian packages such as Hugin™. All those actions would result in the
software creating the respective PR-OWL tags (syntactic elements that denote particular parts of a PR-OWL ontology) in the background. Probabilistic ontologies in UnBBayes-MEBN are saved in PR-OWL format (*.owl file), while application-specific data is stored in a text file with the *.ubf extension. UnBBayes-MEBN provides not only a graphical interface for building probabilistic ontologies, but also a probabilistic reasoner that performs plausible inference by applying Bayes' theorem to combine background knowledge represented in the knowledge base (KB) with problem-specific evidence. Currently, only simple queries are available, but future releases will include the ability to perform more complex queries. When a query is submitted, the knowledge base is searched for information to answer the query. If the available information does not suffice, then the KB and the generative MTheory are used to construct a BN to answer the query. This process is called Situation-Specific Bayesian Network (SSBN) construction.
Fig. 2.20. The UnBBayes-MEBN GUI
A few performance issues had to be considered in the implementation of UnBBayes-MEBN. As an example, it is possible for the algorithm to reach a context node that cannot be immediately evaluated. This happens when the ordinary variables in the parent set of a resident random variable term do not all appear in the resident term itself. In this case, there may be an arbitrary, possibly infinite number of instances of a parent for any given instance of the child. Because this may have a strong impact on the performance of the algorithm, the designed solution involves asking the user for more information. In the current implementation, if one does not provide such information, the algorithm will just halt. Nonetheless, UnBBayes-MEBN provides a convenient tool for building complex MEBN models and saving them as PR-OWL probabilistic ontologies, and thus constitutes an important step towards the ability to design complex systems with Bayesian technology.
2.4 Conclusion

As systems designed to address real-world needs increasingly cross the boundary of complexity that renders deterministic tools less than optimal, the need for proper representation of and reasoning with uncertainty is a topic of growing interest. There is a clear trend for requiring systems to be able to deal with incomplete and ambiguous knowledge, and to perform inferences over such knowledge. By providing the best inferential analysis possible with the available data (Occam's razor), Bayesian theory is a promising approach for complex systems design. This chapter presented a set of Bayesian tools with great potential to become the solution of choice for this approach.
References

Booker, L.B., Hota, N.: Probabilistic reasoning about ship images. In: Proceedings of the Second Annual Conference on Uncertainty in Artificial Intelligence. Elsevier, New York (1986)
Buntine, W.L.: Learning with Graphical Models. Technical Report No. FIA-94-03. NASA Ames Research Center, Artificial Intelligence Research Branch (1994)
De Raedt, L., Kersting, K.: Probabilistic Logic Learning. ACM-SIGKDD Explorations: Special Issue on Multi-Relational Data Mining 5(1), 31–48 (2003)
Calvanese, D., De Giacomo, G.: Expressive Description Logics. In: Baader, F., Calvanese, D., McGuiness, D., Nardi, D., Patel-Schneider, P. (eds.) The Description Logics Handbook: Theory, Implementation and Applications, ch. 5, 1st edn., pp. 184–225. Cambridge University Press, Cambridge (2003)
Charniak, E.: Bayesian Networks without Tears. AI Magazine 12, 50–63 (1991)
Costa, P.C.G., Laskey, K.B.: PR-OWL: A Framework for Probabilistic Ontologies. In: Proceedings of the International Conference on Formal Ontology in Information Systems (FOIS 2006), Baltimore, MD, USA, November 9-11 (2006)
Costa, P.C.G.: Bayesian Semantics for the Semantic Web. Doctoral dissertation. In: Department of Systems Engineering and Operations Research, p. 312. George Mason University, Fairfax (2005)
Druzdzel, M.J., van der Gaag, L.C.: Building Probabilistic Networks: Where do the Numbers Come From - A Guide to the Literature, Guest Editors' Introduction. IEEE Transactions in Knowledge and Data Engineering 12, 481–486 (2000)
Getoor, L., Taskar, B.: Introduction to Statistical Relational Learning. MIT Press, Cambridge (2007)
Gilks, W., Thomas, A., Spiegelhalter, D.J.: A language and program for complex Bayesian modeling. The Statistician 43, 169–178 (1994)
Hansson, O., Mayer, A.: Heuristic Search as Evidential Reasoning. In: Henrion, M. (ed.) Proceedings of the Fifth Workshop on Uncertainty in Artificial Intelligence (UAI 1989). Elsevier, New York (1989)
Heckerman, D., Meek, C., Koller, D.: Probabilistic Models for Relational Data. MSR-TR-2004-30. Microsoft Corporation, Redmond (2004)
Heckerman, D., Mamdami, A., Wellman, M.P.: Real-World Applications of Bayesian Networks. Communications of the ACM 38(3), 24–30 (1995)
Jaeger, M.: Relational Bayesian Networks. In: The 13th Annual Conference on Uncertainty in Artificial Intelligence (UAI 1997), Providence, RI, USA, August 1-3 (1997)
Jaeger, M.: Probabilistic role models and the guarded fragment. In: Proceedings IPMU 2004, pp. 235–242 (2006); Extended version in Int. J. Uncertain. Fuzz. 14(1), 43–60 (2006)
Jensen, F.V., Nielsen, T.: Bayesian Networks and Decision Graphs, 2nd edn. Springer, Heidelberg (2007)
Kersting, K., De Raedt, L.: Adaptive Bayesian Logic Programs. In: Rouveirol, C., Sebag, M. (eds.) ILP 2001. LNCS (LNAI), vol. 2157, p. 104. Springer, Heidelberg (2001)
Koller, D., Levy, A.Y., Pfeffer, A.: P-CLASSIC: A Tractable Probabilistic Description Logic. In: The Fourteenth National Conference on Artificial Intelligence (AAAI 1997), Providence, RI, USA, July 27-31 (1997)
Koller, D., Pfeffer, A.: Object-Oriented Bayesian Networks. In: The Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI 1997), San Francisco, CA, USA (1997)
Kolmogorov, A.N.: Foundations of the Theory of Probability, 2nd edn. Chelsea Publishing Co., New York (1960) (Originally published in 1933)
Korb, K.B., Nicholson, A.E.: Bayesian Artificial Intelligence. Chapman and Hall, Boca Raton (2003)
Langseth, H., Nielsen, T.: Fusion of Domain Knowledge with Data for Structured Learning in Object-Oriented Domains. Journal of Machine Learning Research 4, 339–368 (2003)
Laskey, K.B.: MEBN: A Language for First-Order Bayesian Knowledge Bases. Artificial Intelligence 172(2-3) (2007), http://ite.gmu.edu/~klaskey/papers/Laskey_MEBN_Logic.pdf
Laskey, K.B., Costa, P.C.G.: Of Klingons and Starships: Bayesian Logic for the 23rd Century. In: Uncertainty in Artificial Intelligence: Proceedings of the Twenty-first Conference. AUAI Press, Edinburgh (2005)
Laskey, K.B., Mahoney, S.M.: Network Fragments: Representing Knowledge for Constructing Probabilistic Models. In: Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI 1997), Providence, RI, USA (August 1997)
Mahoney, S.M., Laskey, K.B.: Network Engineering for Agile Belief Network Models. IEEE Transactions in Knowledge and Data Engineering 12(4), 487–498 (2000)
Murphy, K.: Dynamic Bayesian Networks: Representation, Inference and Learning. Computer Science Division, University of California, Berkeley (1998)
Neapolitan, R.E.: Learning Bayesian Networks. Prentice-Hall, New York (2003)
Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo (1988)
Pfeffer, A.: Probabilistic Reasoning for Complex Systems. Stanford University, Stanford (2000)
Spiegelhalter, D.J., Thomas, A., Best, N.: Computation on Graphical Models. Bayesian Statistics 5, 407–425 (1996)
Spiegelhalter, D.J., Franklin, R., Bull, K.: Assessment, criticism, and improvement of imprecise probabilities for a medical expert system. In: Henrion, M. (ed.) Proceedings of the Fifth Conference on Uncertainty in Artificial Intelligence (UAI 1989). Elsevier, New York (1989)
Stone, L.D., Barlow, C.A., Corwin, T.L.: Bayesian multiple target tracking. Artech House, Boston (1999)
Takikawa, M., d'Ambrosio, B., Wright, E.: Real-time inference with large-scale temporal Bayes nets. In: Breese, J., Koller, D. (eds.) Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI 2001). Morgan Kaufmann, San Mateo (2001)
3 A Layered Approach to Composition and Interoperation in Complex Systems

Abstract. This chapter introduces three engineering methods to support the evaluation of composition and interoperation in complex systems. Data engineering deals with conceptualization of entities and their relations. Process engineering deals with conceptualization of functions and behaviors. Constraint engineering deals with valid solution spaces for data and processes. It is shown that all three aspects must be considered and supported by a solution. The Levels of Conceptual Interoperability Model is used as the basis for the engineering methods. Several current solutions to support complex systems in knowledge-based environments are evaluated and compared.
for the system. From this model, we derive the necessity to align three disciplines in order to understand and manage complex systems: data engineering, process engineering, and constraint engineering. Each discipline will be described respectively in the remaining three sections. The approach presented in this chapter was used to support students and practitioners within several projects conducted in recent years. The applications are in the domains of defense, homeland security, and energy. In all cases, the supported task was twofold: (1) understanding how the current system interoperates, and (2) showing how legacy systems can be migrated to participate in the desired system of systems. Additional examples can be found in (Berstein et al. 2004; Parent and Spaccapietra 1998; Parent and Spaccapietra 2000; Rahm et al. 2004; Tolk and Diallo 2005).

The chapter will use the example of a rental company, EZ Rental, which has a system to keep track of cars and customers. EZ Rental gets its cars either indirectly from a local dealer, A&C, which has a system to monitor cars, customers, and parts, or directly from CheapCar, a local manufacturer. The manufacturer has a system to record and monitor cars, parts, and customers.
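To make the running example more tangible, the sketch below invents minimal record layouts for the three systems. None of these attribute names or values appear in the chapter; they are assumptions meant only to foreshadow why the logical representations must be reconciled (different names, units, and scope for the same real-world car).

# Invented record layouts for the three systems in the running example. The
# attribute names, units, and values are assumptions for illustration only.

ez_rental_car = {            # EZ Rental: tracks cars and customers
    "vin": "1CC123456", "model": "C1", "odometer_km": 42000, "rented_to": "CUST-017",
}
ac_dealer_car = {            # A&C dealer: monitors cars, customers, and parts
    "VIN": "1CC123456", "Model": "C1", "Mileage": 26100, "MileageUnit": "miles",
    "PartsOnOrder": ["FLT-2"],
}
cheapcar_record = {          # CheapCar manufacturer: records cars, parts, customers
    "serial_no": "1CC123456", "model_code": "C1", "assembled": "2008-05-14",
    "sold_to": "A&C",
}

# Even this toy example exposes the reconciliation problems data engineering must
# address: attribute naming (vin/VIN/serial_no), units (kilometers vs. miles),
# and scope (only the dealer tracks parts on order).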
3.2 Understanding Interoperation and Composability

It may be surprising to some readers that the underlying research is rooted in the discipline of Modeling and Simulation (M&S). However, M&S applications best exhibit the challenges that have to be met when evaluating and managing interoperation and composability tasks in complex systems. The reason is that M&S applications make explicit the various layers of abstraction that are often hidden in other system domains: the conceptualization layer leading to the model, the implementation layer leading to the simulation, and technical questions of the underlying network. Each layer is tightly connected with different aspects of interoperation. We are following the recommendation given by Page and colleagues (Page, Briggs, and Tufarolo 2004), who suggested defining composability as the realm of the model and interoperability as the realm of the software implementation of the model. In addition, their research introduces the notion of integratability when dealing with the hardware and configuration side of connectivity. Following this categorization, we recommend the following distinction when dealing with interoperation:
• Integratability contends with the physical/technical realms of connections between systems, which include hardware and firmware, protocols, networks, etc.
• Interoperability contends with the software and implementation details of interoperations; this includes exchange of data elements via interfaces, the use of middleware, mapping to common information exchange models, etc.
• Composability contends with the alignment of issues on the modeling level. The underlying models are purposeful abstractions of reality used for the conceptualization being implemented by the resulting systems.
System modeling and architecture approaches, such as those summarized in (Buede 1999) and other systems engineering books, show that a good system architecture is based on a rigorous analysis of the underlying requirements, which leads to a conceptualization of necessary actors and processes. This conceptualization drives the architectural artifacts, which are used to build the system based on the available technology.
It is also good practice to separate business and application logic from the underlying technology. These principles are supported by our approach.

On the requirements side, until recently, the support of decision makers often focused on representing data, such as displaying quart charts of available resources over time, etc. However, the advent of intelligent software agents using the Internet introduced a new quality to decision support systems. While early systems were limited to simple situations, the examples given by (Phillips-Wren and Jain 2005) show that state-of-the-art decision support is based on agent-mediated environments. Today, real-time and uncertain decision problems can be supported to manage the decision making process in a highly dynamic and agile sphere. Simple data mining and presentation is no longer sufficient: based on historic data, trend analyses and possible development hypotheses must be developed and compared. This requires a purposeful abstraction of reality and the implementation of the resulting concept to make it executable on computers. These processes are better known as "modeling," the purposeful abstraction of reality and capturing of assumptions and constraints, and "simulation," the execution of a model on a computer. In this light, M&S becomes more and more a backbone of operational research coping with highly complex and dynamic environments and decision challenges. Technically as well as operationally, M&S is therefore an emerging discipline.

While M&S systems are valuable contributors to the decision maker's toolbox, the task of composing them in a meaningful way is anything but trivial. The challenge is not the exchange of data between systems: the technical side is sufficiently dealt with by interoperability standards. In particular, web services and other web-enabling communication means provide a solid foundation for information exchange. The problem is that the concepts of the underlying models – or the implemented world view captured in the model – need to be aligned as well. In order to be able to apply engineering methods that contribute to a composable solution, several models have been developed and applied. In addition, a machine-readable and understandable implementation based on data and metadata is ultimately needed to enable agents to communicate about situations and the applicability of M&S applications. They must share a common universe of discourse in support of the decision maker, which requires a common language rooted in a formal specification of the concepts. A formal specification of a conceptualization, however, is a working definition of a common ontology. This ontology can then be applied to derive conceptually aligned and orchestrated configurations for conceptually composable, technically interoperable, and integrated solutions.

The Levels of Conceptual Interoperability Model (LCIM) was developed to support this approach. The LCIM is a model that represents a hierarchy of capability for representing the meaning (increasingly conceptual in nature, as the model layers are ascended) of information passed between systems, components, or services. It builds on the experience gained with interoperability models used in the defense domain, which points in the direction that documentation beyond the technical aspects of interoperability is necessary to ensure the interoperation of complex systems and systems of systems.
The LCIM in the currently used version distinguishes between the following layers:
• Level 0: Stand-alone systems have No Interoperability.
• Level 1: On the level of Technical Interoperability, a communication protocol exists for exchanging data between participating systems. On this level, a communication infrastructure is established allowing the exchange of bits and bytes; the underlying networks and communication protocols are unambiguously defined.
• Level 2: The Syntactic Interoperability level introduces a common structure to exchange information, i.e., a common data format is applied. On this level, a common protocol to structure data is used; the format of the information exchange is unambiguously defined. At the level of syntactic interoperability, the bits and bytes exchanged can be grouped to form symbols. At this level, systems share a common reference physical data model instance.
• Level 3: If a common information exchange reference model is used, the level of Semantic Interoperability is reached. On this level, the meaning of data is shared; the content of the information exchange requests is unambiguously defined.
• Level 4: Pragmatic Interoperability is reached when the interoperating systems are aware of each other's methods and procedures. In other words, the use of the data – or the context of its application – is understood by the participating systems; the context in which the information is exchanged is unambiguously defined. At this level, systems are aware of all the possible groupings of symbols and how those groupings are related. The level of Pragmatic Interoperability implies the awareness and sharing of a common reference logical model.
• Level 5: As a system operates on data over time, the states of that system change along with the assumptions and constraints that affect its data interchange. At the Dynamic Interoperability level, interoperating systems are able to comprehend and take advantage of the state changes that occur in the assumptions and constraints that each of them is making over time. Simply stated, the effect of the information exchange within the participating systems is unambiguously defined. Dynamic interoperability implies that systems understand how the symbols they exchange are used during run-time.
• Level 6: Finally, if the conceptual models – i.e., the assumptions and constraints of the "purposeful abstraction of reality" – are aligned, the highest level of interoperability is reached: Conceptual Interoperability. This requires that conceptual models be fully documented based on engineering methods enabling their interpretation and evaluation by other engineers. In other words, we need a "fully specified but implementation independent model" as requested in (Davis and Anderson 2003), and not just a text describing the conceptual idea. At this level, the underlying concepts represented by the symbols are described unambiguously. Systems share a common reference conceptual model that captures the assumptions and constraints of the corresponding real or imaginary object.
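Because the layers form an ordered scale, they are easy to encode for the comparison use described in the next paragraph: candidate protocols or standards can be tagged with the highest layer they support and ranked accordingly. The assessments in the sketch below are invented placeholders, not judgments made in this chapter.

# The LCIM layers as an ordered scale, with an invented comparison of exchange
# approaches by the highest layer they support. The assessments are placeholders.

from enum import IntEnum

class LCIM(IntEnum):
    NO_INTEROPERABILITY = 0
    TECHNICAL = 1
    SYNTACTIC = 2
    SEMANTIC = 3
    PRAGMATIC = 4
    DYNAMIC = 5
    CONCEPTUAL = 6

support = {
    "raw socket connection": LCIM.TECHNICAL,
    "shared XML schema": LCIM.SYNTACTIC,
    "common information exchange reference model": LCIM.SEMANTIC,
    "reference model plus documented context of use": LCIM.PRAGMATIC,
}

for solution, level in sorted(support.items(), key=lambda item: item[1]):
    print("level %d (%s): %s" % (level, level.name, solution))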
The LCIM contributes to composition and interoperation in two important ways. First, it provides a framework for organizing the information concept into distinct, separate, and manageable parts. Second, the LCIM can be used to compare selected – or alternative – protocols, languages, standards, techniques, etc. in terms of their support for each layer. In other words, it can serve as the foundation of a maturity model for both tasks described in this chapter. The last aspect that needs to be mentioned before going into the details of the layered approach to composition and interoperation is the interplay of data, processes, and constraints. Similar observations are well
known in the domain of knowledge-based system design, as shown among others by O'Kelly (2006) in recent discussions in expert forums on ontology. In the following sections, we will use the LCIM to guide us through three engineering disciplines necessary to support the tasks of interoperation and composition for complex systems in knowledge-based environments: data engineering, process engineering, and constraints engineering. In each step, we will add artifacts found in knowledge-based environments to gradually increase interoperation and composition in complex systems.
3.3 Applying Data Engineering

The process of interoperating heterogeneous systems involves the exchange of data at the physical and logical level. The exchange of physical data, or technical interoperability, implies solving hardware integration problems to ensure that systems can actually communicate. However, each system also has a logical representation of data, which usually includes a definition of data elements and their relationships with respect to each other. This metadata needs to be exchanged as well in order to prevent misinterpretations and misrepresentations of data during interoperation. The exchange of logical information including syntax and semantics leads to syntactic and semantic interoperability respectively. This logical representation of data is internal to the system and only makes sense in its environment and under its internal conditions. As a result, it is important for systems to understand the context in which data exists in order to further avoid variances in representation during interoperation. The exchange of physical and logical data in context leads to pragmatic interoperability. In order to reach pragmatic interoperability, an engineering method that systematically captures the interoperation process is necessary. This section introduces the Data Engineering process and discusses the challenges inherent to making systems interoperate at the pragmatic level. As an illustration, the Data Engineering process will be applied to the car business use case presented earlier. In the latter part of the section, the authors introduce a refinement of Data Engineering called Model Based Data Engineering and discuss its implications for complex systems in knowledge-based environments.

3.3.1 Data Engineering

Data Engineering is based upon a simple observation that holds true regardless of the size and complexity of the systems involved. Simply stated, data has a format (structured, unstructured, or semi-structured) and a physical location (text file, relational database, etc.). In order to transition from data exchange to information exchange, which is the stated goal of semantic interoperability, the meaning of data must also be exchanged (Spaccapietra et al. 1992; Parent and Spaccapietra 1998). Since the meaning of data varies depending on the context in which it is used, the context must also be exchanged. The goal of Data Engineering is to discover the format and location of data through a Data Administration process, discover and map similar data elements through a Data Management process, assert the need for model extension and gap elimination through a Data Alignment process, and resolve resolution issues through a Data Transformation process. The combination of these four processes enables not
only the transfer of bits and bytes between systems but, more importantly, it leads to the transfer of knowledge between systems. The example used in this chapter is taken from a chapter exclusively dealing with model-based data engineering, which we recommend to the interested reader for further studies (Tolk and Diallo 2008).

3.3.1.1 Data Administration

Data Administration identifies and manages the information exchange needs between candidate systems. This process focuses first on clearly defining the source system and the target system, i.e., the direction of data flow. This is an important step for the future since mapping functions do not always have an inverse. Mathematically speaking, for two sets S1 and S2, a mapping function f: S1 → S2 has a valid inverse if and only if every element of S1 is mapped to exactly one element of S2 and every element of S2 is the image of exactly one element of S1; simply stated, f must be bijective. This is clearly not always the case, and in fact research shows that while 1:1 mappings do exist, n:m mappings are more prevalent (Parent and Spaccapietra 2000). The issue of complex mapping is addressed in depth during Data Management. Data Administration also aims at aligning formats and documentation, examining the context of validity of data, and asserting the credibility of its sources. A special emphasis is put on format alignment. Format alignment implies not only that modelers agree on a common format (XML, text file) for data exchange but also that semi-structured and unstructured data be enriched semantically and syntactically. Data Administration is the first step to ensuring that systems communicate in a complete and meaningful manner.

3.3.1.2 Data Management

The goal of Data Management is to map concepts, data elements and relationships from the source model to the target model. Data Management is the most time-consuming and difficult area of Data Engineering. As the literature has shown, mapping can be done either manually or with semi-automated tools. Possible sources of conflict have been studied and classified (Spaccapietra et al. 1992; Parent and Spaccapietra 1998). The emerging consensus is that the manual approach is long and error-prone, while tools are not yet powerful enough to act on large and complex systems (Bernstein et al. 2004; Seligman et al. 2002; Rahm et al. 2004). To streamline the process, mapping can be decomposed into three distinct sub-processes:

• Concept Mapping: This is the inevitable human-in-the-loop aspect of mapping. Experts in both domains (source and target) must agree on concept similarity. At this level, it is important to know if the models have something in common (intersect) and to extract ontologies if possible. Two models intersect if any of their subsets are identical or any of their elements can be derived from one another directly or through a transformation. Two concepts are deemed identical if they represent the same real world view. Parent and Spaccapietra (2000) assert that "if a correspondence can be defined such that it holds for every element in an identifiable set (e.g., the population of a type), the correspondence is stated at the schema level. This intensional definition of a correspondence is called an inter-database correspondence assertion (ICA)." Concept mapping is the listing of all existing ICAs.
• Attribute Mapping: The next logical step is to identify similar attributes. At this level, special attention has to be paid to synonyms, homonyms and the inherent context of attributes. Two attributes are said to be equal if they describe the same
real world property. It is possible to say, for example, that an attribute "amount" in a cash register model is the same as the attribute "quantity" in another model if they both refer to "the total number of a product sold to a customer". As this example shows, attribute mapping cannot be done out of context. If "amount" were referring to "the amount of money given to the cashier", the correspondence would no longer hold.
• Content Mapping: Most mapping efforts tend to conflate content mapping with attribute mapping. Two values are said to be identical if they can be derived from one another. For example, a "total price" value may be derived as a price plus a state tax applied to that price. Such a derivation does not say anything about the relationship between the attribute "total price" on one side and the attributes it is derived from on the other side. At the attribute level, equivalence between real world properties is established, while the content level deals with how attribute values are derived from one another.

The complexity of any mapping effort is directly related to the complexity of these individual components. The amount of effort can be measured by the size of the area of intersection, the similarity in concepts and, to a lesser extent, the disparity in attributes and the derivability of content.

3.3.1.3 Data Alignment

The goal of Data Alignment is to identify gaps between the source and the target. The focus at this level is to map the non-intersecting areas of the two models by either merging them or introducing a reference model that intersects with the complement. A complete Data Alignment process ensures completeness in mapping and protects the integrity of information exchange.

3.3.1.4 Data Transformation

The goal of Data Transformation is to align models in terms of their level of resolution. The assumption that models are expressed at the same level of detail generally does not hold. Furthermore, information that is deemed vital in one model might not hold the same value in another due to disparities in focus, goal and approach between the two. As a result, objects need to be aggregated or disaggregated during the mapping process in order to establish correspondences.

3.3.2 Applying Data Engineering to the Example

Having described the Data Engineering process, let us now apply it to solve the problem described in the example provided earlier in this chapter.

3.3.2.1 Applying Data Administration to the Example

In terms of the example provided earlier, Data Administration requires a clear definition of source and target; therefore the team agrees that:

• From EZ Rental to A&C Dealerships: For car information, the rental company is the source and the dealership is the target.
• From A&C Dealership to CheapCar: For customer information, the dealer is the source and the manufacturer is the target.
• From A&C Dealership to CheapCar: For parts information, the dealership is the source and the manufacturer is the target.

These definitions highlight the fact that source and target identification is not a one-time process. The next step during Data Administration is to agree on a common exchange format and perform semantic enrichment to eliminate assumptions embedded within the models. The modeling team decides that XML is the common format that they will use and that each model should publish an XML Schema Definition (XSD) encompassing the objects that will be exchanged. The team observes that the dealership and the manufacturer expect string values for their elements while the rental company does not specify a type. For the sake of simplicity, they agree that any value exchanged will be of type "String". Each model must add a conversion layer to align its internal type to the agreed type. They further observe that there is a strong possibility of error due to the occurrence of homonyms and synonyms within models and across models. A&C, for example, has an attribute "Name" for both the "Car" and the "Part" element. The dealership refers to the make and model of a car while the rental company has an attribute "manufacturer". This raises the question of whether the manufacturer refers to the make, the model, or both. As a result, the team decides to make all of the assumptions and other documentation explicit within the XSD.
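As a minimal illustration of the conversion layer mentioned above, the following Python sketch assumes a hypothetical internal integer type for mileage and converts to and from the agreed exchange type "String"; the names and values are invented for this example.

```python
# Hypothetical conversion layer: EZ Rental is assumed to store mileage as an integer internally,
# while the agreed exchange format requires all values to be strings.
def to_exchange_value(value) -> str:
    """Convert an internal value to the agreed exchange type (string)."""
    return str(value)

def from_exchange_mileage(value: str) -> int:
    """Convert an exchanged mileage string back to the internal integer representation."""
    return int(value)

internal_mileage = 42175                            # assumed internal representation
exchanged = to_exchange_value(internal_mileage)     # "42175" is what goes into the XML document
restored = from_exchange_mileage(exchanged)         # back to 42175 on the receiving side
assert restored == internal_mileage
```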
Fig. 3.1. XML Schema Definition of the Car Rental Company
Figure 3.1 shows the XSD of the car rental company. It has been augmented with a definition of type, a description of elements and constraints such as unique keys. The manufacturer and the dealer have a similar schema. This XSD shows how XML can be used to better serve the mapping effort. Other models can now use this schema in the Data Management process. It is worth noting that further enhancements are
needed in this XSD. The documentation remains a little ambiguous (the car type definition does not specify what the enumeration values are, for example).

3.3.2.2 Applying Data Management to the Example

Data Management is the next logical step in the integration effort discussed in the example. Modelers must focus on identifying and mapping concepts, attributes and content. Let us apply the mapping steps identified earlier.

Concept Mapping

In this effort, it seems obvious that the concepts of car and parts are identical in both models. However, modelers must decide whether the concept of Dealership is similar to that of Customer. It might be that the manufacturer distinguishes between individual customers that order online or have specific demands (the information about individual customers might be captured in an "Ind_Orders" object or table, for example) and dealerships, which are licensed vendors of its products (the "Dealership" object or table). This decision greatly affects the outcome of the integration effort because of the domino effect it has on Attribute and Content Mapping. In this case the decision is that the concepts of Dealership and Customer are sufficiently related to be treated as identical. It turns out that this is the closest possible match because the manufacturer does not take orders directly from individuals. All transactions are done through a dealership. We will see later how this affects the outcome. The concept of Manufacturer is represented as an attribute (Rental Company) in one model and as an object in the other; however, it is clear from the schemas that these are conceptually identical.

Attribute Mapping

At this level, similar attributes must be identified. Through a good Data Administration process, a close examination of the schemas yields the results presented in Tables 3.1, 3.2 and 3.3. The mapping process has to be performed for each interface. Table 3.1 shows that some attributes in the source have an unknown correspondence in the target. We will see how this issue is resolved during Data Alignment. Additionally, the table does not identify how these attributes are related; this is done during content mapping.

Table 3.1. Car Attribute Mapping from EZ Rental to A&C

  EZ Rental       A&C Dealership
  Car_ID          VIN_Number
  Type            Unknown
  Mileage         Unknown
  Manufacturer    Make
  Manufacturer    Model
Table 3.2. Customer Attribute Mapping from A&C to CheapCar

  A&C Dealership    CheapCar Manufacturer
  Name              Name
  Location          Location
  Customer_Type     Unknown
  Policy_Number     Unknown
  VIN_Number        Car.VIN
  P_Num             Parts.Serial_Number
Table 3.3. Parts Attribute Mapping from A&C to CheapCar

  A&C Dealership    CheapCar Manufacturer
  Name              Name
  P_Num             Serial_Number
  Part_Type         Type
  Make              Car_Model
  Model             Car_Model
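To make such correspondence tables machine readable, they can be captured as simple lookup structures. The following Python sketch mirrors Tables 3.1–3.3 (None stands for an "Unknown" correspondence); it is illustrative only and not an implementation prescribed by the Data Engineering process.

```python
# Attribute correspondences from Tables 3.1-3.3; None marks an "Unknown" target.
CAR_EZRENTAL_TO_AC = {
    "Car_ID": "VIN_Number",
    "Type": None,
    "Mileage": None,
    "Manufacturer": ["Make", "Model"],   # one source attribute maps to two targets
}

CUSTOMER_AC_TO_CHEAPCAR = {
    "Name": "Name",
    "Location": "Location",
    "Customer_Type": None,
    "Policy_Number": None,
    "VIN_Number": "Car.VIN",
    "P_Num": "Parts.Serial_Number",
}

PARTS_AC_TO_CHEAPCAR = {
    "Name": "Name",
    "P_Num": "Serial_Number",
    "Part_Type": "Type",
    "Make": "Car_Model",
    "Model": "Car_Model",   # two source attributes map to the same target
}

def unmapped_attributes(mapping: dict) -> list:
    """List source attributes whose correspondence is still unknown (to be resolved by Data Alignment)."""
    return [source for source, target in mapping.items() if target is None]

print(unmapped_attributes(CAR_EZRENTAL_TO_AC))   # ['Type', 'Mileage']
```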
Having identified the attributes and their images, the team can now focus on deriving attribute values from the source to the target.

Content Mapping

The process of content mapping corresponds to generating functions that map the values of attributes to one another. Table 3.4 shows the content mapping between the attributes of a car in the rental model and its counterpart in the dealership model. The functions show, for example, that the contents of the attribute <Manufacturer> must be decomposed into a make component and a model component and then mapped to <Make> and <Model> respectively. Modelers have to build similar tables for each set of attributes.

Table 3.4. Content Mapping of the Car Attributes before Data Alignment

  EZ Rental       A&C Dealership    Function
  Car_ID          VIN_Number        Car_ID = VIN_Number
  Type            Unknown
  Mileage         Unknown
  Manufacturer    Make              Manufacturer.Make
  Manufacturer    Model             Manufacturer.Model
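The derivation functions of Table 3.4 can likewise be sketched as executable mappings. In the following Python sketch, the split of the Manufacturer value into make and model assumes a simple "make model" string format; this format is an assumption made for illustration only, since the table merely states that a decomposition is needed.

```python
def split_manufacturer(manufacturer: str) -> tuple:
    """Hypothetical decomposition of an EZ Rental 'Manufacturer' value into make and model.
    Assumes the value is formatted as '<make> <model>', e.g. 'Ford Focus'."""
    make, _, model = manufacturer.partition(" ")
    return make, model

def map_car_before_alignment(ez_car: dict) -> dict:
    """Content mapping of Table 3.4: Type and Mileage have no target yet (resolved by Data Alignment)."""
    make, model = split_manufacturer(ez_car["Manufacturer"])
    return {
        "VIN_Number": ez_car["Car_ID"],   # Car_ID = VIN_Number
        "Make": make,                     # Manufacturer.Make
        "Model": model,                   # Manufacturer.Model
    }

# Invented sample record for illustration only.
ez_car = {"Car_ID": "VIN123456", "Type": "Economy", "Mileage": "42175", "Manufacturer": "Ford Focus"}
print(map_car_before_alignment(ez_car))
```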
3.3.2.3 Applying Data Alignment to the Example

Data Alignment addresses the holes represented by the "Unknown" fields in Tables 3.1, 3.2 and 3.4. The recommended approach here is to either extend the target model or simply leave these attributes out because they are not important to the target model. In the car example, the modelers mapping the car concept from EZ Rental to A&C recognize that "type" is an attribute proper to the rental business and therefore decide not to include it in the exchange. The Mileage attribute, on the other hand, is very important to the dealership because it is a trigger in their decision making process. As a result, they agree to extend their model by adding a "mileage" attribute to the car object. Table 3.5 shows the resulting mapping.

Table 3.5. Mapping of the Car Attributes after Data Alignment

  EZ Rental       A&C Dealership        Function
  Car_ID          VIN_Number            Car_ID = VIN_Number
  Mileage         Mileage (extended)    Mileage = Mileage
  Manufacturer    Make                  Manufacturer.Make
  Manufacturer    Model                 Manufacturer.Model
3.3.2.4 Applying Data Transformation to the Example

In order to better illustrate Data Transformation, let us assume that EZ Rental decides to add a garage to make small repairs to its cars rather than use the dealership. As a result, it would like to order parts from CheapCar directly. It extends its car model by adding a "parts" attribute to it. Let us also assume that CheapCar has a different model for the parts side of its business and decides that it wants to use this model to collect parts information from the rental company. It is obvious that a process of aggregation/disaggregation has to take place between the two systems.

3.3.3 Model Based Data Engineering

Data Engineering presents a series of processes for modelers to use during integration projects in static environments. However, in rapidly changing environments, modelers need the capability to add emerging models without having to start over from scratch. The introduction of a Common Reference Model (CRM) as an integral part of Data Engineering leads to Model Based Data Engineering (MBDE). Defining a Data Engineering process is a good step towards the goal of interoperability. In general, however, solutions derived from a systematic application of the Data Engineering process often lack flexibility and reusability. The traditional approach when integrating heterogeneous models is to create proprietary connecting interfaces. This results in peer-to-peer (P2P) connections that are satisfactory in static environments. However, in this era of rapid communication and globalization, the need to add new models can arise at any moment. Mathematically speaking, federating N models using P2P connections results in N*(N−1)/2 interfaces. In past decades, flexibility and reuse have been
neglected and more effort has been rightly directed at identifying and resolving mapping issues. In order to avoid those same pitfalls, the Data Engineering process must include rules and guidelines addressing these issues. For the solution to be flexible and reusable, the argument has been made (Tolk and Diallo 2005) that implementing Data Engineering must include a CRM. A valid CRM must represent:

• Property values: For any model M containing a list of independent enumerated values V1, V2, …, Vn susceptible of being exchanged, there must be an exhaustive set SV of unique enumerated values in the reference model such that SV = {V1, V2, …, Vn}. SV can be extended as more models are added to the federation.
• Properties: For any model M containing a list of independent attributes A1, A2, …, An susceptible of being exchanged, there must be an exhaustive set SA of attributes in the reference model such that SA = {A1, A2, …, An}. SA can be extended as more models are added to the federation.
• Propertied Concepts: For real life objects O1, O2, …, On susceptible of being exchanged, there must be a set SO of independent objects in the reference model such that SO = {O1, O2, …, On}. Objects can be added as new models join the federation.
• Associated Concepts: For any set of objects O linked through a relationship R describing real world concepts C1, C2, …, Cn susceptible of being exchanged, there must be an exhaustive set SC of concepts in the reference model such that SC = {C1, C2, …, Cn}.

The main advantage of MBDE is the creation of a series of information exchange requirements with a specific input/output data set and format by which all participating models have to abide. It becomes, in fact, the common language spoken and understood by all members of a federation. In MBDE, models interoperate through the CRM. Each model understands the language of the CRM and can therefore exchange information with any other model. While MBDE moves us one step closer to our ultimate goal of conceptual interoperability, it is important to recognize that a reference model should facilitate the information exchange needs of participating systems and not impose its view of the world on all systems. This means that while the CRM has its own structure and business rules, it must be flexible enough to provide a way for participating systems to fulfill their informational needs. As an analogy, consider the computer science world. A computer language such as C++ or Java allows all programmers to express themselves by providing constructs that allow one to build complex structures. However, nobody is forced to think in a computer language. The computer language is just the environment in which their ideas come to life. Similarly, the CRM is the environment in which systems that wish to interoperate can present their view of the world and learn other world views. With MBDE, systems can exchange information up to the pragmatic level. Even though the context of information exchange cannot be fully specified using MBDE, due to its inability to describe processes, systems can nonetheless fully specify their world view with respect to data. MBDE provides a first step into the pragmatics of information exchange. The pragmatic level is fully attained by applying process engineering to describe the processes that make up the system.
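A minimal Python sketch of what a CRM might look like as a data structure is given below, under the assumptions stated in the bullets above; the second helper illustrates the interface-count argument, assuming that each model maintains exactly one interface to the CRM.

```python
from dataclasses import dataclass, field

@dataclass
class CommonReferenceModel:
    """Hypothetical CRM holding the exhaustive sets described above; all sets grow as models join."""
    property_values: set = field(default_factory=set)       # SV: enumerated values
    properties: set = field(default_factory=set)             # SA: attributes
    propertied_concepts: set = field(default_factory=set)    # SO: objects
    associated_concepts: set = field(default_factory=set)    # SC: relationships/concepts

    def register_model(self, values, attributes, objects, concepts):
        """Extend the CRM with the exchangeable elements of a newly joining model."""
        self.property_values |= set(values)
        self.properties |= set(attributes)
        self.propertied_concepts |= set(objects)
        self.associated_concepts |= set(concepts)

def p2p_interfaces(n: int) -> int:
    """Number of peer-to-peer interfaces needed to federate n models: n*(n-1)/2."""
    return n * (n - 1) // 2

def crm_interfaces(n: int) -> int:
    """Assumption: with a CRM, each model only needs one interface to the reference model."""
    return n

print(p2p_interfaces(10), crm_interfaces(10))   # 45 10
```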
3.4 Applying Process Engineering

Systems engineers traditionally view an information system as a number of different states that the system progresses through internally. Within the system, a collection of data items exists and is transformed as the system progresses from state to state. Processes are acknowledged to exist, but they are considered to be the linkage between states. For our purposes in this chapter, we will consider a state to be a complete snapshot of the system. This is a snapshot of all the data in a temporarily frozen state; using the terms of MBDE, this means that all of the property values of the propertied concepts have a single value from a specific instant in the operational span of the system. The following section intends to show that for complex systems to be interoperable, especially in the sense that the higher levels (pragmatic and above) of the LCIM describe, it is important not only to understand and describe the processes of the interoperating systems, but, where there are significant differences, to address those differences as a key to composition. We have seen how, by following the four principles of model based data engineering, we can accommodate levels of interoperability between complex systems up through the semantic level, and begin to address some of the concerns of the pragmatic level. Even at the levels of the LCIM addressed thus far, data is not the only concern. There are also the internal processes of the systems as well as the constraints placed on the system (described further below). Finally, the overall reason for the individual systems, as well as the reason for combining them in a system-of-systems, derives from some organizational or business model. All four of these things must be considered. In Figure 3.2 we see that these four elements – data (as represented by model based data engineering), processes (as represented by model based process engineering), organizational/business models, and finally constraints – all affect the level of achievable interoperability. The first three of these vary with the level of influence they exert; the fourth, constraints, remains potentially consistent for all achievable levels of interoperability.
Fig. 3.2. Incremental Influences on Interoperation
Within a complex system, especially one where the internal activities may be describable but not predictable, understanding the processes (rather than just the data) is increasingly important for pragmatic and dynamic interoperability, for the following reasons. For pragmatic interoperability, the requirement is to understand the system context in which the exchanged information will be placed once it reaches the target system. Likewise, the target system of the exchange must understand the system context out of which the exchanged information came in the originating system. An additional requirement at the dynamic level of interoperability is to have awareness not only of this context, but also of how it will change, dynamically, with the execution of the system. To see what the system context is, for information to be transmitted or received, we have to consider, for a moment, what a system is. Taking the minimalist view that a system consists of data and processes for that data, the context that the information is in will be not only the state of the system at the moment that context is considered, but also the processes that the data making up the information are involved in at that moment. This leads to a slightly different consideration of the internal workings of the system than is typical in most traditional system representation methods. Considering the early (but effective) methods of flow-charting, state machine diagrams, and ladder diagrams – all of these concern themselves with the states of the system, and the connectivity between the states just illustrates the path from one state to another. But the interesting effects on the data, especially from a contextual point of view, occur during the processes that lead the data from state to state. This is where the emphasis must be placed, first in understanding context, and then (especially) in having awareness of how that context may change.

3.4.1 Introducing Processes into the Complexity Viewpoint

Based on the reasons and descriptions given so far, we see that observing and aligning information concerning the processes of a system becomes increasingly important, compared with aligning information concerning data, in order to achieve level four (Pragmatic Interoperability) and above in the LCIM. It then follows that some method for applying engineering methods to a representation of those processes must be possible, in order to enable this interoperability. To get to the ability to apply these engineering methods, first a clear picture of the processes of a system must be held, rather than of the (in comparison) static states of the system. Typically, the traditional view towards modeling a system is very state oriented: the various states within the system are highlighted, but the processes whereby the transitions between states take place are reduced to just connections (in the view) between states. In this view (where states and their connections are the emphasis), what is not shown are any details concerning the activities that are part of the transition between states. The flow of focus on the data from state to state is shown, so that a process is implied, but there exists (in this view) no attempt at defining the nature of the process. Some of the questions that a modeler of a system might immediately ask concern time, the transformation of data elements, and what the change between states represents.
Why would the focus of the system change from one state to another? How long would it take to complete this change? What functions might come to bear on the data involved? These details concerning the processes in a system are not captured by traditional system modeling techniques.
By introducing a label for each relationship between states, we now have a reference point at which to attach attribution to the change between states. We can introduce the information about the process of change that answers some of the questions asked above. It is shown in (Sowa 2002) that a co-emphasis must be placed on considering states of data as well as the processes that transition between states. This co-emphasis (with an increasing share relying on examining and representing processes, when considering the requirements for pragmatic and dynamic interoperability) is what sets the motivation for process engineering (described in the following sections). To connect the results of process engineering with data engineering, a process is understood as a method of change by which one or more propertied concepts have their property values changed. This method of change may have initialization requirements; it will introduce one or more effects (changes in property values) into the system; it may take some amount of time to complete; it may be an ongoing process with specific halting requirements defined; and it may introduce post-process conditions into the operational flow of the system. All of these are required to be made explicit in order to fully identify and define a process.

3.4.2 Process Engineering

By following this technique, or any of a number of other modern techniques for describing processes, it is possible to capture some of the attributes of a process. Knowing that this is possible (regardless of the particular technique used) is an important precursor to the next four sections, which introduce the four steps of Process Engineering, roughly corresponding to the four steps of Model Based Data Engineering.

3.4.2.1 Process Cataloging

The first step necessary in the engineering of processes for alignment is Process Cataloging. The important goal to achieve during this stage is the cataloging of all the processes within the systems concerned, and an understanding of where, in the operational mode of the system, these processes are to be used. In addition to an enumeration of all the processes for a system, a method for expressing how each process relates to the states between processes is desired. This will not only give the process engineer the foundation needed to move to the next step (Process Identification), but also begins to provide the context for any data that may provide information for interoperation with other systems.

3.4.2.2 Process Identification

Once the Process Cataloging step is completed, the next step in Process Engineering is to provide identity for each of the processes identified in the catalog. By identity, what is meant is a clear description of what the process does, what its resource and time requirements are to successfully complete, and what data it operates on. Some of this information was already described when transitions for Petri nets were introduced earlier, but that is hardly an exhaustive treatment of the topic. A successful technique for this step would include:

1. Process timing (not only how long the process will take, but also when in the operational span of the system the process is able to be enacted);
2. Process initialization requirements (in terms of resources – data and system state);
3. Process effects (which effects are enacted on affected system resources – if data, are attributes or relationships altered? Are new data created? Are previously existing data destroyed?);
4. Process halting requirements (when does the process end – does it continue until there are no longer any resources available? Will it halt after affecting only a certain group of resources? Are these questions answered based on the identity of the process, or are they determined by some subset of the initialization requirements?);
5. Post-process conditions (what effects on the overall system are put into place by the completion of the process? Is the ordering of other processes affected by the completion of this process? Are new states or processes created conditional on this process completing?).

A complete enumeration of these five elements (timing, initialization, effects, halting, and post-process conditions) would provide the identity that a complete satisfaction of this step requires.

3.4.2.3 Process Alignment

The next step in Process Engineering is the alignment of processes. This is the comparison of the information provided in the preceding step for two processes that are part of the exchange of information for interoperability. The simplest level of alignment occurs when the initialization requirements of one process are met, at least partially, by the results of the process effects of another process, from another system. In fact, for pragmatic interoperability this is enough – the ability for information derived from the resulting effects of one process to be considered as part of the initialization of another process. What occurs internal to the process in question is not necessarily important for pragmatic interoperability. For dynamic interoperability, it is important to understand the other identity attributes of the processes involved – especially as concerns halting conditions, timing requirements, and post-process conditions. These can all have an effect on the overall state of the operational span of the system in question, and as such must be considered for dynamic interoperability to take place. Where this step in Process Engineering is unable to completely align two different processes, the result of applying the step should be an understanding of where the two processes differ. Again, in simplest terms, this could be a simple difference between what results from the process effects of the information-producing process and the process initialization requirements of the target process. It is possible that such a difference can be satisfied by applying the steps of Model Based Data Engineering. Other differences can be considerably more profound and complex, and are the subject of the next, and final, step in Process Engineering.

3.4.2.4 Process Transformation

In cases where Process Alignment has identified differences between processes that cannot be handled by the steps of Model Based Data Engineering, it may be necessary to perform some transformation between the processes in question. In this case, whatever differences are identified between the processes are considered to see if they can be accommodated by some middleware process – a transformation process. If this is possible (or if simpler differences are addressable via MBDE), then a composition of the processes is possible. If the differences between
the processes are of such a nature that they cannot be addressed by a transformation, then it is likely that composition of the processes is not possible. The transformation process, if one is both necessary and possible, will appear to connect the two states that exist post-process in the information-producing system and pre-process in the target system. In both of these respective cases, there are already likely to be post-processes or pre-processes internal to the system, in addition to this new state that branches out to accommodate the system-to-system information exchange.

3.4.3 Process Engineering for Interoperability

When systems are to be aligned for interoperability, and we are working from the premise of defining the systems in terms of not only states but also the processes that occur between the states, then it becomes necessary to consider the alignment of processes between systems, rather than just the state of data produced by these processes. This is the premise of both the pragmatic and dynamic levels of the LCIM. For pragmatic interoperability, not only is the information exchanged between systems of interest, but knowledge of the context that the information exists within for each system is also of interest. It is because of this requirement for context awareness that process representation begins to become as important as, or more important than, data representation (Fig. 3.2). In order to be aware of this context, the processes that presented the information must be identified, and also the relationship of those processes to the operational span of the system they belong to. This establishes the context of the information – the process that produced it, and the states and other processes that preceded that production process. In similar terms, the context of the target system should also be made available to the producing system. What process will first operate on the information at the target system once it receives it, and what states and processes preceded that receiving process? By having this knowledge, the system engineer can ensure pragmatic interoperability by addressing the needs of Data Alignment and Data Transformation, so that the data that is to be part of the information exchanged between systems includes any particulars that might be affected either by the context of the system of origin or by that of the target system.

3.4.4 Applying Process Engineering to the Example

Continuing with the example from the earlier portion of the chapter, we will now work through examples that illustrate the four steps of Model Based Process Engineering. As a background for the examples given, consider the following business (organizational) model. An automobile manufacturer relies on automated information received from both dedicated (dealership) repair facilities and automobile rental companies that handle the manufacturer's brand of product. By aligning not only the information being exchanged between the systems of these three organizations but also the processes involved, higher levels of interoperability can be achieved.

3.4.4.1 Example of Process Cataloging

The processes belonging to the systems to be made interoperable must be cataloged prior to the other steps being followed. In the case of the example, the process for the automobile manufacturer that sets production levels for parts manufacturing is one
such process, as the goal of this example is to show how this process can be made interoperable with other processes. Similarly, the processes of the automobile rental company dealing with reservations, and the processes of the automobile repair facility dealing with parts ordering, are required. Note that even for our simple example shown in Table 3.6, this list is not complete – it only serves as a guide.

Table 3.6. Example for a Process Catalog

  Process Name                   System of Origin
  Part Manufacturing Demand      Manufacturer Factory Control System
  Rental Fulfillment Service     Automobile Rental Reservation System
  Model Manufacturing Demand     Manufacturer Factory Control System
  Part Order Generation          Automobile Repair Inventory System
Table 3.7. Example for Process Identification

Process Identification for: Part Order Generation

  Timing: The process begins once the database indicates that the number of particular parts on hand has fallen below an "order now" level; the time to execute the process is based on how long it takes the system to generate output; there is no other time consideration.
  Initialization Requirements: There are four initialization requirements that must be observed for this process to begin. First, the parts inventory control process has initiated the check for parts ordering for a particular part number; second, the flag for "automated part ordering" must be set in the database (the flag may be turned off manually or by other processes for any number of operational reasons); third, the part order generation process must be aware of any "parts on hand" values that are below their associated "order now" values; fourth, the particular part whose inventory has fallen below the "order now" value cannot already have a "parts ordered" flag set.
  Effects of the Process: There are two effects of this process having been executed successfully. The first is that a data record is created for a part order, and this data record is both stored internally and sent to the parts manufacturer. The second effect is that the "parts ordered" flag for the particular part that is ordered is set.
  Halting Requirements: The halting requirements for the process are simple: once all of the initialization requirements are satisfied, determining that the process will execute, then if all of the effects of the process are successfully satisfied, the process will halt.
  Post-Process Conditions: Once the process has been successfully executed and halted, the post-process conditions are these: a data record of a part order has been generated, stored locally, and sent to the manufacturer, and the "parts ordered" flag is set.
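For illustration, the identification elements of Table 3.7 can be captured in a simple record. The following Python sketch is an illustrative paraphrase of the table and is not part of the original process description; the field values are condensed from Table 3.7 and the system of origin is taken from Table 3.6.

```python
from dataclasses import dataclass

@dataclass
class ProcessIdentification:
    """The five identification elements defined in Section 3.4.2.2, plus catalog information."""
    name: str
    system_of_origin: str
    timing: str
    initialization_requirements: list
    effects: list
    halting_requirements: str
    post_process_conditions: list

part_order_generation = ProcessIdentification(
    name="Part Order Generation",
    system_of_origin="Automobile Repair Inventory System",
    timing="Begins when parts on hand fall below the 'order now' level; "
           "duration is the time needed to generate output.",
    initialization_requirements=[
        "parts inventory control process has initiated the check for a part number",
        "'automated part ordering' flag is set in the database",
        "a 'parts on hand' value is below its 'order now' value",
        "the 'parts ordered' flag for that part is not already set",
    ],
    effects=[
        "a part order record is created, stored internally, and sent to the manufacturer",
        "the 'parts ordered' flag for the part is set",
    ],
    halting_requirements="Halts once all effects have been successfully produced.",
    post_process_conditions=[
        "part order record exists locally and at the manufacturer",
        "'parts ordered' flag is set",
    ],
)
```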
3.4.4.2 Example of Process Identification

For each of the processes catalogued, the basic details that define the process must be provided to gain a proper identification of that process's attributes. As an example, we take the process of Part Order Generation and provide the basic details that identify its attributes, as shown in Table 3.7. Note that other processes, not considered here, would take care of checking that the order record is received by the manufacturer, and that the "parts ordered" flag is reset once the inventory increases. This is just one example of one process.

3.4.4.3 Example of Process Alignment

The Process Engineering step of process alignment is described by two different examples, one based on a case where pragmatic interoperability is desired, and one based on a case where dynamic interoperability is desired. In the first case, we will look at what is involved in aligning two processes, namely the Rental Fulfillment Service process of an Automobile Rental Company and the Model Manufacturing Demand Service process of an Automobile Manufacturer. The business model, for the first case, that makes these two services interoperable is simple: the Model Manufacturing Demand Service automatically sets the desired production levels for certain automobile models at the manufacturing plant, based on input that indicates the perceived need for those models. One of the sources for that input is the outcome produced by the Rental Fulfillment Service process of an Automobile Rental Company. The assumption here is that the models that are requested more at the rental company are the models for which the public has a higher demand. For pragmatic interoperability (case one), the effects of the originating process – the Rental Fulfillment Service – are aligned so that the data elements produced as an effect of the process being run are then considered as data to satisfy the input requirements of the receiving process – the Model Manufacturing Demand Service. It is likely that a transformational process will have to be relied on. For our example, assume that the data that comes out of the Rental Fulfillment Service indicates the quantity of each model requested during a certain period. The transformational process would take that data and compare it to some assumption-based median value, to see if there is more or less than the anticipated demand. If more, the transformational process would send data representing a demand for increased manufacturing to the Model Manufacturing Demand Service, for the particular model in question. In the second case, dynamic interoperability, we can use a slight modification of the example already given. Assume that the internal workings of the process are of interest – in other words, the initialization state, the end state, the nature of data transformations, and details about the timing of the process – so that the receiving process can make better use of the information it receives. This information is in context, but it also shows the dynamic nature of that context to the receiving system, because it now has specific information about the dynamic context within the originating system. In our example this could be a demand for more information by the Model Manufacturing Demand Service from the Rental Fulfillment Service, so that a specific understanding of the models requested has a deeper meaning.
This could be specifics based on the timing of the data, the initialization state of the Rental Fulfillment Service (were only two different models available?), the data transformations (did one incoming
transaction produce a multitude of requests for one model for simplicity's sake, rather than customer opinion?), and so on. This gives a more dynamic picture of the originating system's context for the information being produced by one process for another, and allows for a deeper understanding of the meaning of that context.

3.4.4.4 Example of Process Transformation

Transformation between processes leverages what we have already seen from data engineering and applies it to the exchange of information between processes. In the preceding example (of the Rental Fulfillment Service producing information that can be used by the Model Manufacturing Demand Service), it was shown that a transformational process was relied on to exchange the information. The effects of the originating process produced (among other things, such as internal state changes and resource depletion) information that was useful to the receiving process, but the likelihood of that information having the correct resolution and level of aggregation, and being appropriately labeled, is very low, so a transformation process would have to adjust for those things. In our example case, some modification of the resolution of the data is required, to adjust the raw (high granularity) output of the originating process to the normalized (low granularity) input expected by the receiving process. Another likely candidate for transformation, in our example, would be aggregation based on different timing issues. If, for instance, the originating process runs for a 168 hour (1 week) period, yet the receiving process expects information representing 1 month of operations, then some time aggregation must be applied to the data. So far, we have seen how data engineering (and MBDE in particular) is key in aligning the data elements that have to be produced, and how these techniques can provide for the majority of interoperability needs for complex systems that wish to interoperate at levels up to the Semantic, and perhaps (in a few cases) the Pragmatic level of interoperability. Based on the information in Figure 3.2, the relationship between the demand for data engineering and that for process engineering is a shifting paradigm: as the higher levels of interoperability are desired, more reliance on process engineering becomes crucial. Even as process engineering becomes key, however, at levels such as pragmatic and dynamic interoperability, our definitions of the steps of process engineering, and the example cases given, illustrate how it relies on data engineering in partnership. The two are equally important, in shifting levels of responsibility, throughout the range of interoperability. In comparing process engineering to data engineering, it can be seen that by relying on some model based method for achieving the four steps identified as part of process engineering, a move to model based process engineering can be achieved. Examples of approaches that might allow for this are the Model Driven Architecture (specifically relying on the Unified Modeling Language, or UML) (Siegel 2005) and the Business Process Execution Language (Jordan and Evdemon 2007) as techniques with which to model the processes and apply the four steps.
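The following Python sketch illustrates such a transformation process under stated assumptions: the model names, baseline (median) demand values, and the week-to-month aggregation are invented for this example and are not taken from the chapter.

```python
from statistics import median

def aggregate_weeks_to_month(weekly_counts: list) -> int:
    """Time aggregation: sum roughly four weekly totals into one monthly figure."""
    return sum(weekly_counts)

def demand_signal(monthly_requests: dict, baseline: dict) -> dict:
    """Compare aggregated rental requests per model against an assumed baseline demand
    and emit an increase/decrease signal for the Model Manufacturing Demand Service."""
    signals = {}
    for model, requested in monthly_requests.items():
        expected = baseline.get(model, median(baseline.values()))
        signals[model] = "increase" if requested > expected else "decrease"
    return signals

# Hypothetical output of the Rental Fulfillment Service: requests per model, per week.
weekly = {"Focus": [52, 61, 48, 57], "Mustang": [12, 9, 14, 11]}
monthly = {model: aggregate_weeks_to_month(counts) for model, counts in weekly.items()}
baseline = {"Focus": 180, "Mustang": 60}   # assumed expected monthly demand
print(demand_signal(monthly, baseline))    # {'Focus': 'increase', 'Mustang': 'decrease'}
```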
3.5 Assumption-Constraint Engineering

Combining components into complex systems and into systems of systems requires processing information on several levels. At a minimum, to be integrated successfully
components must be described in terms of their inputs, controls, functions and outputs. Successful re-use of system components additionally depends on a capability to index, search for, retrieve, adapt and transform (if necessary) components, not only with respect to their data, but also with respect to the conceptual models that underlie them. A conceptual model always involves a particular viewpoint. This may be the viewpoint of the model developer, system integrator, federation member, verification team member, model results user, and so on. Aligning conceptual models means, ultimately, mediating between these viewpoints. The difficult part in providing conceptual alignment between models is handling assumptions and model constraints to resolve conflicts between domain views. The difficulty of doing so varies. Within a small, specialized community of practitioners, the sharing of a common viewpoint is easier than within a large, diverse group. Robinson (2007) defines a conceptual model as "a non-software specific description of the computer simulation model (that will be, is or has been developed), describing the objectives, inputs, outputs, content, assumptions and simplifications of the model." Robinson argues that the conceptual model establishes a common viewpoint that is essential to developing the overall model. Conceptual modeling captures aspects of interoperability that extend beyond data and process definition, specifically addressing understanding the problem, determining the modeling and project objectives, defining model content (scope and level of detail), and identifying assumptions and simplifications. King and colleagues (2007) make the case that the failure to capture and communicate the details of conceptual modeling decisions is at the root of model interoperation conflicts. Often, design decisions made during implementation become undocumented changes to a conceptual model. As a result, not all aspects of the conceptual model, its specified model, and the modeling artifacts of the implementation are captured. Among the reasons why this is so are:

• The cost in time and effort to document every modeling decision is prohibitive. Tools to capture design decisions suffer a variety of problems.
• The discipline of design decision capture is not widely taught, practiced or enforced in software development projects.
• In many domains of discourse, the required ontologies for expressing and reasoning about concepts lack maturity.
• Data transformations to facilitate model reuse often introduce undetected model changes.
• Application of iterative development paradigms to modeling and simulation can exacerbate these problems. Iterative development paradigms vary considerably in the number and type of iterations involved. Some paradigms, such as Incremental, Spiral and Evolutionary Development, depend on conceptual model refinement. Even so, there is no guarantee that the conceptual model is faithfully updated.

Thus, when it becomes time to integrate models, at the very least there will be some conflicts between them, owing to the failure to capture conceptual models fully. The effects can range from very benign (and unnoticed) to catastrophic. See Pace (2000) for discussions of the consequences of failures in conceptual modeling as it relates to system architecture. Our research has shown that conceptual interoperability implies the following characteristics in the system solution:
• Unambiguous Meaning – Objects, characteristics, processes and concepts require unambiguous meaning. That is, a basis in ontology is critical. An ontology captures knowledge about a domain of interest. It is the model (set of concepts) for the meaning of the terms used in that domain and thus defines the vocabulary and the meaning of that vocabulary within the domain. By encoding domain knowledge (i.e., its properties, values and concepts), an ontology makes it possible to share that knowledge and to reason about it. Note that unambiguous meaning does not mean that some sort of global "reality" needs to be established. What is needed is a means to describe unambiguously what we mean when we model an object or process. Because meaning comes as much from the viewpoint of the modeler as from the object itself, satisfying the requirement for unambiguous meaning requires a capability to mediate between different points of view.
• Use of a Supportive Framework – The framework borrows heavily from developments in service-oriented architectures (SOA), where business rules are invoked using web services. Selecting among business rules is very much like selecting among model components, but there is an important difference: current web service practice relies on textual metadata only, not on ontology. The characteristics of the framework include the use of middleware, mediator agents, or another type of "glue" to assemble components within the experimental frame. The glue should be well integrated with tools for managing ontologies (i.e., managing terms, concepts, relationships and process descriptions) in URI-compatible form for web distribution and access, using languages such as RDFS and OWL-DL. The framework should generate alerts as mismatches in conceptual models, model parameters, assumptions and constraints occur. The resolution of potential conflicts may involve human or agent-based decision-making and should be recorded in a URI-compatible form. Thus, the framework can recall and present the results of prior resolutions to assist in adjudication.
• Functional Composability of Parts – In conceptual linkage, the model parts being joined must internally be functional compositions. Functional composability was introduced by King to better explain certain limiting cases when linking models in complex systems (interaction, evolution of the system, infinity of states, transformations that result in information loss, and conceptual model misalignments) (King 2007). When making a functional composition, the outputs of one model become the inputs to another without ambiguity or unintended effect.
• Alignment of Modeler's Intent – Modeler's intent is the intention of the systems developer, stated explicitly or implicitly, that objects and processes be represented in a certain way. Alignment of modeler's intent extends and builds upon conceptual modeling; it is concerned with identifying and resolving differences between the conceptual models of system components. Alignment of intent requires addressing the constraints and assumptions that underlie the model, the system, and the environment.

The term conceptual linkage was suggested by King (2007) to refer to the application of these four requirements to achieve conceptual interoperability. Additionally, he argued that this list of requirements was not necessarily complete. In the context of this chapter, we will focus on the roles of assumptions and constraints and recommend a collection of best practices.
3.5.1 The Role of Assumptions

The problem of how assumptions are formulated and handled (or not handled) has received comparatively little research and must be addressed before conceptual interoperability can become a reality. Garlan and colleagues (1995) introduce architectural mismatch, which stems from mismatched assumptions a reusable part makes about the structure of the system it is to be part of. They note that these assumptions often conflict with the assumptions of other parts and are almost always implicit, making them extremely difficult to analyze before building a system. They show how an architectural view of the mismatch problem exposes several fundamental challenges for software composition and suggest possible research avenues needed to solve them. They identify four main categories of architectural mismatch:

• Assumptions about the nature of the components, including infrastructure, control model and data model assumptions about the way the environment will manipulate data managed by a component.
• Assumptions about the nature of the connectors, including protocols and data model assumptions about the kind of data that is communicated.
• Assumptions about the global architectural structure, including the topology of the system communications and the presence or absence of particular components and connectors.
• Assumptions about the construction process.

Assumptions should be a fundamental part of every problem solution. A first (and frequently overlooked) step in problem solving is for the analyst to identify the problem's assumptions. Many assumptions remain hidden and unrecognized until a deliberate effort is made to identify them. Often it is the unrecognized assumption that prevents a good solution. Assumptions are necessary for three reasons:

• Assumptions reflect desired values that should be maintained throughout the solution.
• Assumptions set limits to the problem and thus provide a framework within which to work. These limits might include constraints of possibility, economics, or some other desired narrowing.
• Assumptions simplify the problem and make it more manageable by providing fewer things to consider and solve. A problem with no assumptions is usually too general to handle.

3.5.1.1 Assumptions and Constraints

For our purposes, we will define:

• Assumptions are statements, taken for granted, about the components, connectors, architecture or construction process of a system solution.
• Properties are the characteristics of system elements.
• Constraints specify required properties. (If something is not required, then it is a consideration instead.)
• Considerations define desired properties. (Desire is relative: if something is desired so strongly that it is required, it is a constraint instead of a consideration.)
Assumptions can be stated explicitly or can be implicit. Assumptions can be direct statements such as a single proposition, a complex combination of propositions, or can even encompass an entire theory (e.g., classical mechanics). Additionally, assumptions can be derived from other assumptions through the application of logic. A constraint is a condition that a solution must satisfy: what a data item, a component, or a process must or must not be or do. The set of constraints on a system defines the set of feasible and valid solutions. Often, constraints are explicitly stated. However, implicit constraints exist and can be derived by considering how the assumptions affect or shape the solution.
To illustrate, consider the consequences when one part of the system uses an assumption that matches closely, but not precisely, with another. For example, suppose one part-ordering model uses the assumption “all Ford parts are made only by Ford” while another uses “Ford parts are made by Ford.” In logic, the second could validly conclude that Ford parts could also be made by manufacturers other than Ford. Now suppose that the system is trying to optimize part availability by minimizing transport time and that a third-party manufacturer can deliver a generator faster because it is closer. Here is where an assumption can constrain the system. The modeler picks the first model if, perhaps, a restriction to only genuine parts is appropriate. The point is that picking the first model constrains the system to genuine Ford parts. Alternatively, if time is money and replacement parts are satisfactory, the second model is chosen. Thus, the choice of model constrains the system because of the model’s assumptions.
It is rare for an organization to set out all the important assumptions it makes. More typically, those assumptions must be identified from documentation, interviews, and observation. Even when such assumptions are written down explicitly, unstated, implicit assumptions remain to be identified and are evident only upon reflection and study. A domain of discourse can have many explicit and implicit assumptions. A domain expert is someone who understands what the implicit assumptions of the domain are, and who can apply them to constrain a system solution. Comparing assumption lists between components, environments, and systems in this manner yields constraints. Hidden constraints are those derived from examination of the system’s implicit assumptions. Although hidden, they are nevertheless valid. History has shown that failure to uncover and resolve hidden constraints sometimes leads to catastrophic failures.
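The part-ordering example can be made concrete with a small sketch: the stricter assumption acts as a constraint that removes the faster third-party source from the feasible set. Supplier names and lead times below are invented for illustration.

```python
# Toy illustration of how a modeling assumption constrains the solution space.
# Supplier names, lead times, and the genuine_only flag are invented here.

suppliers = [
    {"name": "Ford",         "genuine": True,  "transport_hours": 48},
    {"name": "ThirdPartyCo", "genuine": False, "transport_hours": 12},
]

def candidate_suppliers(genuine_only: bool):
    """The assumption 'all Ford parts are made only by Ford' acts as a constraint:
    it removes the faster third-party source from the feasible set."""
    return [s for s in suppliers if s["genuine"] or not genuine_only]

fastest = lambda cands: min(cands, key=lambda s: s["transport_hours"])["name"]
print(fastest(candidate_suppliers(genuine_only=True)))   # Ford
print(fastest(candidate_suppliers(genuine_only=False)))  # ThirdPartyCo
```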
Fig. 3.3. Aligning assumption/constraint lists
Figure 3.3 is our recommended view of the process of aligning models; it highlights the roles played by assumptions and constraints. As shown, the components involved in the process are a model, a description of its inputs and outputs, a system that the model is being integrated into, and a description of the system input. To simplify the discussion we stipulate that the model output has been syntactically and semantically aligned with the system input, and that an ontology exists for both model and system. To determine compatibility, the process creates and compares lists of assumptions and constraints between model and system. This requires a formalism for representing them.

3.5.1.2 Formalism for Representing Assumptions and Constraints

It has been shown in (King and Turnitsa 2008) how assumptions are defined, characterized, and used – or misused – by modelers. It was also shown how assumptions can be used to identify potential conflicts between domain views. The following presents our recommended definition of an assumption or constraint in terms of its component parts (use function, referent, scope, and proposition).
Assumption/Constraint <=> (useFN referent scope Proposition)

where:
  useFN       describes how the assumption/constraint is used by the model or system (uses | does_not_use | requires | ignores | denies)
  referent    is the model or system component that the assumption/constraint is about
  scope       is a description of which parts of the system the assumption/constraint refers to (component, system, environment, component-system, etc.)
  Proposition is a statement about the referent’s existence, relations, or quality

Fig. 3.4. Assumption/Constraint composed of useFN, referent, scope, and proposition
In Figure 3.4, the components are defined as follows:
• Use function: The use function describes the role of the assumption/constraint in potentially modifying system behavior. Note that the latter three enumerated terms can establish an assumption’s relevance with respect to a system solution.
• Referent: The referent of an assumption/constraint is the entity to which it refers. A referent can be an object, a model, a process, a data entity, a system, or a property of one of these. When an assumption acts as a constraint, the referent is what is being limited by the assumption.
• Scope: Scope is a description of the portions of the overall system to which the assumption/constraint can be applied.
• Proposition: The proposition of an assumption/constraint is what it is saying – the statement that it is making. Propositions are not restricted to simple concepts—they may encompass the content expressed by theories, books, and even whole libraries.

3.5.2 A Process to Capture and Align Assumptions and Constraints

Using the formalism developed above, the assumptions and constraints that define and shape each system component can be captured and compared to determine their compatibility with each other. To ensure success, the four-step process should be performed in collaboration with both a domain expert and an ontology engineer.

3.5.2.1 Capturing Assumptions and Constraints

The first step is to capture the assumption and constraint propositions for the model, the system, and the environment. Each proposition begins as a natural language statement about the problem, one or more of its components, or its solution. The objective is to write down what the main concepts are, as this will form the basis of the ontology content. It is not necessary to document absolutely everything, just the things that are known to be within scope or that are important. Several factors will determine how much information is gathered in this step. In the initial stages of system development, few assumptions may have been made, and “place holders” may represent model components whose characteristics are nebulous at the start. In later stages of development, and in particular when capturing the assumptions of legacy models and systems, a wide variety of data may be available in the form of design documents—sifting through these to extract only important information may prove challenging.
The domain expert will need to pay attention to the difference between core and secondary concepts. Core concepts are those terms (usually nouns) that are central to the model or system—their absence would result in an incomplete description of the domain. Secondary concepts are those that are not central to the domain but that are required to complete the definition of core concepts. Obviously, core concepts should be documented thoroughly, whilst secondary concepts need only limited detail.

3.5.2.2 Encoding Propositions

The output of the first step is a list of propositions expressed as natural language statements. These must be encoded in a knowledge representation language (e.g., OWL-DL or KIF) before they can be used by a software agent such as a description logic reasoner. Each proposition will consist of its axioms and logical assertions that relate it to other concepts and propositions. Since we are building an ontology, this will require, as a minimum, use of an ontology editor or ontology-building tool (e.g., Protégé or SWOOP).
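Before such an encoding is attempted, it can help to capture each assumption or constraint as a plain data record following the (useFN referent scope Proposition) formalism of Fig. 3.4. The following is a minimal sketch; the class name, field names, and example values are ours, not part of the authors' process.

```python
# A minimal sketch of the (useFN referent scope Proposition) formalism of Fig. 3.4,
# captured as plain data records prior to encoding in a knowledge representation
# language such as OWL-DL. Field values are illustrative, not prescribed.

from dataclasses import dataclass

USE_FUNCTIONS = {"uses", "does_not_use", "requires", "ignores", "denies"}

@dataclass(frozen=True)
class AssumptionConstraint:
    use_fn: str       # how the model/system uses the statement
    referent: str     # the component the statement is about
    scope: str        # component, system, environment, component-system, ...
    proposition: str  # natural-language proposition, to be encoded later

    def __post_init__(self):
        if self.use_fn not in USE_FUNCTIONS:
            raise ValueError(f"unknown use function: {self.use_fn}")

# One part-ordering assumption, expressed as a record:
a1 = AssumptionConstraint("uses", "partOrder", "component",
                          "transport time is an element of availability")
print(a1)
```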
The second process step also consists of assigning an assumption function, referent, and scope to each proposition in both the model and the system lists. This establishes the relevance and use of each proposition. We recommend a search for existing ontologies to determine if core concepts have already been defined. If an ontology is found, special care should be taken to ensure that the semantic description in the existing ontology matches that desired of the core concepts. As long as there are no conflicting statements, the ontology engineer may wish to consider reusing the existing ontology. It is possible (indeed likely) that some propositions are found that encapsulate others. For example, the use of Newton’s second law encapsulates the concepts of force, mass, and acceleration (which depend on the concepts of position and velocity). At this point, the domain expert and ontology engineer have an important trade-off to consider, that of complexity vs. computability. The output of this step is a list of statements encoded in a knowledge representation language—our list of assumptions and constraints for the component.

3.5.2.3 Comparing Assumption/Constraint Lists

The third process step is to perform a comparison of the model and system lists. The task of comparing assumption lists requires a multi-level strategy to be effective. Consider the example of ordering parts for the body shop previously discussed—the user wants to know when the part will be available. Assumptions about transportation time when ordering are presented in Table 3.8 for several alternative models.

Table 3.8. Assumption function alternatives for modeling transportation time
useFN alternatives for model f | Interpretation
uses partOrder component (transport time ∈ availability) | The model of part ordering includes transportation time as a factor in availability
uses partOrder component (transport time ∉ availability) | Transportation time is not an element of availability in the part ordering model
does_not_use partOrder component (transport time *) | The part ordering model does not use transportation time (roughly equivalent to, but slightly different from, the alternative above)
Table 3.9 shows the multi-level strategy for comparing assumption statements about transportation time in model f with statements about the system g. An alert indicates an abnormal condition—raising one initiates another layer of detailed processing. Note that the relevance of transportation time as a factor is taken into account.

Table 3.9. Strategy for assumption list comparisons

useFN of model f | Example | Interpretation
uses = 1 | uses partOrder * (transport time) | The model of part ordering uses transportation time
does_not_use = 0 | does_not_use partOrder * (transport time) | The model of part ordering does not use transportation time
ignores = i | ignores partOrder * (transport time) | The model of part ordering doesn’t care about transportation time
requires = r | requires partOrder * (transport time) | The model of part ordering requires consideration for transportation time
denies = d | denies partOrder * (transport time) | The model of part ordering requires that the system not consider transportation time

Matching proposition in system g?

useFN of model f | g(uses) = 0 | g(uses) = 1
0 | don’t care | not found in f: raise alert
1 | not found in g: raise alert | aligned: no alert
i | don’t care | don’t care
r | not found in g: raise alert | aligned: no alert
d | not found in g: no alert | found in g: raise alert
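The alert logic in the lower part of Table 3.9 can be read as a small decision table. The sketch below implements it directly; apart from the codes taken from the table, the function and its signature are assumptions of this sketch, not part of the authors' tooling.

```python
# Sketch of the alert logic summarized in Table 3.9. The use-function codes
# (1/0/i/r/d) follow the table; everything else is illustrative.

def compare(f_use: str, g_uses_proposition: bool) -> str:
    """f_use is the use function of model f for a proposition
    ('1' uses, '0' does_not_use, 'i' ignores, 'r' requires, 'd' denies);
    g_uses_proposition states whether system g uses a matching proposition."""
    table = {
        ("0", False): "don't care",
        ("1", False): "not found in g: raise alert",
        ("0", True):  "not found in f: raise alert",
        ("1", True):  "aligned: no alert",
        ("i", False): "don't care",
        ("r", False): "not found in g: raise alert",
        ("d", False): "not found in g: no alert",
        ("i", True):  "don't care",
        ("r", True):  "aligned: no alert",
        ("d", True):  "found in g: raise alert",
    }
    return table[(f_use, g_uses_proposition)]

print(compare("r", False))  # not found in g: raise alert
print(compare("d", True))   # found in g: raise alert
```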
The raising of an alert during the comparison of assumption/constraint lists initiates a second level of processing to determine whether a match for a particular proposition exists between the model and the system. The method of analogical reasoning (Sowa and Majumdar 2003) can be used to produce a measure of the semantic distance between propositions. Analogical reasoning employs a highly efficient analogy engine that uses conceptual graphs as the knowledge representation to produce the measure. If the semantic distance is zero, the propositions match and no further processing is needed. Otherwise, there is either a partial match, a match in meta-level mappings, or possibly no match at all. Each of these cases requires adjudication.

3.5.2.4 Adjudication and Resolution of Conflicts

The kind of adjudication depends upon the assumption function associated with each proposition and whether the assumption is load-bearing (i.e., its negation would lead to significant changes in system operation) or not. The intention is that for load-bearing assumptions that are relied upon, or cared about, the adjudication is performed by a human or by an agent that supports a human. The results of the adjudication can be stored and recalled at future times to provide precedents to human operators. The results can also be used to train agent-based systems.
Our goal in this section was to address a fundamental issue in complex system interoperability and composability, namely that a gap exists between our ability to describe models and our ability to use them. Closing this gap requires a framework for joining models that rests on at least unambiguous definitions, composable parts, and supportive software, and that focuses on aligning modeler’s intent. In particular, alignment of modeler’s intent requires comparing assumptions and constraints, which in turn requires a conceptual model. We have presented a formalism for expressing assumptions and a strategy for comparing assumption lists that accounts for relevance. Often it is the unrecognized assumption that prevents a good solution. If the manner in which assumptions are represented and used can be standardized, then it becomes possible to formalize the reference models of entire frameworks of theory. Eventually, semi-autonomous detection of mismatches in conceptual models would become possible—at least with respect to those critical assumptions that a system relies upon.
3.6 Summary – The Elusiveness of Conceptual Interoperability

We have introduced new artifacts of knowledge-based environments that are needed to enable and facilitate interoperation and composition in complex systems. We started with model-based data engineering to unambiguously define the information exchange requirements and interfaces. At that stage, we treated the complex system as a black box, focusing exclusively on the data exchanged as derived from the operational needs. By adding process engineering, we changed the black box into a white box: we identified processes that connect the different states of the system with each other. Finally, we added the conceptual models as a bridge between the modeled selection of reality and the system implementation. Currently, partial solutions exist and work very well, but they do not solve the challenges of interoperation and composition on their own; they contribute to the solution. Only if they are conducted in an aligned and orchestrated form does overall success become possible.
The hierarchical nature of the LCIM facilitates the process of aligning models by organizing concepts into dependent layers. Within each layer, the interoperability concept addressed can be further broken down in terms of its definitions, subconcepts, processes, and requirements. In this manner, the necessary elements for achieving a particular LCIM level can be listed. Figure 3.5 is adapted from a recent evaluation by the authors of the state of the art regarding the contributions of selected interoperability protocols and knowledge representation languages towards satisfying the levels of the LCIM. The first two are simulation interoperability protocols, which are of particular interest, as Modeling and Simulation explicitly expresses the ideas of conceptual models and of using models as abstractions of reality as the basis for the simulation in its body-of-knowledge efforts. For each protocol (Distributed Interactive Simulation – DIS, High Level Architecture – HLA, Extensible Markup Language – XML, Resource Description Framework/RDF Schema – RDF/RDFS, Web Ontology Language – OWL, OWL for Services – OWL-S, Rules), the density of the square indicates the relative degree of support for the indicated level. As can be seen, the study reported a general lack of support for achieving the highest LCIM level: Conceptual Interoperability. Using the same evaluation criteria as applied in the study, we add evaluations of the potential contributions of model-based data engineering (MBDE), process engineering (PE), and conceptual linkage (CL). As shown, these artifacts gradually increase the supported level of interoperation. Only if all supporting methods and tools are aligned and orchestrated – i.e., a pairwise connection between the concepts and entities exists that allows mapping between the different levels, and the methods are executed in a harmonized manner – is the full spectrum of interoperation and composition covered. The engineering processes of data engineering, process engineering, and constraint/assumption engineering are one example, and they document best practices for making this approach work for complex systems in knowledge-based environments.
In the definitions and examples given until now, we have discussed what is required if engineering practices are to be applied to two existing systems in order to make them achieve some level of interoperability. The definitions and requirements described in this chapter, and the descriptions of the levels of interoperability from the LCIM, can serve another use as well: the definitional levels of the LCIM can serve two broad functions – (a) describing what level of interoperability exists within a composition of systems, and (b) prescribing the methods and requirements that must be satisfied during the engineering of a system of systems in order to achieve a desired level of interoperability. Both roles are summarized in Table 3.10.

Table 3.10. Descriptive and Prescriptive Roles of the LCIM
LCIM Level | Description of Interoperability at this Level | Prescription of Requirements to Reach this Level
Technical | Systems are exchanging data with each other. | The ability to produce and consume data in exchange with systems external to self is required to be technically interoperable.
Syntactic | Data exchange is taking place within an agreed-to protocol that all systems are able to produce and consume. | An agreed-to protocol that can be supported by the technical level solution is required to achieve this level of interoperability.
Semantic | Interoperating systems are exchanging a set of terms that they can semantically parse. | Agreement between all systems on a set of terms that grammatically satisfies the syntactic level solution requirements is required for this level.
Pragmatic | Interoperating systems are aware of the context and meaning of the information being exchanged. | A method for sharing the meaning of terms is required, as well as a method for anticipating context; both should be based on what exists at the semantic level.
Dynamic | Interoperating systems are able to reorient information production and consumption based on understood changes to meaning, due to changing context. | A method for defining meaning and context is required to achieve this level, together with the means of producing and consuming these definitions.
Conceptual | Interoperating systems at this level are completely aware of each other's information, processes, contexts, and modeling assumptions. | A shared understanding of the conceptual model of a system (exposing its information, processes, states, and operations) must be possible in order to operate at this level.
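To make the two roles summarized in Table 3.10 concrete, the following sketch treats the LCIM levels as an ordered checklist; the capability keywords are shorthand invented here and are not part of the LCIM itself.

```python
# A small sketch of the descriptive and prescriptive roles of the LCIM.
# The capability keywords are invented shorthand for the requirements column.

LCIM = ["Technical", "Syntactic", "Semantic", "Pragmatic", "Dynamic", "Conceptual"]

REQUIRES = {
    "Technical":  "data exchange",
    "Syntactic":  "agreed protocol",
    "Semantic":   "shared terms",
    "Pragmatic":  "shared meaning and context",
    "Dynamic":    "context-driven re-orientation",
    "Conceptual": "shared conceptual model",
}

def describe(capabilities: set) -> str:
    """Descriptive role: highest level whose requirements (and all below) are met."""
    achieved = "none"
    for level in LCIM:
        if REQUIRES[level] not in capabilities:
            break
        achieved = level
    return achieved

def prescribe(capabilities: set, target: str) -> list:
    """Prescriptive role: requirements still missing up to the target level."""
    needed = LCIM[: LCIM.index(target) + 1]
    return [REQUIRES[l] for l in needed if REQUIRES[l] not in capabilities]

caps = {"data exchange", "agreed protocol", "shared terms"}
print(describe(caps))                 # Semantic
print(prescribe(caps, "Conceptual"))  # remaining requirements for the target level
```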
The description of interoperating systems that share information at a certain level of interoperability is given in Table 3.10 (Tolk et al., 2007). The prescription of the methods required to enable system-to-system interoperation at a certain level is also described there.
• When applied in the role of providing a description of interoperability, the LCIM is used in cases where two or more systems are interoperating – at some level – and the richness of that interoperation is being evaluated. In this role, the levels of the LCIM are applied as a description of a capability that already exists.
• When applied in the role of providing a prescription for interoperability development methods, the LCIM is used in cases where an assessment must be made of the techniques to be applied in order to achieve a desired level of interoperability. In this role, the capability for the systems to interoperate does not yet exist – the systems themselves may not even exist. The LCIM is here applied prescriptively, giving developers a measure of the required effects that must be satisfied in order to become interoperable to a certain level.
As stated in the introduction to this chapter, the LCIM is already used in several communities and was successfully applied in support of interoperation and composition. Some of these examples are recorded in the special issue of the Journal for Intelligent Decision Technologies (Phillips-Wren and Jain, 2008). With widening acceptance and use, the LCIM has the potential to become a framework for an Interoperation and Composition Maturity Model enabling the description and prescription of solutions in a standardized manner.
The ultimate goal of the recommendations summarized in this chapter is to set up a framework enabling the mediation between different viewpoints. As stated before, a conceptual model allows making such viewpoints explicit. While the majority of current standardization efforts try to converge on a common view, the approach of this chapter recognizes that the real, physical world contains "more detail and complexity than any humanly conceivable model or theory can represent" (Sowa, 2000). As such, a multitude of viewpoints resulting in a multitude of models is a good thing, as they allow evaluating a multitude of aspects of an underlying problem. The challenge for engineers is to make sure that the resulting recommendations are based on aligned views, i.e., that the viewpoints do not lead to contradicting results and recommendations. Therefore, if the viewpoints can be mediated, the resulting system of systems represents a multitude of viewpoints that are consistent and can be used to express different aspects of the same problem to be solved.
Summarizing this chapter, it was shown that interoperation and composition require dealing with data (concepts of entities that are modeled and are related and interact with each other), processes (concepts of functions that use the entities or are used by the entities), and constraints (defining the valid solution space for data and processes) using engineering methods. Each recommendation that implements only a subset will fall short of providing a solution.
While the emphasis of research has so far been on data, newer approaches include processes as well. How to cope with assumptions and constraints is a topic of current research. As a community, we are just starting to realize that diversity and heterogeneity are an advantage in the domain of systems of systems. This chapter summarizes current best practices in an applicable way; in order to support a rigorous Interoperation and Composition Maturity Model, additional research and broader application in the community are needed. As such, this chapter describes only the first steps in this direction.
References

Bernstein, P.A., Melnik, S., Petropoulos, M., Quix, C.: Industrial-strength schema matching. SIGMOD Record 33(4) (Special Issue on Semantic Integration), 38–43 (2004)
Brown, G., Jeremy, J.: Website Indexing: enhancing access to information within websites, 2nd edn. Auslib Press, Blackwood (2004)
Buede, D.M.: The Engineering Design of Systems. John Wiley & Sons, Inc., Chichester (1999)
Davis, P.K., Anderson, R.H.: Improving the Composability of Department of Defense Models and Simulations. RAND Corporation (2003)
Dobrev, P., Kalaydjiey, O., Angelova, G.: From Conceptual Structures to Semantic Interoperability of Content. In: Conceptual Structures: Knowledge Architectures for Smart Applications, pp. 192–205. Springer, Heidelberg (2007)
Garlan, D., Allen, R., Ockerbloom, J.: Architectural Mismatch: Why Reuse Is So Hard. In: Proceedings 17th International Conference on Software Engineering, Seattle, Washington, pp. 179–185 (1995)
GridWise Architecture Council Interoperability Framework Team: Interoperability Context-Setting Framework, V1.0 (July 2007)
Hofmann, M.: Challenges of Model Interoperation in Military Simulations. SIMULATION 80, 659–667 (2004)
Jordan, D., Evdemon, J.: Web Services Business Process Execution Language (2007), http://docs.oasis-open.org/wsbpel/2.0/wsbpel-v2.0.pdf
King, R.: Towards Conceptual Linkage of Models and Simulations. In: Proceedings of the Fall Simulation Interoperability Workshop. IEEE CS Press, Los Alamitos (2007)
King, R., Turnitsa, C.: The Landscape of Assumptions. In: Spring Simulation Multi-Conference, Ottawa. IEEE CS Press, Los Alamitos (2008)
King, R., Diallo, S., Tolk, A.: How to Play Fairly: Agents and Web Services Can Help. In: Proceedings Spring Simulation Interoperability Workshop. IEEE CS Press, Los Alamitos (2007)
Morris, E., Levine, L., Meyers, C., Place, P., Plakosh, D.: System of Systems Interoperability (SOSI), Final Report. Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA (2004)
O’Kelly, P.: Data Modeling is Underrated: A Bright Future Ahead in the Grand Schema Things. Burton Group, Ontolog-Forum Presentation (November 30, 2006)
Pace, D.: Simulation Conceptual Model Development Issues and Implications for Reuse of Simulation Components. In: Fall Simulation Interoperability Workshop. IEEE CS Press, Los Alamitos (2006)
Page, E.H., Briggs, R., Tufarolo, J.A.: Toward a Family of Maturity Models for the Simulation Interconnection Problem. In: Spring Simulation Interoperability Workshop. IEEE CS Press, Los Alamitos (2004)
Parent, C., Spaccapietra, S.: Issues and approaches of database integration. Communications of the ACM 41(5), 166–178 (1998)
Parent, C., Spaccapietra, S.: Database Integration: The Key to Data Interoperability. In: Papazoglou, M., Spaccapietra, S., Tari, Z. (eds.) Advances in Object-Oriented Data Modeling, pp. 221–253. MIT Press, Cambridge (2000)
Phillips-Wren, G., Jain, L.: Decision Support in Agent Mediated Environments. IOS Press B.V., Amsterdam (2005)
Phillips-Wren, G., Jain, L.: Ontology Driven Interoperability for Agile Applications using Information Systems: Requirements and Applications for Agent Mediated Decision Support. Journal for Intelligent Decision Technologies 2(1) (Special Issue 2008)
Rahm, E., Do, H., Massmann, S.: Matching large XML schemas. SIGMOD Record, Special Issue on Semantic Integration (2004)
Robinson, S.: Issues in Conceptual Modeling for Simulation: Setting a Research Agenda. In: OR Society 3rd Simulation Workshop, Ashorne, UK (2006)
Robinson, S.: Conceptual modelling for simulation Part I: definition and requirements. Journal of the Operational Research Society 59(3), 278–290 (2007)
Seligman, L.A., Rosenthal, P., Lehner, P., Smith, A.: Data integration: Where does the time go? IEEE Data Engineering Bulletin (2002)
Siegel, J.: Introduction To OMG’s Unified Modeling Language (UML). Object Management Group (2005), http://www.omg.org/gettingstarted/what_is_uml.htm
Sowa, J., Majumdar, A.: Analogical Reasoning. In: de Moor, A., Lex, W., Ganter, B. (eds.) Conceptual Structures for Knowledge Creation and Communication, pp. 16–36. Springer, Berlin (2003)
Sowa, J.F.: Knowledge Representation: Logical, Philosophical and Computational Foundations. Brooks Cole Publishing Co., Monterey (2000)
Spaccapietra, S., Parent, C., Dupont, Y.: Model Independent Assertions for Integration of Heterogeneous Schemas. Very Large Database (VLDB) Journal 1(1), 81–126 (1992)
Su, H., Kuno, H., Rundensteiner, E.: Automating the transformation of XML documents. In: 3rd International Workshop on Web Information and Data Management, pp. 68–75. ACM Press, New York (2001)
Tolk, A.: Common Data Administration, Data Management, and Data Alignment as a Necessary Requirement for Coupling C4ISR Systems and M&S Systems. Information & Security: An International Journal 12(2), 164–174 (2003)
Tolk, A.: Moving towards a Lingua Franca for M&S and C3I – Developments concerning the C2IEDM. In: European Simulation Interoperability Workshop, Edinburgh, Scotland (June 2004)
Tolk, A., Diallo, S.: Model-Based Data Engineering for Web Services. IEEE Internet Computing 9(4), 65–71 (2005)
Tolk, A., Diallo, S.: Model-Based Data Engineering for Web Services. In: Nayak, R., et al. (eds.) Evolution of the Web in Artificial Intelligence Environments, pp. 137–161. Springer, Heidelberg (2008)
Tolk, A., Muguira, J.: The Levels of Conceptual Interoperability Model (LCIM). In: IEEE Fall Simulation Interoperability Workshop. IEEE CS Press, Los Alamitos (2003)
Tolk, A., Diallo, S., Turnitsa, C.: Ontology Driven Interoperability – M&S Applications. Old Dominion University, Whitepaper in support of the I/ITSEC Tutorial, VMASC Report 2548 (2006)
Tolk, A., Diallo, S., Turnitsa, C., Winters, L.S.: Composable M&S Web Services for Net-centric Applications. Journal of Defense Modeling and Simulation 3(1), 27–44 (2006)
Tolk, A., Turnitsa, C., Diallo, S.: Implied Ontological Representation within the Levels of Conceptual Interoperability Model. Journal of Systemics, Cybernetics and Informatics 5(5), 65–74, IIIS (2007)
Turnitsa, C.: Extending the Levels of Conceptual Interoperability Model. In: Proc. Summer Computer Simulation Conference, Philadelphia (2005)
Yilmaz, L.: On the Need for Contextualized Introspective Simulation Models to Improve Reuse and Composability of Defense Simulations. Journal of Defense Modeling and Simulation 1(3) (2004)
Yilmaz, L., Paspuletti, S.: Toward a Meta-Level Framework for Agent-supported Interoperation of Defense Simulations. Journal of Defense Modeling and Simulation 2(3), 161–175 (2005)
Yilmaz, L., Tolk, A.: Engineering ab initio dynamic interoperability and composability via agent-mediated introspective simulation. In: IEEE Winter Simulation Conference, pp. 1075–1182. IEEE CS Press, Los Alamitos (2006)
Zeigler, B., Hammonds, P.: Modeling & Simulation-Based Data Engineering: Introducing Pragmatics into Ontologies for Net-Centric Information Exchange. Academic Press, London (2007)
4 Ontology Driven Data Integration in Heterogeneous Networks

Isabel F. Cruz and Huiyong Xiao

ADVIS Lab, Department of Computer Science, University of Illinois at Chicago, USA
{ifc,hxiao}@cs.uic.edu

Abstract. We propose a layered framework for the integration of syntactically, schematically, and semantically heterogeneous networked data sources. Their heterogeneity stems from different models (e.g., relational, XML, or RDF), different schemas within the same model, and different terms associated with the same meaning. We use a semantics-based approach in which a global ontology mediates among the schemas of the data sources. In our framework, a query is expressed in terms of one of the data sources or of the global ontology and is then translated into subqueries on the other data sources using mappings based on a common vocabulary. Metadata representation, global conceptualization, declarative mediation, mapping support, and query processing are addressed in detail in our discussion of a case study.
4.1 Introduction

Data integration refers to the ability to manipulate data transparently across multiple data sources, such as networked data sources on the web. The concept of data integration is part of the broader concept of interoperability among systems, services, or programs [33]. However, because of their increasing number, the reference models and standards that have been developed to enable interoperability stand paradoxically in the way of their intended goals. To address this problem, common metamodels and mappings have been proposed [44]. Metamodels support abstraction and generalization that aid in identifying problems that can use the same solutions. Mappings provide bridges between those alternative solutions. A compelling parallel has been made between models and metamodels: “What metadata and mediation services enable for data, metamodels and mappings can provide for models.” [44]. The migration of solutions to new problems depends heavily on the ability to use metamodels and in particular on using a common layered metamodel to organize and map existing standards. An example of a layered model is the ISO/OSI model for networking, where layers support clearly defined functions and interfaces. It is in this structure that a solution can be devised.
This research was supported in part by the National Science Foundation under Awards ITR IIS-0326284, IIS-0513553, and IIS-0812258.
The Levels of Conceptual Interoperability Model (LCIM) has been devised to measure the level of interoperability between systems: Level 6 (of maximum interoperability) is labeled “Conceptual Interoperability.” The remaining levels (in decreasing order of the interoperability that they support) are respectively labeled “Dynamic Interoperability,” “Pragmatic Interoperability,” “Semantic Interoperability,” “Syntactic Interoperability,” and “Technical Interoperability.” Tolk et al. recognize the need for ontologies to define the meaning of data and of processes in a system and propose an ontology-based approach for the different layers of the LCIM [45].
In this paper, we follow an approach that is closely aligned with the objectives, if not the methods, of the above approaches. First, we focus on data interoperation and therefore on interoperability. Second, we distinguish levels of interoperability, with Semantic Interoperability being the highest level obtained thus far. Third, our structure for interoperability is not only layered but also follows closely the ISO/OSI model for networking. Fourth, we consider several types of data heterogeneity and deal with all of them within the same framework. Fifth, we deal with different data representation standards and would be able to accommodate emerging standards. Finally, we use an approach where ontologies play several roles. In particular, we use a metamodel (conceptually an ontology) to bridge across the ontologies that model the different interoperating data sources.
In the rest of the paper we focus on the particular problem of integrating data sources that can be heterogeneous in syntax, schema, or semantics, thus making data integration a difficult task [10]. More specifically, syntactic heterogeneity is caused by the use of different data models (e.g., relational, XML, or RDF). Within the same model, the structural differences between two schemas lead to schematic heterogeneity. Semantic heterogeneity results from different meanings or interpretations of data that may arise from various contexts. To demonstrate our approach we have adapted an example that considers legacy databases for aircraft maintenance scheduling [38]; it illustrates syntactic, schematic, and semantic heterogeneities and is shown in Figure 4.1.

Fig. 4.1. Five syntactically, schematically, and semantically heterogeneous legacy databases for aircraft maintenance

In this example, syntactic heterogeneity stems from data that is stored in relational and XML databases. The example also illustrates schematic heterogeneity in that the schemas of the relational databases differ substantially. For example, aircraft models are either table names, attribute names, or attribute values. Semantic heterogeneity is also apparent: “RDYF15S” (a table name in System D), “F15S” (an attribute name in System C), and “F15” (an attribute value in System B) relate to the same kind of airplane.
In this paper, we achieve integration among syntactically, schematically, and semantically heterogeneous networked data using an approach that is based on the layered model proposed by Melnik and Decker [31]. Their model, which is inspired by the ISO/OSI layered networking model, consists of four layers: the syntax, object, semantic, and application layers. This paper expands on our preliminary work [16]. We follow a hybrid approach, which is embodied in the semantic layer. In particular, we associate an ontology with each networked data source, called
henceforth the local ontology, and create a global ontology to mediate across the data sources. The global ontology relies on a common vocabulary shared by all data sources [47]. Considering the fundamental role of RDF as an ontology language, we express all the data models in our approach using RDF and RDF Schema. Queries are propagated across the semantic layers of the data sources and are expressed using an RDF query language. We use the RDF Query Language, RQL [25], but any other query language for RDF could have been used, including SPARQL. The queries are expressed using mappings established between pairs of ontologies, one called the source ontology and the other the target ontology.
The rest of the paper is organized as follows. In Section 4.2 we describe the different ways in which ontologies can be used and the different roles they play in data integration. Our layered approach is presented in Section 4.3; in that section, we describe the different layers and the overall architecture of a system that integrates data from the networked sources. To demonstrate our layered approach, in Section 4.4 we expand the legacy example we have just introduced in light of the different kinds of roles that ontologies play. Section 4.5 describes in detail query processing across different data sources and lists the kinds of assumptions we have made. We discuss related work in Section 4.6 and finally draw some conclusions in Section 4.7.
4.2 Ontologies in Data Integration

We call semantic data integration the process of using a conceptual representation of the data and of their relationships to eliminate heterogeneities. At the heart of this process is the concept of ontology, which is defined as an explicit specification of a shared conceptualization [21, 22]. Ontologies are developed by people or organizations to facilitate knowledge sharing and reuse [23]. For example, they can embody the semantics for particular domains, in which case they are called domain ontologies. Ontologies are semantically richer than traditional database schemas. In what follows we describe data integration architectures in terms of database schemas. Later we describe how ontologies can be used instead of database schemas.
We distinguish two data integration architectures: one that is central [2, 5, 12, 18, 32, 46] and the other one that is peer-to-peer [4, 8, 9, 19, 24, 34]. A central data integration system has a global schema, which provides the user with a uniform interface to access information stored in the data sources by means of queries posed in terms of the global schema [28]. In contrast, in a peer-to-peer data integration system, any peer (a data source) can accept user queries to access information in other peers. To enable data integration in a central data integration system, mappings need to be created between the global schema and the data source schemas. In a peer-to-peer data integration system mappings are established between peers. The two main approaches to building such mappings are Global-as-View (GaV) and Local-as-View (LaV) [28, 46]. In the GaV approach, every entity in the global schema is associated with a view over the data sources. Therefore querying strategies are simple because the mappings are explicitly defined. However, every time that there is a change to the data sources, it could change the views. In contrast, the LaV approach allows for changes to the data sources that do not affect the global schema, since the local schemas are defined as views over the global schema. However, query processing is more complex.
Now that we have introduced the main concepts behind data integration, we describe the different forms in which ontologies can intervene [47]:

Single ontology approach. All source schemas are directly related to a shared global ontology, which provides a uniform interface to the user. However, this approach requires that all sources have similar characteristics, for example the same level of granularity. An example of a system that uses this approach is SIMS [5].

Multiple ontology approach. Each data source is described by its own local ontology. Instead of using a global ontology, local ontologies are mapped to one another. For this purpose, an additional representation formalism is
necessary for defining the inter-ontology mappings. The OBSERVER system is an example of this approach [32].

Hybrid ontology approach. A combination of the two preceding approaches is used. First, a local ontology is built for each source schema, which is mapped to a global ontology. New sources can be easily added with no need for modifying existing mappings. Our layered framework is an example of this approach.

The single and hybrid approaches are appropriate for building central data integration systems, the former being more appropriate for GaV systems and the latter for LaV systems. A peer-to-peer system, where a global ontology exists in a “super-peer,” can also use the hybrid ontology approach [19]. However, the multiple ontology approach is better suited to “pure” peer-to-peer data integration systems, where there are no super-peers.
We identify the following five roles of ontologies in a data integration process [17]:

Metadata representation. Each source schema can be explicitly represented by a local ontology. All ontologies use the same representation language and are therefore syntactically homogeneous.

Global conceptualization. The global ontology provides a conceptual view over the schematically heterogeneous source schemas.

Support for high-level queries. The global ontology provides a high-level view of the sources. Therefore, a query can be formulated without specific knowledge of the different data sources. The query is then rewritten into queries over the sources, based on the semantic mappings between the global and local ontologies.

Declarative mediation. A hybrid peer-to-peer system uses the global ontology as a mediator for query rewriting across peers.

Mapping support. A common thesaurus or vocabulary, which can be formalized as an ontology, can be used to facilitate the automation of the mapping process.

In the following sections, we further elaborate on these five roles using our case study to exemplify them.
4.3 A Layered Data Interoperability Model

4.3.1 Overview

In this section, we present the layered approach for data interoperability proposed by Melnik and Decker [31], which we show in Figure 4.2. Of the four proposed layers, we concentrate on the semantic layer and identify three sublayers that are contained in it.

Fig. 4.2. The layered model

Application Layer. The application layer is used to express queries. For example, a visual user interface may be provided in this layer for users to submit
their queries. Ideally, query results are integrated and shown to the users, so as to give the appearance that the distributed databases interoperate seamlessly.

Semantic Layer. This layer consists of three cooperating sublayers that accomplish different tasks:
• Languages. The main purpose of this sublayer is to accept the user queries and to interpret them so that they can be understood by the other interoperating systems. The language can be highly specialized or be a general-purpose language.
• Domain Models. They include the ontologies for a particular application domain and can be different from one another even for the same domain [29]. Examples of domains include transportation, manufacturing, e-business, digital libraries, and aircraft maintenance.
• Conceptual Models. They model concepts, relationships, and constraints using constructs such as generalization, aggregation, or cardinality constraints. RDF Schema and UML Foundation/Core are two examples.

Object Layer. The purpose of the object layer is to give an object-oriented view of the data to the application. This layer enables manipulations of objects and binary relationships between them. Every object in a data schema is mapped to a particular class. The object layer also forwards all the information that it receives from the syntax layer to the semantic layer.

Syntax Layer. The main purpose of the syntax layer is to provide a way of specifying both the semantic and object layer information using a common representation. XML has been used to represent RDF and RDF Schema. This layer is also responsible for data serialization and for mapping the queries
from the object layer to the XML representation of the data schemas. The information that is extracted by the queries is returned to the object layer.

4.3.2 Architecture
Based on the layered approach discussed above, we now concentrate on the semantic layer. The semantic layer accepts user queries from the application layer and processes them. We use a hybrid ontology approach and RDF Schema both for the conceptual model of the data sources and for the domain model, as expressed by means of a global ontology. Given our choice of RDF and RDF Schema, a query language for RDF is appropriate. We chose RQL (RDF Query Language) [25] for the language sublayer. The application layer can either support RQL or an application-specific user interface. As for the implementation of the object layer and of the syntax layer, we use RSSDB (RDF Schema Specific Database), which is an RDF store that uses schema knowledge to automatically generate an Object-Relational (SQL3) representation of RDF metadata and to load resource descriptions [1].
Figure 4.3 shows the architecture that is used to build applications based on our layered approach. The whole process can be further divided into three subprocesses as follows:

Constructing local ontologies. A component called Schema Integrator is used to transform source schemas and source data automatically into a single conceptual schema, expressed using a local ontology. We consider relational, XML, and RDF databases.

Mapping. We use a global ontology (using RDF Schema) to serve as the mediator between the different local ontologies, each of which is mapped to the global ontology. We utilize a common vocabulary (which also uses RDF Schema) to facilitate this mapping process. This process is usually semi-automatic in that it may require input from users.

Query processing. When the user submits a query to the local ontology, the query will be executed directly over the local RSSDB. This process is called local query processing. Queries are transformed from one local ontology to all the other local ontologies by a query rewriting algorithm. This process, which is based on the mappings between the global ontology and the local ontologies, is called remote query processing; it can be performed automatically under some simplifying assumptions. The answers to both local query processing and remote query processing are assembled and returned to the user.
4.4 Case Study

In this section, we describe in detail the process for semantic data integration as illustrated by the case study already introduced in Section 4.1 and displayed in Figure 4.1. We also discuss the various ontology roles of Section 4.2 in the context of the case study.
Fig. 4.3. Architecture of the semantic layer
4.4.1 Construction of Local Ontologies
The construction of a local ontology includes two phases. In the first phase, we integrate all source schemas into a single local ontology, which is represented in RDF Schema and stored in RSSDB. In the second phase, we transform data from the original relational or XML databases into RDF files, which are then also stored in RSSDB. When integrating RDF databases, this step is not needed.
When the underlying schema is relational, we analyze the attributes of each table and the dependency relationships between every pair of tables through their foreign keys. In particular, if two tables are connected using a foreign key, we create a schema tree for each table. The root of each tree represents one table with type rdfs:Class and its children represent the attributes of the table with type rdf:Property. Therefore, the arcs connecting the root and its children are rdfs:domain arcs. Then we connect the two trees using an rdfs:range edge from the node corresponding to the foreign key to the root of the other tree. Hence, we obtain the local ontology that can be used by the system for data interoperation. An example is shown for System B, where STAFF_ID is a foreign key; the resulting local ontology is shown in Figure 4.4(a).
In the case of System D, where there is no foreign key between the two tables, we also create a schema tree for each table. We then build a new root with type rdfs:Class and two new nodes of type rdf:Property, one for each table, as children of this new root. Then we connect each of these nodes to one of the nodes representing the tables using an arc of type rdfs:range.
Fig. 4.4. Local ontology for legacy source schemas: (a) local ontology of System B; (b) local ontology of System E
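As a rough illustration of the construction just described, the sketch below builds the System B local ontology of Fig. 4.4(a) with the rdflib library; the URI base and the table layout are assumed for illustration, and the chapter's own tooling (Schema Integrator, RSSDB) is not modeled.

```python
# Sketch only: constructing a local ontology from a relational schema with rdflib.
# The URI base and table/attribute lists are assumed from Fig. 4.4(a).

from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/systemB#")
g = Graph()
g.bind("ex", EX)

tables = {
    "RDYACFT": ["MODEL", "AVAILTIME", "QTY", "AIRBASE", "STAFF_ID"],
    "STAFF":   ["TITLE", "TEAM_LEADER", "STAFF_NUM"],
}
foreign_keys = {("RDYACFT", "STAFF_ID"): "STAFF"}   # FK column -> referenced table

# One schema tree per table: the root is an rdfs:Class, each attribute an
# rdf:Property whose rdfs:domain is the root.
for table, attributes in tables.items():
    g.add((EX[table], RDF.type, RDFS.Class))
    for attr in attributes:
        g.add((EX[attr], RDF.type, RDF.Property))
        g.add((EX[attr], RDFS.domain, EX[table]))

# Connect the trees: the foreign-key property ranges over the referenced class.
for (table, fk_column), referenced in foreign_keys.items():
    g.add((EX[fk_column], RDFS.range, EX[referenced]))

print(g.serialize(format="turtle"))
```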
If a system uses XML files, we analyze the DTD file corresponding to each XML file and find all the element and attribute tags and their hierarchy, which we use to construct the local ontology for the system. Figure 4.4(b) shows the schema tree for System E. In this way, each of the participating systems has its local schema expressed as a single RDF Schema local ontology and its data stored using RSSDB. Figure 4.5 gives a fragment of the underlying RDF Schema representation of the local ontology of System E.
Fig. 4.5. RDF Schema for the local ontology of system E
Fig. 4.6. SQL query for data transformation in System B
Based on the local ontology, we consider two cases when transforming the data from a source schema into a local ontology.

Relational database. We use the left outer join operation to unite the data from multiple tables that are connected through foreign keys. The table containing the foreign key acts as the left part of the left outer join operation. Figure 4.6 gives an example of a SQL query that expresses this transformation in System B. In the case where there is no foreign key relationship, we use a Cartesian product to realize the data transformation.

XML data. The data model uses XML Schema or a DTD. In both situations we use XSLT expressions to transform data from an XML file into an RDF file.

After the data is saved into an RDF file, we can use RQL (or any of the APIs for RSSDB) to access the data.
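The SQL of Figure 4.6 is not reproduced in this text. The following sketch shows the kind of left outer join described above, with an assumed column layout for System B and with sqlite3 standing in for the actual source database; the sample values are illustrative only.

```python
# Sketch of the left-outer-join data transformation for System B.
# Column lists, keys, and sample rows are assumed for illustration.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RDYACFT (MODEL TEXT, AVAILTIME TEXT, QTY INTEGER,
                          AIRBASE TEXT, STAFF_ID INTEGER);
    CREATE TABLE STAFF   (STAFF_ID INTEGER, TITLE TEXT,
                          TEAM_LEADER TEXT, STAFF_NUM INTEGER);
    INSERT INTO RDYACFT VALUES ('F15', '0800', 12, 'CA, Anaheim', 1);
    INSERT INTO STAFF   VALUES (1, 'F15_team', 'Johnson', 6);
""")

# The table holding the foreign key (RDYACFT) is the left side of the join,
# so aircraft rows survive even if no matching staff row exists.
rows = conn.execute("""
    SELECT *
    FROM RDYACFT LEFT OUTER JOIN STAFF
         ON RDYACFT.STAFF_ID = STAFF.STAFF_ID
""").fetchall()
print(rows)
```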
4.4.2 Mapping Process
The global ontology is used for mediating among distributed schemas [29]. For this purpose, we set up the relationships between the global ontology and the local ontologies based on a common vocabulary. Figure 4.7 shows a fragment of the global ontology for the domain of aircraft maintenance and its partial RDF Schema representation; we focus on the class and properties that are contained in the box. We use rdfs:isDefinedBy, which is an RDF property, to make the connection between a class or a property and the associated vocabulary.

Fig. 4.7. The global ontology and its RDF Schema description

An important component of our approach is that all local ontologies share the common dictionary with the global ontology (see Figure 4.8). This dictionary stores the common vocabulary of all the concepts and the relationships between the global ontology and each local ontology. When presented with a new local ontology that needs to be mapped to the global ontology, the system checks every concept name against the dictionary to obtain the candidate matchings for that concept and chooses the optimal one according to a score function. The construction of a dictionary and the determination of an appropriate matching is an important area of research [37]. In this paper, we consider that the dictionary has a simplified structure, in that it only supports one-to-one total mappings.
The components that participate in the mapping process (namely the local ontologies, the common vocabulary, and the global ontology) are all represented using RDF Schema. The mapping between the global ontology and a local ontology is realized according to the common vocabulary. Figure 4.9 shows the local
ontologies of the five legacy systems of Figure 4.1, the relationships among the local ontologies, the global ontology, and the common vocabulary. Similarly to the global ontology, each local ontology has its properties mapped to the properties in the common vocabulary through rdfs:isDefinedBy. When the local ontology becomes related to the common vocabulary, the RDF representation of the local ontology also needs to be updated by inserting rdfs:isDefinedBy into any rdfs:Class or rdfs:Property being mapped. We use System E as an example (refer to Figure 4.4 for the previous RDF description of its local ontology). Figure 4.10 shows a fragment of the new RDF representation.
Fig. 4.8. The common vocabulary
Fig. 4.9. Mapping between the global ontology and the local ontologies
Fig. 4.10. RDF Schema for the mapping information of System E
4.4.3 Discussion
The case study illustrates the following three roles played by ontologies in data integration:

Metadata representation. To uniformly represent the metadata, which are source schemas in our case, a local ontology is constructed by means of a straightforward schema transformation process. Although a uniform syntax has been achieved, this process may seem too simplistic to encode rich and interpretable semantics. For example, names that do not correspond to words in a dictionary, such as ACTYPE in System A, may cause difficulties in the mapping process if not replaced by an English word or phrase, for example Aircraft Type. It remains an open issue how to generate an ontology that
4 Ontology Driven Data Integration in Heterogeneous Networks
87
represents the metadata of a data source in a conceptual but semantically lossless way [30, 43]. Global conceptualization. In our architecture, the global ontology, which is mapped to the local ontologies using an LAV approach, provides the user with a conceptual high-level view of all the source schemas. We recall that our architecture uses an hybrid ontology approach where we associate an ontology with each networked data source and create a single ontology, the global ontology, whose properties are mapped to the properties in the common vocabulary. Mapping support. The support provided by the common vocabulary for the mapping process corresponds to one of the roles played by ontologies, as the vocabulary can be represented using an ontology, and play the role of a meta-ontology. It describes the semantic relations between the vocabularies of the global ontology and of the local ontologies, thus serving as the basis for the mappings.
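As an illustration of the kind of schema transformation mentioned above, the sketch below turns a relational table description into RDF Schema classes and properties with rdflib; the table description, the namespace, and the naming scheme are assumptions made for the example, not the transformation rules of our system.

  from rdflib import Graph, Literal, Namespace, RDF, RDFS

  LOCAL = Namespace("http://example.org/systemB#")  # hypothetical namespace

  # Hypothetical relational schema fragment: table name -> column names.
  relational_schema = {
      "RDYACFT": ["MODEL", "AVAILTIME", "QTY", "AIRBASE"],
  }

  def schema_to_ontology(schema):
      # Straightforward transformation: each table becomes an rdfs:Class and
      # each column becomes an rdf:Property whose rdfs:domain is that class.
      g = Graph()
      for table, columns in schema.items():
          g.add((LOCAL[table], RDF.type, RDFS.Class))
          for column in columns:
              g.add((LOCAL[column], RDF.type, RDF.Property))
              g.add((LOCAL[column], RDFS.domain, LOCAL[table]))
              g.add((LOCAL[column], RDFS.label, Literal(column)))
      return g

  print(schema_to_ontology(relational_schema).serialize(format="xml"))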
4.5 Query Processing Across Data Sources

In our layered approach, as in any data integration system, query processing plays a critical role. Query processing is performed by rewriting a query posed on one data source into queries on all the other sources that are connected to it, using the established mappings.
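A minimal sketch of this dispatch step is given below, assuming that the rewriting itself is available as a function and that the mappings have already been established; every name in the sketch is a hypothetical placeholder.

  def process_query(query, source, connected_sources, mappings, execute, rewrite):
      # Pose the query on the local source, then rewrite it for every connected
      # source using the established mappings, execute the rewritten queries
      # there, and collect all the answers.
      answers = {source: execute(source, query)}
      for target in connected_sources:
          rewritten = rewrite(query, mappings[(source, target)])
          answers[target] = execute(target, rewritten)
      return answers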
4.5.1 RQL
RQL is a typed functional language that relies on a formal model for directed labeled graphs [25]. Figure 4.11 shows an example of an RQL query on the local ontology of System B. In the local query processing phase, the system executes the RQL query on the local ontology and on the resources that are stored in the local RSSDB; the answer to the query, shown in Figure 4.12, is expressed in RDF and returned to the user (a visual user interface can display the results in other formats). In the remote query processing phase, the query is rewritten on the schemas of the other networked data sources.
4.5.2 Query Rewriting Algorithm
Our algorithm makes the following simplifying assumptions about the schemas, mappings, and queries: (1) We assume that the local ontology is either a hierarchy or can be converted into a hierarchy without losing important schematic or semantic information (see Figure 4.4). (2) We assume that the mappings between the global ontology and the local ontologies are total mappings, that is, all the concepts (classes or properties) occurring in the query are mapped. (3) We consider only one-to-one mappings, that is, a concept in one schema maps to a single concept in the other schema.
  Select a1, a2, a3, a4, a5
  From  {a} MODEL {a1}, {b} AVAILTIME {a2}, {c} QTY {a3},
        {d} STAFF_ID {e}, {f} TITLE {a4}, {g} AIRBASE {a5}
  Where a=b and b=c and c=d and e=f and d=g and a1="F15"

Fig. 4.11. A typical RQL query on System B
Fig. 4.12. Query results of executing an RQL query on local System B
(4) To keep the current discussion simple, we assume that the variables in the query refer only to “pure” classes or properties, that is, no functions such as subClassOf, subPropertyOf, domain, and range are applied to them; we consider queries with the syntax {class1} property {class2}, as shown in Figure 4.11. (5) We consider only schema or ontology mappings, not value mappings (which are our focus elsewhere [15]).

The rewriting algorithm uses the established mapping information between any pair of local ontologies, with the global ontology acting as mediator. Before the algorithm starts, we initialize the source ontology and the target ontology as schema trees using the following rules.
Fig. 4.13. Local ontologies of System B and System E and their mappings
For every pair of concepts (classes or properties), say S and O, if S rdfs:domain O, or S rdfs:subClassOf O, or S rdfs:subPropertyOf O, we make S a child of O; if S rdfs:range O or S rdf:type O, we incorporate S and O into a single node. In addition, we establish mappings between the source ontology and the target ontology according to their respective mappings to the global ontology. Figure 4.13 shows the schema trees of System B and System E and their mappings (refer also to Figure 4.9). In System B, for instance, the properties MODEL, AVAILTIME, and QTY are made children of the class RDYACFT, and the property STAFF_ID and the class STAFF are incorporated into a single node.

In the following illustration of the query rewriting algorithm, Q is the source query and Q′ is the target (rewritten) query, and we use the RDF terms subject, object, and statement [25]. As an example, we consider the query Q on the local ontology of System B shown in Figure 4.11; Q′ is the target query on the local ontology of System E. The following steps are executed:

Step 1. Get all the statements in the from clause that correspond to the objects in the select clause. In our example, we obtain (MODEL, AVAILTIME, QTY, AIRBASE, TITLE). Each of these statements corresponds to a class or property in the source ontology.

Step 2. Use the mapping information between the source ontology and the target ontology to find all the concepts (classes or properties) that correspond to the statements found in Step 1 and make these concepts statements in Q′, as follows:

  select <object′1>, ..., <object′n>
  from {<subject′1>} statement′1 {<object′1>}, ..., {<subject′n>} statement′n {<object′n>}
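The following is a minimal Python sketch of the schema-tree initialization and of Steps 1 and 2, under several simplifying assumptions: the ontology is given as a list of (subject, predicate, object) triples, the query has already been parsed into select variables and from statements, and the mapping is a plain dictionary from source to target concepts. The data structures, the mapping pairs, and the helper names are illustrative, not those of our implementation.

  # Illustrative triples for a local ontology fragment (System B style).
  triples = [
      ("MODEL", "rdfs:domain", "RDYACFT"),
      ("AVAILTIME", "rdfs:domain", "RDYACFT"),
      ("QTY", "rdfs:domain", "RDYACFT"),
      ("STAFF_ID", "rdfs:range", "STAFF"),
  ]

  def build_schema_tree(triples):
      # rdfs:domain / rdfs:subClassOf / rdfs:subPropertyOf make S a child of O;
      # rdfs:range / rdf:type merge S and O into a single node.
      parent, merged = {}, {}
      for s, p, o in triples:
          if p in ("rdfs:domain", "rdfs:subClassOf", "rdfs:subPropertyOf"):
              parent[s] = o
          elif p in ("rdfs:range", "rdf:type"):
              merged[s] = o   # treat S and O as the same node
      return parent, merged

  # Hypothetical one-to-one total mapping obtained via the global ontology.
  mapping = {"MODEL": "NAME", "AVAILTIME": "RDYTIME", "QTY": "NUMBER",
             "AIRBASE": "AIRBASE", "TITLE": "MTSTAFF"}

  # Step 1: statements in the from clause whose objects appear in the select clause.
  select_vars = ["a1", "a2", "a3", "a4", "a5"]
  from_statements = [("a", "MODEL", "a1"), ("b", "AVAILTIME", "a2"),
                     ("c", "QTY", "a3"), ("d", "STAFF_ID", "e"),
                     ("f", "TITLE", "a4"), ("g", "AIRBASE", "a5")]
  step1 = [prop for subj, prop, obj in from_statements if obj in select_vars]
  # -> ["MODEL", "AVAILTIME", "QTY", "TITLE", "AIRBASE"]

  # Step 2: map each of these statements to the corresponding target concept.
  step2 = [mapping[prop] for prop in step1]
  # -> ["NAME", "RDYTIME", "NUMBER", "MTSTAFF", "AIRBASE"]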
Step 3. Consider each pair of nodes, Ei and Ej, corresponding to each pair of consecutive statements in Q′, in the following cases:
Fig. 4.14. Relationships between Ei and Ej
1) If Ei and Ej are siblings (have the same parent), meaning that this pair of statements shares a common subject, then we append the condition <subjecti> = <subjectj> to the where clause of Q′ using and. For example, in System E (see Figure 4.13), NAME and RDYTIME are siblings; therefore, s1 = s2 is to be appended to the where clause.

2) If Ei and Ej share a common ancestor E0 that is not their parent, as shown in Figure 4.14(a), then we append all the intermediate nodes (statementk) between E0 and Ei and between E0 and Ej to the from clause, in the form {<subjectk>} statementk {}. In addition, we append new conditions to the where clause in the following way: •