Proceedings of the First International Workshop on
Model-Driven Interoperability (MDI 2010)
In conjunction with MoDELS 2010, Oslo, Norway, October 3-5, 2010
http://mdi2010.lcc.uma.es/
ACM International Conference Proceedings Series, ACM Press
Editors: Jean Bézivin, INRIA & Ecole des Mines de Nantes, France; Richard Mark Soley, OMG, Needham, USA; Antonio Vallecillo, University of Málaga, Spain
ISBN: 978-1-4503-0292-0
The Association for Computing Machinery, 2 Penn Plaza, Suite 701, New York, New York 10121-0701. ACM COPYRIGHT NOTICE. Copyright © 2010 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept., ACM, Inc., fax +1 (212) 869-0481, or [email protected]. For other copying of articles that carry a code at the bottom of the first or last page, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, +1-978-750-8400, +1-978-750-4470 (fax).
Notice to Past Authors of ACM-Published Articles ACM intends to create a complete electronic archive of all articles and/or other material previously published by ACM. If you have written a work that was previously published by ACM in any journal or conference proceedings prior to 1978, or any SIG Newsletter at any time, and you do NOT want this work to appear in the ACM Digital Library, please inform
[email protected], stating the title of the work, the author(s), and where and when published.
ACM ISBN: 978-1-4503-0292-0
TABLE OF CONTENTS Editorial to the MDI 2010 Workshop Jean Bézivin, Richard M. Soley and Antonio Vallecillo…………................................. 1
Model Driven Interoperability in practice: preliminary evidences and issues from an industrial project Youness Lemrabet, David Clin, Michel Bigand, Jean-Pierre Bourey and Nordine Benkeltoum ……………………………………………………………..... 3
Semantic Interoperability of Clinical Data Idoia Berges, Jesus Bermudez, Alfredo Goñi and Arantza Illarramendi ..……………. 10
A Process Model Discovery Approach for Enabling Model Interoperability in Signal Engineering Wikan Danar Sunindyo, Thomas Moser, Dietmar Winkler and Stefan Biffl …………. 15
Efficient Analysis and Execution of Correct and Complete Model Transformations Based on Triple Graph Grammars Frank Hermann, Hartmut Ehrig, Ulrike Golas and Fernando Orejas............................. 22
Towards an Expressivity Benchmark for Mappings based on a Systematic Classification of Heterogeneities Manuel Wimmer, Gerti Kappel, Angelika Kusel, Werner Retschitzegger, Johannes Schoenboeck and Wieland Schwinger ……………………………………… 32
Specifying Overlaps of Heterogeneous Models for Global Consistency Checking Zinovy Diskin, Yingfei Xiong and Krzysztof Czarnecki ……………………………... 42
Anticipating Unanticipated Tool Interoperability using Role Models Mirko Seifert, Christian Wende and Uwe Assmann .…………………………………. 52
Aligning Business and IT Models in Service-Oriented Architectures using BPMN and SoaML Brian Elvesæter, Dima Panfilenko, Sven Jacobi and Christian Hahn…………………. 61
Domain-specific Templates for Refinement Transformations Lucia Kapova, Thomas Goldschmidt, Jens Happe and Ralf Reussner ………………. 69
Advanced Modelling Made Simple with the Gmodel Metalanguage Jorn Bettin and Tony Clark ..………………………………………………………….. 79
Model-driven Rule-based Mediation in XML Data Exchange Yongxin Liao, Dumitru Roman and Arne J. Berre …………………………………… 89
Behavioural Interoperability to Support Model-Driven Systems Integration Alek Radjenovic and Richard Paige ………………………………………………….. 98
List of Authors
Assmann, Uwe 52
Benkeltoum, Nordine 3
Berges, Idoia 10
Bermudez, Jesus 10
Berre, Arne J. 89
Bettin, Jorn 79
Biffl, Stefan 15
Bigand, Michel 3
Bourey, Jean-Pierre 3
Clark, Tony 79
Clin, David 3
Czarnecki, Krzysztof 42
Diskin, Zinovy 42
Ehrig, Hartmut 22
Elvesæter, Brian 61
Goñi, Alfredo 10
Golas, Ulrike 22
Goldschmidt, Thomas 69
Hahn, Christian 61
Happe, Jens 69
Hermann, Frank 22
Illarramendi, Arantza 10
Jacobi, Sven 61
Kapova, Lucia 69
Kappel, Gerti 32
Kusel, Angelika 32
Lemrabet, Youness 3
Liao, Yongxin 89
Moser, Thomas 15
Orejas, Fernando 22
Paige, Richard 98
Panfilenko, Dima 61
Radjenovic, Alek 98
Retschitzegger, Werner 32
Reussner, Ralf 69
Roman, Dumitru 89
Schoenboeck, Johannes 32
Schwinger, Wieland 32
Seifert, Mirko 52
Sunindyo, Wikan Danar 15
Wende, Christian 52
Wimmer, Manuel 32
Winkler, Dietmar 15
Xiong, Yingfei 42
Program Committee
Patrick Albert, IBM, France
Uwe Assmann, Technische Universität Dresden, Germany
Colin Atkinson, University of Mannheim, Germany
Jorn Bettin, Sofismo AG, Switzerland
Jean Pierre Bourey, Laboratoire de Génie Industriel de Lille, France
Tony Clark, Middlesex University, UK
Robert Clarisó, Universitat Oberta de Catalunya, Spain
Gregor Engels, University of Paderborn, Germany
Jean Marie Favre, University of Grenoble, France
Robert France, Colorado State University, USA
Dragan Gasevic, Athabasca University, Canada
Sébastien Gérard, CEA LIST, France
Martin Gogolla, University of Bremen, Germany
Jeff Gray, University of Alabama, USA
Esther Guerra, Carlos III University, Spain
Tihamer Levendovszky, Vanderbilt University, USA
Richard Paige, University of York, UK
Alfonso Pierantonio, University of L'Aquila, Italy
Bernhard Rumpe, Aachen University, Germany
Jim Steel, Queensland University of Technology, Australia
Hans Vangheluwe, University of Antwerp, Belgium
Andrew Watson, OMG, Needham, USA
Jon Whittle, Lancaster University, UK
Manuel Wimmer, Vienna University of Technology, Austria
Additional reviewers Fabian Buettner, Lars Hamann, Mirco Kuhlmann, Ivano Malavolta, Antonio Navarro Perez, Ingo Weisemoeller, Christian Wende, Claas Wilke.
Editorial to the Proceedings of the First International Workshop on Model-Driven Interoperability Jean Bézivin
Richard Mark Soley
Antonio Vallecillo
INRIA and Ecole des Mines de Nantes 4 rue Alfred Kastler - F-44307 Nantes Cedex 3 - France +33 251 858 704
Object Management Group, Inc. Building A, Suite 300, 140 Kendrick Street, Needham, MA 02494 +1 781 444 0404
Universidad de Málaga Bulevar Louis Pasteur 35 29071 Málaga, Spain +34 952 132794
[email protected]
[email protected]
[email protected]
ABSTRACT
This paper describes the scope, structure and contents of the First International Workshop on Model Driven Interoperability (MDI 2010), which was held on October 5, 2010, in conjunction with the MoDELS 2010 conference in Oslo, Norway.

Categories and Subject Descriptors
D.2.12 [Software Engineering]: Interoperability. I.6.5 [Simulation and Modeling]: Model Development – Modeling methodologies.

General Terms
Design, Standardization, Languages.

Keywords
Model-driven engineering, interoperability.

1. INTRODUCTION
Interoperability is the ability of separate entities, systems or artifacts (organizations, programs, tools, etc.) to work together. Although there has always been a need to achieve interoperability between heterogeneous systems and notations [1], the difficulties involved in overcoming their differences, the lack of consensus on common standards, and the shortage of proper mechanisms and tools have severely hampered this task. Model-Driven Engineering (MDE) is an emergent discipline that advocates the use of (software) models as primary artifacts of the software engineering process. In addition to the initial goals of capturing user requirements and architectural concerns, and of generating code from them, models are proving to be effective for many other engineering tasks. New model-driven engineering approaches, such as model-driven modernization, models-at-runtime, model-based testing, etc., are constantly emerging.

Model interoperability is much more complex than simply defining a local serialization format, e.g., XMI. This would just resolve the syntactic (or “plumbing”) issues between models and modeling tools. However, interoperability should also involve further aspects, including behavioral specifications of models (which in turn describe the behavioral aspects of the systems being modeled), and other “semantic” issues [2] such as agreements on names, context-sensitive information, agreements on concepts (ontologies), integration conflict analysis (including, for example, automatic data model matching), semantic reasoning, etc. Furthermore, interoperability means not only being able to exchange information and to use the information that has been exchanged [3], but also being able to exchange services and functions in order to operate effectively together. All these interoperability issues and needs become clear in any complex system, as has recently happened in the HL7 and DICOM healthcare projects, for instance.

Models and MDE techniques (especially metamodeling and model transformations) can play a fundamental role in fully accomplishing these tasks. Thus, models can become cornerstone elements for enabling and achieving interoperability between all kinds of systems and artifacts, including data sets (in the presence of different data schemata, possibly at different levels of abstraction), services (despite their differences in data representation, access protocols and underlying technological platforms), event systems (with different complex types and origins), languages (that use different notations and may have different semantics), tools (with different data formats and semantic representations), technological platforms (with different notations, tools and semantics), etc. It should also be emphasized that the success of MDE has created accidental complexity, for example by generating a number of overlapping metamodels (UML, SysML, BPML, etc.), and this situation reveals itself in a number of contexts as an additional metamodel interoperability problem.
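To make the “plumbing” level concrete, the fragment below is a minimal, illustrative XMI serialization of a one-class UML model (the model name, class name and identifiers are invented). Any two tools that read XMI can exchange such a file; what the class PurchaseOrder actually means to each tool is precisely the semantic question that the serialization format leaves open.

<xmi:XMI xmi:version="2.1"
         xmlns:xmi="http://schema.omg.org/spec/XMI/2.1"
         xmlns:uml="http://schema.omg.org/spec/UML/2.1">
  <!-- A trivial model: one package-level class, nothing more -->
  <uml:Model xmi:id="m1" name="Orders">
    <packagedElement xmi:type="uml:Class" xmi:id="c1" name="PurchaseOrder"/>
  </uml:Model>
</xmi:XMI>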
2. THE MDI 2010 WORKSHOP The goal of the MDI 2010 workshop was to discuss the potential role of models as key enablers for interoperability, and the challenges ahead. The workshop aimed to provide a venue where researchers and practitioners concerned with all aspects of model and system interoperability could meet, disseminate and exchange ideas and problems, identify some of the key issues related to model-driven interoperability, and explore possible solutions together.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MDI2010, October 5, 2010, Oslo, Norway. Copyright 2010 ACM 978-1-4503-0292-0/10/10…$10.00.
The MDI 2010 workshop was held on October 5, 2010, in conjunction with the MoDELS 2010 conference in Oslo, Norway.
The Workshop was a huge success. An excellent Program Committee was assembled to help with the review process, which included very well-known and respected experts in the topics of the workshop: Patrick Albert, Uwe Assmann, Colin Atkinson, Jorn Bettin, Jean Pierre Bourey, Tony Clark, Robert Clarisó, Gregor Engels, Jean Marie Favre, Robert France, Dragan Gasevic, Sébastien Gérard, Martin Gogolla, Jeff Gray, Esther Guerra, Tihamer Levendovszky, Richard Paige, Alfonso Pierantonio, Bernhard Rumpe, Jim Steel, Hans Vangheluwe, Andrew Watson, Jon Whittle and Manuel Wimmer.
In response to the call for papers, a total of 19 submissions were received. Each submitted paper was formally peer-reviewed by three referees, and 12 papers were finally accepted for presentation at the workshop and publication in the proceedings, which have been published in the ACM Digital Library.
Several external reviewers helped the PC members review the papers: Fabian Buettner, Lars Hamann, Mirco Kuhlmann, Ivano Malavolta, Antonio Navarro Perez, Ingo Weisemoeller, Christian Wende and Claas Wilke.
These papers contribute in different aspects to the area of model driven interoperability, from its foundations to the potential benefits it may bring to the emerging field of MDE.
The workshop was organized in four sessions. The first three were dedicated to the presentation of the selected papers. The last session was dedicated to discussions among the participants about the open issues and topics identified during the paper presentations.
3. WORKSHOP PAPERS The following 12 papers were presented in the workshop:
“Model Driven Interoperability in practice: preliminary evidences and issues from an industrial project” by Youness Lemrabet, David Clin, Michel Bigand, Jean-Pierre Bourey and Nordine Benkeltoum.
“Semantic Interoperability of Clinical Data” by Idoia Berges, Jesús Bermudez, Alfredo Goñi and Arantza Illarramendi.
“A Process Model Discovery Approach for Enabling Model Interoperability in Signal Engineering” by Wikan Danar Sunindyo, Thomas Moser, Dietmar Winkler and Stefan Biffl.
“Efficient Analysis and Execution of Correct and Complete Model Transformations Based on Triple Graph Grammars” by Frank Hermann, Hartmut Ehrig, Ulrike Golas and Fernando Orejas.
“Towards an Expressivity Benchmark for Mappings based on a Systematic Classification of Heterogeneities” by Manuel Wimmer, Gerti Kappel, Angelika Kusel, Werner Retschitzegger, Johannes Schoenboeck and Wieland Schwinger.
“Specifying Overlaps of Heterogeneous Models for Global Consistency Checking” by Zinovy Diskin, Yingfei Xiong and Krzysztof Czarnecki.
“Anticipating Unanticipated Tool Interoperability using Role Models” by Mirko Seifert, Christian Wende and Uwe Assmann.
“Behavioural Interoperability to Support Model-Driven Systems Integration” by Alek Radjenovic and Richard Paige.
“Aligning Business and IT Models in Service-Oriented Architectures using BPMN and SoaML” by Brian Elvesæter, Dima Panfilenko, Sven Jacobi and Christian Hahn.
“Domain-specific Templates for Refinement Transformations” by Lucia Kapova, Thomas Goldschmidt, Jens Happe and Ralf Reussner.
“Advanced Modelling Made Simple with the Gmodel Metalanguage” by Jorn Bettin and Tony Clark.
“Model-driven Rule-based Mediation in XML Data Exchange” by Yongxin Liao, Dumitru Roman and Arne J. Berre.

4. ACKNOWLEDGMENTS We would like to thank the MoDELS 2010 organization for giving us the opportunity to organize this workshop, especially the Workshop Chairs, Juergen Dingel and Arnor Solberg. Many thanks to all those who submitted papers, and particularly to the contributing authors. Our gratitude also goes to the paper reviewers and the members of the MDI 2010 Program Committee, for their timely and accurate reviews and for their help in choosing and improving the selected papers. Finally, we would like to acknowledge the research projects TIN2008-03107 and P07-TIC-03184, which helped support this workshop.
5. REFERENCES
[1] Wegner, P. Interoperability. ACM Comput. Surv. 28, 1 (March 1996), 285-287.
[2] Heiler, S. Semantic interoperability. ACM Comput. Surv. 27, 2 (June 1995), 271-273.
[3] Institute of Electrical and Electronics Engineers. IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries. 1990.
Model Driven Interoperability in practice: preliminary evidences and issues from an industrial project

Youness Lemrabet, David Clin, Michel Bigand, Jean-Pierre Bourey and Nordine Benkeltoum
Univ Lille Nord de France, F-59000 Lille, France
LM2O, Ecole Centrale de Lille, BP48, 59651 Villeneuve d'Ascq cedex, France
Tel: (+33) 6 71 15 33 55, (+33) 3 20 33 54 60, (+33) 3 20 67 60 25, (+33) 3 20 33 54 08
[email protected], [email protected], [email protected], [email protected], [email protected]
ABSTRACT
Problems of interoperability inside and outside organizations have recently been the subject of a considerable number of studies. Although the Model Driven Interoperability (MDI) and Service Oriented Architecture (SOA) approaches are widely accepted among scholars as means to improve interoperability, little is known about the ins and outs of combining these approaches in practice. This article is based on an industrial project called ASICOM, which aimed at building a platform that enables interoperability among industrial partners. It suggests some preliminary evidence and issues for both theory and practice.

Categories and Subject Descriptors
[Software Engineering]: Interoperability. [Simulation and Modeling]: Model Development – Modeling methodologies.

General Terms
Experimentation, Languages.

Keywords
Model Driven Interoperability (MDI), Business Process Management (BPM), Business Process Modeling Notation (BPMN), Service oriented architecture (SOA), ATHENA Interoperability Framework (AIF).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MDI2010, October 5, 2010, Oslo, Norway. Copyright 2010 ACM 978-1-4503-0292-0/10/10...$10.00

1. INTRODUCTION
Interoperability is defined as “the ability of two or more systems or components to exchange information and to use the information that has been exchanged” [1]. It is an important issue for Information Systems (IS) practitioners, given the growing need to integrate heterogeneous IS. Enterprises, and more widely organizations, face problems that stem from a lack of interoperability. Enterprises have to adapt their functions and processes taking into consideration internal and external constraints. Thanks to this strategy they are able to take advantage of new business opportunities and improve their competitiveness by delivering high-quality products and services while keeping production costs as low as possible [2].

Recent studies show that Model Driven Interoperability (MDI) and a Service Oriented Architecture (SOA) can be combined to support interoperability [3]. The main research question of this article is the following: how can the MDI and SOA approaches be combined in a collaborative context to improve interoperability and the strategic alignment of IS? The paper reflects on aspects of enterprise interoperability within the framework of the ASICOM project.

The ASICOM project aimed at providing Small and Medium Enterprises (SMEs) from the trade and logistics sectors with a pragmatic and generic approach that allows them to set up simplified, interoperable and adaptable solutions that improve communication with their partners through dematerialization. More precisely, the ASICOM project focuses on customer relations (firms from the retail industry and stockists) to make administrative procedures easier (e.g., the goods clearance procedure and the payment of customs duties). Furthermore, it will allow SMEs to manage their customers’ bonded warehouses in which dutiable goods are
stored and manipulated without payment of duty, and also to communicate with the French customs administration systems, such as the Delt@D and Delt@C systems, using dematerialized documents. In the ASICOM project the Service Oriented Architecture was chosen to guide and facilitate the alignment effort between business models and IT models. SOA provides the required flexibility to integrate new SMEs into the ASICOM project. Based on our own involvement in the ASICOM project, existing interoperability frameworks and modeling practices, we identified the elements that have to be taken into consideration in an enterprise interoperability project, especially for projects that use model-driven development and a service-oriented architecture as a key solution to tackle the interoperability problem. To facilitate interoperability and communication at both the modeling and the technical levels, we assume the use of existing modeling practices and standard notations such as Model-Driven Architecture (MDA)1, Business Motivation Model (BMM)2, Business Process Modeling Notation (BPMN)3, Service oriented architecture Modeling Language (SoaML)4, Business Process Execution Language (BPEL)5, Unified Modeling Language (UML)6, eXtensible Markup Language (XML)7, and Web Service Description Language (WSDL)8.
The remainder of this paper is divided into four parts. The second section deals with the state of the art on MDI, SOA and SoaML. The third section introduces advantages and challenges of model-driven systems and reflects on the combination of the MDI and SOA approaches to support interoperability through the ATHENA Interoperability Framework (AIF); it also describes evidence and issues from the ASICOM project. The paper closes with a conclusion and describes further research.
2. RELATED WORK
2.1 Overview of MDI
Model-Driven Development (MDD), and in particular OMG’s MDA, is emerging as a standard practice for developing model-driven applications and systems. Figure 1 presents the Reference Model for MDI.
Figure 1. Reference Model for MDI.
The Model Driven Interoperability (MDI) proposal [4] explains how a model-driven approach can be a useful way to solve interoperability problems. It attempts to introduce different abstraction levels to reduce the gap between enterprise models and the code level. The level definition is based on the three levels of MDA: CIM, PIM and PSM.
A considerable number of interoperability frameworks have evolved during the last 10 years [5]. Projects like ATHENA [6] provide interoperability frameworks that explain how MDD should be applied in software engineering practice to support business interoperability. The ATHENA Interoperability Framework (AIF) describes each system by enterprise models and different aspects. It focuses on the provided and required artifacts of each collaborating system inside or outside an enterprise. In the AIF, interoperations take place at different viewpoints: enterprise/business, process, service and information/data. At each viewpoint, model-driven interoperability is prescribed.
1 http://www.omg.org/cgi-bin/doc?omg/03-06-01.pdf
2 http://www.omg.org/cgi-bin/doc?formal/08-08-02.pdf
3 http://www.omg.org/spec/BPMN/2.0
4 http://www.omg.org/spec/SoaML/1.0/Beta2/
5 http://docs.oasis-open.org/wsbpel/2.0/OS/wsbpel-v2.0-OS.pdf
6 http://www.omg.org/spec/UML/2.2/
7 http://www.w3.org/TR/2008/REC-xml-20081126/
8 http://www.w3.org/TR/2008/REC-xml-20081126/
Figure 2. AIF conceptual framework - simplistic view.
Figure 2 is derived from the ATHENA Interoperability Framework. It gives a simplistic view of the reference model, indicating the required and provided artifacts of two collaborating enterprises. Each enterprise is described by enterprise models and different viewpoints (business, process, service, information) on
different abstraction levels. According to [7], interoperations are only meaningful when all aspects of an enterprise are addressed.
2.2 Overview of SOA
The expression service-oriented architecture (SOA) refers to a way of organizing and understanding organizations, communities and systems to maximize agility and scale. It is thus also seen as an architectural approach, guideline and pattern for realizing a system through a set of provided and required services.
SOA is technology independent, meaning that the choice of technologies and tools is secondary. Various technologies might be used to support an SOA implementation. According to recent research [8], “to achieve its potential, an SOA needs to be business-relevant, thus driven by the business and implemented to support the business”.
2.2.1 SOA infrastructure patterns
While SOA infrastructure is far from sufficient to make SOA work, it is a necessary component that underlies any architectural approach [9]. It is crucial to understand the merit of each infrastructure pattern before choosing a style of infrastructure (see Section 3.3.2). It is also important to note that a discussion of the targeted SOA infrastructure patterns does not map to a specific vendor or open source application infrastructure on a one-to-one basis; many products implement hybrid infrastructure patterns. According to [9] there are four SOA infrastructure patterns:

First, the service container infrastructure pattern: the service is implemented on a “container” that provides a runtime environment which coordinates service interactions by marshalling requests to and from the service. In this type of infrastructure a service can, for example, be implemented as a servlet in an application server platform.

Second, the hub-and-spoke infrastructure pattern, in which an integration middleware platform acts as the coordination point for all interactions between services; this coordination point interacts with services through adapters. This pattern is known as Enterprise Application Integration.

The third pattern is the centralized messaging infrastructure pattern, which leverages message-oriented middleware and messaging infrastructure to coordinate messages between services (managing the messages matters more than managing the specific runtime endpoint). Consequently, rather than connecting to service endpoints through adapters or a hub-and-spoke approach, one simply needs to instrument the endpoints to use a particular message bus or publish/subscribe infrastructure.

Fourth, the network intermediary infrastructure pattern, in which the challenge is to use a single standard for system interoperability at the seventh layer of the OSI9 network model. An intelligent network can be used as SOA infrastructure to intermediate the interactions between services. To perform this role, the seventh layer of the OSI network model must be more specific, intelligent, and enabled with respect to services [9].
2.2.2 Overview of SoaML
The OMG standard Service oriented architecture Modeling Language (SoaML) is aimed at taking advantage of SOA. SoaML provides a new way of designing and modelling SOA solutions using the Unified Modeling Language10 (UML). It is a set of extensions to UML that define SOA concepts and support service modeling and design [10]. The goal of SoaML is also to support the automatic generation of SOA-derived artifacts following an MDA approach.
SoaML offers several benefits, such as:
• allowing service interoperability at the model level;
• enabling a community or organization to work together using SOA services at a higher level of abstraction;
• addressing service interaction concerns at the architectural level by using architecture as the bridge between business requirements and automated IT solutions;
• leveraging and integrating with existing OMG standards.
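As a rough illustration of what “service interoperability at the model level” means in exchange terms, the sketch below shows how a SoaML stereotype application might appear in an XMI file: the stereotype element references the UML element it annotates. The SoaML profile namespace URI and all model names here are illustrative assumptions, not taken from the specification.

<xmi:XMI xmi:version="2.1"
         xmlns:xmi="http://schema.omg.org/spec/XMI/2.1"
         xmlns:uml="http://schema.omg.org/spec/UML/2.1"
         xmlns:SoaML="http://www.omg.org/spec/SoaML/20090501">
  <uml:Model xmi:id="m1" name="Services">
    <packagedElement xmi:type="uml:Interface" xmi:id="if1" name="GoodsClearance"/>
  </uml:Model>
  <!-- Stereotype application: marks the plain UML interface as a SoaML ServiceInterface -->
  <SoaML:ServiceInterface xmi:id="st1" base_Interface="if1"/>
</xmi:XMI>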
3. SOLUTION AND LESSONS LEARNED: SOA TO RATIONALISE MDI
Our work has been inspired by well-known existing frameworks. Several engineering methods and frameworks deal with the design, construction, implementation, governance and tooling of information systems. These methods belong to the following areas: (i) model-driven development (MDD) frameworks; (ii) enterprise architecture (EA) methodologies and frameworks; (iii) service-oriented development methodologies and frameworks. The best-known enterprise modelling frameworks and architectures are: the Zachman Framework [11], The Open Group Architecture Framework [12], the GERAM Framework from ISO IS 15704:2000 [13], the GIM architecture [14], the CIMOSA Framework [15] and the Praxeme methodology [16]. However, these frameworks do not give special focus to interoperability problems. Other projects (Shape11 and Bsopt12) aim to support the development of enterprise systems by developing a methodology backed by SOA concepts and a model-driven engineering tool set. The ATHENA project is based on a multidisciplinary approach that combines three research fields to support the development of enterprise interoperability [7]: (i) enterprise modeling, which defines interoperability requirements and supports solution implementation; (ii) architectures and platforms, which provide the technological base of the interoperability system; and (iii) ontology, which identifies interoperability semantics in the enterprise. We do not take ontology into consideration, since we consider it to be outside the scope of this study; we rather focus on the enterprise modeling and the architectures and platforms areas. The idea of interoperability is multi-faceted, so it is necessary to distinguish the interoperability concepts. Using the AIF and MDI approaches, we suggest using a grid to capture good
9 Open Systems Interconnection
10 http://www.omg.org/spec/UML/2.3/
11 http://www.shape-project.eu
12 http://www.bsopt.at/
practices at each level of MDI (CIM, PIM, PSM) for each aspect defined in the AIF (business, process, service and information).
Table 1. MDI approach in each aspect of AIF.

        Business    Process    Service    Data
CIM
PIM
PSM

Table 1 aims to give a holistic perspective on interoperability, allowing each partner to analyze and understand its business needs and technical requirements. This grid defines interoperability components as a set of sub-domains: the intersection of a level (row) and an aspect (column) constitutes a sub-domain. The 12 sub-domains of interoperability make it easier to define areas of expertise among partners. However, covering all sub-domains is not in itself a sign of excellence or maturity; a partner is fully interoperable in the sense that new business relationships can be established at low cost [17]. This section does not address the full scope of each AIF aspect, but rather gives an overview of the main issues of this project. We will give examples and describe each level of interoperability based on our experience in the ASICOM project. The following gives an overview of the central formalisms and concepts, as well as the methods, of each level of the matrix.
3.1 Business aspect
Interoperability at this level is seen as the organizational and operational ability of an enterprise to cooperate with external organizations in spite of different working practices, legislations, cultures and commercial approaches [18]. Cooperating partners must have a compatible vision and focus on the same elements [19]. Thus each partner must start by focusing on its business goals and project objectives using business modeling practices. BMM should be used to define clear goals and objectives for each partner. An industrial network is not a stable and permanent entity: the business objectives of each partner can change, and this evolution must be taken into account.
3.1.1 CIM level
At this level partners have to find the factors that motivate the establishment of business plans and business perspectives, through interviews involving relevant stakeholders and through workshops.
3.1.2 PIM level
The PIM specifies the elements of business plans and stresses the description of business goals, tactics and rules. It is necessary to define the interoperability approach and then to choose the project style that will be implemented. According to ISO 14258, there are three ways to establish interoperations between related systems (not detailed here) [20]: (i) the integrated approach, (ii) the unified approach and (iii) the federated approach. In the ASICOM project, none of the partners imposes their models, languages and methods of work: (i) the partners do not use a common format for all models (not integrated) and (ii) there is no common meta-model between partners (not unified). The chosen way to tackle the interoperability issue is therefore the federated approach.
3.1.3 PSM level
At this level each partner must first choose a style of architecture to implement (e.g. SOA), and then understand the various styles of projects to build. The Gartner Group identifies three styles of projects (not detailed here) [21] [22]:
• Execute a new and complete SOA approach: the primary objective of these projects is the design, creation and execution of new SOA artifacts.
• Composite applications and business process support: the primary objective of these projects is the assembly and deployment of composite applications and processes. The orchestration of services in support of an application process is important. The focus of these projects is on combining existing functionality rather than creating new business functionality.
• Application integration: the primary objective of these projects is the integration of the data and business logic of applications.
3.2 Process aspect
Business process models describe what has to be done in the business to achieve the business goals and vision [8]. The business analyst starts by distinguishing business processes from goals and models. The OMG specification BPMN can be used to capture the business processes that are shared between the stakeholders. BPMN is very expressive and provides a notation that is intuitive to business users. In this methodology, business processes are designed to cover many types of modeling and can be used at different levels of detail (CIM and PIM).
3.2.1 CIM level
BPMN choreography diagrams, which focus on the exchange of information between participants, can be used at this level to create the initial drafts of processes. Nevertheless, these models must be further refined and related to other kinds of BPMN models. The BPMN 2.0 specification states that implementations are not expected to directly support choreography modeling elements.
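As a hedged sketch of such a CIM-level model (participant and task names are invented; the namespace is the one defined by BPMN 2.0), a choreography between an SME and the customs administration could be serialized as follows. The choreography task records only who initiates the interaction and which message flows between the participants, not how either side implements it.

<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL"
             targetNamespace="http://asicom.example/choreography">
  <choreography id="clearanceChoreography">
    <participant id="p_sme" name="SME"/>
    <participant id="p_customs" name="Customs"/>
    <messageFlow id="mf_decl" sourceRef="p_sme" targetRef="p_customs"/>
    <!-- One interaction: the SME initiates by sending a declaration -->
    <choreographyTask id="ct_submit" name="Submit clearance declaration"
                      initiatingParticipantRef="p_sme">
      <participantRef>p_sme</participantRef>
      <participantRef>p_customs</participantRef>
      <messageFlowRef>mf_decl</messageFlowRef>
    </choreographyTask>
  </choreography>
</definitions>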
3.2.2 PIM level
The BPMN choreography processes can be refined using BPMN collaboration diagrams, which describe the collaboration between participants in detail. First, the business analyst has to identify two types of business processes: (i) public business processes, which are involved in the interaction with the partners, and (ii) private business processes, under the ownership and control of each participant. The analyst then has to identify the parts of the process to computerize. The BPMN models are then mapped to more technical models at the PSM level using the Business Process Execution Language (BPEL).
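A minimal sketch of the PIM-level refinement, under the same assumptions (invented names, standard BPMN 2.0 namespace): the collaboration pairs a public process for the SME, whose send task is now explicit, with an opaque partner pool, and the message flow now leaves a concrete task rather than the pool as a whole.

<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL"
             targetNamespace="http://asicom.example/collaboration">
  <collaboration id="clearanceCollaboration">
    <participant id="p_sme" name="SME" processRef="smeProcess"/>
    <!-- The partner's private process stays a black box -->
    <participant id="p_customs" name="Customs"/>
    <messageFlow id="mf_decl" sourceRef="t_send" targetRef="p_customs"/>
  </collaboration>
  <!-- Public view of the SME process: only the partner-visible steps -->
  <process id="smeProcess" name="Goods clearance (public)">
    <startEvent id="start"/>
    <sendTask id="t_send" name="Send clearance declaration"/>
    <endEvent id="end"/>
    <sequenceFlow id="f1" sourceRef="start" targetRef="t_send"/>
    <sequenceFlow id="f2" sourceRef="t_send" targetRef="end"/>
  </process>
</definitions>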
3.2.3 PSM level
The BPMN specification explicitly suggests BPEL for the execution of business processes. After the orchestration processes have been described in BPMN, they can be formalized and refined with implementation details using BPEL, which describes how the partners collaborate.
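At the PSM level the same orchestration can be expressed as executable BPEL. The sketch below uses the standard WS-BPEL 2.0 namespace, but the partner links, operation names and message types are illustrative assumptions (the partner link types would be defined in the accompanying WSDL).

<process name="ClearanceProcess"
         targetNamespace="http://asicom.example/bpel"
         xmlns="http://docs.oasis-open.org/wsbpel/2.0/process/executable"
         xmlns:tns="http://asicom.example/bpel">
  <partnerLinks>
    <partnerLink name="client" partnerLinkType="tns:ClientLT" myRole="clearanceProvider"/>
    <partnerLink name="customs" partnerLinkType="tns:CustomsLT" partnerRole="customsService"/>
  </partnerLinks>
  <variables>
    <variable name="declaration" messageType="tns:DeclarationMessage"/>
  </variables>
  <sequence>
    <!-- Instantiate the process when a declaration arrives from the client -->
    <receive partnerLink="client" operation="submitDeclaration"
             variable="declaration" createInstance="yes"/>
    <!-- Forward the declaration to the customs administration service -->
    <invoke partnerLink="customs" operation="forwardDeclaration"
            inputVariable="declaration"/>
  </sequence>
</process>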
3.3 Service aspect
The main concern of this aspect is to identify SOA services which can be used to enable business agility through business process reuse. This viewpoint bridges the gap between business requirements and a service-based solution. According to [23], a BPMN model does not contain all the information needed to implement an SOA. Consequently, service modeling can be supported by the SoaML formalism, which can be used to model services at the CIM level and then subsequently refine them towards a platform-specific implementation. SOA has been associated with a variety of approaches, such as Service-Oriented Analysis and Design (SOAD) [24], Service-Oriented Modeling and Architecture (SOMA) [25] and Praxeme [9], and with technologies such as the Enterprise Service Bus (ESB). These different approaches are intended to identify SOA services. The service-centric approach suggested by [5] argues that a goal-driven identification of services allows a better strategic alignment. In this approach BMM and SoaML are used to describe the realization of interoperability through business services. This approach proposes to map BMM to business services instead of business processes, to reduce the complexity introduced by inter-organizational business processes.
The implementation of each interoperability approach can be supported by one or many SOA infrastructure patterns (see Section 2.2.1).
3.3.1 CIM level
To work together, the participants must agree on a formalism to describe services at a high level of abstraction. SoaML can be used to model services at both the CIM and PIM levels. For more details, the MDSE methodology [26] and IBM [8] provide guidelines on how to use SoaML to define and specify a service-oriented architecture. SoaML concepts such as Capability, Participant, ServiceArchitecture and ServiceContract can be used at this level. These concepts give a top-level view and describe the communication between the different participants. They are used to express the business operations supported by the service-oriented architecture.
3.3.2 PIM level
Even if implementing SOA should not depend on an SOA platform strategy, enterprises have to define an SOA platform target and SOA infrastructure patterns (see Section 2.2.1). The partners do not have to choose a specific product at this level, but the discussion about the target SOA infrastructure patterns is very important. It is imperative for each partner to understand the advantages and drawbacks of each SOA infrastructure pattern. Thus, the choice of SOA patterns must be strongly motivated by the interoperability approach chosen at the PIM level of the Business aspect (see Section 3.1.2). In the ASICOM project, a Mediation Information System (MIS) was chosen to support the mediation interoperability approach. The MIS is in charge of (i) information exchange, (ii) service sharing and (iii) behavior orchestration [27]. At a minimum, the MIS must implement the centralized messaging infrastructure pattern.
In the ASICOM project a MIS seems to be a pertinent way of supporting interoperability for three reasons. Firstly, the members of the ASICOM project need to communicate through their own channels. Secondly, their systems are not adapted to exchanging information with each other. Thirdly, the collaborative processes have to be able to respond to changes in the customs regulations through a single shared business solution.
At this level SoaML models should be used to support IT concerns. The most used SoaML concepts are ServiceInterface and MessageType.
3.3.3 PSM level
Many products propose different infrastructure patterns or hybrid SOA infrastructure approaches. At this level, it is important to answer the following question: what type of SOA solution should be implemented? The architect must consider two points: the technologies targeted to implement the SOA services, and the application infrastructure13 to support the SOA solutions. Both points are discussed below. The architect must specify the implementation artifacts of the service-oriented architecture in the chosen technology, e.g. Web Services, Java Enterprise Edition (JEE), .NET or multi-agent systems (MAS). He or she then has to choose the application infrastructure adapted to implement the SOA solutions: the diversity and heterogeneity of application solutions, business processes and the business context of each partner must be considered [7]. In the ASICOM project, two target infrastructures were tested to support SOA solutions: the open source Petals ESB from the Petals SOA Suite14 and the Business Process Management solution BizAgi15. The BizAgi BPM suite is very intuitive but suffers from technological limitations (i.e. it does not support the execution of BPEL and it provides access to existing applications through Web services only). Consequently, we chose the standards-based integration platform Petals ESB as application infrastructure.
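For orientation, a service deployed on a JBI-based bus such as Petals is described to the container by a jbi.xml service-unit descriptor. The sketch below follows the JBI 1.0 (JSR 208) descriptor structure; the interface, service and endpoint names are invented for illustration.

<?xml version="1.0" encoding="UTF-8"?>
<jbi version="1.0" xmlns="http://java.sun.com/xml/ns/jbi">
  <!-- Declares the endpoint that this service unit provides on the bus -->
  <services binding-component="false"
            xmlns:svc="http://asicom.example/services">
    <provides interface-name="svc:GoodsClearanceInterface"
              service-name="svc:GoodsClearanceService"
              endpoint-name="GoodsClearanceEndpoint"/>
  </services>
</jbi>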
In the ASICOM project we chose Web Services and Java Enterprise Edition (JEE) technologies to build the SOA services. Thus WSDL and the XML Schema Definition Language16 (XSD) are used to support syntactical interoperation.
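A minimal sketch of such a syntactic contract (all names and namespaces are invented for illustration): the XSD part defines the message payload that the partners agree on, and the WSDL part exposes one abstract operation that carries it. Concrete binding and service elements (e.g. SOAP over HTTP) would complete the contract at deployment time.

<definitions name="ClearanceService"
             xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             xmlns:tns="http://asicom.example/clearance"
             targetNamespace="http://asicom.example/clearance">
  <types>
    <!-- XSD part: the structure of a declaration, shared by all partners -->
    <xsd:schema targetNamespace="http://asicom.example/clearance">
      <xsd:element name="DeclarationRequest">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="goodsId" type="xsd:string"/>
            <xsd:element name="warehouseId" type="xsd:string"/>
            <xsd:element name="quantity" type="xsd:int"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:schema>
  </types>
  <message name="DeclarationInput">
    <part name="body" element="tns:DeclarationRequest"/>
  </message>
  <!-- WSDL part: the abstract operation offered by the service -->
  <portType name="ClearancePortType">
    <operation name="submitDeclaration">
      <input message="tns:DeclarationInput"/>
    </operation>
  </portType>
</definitions>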
3.4 Data aspect
The data models have to be studied in parallel with the process and service models. A traditional task in service and process modeling is to create and manipulate information. As we said in Section 3.3, the data structure deficit is evident in BPMN: the concept of message flow is not supported by data models [25]. Data and information models are out of the scope of BPMN, and UML class diagrams can be used to describe messages.
In this section we present a very simple example from the data model of the ASICOM warehouse management module, and we show how it can be refined to generate a physical model which represents the relational database concepts.
13 Gartner has defined a new category of software called “Application Infrastructure”: “Application infrastructure includes the majority of runtime middleware, as well as application development and management tools that support the new generation of applications, based on service-oriented architecture (SOA), event-driven architecture (EDA) and business process management (BPM)” [21].
14 http://www.petalslink.com/en
15 http://www.bizagi.com/
16 http://www.w3.org/TR/xmlschema-2/
The warehouse management module provides functionality to manage multiple, structured stock locations.
3.4.1 CIM level
UML is widely used in object-oriented system modeling, for instance on J2EE and .NET platforms. A first version of the conceptual data model can be produced at this level (Figure 3). This model makes it possible to identify the different entities and how they relate to one another.

Figure 3. Data-model at the CIM level.
3.4.2 PIM level
The conceptual data model is refined at the PIM level: we add details to the logical model without worrying about how they will be implemented. For example, data types can be added to the diagram at this level (Figure 4).
Figure 4. Data-model at the PIM level.
3.4.3 PSM level
The Unified Modeling Language has become a standard object-oriented system modeling language and is supported by major corporations. Thus it can be used for object-relational database modeling. There are many techniques for transforming UML models to object-relational database systems, as discussed in [28]. Those techniques focus on transformations and are suited to the Model-Driven Development (MDD) approach.
At this level details are added to the PIM models (UML class diagrams) to adapt them to a specific platform (i.e. a relational database). Figure 5 shows a very simple example from the database model diagram of the ASICOM project. This diagram includes concepts such as tables, columns, views and foreign keys.
Figure 5. Physical model at the PSM level.
4. CONCLUSION
In this paper we have introduced a new practical vision of interoperability based on Model Driven Architecture and the ATHENA Interoperability Framework. Our research goal is not to propose yet another approach, but to combine existing ones to resolve the interoperability problem. This initial work shows that a model-driven approach and a service-oriented architecture enhance interoperability. However, a number of challenges must be overcome.

The ASICOM project is based on a process-centered approach which associates methodologies, information technologies and governance. It aims to allow people from different backgrounds to collaborate on an interoperability project. Our next goal is to refine and investigate further the relations between aspects. We believe that being able to model service orchestration with BPMN and BPEL, and service details with SoaML, in order to generate SOA artifacts, is an important step towards solving the interoperability problem. We will continue to work on service modeling and transformation, in particular using the Software and Systems Process Engineering Meta-Model (SPEM) to define the development process of an interoperability project using a service-oriented architecture.

5. ACKNOWLEDGMENTS
This work was partially funded by the ASICOM project. This project, started in April 2008, was approved by two French poles of competitiveness: PICOM in the Trade Industries domain and Nov@log in the Logistics domain.

6. REFERENCES
[1] IEEE. 1990. IEEE (Institute of Electrical and Electronics Engineers): Standard Computer Dictionary - A Compilation of IEEE Standard Computer Glossaries.
[2] Jean-Pierre Lorre, Yiannis Verginadis, Nikos Papageorgiou, and Nicolas Salatge. 2010. Ad-hoc Execution of Collaboration Patterns using Dynamic Orchestration. Enterprise Interoperability IV, 2010, Part I, 3-12, DOI: 10.1007/978-1-84996-257-5_1.
[3] ATHENA. Model-Driven Interoperability (MDI) Framework, http://www.modelbased.net/mdi/framework.html
[4] Jean-Pierre Bourey, Reyes Grangel, Guy Doumeingts, Arne J. Berre. Report on Model Driven Interoperability. Technical Report, INTEROP, 2007. http://interop-vlab.eu/ei_public_deliverables/interop-noe-deliverables
[5] Fenglin Han, Espen Moller, Arne J. Berre. 2009. Organizational interoperability supported through goal alignment with BMM and service collaboration with SoaML. Interoperability for Enterprise Software and Applications (268-274), IESA '09, International Conference, China (21-22 April 2009).
[6] ATHENA. Advanced Technologies for Interoperability of Heterogeneous Enterprise Networks and their Applications, FP6-2002-IST-1, Integrated Project (April 2003).
[7] Arne-Jørgen Berre, Brian Elvesæter, Nicolas Figay, Claudia Guglielmina, Svein G. Johnsen, Dag Karlsen, Thomas Knothe and Sonia Lippe. 2007. The ATHENA Interoperability Framework. Enterprise Interoperability II, 2007, Part VI, 569-580, DOI: 10.1007/978-1-84628-858-6_62.
[8] Jim Amsden. Modeling with SoaML, the Service-Oriented Architecture Modeling Language (January 2010). http://www.ibm.com/developerworks/rational/library/09/modelingwithsoaml-1/index.html
[9] Ronald Schmelzer. 2007. SOA Infrastructure Patterns and the Intermediary Approach (July 2007). http://www.zapthink.com/2007/07/04/soa-infrastructure-patterns-and-the-intermediary-approach/
[10] Michael Stollberg. 2009. Integrated and tool-supported Methodology. Deliverable D2.2 - Initial Version - Work Package 2, SHAPE Project No 216408 (January 2009).
[11] Zachman. A Framework for Information Systems Architecture. IBM Systems Journal, vol. 31, no. 3, pp. 445-470, 1999.
[12] The Open Group Architecture Framework. 2009. TOGAF version 9, http://www.opengroup.org/
[13] IFIP-IFAC Task Force. 1999. GERAM: Generalized Enterprise Reference Architecture and Methodology, Version 1.6.2, Annex to ISO WD15704, IFIP-IFAC.
[14] Doumeingts G., Vallespir B., Zanettin M., Chen D. 1992. GIM: GRAI Integrated Methodology. A methodology for designing CIM systems. GRAI/LAP, Université Bordeaux 1, version 1.0.
[15] AMICE. 1993. CIMOSA: Open System Architecture for CIM. 2nd extended revised version. Springer-Verlag, Berlin.
[16] Praxeme Institute, Version 2.0 (June 2006), http://www.praxeme
[17] Roland Jochem. 2010. Enterprise Interoperability assessment. 8th International Conference of Modeling and Simulation, MOSIM'10, Hammamet, Tunisia (December 2010).
[18] ATHENA. 2005. D.A1.3.1: Report on Methodology description and guidelines definition, Version 1.0, ATHENA Integrated Project, Deliverable D.A1.3.1 (March 2005).
[19] Arne-Jørgen Berre, Brian Elvesæter. 2008. Model-based System Development, Part IV: MDI - Model Driven Interoperability. Notes for course material “Model Based System Development”, INF5120 (2008).
[20] Chen, D., Dassisti, M. and Tsalgatidou, A. 2005. Interoperability Knowledge Corpus, An Intermediate Report, Deliverable DI.1, Workpackage DI (Domain of Interoperability), INTEROP NoE (November 2005).
[21] Hayward, Simon and Natis, Yefim V. 2006. 'Application Infrastructure' Reflects New Dynamics in the Software Market. Gartner (December 2006).
[22] Johan den Haan. 2008. Architecture requirements for Service-Oriented Business Applications (May 2008). http://www.theenterprisearchitect.eu/archive/2008/05/19/architecture-requirements-for-service-oriented-business-applications
[23] Jihed Touzi, Frédérick Bénaben, Hervé Pingaud, Jean-Pierre Lorré. 2009. A model-driven approach for collaborative service-oriented architecture design. International Journal of Production Economics, Volume 121, Issue 1, Pages 5-20, Modelling and Control of Productive Systems: Concepts and Applications, Elsevier (September 2009).
[24] O. Zimmermann, P. Krogdahl, and C. Gee. Elements of Service-Oriented Analysis and Design: An interdisciplinary modeling approach for SOA projects. IBM, 2 June 2004. http://www128.ibm.com/developerworks/webservices/library/ws-soad1/
[25] Arsanjani A. Service-oriented modeling and architecture: how to identify, specify, and realize services for your SOA. IBM whitepaper, 2004.
[26] Brian Elvesæter, Cyril Carrez, Parastoo Mohagheghi, Arne-Jørgen Berre, Svein G. Model-Based Development with SoaML. 2010. http://www.uio.no/studier/emner/matnat/ifi/INF5120/v10/undervisningsmateriale/MDSE-SoaML-INF5120.pdf
[27] Frédérick Bénaben, Jihed Touzi, Vatcharaphum Rajsiri, Sebastien Truptil, Jean-Pierre Lorré, and Hervé Pingaud. 2008. Mediation Information System Design in a Collaborative SOA Context through a MDD Approach (June 2008).
[28] E.S. Grant, R. Chennamaneni, and H. Reza. Towards analyzing UML class diagram models to object-relational database systems transformations. Proceedings of the 24th IASTED International Conference on Database and Applications, Innsbruck, Austria: ACTA Press, 2006, pp. 129-134.
Semantic Interoperability of Clinical Data

Idoia Berges, Jesus Bermudez, Alfredo Goñi and Arantza Illarramendi
University of the Basque Country, P. Manuel de Lardizabal 1, Donostia-San Sebastian, Spain
[email protected], [email protected], [email protected], [email protected]

ABSTRACT
The use of Electronic Health Records (EHRs) has brought multiple benefits to the healthcare domain. However, those advantages would be greater if seamless interoperability of EHRs between heterogeneous Health Information Systems were achieved. Nowadays, achieving that kind of interoperability is on the agenda of many national and regional initiatives, and in the majority of cases the problem is addressed through the use of different standards. In this paper we present a proposal that goes one step further and tackles the interoperability problem from a formal, ontology-driven perspective. Our proposal thus allows one system to interpret on the fly clinical data sent by another one, even when they use different representations. We present in the paper the three key components of the proposal: 1. An ontology that provides, in its upper level, a canonical representation of EHR statements, more precisely of medical observations, which can then be specialized, in the lower level, by health institutions according to their proprietary models. 2. A translator module that facilitates the definition of the lower level of the ontology from the particular EHR data storage structures following a semi-automatic approach: first a translation process of the underlying data structures, using, whenever possible, information about properties (functional dependencies, etc.), into ontology elements described in OWL2, and next an edition process where the health system administrators can define new axioms to adjust and enrich the result obtained in the semi-automatic process. Finally we show the third component, a mapping module that helps in the task of defining the links among the terms of the upper and lower levels of the ontology. It obtains a declarative mapping specified in OWL2 and puts a wide range of mapping scenarios within reach of health systems’ administrators.
Categories and Subject Descriptors D.2.12 [Software Engineering]: Interoperability
General Terms Design
1. INTRODUCTION
There is no doubt that Information Technologies are playing a relevant role in the research and improvement of the healthcare domain. In the case of Electronic Health Records, several advantages can be mentioned: first, legibility problems due to poor handwriting, which might lead to misunderstandings, are avoided. Moreover, EHRs provide great clinical decision support by translating practice guidelines into automated reminders and actionable recommendations [10], which can lead to safer, less error-prone, less expensive and higher-quality care. Finally, another advantage is the possibility of exchanging EHRs among different organizations. A patient is likely to receive medical attention from several institutions over his lifetime, so it seems reasonable for each institution to have unrestricted access at any time to the previously recorded patient data. The authors in [4] have identified certain problems that can be avoided thanks to an effective exchange of EHRs: communicating vital information like adverse drug reaction histories can prevent deaths and other serious consequences. Moreover, providing clinicians with easy access to patients’ previous test results eliminates unnecessary duplication of tests. Finally, monitoring chronically ill patients, which usually involves great costs and collaboration between many professionals at distinct points of care, becomes easier.

As beneficial as EHR interoperability may seem, nowadays it is still an unreached goal1, mainly because Health Information Systems
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MDI2010 October 5, 2010, Oslo, Norway Copyright 2010 ACM 978-1-4503-0292-0/10/10 ...$10.00.
1 Epsos project in the European community [6]
used within the medical institutions have been developed independently, which results in a high number of heterogeneous proprietary models for representing and recording EHR information.

One of the most recurrent approaches to solving interoperability issues is the use of standards. In the case of EHR interoperability, several standards are under development for this purpose, such as openEHR [16], CEN-13606 [5] and HL7-CDA [9]. The openEHR standard follows a dual-model approach for representing EHRs: the Reference Information Model (RIM) contains basic and generic structures for representing EHR information. Terms such as list, table or entry are described at this level. It is a stable model which is not expected to change over time. However, since the RIM is composed of a small number of classes, and they are too general to describe the semantics that clinical terms require, another model is necessary: the archetype model. The archetype model describes knowledge elements, such as Heart Rate or Barthel Index, that are created by using and restricting components of the RIM. The CEN standard also follows the aforementioned dual-model approach and provides for now a quite simple RIM and a few archetypes based on those of openEHR. Finally, HL7-CDA has been developed by HL7 and also follows the layered model approach. More precisely, it provides a RIM and a draft template specification, where templates represent the same idea as the openEHR archetypes.

Although the idea of using a standard may seem suitable for the desired goal, the interoperability problem remains unsolved unless these standards merge into a single one. Moreover, in [11] three different levels of interoperability that can be considered for EHRs are described: level 1 refers to syntactical interoperability, level 2 to partial semantic interoperability and level 3 to full semantic interoperability. The authors also state that research effort should now be oriented towards the development of mechanisms for achieving full semantic interoperability, in which case neither language nor technological differences will prevent Health Information Systems from seamlessly integrating the received EHRs into the local model. In general, semantic interoperability is defined as the ability of one computer system to receive some information and interpret it in the same sense as intended by the sender system, without prior agreement on the nature of the exchanged data.

In this paper we present a proposal to move towards the notion of full semantic interoperability of EHRs of medical observations, based on semantic web technologies, and more precisely on OWL2 [17] ontologies and corresponding reasoners. These technologies facilitate semantic interoperation between heterogeneous information systems ([15]; [2]), as opposed to other formats for interchanging data, such as XML, which do not deal with the semantics of the exchanged data [7]. Two general approaches for interoperability among systems are described in [12]: using a canonical model to which the particular systems are linked, or aligning the particular models two by two. The proposal presented in this paper rests on the former approach and additionally makes the following novel contributions:
• The development of the EHROnt ontology, which represents at different levels the definitions of clinical terms that appear in EHRs. At the Canonical level, it contains ontological definitions of EHR statements (in particular of medical observations) and at the Application level, it contains the specializations of the definitions of the Canonical level according to the standards mentioned previously or according to proprietary models of health institutions (it favors the notion of extensibility to different models).
• The use of formal ontologies as the canonical conceptual model, which allows focusing on aspects that are independent of the languages or technologies used to describe EHRs (it favors the notion of semantic interoperability).
• The management of a reasoning mechanism that, using axioms stated in the ontology, infers knowledge that allows the discovery of more relationships among the different models used by the different Health Information Systems (it decreases the need for human intervention).
• The provision of one module that facilitates the task of obtaining the definitions of the lower level of EHROnt from the particular EHR data storage structures, and another module that facilitates the task of linking definitions of the lower level to definitions of the upper level (it facilitates seamless adaptation of existing Health Information Systems).

A number of related works exist at present in the area of EHR interoperability. Among those works closer to our proposal we can mention the following ones. The authors of [13] provide a solution to achieve interoperability between systems that have been developed under the HL7 RIM. However, this proposal requires that the source system has some prior knowledge about the target system and, moreover, it does not tackle the communication between systems that use proprietary EHR specifications. In [3] ontology mappings are proposed between pairs of archetype-based models. Moreover, in [14] a software architecture that transforms one openEHR archetype into a CEN-13606 archetype is presented. Ontologies that describe the archetype models of both standards, in addition to an integrated ontology, are used in the process. Notice that in those works, the features of extensibility and lower degree of user intervention provided by our framework are not supported.

In summary, in this paper we show a proposal that allows one system to interpret on the fly clinical statements sent by another one –even when they use proprietary formats. We support our claim with the following techniques:
• Logic-based descriptions: Representations of clinical statements considered by particular Health Information Systems, described using standards as well as proprietary models, are expressed in our approach by using OWL2 ontology axioms. Moreover, terms in those axioms are related with canonical ontology terms that focus their descriptions on language- and technology-independent aspects. This approach increases the opportunities of solving the interoperability issue since it relies mainly on semantic aspects.
• Automated reasoning: All ontology descriptions, as well as the mappings among elements of the ontology, are expressed in the same formalism, OWL2. This uniform representation allows the use of well-known reasoners in order to derive new axioms from the existing ones. Furthermore, the mismatch problem is avoided and automatic integration is facilitated.
• Transfer mechanism: A process, guided by the previous two items, is implemented to transform a particular clinical statement from a health institution into a corresponding clinical statement for another health institution.
In the rest of the paper we first briefly present the main features of the EHROnt ontology developed for representing different kinds of medical observations. Then, the main characteristics of the translator and mapping modules are presented in sections 3 and 4, respectively. We finish with some conclusions.
2. CANONICAL REPRESENTATION OF MEDICAL OBSERVATIONS
In general, an EHR includes clinical statements such as observations, laboratory tests, diagnostic imaging reports, treatments, therapies, administered drugs and allergies. The different standards mentioned in the previous section reflect those kinds of statements in one way or another. Formally, a clinical statement is an expression of a discrete term of clinically related information that is recorded because of its relevance to the care of a patient [8]. In this paper we focus on the exchangeability of medical observation statements, which are used to record all notionally objective observations of phenomena and patient-reported phenomena, such as physical examinations, laboratory results or basic information about the patient (weight, sex,...). We advocate representing those observations in one ontology called EHROnt. That ontology is made up of two layers (Canonical layer and Application layer) that collect observation statements at different levels of abstraction. This division into layers allows a clearer visualization of the ontology, but it does not imply a technical division of it. The elements of the Canonical layer should be designed by experts in the medical field and should be considered a framework agreement. Moreover, each element of the Canonical layer may be associated with its corresponding SNOMED code [19]. The elements of the Application layer describe the medical observations as they are understood in the specific e-health systems. While the Canonical layer will be the same in all versions of EHROnt, the Application layer will be specific to each system. Thus, each health institution will be responsible for creating this layer and relating it to the Canonical layer, using the tools that we have developed to help in this process, which will be described in sections 3 and 4, respectively. The representation of the statements described in the EHR standards also belongs to the Application layer. In the EHROnt ontology, the elements that compose EHRs are described as classes and properties using the OWL2 language. Moreover, in the Canonical layer we propose a subdivision of medical observations into two groups depending on their complexity: simple observations and composite observations. Simple observations have a single value and unit of measurement. Additionally, we have also identified three properties that may be relevant when characterizing an observation: the protocol, which records information about how the observation process was carried out, either by indicating a particular clinical protocol (e.g. the Balke protocol for treadmill graded exercise testing) or the medical device used for taking the measurement (e.g. a stethoscope); the anatomical site, to indicate the specific body location in which the observation was taken; and the state of the patient, which is intended to record the state of the subject of the observation during the observation process. On the other hand, composite observations are composed of two or more observations, either simple or composite. They are intended to represent observations of phenomena such as the Glasgow Coma Scale (GCS) value –which is calculated as the sum of the values obtained from three simple observations: the Eye Response (EyeR), the Motor Response (MotorR) and the Verbal Response (VerbalR)– or the more complex Revised Trauma Score (RTS), a physiological scoring system for predicting death taking into account three measures: the aforementioned Glasgow Coma Scale value, the Systolic Blood Pressure (SysBP) and the Respiration Rate (RespRate). Below, we present some OWL2 axioms that represent classes of medical observations.
Observation ≡ Simple_Obs ⊔ Comp_Obs
Simple_Obs ≡ (= 0 comp)
Simple_Obs ⊑ (= 1 value) ⊓ (≤ 1 unit) ⊓ (≤ 1 protocol.Protocol) ⊓ (∀ state.State) ⊓ (= 1 site.AnatomicalSite)
Comp_Obs ≡ (≥ 2 comp.Observation)
RTS ≡ Comp_Obs ⊓ ∃comp.GCS ⊓ ∃comp.SysBP ⊓ ∃comp.RespRate
GCS ≡ Comp_Obs ⊓ ∃comp.EyeR ⊓ ∃comp.VerbalR ⊓ ∃comp.MotorR
EyeR ⊑ Simple_Obs
VerbalR ⊑ Simple_Obs
MotorR ⊑ Simple_Obs
SysBP ⊑ Simple_Obs
RespRate ⊑ Simple_Obs
Additional axioms may exist that associate classes of medical observations with SNOMED codes:

RTS ≡ owl:hasValue snomed.{'273885003'}   (1)
EyeR ≡ owl:hasValue snomed.{'281395000'}   (2)
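To make the canonical layer concrete, here is a minimal sketch, assuming the owlready2 Python library, of how the axioms above could be encoded and handed to a reasoner; the ontology IRI and the choice of owlready2 are illustrative assumptions, not the authors' implementation.

from owlready2 import Thing, ObjectProperty, get_ontology

onto = get_ontology("http://example.org/ehront.owl")  # hypothetical IRI

with onto:
    class Observation(Thing): pass
    class comp(ObjectProperty):           # "composed of" relation
        domain = [Observation]; range = [Observation]

    class Simple_Obs(Observation):        # no components at all
        equivalent_to = [Observation & comp.max(0, Observation)]
    class Comp_Obs(Observation):          # at least two components
        equivalent_to = [comp.min(2, Observation)]

    class EyeR(Simple_Obs): pass
    class VerbalR(Simple_Obs): pass
    class MotorR(Simple_Obs): pass
    class SysBP(Simple_Obs): pass
    class RespRate(Simple_Obs): pass

    class GCS(Observation):
        equivalent_to = [Comp_Obs & comp.some(EyeR)
                         & comp.some(VerbalR) & comp.some(MotorR)]
    class RTS(Observation):
        equivalent_to = [Comp_Obs & comp.some(GCS)
                         & comp.some(SysBP) & comp.some(RespRate)]

With such definitions in place, a standard OWL2 reasoner (e.g., invoked through owlready2's sync_reasoner()) can classify institution-specific classes under the canonical ones, which is the kind of inference exploited by the mapping module in section 4.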
In addition to the EHROnt ontology, our framework also uses three auxiliary domain ontologies. As pointed out previously, there are three relevant properties that often characterize observations: the protocol, the anatomical site and the state of the patient. As a result, a Protocol ontology is necessary to represent this information in a controlled way. We advocate using an ontology that comprises classes from the Device and Procedure categories of SNOMED-CT. Moreover, in order to represent anatomical information, the Foundational Model of Anatomy ontology [18] is suggested. Finally, one ontology has been developed for describing information about the state of the patient, such as the level of exertion (low, medium, high intensity) or the position of the patient (standing, sitting,...). It is up to the particular systems whether to use these same auxiliary ontologies or to choose other ones; in the latter case, mappings with the proposed auxiliary ontologies should be created. Finally, our ontology-driven approach presents some similarities with the Knowledge Discovery Metamodel (KDM) notion used in Architecture-Driven Modernization (ADM) [1]. In our case, knowledge is obtained from existing data sources.
3. TRANSLATOR MODULE
Each health institution has its own information system, and in the majority of cases it deals with a proprietary EHR representation. However, the interoperability opportunities increase if an ontological representation of the proprietary representations is obtained, because the shared logic-based representation allows formal inference of implicit knowledge. For that reason we have developed a translator module that is in charge of building the Application layer of the EHROnt ontology for each proprietary information system. In many cases this module will receive as input a relational database schema, but in other cases it may receive schemata for semi-structured data sources or plain files. The output of the translator module is a description mapping D = ⟨S, O, M⟩ that consists of a source schema S, a set of OWL2 axioms O that comprises the Application layer corresponding to the source S, and a valid mapping M. The set of ontology axioms O is the semantic description of source S, and the third component M is a set of correspondences of the form ⟨C, CS⟩, ⟨P, PS⟩, where C and P are class and property names appearing in O, and CS and PS are sentences, expressed in an appropriate language for the source schema S, that define sets of ground values. We can consider a universal domain of interpretation Δ and an extension function ε that associates a set CS^ε ⊆ Δ to every CS sentence, and a correspondence PS^ε ⊆ Δ × Δ to every PS sentence. The universal domain Δ represents the real-world objects of an actual extension of the considered source S. Given some basic correspondences of the form ⟨C, CS⟩, ⟨P, PS⟩ (let us write M(C) = CS, M(P) = PS), it is straightforward to define compositionally the correspondences for class expressions Cexp and property expressions Pexp (let us write M(Cexp) and M(Pexp)), following the same technique as interpretation definitions in description logics. Then, we say that a set of correspondences M satisfies an OWL2 axiom C ⊑ Cexp if M(C)^ε ⊆ M(Cexp)^ε, and analogously for P ⊑ Pexp. Notice that any equivalence axiom (using ≡) can be expressed as a pair of subsumption axioms (using ⊑ and ⊒). We say that M is a valid mapping if its correspondences satisfy the axioms in O for any possible extension of the source schema S. The translation process is divided into two main steps: a semi-automatic one and an edition one.
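As a reading aid, the output structure D = ⟨S, O, M⟩ can be pictured as plain data; the following Python sketch is purely illustrative and its names are not taken from the paper's implementation.

from dataclasses import dataclass, field

@dataclass
class DescriptionMapping:
    source_schema: str                                 # S: e.g., the SQL DDL of the source
    axioms: list[str] = field(default_factory=list)    # O: OWL2 axioms of the Application layer
    correspondences: dict[str, str] = field(default_factory=dict)
    # M: class/property name in O -> defining sentence over S (e.g., an SQL query)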
Semi-automatic process. We present this step for the case of having a relational database schema as input; in fact, this is the most complete case from the translation perspective. First of all, relations of the relational schema are translated into OWL2 classes, and attributes into properties that have as domain the class related to the relation in which they are defined, and as range the type of the attribute. Moreover, integrity constraints are translated into descriptions associated with the properties. Once the previous task is accomplished, the next one involves enriching the obtained descriptions by using information about dependencies (inclusion, exclusion and functional dependencies), null values and semantic properties (that correspond to domain information for attribute values). This type of information is provided most of the time by the health systems' administrators, because it is rarely available in the database system. Health systems' administrators are supposed to be technically skilled people who have a deep knowledge of the source information system. All the previous types of properties are applied in the following sequence: first, inclusion properties; then, when the input relational schema is not in second or third normal form, functional dependencies are used to create new classes; next, exclusion dependencies are exploited; and last, integrity constraints and domain information for attribute values are considered. For example, a particular registration of Revised Trauma Score values may consist of two relational tables according to the following schema:

RTS-Table(code, RR, SBP, GCS, total)
GCS-Table(code, ER, MR, VR)
RTS-Table.GCS ⊆ GCS-Table(code)
Then, some axioms obtained using the mentioned inclusion property, for the Application layer of that information system, are the following:

sa:RTS ≡ ∃sa:hasRR.sa:RR ⊓ ∃sa:hasSBP.sa:SBP ⊓ ∃sa:hasGCS.sa:GCS ⊓ ∃sa:hasTotal.float
sa:GCS ≡ ∃sa:hasER.sa:ER ⊓ ∃sa:hasMR.sa:MR ⊓ ∃sa:hasVR.sa:VR
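The first, schema-driven part of this step can be sketched as follows; the function and the naming scheme are illustrative assumptions only, and real input would come from the database catalog rather than a hard-coded dictionary.

def schema_to_axioms(tables: dict[str, list[str]]) -> list[str]:
    """Translate each relation into an OWL2 class and each attribute into a
    property whose domain is the class of the relation that defines it."""
    axioms = []
    for rel, attrs in tables.items():
        axioms.append(f"Declaration(Class(sa:{rel}))")
        for a in attrs:
            axioms.append(f"Declaration(ObjectProperty(sa:has{a}))")
            axioms.append(f"ObjectPropertyDomain(sa:has{a} sa:{rel})")
    return axioms

# The running example schema from above:
print("\n".join(schema_to_axioms({"RTS": ["RR", "SBP", "GCS"],
                                  "GCS": ["ER", "MR", "VR"]})))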
Edition process. The goal of this step is to permit the health system administrator to create the Application layer of the ontology in a flexible way. The administrator can choose to start from scratch or from the ontological definitions obtained using the semi-automatic module. In any case, the health system administrator can add new axioms to obtain the desired result. For example, the edition process can be used to assign SNOMED codes to the classes created by the semi-automatic process:

sa:RTS ≡ owl:hasValue sa:hasSnomed.{'273885003'}   (3)
sa:ER ≡ owl:hasValue sa:hasSnomed.{'281395000'}   (4)
In summary, the translator module obtains semantic descriptions of the proprietary formats used to represent EHRs, and it has to capture –with the health system administrator's collaboration– semantics that would otherwise remain hidden, in order to make them explicit.
4. MAPPING MODULE
This module is in charge of managing the mappings between the terms of the Application layer and the terms of the Canonical layer. In our context, an integration mapping is a structure I = ⟨O, G, M⟩ where O is a set of OWL2 axioms that comprises the Application layer corresponding to a health care institution, G is the set of OWL2 axioms for the Canonical layer, and M is a set of mapping axioms of the form C ⊑ Gexp, C ⊒ Gexp, C ≡ Gexp, where C is a class name from O and Gexp is an OWL2 class expression using only terms from G. Furthermore, M may include generalized property inclusion axioms as provided by OWL2, as well as path mappings, which relate one path in the Application layer with another path
in the Canonical layer. A path is a valid composition of properties. The Mapping module receives as input a set of basic mapping axioms, specifically defined by the system administrator, that relate classes or properties of both layers, such as:

sa:hasSnomed ≡ snomed   (5)
These basic mapping axioms are incorporated into the ontology and, with the help of a reasoner, new relationships between terms in the Application layer and those in the Canonical layer are inferred. For instance, applying the basic mapping axiom (5) to axioms (3) and (4) infers:

sa:RTS ≡ owl:hasValue snomed.{'273885003'}   (6)
sa:ER ≡ owl:hasValue snomed.{'281395000'}   (7)
and consequently, applying axioms (1) and (2) from the Canonical layer (see section 2), the equivalence mappings sa:RTS ≡ RTS and sa:ER ≡ EyeR are obtained. All those mappings are expressed through OWL2 axioms, which puts a wide range of mapping scenarios within reach of health systems' administrators. Continuing the process, the Mapping Module checks whether path mappings may exist. It is captured from the definition of sa:RTS in the Application layer that there is a path sa:hasGCS·sa:hasER from class sa:RTS to class sa:ER. Moreover, it is captured from the Canonical layer that there is a path comp·comp from class RTS to class EyeR. Since the Mapping Module has already discovered an equivalence mapping between the source classes of both paths (sa:RTS ≡ RTS) and another equivalence mapping between their target classes (sa:ER ≡ EyeR), the Mapping Module suggests that there may be a path mapping between those paths. The system administrator may then either accept or delete the suggested path mapping.
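The suggestion step just described can be sketched in a few lines; the data shapes and names below are assumptions for illustration, not the module's actual interface.

def suggest_path_mappings(app_paths, canon_paths, equivalences):
    """app_paths/canon_paths: (source class, target class, path) triples;
    equivalences: set of (application class, canonical class) pairs
    already inferred by the reasoner."""
    suggestions = []
    for src_a, tgt_a, path_a in app_paths:
        for src_c, tgt_c, path_c in canon_paths:
            if (src_a, src_c) in equivalences and (tgt_a, tgt_c) in equivalences:
                suggestions.append((path_a, path_c))  # to be confirmed by the administrator
    return suggestions

# The example of this section:
print(suggest_path_mappings(
    [("sa:RTS", "sa:ER", "sa:hasGCS·sa:hasER")],
    [("RTS", "EyeR", "comp·comp")],
    {("sa:RTS", "RTS"), ("sa:ER", "EyeR")}))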
5. CONCLUSION
The use of Electronic Health Records has brought several advantages to the healthcare domain. However, there is still much work to do regarding certain issues such as EHR interoperability. We have presented an approach that supports the notion of interoperability of medical observations based on two techniques: first, logic-based ontology descriptions of EHR statements as well as of the mappings defined among elements of the ontologies; and second, automated inference on the ontology descriptions.
6. ACKNOWLEDGMENTS
This work is supported by the Spanish Ministry of Education and Science (TIN2007-68091-C02-01) and the Basque Government (IT-427-07). The work of Idoia Berges is also supported by the Basque Government (Programa de Formación de Investigadores del Departamento de Educación, Universidades e Investigación).
7. REFERENCES
[1] Architecture-Driven Modernization, 2010. Available at http://adm.omg.org.
[2] I. Berges, J. Bermudez, A. Goñi, and A. Illarramendi. Semantic Web Technology for Agent Communication Protocols. In Proceedings of the 5th European Semantic Web Conference (ESWC 2008), pages 5–18, Tenerife, Spain, 2008.
[3] V. Bicer, O. Kilic, A. Dogac, and G. B. Laleci. Archetype-Based Semantic Interoperability of Web Service Messages in the Health Care Domain. Int'l Journal on Semantic Web & Information Systems, 1(4):1–22, 2005.
[4] L. Bird, A. Goodchild, and Z. Z. Tun. Experiences with a Two-Level Modelling Approach to Electronic Health Records. Journal of Research and Practice in Information Technology, 35(2):121–138, 2003.
[5] EN 13606-1: Electronic Health Record Communication, 2007.
[6] The Epsos project. http://www.epsos.eu/.
[7] J. Heflin and J. Hendler. Semantic Interoperability on the Web. In Proceedings of Extreme Markup Languages 2000, pages 111–120. Graphic Communications Association, 2000.
[8] HL7 Version 3 Standard: Clinical Statement Pattern, Release 1. Available at http://www.hl7.org/v3ballot/html/domains/uvcs/uvcs.htm.
[9] HL7-CDA, 2009. Available at http://www.hl7.org.
[10] L. Hoffman. Implementing Electronic Medical Records. Communications of the ACM, 52(11):18–20, Nov. 2009.
[11] D. Kalra, P. Lewalle, A. Rector, J. M. Rodrigues, K. A. Stroetmann, G. Surjan, B. Ustun, M. Virtanen, and P. E. Zanstra. Semantic Interoperability for Better Health and Safer Healthcare. Technical report, European Commission, Jan. 2009.
[12] V. Kashyap and A. P. Sheth. Semantic and schematic similarities between database objects: A context based approach. The Very Large Databases Journal, 5(4):276–304, 1996.
[13] O. Kilic and A. Dogac. Achieving Clinical Statement Interoperability using R-MIM and Archetype-based Semantic Transformations. IEEE Transactions on Information Technology in Biomedicine, to appear, 2009.
[14] C. Martínez-Costa, M. Menárguez-Tortosa, R. Valencia-García, J. Maldonado, and J. T. Fernández-Breis. Transformación Automática de Arquetipos UNE-EN 13606 y openEHR para Facilitar la Interoperabilidad Semántica. In Inforsalud 2009, Madrid, Spain, Mar. 2009.
[15] L. Obrst. Ontologies for Semantically Interoperable Systems. In Proceedings of the 2003 ACM CIKM International Conference on Information and Knowledge Management, pages 366–369, New Orleans, Louisiana, USA, Nov. 2003. ACM.
[16] openEHR, 2009. Available at http://www.openehr.org.
[17] OWL2 Web Ontology Language. http://www.w3.org/TR/2009/REC-owl2-overview-20091027/.
[18] C. Rosse and J. L. V. Mejino. A Reference Ontology for Biomedical Informatics: the Foundational Model of Anatomy. Journal of Biomedical Informatics, 36:478–500, 2003.
[19] SNOMED, 2009. Available at http://www.ihtsdo.org/snomed-ct/.
A Process Model Discovery Approach for Enabling Model Interoperability in Signal Engineering Wikan Danar Sunindyo, Thomas Moser, Dietmar Winkler, Stefan Biffl Christian Doppler Laboratory for Software Engineering Integration for Flexible Automation Systems Vienna University of Technology Favoritenstrasse 9-11/188 1040 Vienna, Austria +43 588 01 - 18801
{wikan,moser,winkler,biffl}@ifs.tuwien.ac.at

ABSTRACT
In automation systems engineering, signals are considered common concepts for linking information across different engineering disciplines, such as mechanical, electrical, and software engineering. Signal engineering faces tough challenges in managing the interoperability of the heterogeneous tools and data models of each individual engineering discipline, e.g., to make signal handling consistent, to integrate signals from heterogeneous data models/tools, and to manage the versions of signal changes across engineering disciplines. Currently, signal changes across engineering disciplines are primarily managed manually, which is costly and error-prone. The main contribution of this paper is the signal change management process model as an input for the semantic integration of engineering tools and models to support (semi-)automated signal change management. A major result is that the process model discovery approach supports the discovery of semantic integration requirements across heterogeneous engineering tools and models more efficiently than manual signal change management.

Categories and Subject Descriptors
D.2.9 [Software Engineering]: Management – software configuration management, software process models (e.g., CMM, ISO, PSP); D.2.12 [Software Engineering]: Interoperability; I.6.5 [Simulation and Modeling]: Model Development – Modeling methodologies.

General Terms
Management, Design.

Keywords
Signal Change Management, Model Interoperability, Automation Systems Engineering.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MDI2010, October 5, 2010, Oslo, Norway. Copyright 2010 ACM 978-1-4503-0292-0/10/10...$10.00.

1. INTRODUCTION
Complex automation systems, like power plants or car manufacturing workshops, typically involve several different engineering disciplines, e.g., mechanical engineering, electrical engineering, and software engineering, that have to collaborate to achieve their goals. In such complex automation systems, stakeholders from different engineering fields usually apply individual and discipline-specific tools and models for task execution. Nevertheless, information sharing, collaboration across disciplines and data exchange are preconditions for successful project execution. Thus there is a need for interoperability between the different tools and models of such complex automation systems. Currently, a lot of research is done on achieving interoperability between heterogeneous systems and notations [6, 9, 13]. However, most of the approaches still face the difficulties involved in overcoming their differences, the lack of consensus on common required standards, and the shortage of proper mechanisms and tools [7, 11]. Results of our observation in industry identified signals as common concepts in complex automation systems that link information across different engineering disciplines, e.g., mechanical interfaces, electrical signals (wiring), and software I/O variables. The application field called "signal engineering" deals with managing signals from different engineering disciplines and faces some important challenges, e.g., (1) to make signal handling consistent, (2) to integrate signals from heterogeneous data models/tools, and (3) to manage versions of signal changes across engineering disciplines.

To overcome these challenges, one needs to define an interoperability model that illustrates the signal data models and tools from each engineering field as well as their interactions. However, the manual design of an interoperability model covering different engineering fields is costly and error-prone. In manual model design, all models and required information have to be collected from the domain experts of each engineering field. Then, the domain expert needs to create the model and its interactions based on the different models collected, and cross-check with each stakeholder whether the model is correct and whether the interactions between the different engineering fields are also correct. This work and its refinement have to be repeated to obtain conflict-free models. Sometimes it is quite hard to get a final model that fulfills the requirements of every party, since the requirements themselves can change over time.
The main contribution of this paper is the proposition of a process model discovery approach to identify the process model for an exemplary signal change management process and to find out the requirements of semantic integration between heterogeneous data models and tools. By using this approach we are able to discover the interoperability model based on the actual data. This model can be useful for illustrating the interactions between engineering fields and for detecting the needs of semantic integration in signal change management. Major results show that by using the process model discovery approach, the requirements of semantic integration across heterogeneous tools and data models from different engineering fields can be discovered efficiently. This model can support further semantic integration and interoperability of the models, e.g., by using the Engineering Knowledge Base (EKB) approach [4, 12].

The remainder of this paper is structured as follows. Section 2 summarizes related work on signal change management, semantic integration and process modeling and analysis. Section 3 identifies the research issues. Section 4 develops the solution approach to discover a model for signal change management in complex automation systems. Section 5 describes the evaluation based on signal change management processes. Section 6 discusses benefits and limitations of the model discovery approach; finally, section 7 concludes the paper.
2. RELATED WORK
This section summarizes related work on signal change management, semantic integration technologies and process analysis approaches as ways to build models for heterogeneous engineering areas.

2.1 Signal Change Management
According to the Merriam-Webster dictionary (http://www.merriam-webster.com), a signal can be defined as an object used to transmit or convey information. In this paper we define a signal as a common concept for linking information between disciplines. Thus, signals are not limited to electrical signals (wiring) in electrical engineering, but also include mechanical interfaces in mechanical engineering and software I/O variables in software engineering. In complex automation systems, we define relationships between different kinds of signals from different engineering fields and use them to collaborate and communicate.

Formerly, domain experts used manual change management approaches like in [1] to manage signal changes between different engineering fields. Manual approaches use documents to manage changes between the different engineering fields in the system. Following a primarily manual approach, the researchers collect the signal lists from each engineering field and then connect the relationships between the different engineering fields manually. If there is any signal change in one document, the change has to be mapped to the relationship document, and all relevant stakeholders have to find out which other signals in different engineering fields could be affected by this change. Manual change handling is costly and error-prone. Thus, signal change handling automation is a promising research area to improve product and process quality.

Research on signal change management in the product lifecycle management (PLM) context has been done by, e.g., Horvath and Rudas [11]. They propose a virtual intelligent space for engineering (VISE) to manage signal changes and enhance the decision-making characteristics of PLM. VISE is a highly integrated application of recent CAD/CAM, human-computer, collaborative, product data management, Internet portal, and intelligent information processing techniques in a PLM system. The authors introduce the concept of change affect zones (CAZ). A CAZ comprises a set of engineering objects on which a change may have an effect. Objects in an affect zone may be both inside and outside of a virtual space, so new changes/modifications or conflicts will be handled in the CAZ before they are executed.

2.2 Process Modeling and Analysis
Process analysis approaches focus on analyzing (engineering) process data collected during the system's operation. Process analysis approaches have been applied to several types of complex systems, for example workflow management systems, Enterprise Resource Planning (ERP), and Customer Relationship Management (CRM) systems. Van der Aalst et al. [16] used workflow technologies to illustrate the structure of the operational processes of a system. Workflow technology provides event data that can be useful for process analysis in software engineering (SE) by enabling particular models that link basic tool events to process/workflow events [16].

Van der Aalst et al. [16] also used stored events, which refer to tasks and process cases originating from people/tools/systems, to monitor and analyze real workflows with respect to designed workflows. This approach is called process mining and can be used for process discovery, performance analysis, and conformance checking. The approach has been implemented in the open source tool ProM (http://www.processmining.org) and can be used to discover the process model based on the available event log, analyze the performance of the processes, and suggest possible process improvement candidates.

Ferreira and Ferreira [8] proposed a reusable workflow engine based on Petri net theory as a basis for workflow management. They introduced the workflow kernel, a prototype implementation of common workflow functionality which can be abstracted and reused in systems or embedded in applications intended to become workflow-enabled. The workflow kernel is based on common workflow functionality from several workflow engines, while Petri net theory can be used as a process representation language for process analysis.

Sunindyo et al. [15] proposed an approach to monitor, analyze, and improve tool-based engineering processes. The main idea is to generate an interoperability model based on event-based process analysis activities to link heterogeneous software engineering tools.

2.3 Semantic Integration
Semantic integration is an approach to solving problems that arise from the intention to share data across disparate and semantically heterogeneous data sources [9]. These problems include (a) the detection of duplicate entries, (b) the matching of ontologies or schemas, (c) the
modeling of complex relations in different data sources, and (d) the reconciliation of inconsistencies [13]. One of the most important and most actively studied problems in semantic integration is how to establish semantic correspondences (mappings) between the vocabularies of different data sources [6]. Hence, the application of ontologies as semantic web technologies to manage knowledge in specific domains is inevitable. There are five reasons to develop an ontology, i.e., (a) to make domain assumptions more explicit, (b) to share a common understanding of the structure of information among software agents or people, (c) to enable reuse of domain knowledge, (d) to analyze domain knowledge, and (e) to separate domain knowledge from operational knowledge [14].

Moser et al. [12] introduced the Engineering Knowledge Base (EKB) framework as a semantic web technology approach to address challenges arising from data heterogeneity, applied in the production automation domain [12]. Biffl et al. [4] also used the EKB framework for solving a similar problem in the context of open source software projects. The EKB framework is applicable to solving semantic heterogeneity problems in other automation engineering systems.

3. RESEARCH ISSUES
Complex automation systems, like power plants, need to handle a high amount of data, e.g., up to 40,000 signals originating from different engineering fields. Stakeholders need to manage these signals to enable signal data consistency within the project. Thus, efficient and effective signal data management approaches are required to handle signal changes properly. In addition, individual engineers may not pay attention to signal data management but keep focused on their individual engineering work within their discipline, i.e., engineers from different fields should not have to deal with new tools and data formats that make their work even more difficult.

Figure 1. Challenges in Signal Engineering.

Other challenges in signal change management include how to integrate the signal data originating from heterogeneous data models and tools. Figure 1 shows the requirements of mechanical engineers, electrical engineers and software engineers to share related signal data. The mechanical engineer uses different data formats than the electrical engineer and the software engineer do. The challenge is how to integrate signals from heterogeneous data models/tools (1). By using a so-called "virtual common data model" [12], the different engineers can share their related data, from electrical to mechanical signals and to the software variables. The "virtual common data model" becomes a foundation for mapping proprietary tool-specific engineering knowledge and more generic domain-specific engineering knowledge to support transformation between related engineering tools. It is "virtual" because there is no need to provide a separate repository to store the common data model. The management of the common data model with respect to the different engineering fields is done via a specified mapping mechanism. The mechanism of the "virtual common data model" approach includes 5 steps (see the sketch at the end of this section): (a) extraction of tool data from each engineering field; (b) storage of the extracted tool data in its own model; (c) description of the tool knowledge for each engineering field's tool; (d) description of the common domain knowledge; (e) mapping of the tool knowledge to the common domain knowledge. This work should be done carefully to obtain a complete list of signal mappings from the electrical to the mechanical and the software engineers. In real systems, stakeholders could also include people from other engineering fields.

This semantic integration challenge can be solved, for example, by applying semantic integration approaches like the Engineering Knowledge Base (EKB) framework [4, 12]. Other challenges are to manage versions of signal changes across engineering disciplines and to manage common concepts based on the semantic integration (2). The research question is how to discover the process model from the actual data provided by heterogeneous engineering fields. Based on this research question, we can discover the structure across heterogeneous data models/tools and their interactions, and we can identify the need for semantic integration to link heterogeneous data models and tools.

Linking heterogeneous disciplines can enable a so-called end-to-end test (see Figure 2) to trace signals from hardware sensors to software variables across system borders. This approach supports defect detection during development and changes.

Figure 2. Interaction between different engineering fields.

Figure 2 shows the interaction between different engineering fields in managing signal changes. Three different engineers, namely the mechanical engineer, the electrical engineer, and the software engineer, typically share a lot of signals that are connected to each other. These relationships should be maintained in an Engineering Knowledge Base, such that when changes happen in one engineering field, they can be propagated to the other engineering fields automatically or semi-automatically. The evaluation is done by comparing the manual signal change management process and the automated/semi-automated signal change management process, after applying the process model discovery approach to reveal semantic requirements in engineering processes across different engineering fields.
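As an illustration of the five-step "virtual common data model" mechanism described above, the following Python sketch keeps per-tool models separate and records only explicit mappings to common domain concepts; all names are hypothetical.

from dataclasses import dataclass, field

@dataclass
class ToolModel:                          # steps (a)+(b): extracted per-tool data
    discipline: str
    signals: dict[str, dict] = field(default_factory=dict)

@dataclass
class VirtualCommonDataModel:             # steps (c)-(e): knowledge plus mappings
    domain_concepts: set[str] = field(default_factory=set)
    mappings: dict[tuple[str, str], str] = field(default_factory=dict)

    def map_signal(self, tool: ToolModel, signal: str, concept: str) -> None:
        """Step (e): link a tool-specific signal to a common domain concept."""
        assert concept in self.domain_concepts, "unknown domain concept"
        self.mappings[(tool.discipline, signal)] = concept

No separate repository holds the merged data; transformations between tools are resolved through the recorded mappings, which is what makes the common data model "virtual".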
4. USE CASE
To show how to manage interoperability between engineering tools in complex automation systems, we use a signal change management use case from the mechanical to the electrical and software engineers. Figure 3 illustrates how to merge different signals (and changes) and resolve conflicts between signals coming from different disciplines manually. The conceptual steps are as follows:
(1) The mechanical engineer executes changes in the mechanical plan that will also affect the tool data. (2) The mechanical engineer manually makes a difference analysis for the interaction with other engineering tools, to check whether there is any conflict with the data of the other engineering tools. (3) The mechanical engineer manually propagates the changes to the electrical engineering tools and software engineering tools. (4) The electrical engineer and the software engineer execute changes in their electrical plan and software development environment.

Figure 3. Manual Signal Change Management.

5. RESULTS
For discovering the interoperability model for signal change management processes in the design time and runtime of complex automation systems, we collect process event data from each engineering field, e.g., electrical, mechanical, and software engineering. Using the ProM tool, we conduct an analysis to discover the underlying process model by applying the Alpha Algorithm [5] to the collected data. The Alpha Algorithm works by discovering transitions which are causally related between different event traces. From the collected event log data as input, we can discover a set of related transitions from all event traces. For each tuple (A,B) in this set, each transition in set A causally relates to all transitions in set B, and no transitions within A (or B) follow each other in some firing sequence. We refine the set by taking only the largest elements with respect to set inclusion. The output is a workflow net that connects each event trace to other related event traces via transitions [5].
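For readers who want to reproduce this kind of discovery outside ProM, a minimal sketch using the pm4py Python library is shown below; the log file name is hypothetical, and the paper itself uses ProM rather than pm4py.

import pm4py

# Event log collected from the engineering tools, converted to XES beforehand.
log = pm4py.read_xes("signal_change_events.xes")

# Alpha Miner: derives causal relations between event classes and builds a
# Petri net (a workflow net) with initial and final markings.
net, initial_marking, final_marking = pm4py.discover_petri_net_alpha(log)
pm4py.view_petri_net(net, initial_marking, final_marking)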
Figure 4 shows the results of the model discovery analysis using the Alpha Algorithm [5]. Here we have 4 different scenarios in the process model of the signal change management process. (1) no conflict: the mechanical engineer executes changes and performs a manual difference analysis towards the other engineering fields via interaction between the mechanical engineering plan, the electrical plan, and the software development environment. The mechanical engineer manually propagates changes to other tools. The electrical engineer and software engineer execute changes in their environments. (2) normal conflict: after the manual difference analysis, the mechanical engineer starts managing conflicts and resolves them by replacing the old signal with the new signal. If the conflicts are resolved, the mechanical engineer transforms the change to the other engineering fields. (3) critical conflict: almost the same as the normal conflict; the difference lies in the action after conflict management is over. The mechanical engineer has to remove the signal and send a notification to the electrical engineer and software engineer. The electrical engineer and software engineer will consider this a critical conflict and decide whether to accept the signal removal or reject it. (4) looping condition: if the electrical engineer and software engineer reject the signal removal, there is an option to argue the signal change on the electrical engineer's side. Hence, the situation loops back to the condition before the change is transformed to the other engineering fields.

Figure 4. Signal Change Management Processes Model.

From Figure 4, we can suggest improvements of the signal change management process by collecting and integrating the heterogeneous signal data models and tools from different engineering fields using the Automation Service Bus (ASB) [3] and the EKB [4, 12]. The ASB technically integrates the heterogeneous tools, while the EKB semantically integrates the heterogeneous data models of the electrical, mechanical, and software engineers.

The ASB is an approach similar to the "Enterprise Service Bus" in the business IT context [10], adapted for complex automation systems engineering. The current "Enterprise Service Bus" approach is applied in the business IT context, and most of its implementations make design assumptions such as services always being online and resources (computing, network bandwidth, memory) not being the main issue of the design. These assumptions do not suit the requirements of signal change management. Thus the ASB has to be designed to be more lightweight and able to bridge technical gaps between engineering processes, models and tools for quality and process improvements [2]. Engineering components are connected to the ASB via connector components, which allows addressing all deployed components as services via the ASB. The ASB integrates components in both office-like design environments and onsite environments with a common integration architecture but different implementations [3]. In signal change management, the different tools that manage different signals from heterogeneous engineering fields are connected to the ASB via connector components. Each tool is treated as a component. The communication between components is also managed by the ASB, so when a signal is changed in one tool, the change is communicated via the ASB and distributed to the other tools automatically.

The EKB is a semantic-web-based framework which supports the efficient integration of information originating from different expert domains without a complete common data schema [12]. The EKB framework stores the engineering knowledge in ontologies and provides semantic mapping services to access design-time and run-time concepts and data. The EKB framework aims at making tasks which depend on linking information across expert domain boundaries more efficient [12]. The EKB is connected to the other tools via the ASB. In signal change management, the EKB provides the semantic integration between the different signal data from the heterogeneous engineering fields. Each signal is stored, together with its relationships to other signals, in the ontology underlying the EKB. Changing a signal in the ontology means modifying the signal entity and its relationships in the ontology.

The result of the signal change management improvement can be seen in Figure 5. It shows the usage of the ASB and EKB to improve the signal change propagation from the mechanical engineer to the electrical engineer and software engineer. (1) The mechanical engineer executes a change in his mechanical plan. (2) The mechanical engineer checks in the change and performs a difference analysis by using the ASB and EKB. (3a & 3b) The electrical engineer and software engineer check out the changes from the ASB and EKB.
Figure 5. Signal Change Management by using ASB & EKB.

6. DISCUSSION
In this section, we discuss the benefits and limitations of the model discovery approach compared to the manual approach.

The benefits of the model discovery approach are as follows. (1) The model obtained from the model discovery approach is more precise and accurate because it is generated from actual event data from the different engineering processes. (2) The model is easier to maintain and change: if the system is modified, we can collect the new event log data and run the process mining tool to obtain the latest model. (3) The model can be used to learn and understand the whole signal change management process in the system. It also supports model-driven interoperability for other purposes, e.g., decision making and signal defect detection.

The limitations of this approach are as follows. (1) We have to provide complete event log data from each engineering process for model discovery. (2) The ProM tool has limitations regarding input formats, so we have to transform the process event log data into the ProM format (Mining XML).

From this discussion, it is possible for other model-driven interoperability systems to adapt the model discovery approach to obtain their process model immediately, rather than building it from scratch and improving it later via several iterations. The alternative to the process model discovery approach is to conduct interview sessions with each engineer from the different engineering fields to acquire the requirements for building a model. This model has to be discussed among the engineers to obtain an integrated view on the model from the different engineering perspectives that supports interoperability between the different engineering fields.

7. CONCLUSION AND FURTHER WORK
Collaboration and interaction between different engineering fields are critical issues in heterogeneous engineering environments because the individual disciplines apply different tools and data models. This heterogeneity hinders efficient collaboration and interaction between the various stakeholders, e.g., mechanical, electrical, and software engineers. Semantic integration based on the proposed model enables data exchange based on common concepts, e.g., signals, and increases collaboration efficiency and effectiveness. In addition, process observation based on event data is a promising approach for (a) identifying the current (real) process workflow, (b) obtaining measurement data, and (c) providing the foundation for process analysis and improvement. In this paper, we have explained the usage of a process model discovery approach to derive the model immediately from the actual engineering process data and identified improvement options for increasing process quality. We applied a signal change management process to illustrate (a) the basic concepts, (b) semantic integration approaches, and (c) process improvement based on collected and analyzed event data.

We found that this approach is easier to adapt in already-running systems which consist of different tools and data models for each engineering area. This approach can also be adapted and generalized to other model-driven interoperability systems.

Future work will include the application of the model discovery approach to other problem domains, as well as exploring how to detect defects in signal change management and how to make decisions on signal change management based on prior experience. We will develop a framework to prepare process model discovery for signal change management in different engineering fields, such that process model discovery and other process analysis approaches can be implemented more effectively and more efficiently.

8. ACKNOWLEDGMENTS
This work has been supported by the Christian Doppler Forschungsgesellschaft and the BMWFJ, Austria. This work has been partially funded by the Vienna University of Technology, in the Complex System Design and Engineering Lab.

9. REFERENCES
[1] Akerblom, R. A management system for quality development. Requirements, methods and traps. In Proceedings of the 19th International Telecommunications Energy Conference (INTELEC 97), 19-23 Oct. 1997.
[2] Biffl, S. and Schatten, A. A Platform for Service-Oriented Integration of Software Engineering Environments. In Proceedings of the Eighth Conference on New Trends in Software Methodologies, Tools and Techniques (SoMeT 09), 2009. IOS Press.
[3] Biffl, S., Schatten, A. and Zoitl, A. Integration of heterogeneous engineering environments for the automation systems lifecycle. In Proceedings of the 7th IEEE International Conference on Industrial Informatics (INDIN 2009), 23-26 June 2009.
[4] Biffl, S., Sunindyo, W. D. and Moser, T. Semantic Integration of Heterogeneous Data Sources for Monitoring Frequent-Release Software Projects. In Proceedings of the 4th International Conference on Complex, Intelligent and Software Intensive Systems (CISIS 2010), 2010. IEEE Computer Society.
[5] de Medeiros, A. K. A., van Dongen, B. F., van der Aalst, W. M. P. and Weijters, A. J. M. M. Process Mining: Extending the alpha-algorithm to Mine Short Loops. Eindhoven University of Technology, Eindhoven, 2004.
[6] Doan, A., Noy, N. F. and Halevy, A. Y. Introduction to the special issue on semantic integration. SIGMOD Rec., 33(4), 2004, 11-13.
[7] Elvesæter, B., Hahn, A., Berre, A.-J. and Neple, T. Towards an Interoperability Framework for Model-Driven Development of Software Systems. 2006.
[8] Ferreira, D. M. R. and Ferreira, J. J. P. Developing a reusable workflow engine. J. Syst. Archit., 50(6), 2004, 309-324.
[9] Halevy, A. Why Your Data Won't Mix. Queue, 3(8), 2005, 50-58.
[10] Hohpe, G. and Woolf, B. Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions. Addison-Wesley Professional, 2003.
[11] Horvath, L. and Rudas, I. J. Information Content Orientated Product Model Assisted Change Management. In Proceedings of the 5th International Symposium on Intelligent Systems and Informatics (SISY 2007), Subotica, 24-25 Aug. 2007.
[12] Moser, T., Biffl, S., Sunindyo, W. D. and Winkler, D. Integrating Production Automation Expert Knowledge Across Engineering Stakeholder Domains. In Proceedings of the 4th International Conference on Complex, Intelligent and Software Intensive Systems (CISIS 2010), Krakow, Poland, 2010. Andrzej Frycz Modrzewski Cracow College.
[13] Noy, N. F., Doan, A. H. and Halevy, A. Y. Semantic Integration. AI Magazine, 26(1), 2005, 7-10.
[14] Noy, N. F. and McGuinness, D. Ontology Development 101: A Guide to Creating Your First Ontology. Stanford Knowledge Systems Laboratory, 2001.
[15] Sunindyo, W. D., Moser, T., Winkler, D. and Biffl, S. Foundations for Event-Based Process Analysis in Heterogeneous Software Engineering Environments. In Proceedings of the 36th Euromicro Conference on Software Engineering and Advanced Applications (SEAA 2010), Lille, France, 1-3 September 2010. IEEE Computer Society.
[16] van der Aalst, W. M. P., Weijters, A. J. M. M. and Maruster, L. Workflow Mining: Discovering Process Models from Event Logs. IEEE Transactions on Knowledge and Data Engineering, 16(9), 2004, 1128-1142.
Efficient Analysis and Execution of Correct and Complete Model Transformations Based on Triple Graph Grammars

Frank Hermann, Hartmut Ehrig, Ulrike Golas
Department of Theoretical Computer Science and Software Technology, Technische Universität Berlin, Berlin, Germany
frank(at)cs.tu-berlin.de, ehrig(at)cs.tu-berlin.de, ugolas(at)cs.tu-berlin.de

Fernando Orejas
Departament de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Barcelona, Spain
orejas(at)lsi.upc.edu

ABSTRACT
Triple Graph Grammars are a well-established, formal and intuitive concept for the specification and analysis of bidirectional model transformations. In previous work we have already formalized and analyzed termination, correctness, completeness, local confluence and functional behaviour. In this paper, we show how to improve the efficiency of the execution and analysis of model transformations in practical applications by using triple rules with negative application conditions (NACs). In addition to specification NACs, which improve the specification of model transformations, the generation of filter NACs improves the efficiency of the execution and the analysis of functional behaviour supported by the critical pair analysis of the tool AGG. We illustrate the results for the well-known model transformation from class diagrams to relational database models.

Categories and Subject Descriptors
D.2.1 [Software Engineering]: Requirements/Specifications; D.2.12 [Software Engineering]: Interoperability; I.6.5 [Simulation and Modeling]: Model Development – Modeling methodologies

General Terms
Theory, Design, Verification

Keywords
Model Transformation, Triple Graph Grammars, Functional Behaviour

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. MDI 2010, October 5, 2010, Oslo, Norway. Copyright 2010 ACM 978-1-4503-0292-0/10/10 ...$10.00.

1. INTRODUCTION
Model transformations based on triple graph grammars (TGGs) have been introduced by Schürr in [19]. Operational rules are automatically derived from the triple rules and used to define various bidirectional model transformation and integration tasks that are mainly focused on model-to-model transformations. Since 1994, several extensions of the original TGG definitions have been published [20, 17, 10], and various kinds of applications have been presented [22, 11, 16]. Besides model transformation, TGGs are also applied for model integration [1] and model synchronization [8] in order to support model driven interoperability. For source-to-target model transformations, so-called forward transformations, forward rules are derived which take the source graph as input and produce a corresponding target graph. Similarly, backward rules are used for target-to-source transformations, making the transformation approach bidirectional. Major properties expected to be fulfilled for model transformations are termination, correctness, completeness, efficient execution and — for several applications — functional behaviour. Termination, completeness and correctness of model transformations have been studied already in [6, 3, 7, 4]. Functional behaviour of model transformations based on triple graph grammars has been analyzed for triple rules without application conditions in [15], using forward translation rules that use additional translation attributes for keeping track of the elements that have been translated so far. The main aim of this paper is to extend the analysis techniques for functional behaviour in [15] to the case of triple rules with negative application conditions (NACs) and to improve the efficiency of analysis and execution of
model transformations studied in [3, 4, 7, 15]. For this purpose, we distinguish between specification NACs and filter NACs. Specification NACs have already been introduced in [7, 4], where triple rules and the corresponding derived source and forward rules have been extended by NACs in order to improve the modeling power. Exemplarily, we show that NACs improve the specification of the model transformation CD2RDBM from class diagrams to relational database models presented in [6, 3]. Therefore, we extend the forward translation rules introduced in [15] by corresponding NACs and show that model transformations based on forward translation rules with NACs are equivalent to the model transformations studied in [7, 4], such that the main results concerning termination, correctness and completeness can be transferred to our new framework (see Thm. 1). In order to analyze functional behaviour we can use general results for local confluence of transformation systems with NACs in [18]. But in order to improve efficiency in the context of model transformations we introduce so-called filter NACs. They filter out several misleading branches considered in the standard analysis of local confluence using critical pairs. In our second main result (see Thm. 2) we show how to analyze functional behaviour of model transformations based on forward translation rules by analyzing critical pairs for forward translation rules with filter NACs. Moreover, we introduce a strong version of functional behaviour, including model transformation sequences. In our third main result (see Thm. 3) we characterize strong functional behaviour by the absence of "significant" critical pairs for the corresponding set of forward translation rules with filter NACs. In Sec. 2 we introduce model transformations based on TGGs with specification NACs and show the first main result on termination, correctness, and completeness. In Sec. 3 we introduce forward translation rules with filter NACs and present our main results on functional and strong functional behaviour. Based on these main results we discuss in Sec. 4 efficiency aspects of analysis and execution. Related work and a conclusion are presented in Sections 5 and 6. The full proofs of the main results are given in [14].
2. MODEL TRANSFORMATIONS BASED ON TRIPLE GRAPH GRAMMARS WITH NACS

Triple graph grammars [19] are a well-known approach for bidirectional model transformations. Models are defined as pairs of source and target graphs, which are connected via a correspondence graph together with its embeddings into these graphs. In this section, we review the main constructions and results of model transformations based on [20, 4, 15] and extend them to the case with NACs.

A triple graph G = (G_S ← s_G − G_C − t_G → G_T) consists of three graphs G_S, G_C, and G_T, called source, correspondence, and target graphs, together with two graph morphisms s_G : G_C → G_S and t_G : G_C → G_T. A triple graph morphism m = (m_S, m_C, m_T) : G → H between triple graphs G and H consists of three graph morphisms m_S : G_S → H_S, m_C : G_C → H_C and m_T : G_T → H_T such that m_S ∘ s_G = s_H ∘ m_C and m_T ∘ t_G = t_H ∘ m_C. A typed triple graph G is typed over a triple type graph TG by a triple graph morphism type_G : G → TG.

Example 1. Triple Type Graph: Fig. 1 shows the type graph TG of the triple graph grammar TGG for our example model transformation from class diagrams to database models.

[Figure 1: Triple type graph for CD2RDBM. The source component TG_S contains the types Class, Attribute (name: String, is_primary: boolean), Association (name: String), parent edges and PrimitiveDataType (name: String); the correspondence component TG_C contains CT, AC and AFK; the target component TG_T contains Table (name: String), Column (type: String, name: String) with cols, fcols and pkey edges, and FKey with fkeys and references edges, together with the connecting morphisms and multiplicities.]

The source component TG_S defines the structure of class diagrams while in the target component the structure of relational database models is specified. Classes correspond to tables, attributes to columns, and associations to foreign keys. Throughout the example, which originates from [6], elements are arranged left, center, and right according to the component types source, correspondence and target. Morphisms starting at a correspondence part are depicted by dashed arrows. The denoted multiplicity constraints are ensured by the triple rules in Figs. 3 and 5. Note that the case study uses attributed triple graphs based on E-graphs as presented in [6] in the framework of weak adhesive HLR categories. We refer to [2] for more details on attributed graphs.

Triple rules synchronously build up their source, correspondence and target graphs, i.e. they are non-deleting. A triple rule tr (left of Fig. 2) is an injective triple graph morphism tr = (tr_S, tr_C, tr_T) : L → R and w.l.o.g. we assume tr to be an inclusion. Given a triple graph morphism m : L → G, a triple graph transformation (TGT) step G =tr,m=> H (right of Fig. 2) from G to a triple graph H is given by a pushout of triple graphs with comatch n : R → H and transformation inclusion t : G ↪ H. A grammar TGG = (TG, S, TR) consists of a triple type graph TG, a triple start graph S = ∅ and a set TR of triple rules.

[Figure 2: Triple rule and triple transformation step. The left diagram shows a triple rule tr = (tr_S, tr_C, tr_T) : (L_S ← L_C → L_T) → (R_S ← R_C → R_T); the right diagram shows a TGT step as a pushout (PO) with match m : L → G, comatch n : R → H and inclusion t : G ↪ H.]

Example 2. Triple Rules: The triple rules in Fig. 3 are part of the rules of the grammar TGG for the model transformation CD2RDBM. They are presented in short notation, i.e. the left- and right-hand sides of a rule are depicted in one triple graph. Elements which are created by the rule are labeled with "++" and marked by green line colouring. The rule "Class2Table" synchronously creates a class with name n together with the corresponding table in the relational database. Accordingly, subclasses are connected to the tables of their super classes by the rule "Subclass2Table". Attributes with type t are created together with their corresponding columns in the database component via the rule "Attr2Column".

[Figure 3: Rules for the model transformation CD2RDBM, Part 1, showing the triple rules Class2Table(n:String), Subclass2Table(n:String) and Attr2Column(n:String, t:String) in short notation.]
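The commuting conditions for triple graph morphisms can be illustrated operationally. The following Python sketch is our own (all names are hypothetical, not from the paper); it treats graphs simply as node sets with mappings (edges and attributes omitted for brevity) and checks the two required squares m_S ∘ s_G = s_H ∘ m_C and m_T ∘ t_G = t_H ∘ m_C.

    # Illustrative sketch only: a triple graph as three node sets plus the
    # correspondence mappings s: G_C -> G_S and t: G_C -> G_T.
    from dataclasses import dataclass

    @dataclass
    class TripleGraph:
        S: set
        C: set
        T: set
        s: dict  # maps correspondence nodes to source nodes
        t: dict  # maps correspondence nodes to target nodes

    def is_triple_morphism(m_S, m_C, m_T, G, H):
        """Check the commutativity conditions of Sec. 2:
        m_S . s_G = s_H . m_C  and  m_T . t_G = t_H . m_C."""
        return all(m_S[G.s[c]] == H.s[m_C[c]] for c in G.C) and \
               all(m_T[G.t[c]] == H.t[m_C[c]] for c in G.C)

    # Tiny instance in the spirit of CD2RDBM: one class related to one table.
    G = TripleGraph({"c1"}, {"ct1"}, {"t1"}, {"ct1": "c1"}, {"ct1": "t1"})
    H = TripleGraph({"c1", "c2"}, {"ct1"}, {"t1"}, {"ct1": "c1"}, {"ct1": "t1"})
    print(is_triple_morphism({"c1": "c1"}, {"ct1": "ct1"}, {"t1": "t1"}, G, H))  # True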
From each triple rule tr we derive a source rule tr_S for the construction resp. parsing of a model of the source language and a forward rule tr_F for forward transformation sequences (see Fig. 4). By TR_S and TR_F we denote the sets of all source and forward rules derived from the set of triple rules TR. Analogously, we derive a target rule tr_T and a backward rule tr_B for the construction and transformation of a model of the target language, leading to the sets TR_T and TR_B.

[Figure 4: Derived operational rules of a TGG. From a triple rule tr : (L_S ← L_C → L_T) → (R_S ← R_C → R_T) one derives the source rule tr_S : (L_S ← ∅ → ∅) → (R_S ← ∅ → ∅), the target rule tr_T : (∅ ← ∅ → L_T) → (∅ ← ∅ → R_T) and the forward rule tr_F : (R_S ← tr_S ∘ s_L − L_C → L_T) → (R_S ← R_C → R_T).]

A set of triple rules TR and the start graph ∅ generate a visual language VL of integrated models, i.e. models with elements in the source, target and correspondence components. The source language VL_S and target language VL_T are derived by projection to the triple components, i.e. VL_S = proj_S(VL) and VL_T = proj_T(VL). The set VL_S0 of models that can be generated resp. parsed by the set of all source rules TR_S is possibly larger than VL_S and we have VL_S ⊆ VL_S0 = {G_S | ∅ ⇒* (G_S ← ∅ → ∅) via TR_S}. Analogously, we have VL_T ⊆ VL_T0 = {G_T | ∅ ⇒* (∅ ← ∅ → G_T) via TR_T}.

According to [7, 4] we present negative application conditions for triple rules. In most case studies of model transformations source-target NACs, i.e. either source or target NACs, are sufficient and we regard them as the standard case. They prohibit the existence of certain structures either in the source or in the target part only, while general NACs may prohibit both at once.

Definition 1. Triple Rules with Negative Application Conditions: Given a triple rule tr = (L → R), a negative application condition (NAC) (n : L → N) consists of a triple graph N and a triple graph morphism n. A NAC with n = (n_S, id_LC, id_LT) is called source NAC and a NAC with n = (id_LS, id_LC, n_T) is called target NAC. A match m : L → G is NAC consistent if there is no injective q : N → G such that q ∘ n = m for each NAC (n : L → N). A triple transformation G ⇒* H is NAC consistent if all matches are NAC consistent.

Example 3. Triple Rules with NACs: Figure 5 shows the remaining two triple rules for the model transformation CD2RDBM and additionally a derived forward translation rule as explained in Ex. 4. NACs are specified in short notation using the label "NAC" with a frame and red line colour within the frame. A complete NAC is obtained by composing the left-hand side of a rule with the red marked elements within the NAC frame. The rule "Association2ForeignKey" creates an association between two classes and the corresponding foreign key, and the NAC ensures that there is only one primary key at the destination table. The parameters an and cn are used to set the names of the association and column nodes. The rule "PrimaryAttr2Column" extends "Attr2Column" by additionally creating a link of type "pkey" for the column and by setting "is_primary = true". Furthermore, there is a source and a target NAC, which ensure that there is currently neither a primary attribute nor a primary column present.

[Figure 5: Rules for the model transformation CD2RDBM, Part 2, showing the triple rule Association2ForeignKey(an:String, cn:String), which creates a foreign key with a column named an+"_"+cn referencing the destination table, the triple rule PrimaryAttr2Column(n:String, t:String) with its source NAC (NAC1, no primary attribute present) and target NAC (NAC2, no pkey column present), and the derived forward translation rule PrimaryAttr2ColumnFT(n:String, t:String) with translation attributes tr and tr_name changing from F to T.]
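Operationally, NAC consistency of a match (Def. 1) amounts to a search for a forbidden extension. The following sketch is our own simplification (plain node sets and directed edges, no typing or attribution); it checks whether an injective q : N → G with q ∘ n = m exists, so the match is NAC consistent iff no such q is found.

    from itertools import permutations

    def violates_nac(N_nodes, n, G_nodes, m, adj_N, adj_G):
        """Search for an injective q : N -> G with q . n = m (cf. Def. 1).
        n, m are dicts (L -> N resp. L -> G, m assumed injective on the
        structural part); adj_* are sets of directed edges (u, v).
        Returns True if such a q exists, i.e. the NAC is violated."""
        fixed = {n[l]: m[l] for l in n}            # q is already determined on n(L)
        free = [v for v in N_nodes if v not in fixed]
        targets = [v for v in G_nodes if v not in fixed.values()]
        for image in permutations(targets, len(free)):
            q = dict(fixed, **dict(zip(free, image)))  # injective by construction
            if all((q[u], q[v]) in adj_G for (u, v) in adj_N):
                return True                        # forbidden pattern found in G
        return False

    def nac_consistent(nacs, G_nodes, m, adj_G):
        # nacs is a list of (N_nodes, n, adj_N) triples for one rule
        return not any(violates_nac(N, n, G_nodes, m, adj_N, adj_G)
                       for (N, n, adj_N) in nacs)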
The extension of forward rules to forward translation rules is based on additional attributes, called translation attributes, that control the translation process by keeping track of the elements which have been translated so far. While in this paper the translation attributes are inserted in the source models, they can be kept separate as an external pointer structure in order to keep the source model unchanged, as shown in Sec. 5 of [13].

The new concept of forward translation rules as introduced in [15] extends the construction of forward rules by additional translation attributes in the source component. The translation attributes keep track of the elements that have been translated so far, which ensures that each element in the source graph is not translated twice. The rules are deleting on the translation attributes and thus, the triple transformations are extended from a single (total) pushout to the classical double pushout (DPO) approach [2]. We call these rules forward translation rules, because pure forward rules need to be controlled by additional control conditions, such as the source consistency condition in [6, 4].

Definition 2. Graph with Translation Attributes: Given an attributed graph AG = (G, D) and a subgraph G_0 ⊆ G, we call AG' a graph with translation attributes over AG if it extends AG with one Boolean-valued attribute tr_x for each element x (node or edge) in G_0 and one Boolean-valued attribute tr_x_a for each attribute a associated to such an element x in G_0. This means that we have a partition of the items (nodes, edges, or attributes) of G_0 into I_1 and I_2 s.t. AG' = AG ⊕ Att^T_I1 ⊕ Att^F_I2, where Att^T_I1 and Att^F_I2 denote the translation attributes with value T for I_1 and value F for I_2. Moreover, we define Att^v(AG) := AG ⊕ Att^v_G for v ∈ {T, F}. In any case we require that there is at most one translation attribute tr_x or tr_x_a for each item.

Definition 3. Forward Translation Rules with NACs: Given a triple rule tr = (L → R), the forward translation rule of tr is given by tr_FT = (L_FT ← l_FT − K_FT − r_FT → R_FT) defined as follows using the forward rule (L_F − tr_F → R_F) and the source rule (L_S − tr_S → R_S) of tr, where we assume w.l.o.g. that tr is an inclusion:

  • L_FT = L_F ⊕ Att^T_LS ⊕ Att^F_{RS\LS}
  • K_FT = L_F ⊕ Att^T_LS
  • R_FT = R_F ⊕ Att^T_LS ⊕ Att^T_{RS\LS} = R_F ⊕ Att^T_RS
  • l_FT and r_FT are the induced inclusions.

Moreover, for each NAC n : L → N of tr we define a forward translation NAC n_FT : L_FT → N_FT of tr_FT as an inclusion with N_FT = (L_FT +_L N) ⊕ Att^T_{NS\LS}.

Remark 1. Note that (L_FT +_L N) is the union of L_FT and N with shared L, and for a target NAC n the forward translation NAC n_FT does not contain any translation attributes because N_S = L_S.

Example 4. Forward Translation Rule with NACs: Fig. 5 shows in its lower part the forward translation rule with NACs "PrimaryAttr2ColumnFT". According to Def. 3 the source elements of the triple rule "PrimaryAttr2Column" are extended by translation attributes, which are changed by the rule from "F" to "T" if the owning elements are created by the triple rule. Furthermore, the additional elements in the NAC are extended by translation attributes set to "T". Thus, the source NACs concern only elements that have been translated so far.

From the application point of view, model transformation rules should be applied along matches that are injective on the structural part. But it would be too restrictive to require injectivity of the matches also on the data and variable nodes, because we must allow that two different variables are mapped to the same data value. For this reason we use the notion of "almost injective matches" [15], which requires that matches are injective except for the data value nodes. This way, attribute values can still be specified as terms within a rule and matched non-injectively to the same value. Next, we define model transformations based on complete forward translation sequences.

Definition 4. Completely Translated Graphs and Complete Sequences: A forward translation sequence G_0 =tr*_FT=> G_n with almost injective matches is called complete if G_n is completely translated, i.e. all translation attributes of G_n are set to true ("T").

Definition 5. Model Transformation Based on Forward Translation Rules: A model transformation sequence (G_S, G_0 =tr*_FT=> G_n, G_T) based on forward translation rules with NACs consists of a source graph G_S, a target graph G_T, and a complete TGT-sequence G_0 =tr*_FT=> G_n with almost injective matches, G_0 = (Att^F(G_S) ← ∅ → ∅) and G_n = (Att^T(G_S) ← G_C → G_T). A model transformation MT : VL_S0 ⇒ VL_T0 based on forward translation rules with NACs is defined by all model transformation sequences as above with G_S ∈ VL_S0 and G_T ∈ VL_T0. All these pairs (G_S, G_T) define the model transformation relation MTR ⊆ VL_S0 × VL_T0. The model transformation is terminating if there are no infinite TGT-sequences via forward translation rules and almost injective matches starting with G_0 = (Att^F(G_S) ← ∅ → ∅) for some source graph G_S.
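The bookkeeping behind Defs. 2-5 can be phrased as a simple marking discipline: start with every source item marked F, and let each forward translation rule application flip to T exactly the items matched by R_S \ L_S. A minimal sketch, with hypothetical names of our own:

    def init_translation_attributes(source_items):
        """Def. 5: start graph G_0 = (Att^F(G_S) <- 0 -> 0), everything untranslated."""
        return {x: False for x in source_items}

    def apply_forward_translation(tr_attrs, consumed_items):
        """One tr_FT step: flip the items matched by R_S \\ L_S from F to T.
        Translating an item twice is impossible, mirroring that the rules
        are deleting on the translation attributes (DPO approach)."""
        if any(tr_attrs[x] for x in consumed_items):
            raise ValueError("element already translated")
        for x in consumed_items:
            tr_attrs[x] = True

    def completely_translated(tr_attrs):
        """Def. 4: a sequence is complete iff all translation attributes are T."""
        return all(tr_attrs.values())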
Now, we are able to state our first main result concerning termination, correctness and completeness of model transformations.

Theorem 1. Termination, Correctness and Completeness: Each model transformation MT : VL_S0 ⇒ VL_T0 based on forward translation rules is

  • terminating, if each forward translation rule changes at least one translation attribute from "F" to "T",
  • correct, i.e. for each model transformation sequence (G_S, G_0 =tr*_FT=> G_n, G_T) there is G ∈ VL with G = (G_S ← G_C → G_T), and it is
  • complete, i.e. for each G_S ∈ VL_S there is G = (G_S ← G_C → G_T) ∈ VL with a model transformation sequence (G_S, G_0 =tr*_FT=> G_n, G_T).

Proof Idea. The proof (see [14]) is based on a corresponding result in [15] for the case without NACs and on a fact showing the equivalence of (1) source- and NAC-consistent TGT-sequences based on forward rules and (2) complete NAC-consistent TGT-sequences based on forward translation rules.

Applying a rule according to the DPO approach in general involves checking the gluing condition. However, in the case of forward translation rules and almost injective matches the gluing condition is always satisfied. This means that the condition does not have to be checked, which simplifies the analysis of functional behaviour in Sec. 3.

Fact 1. Gluing Condition for Forward Translation Rules: Let tr_FT be a forward translation rule and m_FT : L_FT → G be an almost injective match; then the gluing condition is satisfied, i.e. there is the transformation step G =tr_FT,m_FT=> H.

Proof Idea. Since only attribution edges are deleted there are no dangling points, and almost injective matching ensures that there are no identification points (see [14] for the full proof).

3. ANALYSIS OF FUNCTIONAL BEHAVIOUR

Functional behaviour of a model transformation means that each model of the source language L_S ⊆ VL_S is transformed into a unique model of the target language. This section presents new techniques especially developed to show functional behaviour of correct and complete model transformations based on TGGs.

Definition 6. Functional Behaviour of Model Transformations: A model transformation MT based on forward translation rules has functional behaviour if each execution of MT starting at a source model G_S of the source language L_S ⊆ VL_S leads to a unique target model G_T ∈ VL_T. The execution of MT requires backtracking if there are terminating TGT-sequences (Att^F(G_S) ← ∅ → ∅) =tr*_FT=> G'_n whose source component G'_{n,S} differs from Att^T(G_S).

The standard way to analyze functional behaviour is to check whether the underlying transformation system is confluent, i.e. all diverging derivation paths starting at the same model finally meet again. In the context of model transformations, confluence only needs to be ensured for transformation paths which lead to completely translated models. For this reason, we introduce so-called filter NACs that extend the model transformation rules in order to avoid misleading paths that cause backtracking. The overall behaviour w.r.t. the model transformation relation is preserved. Filter NACs are based on the following notion of misleading graphs, which can be seen as model fragments that are responsible for the backtracking of a model transformation.

Definition 7. Translatable and Misleading Graphs: A triple graph with translation attributes G is translatable if there is a transformation G ⇒* H such that H is completely translated. A triple graph with translation attributes G is misleading if every triple graph G' with translation attributes and G' ⊇ G is not translatable.

Example 5. Misleading Graph: Consider the transformation step shown in Fig. 6. The resulting graph G is misleading according to Def. 7, because the edge S2 is labeled with a translation attribute set to "F", but there is no rule which may change this attribute in any larger context at any later stage of the transformation. The only rule which changes the translation attribute of a "parent" edge is "Subclass2TableFT", but it requires that the source node S3 is labeled with a translation attribute set to "F". However, forward translation rules do not modify translation attributes that are already set to "T" and, additionally, they do not change the structure of the source component.

[Figure 6: Step G_0 =Class2Table_FT=> G with misleading graph G. In G_0 the class S1 is already translated (tr = T, related via :CT to a :Table), while the class S3 (name = n) and its "parent" edge S2 to S1 are still marked with tr = F; applying Class2Table_FT to S3 yields G, in which S3 is marked with tr = T and related to a new table, while the edge S2 remains marked with tr = F.]

Definition 8. Filter NAC: A filter NAC n for a forward translation rule tr_FT : L_FT → R_FT is given by a morphism n : L_FT → N such that there is a TGT step N =tr_FT,n=> M with M being misleading. The extension of tr_FT by some set of filter NACs is called forward translation rule tr_FN with filter NACs.

Example 6. Forward Translation Rule with Filter NACs: The rule in Fig. 7 extends the rule Class2Table_FT by a filter NAC obtained from the graph G_0 of the transformation step G_0 =Class2Table_FT=> G in Fig. 6, where G is misleading according to Ex. 5. In Ex. 7 we extend the rule by a further similar filter NAC with "tr = T" for node S2.

[Figure 7: A forward translation rule with filter NAC: Class2Table_FN. The left-hand side contains a class S1 with tr = F and tr_name = F; the right-hand side marks S1 with tr = T and creates the related :CT node and :Table. The filter NAC additionally forbids an incoming "parent" edge with tr = F from a class node S2 with tr = F.]
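To see how filter NACs interact with backtracking (Defs. 6-8), consider the following schematic executor, a sketch of our own rather than the actual algorithm of any TGG tool: without filter NACs it must explore every applicable step and backtrack from dead ends, while a filter NAC simply removes steps known to produce misleading graphs from the candidate set.

    def execute(model, applicable_steps, is_complete, filter_nacs=()):
        """Depth-first execution of forward translation steps.
        applicable_steps(model) yields (step, next_model) pairs; filter_nacs
        is a collection of predicates rejecting steps that would produce a
        misleading graph (Def. 8). Returns a step list or None."""
        if is_complete(model):
            return []                              # done: completely translated
        for step, next_model in applicable_steps(model):
            if any(nac(model, step) for nac in filter_nacs):
                continue                           # pruned: provably misleading
            rest = execute(next_model, applicable_steps, is_complete, filter_nacs)
            if rest is not None:
                return [step] + rest
        return None                                # dead end: caller backtracks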
A direct construction of filter NACs according to Def. 8 would be inefficient, because the size of the considered graphs to be checked is unbounded. For this reason we now present efficient techniques which support the generation of filter NACs, and we can bound the size without losing generality. At first we present a static technique for a subset of filter NACs and thereafter a dynamic generation technique leading to a much larger set of filter NACs. The first procedure in Fact 2 below is based on a sufficient criterion for checking the misleading property. Concerning our example, this static generation leads to the filter NAC shown in Fig. 7 for the rule Class2Table_FT for an incoming edge of type "parent".

Fact 2. Static Generation of Filter NACs: Given a triple graph grammar, the following procedure applied to each triple rule tr ∈ TR generates filter NACs for the derived forward translation rules TR_FT, leading to forward translation rules TR_FN with filter NACs:

  • Outgoing Edges: Check the following conditions:
    – tr creates a node (x : T_x) in the source component and the type graph allows outgoing edges of type T_e for nodes of type T_x, but tr does not create an edge (e : T_e) with source node x.
    – Each rule in TR which creates an edge (e : T_e) also creates its source node.
    – Extend L_FT to N by adding an outgoing edge (e : T_e) at x together with a target node, and add a translation attribute for e with value F. The inclusion n : L_FT → N is a NAC-consistent match for tr.
    For each node x of tr fulfilling the above conditions, the filter NAC (n : L_FT → N) is generated for tr_FT, leading to tr_FN.
  • Incoming Edges: Dual case, this time for an incoming edge (e : T_e).
  • TR_FN is the extension of TR_FT by all filter NACs constructed above.

Proof Idea. Each generated NAC (n : L_FT → N) for a node x in tr with an outgoing (incoming) edge e in N \ L defines a transformation step N =tr_FT,n=> M, where edge e is still labeled with "F" but x is labeled with "T". By the structure of forward translation rules it follows that edge e cannot be labeled with "T" at any later model transformation step for any given source model G_S. The full proof is given in [14].

The following dynamic technique for deriving relevant filter NACs is based on the generation of critical pairs, which define conflicts of rule applications in a minimal context. By the completeness of critical pairs (Lemma 6.22 in [2]) we know that for each pair of two parallel dependent transformation steps there is a critical pair which can be embedded. For this reason, the generation of critical pairs can be used to derive filter NACs. A critical pair either directly specifies a filter NAC or a conflict that may lead to non-functional behaviour of the model transformation. For the dynamic generation of filter NACs we use the tool AGG [23] for the generation of critical pairs for a plain graph transformation system. For this purpose, we first perform the flattening construction for triple graph grammars presented in [3, 15], extended to NACs using the flattening construction for morphisms. A critical pair P_1 <=tr_1,FT= K =tr_2,FT=> P_2 consists of a pair of parallel dependent transformation steps. If a critical pair contains a misleading graph P_1, we can use the overlapping graph K as a filter NAC of the rule tr_1,FT. However, checking the misleading property needs human assistance, such that the generated critical pairs can be seen as filter NAC candidates. But we are currently working on a technique that uses a sufficient criterion to check the misleading property automatically, and we are confident that this approach will provide a powerful generation technique.

Fact 3. Dynamic Generation of Filter NACs: Given a set of forward translation rules, generate the set of critical pairs P_1 <=tr_1,FT,m_1= K =tr_2,FT,m_2=> P_2. If P_1 (or similarly P_2) is misleading, we generate a new filter NAC m_1 : L_1,FT → K for tr_1,FT, leading to tr_1,FN, such that K =tr_1,FN=> P_1 violates the filter NAC. Hence, the critical pair for tr_1,FT and tr_2,FT is no longer a critical pair for tr_1,FN and tr_2,FT. But this construction may lead to new critical pairs for the forward translation rules with filter NACs. The procedure is repeated until no further filter NAC can be found or validated. This construction starting with TR_FT always terminates if the structural part of each graph of a rule is finite.

Proof. The constructed NACs are filter NACs, because the transformation step K =tr_1,FT,m_1=> P_1 contains the misleading graph P_1. The procedure terminates, because the critical pairs are bounded by the amount of possible pairwise overlappings of the left-hand sides of the rules. The amount of overlappings can be bounded by considering only constants and variables as possible attribute values.

For our case study the dynamic generation terminates already after the second round, which is typical for practical applications, because the amount of already translated elements in the new critical pairs usually decreases. Furthermore, the amount of NACs can be reduced by combining similar NACs differing only in some translation attributes. The remaining critical pairs that do not specify filter NACs show effective conflicts between transformation rules, and they can be provided to the developer of the model transformation to support the design phase.
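The static procedure of Fact 2 only consults the type graph and the rule set, so it can be implemented as a simple scan. A sketch under our own (hypothetical) encoding of rules and the type graph; the "incoming edges" case is the obvious dual.

    def static_filter_nacs(rules, out_types):
        """Fact 2, 'outgoing edges' case. Each rule r is a dict:
        r['nodes'] = created source nodes as (name, type), r['edges'] =
        created source edges as (edge_type, src_name, src_type).
        out_types maps a node type to the edge types allowed to leave it."""
        def rule_ok(r, edge_type):
            # every edge of this type the rule creates comes with its source node
            created = {name for name, _ in r["nodes"]}
            return all(src in created
                       for (t, src, _) in r["edges"] if t == edge_type)
        nacs = []
        for r in rules:
            for x, node_type in r["nodes"]:
                for edge_type in out_types.get(node_type, []):
                    if any(t == edge_type and src == x
                           for (t, src, _) in r["edges"]):
                        continue  # r itself creates such an edge at x
                    if all(rule_ok(r2, edge_type) for r2 in rules):
                        # filter NAC: forbid an edge of this type with tr = F at x
                        nacs.append((r["name"], x, edge_type))
        return nacs

    # Class2Table creates a Class node but no 'parent' edge, while
    # Subclass2Table creates each 'parent' edge together with its source
    # node, so a filter NAC results; the dual incoming-edge case yields
    # the NAC of Fig. 7.
    rules = [
        {"name": "Class2Table", "nodes": [("c", "Class")], "edges": []},
        {"name": "Subclass2Table", "nodes": [("sub", "Class")],
         "edges": [("parent", "sub", "Class")]},
    ]
    print(static_filter_nacs(rules, {"Class": ["parent"]}))
    # [('Class2Table', 'c', 'parent')]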
The filter NACs introduced in this paper on the one hand support the analysis of functional behaviour and on the other hand also improve the efficiency of the execution. By definition, the occurrence of a filter NAC at an intermediate model means that the application of the owning rule would lead to a model that cannot be translated completely, i.e. the execution of the model transformation would perform backtracking at a later step. This way, a filter NAC cuts off possible backtracking paths of the model transformation. As presented in Fact 2, some filter NACs can be generated automatically, and using Fact 3 a larger set of them can be obtained based on the generation of critical pairs. Finally, by Thms. 2 and 3 we can completely avoid backtracking if TR_FN has no significant critical pair or, alternatively, if all critical pairs are strictly confluent.

As shown by Fact 4 below, filter NACs do not change the behaviour of model transformations. The only effect is that they filter out derivation paths which would lead to misleading graphs, i.e. to backtracking for the computation of the model transformation sequence. This equivalence is used on the one hand for the analysis of functional behaviour in Thms. 2 and 3 and, furthermore, for improving the efficiency of the execution of model transformations as explained in Sec. 4.

Fact 4. Equivalence of Transformations with Filter NACs: Given a triple graph grammar TGG = (TG, ∅, TR) and a triple graph G_0 = (G_S ← ∅ → ∅) typed over TG, let G'_0 = (Att^F(G_S) ← ∅ → ∅). Then the following are equivalent for almost injective matches:

  1. There is a complete TGT-sequence G'_0 =tr*_FT,m*_FT=> G via forward translation rules.
  2. There is a complete TGT-sequence G'_0 =tr*_FN,m*_FN=> G via forward translation rules with filter NACs.

Proof Idea. Sequence 1 consists of the same derivation diagrams as sequence 2. The additional filter NACs in sequence 2 prevent a transformation rule from creating a misleading graph. Both sequences lead to completely translated models, such that we know that the matches in sequence 1 also fulfil the filter NACs of the rules in sequence 2. The full proof is given in [14].

Theorem 2. Functional Behaviour: Let MT be a model transformation based on forward translation rules TR_FT and let TR_FN extend TR_FT with filter NACs such that TR_FN is terminating and all critical pairs are strictly confluent. Then MT has functional behaviour. Moreover, the model transformation MT' based on TR_FN does not require backtracking and defines the same model transformation relation, i.e. MTR' = MTR.

Remark 2. TR_FN is terminating if TR_FT is terminating, and a sufficient condition is given in Thm. 1. Termination of TR_FN together with strict confluence of critical pairs implies unique normal forms by the Local Confluence Theorem in [18].

Proof Idea. The proof (see [14]) is based on a decomposition theorem of triple rule sequences into match-consistent TGT-sequences based on source and forward rules with NACs in [7]. The latter are equivalent to complete TGT-sequences based on forward translation rules without NACs in [15] and with NACs in Fact 1 in [14]. Finally, by Fact 4, complete TGT-sequences via forward translation rules with and without filter NACs are equivalent.

If the set of generated critical pairs of a system of forward translation rules with filter NACs TR_FN is empty, we can directly conclude from Thm. 2 that the corresponding system with forward translation rules TR_FT has functional behaviour. From an efficiency point of view, model transformations should be based on a compact set of rules, because large rule sets usually involve more attempts of matching until a valid match is found. In the optimal case, the rule set ensures that each transformation sequence of the model transformation is itself unique up to switch equivalence. For this reason, we introduce the notion of strong functional behaviour.

Definition 9. Strong Functional Behaviour of Model Transformations: A model transformation based on forward translation rules TR_FN with filter NACs has strong functional behaviour if for each G_S ∈ L_S ⊆ VL_S there is a G_T ∈ VL_T and a model transformation sequence (G_S, G_0 =tr*_FN=> G_n, G_T), and each two terminating TGT-sequences G_0 =tr*_FN=> G_n and G_0 =tr'*_FN=> G'_m are switch-equivalent up to isomorphism.

Remark 3.
1. That the sequences are terminating means that no rule in TR_FN is applicable any more, but it is not required that the sequences are complete, i.e. that G_n and G'_m are completely translated.
2. Strong functional behaviour implies functional behaviour, because G_n and G'_m being completely translated implies that G_0 =tr*_FN=> G_n and G_0 =tr'*_FN=> G'_m are terminating TGT-sequences.
3. Two sequences t1 : G_0 ⇒* G_1 and t2 : G_0 ⇒* G_2 are called switch-equivalent, written t1 ≈ t2, if G_1 = G_2 and t2 can be obtained from t1 by switching sequentially independent steps according to the Local Church-Rosser Theorem with NACs [18]. The sequences t1 and t2 are called switch-equivalent up to isomorphism if t1 : G_0 ⇒* G_1 has an isomorphic sequence t1' : G_0 ⇒* G_2 (using the same sequence of rules) with an isomorphism i : G_1 → G_2, written t1' = i ∘ t1, such that t1' ≈ t2. This means in particular that the rule sequence in t2 is a permutation of that in t1.

The third main result of this paper shows that strong functional behaviour of model transformations based on forward translation rules with filter NACs can be completely characterized by the absence of "significant" critical pairs.

Definition 10. Significant Critical Pair: A critical pair P_1 <=tr_1,FN= K =tr_2,FN=> P_2 for TR_FN is called significant if it can be embedded into a parallel dependent pair G'_1 <=tr_1,FN= G' =tr_2,FN=> G'_2 such that there is G_S ∈ VL_S and a sequence G_0 =tr*_FN=> G' with G_0 = (Att^F(G_S) ← ∅ → ∅).

Theorem 3. Strong Functional Behaviour: A model transformation based on terminating forward translation rules TR_FN with filter NACs has strong functional behaviour and does not require backtracking iff TR_FN has no significant critical pair.

Proof Idea. The proof (see [14]) is based on that of Thm. 2 and on the fact that in the absence of critical pairs two terminating sequences with the same source can be shown to be switch-equivalent up to isomorphism, using the Local Church-Rosser and Parallelism Theorem with NACs [18].
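Because forward translation rules delete the translation attributes they consume, two steps are parallel dependent whenever they compete for some still-untranslated item (NACs add further produce-forbid conflicts, which is the kind of self-conflict found for PrimaryAttr2Column in Ex. 7 below). A toy check of the delete-use case, our own simplification of the critical pair machinery of [2, 18]:

    def parallel_dependent(step1, step2):
        """step_i is the set of source items whose translation attribute the
        step flips from F to T; flipping is 'deleting' on tr attributes, so a
        shared item means each step disables the other (delete-use conflict)."""
        return bool(step1 & step2)

    print(parallel_dependent({"attr1"}, {"attr1"}))  # True: both translate attr1
    print(parallel_dependent({"attr1"}, {"attr2"}))  # False: independent steps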
Example 7. Functional Behaviour: We analyze the functional behaviour of the model transformation CD2RDBM with the triple rules TR given in Figs. 3 and 5. First of all, CD2RDBM is terminating according to Thm. 1. For analyzing local confluence we can use the tool AGG [23] for the generation of critical pairs. We use the extended rule Class2Table_FN as shown in Fig. 7 and extend it by a further filter NAC obtained by the static generation according to Fact 2. AGG detects two critical pairs showing a conflict of the rule "PrimaryAttr2Column" with itself for an overlapping graph with two primary attributes. Both critical pairs lead to additional filter NACs by the dynamic generation of filter NACs in Fact 3, resulting in a system of forward translation rules with filter NACs without any critical pair. Thus, we can apply Thm. 3 and show that the model transformation based on the forward translation rules with filter NACs TR_FN has strong functional behaviour and does not require backtracking. Furthermore, by Thm. 2 we can conclude that the model transformation based on the forward translation rules TR_FT without filter NACs has functional behaviour and does not require backtracking. As an example, Fig. 8 shows the resulting triple graph (translation attributes are omitted) of a model transformation starting with the class diagram G_S.

[Figure 8: Triple graph instance. The source graph G_S is a class diagram with the classes Company, Person and Customer (Customer a subclass of Person with primary attribute cust_id of type int) and an association employee from Company to Person; the target graph G_T contains the tables Company and Person, the primary-key column cust_id (type int), the foreign-key column employee_cust_id and the corresponding FKey; source and target are related via correspondence nodes of types CT, AC and AFK.]

4. EFFICIENT ANALYSIS AND EXECUTION

Our approach to model transformations based on triple graph grammars (TGGs) with NACs will now be discussed with respect to efficiency, for both the analysis of properties and the execution.

Correctness and Completeness: As shown by Thm. 1 based on [7, 4], model transformations based on TGGs with NACs are correct and complete with respect to the language of integrated models VL generated by the triple rules. Thus, correctness and completeness are ensured by construction.

Termination: As presented in [4], termination is essentially ensured if all triple rules are creating on the source component. This property can be checked statically, automatically and efficiently by checking (R_S \ L_S) ≠ ∅. In Thm. 1 we have given an explicit condition for the forward translation rules to be terminating.

Functional Behaviour: The new concept of filter NACs introduced in this paper provides a powerful basis for reducing the analysis efforts w.r.t. functional behaviour. Once termination is shown as explained above, functional behaviour of model transformations based on forward translation rules TR_FT can be checked by generating the critical pairs of the transformation system with AGG [23] and showing strict confluence. The static and dynamic generation of filter NACs (Facts 2 and 3) allows critical pairs to be eliminated. In the best case, all critical pairs disappear, showing the functional behaviour of the model transformation immediately. The new notion of strong functional behaviour of a system based on transformation rules TR_FN with filter NACs is completely characterized by the absence of "significant" critical pairs, such that we can ensure for each source model that the transformation sequence is unique up to switch equivalence. Furthermore, the critical pairs generated by AGG can be used to find the conflicts between the rules which may cause non-functional behaviour of the model transformation. The modeler can decide whether to change the rules or to keep the non-functional behaviour.

Efficient Execution: Filter NACs do not only improve the analysis of functional behaviour of a TGG, but also the execution of the model transformation process, by forbidding the application of misleading transformation steps that would lead to a dead end, eliminating the need for backtracking in these cases. Table 1 shows execution times using the transformation engine AGG [23].

  Table 1: Benchmark, Tool: AGG [23]

  Model size       without filter NACs          with filter NACs
  [elements 2)]    time 1) [ms]  success [%]    time 1) [ms]  overhead [%]  success [%]
   11                 143.75       42.86           158.33        10.14        100.00
   25                 302.75       16.84           335.45        10.80        100.00
   53                 672.68        3.94           742.62        10.40        100.00
  109               1,481.43        0.17         1,584.86         6.98        100.00

  1) Average time of 100 successful model transformation sequences
  2) Elements = nodes and edges

The additional overhead caused by filter NACs is fairly small and lies in the area of 10% for the examples in the benchmark, which is based on the average execution times of 100 executions concerning models with 11, 25, 53 and 109 elements (nodes and edges), respectively. The first model with 11 elements is the class diagram presented in the source component of Fig. 8. We explicitly do not compare the execution times of the system with filter NACs with one particular system with backtracking, because these times can vary heavily depending on the used techniques for partial order reduction and the chosen examples. Instead, we present the computed success rates for the system without filter NACs, which show that backtracking will cause a substantial overhead in any case. Thus, the listed times concern successful execution paths only, i.e. those executions that lead to a completely translated model. The success rate for transformations without filter NACs decreases fast when considering larger models. Times for the unsuccessful executions, which appear in the system without filter NACs, are not considered. However, in order to ensure completeness, there is the need for backtracking in the system without filter NACs. This backtracking overhead is in general exponential, and in our case study misleading graphs appear already at the beginning of many transformation sequences, implying that backtracking is costly. Backtracking is reduced by filter NACs and avoided completely in the case that no "significant" critical pair remains (see Thm. 3), which we have shown to be fulfilled for our example. The additional overhead of about 10% for filter NACs is in most cases much smaller than the effort for backtracking.

Moreover, in order to perform model transformations using highly optimized transformation machines for plain graph transformation, such as Fujaba and GrGen.Net [21], we have presented how the transformation rules and models can be equivalently represented by plain graphs and rules. First of all, triple graphs and morphisms are flattened according to the construction presented in [3, 15], which can be extended to NACs using the flattening of morphisms. Furthermore, we presented in this paper how forward rules with NACs are extended to forward translation rules with NACs, such that the control condition "source consistency" [6] and also the gluing condition (Fact 1) are ensured automatically for complete sequences, i.e. they do not need to be checked during the transformation. Summing up, the presented results allow us to combine the easy, intuitive and formally well-founded specification of model transformations based on triple graph grammars with NACs with the best available tools for executing graph transformations, while still ensuring correctness and completeness.
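As a quick sanity check, the overhead column of Table 1 can be reproduced from the two time columns:

    without = [143.75, 302.75, 672.68, 1481.43]   # times without filter NACs [ms]
    with_fn = [158.33, 335.45, 742.62, 1584.86]   # times with filter NACs [ms]
    overhead = [round((w - o) / o * 100, 2) for o, w in zip(without, with_fn)]
    print(overhead)  # [10.14, 10.8, 10.4, 6.98]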
5. RELATED WORK

Since 1994, several extensions of the original TGG definitions [19] have been published [20, 17, 10], and various kinds of applications have been presented [22, 11, 16]. The formal construction and analysis of model transformations based on TGGs was started in [6] by analyzing information preservation of bidirectional model transformations and continued in [3, 5, 4, 7, 15], where model transformations based on TGGs are compared with those based on plain graph grammars in [3], TGGs with specification NACs are analyzed in [7], and an efficient on-the-fly construction is introduced in [4]. A first approach to analyzing functional behaviour was presented for restricted TGGs with distinguished kernels in [5], and a more general approach, however without NACs, based on forward translation rules in [15]. The results in this paper for model transformations based on forward translation rules with specification and filter NACs build on the results of all these papers except [5]. In [6] a similar case study based on forward rules is presented, but without using NACs. As a consequence, more TGT-sequences are possible; in particular, an association can be transformed into a foreign key with one primary key even if there is a second primary attribute that will be transformed into a second primary key at a later stage. This behaviour is not desired from the application point of view. Thus, the grammar with NACs in this paper handles primary keys and foreign keys in a more appropriate way. Furthermore, the system has strong functional behaviour, as shown in Sec. 3.

In the following we discuss how the presented results can be used to meet the "Grand Research Challenge of the TGG Community" formulated by Schürr et al. in [20]. The main aims are "Consistency", "Completeness", "Expressiveness" and "Efficiency" of model transformations. The first two effectively require correctness and completeness w.r.t. the triple language VL and additionally termination and functional behaviour. They are ensured as shown in Sec. 3. While we considered functional behaviour w.r.t. unique target models, the more general notion in [20], regarding some semantic equivalence of target models, will be part of further extensions of our techniques. "Expressiveness" requires suitable control mechanisms like NACs, which are used extensively in this paper, and we will further extend the technique by additional control mechanisms. In [9] more general application conditions [12] are considered, but functional behaviour is not yet analyzed. In general, the overall usage of complex control structures should be kept low, because they may cause complex computations. Finally, we discussed in Sec. 4 that our approach can be executed efficiently based on efficient graph transformation engines. In particular, model transformations fulfilling the conditions in Thm. 3 do not need to backtrack, which bounds the number of transformation steps by the number of elements in the source model, as required in [20].

6. CONCLUSION

In this paper we have studied model transformations based on triple graph grammars (TGGs) with negative application conditions (NACs) in order to improve the efficiency of analysis and execution compared with previous approaches in the literature. The first key idea is that model transformations can be constructed by applying forward translation rules with NACs, which can be derived automatically from the given TGG rules with NACs. The first main result shows termination under weak assumptions, as well as correctness and completeness of model transformations in this framework, which is equivalent to the approach in [7]. The second key idea is to introduce filter NACs in addition to the NACs in the given TGG rules, which by contrast are called specification NACs in this paper. Filter NACs improve the analysis of functional behaviour for model transformations based on critical pair analysis (using the tool AGG [23]) by filtering out backtracking paths and, this way, some critical pairs. The second main result provides a sufficient condition for functional behaviour based on the analysis of critical pairs for forward translation rules with filter NACs. If we are able to construct filter NACs such that the corresponding rules have no more "significant" critical pairs, then the third main result shows that we have strong functional behaviour, i.e. not only are the results unique up to isomorphism but also the corresponding model transformation sequences are switch-equivalent up to isomorphism. Surprisingly, we can show that the condition "no significant critical pairs" is not only sufficient but also necessary for strong functional behaviour. Finally, we discuss efficiency aspects of analysis and execution of model transformations and show that our sample model transformation CD2RDBM based on TGG rules with NACs has strong functional behaviour.

The main challenge in applying our main results on functional and strong functional behaviour is to find suitable filter NACs such that we have a minimal number of critical pairs for the forward translation rules with filter NACs. For this purpose, we provide static and dynamic techniques for the generation of filter NACs (see Facts 2 and 3). The dynamic technique includes a check that certain models are misleading. In any case, the designer of the model transformation can specify some filter NACs directly, provided the filter NAC property can be ensured. Furthermore, we can avoid backtracking completely by Thms. 2 and 3 if TR_FN has no significant critical pair or, alternatively, if all critical pairs are strictly confluent. In future work, we will study further static conditions to check whether a model is "misleading", because this allows misleading execution paths to be filtered out. In addition, we are currently developing extensions to layered model transformations and amalgamated rules, which allow backtracking to be further reduced in general cases and the underlying rule sets to be simplified. Moreover, we study applications to model transformations that partially relate two DSLs, where some node types are irrelevant for the model transformation.

7. REFERENCES
[1] Ehrig, H., Ehrig, K., Hermann, F.: From Model Transformation to Model Integration based on the Algebraic Approach to Triple Graph Grammars. In: Ermel, C., de Lara, J., Heckel, R. (eds.) Proc. GT-VMT'08. EC-EASST, vol. 10. EASST (2008)
[2] Ehrig, H., Ehrig, K., Prange, U., Taentzer, G.: Fundamentals of Algebraic Graph Transformation. EATCS Monographs, Springer (2006)
[3] Ehrig, H., Ermel, C., Hermann, F.: On the Relationship of Model Transformations Based on Triple and Plain Graph Grammars. In: Karsai, G., Taentzer, G. (eds.) Proc. GraMoT'08. ACM (2008)
[4] Ehrig, H., Ermel, C., Hermann, F., Prange, U.: On-the-Fly Construction, Correctness and Completeness of Model Transformations based on Triple Graph Grammars. In: Schürr, A., Selic, B. (eds.) Proc. ACM/IEEE MODELS'09. LNCS, vol. 5795, pp. 241-255. Springer (2009)
[5] Ehrig, H., Prange, U.: Formal Analysis of Model Transformations Based on Triple Graph Rules with Kernels. In: Ehrig, H., Heckel, R., Rozenberg, G., Taentzer, G. (eds.) Proc. ICGT'08. LNCS, vol. 5214, pp. 178-193. Springer (2008)
[6] Ehrig, H., Ehrig, K., Ermel, C., Hermann, F., Taentzer, G.: Information preserving bidirectional model transformations. In: Dwyer, M.B., Lopes, A. (eds.) Proc. FASE'07. LNCS, vol. 4422, pp. 72-86. Springer (2007)
[7] Ehrig, H., Hermann, F., Sartorius, C.: Completeness and Correctness of Model Transformations based on Triple Graph Grammars with Negative Application Conditions. In: Heckel, R., Boronat, A. (eds.) Proc. GT-VMT'09. EC-EASST, vol. 18. EASST (2009)
[8] Giese, H., Wagner, R.: From model transformation to incremental bidirectional model synchronization. Software and Systems Modeling 8(1), 21-43 (2009)
[9] Golas, U., Ehrig, H., Hermann, F.: Enhancing the Expressiveness of Formal Specifications for Model Transformations by Triple Graph Grammars with Application Conditions. In: Proc. Int. Workshop on Graph Computation Models (GCM'10) (2010)
[10] Guerra, E., de Lara, J.: Attributed typed triple graph transformation with inheritance in the double pushout approach. Tech. Rep. UC3M-TR-CS-2006-00, Universidad Carlos III, Madrid, Spain (2006)
[11] Guerra, E., de Lara, J.: Model view management with triple graph grammars. In: Corradini, A., Ehrig, H., Montanari, U., Ribeiro, L., Rozenberg, G. (eds.) Proc. ICGT'06. LNCS, vol. 4178, pp. 351-366. Springer (2006)
[12] Habel, A., Pennemann, K.H.: Correctness of high-level transformation systems relative to nested conditions. Mathematical Structures in Computer Science 19, 1-52 (2009)
[13] Hermann, F., Ehrig, H., Golas, U., Orejas, F.: Formal Analysis of Functional Behaviour for Model Transformations Based on Triple Graph Grammars - Extended Version. Tech. Rep. 2010-8, TU Berlin, Fak. IV (2010)
[14] Hermann, F., Ehrig, H., Golas, U., Orejas, F.: Efficient Analysis and Execution of Correct and Complete Model Transformations Based on Triple Graph Grammars - Extended Version. Tech. Rep. 2010-13, TU Berlin, Fak. IV (2010)
[15] Hermann, F., Ehrig, H., Orejas, F., Golas, U.: Formal Analysis of Functional Behaviour of Model Transformations Based on Triple Graph Grammars. In: Proc. Int. Conf. on Graph Transformation (ICGT'10). LNCS, vol. 6372, pp. 155-170. Springer (2010)
[16] Kindler, E., Wagner, R.: Triple graph grammars: Concepts, extensions, implementations, and application scenarios. Tech. Rep. TR-ri-07-284, Department of Computer Science, University of Paderborn, Germany (2007)
[17] Königs, A., Schürr, A.: Tool Integration with Triple Graph Grammars - A Survey. In: Proc. SegraVis School on Foundations of Visual Modelling Techniques. ENTCS, vol. 148, pp. 113-150. Elsevier Science (2006)
[18] Lambers, L.: Certifying Rule-Based Models using Graph Transformation. Ph.D. thesis, Technische Universität Berlin (November 2009)
[19] Schürr, A.: Specification of Graph Translators with Triple Graph Grammars. In: Tinhofer, G. (ed.) Proc. WG'94. LNCS, vol. 903, pp. 151-163. Springer (1994)
[20] Schürr, A., Klar, F.: 15 years of triple graph grammars. In: Ehrig, H., Heckel, R., Rozenberg, G., Taentzer, G. (eds.) Proc. ICGT'08. LNCS, pp. 411-425. Springer (2008)
[21] Taentzer, G., Biermann, E., Bisztray, D., Bohnet, B., Boneva, I., Boronat, A., Geiger, L., Geiß, R., Horvath, A., Kniemeyer, O., Mens, T., Ness, B., Plump, D., Vajk, T.: Generation of Sierpinski Triangles: A Case Study for Graph Transformation Tools. In: Schürr, A., Nagl, M., Zündorf, A. (eds.) Proc. AGTIVE'07. LNCS, vol. 5088, pp. 514-539. Springer (2008)
[22] Taentzer, G., Ehrig, K., Guerra, E., de Lara, J., Lengyel, L., Levendovsky, T., Prange, U., Varro, D., Varro-Gyapay, S.: Model Transformation by Graph Transformation: A Comparative Study. In: Proc. MoDELS 2005 Workshop MTiP'05 (2005)
[23] TFS-Group, TU Berlin: AGG (2009), http://tfs.cs.tu-berlin.de/agg
Towards an Expressivity Benchmark for Mappings based on a Systematic Classification of Heterogeneities*

M. Wimmer (TU Vienna), G. Kappel (TU Vienna), A. Kusel (JKU Linz), W. Retschitzegger (JKU Linz), J. Schoenboeck (TU Vienna), W. Schwinger (JKU Linz)

ABSTRACT

A crucial prerequisite for the success of Model Driven Engineering (MDE) is the seamless exchange of models between different modeling tools, demanding mappings between tool-specific metamodels. Thereby the resolution of heterogeneities between these tool-specific metamodels is a ubiquitous problem representing the key challenge. Nevertheless, there is no comprehensive classification of potential heterogeneities available in the domain of MDE. This hinders the specification of a comprehensive benchmark explicating requirements wrt. expressivity of mapping tools, which provide reusable components for resolving these heterogeneities. Therefore, we propose a feature-based classification of heterogeneities, which accordingly adapts and extends existing classifications. This feature-based classification builds the basis for a mapping benchmark, thereby providing a comprehensive set of requirements concerning the expressivity of dedicated mapping tools. In this paper a first set of benchmark examples is presented by means of metamodels and conforming models, acting as an evaluation suite for mapping tools.

Categories and Subject Descriptors
D.2.12 [Software Engineering]: Interoperability

General Terms
Measurement

Keywords
Classification of Heterogeneities, Mapping Benchmark

1. INTRODUCTION

With the rise of MDE, models become the main artifacts of the software development process [3]. Hence, a multitude of modeling tools is available supporting different tasks, such as model creation, model simulation, model checking, model transformation, and code generation. Seamless exchange of models among different modeling tools increasingly becomes a crucial prerequisite for effective MDE. Due to the lack of interoperability, however, it is often difficult to use tools in combination, and thus the potential of MDE cannot be fully exploited. For achieving interoperability in terms of transparent model exchange, current best practices comprise creating model transformations between different tool metamodels (MMs), with the main drawback of having to deal with all the intricacies of a certain transformation language. In contrast to that, first mapping tools [6, 18] have been proposed, allowing a transformation to be specified on a more abstract level by means of reusable components. Out of the resulting mapping definitions, corresponding executable transformation code can be generated.

In the definition of a mapping between MMs, the resolution of heterogeneities represents the key challenge. Thereby heterogeneities result from the fact that semantically similar metamodeling concepts (M2) can be defined with different meta-metamodeling concepts (M3), leading to differently structured metamodels. As a simple example, Fig. 1 shows two metamodels of fictitious(1) domain-specific tools administrating publications. Whereas the MM of Tool1 models the type of a publication by the attribute Publication.kind (e.g., conference, workshop or journal), the MM of Tool2 represents the same semantics using the class Publication, which refers to a class Kind to determine the kind of the publication.

[Figure 1: Two Heterogeneous Tool Metamodels. The MM of Tool1 contains a class Publication with attributes name:String and kind:Integer; the MM of Tool2 contains a class Publication with attribute name:String and a reference kind (1..1) to a class Kind with attribute name:String.]

In order to resolve such heterogeneities, mapping tools provide certain reusable components. Nevertheless, it is still unclear which kinds of reusable components are required to provide the necessary expressivity. Therefore this paper provides a systematic classification of heterogeneities occurring in the domain of MDE between object-oriented MMs, thereby adapting and extending existing classifications [2, 4, 10, 11, 12, 13, 15, 17]. Moreover, this classification is used to derive an evaluation suite building an expressivity benchmark for mapping tools. Thereby a first set of examples is presented in this paper. Additional heterogeneity examples can be downloaded from our homepage(2), complementing the expressivity benchmark.

The remainder of this paper is structured as follows. In Section 2 we present the design rationale behind our classification as well as the feature-based classification itself. In Sections 3-5 we exemplarily discuss heterogeneities, thereby presenting six examples of our expressivity benchmark. Related work is discussed in Section 6 and finally, Section 7 concludes the paper together with an outlook on future work.

* This work has been funded by the Austrian Science Fund (FWF) under grant P21374-N13.
(1) Due to reasons of comprehensibility, examples comprising ontological concepts have been preferred over examples comprising linguistic concepts.
(2) www.modeltransformation.net
2.
In this respect, Fig. 3 depicts the relevant extract of the Ecore meta-metamodel for mappings. When comparing two Ecore-based metamodels, different cases can be distinguished, namely (i) that in the left-hand side (LHS) MM and in the right-hand side (RHS) MM the same Ecore concept is used. Thereby differences wrt. the owned attribute settings can arise, e.g., if two EClasses are used, one can be set abstract whereas the other is not – leading to a concreteness difference. Moreover, (ii) in the LHS MM and in the RHS MM different Ecore concepts may be used, e.g., an EAttribute in the LHS MM and an EReference, an EClass and an EAttribute in the RHS MM (cf. example in Fig. 2). Finally, (iii) both cases mentioned get more complex, if the number of Ecore concepts for modeling a certain MM concept differs. A simple example in this respect is that in one MM two EAttributes firstName and lastName are used whereas in the other MM this information is contained in just one EAttribute name.
TOWARDS A SYSTEMATIC CLASSIFICATION OF HETEROGENEITIES
This section presents the design rationale behind the proposed classification of heterogeneities as well as the classification itself. Since the classification targets at the domain of MDE, it bases on object-oriented MMs in contrast to existing classifications from the domain of data engineering basing either on the relational or the XML data model. To clearly make explicit the interconnections between heterogeneities we build our classification on a feature model [5].
2.1 Deriving Heterogeneities from Ecore
Heterogeneities result from the fact that semantically similar concepts can be defined with different metamodeling concepts (e.g., Ecore3), leading to differently structured tool metamodels. To exemplify this, Fig. 2 depicts the MMs of Fig. 1 as Ecore instances. Thereby, several heterogeneities arise, e.g., the MM of Tool1 represents the publication kind by an EAttribute whereas the MM of Tool2 utilizes an EReference, an EClass and an EAttribute to represent the semantically equivalent information.

[Figure 2: Tool Metamodels as Instances of Ecore]

To gain a systematic classification of the different kinds of syntactic heterogeneities, we investigated potential variation points between two Ecore-based metamodels (cf. Fig. 3). Ecore has been used since it is the prevalent meta-metamodel in MDE and since it comprises the core concepts of semantic data models [9], namely classes, attributes, references and inheritance. Therefore, the proposed classification can also be applied to other data models comprising these common core concepts, e.g., OWL4.

In this respect, Fig. 3 depicts the relevant extract of the Ecore meta-metamodel for mappings. When comparing two Ecore-based metamodels, different cases can be distinguished, namely (i) that in the left-hand side (LHS) MM and in the right-hand side (RHS) MM the same Ecore concept is used. Thereby, differences wrt. the owned attribute settings can arise, e.g., if two EClasses are used, one can be set abstract whereas the other is not – leading to a concreteness difference. Moreover, (ii) in the LHS MM and in the RHS MM different Ecore concepts may be used, e.g., an EAttribute in the LHS MM and an EReference, an EClass and an EAttribute in the RHS MM (cf. the example in Fig. 2). Finally, (iii) both cases mentioned get more complex if the number of Ecore concepts used for modeling a certain MM concept differs. A simple example in this respect is that one MM uses the two EAttributes firstName and lastName whereas the other MM contains this information in just one EAttribute name.

[Figure 3: Variation Points in Ecore-based MMs]
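To make these variation points concrete, the following plain-Java sketch compares two attribute and two class declarations and reports the resulting difference kinds. The record types are minimal, invented stand-ins for the Ecore concepts of Fig. 3; they are not part of Ecore or of any mapping tool discussed here.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal, illustrative stand-ins for EAttribute- and EClass-like elements.
record Attr(String name, String datatype, int lowerBound, int upperBound) {}
record Clazz(String name, boolean isAbstract) {}

public class DifferenceDetector {
    // Collects the attribute-setting differences of Fig. 3 for two attributes.
    static List<String> compare(Attr lhs, Attr rhs) {
        List<String> diffs = new ArrayList<>();
        if (!lhs.name().equals(rhs.name())) diffs.add("Naming Difference");
        if (!lhs.datatype().equals(rhs.datatype())) diffs.add("Datatype Difference");
        if (lhs.lowerBound() != rhs.lowerBound() || lhs.upperBound() != rhs.upperBound())
            diffs.add("Multiplicity Difference");
        return diffs;
    }

    static List<String> compare(Clazz lhs, Clazz rhs) {
        List<String> diffs = new ArrayList<>();
        if (!lhs.name().equals(rhs.name())) diffs.add("Naming Difference");
        if (lhs.isAbstract() != rhs.isAbstract()) diffs.add("Concreteness Difference");
        return diffs;
    }

    public static void main(String[] args) {
        // dateOfBirth:Date [0..1] vs. bornIn:Integer [1..1] (cf. Benchmark Example 1)
        System.out.println(compare(
            new Attr("dateOfBirth", "Date", 0, 1),
            new Attr("bornIn", "Integer", 1, 1)));
        // Publication is concrete in both MMs of Fig. 2 - no difference reported.
        System.out.println(compare(
            new Clazz("Publication", false), new Clazz("Publication", false)));
    }
}
```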
2 www.modeltransformation.net
3 http://www.eclipse.org/modeling/emf/
4 http://www.w3.org/TR/owl-features/
Besides syntactic heterogeneities, comprising all heterogeneities that can be derived from the syntactic definition in Ecore, also semantic heterogeneities may arise [15]. They occur when the valid instance set differs – either (i) in the number of valid instances or (ii) in the interpretation of the instance values. An example of the first case is that one MM comprises an EClass Publication whereas the other MM comprises an EClass JournalPublication, allowing only for journal instances – thus being a subset of the valid instances of the EClass Publication. An example of the second case is that one MM comprises an EAttribute amount encoding pricing information in Dollar, whereas the other MM also exhibits an EAttribute amount but encodes the pricing information in Euro. Thus, semantic heterogeneities cannot be derived from the syntax (since in both cases the MMs can be represented syntactically identically) but only by incorporating interpretation, i.e., an assignment of a meaning to each piece of data [8].
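The two cases can be made tangible with a small, hedged Java sketch: case (i) amounts to a filter predicate on instances, case (ii) to a value-translation function. All types and the conversion factor below are invented for illustration only.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

public class SemanticHeterogeneity {
    record Publication(String name, String kind) {}

    public static void main(String[] args) {
        // Case (i): JournalPublication accepts only a subset of Publication's
        // instances, so a filter condition is needed when transferring them.
        Predicate<Publication> isJournal = p -> p.kind().equals("Journal");
        List<Publication> pubs = List.of(
            new Publication("Paper1", "Conference"), new Publication("Paper2", "Journal"));
        System.out.println(pubs.stream().filter(isJournal).toList());

        // Case (ii): both MMs own an attribute 'amount', but with different
        // interpretations; a value-translation function resolves the difference.
        // The exchange rate is a made-up constant.
        UnaryOperator<Double> dollarToEuro = usd -> usd * 0.794;
        System.out.println(dollarToEuro.apply(5000.0));
    }
}
```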
2.2 Classification of Heterogeneities
Based on this design rationale, we introduce a classification of heterogeneities. It is expressed using the feature model formalism [5], which allows us to clearly point out the interconnections between the different kinds of heterogeneities (e.g., xor features modeling mutually exclusive features versus or features allowing several features to be picked at once). Thereby, heterogeneities are divided into the two main classes of (i) semantic heterogeneities, i.e., heterogeneities wrt. what is represented by a MM, and (ii) syntactic heterogeneities, i.e., heterogeneities wrt. how it is represented (cf. Fig. 4), whereby these two classes might occur jointly, as modeled by the or relationship in between.
[Figure 4: Heterogeneity Feature Model]

Semantic Heterogeneities. Concerning semantic heterogeneities – as mentioned above – two main cases can be distinguished, namely (i) differences in the number of valid instances and (ii) differences in the interpretation of the instance values. With respect to the first case, all set-theoretic relationships might occur, as modeled by the corresponding sub-features. Regarding the second case, diverse modifications of the values might be necessary to translate the values of one MM into correct values of the other MM, such that they conform to the interpretation of the other MM.
Syntactic Heterogeneities. With respect to syntactic heterogeneities, we distinguish between simple naming differences (i.e., a difference in the value of the name attribute of ENamedElement – cf. Fig. 3) and more challenging structural differences. Although names play an important role when deriving the semantics of a certain concept, names do not allow us to automatically conclude on the semantics. Thereby, two cases can be distinguished: (i) same semantics but different naming, i.e., synonyms, and (ii) different semantics but same naming, i.e., homonyms. With respect to structural differences, again two main cases can be distinguished – namely core concept differences and inheritance differences. Core concept differences occur due to the different usage of classes, attributes and references between two MMs. In addition, these two main categories can be further divided into same metamodeling concept heterogeneities and different metamodeling concept heterogeneities, differentiating whether or not the same Ecore concepts have been used in the LHS MM and in the RHS MM. In the context of core concept differences, additionally a different number of concepts may have been used in the two MMs, leading to different source-target-concept cardinalities. In the following sections a first set of benchmark examples is given, divided into three main
packages, comprising (i) core concept heterogeneities with same metamodeling concepts, (ii) core concept heterogeneities with different metamodeling concepts, and (iii) inheritance heterogeneities. Due to space limitations, only a subset of all potential heterogeneities is explained in detail by means of concrete metamodels and according model instances, but nevertheless examples from each main category are given. In this respect, the benchmark examples are described uniformly, comprising (i) a short description, (ii) the main challenges, (iii) the example description, and (iv) a discussion of resolution strategies. Complementary benchmark examples are presented on our collaborative homepage, which invites the community to participate in adding and discussing benchmark examples.
3. CORE CONCEPT HETEROGENEITIES – SAME CONCEPTS
Same metamodeling concept heterogeneities are heterogeneities that occur although the same modeling concept has been used in the LHS MM as well as in the RHS MM, as mentioned above. In this respect, two main differences might emerge – either the concepts exhibit different attribute settings (cf. Fig. 3) or a different number of concepts has been used in the MMs to express the same semantic concept (cf. Source-Target-Concept Cardinality in Fig. 4). In the following, two examples of this category are given.
3.1 Benchmark Example 1
This first example (cf. Fig. 5) only exhibits differences wrt. attribute settings (cf. optional features of A(ttribute)2A(ttribute) and R(eference)2R(eference) in Fig. 4) as well as semantic heterogeneities. The main challenges in this example can be summarized as follows:
1. EAttribute Professor.dateOfBirth – EAttribute Prof.bornIn: A2A, Multiplicity Difference, Datatype Difference
2. EAttribute Professor.salary – EAttribute Prof.salary: Semantic Heterogeneity (Interpretation of Instance Values Difference), A2A
3. EReference Professor.publications – EReference Prof.journals: R2R, Multiplicity Difference
4. EClass Publication – EClass Journal: Semantic Heterogeneity (Number of Instances Difference), C2C

[Figure 5: Benchmark Example 1 – Same Metamodeling Concept Heterogeneities]

Example Description. This first benchmark example (cf. Fig. 5) exhibits four main challenges. With respect to the first challenge, a multiplicity difference as well as a datatype difference arises between the EAttributes Professor.dateOfBirth and Prof.bornIn. Concerning the second challenge, a semantic heterogeneity between the EAttributes Professor.salary and Prof.salary emerges, since Professor.salary is encoded in Dollars whereas Prof.salary is encoded in Euros, i.e., a difference in the interpretation of the values. Regarding the third challenge, a multiplicity difference between the EReferences Professor.publications and Prof.journals exists. Finally, the fourth challenge again incorporates a semantic heterogeneity – but this time a difference in the number of valid instances. For resolving the differences of the first three challenges, corresponding functions are required which are able either to generate values or to transform values. In contrast to that, for resolving
the heterogeneity of the fourth challenge, a corresponding condition is needed that filters those instances that are still valid in the context of the RHS EClass.
Discussion of Resolution Strategies. When taking a look at the example instances, one can see that a resolution strategy has been chosen that minimizes information loss and achieves valid instances only. This is because instance P2 has been kept in the RHS although it does not reference any journal publication in the LHS model. Another potential resolution strategy would be to keep only those Professor instances that actually exhibit a journal publication. In that case, also a semantic heterogeneity between the EClasses Professor and Prof would exist, since the valid instance sets would potentially differ. Another interesting point in this example is that the RHS MM is more restrictive than the LHS MM, since the EAttribute Prof.bornIn always requires a value and since each instance of Prof requires at least one link to a journal publication. Since these restrictions do not exist in the LHS MM, instances of the LHS MM may not fulfill them. Therefore, some resolution strategy is needed – either auto-generating values or incorporating user interaction in order to produce valid instances of the RHS MM.
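As an illustration only – not the mapping-tool solution the benchmark asks for – the chosen resolution strategy could be sketched in plain Java as follows. The record types are invented; the default birth year and the conversion factor are picked so that the output mirrors the instance values shown in Fig. 5.

```java
import java.util.List;
import java.util.Optional;

public class BenchmarkExample1 {
    // LHS: dateOfBirth is optional, salary in Dollar, publications of any kind.
    record Professor(String name, Optional<Integer> birthYear, int salaryUsd,
                     List<Pub> publications) {}
    record Pub(String name, String type) {}
    // RHS: bornIn is mandatory, salary in Euro, only journal publications linked.
    record Prof(String name, int bornIn, int salaryEur, List<String> journals) {}

    static Prof transform(Professor p) {
        return new Prof(
            p.name(),
            // Multiplicity difference: generate a default (or ask the user)
            // when the optional LHS value is absent.
            p.birthYear().orElse(2000),
            // Interpretation difference: translate Dollar into Euro values.
            (int) Math.round(p.salaryUsd() * 0.794),
            // Number-of-instances difference: keep only journal publications.
            p.publications().stream()
                .filter(pub -> pub.type().equals("Journal"))
                .map(Pub::name).toList());
    }

    public static void main(String[] args) {
        Professor p2 = new Professor("Prof2", Optional.empty(), 3000, List.of());
        System.out.println(transform(p2)); // Prof2 is kept although it has no journals
    }
}
```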
3.2 Benchmark Example 2
In contrast to the first example, which restricts itself to source-target-concept cardinalities of 1:1, this example (cf. Fig. 6) additionally contains differences wrt. the number of concepts (cf. Source-Target-Concept Cardinality in Fig. 4).
[Figure 6: Benchmark Example 2 – Same Metamodeling Concept Heterogeneities]
The main challenges in this example can be summarized as follows:
1. EAttribute Publication.title, EAttribute Publication.subtitle – EAttribute Publication.name: Source-Target-Concept Cardinality n:1, A2A
2. EClass Publication, EClass Kind – EClass Publication: Source-Target-Concept Cardinality n:1, C2C
3. EAttribute Kind.name – EAttribute Publication.kind: A2A, Context Difference

Example Description. This benchmark example (cf. Fig. 6) poses three challenges. Concerning the first challenge, there is an n:1 source-target-concept cardinality between the EAttributes title, subtitle and name. In order to resolve this heterogeneity, merging functionality is needed, which in this case is basically a concatenation function. Concerning the second challenge, again an n:1 source-target-concept cardinality exists, but this time between the EClasses Publication, Kind and Publication. Therefore, again merging functionality is needed, allowing objects to be merged under a certain condition. Finally, the third challenge consists in a context difference between the EAttributes Kind.name and Publication.kind. For its resolution, the assignment of values across object boundaries is needed.
Discussion of Resolution Strategies. When taking a look at the example instances in Fig. 6, one can see that for each combination of a Publication object and the referenced Kind object a Publication object should be generated. Concerning the merge of the attributes, different strategies could be followed, whereby in this case a simple concatenation has been chosen. Other strategies comprise another concatenation order. In the case of other datatypes (e.g., numbers), arbitrary calculations could be incorporated.
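A minimal Java sketch of these three resolutions (n:1 attribute concatenation, object merge, cross-object value assignment) might look as follows; all types are invented stand-ins for the metamodel elements of Fig. 6.

```java
import java.util.List;

public class BenchmarkExample2 {
    record Kind(String name) {}
    record SourcePublication(String title, String subtitle, Kind kind) {}
    // RHS merges title/subtitle into one attribute and inlines the kind.
    record TargetPublication(String name, String kind) {}

    static TargetPublication merge(SourcePublication p) {
        // n:1 attribute merge: here a simple concatenation; another order or,
        // for numeric types, arbitrary calculations would also be possible.
        return new TargetPublication(p.title() + " - " + p.subtitle(),
            // value assignment across object boundaries (Kind.name -> kind)
            p.kind().name());
    }

    public static void main(String[] args) {
        Kind journal = new Kind("Journal");
        List<SourcePublication> pubs = List.of(
            new SourcePublication("P1", "S1", journal),
            new SourcePublication("P2", "S2", journal));
        pubs.stream().map(BenchmarkExample2::merge).forEach(System.out::println);
    }
}
```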
4. CORE CONCEPT HETEROGENEITIES – DIFFERENT CONCEPTS
Different metamodeling concept heterogeneities result from expressing the same semantic concept with different modeling concepts in the LHS MM and in the RHS MM. In our classification, potential heterogeneities were derived by systematically combining the identified core concepts of semantic data models. To exemplify these heterogeneities, two benchmark examples are discussed in the following.

4.1 Benchmark Example 3
The third example (cf. Fig. 7) deals with the fact that a concept is modeled in the LHS MM by means of an EAttribute whereas the RHS MM models this concept explicitly by means of an EClass. Thus, the main challenges in this example can be summarized as follows:
1. EAttribute Publication.kind – EClass Kind: A2C
2. EClass Publication, EAttribute Publication.kind – EReference Publication.kind: CA2R

Example Description. The first challenge is that the kind of the publication is represented by means of the EAttribute Publication.kind in the LHS MM, whereas the RHS MM makes the type explicit by means of the EClass Kind; this is therefore classified as A(ttribute)2C(lass) in Fig. 7. In order to link publications with the publication kind, the RHS MM provides the EReference Publication.kind, for which there is no according counterpart in
the LHS MM, i.e., the RHS links have to be generated, which represents the second challenge in the example. In order to establish such additional links in the RHS, information is needed about how the concepts to be linked were related in the LHS MM. With respect to this example, the source of the EReference Publication.kind is represented in the LHS MM by means of the EClass Publication and the target of the EReference by means of the EAttribute Publication.kind. Therefore, this heterogeneity is classified as C(lass)A(ttribute)2R(eference), whereby the first letter depicts the LHS concept used for the source of the reference to be generated and the second letter the LHS concept used for its target.
Discussion of Resolution Strategies. When taking a look at the example instances, one can see that the desired intention of an A2C heterogeneity is that a Kind object should be generated only for distinct Publication.kind attribute values. Therefore, the RHS model exhibits only a single object named Journal (cf. K1 in Fig. 7), which is referenced by the Publication objects P1 and P2.

[Figure 7: Benchmark Example 3 – Different Metamodeling Concept Heterogeneities (A2C, CA2R)]
4.2 Benchmark Example 4
Whereas the previous example exhibited the heterogeneity that a LHS concept is modeled by means of an EAttribute and the RHS concept by means of an EClass, the following example (cf. Fig. 8) exhibits the heterogeneity that a LHS concept is modeled by means of an EReference whereas the equivalent RHS concept is again represented by an EClass. The main challenges in this example are:
1. EReference Professor.publications – EClass DBLPEntry: R2C
2. EReference Professor.publications – EAttribute DBLPEntry.id: R2A
3. EClass Professor, EReference Professor.publications – EReference Professor.entries: CR2R
4. EReference Professor.publications, EClass Publication – EReference DBLPEntry.publication: RC2R

Example Description. Whereas the class Professor in the LHS MM in Fig. 8 has a direct EReference Professor.publications, the RHS MM offers this information only indirectly by means of the EClass DBLPEntry and its EReference DBLPEntry.publication, representing the first challenge in this example (cf. R(eference)2C(lass) feature value in Fig. 4). Concerning the second challenge, values for the DBLPEntry.id EAttribute have to be generated. Since the containing RHS EClass is generated on the basis of the LHS EReference Professor.publications, the according EAttribute also has to be generated on the basis of this EReference (cf. R(eference)2A(ttribute) feature value in Fig. 4). With respect to the third and fourth challenges, the according links have to be established. For this, again information is needed about how the concepts to be linked were related in the LHS MM, as described above. Concerning the Professor.entries EReference, the source of the EReference (Professor) is generated on the basis of the LHS EClass Professor and the target of the EReference (DBLPEntry) on the basis of the EReference Professor.publications – thus this heterogeneity is classified as C(lass)R(eference)2R(eference). A similar situation occurs for the RHS EReference DBLPEntry.publication, but in this case the source of the EReference is based on an EReference and the target on an EClass – a heterogeneity classified as R(eference)C(lass)2R(eference).
Discussion of Resolution Strategies. The challenge in this benchmark example is to obtain objects conforming to the RHS EClass DBLPEntry (cf. example instances in Fig. 8). These RHS objects have to be created on the basis of the LHS links, since these links encode the information which publications belong to which professor, which is also the task of DBLPEntry objects. Therefore, Fig. 8 depicts four DBLPEntry objects which originate from the four LHS Professor.publications links. To set the DBLPEntry.id
value, a function is needed which generates an according id, whereby again for every LHS link an according RHS value should be created.

[Figure 8: Benchmark Example 4 – Different Metamodeling Concept Heterogeneities (R2C, R2A)]
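A plain-Java sketch of this link-reifying strategy is given below; the counter-based id generator is one possible realization of the required function, and all types are invented stand-ins for the metamodel elements of Fig. 8.

```java
import java.util.ArrayList;
import java.util.List;

public class BenchmarkExample4 {
    record Publication(String name) {}
    record Professor(String name, List<Publication> publications) {}
    // RHS reifies each publications-link as a DBLPEntry object (R2C),
    // with a generated id (R2A) and a link to the publication (RC2R).
    record DBLPEntry(int id, Professor owner, Publication publication) {}

    public static void main(String[] args) {
        Publication p10 = new Publication("P1"), p11 = new Publication("P2"),
                    p12 = new Publication("P3");
        List<Professor> profs = List.of(
            new Professor("Prof1", List.of(p10, p11)),
            new Professor("Prof2", List.of(p11, p12)));

        // One DBLPEntry per LHS link; the id generator simply counts the links.
        List<DBLPEntry> entries = new ArrayList<>();
        int id = 1;
        for (Professor prof : profs)
            for (Publication pub : prof.publications())
                entries.add(new DBLPEntry(id++, prof, pub));

        entries.forEach(e -> System.out.println(e.id() + ": "
            + e.owner().name() + " -> " + e.publication().name()));
        // four entries, mirroring the four Professor.publications links of Fig. 8
    }
}
```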
5. INHERITANCE HETEROGENEITIES
In the previous sections we discussed potential heterogeneities when considering the metamodeling concepts of classes, attributes and references. Finally, heterogeneities might be caused by the concept of inheritance. In this respect, we again distinguish between heterogeneities that might occur although both MMs use inheritance (cf. same metamodeling concept inheritance differences in Fig. 4) and heterogeneities that occur if only one MM makes use of inheritance (cf. different metamodeling concept inheritance differences in Fig. 4). Similar to the aforementioned same metamodeling concept differences (cf. Section 3), same metamodeling concept inheritance differences occur due to different attribute values or links in the Ecore MMs (cf. Fig. 3), whereas the latter heterogeneities occur if an inheritance hierarchy in one MM is expressed by other concepts (i.e., classes, attributes, and references) in the other MM. In the following, one example per category is given.

5.1 Benchmark Example 5
This example (cf. Fig. 9) belongs to the same metamodeling concept category and therefore both MMs make use of inheritance. Nevertheless, certain heterogeneities occur, comprising breadth differences, depth differences and concreteness differences. The main challenges in this example can be summarized as follows:
1. EClass FullProf, EClass AssistantProf – EClass FullProf: I2I, Breadth Difference
2. EClass Assistant – EClass Assistant: I2I, Concreteness Difference, Depth Difference
3. EClass PrePhd, EClass PostPhd – No corresponding EClass: I2I, Breadth Difference

Example Description. Concerning the first challenge, a breadth difference exists between the LHS EClasses FullProf and AssistantProf and the RHS EClass FullProf. This is because the number of sibling classes in the context of a certain parent class differs. For resolving breadth differences, the strategy can be applied to map instances of a class existing only in the LHS MM to a concrete parent class in the RHS MM. Nevertheless, since the parent classes of the EClass AssistantProf are abstract, instances of AssistantProf get lost. With respect to the second challenge, a concreteness difference as well as a depth difference occurs between the two EClasses Assistant. This is because the EClass Assistant in the LHS MM is set abstract whereas the corresponding EClass Assistant in the RHS MM is concrete. Additionally, a depth difference exists, since the longest path of subclasses in the context of the EClass Assistant is 1 in the LHS MM whereas it is 0 in the context of the corresponding class in the RHS MM. For resolving the
concreteness difference, no strategy is needed in this example, since the LHS class is abstract and therefore no instances of it can exist. The situation would be different if it were the other way around: then instances might be lost if no concrete class in the RHS MM could be found to accommodate them. For resolving the depth difference, the strategy can be pursued to map instances of the classes existing only in the LHS MM to some concrete parent class in the RHS MM. Therefore, in this case the instances of the EClasses PrePhd and PostPhd result in instances of the parent EClass Assistant in the RHS MM. Finally, regarding the third challenge, a breadth difference exists between the EClasses PrePhd and PostPhd and the non-existing RHS classes. Since in this case the breadth difference overlaps with the depth difference of challenge 2 (which is the case because the EClass Assistant in the RHS MM exhibits no subclasses at all), no additional resolution strategy is needed here.
Discussion of Resolution Strategies. When taking a look at the chosen resolution strategies, one can see that a strategy has been chosen that tries to minimize instance loss and thus information loss. Therefore, instances of a class that exists only in the LHS MM should be kept by mapping them to some concrete parent class, due to the is-a relationship between the classes. Nevertheless, the explicit type information and additional features owned only by the subclass are lost. Therefore, sometimes a strategy that omits these instances might also be useful.

[Figure 9: Benchmark Example 5 – Same Metamodeling Concept Heterogeneities]
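The mapping-to-concrete-parent strategy, including the loss of AssistantProf instances, could be sketched in plain Java as follows; the sealed hierarchy is an invented stand-in for the LHS MM of Fig. 9, not the benchmark's intended mapping-tool solution.

```java
import java.util.List;
import java.util.Optional;

public class BenchmarkExample5 {
    // LHS hierarchy; in the MM, Professor and Assistant are abstract, so only
    // the four leaf classes below can carry instances.
    sealed interface ResearchStaff permits FullProf, AssistantProf, PrePhd, PostPhd {
        String name();
    }
    record FullProf(String name) implements ResearchStaff {}
    record AssistantProf(String name) implements ResearchStaff {}
    record PrePhd(String name) implements ResearchStaff {}
    record PostPhd(String name) implements ResearchStaff {}

    // RHS offers only FullProf and a *concrete* Assistant.
    record TargetFullProf(String name) {}
    record TargetAssistant(String name) {}

    // Depth/breadth differences are resolved by mapping instances to the closest
    // concrete RHS class; AssistantProf has none, so its instances are lost.
    static Optional<Object> map(ResearchStaff s) {
        return switch (s) {
            case FullProf f      -> Optional.<Object>of(new TargetFullProf(f.name()));
            case PrePhd p        -> Optional.<Object>of(new TargetAssistant(p.name()));
            case PostPhd p       -> Optional.<Object>of(new TargetAssistant(p.name()));
            case AssistantProf a -> Optional.empty(); // information loss
        };
    }

    public static void main(String[] args) {
        List<ResearchStaff> staff = List.of(new FullProf("Prof1"),
            new AssistantProf("AssProf1"), new PrePhd("PrePhd1"), new PostPhd("PostPhd1"));
        staff.forEach(s -> System.out.println(s.name() + " -> " + map(s)));
    }
}
```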
5.2 Benchmark Example 6
This example (cf. Fig. 10) belongs to the different metamodeling concept category and therefore only one MM makes use of inheritance. The main challenge in this example can be summarized as follows:
1. EAttribute ResearchStaff.kind – EClasses ResearchStaff, Professor, Assistant and FullProf in an inheritance hierarchy: A2I

Example Description. With respect to the main challenge in this example, an A(ttribute)2I(nheritance) heterogeneity occurs between the EAttribute ResearchStaff.kind and the EClasses ResearchStaff, Professor, Assistant and FullProf. For resolving this kind of heterogeneity, a condition is needed to divide the instances of the EClass ResearchStaff according to the values of the EAttribute kind, in order to instantiate the corresponding RHS classes. Thereby the problem may arise that the EAttribute of the LHS MM comprises values that do not correspond to any (concrete) EClass in the RHS MM. This is the case in the example for the instance R1, since the corresponding EClass Professor in the RHS MM is abstract and thus cannot be instantiated, causing information loss.
Discussion of Resolution Strategies. Concerning the resolution strategy chosen in this example, again information loss should be prevented whenever possible. Nevertheless, as already discussed above, this may not always be possible.
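A hedged Java sketch of the required condition – dispatching on the kind value and losing instances whose kind corresponds to an abstract RHS class – might look as follows; the types are invented stand-ins for the metamodel elements of Fig. 10.

```java
import java.util.List;
import java.util.Optional;

public class BenchmarkExample6 {
    record ResearchStaffEntry(String name, String kind) {} // LHS: kind as attribute

    record FullProf(String name) {}
    record Assistant(String name) {}

    // A2I: a condition on the attribute value decides which RHS class to
    // instantiate; 'Professor' is abstract in the RHS MM and cannot be created.
    static Optional<Object> toTarget(ResearchStaffEntry r) {
        return switch (r.kind()) {
            case "FullProf"  -> Optional.<Object>of(new FullProf(r.name()));
            case "Assistant" -> Optional.<Object>of(new Assistant(r.name()));
            default          -> Optional.empty(); // e.g. 'Professor': information loss
        };
    }

    public static void main(String[] args) {
        List<ResearchStaffEntry> staff = List.of(
            new ResearchStaffEntry("staff1", "Professor"),
            new ResearchStaffEntry("staff2", "FullProf"),
            new ResearchStaffEntry("staff3", "Assistant"));
        staff.forEach(r -> System.out.println(r.name() + " -> " + toTarget(r)));
    }
}
```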
[Figure 10: Benchmark Example 6 – Different Metamodeling Concept Heterogeneities (A2I)]
6. RELATED WORK
In the following, two threads of related work are considered. First, our feature-based classification is compared to existing classifications. Second, our mapping benchmark is related to existing mapping benchmarks. In this respect, the most closely related area of model engineering is examined first. Moreover, the more widely related areas of data engineering and ontology engineering are investigated.
Existing Classifications. Model Engineering. Although model transformations and thus the resolution of heterogeneities between MMs play a vital role in MDE, to the best of our knowledge no dedicated survey examining potential heterogeneities exists.
Data Engineering. In contrast to that, in the area of data engineering a plethora of literature has existed for decades, highlighting different aspects of heterogeneities in the context of database schemata. A first classification of semantic and structural heterogeneities when integrating two different schemas was presented by Batini et al. in [2]. A systematic classification of possible variations in a SQL statement was presented by Kim et al. in [11], detailing Table-Table and Attribute-Attribute heterogeneities, e.g., wrt. cardinalities. The classification of Kashyap et al. presented in [10] provides a broad overview of possible heterogeneities in a data integration scenario, comprising semantic heterogeneities and conflicts occurring between same modeling concepts. The work of Blaha et al. presented in [4] describes patterns resolving syntactic heterogeneities, comprising same metamodeling concept heterogeneities as well as different metamodeling concept heterogeneities. Finally, the classification of Legler [13] presents a systematic approach for attribute mappings by combining possible attribute correspondences with cardinalities.
Ontology Engineering. Concerning the domain of ontology engineering, pattern collections as well as classifications exist. A pattern collection has been presented by Scharffe et al. in [14]. Thereby, correspondence patterns for ontology alignments are presented, but on a rather coarse-grained level, e.g., conditional patterns dealing with attribute differences and transformation patterns vaguely dealing with different metamodeling concept heterogeneities. With respect to existing classifications, Visser et al. [17] and Klein [12] provide a comprehensive list of semantic heterogeneities. Nevertheless, they have a strong focus on semantic heterogeneities, neglecting syntactic ones. Summarizing, although several classifications are available, none explicitly focuses on the domain of MDE. Therefore, we systematically analyzed variation points in the Ecore meta-metamodel in order to extend and adapt existing classifications. In this respect, on the one hand we aligned terms of existing classifications, e.g., most classifications introduced terms for the heterogeneities summarized in our classification as same metamodeling concept heterogeneities. On the other hand, we introduced new heterogeneities stemming from the explicit concepts of references and inheritance in object-oriented metamodels, in contrast to existing classifications based either on the relational or the XML data model. Finally, current classifications fail to explicate how different types of heterogeneities relate to each other, which we formalized by means of a feature model.
Existing Benchmarks. Model Engineering. To the best of our knowledge, no benchmark for mapping systems in the area of MDE exists. Nevertheless, a benchmark for evaluating the performance of graph transformations [16] has been proposed.
Data Engineering. In the area of data engineering, Alexe et al. propose in [1] a first benchmark for mapping systems, presenting a basic suite of mapping scenarios which should be readily supported by any mapping system focusing on information integration. In this respect, ten examples are discussed for which the actual transformation functions are given in terms of XQuery5 expressions.
5 http://www.w3.org/TR/xquery/

Additional examples are presented on their homepage6. Although the benchmark provides a first set of mapping scenarios, it remains unclear how the scenarios have been obtained and whether they provide full coverage in terms of expressivity. Although XQuery expressions are given to define the semantics, some of the XQuery functions assume the availability of custom functions which are not provided. Since no RHS models are given either, it is hard to determine the actual outcome of the transformation. Finally, some scenarios are not clearly specified by the given query (cf. scenarios 2 and 17 on their homepage). A further benchmark called THALIA is presented by Hammer et al. in [7]. It provides researchers with a collection of twelve benchmark queries given in XQuery, focusing on the resolution of syntactic and semantic heterogeneities in a data integration scenario. For every query a so-called reference schema (i.e., global schema) and a challenge schema (i.e., the schema to be integrated) are provided, together with instances. Although the paper claims a systematic classification of semantic and syntactic heterogeneities leading to the presented queries, it is merely an enumeration of heterogeneities where the rationale behind them is left unclear.
Ontology Engineering. With respect to the area of ontology engineering, no dedicated mapping benchmark exists. Nevertheless, efforts concerning the evaluation of matching tools, i.e., tools for automatically discovering alignments between ontologies, have been made, resulting in an ontology matching benchmark7, whereby these examples could be of interest for a dedicated mapping benchmark as well. Summarizing, although both benchmarks from the area of data engineering provide useful scenarios in the context of XML, they do not provide a systematic classification resulting in a systematic set of benchmark examples to evaluate the expressivity of a certain mapping system.
6 http://www.stbenchmark.org/
7 http://oaei.ontologymatching.org/2010/

7. CONCLUSION AND FUTURE WORK
In this paper we presented a systematic classification of heterogeneities occurring between Ecore-based MMs. Nevertheless, this classification of heterogeneities can also be applied to other semantic data models comprising the common core concepts on which this classification is based. Moreover, a first set of benchmark examples has been proposed, stating the requirements a mapping tool should fulfill. Additionally, these benchmark examples can be used to compare solutions realized with ordinary transformation languages. Further work comprises the completion of the benchmark examples to fully cover the classification. However, the success of a benchmark heavily depends on the agreement of the community – thus our collaborative homepage invites discussion. Finally, a tool evaluation on the basis of this benchmark is envisioned, comparing and evaluating mapping tools from diverse engineering domains wrt. their expressivity.

8. REFERENCES
[1] B. Alexe, W.-C. Tan, and Y. Velegrakis. STBenchmark: Towards a Benchmark for Mapping Systems. VLDB Endow., 1(1):230–244, 2008.
[2] C. Batini, M. Lenzerini, and S. B. Navathe. A Comparative Analysis of Methodologies for Database Schema Integration. ACM Comput. Surv., 18(4):323–364, 1986.
[3] J. Bézivin. On the Unification Power of Models. Journal on SoSyM, 4(2):31, 2005.
[4] M. Blaha and W. Premerlani. A Catalog of Object Model Transformations. In Proc. of the 3rd Working Conference on Reverse Engineering (WCRE'96), pages 87–96, 1996.
[5] K. Czarnecki, S. Helsen, and U. Eisenecker. Staged Configuration Using Feature Models. In Proc. of the Third Software Product Line Conference, pages 266–283, 2004.
[6] M. Del Fabro and P. Valduriez. Towards the Efficient Development of Model Transformations Using Model Weaving and Matching Transformations. Journal on SoSyM, 8(3):305–324, July 2009.
[7] J. Hammer, M. Stonebraker, and O. Topsakal. THALIA: Test Harness for the Assessment of Legacy Information Integration Approaches. In Proc. of the Int. Conf. on Data Engineering (ICDE), pages 485–486, 2005.
[8] D. Harel and B. Rumpe. Meaningful Modeling: What's the Semantics of "Semantics"? Computer, 37:64–72, 2004.
[9] R. Hull and R. King. Semantic Database Modeling: Survey, Applications, and Research Issues. ACM Comput. Surv., 19(3):201–260, 1987.
[10] V. Kashyap and A. Sheth. Semantic and Schematic Similarities between Database Objects: A Context-Based Approach. The VLDB Journal, 5(4):276–304, 1996.
[11] W. Kim and J. Seo. Classifying Schematic and Data Heterogeneity in Multidatabase Systems. Computer, 24(12):12–18, 1991.
[12] M. Klein. Combining and Relating Ontologies: An Analysis of Problems and Solutions. In Proc. of the Workshop on Ontologies and Information Sharing, IJCAI'01, 2001.
[13] F. Legler and F. Naumann. A Classification of Schema Mappings and Analysis of Mapping Tools. In Proc. of the GI-Fachtagung für Datenbanksysteme in Business, Technologie und Web (BTW'07), 2007.
[14] F. Scharffe and D. Fensel. Correspondence Patterns for Ontology Alignment. In Proc. of the 16th Int. Conf. on Knowledge Engineering (EKAW'08), pages 83–92, 2008.
[15] A. P. Sheth and J. A. Larson. Federated Database Systems for Managing Distributed, Heterogeneous, and Autonomous Databases. ACM Comput. Surv., 22(3):183–236, 1990.
[16] G. Varró, A. Schürr, and D. Varró. Benchmarking for Graph Transformation. In Proc. of the 2005 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC'05), pages 79–88, 2005.
[17] P. R. S. Visser, D. M. Jones, T. J. M. Bench-Capon, and M. J. R. Shave. An Analysis of Ontological Mismatches: Heterogeneity versus Interoperability. In Proc. of the AAAI 1997 Spring Symposium on Ontological Engineering, 1997.
[18] M. Wimmer, G. Kappel, A. Kusel, W. Retschitzegger, J. Schönböck, and W. Schwinger. Surviving the Heterogeneity Jungle with Composite Mapping Operators. In Proc. of the 3rd Int. Conf. on Model Transformation (ICMT 2010), pages 260–275, 2010.
Specifying Overlaps of Heterogeneous Models for Global Consistency Checking
Zinovy Diskin, Yingfei Xiong, and Krzysztof Czarnecki
University of Waterloo, Waterloo, ON, Canada
{zdiskin, yingfei, kczarnec}@gsdlab.uwaterloo.ca

ABSTRACT
Software development often involves a set of models defined in different metamodels, each model capturing a specific view of the system. We call this set a multimodel, and its elements partial or local models. Since partial models overlap, they may be consistent or inconsistent wrt. a set of global constraints. We present a framework for specifying overlaps between partial models and defining their global consistency. An advantage of the framework is that heterogeneous consistency checking is reduced to the homogeneous case, yet merging partial metamodels into one global metamodel is not needed. We illustrate the framework with examples and sketch a formal semantics for it based on category theory.

Categories and Subject Descriptors
D.2.12 [Software Engineering]: Interoperability

General Terms
Design, Languages, Theory, Verification.

1. INTRODUCTION
Software development often involves a set of heterogeneous models, such as use cases, process models, UML design models, and code. These models are defined by different metamodels, and are often built by different teams, but collectively represent a single system. Due to possible overlaps between models, individually consistent models may be globally inconsistent when taken together. Many existing approaches focus on checking consistency of a single model [25] or a pair of models [9]. However, individual or pairwise consistency does not guarantee global consistency. For example, Fig. 1 shows three UML class diagrams D1,2,3, where the classes connected by a dashed line are considered to be the same class (though named differently). Each of the three diagrams is consistent, and each pair of them is consistent, but taken together the three diagrams are inconsistent: there is a cycle in the inheritance chain.

[Figure 1: Three globally inconsistent models]

The example shows two issues in checking global consistency. First, we need to specify the models' overlap. For models like code and UML class diagrams extracted from code, we may know their overlap by matching the elements by name. But for models in the conceptual stage, we cannot deduce their overlap automatically. For example, an entity "Person" created by a business analyst and a table "Employee" existing in a legacy database may refer to the same concept even though they have different names. Second, when we have an overlap specification, we need an approach to check global consistency. Sabezadeh et al. [22] proposed to check global consistency of homogeneous models by merging them. First, the models' overlap is specified by a correspondence diagram: a set of auxiliary models and mappings "in-between" the local models, which declare some elements in different local models as being actually the same. Then all local models are merged into one model modulo the correspondence, i.e., elements of local models declared the same in the correspondence diagram become one element. Finally, consistency of the merged model is checked. Thus, verifying global consistency amounts to checking consistency of a single model. However, the approach was developed for the case of homogeneous models only.
The goal of the paper is to adapt the consistency-checking-by-merging (CCM) idea to the heterogeneous situation. A straightforward solution is to first merge all involved metamodels so that all local models become instances of the same global metamodel; then we can merge the models and check the result wrt. the constraints in the global metamodel. Though theoretically possible, in practice this approach leads to dealing with huge models and metamodels resulting from the merge, which is cumbersome and not effective. We present another approach in which merging metamodels is reduced to an unavoidable minimum, and merging models is reduced to merging only their relevant parts. Briefly, we find common views between metamodels, project related models to spaces of instances (overlaps) determined by those views, and then apply the CCM approach to the homogeneous set of projections.
We formulate the framework in a general way based on category theory. This makes it applicable to a wide class of models and metamodels whose carrier structures are graphs, attributed graphs, or general graph-like structures. By the latter we mean systems of sets (nodes, arrows, arrows between arrows...) interrelated by (source and target) functions. Realization of the approach requires several challenging issues to be solved: type-safe model matching, specification of indirect overlap between metamodels, and inter-metamodel constraints. We will discuss these issues in more detail in Section 3 after we briefly outline the basics of the CCM approach in Section 2. The rest of the paper is structured as follows. Section 4 describes our main techniques with simple examples. Section 5 presents general definitions and constructions in a semi-formal way. Relation to other work is discussed in Section 6. Section 7 concludes.

2. BACKGROUND: HOMOGENEOUS OVERLAP AND CONSISTENCY
We briefly review the basics of the CCM approach, and also show how to manage conflicts between values.

2.1 Software models are typed graphs
We consider metamodels as pairs M = (GM, CM) with GM a graph and CM a set of constraints. A model (M's instance) is a graph typed over M, i.e., a pair A = (GA, tA) with GA a graph (typically much bigger than GM) and tA : GA → GM a graph mapping (which preserves the incidence relationship between arrows and nodes) such that all constraints in set CM are satisfied. For example, Fig. 2 shows how to represent a UML class diagram A as a typed graph. GM is the graph representing the metamodel of UML class diagrams; GA is the graph representing the diagram A; and tA is the type mapping. UML classes, attributes, primitive values and generalization relations are represented as nodes; their relationships are captured by arrows. The value of mapping tA at an element e is given after a colon, e.g., the expression "10:Class" means tA(10)=Class for node 10. Identifiers of some elements are omitted, e.g., for all arrows. To refer to the elements, we will use the following notation: if N is the name of an element e, let &N be the slot (owned by e) where the name is held, and &&N be e itself. For example, &'Order'=11 and &&'Order'=10. In its turn, graph GM is typed over the meta-metamodel graph GMM. Any UML class diagram can be represented by a typed graph as above, but not the converse. To ensure that a typed graph is a correct diagram, constraints must be declared and added to the metamodel. For example, (C1) a class has only one name, or (C2) a class has only one parent class (we assume that multiple inheritance is prohibited), or (C3) classes with stereotype 'singleton' cannot be instantiated with more than one object. Note that constraints can either be imposed by a particular metamodeling technique, e.g., constraints (C1) and (C2), or can be user-defined, e.g., (C3), in a suitable language like OCL. In this paper we do not distinguish these two types and consider them abstractly as constraints over graphs.

[Figure 2: Graph Representation]
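Assuming graphs are given by node sets and source/target-indexed arrows, the typed-graph definition can be sketched in plain Java as a check that a candidate typing mapping preserves incidence; all names below are illustrative, not part of the formal framework.

```java
import java.util.Map;
import java.util.Set;

public class TypedGraphs {
    // A graph as a set of nodes plus arrows with [source, target] endpoints.
    record Graph(Set<String> nodes, Map<String, String[]> arrows) {}

    // A graph mapping is given by node and arrow functions; a typed graph is a
    // graph together with such a mapping into the metamodel graph.
    static boolean isGraphMapping(Graph g, Graph m,
                                  Map<String, String> nodeMap, Map<String, String> arrowMap) {
        // Incidence must be preserved: the image of an arrow must run between
        // the images of its source and target nodes.
        return g.arrows().entrySet().stream().allMatch(a -> {
            String[] st = a.getValue();
            String[] imageSt = m.arrows().get(arrowMap.get(a.getKey()));
            return imageSt != null
                && imageSt[0].equals(nodeMap.get(st[0]))
                && imageSt[1].equals(nodeMap.get(st[1]));
        });
    }

    public static void main(String[] args) {
        // Metamodel graph GM: Class --attr--> Name (a fragment of Fig. 2).
        Graph gm = new Graph(Set.of("Class", "Name"),
            Map.of("attr", new String[]{"Class", "Name"}));
        // Model graph GA: node 10 with its name slot 11, typed over GM.
        Graph ga = new Graph(Set.of("10", "11"),
            Map.of("a1", new String[]{"10", "11"}));
        Map<String, String> tNodes = Map.of("10", "Class", "11", "Name");
        Map<String, String> tArrows = Map.of("a1", "attr");
        System.out.println(isGraphMapping(ga, gm, tNodes, tArrows)); // true
    }
}
```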
2.2 Matching models via spans
Suppose two business analysts independently build two UML diagrams, A1 and A2 in Fig. 3. To check their global consistency, we first need to specify the overlap between the diagrams. Suppose we know that class 'OnlineOrder' in diagram A1 and class 'Order' in A2 refer to the same class, and their 'price' attributes refer to the same attribute. We could write the following two informal equations:
OnlineOrder@A1 = Order@A2
price@A1 = price@A2
Note that these equations conform to the type system of class diagrams: we match a class to a class and an attribute to an attribute. Hence, we can represent the set of equations by a class diagram A0, shown in the middle of Fig. 3. The question mark indicates that the name of the class is unknown and the corresponding slot is empty. That is, the slot node (:Name) in the graph representing model A0 does not have any arrow (:type) adjoint to it (see the auxiliary top-rightmost box in the figure). Nevertheless, it is convenient to denote the slot and its owner by &'?' and &&'?' as if '?' were a name. Since elements of model A0 represent pairs of elements (e1, e2) with ei ∈ Ai, i = 1, 2, we have two inter-model mappings fi : A0 → Ai. Formally, these mappings are functions between the corresponding graphs; e.g., f1 acts on GA0's nodes as follows:
f1(&&'?') = &&'OnlineOrder', f1(&&'price') = &&'price',
f1(&'?') = &'OnlineOrder', f1(&'price') = &'price',
f1('price') = 'price'.
Its action on arrows is evident. Mapping f2 is defined similarly. Importantly, both mappings preserve the types of elements, i.e., commute with the typing mappings of the corresponding graphs. In Fig. 3 we specify mappings in a shortened way, but precise formal specifications like the one above will be needed when we consider merging. We call a pair of mappings with a common source a (binary) span. The source (model A0) is called the head of the span, the mappings fi are its legs and their targets (models Ai) are its feet. Thus, an overlap of two homogeneous models is specified by a correspondence span over the same metamodel. An overlap of n models is described by an n-ary span with n legs and feet.

2.3 Merging and conflicts
After specifying the overlap by a correspondence span, we merge the two models into one and check whether it satisfies all constraints defined in the metamodel.
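A minimal sketch of consistency checking by merging, assuming union-find as the mechanism that identifies elements declared the same by correspondence spans, is given below. The three diagrams and their correspondences are invented for illustration and are not the exact configuration of Fig. 1.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ConsistencyByMerging {
    static final Map<String, String> parent = new HashMap<>();
    // Union-find: a correspondence declaration makes two elements one merged element.
    static String find(String x) {
        parent.putIfAbsent(x, x);
        String p = parent.get(x);
        return p.equals(x) ? x : find(p);
    }
    static void declareSame(String a, String b) { parent.put(find(a), find(b)); }

    public static void main(String[] args) {
        // Pairwise-consistent generalizations (child -> parent) in three local diagrams.
        String[][] generalizations = {
            {"D1.A", "D1.B"},   // in D1: A extends B
            {"D2.B", "D2.C"},   // in D2: B extends C
            {"D3.C", "D3.A"}};  // in D3: C extends A
        // Correspondence spans: same-named classes denote the same class.
        declareSame("D1.A", "D3.A"); declareSame("D1.B", "D2.B"); declareSame("D2.C", "D3.C");

        // Merge modulo correspondences, then check the global constraint
        // "no cycles in the inheritance relation" on the merged graph.
        Map<String, String> merged = new HashMap<>();
        for (String[] g : generalizations) merged.put(find(g[0]), find(g[1]));
        for (String start : merged.keySet()) {
            Set<String> seen = new HashSet<>();
            for (String c = start; c != null; c = merged.get(c))
                if (!seen.add(c)) { System.out.println("inheritance cycle at " + start); return; }
        }
        System.out.println("globally consistent");
    }
}
```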
[Figure 3: Models A1 and A2 matched via the correspondence span A0 (legs f1, f2) and merged via the colimit AΣ (legs g1, g2)]

Objects: … [name -> 'Mary Doe', children -> {Alice}]
Rules: ?X:human :- ?X:person.
Queries: Whose child is Bob in module Mod:
?X : person@Mod, ?X[name ->?Y, children->Bob]@Mod.
Output Result: ?X='John', ?Y='John Doe'
Loading programs in modules:
?- ['path/filename.flr'>>Mod]
#include "path/filename.flr"
…
Figure 2. Flora2 examples: objects, rules, queries.
The remaining of this paper is organized as follows. Section 2 provides a brief introduction to F-logic/Flora2. Section 3 presents our mapping approach for lifting of XSD schemas to objectoriented modes, mapping specification and run-time execution. Section 4 provides an overview of the architecture of our data exchange system together with some preliminary performance results. Section 5 gives concludes this paper, together with some relevant related work and potential extensions.
The core motivation for choosing Flora2 is that it is a rule based object-oriented logical language which provides support for flexible specification of schemas, instances, mapping rules, and at the same time it can be used to execute mapping rules on instance data. Flora2 comes with an XML package which supports loading and parsing XSD/XML documents, converting them to sets of Flora2 objects stored in user-specified Flora2 modules. It also provides equivalent entities for XSD and XML, features that used in our framework for data mediation.
2. BRIEF OVERVIEW OF FLORA2
In order to realize data mediation at a more abstract, semantic level, we need a higher level of abstraction for the representation of XML schemas and instances. Our approach is based on using object-oriented representations to abstract XML schemas and instances, and then performing the mapping between a source and a target.

1 http://flora.sourceforge.net/

3. MAPPING APPROACH
Our proposed solution, called FloraMap, is based on logical rules for specifying mappings at the schema level and executing those mappings at the instance level. The choice of logical rules is motivated by their declarative and procedural semantics, which makes them a powerful tool for declaratively specifying and at the same time executing mappings. Logical rules cannot work directly with XSDs, and therefore proper abstraction mechanisms need to be developed for abstracting XSD schemas, on top of which mappings can be designed and executed. Our choice for such abstraction is the use of object-oriented techniques for representing XSD and XML, on top of which mapping rules can be more easily specified.
Figure 3 below gives an overview of the mapping approach. The mapping process can be separated into two parts: design-time and run-time.

Design-time:
1. The Source XSD and Target XSD are represented as source and target Flora2 object-oriented schemas.
2. Logical rules are used to specify the mappings between the source Flora2 schemas and the target Flora2 schemas.

Run-time:
3. The Source XML is represented as Flora2 objects of the source Flora2 schema.
4. The logical rules from step 2 are executed on the source Flora2 objects, and target Flora2 objects are generated.
5. The target Flora2 objects are serialized into target XML instances.

[Figure 3 residue: at design time, the Source XSD and Target XSD are transformed into source and target Flora2 schemas; at run time, the Source XML becomes source Flora2 objects, the Transform Engine performs the semantic mapping (specification and execution) into target Flora2 objects, and these are serialized into the Target XML.]
Figure 3. Mapping Approach – Overview.

The rest of this section gives an overview of how abstraction is achieved (mapping XML schemas and instances to Flora2 representations), how mappings are specified and executed (i.e. mapping Flora2 source objects to Flora2 target objects), and how the resulting Flora2 objects are serialized to XML (i.e. mapping Flora2 objects to XML instances).

To exemplify these steps we use the exchange of an XML invoice between a company X (source) and a company Y (target). The schemas of the invoices of companies X and Y are presented in Figure 4, together with the following mappings (a small executable sketch of them follows the figure):
1. Bizszam in the source is the same as InvoiceNumber in the target.
2. Bizkelt in the source is the same as InvoiceDate in the target.
3. City in the source is the same as DeliveryAddress.city in the target.
4. Zip in the source is the same as DeliveryAddress.zip in the target.
5. Street in the source is the same as DeliveryAddress.street in the target.
6. AccDate in the target is a concatenation of Ev in the source, a delimiter, Kanyvho in the source, a delimiter, and the string '01', i.e. AccDate = Ev + '_' + Kanyvho + '_' + '01'.

[Figure 4 residue: (a) Source XSD: Company X; (b) Target XSD: Company Y; the numbered links 1-6 in the figure correspond to the mappings listed above.]
Figure 4. XML Schemas and mappings example.
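To make the running example concrete, the following Python sketch applies the six mappings above to a dictionary rendering of a Company X invoice. The field names come from Figure 4; the function and the dictionary layout are our own illustrative assumptions, not part of FloraMap.

def map_invoice_x_to_y(src):
    """Apply mappings 1-6 of Figure 4 to a Company X invoice (illustrative sketch)."""
    return {
        "InvoiceNumber": src["Bizszam"],                      # mapping 1
        "InvoiceDate": src["Bizkelt"],                        # mapping 2
        "DeliveryAddress": {
            "city": src["city"],                              # mapping 3
            "zip": src["zip"],                                # mapping 4
            "street": src["street"],                          # mapping 5
        },
        "AccDate": src["Ev"] + "_" + src["Kanyvho"] + "_01",  # mapping 6
    }

invoice_x = {"Bizszam": "I_001", "Ev": "2010", "Kanyvho": "05",
             "Bizkelt": "2010-05-18", "city": "Oslo", "zip": "1234",
             "street": "First Street"}
print(map_invoice_x_to_y(invoice_x))  # AccDate becomes '2010_05_01'

Sections 3.1-3.4 describe how FloraMap achieves the same effect declaratively, with the mapping specified as logical rules rather than hard-coded.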
3.1 XSD2OO
The technique we designed for abstracting XML schemas to object-oriented models generates two Flora2 models for each XSD: one Flora2 model (Abstract) contains the "clean" conceptual model of the schema (without any technicalities of XSD, focusing on the semantics of the elements), and the other one (Special) contains XSD-specific information (sequence, choice, etc.) which is used for generating the structure of target XML instances. Attribute and Element are different things in XSD, but we abstract them as the same in the Abstract model and capture the difference in the Special model. In most cases, XSD elements find a natural representation in Flora2. For example, a job element specified in XSD as <xs:element name="job" type="xs:string" minOccurs="0" maxOccurs="5"/> can be transformed in Flora2 into [job {0:5} *=> string]. The {0:5} cardinality is equivalent to minOccurs="0" and maxOccurs="5" in XSD.

Due to length restrictions, we do not provide the reader with a complete mapping of XSD to Flora2 schemas. Nevertheless, Table 1 provides three examples of how top-level elements in XSD are mapped to Flora2 representations.

Table 1. Example of XSD elements to Flora2 schema mapping

Situation 1: Top-level Element with BaseType
  XSD: (schema fragment not recoverable from the extracted text)
  Abstract: name[name {1:2} *=> string].
  Special: none

Situation 2: Top-level Element with ComplexType
  XSD: (schema fragment not recoverable)
  Abstract: name[firstname {1:1} *=> 'string']. name[lastname {1:1} *=> 'string'].
  Special: Elements[name->firstname]. Elements[name->lastname]. Sequences[name->[firstname,lastname]].

Situation 3: Top-level Element with SimpleType
  XSD: (schema fragment not recoverable)
  Abstract: age[base *=> 'int']. age[maxInclusive->200].
  Special: none

XSD import and include have natural equivalents in Flora2 modules. For example, when "filename.xsd" is included in an XSD file via <xs:include schemaLocation="filename.xsd"/>, the include can be transformed into #include "filename_Abstract.flr" in the Flora2 Abstract file and #include "filename_Special.flr" in the Flora2 Special file. For XSD import, the following steps can be used for the mapping:
1. ['filename_Abstract.flr'>>namespace] in the Flora2 Abstract file.
2. ['filename_Special.flr'>>namespace] in the Flora2 Special file.
3. Keep the element name and replace the ":" with "_" in the type.

Table 2 exemplifies the way XSD import and include are handled in Flora2 schemas.

Table 2. XSD import and include to Flora2 mapping

Situation 1: XSD Import
  XSD: (schema fragment not recoverable)
  Abstract: ?- ['path/Information_Abstract.flr'>>ccts]
    person[name {1:1} *=> ccts_nameType].
    person['ccts:age' {1:1} *=> ccts_age].
    person[work {1:1} *=> personwork].
    personwork['ccts:workType' {1:1} *=> ccts_workType].
  Special: ?- ['path/Information_Special.flr'>>ccts]
    Elements[person->name]. Elements[person->'ccts:age']. Elements[person->work].

Situation 2: XSD Include
  XSD: (schema fragment not recoverable)
  Abstract: #include "path/person_Abstract.flr"
  Special: #include "path/person_Special.flr"

The result of applying the XSD to Flora2 transformation to the XSD schema of Company X (Figure 4.a) is depicted in Figure 5, and the result of applying the transformation to the XSD schema of Company Y (Figure 4.b) is depicted in Figure 6.
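To illustrate the flavour of the XSD2OO step, the Python sketch below (our illustration, not FloraMap code) renders a simple XSD element declaration as an Abstract-style signature and a Special-style Elements fact, using the {min:max} cardinality convention described above; the owner class "person" is an assumed example.

import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

def element_to_flora2(owner, decl):
    """Render one xs:element as an Abstract signature, e.g. person[job {0:5} *=> string]."""
    name = decl.get("name")
    xsd_type = decl.get("type", "xs:string").split(":")[-1]
    lo = decl.get("minOccurs", "1")
    hi = decl.get("maxOccurs", "1")
    hi = "*" if hi == "unbounded" else hi
    return "%s[%s {%s:%s} *=> %s]." % (owner, name, lo, hi, xsd_type)

schema = ET.fromstring(
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:element name="job" type="xs:string" minOccurs="0" maxOccurs="5"/>'
    '</xs:schema>')
for decl in schema.findall(XS + "element"):
    print(element_to_flora2("person", decl))             # person[job {0:5} *=> string].
    print("Elements[person -> %s]." % decl.get("name"))  # Special-style fact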
Flora2 Abstract (Company X):
Namespace[value->'xs:'].
InvoiceCompanyX[Bizszam{1:1}*=>'xs:string'].
InvoiceCompanyX[Ev{1:1}*=>'xs:string'].
InvoiceCompanyX[Kanyvho{1:1}*=>'xs:string'].
InvoiceCompanyX[Bizkelt{1:1}*=>'xs:string'].
InvoiceCompanyX[city{0:*}*=>'xs:string'].
InvoiceCompanyX[zip{0:*}*=>'xs:int'].
InvoiceCompanyX[street{0:*}*=>'xs:string'].

Flora2 Special (Company X):
Sequences[InvoiceCompanyX->['Bizszam','Ev','Kanyvho','Bizkelt','city','zip','street']].
Elements[InvoiceCompanyX->Bizszam].
Elements[InvoiceCompanyX->Ev].
Elements[InvoiceCompanyX->Kanyvho].
Elements[InvoiceCompanyX->Bizkelt].
Elements[InvoiceCompanyX->city].
Elements[InvoiceCompanyX->zip].
Elements[InvoiceCompanyX->street].

Figure 5. Flora2 schema representation of the Company X XSD schema (Figure 4.a)

Flora2 Abstract (Company Y):
Namespace[value->'xs:'].
InvoiceCompanyY[InvoiceNumber{1:1}*=>'xs:string'].
InvoiceCompanyY[AccDate{1:1}*=>'xs:string'].
InvoiceCompanyY[InvoiceDate{1:1}*=>'xs:string'].
InvoiceCompanyY[DeliveryAddress{0:*}*=>CompanyYDeliveryAddress].
CompanyYDeliveryAddress[city{0:*}*=>'xs:string'].
CompanyYDeliveryAddress[zip{0:*}*=>'xs:string'].
CompanyYDeliveryAddress[DoorNo{0:*}*=>'xs:string'].
CompanyYDeliveryAddress[street{0:*}*=>'xs:string'].

Flora2 Special (Company Y):
Sequences[InvoiceCompanyY->['InvoiceNumber','AccDate','InvoiceDate','DeliveryAddress',TheOrderEnd]].
Elements[InvoiceCompanyY->InvoiceNumber].
Elements[InvoiceCompanyY->AccDate].
Elements[InvoiceCompanyY->InvoiceDate].
Elements[InvoiceCompanyY->DeliveryAddress].
Sequences[CompanyYDeliveryAddress->['city','zip','DoorNo','street']].
Elements[CompanyYDeliveryAddress->city].
Elements[CompanyYDeliveryAddress->zip].
Elements[CompanyYDeliveryAddress->DoorNo].
Elements[CompanyYDeliveryAddress->street].

Figure 6. Flora2 schema representation of the Company Y XSD schema (Figure 4.b)

These Flora2 Abstract and Special parts represent the source and target XSDs and are used as input in the design-time mapping and the run-time target XML instance generation.

3.2 XML2OO
The technique we designed for abstracting XML instances to object-oriented models generates one Flora2 model. Flora2 provides natural equivalences between object entities and XML instances.

For example, if an instance of a job element is represented in XML as <job>Programmer</job>, it can be transformed to obj_1:person[job->'Programmer'] in Flora2. obj_1 is a unique object name, and obj_1:person means that obj_1 is one of the instances of person. To transform an XML instance to Flora2 objects, the following high-level steps are devised:
1. Parse the XML instance file in Flora2, resulting in a Flora2 tree.
2. Load the Flora2 Abstract source files in Flora2.
3. Generate the Flora2 object structure according to the Flora2 Abstract model and query the values from the Flora2 tree; object names are constructed by concatenating "obj_" with a unique number (e.g. 1_1_2) generated from the element's unique location in the tree.

Step 1 is performed by the Flora2 engine itself and is not part of our implementation (the Flora2 XML package provides XML parsing support); it stores XML instances in a Flora2 tree automatically when XML files are parsed. FloraMap uses this package to load the XML file and uses the Flora2 tree to query the values. Steps 2 and 3 are performed by FloraMap, which generates the Flora2 object structure according to the Flora2 Abstract model and queries the values from the Flora2 tree.

Figure 7 shows the generation of a Flora2 object from an XML instance example of Company X. The upper part shows X's XML instance and the Flora2 Abstract model (as in Figure 5); the output is the Flora2 object obj.

Source XML (Company X):
<InvoiceCompanyX>
  <Bizszam>I_001</Bizszam>
  <Ev>2010</Ev>
  <Kanyvho>05</Kanyvho>
  <Bizkelt>2010-05-18</Bizkelt>
  <city>Oslo</city>
  <zip>1234</zip>
  <street>First Street</street>
</InvoiceCompanyX>

Flora2 object (Company X):
obj:InvoiceCompanyX['Bizszam'->'I_001'].
obj:InvoiceCompanyX['Ev'->'2010'].
obj:InvoiceCompanyX['Kanyvho'->'05'].
obj:InvoiceCompanyX['Bizkelt'->'2010-05-18'].
obj:InvoiceCompanyX['city'->'Oslo'].
obj:InvoiceCompanyX['zip'->'1234'].
obj:InvoiceCompanyX['street'->'First Street'].

Figure 7. XML to Flora2: Company X
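The Python sketch below mimics step 3 under our own simplified assumptions: it walks a parsed XML instance and emits Flora2-style object facts, deriving each object name from its position in the tree (the obj_1_1_2 naming scheme mentioned above). It is an illustration of the idea, not FloraMap's implementation.

import xml.etree.ElementTree as ET

def xml_to_objects(node, cls, path="1"):
    """Emit Flora2-style facts for one element; the path encodes its tree location."""
    oid = "obj_" + path
    facts = []
    for i, child in enumerate(node, start=1):
        if len(child) == 0:   # leaf element: store its text as an attribute value
            facts.append("%s:%s['%s'->'%s']." % (oid, cls, child.tag, child.text))
        else:                 # nested element: create and link a sub-object
            sub = "obj_%s_%d" % (path, i)
            facts.append("%s:%s['%s'->%s]." % (oid, cls, child.tag, sub))
            facts += xml_to_objects(child, child.tag, "%s_%d" % (path, i))
    return facts

xml = ET.fromstring("<InvoiceCompanyX><Bizszam>I_001</Bizszam>"
                    "<Ev>2010</Ev><Kanyvho>05</Kanyvho></InvoiceCompanyX>")
print("\n".join(xml_to_objects(xml, "InvoiceCompanyX")))
# obj_1:InvoiceCompanyX['Bizszam'->'I_001'].  (and so on)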
3.3 OO2OO
The core part of the data mediation is the specification and execution of the mappings in Flora2, a process which takes as input the Flora2 Abstract schemas of the source and target, the mappings between them, and the Flora2 source objects, and generates Flora2 target objects according to the specification of the mappings. This phase can be separated into three steps:
1. Specification of the design-time mappings between the source and target Flora2 Abstract schemas.
2. Generation of the executable (run-time) mappings from the design-time specification of the mappings.
3. Execution of the mappings on the source Flora2 objects to generate the Flora2 target objects.
For step 1 we provide a simple mechanism to capture the correspondences between the Flora2 Abstract source and target schemas. This is achieved with the following Flora2 predicates:

OneToOne([source],[target]).
OneToMany([source],[[target1],[target2],…],[n1,m1,n2,m2,…]).
ManyToOne([[source1],[source2],[source3],…],[target]).

OneToOne means that a class or attribute in the source schema corresponds to a class or attribute in the target schema. OneToMany means that a class or attribute in the source schema corresponds to more than one class or attribute in the target. ManyToOne means that more than one class or attribute in the source schema corresponds to one class or attribute in the target. [source] is the path of the source class or attribute, and [target] is the path of the target class or attribute. [n1,m1,n2,m2,…] are values that identify substrings: the first substring spans positions n1 to m1, the second n2 to m2, and so on.

Figure 8 shows the Flora2 specification of the correspondences/mappings between the Flora2 Abstract source and target schemas from Figures 5 and 6, respectively. The mapping information is taken from our running example in Figure 4.

Design-time Mappings: Company X to Y
OneToOne([InvoiceCompanyX],[InvoiceCompanyY]).
OneToOne([InvoiceCompanyX,Bizszam],[InvoiceCompanyY,InvoiceNumber]).
OneToOne([InvoiceCompanyX,Bizkelt],[InvoiceCompanyY,InvoiceDate]).
OneToOne([InvoiceCompanyX,City],[InvoiceCompanyY,DeliveryAddress,city]).
OneToOne([InvoiceCompanyX,Zip],[InvoiceCompanyY,DeliveryAddress,zip]).
OneToOne([InvoiceCompanyX,Street],[InvoiceCompanyY,DeliveryAddress,street]).
ManyToOne([[InvoiceCompanyX,Ev],'_',[InvoiceCompanyX,Kanyvho],'_','01'],[InvoiceCompanyY,AccDate]).

Figure 8. Design-time correspondences between the Flora2 schemas of company X and Y

For step 2 we have devised a mechanism that takes as input the Flora2 source and target schemas and the design-time correspondences between them, and generates a Flora2 program that represents the executable mappings. This can be achieved in Flora2 in a rather intuitive and straightforward way: for each object instance in the source, generate a new object (using the newoid primitive defined in Flora2), assign values to the new object according to the design-time correspondence rules, and store the new object in a target knowledge base (using the transactional insert feature of Flora2). Figure 9 shows the generated executable mapping program for our running example.

?- ['InvoiceCompanyX.flr'>>SourceInstances].
?- ?h:CompanyX@SourceInstances, newoid{?t}, newoid{?t_4},
   insert{
     ?t:InvoiceCompanyY[InvoiceNumber->?t_1],
     ?t:InvoiceCompanyY[AccDate->?t_2],
     ?t:InvoiceCompanyY[InvoiceDate->?t_3],
     ?t:InvoiceCompanyY[DeliveryAddress->?t_4],
     ?t_4:InvoiceCompanyYDeliveryAddress[city->?t_4_1],
     ?t_4:InvoiceCompanyYDeliveryAddress[zip->?t_4_2],
     ?t_4:InvoiceCompanyYDeliveryAddress[street->?t_4_4]
   | ?t_1 = ?h.Bizszam@SourceInstances,
     flora_concat_items([?h.Ev@SourceInstances,'_',?h.Kanyvho@SourceInstances,'_01'],?t_2)@_plg(flrporting),
     ?t_3 = ?h.Bizkelt@SourceInstances,
     ?t_4_1 = ?h.city@SourceInstances,
     ?t_4_2 = ?h.zip@SourceInstances,
     ?t_4_4 = ?h.street@SourceInstances }.

Figure 9. Flora2 executable program (run-time mappings)

In step 3, the Flora2 system is used as the underlying reasoning engine to execute the Flora2 program on the source instances. Figure 10 shows the result of applying the executable mapping program to an instance of a Company X invoice (obj) and the resulting instance of the Company Y invoice (obj1).

Flora2 source object (Company X):
obj:InvoiceCompanyX['Bizszam'->'I_001'].
obj:InvoiceCompanyX['Ev'->'2010'].
obj:InvoiceCompanyX['Kanyvho'->'05'].
obj:InvoiceCompanyX['Bizkelt'->'2010-05-18'].
obj:InvoiceCompanyX['city'->'Oslo'].
obj:InvoiceCompanyX['zip'->'1234'].
obj:InvoiceCompanyX['street'->'First Street'].

(The source object is passed through the executable mapping program of Figure 9.)

Flora2 target object (Company Y):
obj1:InvoiceCompanyY[InvoiceNumber->'I_001'].
obj1:InvoiceCompanyY[AccDate->'2010_05_01'].
obj1:InvoiceCompanyY[InvoiceDate->'2010-05-18'].
obj1:InvoiceCompanyY[DeliveryAddress->{obj_4}].
obj_4:CompanyYDeliveryAddress[city->'Oslo'].
obj_4:CompanyYDeliveryAddress[zip->'1234'].
obj_4:CompanyYDeliveryAddress[street->'First Street'].

Figure 10. Run-time mapping of Flora2 objects
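As a rough analogue of step 2 (again a sketch under our own assumptions, not the FloraMap generator), the following Python function compiles OneToOne and ManyToOne correspondences into an executable transform; unlike the hard-coded sketch after Figure 4, here the transform is derived from the declarative specification.

def compile_mappings(one_to_one, many_to_one):
    """Build an executable transform from declarative correspondences (sketch)."""
    def put(tgt, t_path, value):
        cur = tgt
        for key in t_path[1:-1]:        # skip the class name, nest into sub-structures
            cur = cur.setdefault(key, {})
        cur[t_path[-1]] = value

    def transform(src):
        tgt = {}
        for s_path, t_path in one_to_one:
            put(tgt, t_path, src[s_path[-1]])
        for parts, t_path in many_to_one:   # concatenate source values and literals
            put(tgt, t_path, "".join(src[p[-1]] if isinstance(p, list) else p
                                     for p in parts))
        return tgt
    return transform

one_to_one = [(["InvoiceCompanyX", "Bizszam"], ["InvoiceCompanyY", "InvoiceNumber"]),
              (["InvoiceCompanyX", "city"], ["InvoiceCompanyY", "DeliveryAddress", "city"])]
many_to_one = [([["InvoiceCompanyX", "Ev"], "_", ["InvoiceCompanyX", "Kanyvho"], "_", "01"],
                ["InvoiceCompanyY", "AccDate"])]
to_y = compile_mappings(one_to_one, many_to_one)
print(to_y({"Bizszam": "I_001", "city": "Oslo", "Ev": "2010", "Kanyvho": "05"}))
# {'InvoiceNumber': 'I_001', 'DeliveryAddress': {'city': 'Oslo'}, 'AccDate': '2010_05_01'}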
3.4 OO2XML
Flora2 to XML mapping is the last process in the FloraMap execution and is concerned with the serialization of the generated Flora2 objects into XML instances. This process takes as input the target schema (both the Flora2 Abstract and Special target schemas) and the Flora2 target objects, and generates the target XML instances. In the XSD to Flora2 lifting process, FloraMap generated two Flora2 models: the Flora2 Abstract (containing the conceptual model of the schema) and the Flora2 Special (containing XSD-specific information). These two Flora2 files are used for generating the structure of the target XML instances. Note that the Flora2 Special target schema plays a key role in the serialization of the objects, because it specifies the technical details of the XML instances that should be generated. In the Flora2 to Flora2 mapping process, FloraMap generated the Flora2 target objects, which are queried for the values of each class and attribute. Figure 11 depicts the Flora2 to XML process for our running example.
[Figure 11 layout: the Flora2 target object (Company Y), together with the Flora2 Abstract (Company Y) and Flora2 Special (Company Y) schemas, is serialized into the target XML instance.]

Flora2 object (Company Y):
obj1:InvoiceCompanyY[InvoiceNumber->'I_001'].
obj1:InvoiceCompanyY[AccDate->'2010_05_01'].
obj1:InvoiceCompanyY[InvoiceDate->'2010-05-18'].
obj1:InvoiceCompanyY[DeliveryAddress->{obj_4}].
obj_4:CompanyYDeliveryAddress[city->'Oslo'].
obj_4:CompanyYDeliveryAddress[zip->'1234'].
obj_4:CompanyYDeliveryAddress[street->'First Street'].

Target XML (Company Y):
<InvoiceCompanyY>
  <InvoiceNumber>I_001</InvoiceNumber>
  <AccDate>2010_05_01</AccDate>
  <InvoiceDate>2010-05-18</InvoiceDate>
  <DeliveryAddress>
    <city>Oslo</city>
    <zip>1234</zip>
    <street>First Street</street>
  </DeliveryAddress>
</InvoiceCompanyY>

Figure 11. Serialization of Flora2 objects to XML instances
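A minimal Python sketch of this serialization, under our own assumptions: the Sequences facts of the Special schema fix the element order, and the object's slots supply the values. This is an illustration of the idea, not the FloraMap serializer.

from xml.etree.ElementTree import Element, SubElement, tostring

# Child-element order, as recorded in the Flora2 Special schema (Figure 6).
sequences = {"InvoiceCompanyY": ["InvoiceNumber", "AccDate", "InvoiceDate", "DeliveryAddress"],
             "CompanyYDeliveryAddress": ["city", "zip", "DoorNo", "street"]}

def serialize(cls, obj, objects, tag=None):
    """Serialize one object to XML, following the Special schema's sequence order."""
    root = Element(tag or cls)
    for slot in sequences[cls]:
        value = obj.get(slot)
        if value is None:
            continue                      # optional slot with no value (e.g. DoorNo)
        if isinstance(value, tuple):      # reference to a sub-object: (class, id)
            sub_cls, sub_id = value
            root.append(serialize(sub_cls, objects[sub_id], objects, tag=slot))
        else:
            SubElement(root, slot).text = value
    return root

objects = {"obj_4": {"city": "Oslo", "zip": "1234", "street": "First Street"}}
obj1 = {"InvoiceNumber": "I_001", "AccDate": "2010_05_01", "InvoiceDate": "2010-05-18",
        "DeliveryAddress": ("CompanyYDeliveryAddress", "obj_4")}
print(tostring(serialize("InvoiceCompanyY", obj1, objects), encoding="unicode"))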
4. System Architecture, Implementation, and Experimental Results
The techniques outlined in the previous section have been implemented in FloraMap as a set of modules written in Flora2 which can be used to parse and transform XML schemas and instances into Flora2 schemas and objects, and to execute the mediation rules specified at the Flora2 level. At design time FloraMap takes as input the source and target XML schemas and generates the object-oriented models of the schemas. The mapping creator then specifies the correspondences/mappings between the schemas (similar to the example given in Figure 8) and generates the executable mapping program (similar to the example given in Figure 9) that will be used to execute mediation on source instances.

At run time FloraMap takes as input the XML source instances, the Flora2 source and target schemas, and the executable mapping rules produced at design time. Based on these inputs, FloraMap transforms the XML source instances into Flora2 objects, executes the mappings on these source objects to generate target objects, and finally serializes the target objects into XML target instances.

Figure 12 presents a high-level overview of the FloraMap modules and the interactions between them. The core modules of FloraMap are:
• XSD to Flora2: transforms the input XSDs to Flora2 schema models.
• XML to Flora2: transforms the input XML instances to Flora2 objects.
• Flora2 to Flora2: specifies and executes the mappings between the source and target Flora2 models (OO level).
• Flora2 to XML: serializes the Flora2 objects to XML instances.

[Figure 12 layout: the Source XSD and Target XSD feed "XSD to Flora2", producing the source and target Flora2 schemas; "XML to Flora2" turns the Source XML into source Flora2 objects; "Flora2 to Flora2" maps source Flora2 objects to target Flora2 objects; "Flora2 to XML" emits the Target XML.]
Figure 12. FloraMap: Core modules and interactions

Several experiments have been performed on the current implementation to test the scalability of FloraMap. The experiments were carried out on a commodity computer (Intel(R) Core(TM) 2 Duo CPU P8600 @ 2.4GHz, 4GB RAM, Windows Vista 32-bit OS). Two types of experiments have been performed:
1. Transformation of XSDs of various sizes and complexities to Flora2 schemas.
2. End-to-end data exchange with an increasing number of instances for the running example presented above.
For the first type of experiment we used XSDs of various sizes and complexities to test the scalability of generating Flora2 object-oriented models from XML schemas. The XSDs used ranged from simple schemas, such as those presented in this paper (in Figure 4), to very complex schemas, such as the Northern European Subset of UBL (NES).2 The times needed to generate object-oriented models from the XSDs are reported in Figure 13.
Figure 13. Performance results: Generation of Flora2 models from XML schemas

The results show that mapping large and complex schemas such as NES is time consuming (it took about 7 minutes); however, this is not an issue, since this generation needs to be done only once, at design time. After the Flora2 representations of the XSDs have been produced, they can be loaded and processed rather quickly by FloraMap for run-time mediation. For the second type of experiment, where we tested the end-to-end data exchange, we used increasing numbers of synthetically generated instances of the source schema presented in Figure 4 to generate instances of the target schema (also presented in Figure 4). This experiment included the complete mapping of source instances to target instances through an intermediary schema (not presented here), meaning that we had three schemas and two sets of mappings. The time needed for a complete transformation of increasing numbers (1 to 4,000) of invoice instances of the Company X XSD into instances of the Company Y XSD is reported in Figure 14.

Figure 14. Performance results: End-to-end data mediation

These results show that the larger the number of instances, the more time is needed for end-to-end processing, with the growth lying somewhere between linear and exponential. Whereas in some applications this can be acceptable (e.g. processing 4,000 instances in about 15 minutes, as our results showed), in other applications this might not be reasonable.

5. Related Work, Conclusions, and Outlook
The problem of mapping between data structures has been studied extensively for decades, and schema mapping is well established as a research field [6,2]. Nevertheless, the use of rule-based logical systems for data mapping/exchange has not yet been widely investigated in the community. With this paper we provided a solution to the end-to-end data exchange problem based on the use of F-logic/Flora2 as a logical framework, which we used for the high-level, abstract specification of schemas and of the mappings between them, as well as for the run-time execution of the mappings. Our approach allows the mapping creator to focus on the semantic, object-oriented model behind the XSD schemas and to specify the mappings at a more abstract, semantic level, rather than having to deal with the technicalities of XSD schemas. The proposed approach allows both the specification and the execution of data mappings (i.e. design- and run-time mapping) in a single, unifying framework, providing an end-to-end solution to the problem of XML data exchange.

Several works can be related to our approach. For example, [4] presents algorithms to represent XML and XSD in a mainstream object-oriented programming language. It develops two mappings: one uses a set of rules that map an XSD schema into its object-oriented schema, and the other maps XML instances that conform to an XSD schema to their representation as objects. This is directly related to our generation of Flora2 object-oriented models from XML schemas and instances; however, the representation in [4] does not seem to be complete (e.g. it is unclear how XSD import/include statements are handled). Furthermore, our approach targets the specification of mediation as well as its run-time execution, whereas [4] focuses only on an object-oriented representation of XML schemas. Another relevant work is [5], which focuses on the generation of XML from object-oriented models. This can be related to our serialization of Flora2 objects into XML, but here too the scope of our work is much broader.

In a wider context, the work presented in this paper is related to MDE model transformation techniques and languages [7,8], such as the ATL Transformation Language (ATL).3 Whereas model transformation languages can be applied to the XML data exchange problem addressed in this paper, it is unclear how suitable and easy it is to apply such general-purpose languages to the specific case of XSD/XML. A thorough analysis of the model transformation techniques developed in the MDE community is needed in order to judge their suitability for XML data exchange. Furthermore, a systematic comparison of model transformation techniques and logical rule-based approaches for data exchange is needed in order to understand their similarities and differences, and to gain a clear understanding of their respective advantages and disadvantages for data exchange.

The FloraMap mapping technique proposed in this paper is promising, and its implementation and experiments showed that run-time mediation is possible and feasible with a logic-based rule approach. However, several directions can still be considered to further enhance FloraMap:
1. Extensions for handling end-to-end n-m mappings, where multiple sources and multiple targets can exchange data.
2. Inconsistent mappings may lead to errors during the run-time data exchange; the design and implementation of a consistency-checking technique at design time would therefore significantly improve the mapping process. It is expected that the underlying reasoning mechanism provided by F-logic will contribute significantly to the automated detection of inconsistencies between mapping rules, thereby making logical rule-based approaches even more attractive for data exchange.
3. Design and implementation of a graphical interface for design-time mapping. In its current implementation, FloraMap does not come with a graphical editor for Flora2 models and mappings. Reuse of open-source tools, such as those emerging in the context of the OpenII project 4, could be relevant in this context.
4. FloraMap has been designed for XML data mapping; however, since the approach works at an expressive model level, it should be fairly simple to extend it to handle other types of schemas, such as relational schemas. This would enable the exchange of data conforming to different schematic representations, e.g. relational schemas, XML schemas, etc.
5. (Semi-)Automated generation of executable mapping rules. Approaches for the automated generation of rules in the areas of ontology engineering and MDE model transformation, such as [9,10], as well as ideas from semantic Web service matchmaking, such as [11], can be employed here to provide sophisticated support for the (semi-)automated generation of mapping rules.
6. More comprehensive validation. Whereas we provided some initial experimental results on the scalability of FloraMap, other aspects of our approach need to be analyzed in a more systematic way. For example, analyzing the complexity of the specification of mapping rules, compared, for instance, to the complexity of specifying mapping rules using model transformation techniques, is another potential direction for future work.

6. REFERENCES
[1] Christoph Bussler. B2B Integration. Springer, 2003. ISBN 3540434879.
[2] Ken Smith, Peter Mork, Len Seligman, et al. The Role of Schema Matching in Large Enterprises. CIDR Perspectives, 2009.
[3] Guizhen Yang, Michael Kifer. FLORA-2: User's Manual. 2008.
[4] Suad Alagic, Philip A. Bernstein. Mapping XSD to OO Schemas. Microsoft Research, 2008.
[5] R. Xiao, Tharam S. Dillon, E. Chang, Ling Feng. Modeling and Transformation of Object-Oriented Conceptual Models into XML Schema. Database and Expert Systems Applications, 795-804.
[6] Bernstein, P. A. and Melnik, S. Model Management 2.0: Manipulating Richer Mappings. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data (Beijing, China, June 11-14, 2007).
[7] Mens, T. and Van Gorp, P. A Taxonomy of Model Transformation. Electronic Notes in Theoretical Computer Science, Volume 152, 27 March 2006, Pages 125-142.
[8] Czarnecki, K. and Helsen, S. Classification of Model Transformation Approaches. In Proceedings of the OOPSLA'03 Workshop on Generative Techniques in the Context of Model-Driven Architecture, Anaheim, California, USA.
[9] Stephan Roser, Bernhard Bauer. Automatic Generation and Evolution of Model Transformations Using Ontology Engineering Space. Journal on Data Semantics 11: 32-64 (2008).
[10] Gerti Kappel, Elisabeth Kapsammer, Horst Kargl, Gerhard Kramler, Thomas Reiter, Werner Retschitzegger, Wieland Schwinger, Manuel Wimmer. Lifting Metamodels to Ontologies: A Step to the Semantic Integration of Modeling Languages. MoDELS 2006: 528-542.
[11] Klusch, M. and Kaufer, F. WSMO-MX: A Hybrid Semantic Web Service Matchmaker. Web Intelligence and Agent Systems 7, 1 (Jan. 2009), 23-42.

ACKNOWLEDGMENTS
This work is partly funded by the EU projects "A Semantic Service-oriented Private Adaptation Layer Enabling the Next Generation, Interoperable and Easy-to-Integrate Software Products of European Software SMEs (EMPOWER)" 5 and "Environmental Services Infrastructure with Ontologies (ENVISION)" 6.

2 http://www.nesubl.eu/
3 http://www.eclipse.org/atl/
4 http://www.openintegration.org/
5 http://empower-project.eu/
6 http://www.envision-project.eu/
Behavioural Interoperability to Support Model-Driven Systems Integration Alek Radjenovic
Richard F Paige
The University of York Department of Computer Science York YO10DD, United Kingdom +44 1904 567836
The University of York Department of Computer Science York YO10DD, United Kingdom +44 1904 343242
[email protected]
[email protected]
ABSTRACT
Software system integration is a process in which the target system is synthesised from discrete components (subsystems) whilst ensuring that they function together as a system and are able to deliver the required functionality. System integration is particularly important in projects in which new technologies must integrate with legacy systems. In such scenarios, this process can be broadly divided into two stages: interoperability checking and composition. Model-based approaches are promising since they allow us to carry out some of this process earlier, thus identifying problems earlier in the development lifecycle, when they are easier to rectify. In this paper we describe a generic model-based platform for system integration, applicable to different modelling languages, that supports both interoperability checking (at different levels of abstraction) and composition; our presentation focuses on the platform's support for interoperability checking. The approach, which consists of a language and a simulation tool, is presented, and its use is illustrated in a simple example of interoperability checking involving architectural models enriched with behaviour.

Categories and Subject Descriptors
I.6.4 [Simulation and Modeling]: Model Validation and Analysis

General Terms
Design, Verification.

Keywords
Model analysis, model integration, model consistency, behaviour modelling, simulation.

1. INTRODUCTION
Software system integration is a process in which the target system is synthesised from discrete components (subsystems) whilst ensuring that they function together as a system and are able to deliver their intended functionality. These software elements are typically developed separately. Indeed, many software-intensive and software-dependent projects, whilst taking advantage of next-generation technologies as well as 'ready-made' third-party components, are required to reuse existing legacy software. In such scenarios, integration introduces risk, because the interoperability between the various parts cannot be ascertained until late stages of the development process (i.e. during the system integration phase). Modern software projects often use model-based development (employing various modelling platforms and notations), where models are created prior to the development of executable code. Even when models are not available (e.g., in legacy systems), system architects can use tools to generate component and architectural models automatically from source code. Increasingly, component models are described using heterogeneous modelling languages and tools. Thus, there is a substantive technical problem to be addressed in model integration. We argue that the identification of model integration mechanisms at the software architecture level is highly desirable. In particular, interoperability checking at the model level is key to identifying system integration problems early on.

Interoperability checking represents the necessary first step in model integration. Incompatibilities may arise in two different planes - structural (mainly observed at the syntax level) and behavioural (mainly observed at the semantics level). Our framework tries to address both of these issues. Many current approaches focus on one or the other, or are not sufficiently generic to support all modelling languages, as they focus on specific standards, such as those of the OMG.

Our solution, which we call SMILE, is a framework within which we can (amongst other things) attach semantics, relevant to behaviour, to various structural model elements and perform execution of the specified behavioural model through simulation. Consequently, we are able to identify undesired behaviours of the combined models either through post-simulation analysis of the simulation trace or, actively, by formulating undesired conditions which cause the simulation to halt when they are detected. SMILE is a platform capable of manipulating models specified in different modelling languages and checking different behavioural paradigms. We achieve this by means of transformation and simulation. First, the relevant behavioural information from the input models is extracted to create a SMILE behavioural model comprising behavioural types. Second, these types are instantiated as simulation objects used in the simulation. Thus, SMILE is essentially an interchange platform for exploring behaviours in combined models. Although SMILE is a generic platform, applicable to arbitrary modelling languages, in this paper we illustrate the principles behind it using UML and its State Machine diagrams, and use this exemplar to show how it can be used in interoperability checking.
The rest of the text is organised as follows. Section 2 describes related work. Section 3 provides an overview of our approach. Section 4 first introduces the case study and then presents the compatibility checking results. In closing, Section 5 draws conclusions and suggests future directions.

2. RELATED WORK
2.1 Model Compatibility
Various organizations and companies (OMG, IBM, Microsoft, etc.) have proposed environments to support Model Driven Engineering (MDE). Among these, the OMG MDA (Model Driven Architecture) [22] is the most prominent; it focuses on the identification of basic MDE principles, its practical characteristics (direct representation, automation, and open standards), original scenarios, and discussions of suitable tools and methods. System functionality is defined as a platform-independent model (PIM) through an appropriate domain-specific language (DSL). Given a platform definition model (PDM) corresponding to a particular software technology (such as CORBA [25] or .NET [20]), the PIM is then translated to one or more platform-specific models (PSMs) for the actual implementation. One of the key obstacles to model-based interoperability, and hence system integration, is the incompatibility of models, evident mainly at the syntactical level. To resolve this issue it has been suggested that a unifying meta-model, to which all modelling languages concerned would conform, is required. OMG's UML profile for Enterprise Application Integration (EAI) is defined as a complete MOF-based [27] metamodel that provides facilities for modelling the integration architecture, focusing on connectivity, composition and behaviour. The EAI UML profile also defines a MOF-based standardised data format intended for use by different systems to exchange data during integration. Data exchange is achieved by defining an EAI application metamodel that handles interfaces and metamodels for programming languages (such as C, C++, PL/I, and COBOL) to aid the automation of transformation. While standardising on MOF is a step in the right direction, in practice there are various problems, such as the lack of widespread support for MOF by various tools and the differences between versions of XML Metadata Interchange (XMI) [26] support in tools [3]. MOF is currently the only standard that attempts to cut across the different modelling and implementation platforms.

2.2 Model Transformation
In an ideal situation, during model transformation the syntax is changed to the target modelling language whilst the semantics is preserved [12]. The vast majority of model transformation techniques, however, are defined at the metamodel level (various taxonomies can be found in [7]).

In terms of breadth of usage, three of the more successful model transformation approaches are ATL, ETL and VIATRA. ATL (ATLAS Transformation Language) [13] is a model transformation language and toolkit which provides a means to produce a set of target models from a set of source models. Developed on top of the Eclipse platform, the ATL Integrated Development Environment (IDE) provides a number of standard development tools (syntax highlighting, a debugger, etc.) that aim to ease the development of ATL transformations. ATL also includes a library of ATL transformations and has been defined to perform general transformations within the MDA framework. There are currently over 100 defined transformations in the online library. The language itself appears somewhat cumbersome, which is reflected in the supplied transformation examples. This may partly be due to its substantially declarative nature, because some transformations are not necessarily best expressed in this fashion (e.g., transformations that involve iterations over complex structures). However, ATL's tool support is among the most robust in the MDE community.

ETL (Epsilon Transformation Language) [14] provides model-to-model transformation capabilities to Epsilon and can be used to transform an arbitrary number of input models into an arbitrary number of output models specified in different modelling languages. ETL, like ATL [13] and QVT [24], has a mixture of both declarative and imperative language characteristics. Declarative transformation languages are generally limited to scenarios where the source and target metamodels are similar to each other in terms of structure, and thus the transformation is a matter of a simple mapping. Imperative languages, in addition, include operations but operate at a low abstraction level. Consequently, users have to manually address issues such as tracing, resolving target elements from their source counterparts, and orchestrating the transformation execution. Hybrid languages provide both a declarative rule-based execution scheme as well as imperative features for handling complex transformation scenarios. ETL is firmly in the hybrid camp, and thus targets both mapping transformations (where the source/target metamodels are similar) as well as more complex transformation scenarios. Like ATL and QVT, ETL reuses a portion of OCL for navigating model elements. Unlike ATL and QVT, ETL includes imperative constructs (such as loops, assignment statements, and sequencing of statements) that make iterative transformations much easier to express.

The VIATRA (VIsual Automated model TRAnsformations) [9] framework is the core of a transformation-based verification and validation environment for improving the quality of systems designed using UML by automatically checking consistency, completeness, and dependability requirements. Its main objective is to provide general-purpose support for the entire life-cycle of engineering model transformations, including the specification, design, execution, validation and maintenance of transformations within and between various modelling languages and domains. VIATRA intends to complement existing model transformation frameworks by providing: a model space for the uniform representation of models and meta-models; a transformation
language (with both declarative and imperative features, based on the popular formal mathematical techniques of graph transformation (GT) and abstract state machines (ASM)); and a high-performance transformation engine (which supports incremental model transformations, but also trigger-driven live transformations where complex model changes may trigger the execution of transformations), with main target application domains in both model-based tool integration frameworks and model analysis transformations. More importantly, VIATRA considers scalability an important factor and claims to be able to handle well over 100,000 model elements.

openArchitectureWare (oAW) [1] is a modular MDA generator framework implemented in Java, based on the Eclipse platform. oAW can parse arbitrary models, and it has a family of languages to check and transform models as well as generate code from them. oAW has strong support for EMF (Eclipse Modelling Framework) based models but can work with other models too (e.g. UML2, XML or simple JavaBeans). oAW is based around a workflow engine which allows the definition of the generator mechanism. Various pre-built workflow components can be used to read and instantiate models, check for constraint violations, perform transformations into other models, or generate code. openArchitectureWare has also submitted to Eclipse a project proposal called the Textual Modeling Framework (TMF). TMF focuses on textual DSLs and Eclipse IDE integration. One of two initial contributions will be Xtext - a framework and generator providing a specialised Eclipse editor and an EMF meta-model from a simple EBNF-style grammar. Its focus will be on very short turnarounds, and it is hoped to provide powerful abstractions for the development of textual DSLs.

2.3 Model Composition
The process of model composition consists of four distinct phases: comparison, conformance checking, merging and reconciliation (or restructuring) [5,28]. In the comparison phase, the correspondences between equivalent elements of the two models are identified, making sure that these elements are not duplicated in the merged model. In the conformance checking phase, matched elements from the previous phase are examined for conformance with each other in order to identify potential conflicts that would render merging impossible. The majority of proposed approaches (e.g., [18]) address conformance checking of models through their compliance with the same meta-model. For the merging phase, a number of approaches have been proposed, such as graph-based algorithms [19,28] or an interactive process for merging UML 2.0 models [18]. The limitations of these approaches relate to the fact that they either address only the merging of models of the same (specific) metamodel, or use an inflexible merging algorithm with no means of extension or customisation. In the reconciliation and restructuring phase, the inconsistencies in the target model are fixed.

Next, some of the key approaches to model composition are described.

The AMW (ATLAS Model Weaver) [2] is a tool for establishing relationships between models. These links are stored in a weaving model which conforms to a weaving metamodel. Weaving models may be used in several application scenarios, such as meta-model comparison, traceability, model matching, model annotation, and tool interoperability. AMW provides a base weaving meta-model (enabling the creation of links between model elements and associations between those links) which may be extended to add further mapping semantics, providing a mechanism for creating variable mapping languages dedicated to specific application requirements.

The EML (Epsilon Merging Language) [14] adds model merging capabilities to the Epsilon platform. More specifically, EML can be used to merge an arbitrary number of input models of potentially diverse metamodels and modelling technologies. The key motivation for EML was to have a mechanism that would enable automatic model merging on a set of established correspondences. This has a number of applications in MDE. For example, EML can be used to unify two complementary, but potentially overlapping, models that describe different views of the same system. As well, EML can be used to merge a core model with an aspect model (potentially conforming to different meta-models). This is discussed in [21], where a core Platform Independent Model (PIM) is merged with a Platform Definition Model (PDM) (which contributes platform-specific aspects) to form a Platform Specific Model (PSM). This has been particularly useful for, e.g., performance analysis, where different system configurations (corresponding to platform-specific performance data) have been merged with system models and the result used for simulation. When combined with other features of the Epsilon platform, this merging capability can be carried out iteratively, thus allowing batch generation of arbitrarily large numbers of simulation models and simulation results. EML has also been used successfully for managing versions of models.

2.4 Multi-paradigm Modelling
Multi-paradigm modelling (MPM) combines three orthogonal research fields: multi-formalism modelling (using different languages while modelling a system), model abstraction (the relationship between models at different levels of abstraction), and metamodelling (the construction of the collection of concepts that highlight the properties of the modelling language) [29]. The advocates of MPM recognise that the design of systems increasingly requires representations in various languages (formalisms) and with different abstractions, where these representations must be "coupled, combined, integrated, and transformed" [33].

In [11], the authors explore various multi-paradigm modelling techniques and evaluate them based on two criteria: 1) their level of support for an open set of modelling languages, and 2) their support for formal verification of properties. With respect to the first criterion, they make three key conclusions. Firstly, the platforms under observation (GME [8], the Eclipse Modeling Project [10], and AToM3 [17]) allow the automatic generation of tool support for user-defined modelling languages, but their limitation is their dependency on the underlying metamodels. Secondly, the composition of modelling languages is highly dependent on the syntax and semantics being expressed in a given format. And thirdly, the task of adding support for an additional modelling language can be very difficult (the order of magnitude corresponds to describing the semantics of a
modelling language). With respect to the second criterion, the authors found that reasoning about properties - at a global level and on a set of heterogeneous models - in a formal fashion represents quite a challenge.

AToM3 [16] is a tool which has received much attention in the research arena. It implements model transformation techniques based on graph rewriting. Here, input models are represented internally using graphs, while the transformations are specified by graph grammars which spell out the rewriting rules. The authors claim that AToM3 can potentially support a wide range of modelling languages, provided that their abstract syntax is described by a metamodel and that a transformation can be written between the source and target metamodels. This may be particularly difficult for certain types of languages [11]. The key limitation is that the number of transformations that one needs to design increases exponentially with the number of participating languages.

Other approaches either lack mature tool support (e.g. Rosetta [15]) or support only a limited range of semantics for describing behaviours (e.g. BIP [4] uses labelled transition systems).

2.5 Model Interoperability
The general notion of interoperability between systems is defined in [30] as the ability of one system to communicate with and access the functionality of the other system. The concept of interoperability can also be characterised as a certain degree of compatibility [6], where the levels of compatibility include coexistence, interconnection, interworking, interoperation and interchangeability, while the relevant system features that define the compatibility level comprise communication protocols and interfaces, data access and types, parameter semantics, application functionality and dynamic behaviour.
Furthermore, two or more models are interoperable if they are related to one another in one of the following ways:
• Unified - there is a common meta-level structure across constituent models which provides a way to establish semantic equivalence.
• Federated - models have to be dynamically accommodated rather than conform to a predetermined meta-model (this assumes that concept mapping is done at the semantic level).
• Integrated - diverse models are interpreted in a standard format which must be as rich as any of the constituent system models.

Such a view clarifies the difference between full integration and interoperability: integrated systems are interoperable, while interoperable systems are not necessarily integrated.

2.6 Summary
A large majority of the existing approaches to model-based system integration lack one or more of the 'ingredients' discussed in this section. Even those which support a more or less full set of model management techniques are typically tightly integrated with either the Eclipse Modeling Framework (EMF) or one or more of the OMG standards. Consequently, checking of model interoperability, particularly at the behavioural level, is often too dependent on, or skewed towards, Java, Ecore and/or MOF. An approach in which we could reason about model behaviours and model interoperability in a generic fashion, away from the underlying meta-models, is highly desirable.

3. THE SMILE-X PLATFORM
Our platform for model integration is called SMILE (Simple Model Integration Language with Execution engine). It comprises techniques, languages and tools. SMILE has three main components, two of which deal with different aspects of model compatibility - SMILE-S (for structural checking) and SMILE-X (for behavioural checking) - whilst the third, SMILE-I, deals with integration. We initially explored the compatibility of models at the structural level. SMILE-S defines an interchange format for describing the structure of heterogeneous models in terms of trees [31], where the tree vertices (nodes) represent structural elements and the edges express the containment relationship between the elements. In addition, the nodes typically contain a collection of properties that further describe characteristics of the structural elements. The concrete syntax is effectively a DSL (domain-specific language) that provides a way to represent heterogeneous input models (e.g. UML, Simulink, and AADL, as shown in Figure 2) in a uniform fashion. The transformation of input models into SMILE trees is external to the core tool, i.e. the knowledge of the underlying meta-models and parsing is delegated to plug-in components.

SMILE-X is a natural extension of SMILE-S, and is the component described in this paper.

[Figure 2 residue: input models in structural DSLs (UML, Simulink, AADL) are converted to SMILE models SM1-SM3 by transformations T1-T3; behavioural-model templates (e.g. state machines) applied to the structural models feed the SIMULATION. The full diagram is not recoverable from the extracted text.]
Figure 2. SMILE: conceptual approach

In SMILE-X, we ignore the structural incompatibilities of input models, such as name and type mismatches (identification of these is dealt with in SMILE-S as part of the structural compatibility checking), and focus solely on the behavioural properties. A particular behavioural model (such as state machines) is provided as a template that enables model transformations (i.e. mappings from specific SMILE-S elements to a SMILE-X behavioural model - essentially a set of behaviour types). As shown in Figure 2, SMILE-X descriptions are yet another DSL that enables a uniform representation of behavioural models for the SMILE-X simulation engine. In SMILE-X, we neither attempt to resolve any inconsistencies or incompatibilities (i.e. we merely identify them and report back to the SMILE-X user) nor deal with behaviour preservation. These are dealt with inside the integration (SMILE-I) component of the platform.

The SMILE-X tool depends on a specific SMILE-X language whose concrete syntax is in XML and conforms to a well-defined XML Schema [32]. The tool is essentially an execution engine for SMILE-X models, and it also provides the capability to add transformation plug-ins for interchange with other modelling languages.

In this paper, we look at the scenario of homogeneous but vendor-specific models. In particular, we use models specified in different versions of UML using different UML tools. This scenario is commonly present in projects of large organisations, where the various software components are typically a mixture of legacy code, new code, and third-party (supply chain), off-the-shelf components.

The SMILE-X architecture is illustrated in Figure 3. Two or more input models are converted to SMILE trees (this functionality is part of the SMILE-S component). A behavioural template is then used to extract the relevant information from the trees and create a behavioural model (e.g. state machines), which essentially consists of a set of types that describe particular behaviours of model components. Next, a configuration is applied, by means of manual intervention and with the help of other artefacts (such as class and object diagrams), in order to instantiate the behavioural types into simulation objects and to create a simulation model. Finally, we define or select a specific schedule before we can perform simulation. Each simulation run provides a trace as output. We can then analyse the trace in order to identify undesired behaviours in the system. Alternatively, by formulating undesired conditions through the definition of triggers, we can cause the simulation to halt as soon as these conditions are detected.

SMILE-X provides a framework where elements of input models can be mapped to (or matched against) a specified behavioural model. This behavioural model is provided in the form of a template (meta-model) which enables us to attach semantics to structural model elements and which describes a particular behavioural paradigm (or a related family of behaviours) that we are interested in analysing. The chosen template transforms the input models into an integrated SMILE-X model which describes the system's behaviour as a collection of elements and maps that convey information about interactions within the system. SMILE-X transparently glues elements together, either fully automatically or with additional information entered interactively by the user. Thus, SMILE-X facilitates a mechanism through which we can integrate the behaviours of input models based on the chosen perspective, and consequently perform simulations on the integrated system.

The template used in this paper is that of state machines, which takes an approach based on a modified discrete event system specification (DEVS) [23]. The fact that there is a significant overlap between behavioural models (such as sequence, communication and state machine diagrams) on the one hand, and structural models (e.g. class, object, or component diagrams) on the other, enables an uncomplicated extension of the work on structural compatibility in models. The structural elements are further enhanced with concepts that add semantics, such as event, time, action and state. Consequently, state machine models in SMILE-X are described in terms of a well-known state transition system with actions and guards. The behaviours specified must be regarded as specifications of the actual behaviours, which can be both deterministic and non-deterministic. These behaviours are characterised by state variables whose evolution is specified by transitions. The transitions are triggered by events, guarded by conditions, and enriched by actions. However, we reiterate that SMILE-X is not restricted to state machine models; these are used here only as one example.
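To illustrate the template's ingredients, the following Python sketch (our own rendering, not part of SMILE-X) models transitions triggered by events, guarded by conditions and enriched by actions, plus a trigger that halts the run when an undesired condition is detected. All names and the structure are assumptions for illustration only.

from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Transition:
    source: str
    event: str                                       # triggering event
    target: str
    guard: Callable[[dict], bool] = lambda v: True   # condition on state variables
    action: Callable[[dict], None] = lambda v: None  # effect on state variables

@dataclass
class StateMachine:
    state: str
    variables: dict = field(default_factory=dict)
    transitions: list = field(default_factory=list)

    def step(self, event):
        for t in self.transitions:
            if t.source == self.state and t.event == event and t.guard(self.variables):
                t.action(self.variables)
                self.state = t.target
                return

def simulate(machine, schedule, trigger):
    """Run a schedule of events, record a trace, halt if the trigger fires."""
    trace = [machine.state]
    for event in schedule:
        machine.step(event)
        trace.append(machine.state)
        if trigger(machine):          # undesired condition detected
            break
    return trace

door = StateMachine("closed", transitions=[
    Transition("closed", "open_request", "open"),
    Transition("open", "timeout", "closed")])
print(simulate(door, ["open_request", "timeout"],
               trigger=lambda m: m.state == "open" and m.variables.get("moving")))
# ['closed', 'open', 'closed']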
There are two key parts to the SMILE-X language specification. The first part (the behavioural templates) enables generic descriptions of the input models' behaviours. The second helps in defining the rules (as triggers) which aid in the detection of behavioural incompatibilities. SMILE-X builds on SMILE-S by reusing all structural component declarations and adding semantics in terms of a 'behavioural layer' to the specification. The behavioural templates also enable the user to specify sequential execution behaviour (so UML sequence, communication and state machine diagrams can all be input to the SMILE-X tool).

[Figure 3 residue: SMILE trees and a behaviour template are mapped into a behavioural model; a configuration instantiates it into a simulation model, which is simulated under a schedule and a set of triggers, producing a trace.]
Figure 3. SMILE-X architecture

Fundamentally, SMILE-X is designed to be a model interchange format. The interchange is one way - from the native (source) models (e.g. UML) to SMILE-X models - because we focus only on detecting incompatibilities and not on the integration mechanisms. A fully bidirectional platform capable of transforming the analysed models back to the original format is being implemented in the SMILE-I component.
SMILE-X allows behaviour descriptions to be attached not only to instances of types (objects) but also to types themselves, in which case all instances of such a type behave identically, according to the provided specification. Users can also specify whether a behaviour description attached to a type applies to all types descended from it (the so-called 'loose mode') or to that particular type only ('strict mode'). In loose mode, if a descendant type has its own behaviour description attached, that description overrides the behaviour inherited from the super-type.
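The lookup rule implied by the two modes can be sketched as follows. The type hierarchy is taken from the case study below, while the data structures and mode handling are our own reading of the description above.

```python
# Behaviour resolution under 'strict' and 'loose' modes (illustrative).

behaviours = {                      # type name -> (behaviour, mode)
    "Button":     ("button-sm",   "loose"),   # also applies to descendants
    "LiftButton": ("lift-btn-sm", "strict"),  # overrides the inherited one
}
supertype = {"LiftButton": "Button", "FloorButton": "Button", "Button": None}

def resolve_behaviour(type_name):
    t = type_name
    while t is not None:
        if t in behaviours:
            behaviour, mode = behaviours[t]
            # a strict description applies only to the type it is attached to
            if t == type_name or mode == "loose":
                return behaviour
        t = supertype.get(t)
    return None

print(resolve_behaviour("LiftButton"))   # lift-btn-sm (own description wins)
print(resolve_behaviour("FloorButton"))  # button-sm (inherited via loose mode)
```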
Figure 3. The state machine behavioural template (UML class diagram)

SMILE-X also supports concurrency, i.e. multiple threads of execution or multiple devices. The state machine template adopted is well known, describing a set of modelling artefacts sufficient to capture the behaviour of a model representing a reactive software system. It is depicted in the form of a UML class diagram in Figure 3, showing the key classes and the relationships between them. The Element class is the SMILE-S structural meta-model class with which the described behaviour is associated.

4. EXAMPLE

4.1 Case Study

This case study describes a real-time software application which is installed to control M lifts in a building with N floors. The problem concerns the logic required to move lifts between floors according to the following constraints:

• Each lift has a set of N buttons, one for each floor. These illuminate when pressed and cause the lift to visit the corresponding floor. The illumination is cancelled when the lift visits the corresponding floor.

• Each floor has one (top and bottom floors) or two (all other floors, to indicate the intended direction of travel: up or down) floor buttons to request a lift to come to the floor. The button illuminates when pressed. The illumination is cancelled when a lift visits the floor.

• Upon the arrival of a lift at any floor, the door opens and remains open for a fixed period of time after the infrared beam has last been cut by people or objects moving in and out of the lift. After the expiry of that fixed period of time, the door automatically closes. In this case study, the infrared beam component is not modelled; it is assumed that the door closes a fixed period of time after it has been opened.

• When a lift has no requests, it remains at its current floor with its door closed.

4.2 Use cases

There are two main use cases. The 'Calling a lift' use case describes the following scenario: (1) Passenger presses floor button; (2) Lift system detects floor button pressed; (3) Lift moves to the floor; (4) Lift doors open. The 'Travelling in a lift' use case consists of the following sequence of events: (1) Passenger gets in and presses a lift button; (2) Lift system detects lift button pressed; (3) Lift closes the doors if they are open; (4) Lift travels to the required floor; (5) Lift doors open; (6) Passenger gets out; (7) Lift doors close.

4.3 Class diagram

The system class diagram is presented in Figure 4. The Controller class represents the lift system, and there is a single instance of this class in the target software system. This class directly controls one or more Lift objects and two or more Floor objects. Each Floor object has one or two FloorButton objects, as explained in the constraints above. The floor also has one or more Doors (depending on the number of lifts in the building). Each Lift object has two or more LiftButton objects. The FloorButton and LiftButton classes share common features embodied in their Button superclass.

Figure 4. Lift system (UML class diagram)

4.4 Sequence and state machine diagrams

The system has two sequence diagrams which correspond to the two use cases explained above. Each class also has a separate state machine diagram (apart from FloorButton and LiftButton, which inherit their behaviour from the Button class). Due to space limitations these diagrams are not all presented here, but some are illustrated in the section on compatibility checking (below).

4.5 The development process

It is assumed that the behaviour of each of the five main classes (Lift, Button, Floor, Door, and Controller) is specified by a different team, using different UML tools which support different UML versions. This is an attempt to replicate a real-world software lifecycle where the development is distributed and the tools and platforms are potentially heterogeneous.
4.6 Compatibility checking

As explained earlier, compatibility checking of behaviours in models in SMILE-X is performed through simulation, which raises three key concerns. First, we have to ensure that we have selected the relevant key characteristics and behaviours, and that the source information acquired is valid. The second concern is the use of simplifying approximations and assumptions, which are typical of the process of abstraction. Finally, we must have a high level of confidence in the simulation outcomes in terms of their trustworthiness and validity.
Our approach is to deliberately assume that the first two conditions stated above are satisfied. By focusing solely on the third, and by identifying incompatibilities, we are able to demonstrate where the system characteristics used in the behaviour descriptions, as well as the approximations used in modelling, are inaccurate and need readjustment. The criterion we used in detecting behavioural incompatibilities was incorrect and/or unpredictable behaviour. From a state machine perspective, this means checking that (a sketch of such checks over a simulation trace follows the list):

• all states can be (have been) reached during the simulation run;

• all state combinations are valid (i.e. invalid state combinations do not occur);

• all events have been used ('fired') at least once;

• all guard conditions are satisfied at least once;

• all actions are performed successfully;

• there are no subsystems that are disconnected from the rest of the system;

• there are no deadlocks (multiple objects waiting for a resource simultaneously and thus preventing a state change);

• relevant properties hold true.
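A minimal sketch of how the first and third criteria might be checked over a recorded trace is given below; the trace record format is an assumption on our part, not something SMILE-X prescribes.

```python
# Post-run coverage analysis over a simulation trace (illustrative).

def coverage_report(trace, all_states, all_events):
    visited = {r["state"] for r in trace}
    fired   = {r["event"] for r in trace if r["event"] is not None}
    return {"unreached_states": all_states - visited,
            "unused_events":    all_events - fired}

trace = [{"state": "Closed", "event": "OPEN"},
         {"state": "Open",   "event": None}]
print(coverage_report(trace,
                      all_states={"Open", "Closed", "Closing"},
                      all_events={"OPEN", "AUTO_CLOSE"}))
# -> {'unreached_states': {'Closing'}, 'unused_events': {'AUTO_CLOSE'}}
```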
In our tool, many operations are performed automatically at first (e.g. model parsing), but some manual assistance is often needed (e.g. mapping of elements to state machines, or definition of element dependencies with respect to model behaviour), in which case an interactive process is employed. Next, we describe each type of behavioural incompatibility detected, providing an example related to the case study above.

4.7 Invalid state combinations

We define the compound state as the union of the current state of one element and the current states of all its sub-elements within the structural component hierarchy. (This is not to be confused with the UML notion of a composite state, which is different and more complex.) We also define the concept of an explicit compound state as a union of the current states of an arbitrary (user-defined) set of elements.
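These two definitions translate directly into set operations over the component hierarchy. In the sketch below, the containment relation and current-state map are illustrative inputs of our own making.

```python
# Compound state and explicit compound state (illustrative sketch).

children = {"Lift1": ["LiftDoor1", "Panel1"], "LiftDoor1": [], "Panel1": []}
current  = {"Lift1": "Moving", "LiftDoor1": "Open", "Panel1": "Idle"}

def compound_state(element):
    """Union of an element's state with those of all its sub-elements."""
    states = {element: current[element]}
    for child in children.get(element, []):
        states.update(compound_state(child))
    return states

def explicit_compound_state(elements):
    """Compound state over an arbitrary, user-defined set of elements."""
    return {e: current[e] for e in elements}

print(compound_state("Lift1"))
# -> {'Lift1': 'Moving', 'LiftDoor1': 'Open', 'Panel1': 'Idle'}
print(explicit_compound_state({"Lift1", "LiftDoor1"}))
```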
The purpose of these definitions is to ensure that a particular set of components within the system are not concurrently in states which are mutually incompatible. As an example extracted from the case study, we declare an explicit compound state (Figure 5) to ensure that the lift is not moving while one of its floor doors is open.

Figure 5. Explicit compound state

Door is an uncomplicated class (as shown in Figure 4). Its public property Closed indicates whether the door is open or closed. The Controller's state machine coordinates the operation of the Lift objects and the Floor objects (which in turn control the opening and closing of the doors) by sending appropriate messages (such as MOVE and HALT to a Lift object, and LIFT_ARRIVED to a Floor object). If a MOVE message were sent to the Lift while the door was still open (i.e. before the AUTO_CLOSE event occurred), this would represent a hazardous scenario.

A simplified version of the Door state machine (without guard conditions or actions) is illustrated in Figure 6. The AUTO_CLOSE event is an internal timed event generated upon the expiry of a door timer instructing an open door to close automatically after a fixed period of time (for example, 5 seconds after the door's infrared beam was last cut, indicating that no more people or objects are moving in or out of the lift).

Figure 6. Door state machine

The definition of the compound state DoorsLeftOpen (Figure 5) enables the detection of such scenarios. The addition of a simple guard condition in the Controller state machine would, for example, rectify this design fault.

4.8 Unused events

Another type of analysis which reveals behavioural incompatibilities is the search for unused events. This is achieved by analysing the trace obtained from a simulation run, either by inspecting it manually or by applying a search filter.
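A search filter of this kind can be as simple as comparing the set of declared events against the set of events actually handled during the run. The trace encoding below is an assumption made for illustration; the event names match the case study.

```python
# Filtering a trace for unused events (illustrative sketch).

trace = [("Floor", "sent",    "SERVICE_REQUEST"),
         ("Door",  "handled", "DOOR_OPEN"),
         ("Floor", "sent",    "SERVICE_REQUEST")]

declared_events = {"SERVICE_REQUEST", "LIFT_REQUEST", "DOOR_OPEN"}
handled = {event for (_, kind, event) in trace if kind == "handled"}

print("unused events:", declared_events - handled)
# -> {'SERVICE_REQUEST', 'LIFT_REQUEST'}
```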
For example, two state machines, Floor and Controller, are shown in Figure 7 and Figure 8. When a passenger wishes to call a lift to the floor they are on, they press the floor button. This is an external user event, which can be specified in the SMILE-X language and subsequently included in the simulation. This event is received by the Button object and converted into a BUTTON_PRESS event sent to the Floor object (Figure 7). The event is processed if Floor is in the Vacant state. Ignoring guard conditions and actions (not shown in the diagram) as they are irrelevant to this scenario, the Floor object eventually moves to the Waiting state and sends SERVICE_REQUEST to the Controller object.

Figure 7. The Floor state machine

Figure 8. The Controller state machine

This request, however, never gets serviced. A quick scrutiny of the Controller state machine reveals that a LIFT_REQUEST event, rather than SERVICE_REQUEST, is expected from the Floor object. Hence, the LIFT_REQUEST event is never used, and as a consequence the lift never arrives at the floor which issued the request. The ability to detect and report unused events is not a direct capability of the SMILE-X language, but of the SMILE-X simulation tool. Nevertheless, it is the SMILE-X language metamodel which indirectly enables the tool to expose unused events.

4.9 Unreachable states

The tool can also detect unreachable states, which may sometimes prove to be simply superfluous or may otherwise signify an error in design. Whatever the case, the occurrence of such circumstances requires design modification. We have just demonstrated how an event (LIFT_REQUEST) may never be used. The Controller state machine in Figure 8 was purposely simplified. In particular, the Active state can be made more specific by introducing substates (e.g. NoRequests and RequestsPending, as in Figure 9). Assuming a successful power-up, Controller is in the Active state and the NoRequests substate. In this scenario, a LIFT_REQUEST event is processed in the context of the current substate. Assuming that any guards are satisfied and any declared actions performed, a QUEUE event is generated (internally to the Controller subsystem) and the substate changes to RequestsPending. However, because a LIFT_REQUEST event never arrives, as explained in the previous section, the RequestsPending substate can never be reached. After a simulation run under these circumstances, the SMILE-X tool can report on all unreachable states.

Figure 9. Controller state machine with substates

4.10 Disconnected subsystems

In the absence of sequence (or communication) diagrams, the state machines are initially disconnected and there is no communication between the various subsystems. SMILE-X provides a means of mapping events generated by an element to the intended target element. For example, the Floor objects generate DOOR_OPEN, DOOR_CLOSE and SERVICE_REQUEST events for their environment (Figure 7). The first two are intended for the Door objects, while the third is for the Controller object. Figure 10 is a screenshot from the SMILE-X tool showing how this mapping is done in a straightforward fashion.

Figure 10. Event mapping in SMILE-X
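Conceptually, the mapping entered by the user amounts to a routing table from (source element, event) pairs to target elements, as in the sketch below; the table itself is our illustrative reading of Figure 10, not the tool's internal representation.

```python
# Element-to-element event routing (illustrative sketch).

routing = {("Floor", "DOOR_OPEN"):       "Door",
           ("Floor", "DOOR_CLOSE"):      "Door",
           ("Floor", "SERVICE_REQUEST"): "Controller"}

def deliver(source, event):
    target = routing.get((source, event))
    if target is None:
        # an unmapped event hints at a disconnected subsystem
        return f"{event} from {source} has no recipient"
    return f"{event} routed from {source} to {target}"

print(deliver("Floor", "SERVICE_REQUEST"))   # routed to Controller
print(deliver("Lift", "HALT"))               # unmapped in this sketch
```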
4.11 Out of sequence messages

Sequence diagrams show how objects (or processes) interact with one another and in what order. In SMILE-X, sequence diagrams are primarily used to extract the dependency information between the various objects in the system, in order to create a 'source object, event, target object' map that identifies which elements generate which events and which elements are the recipients of these events. Once extracted from the input models, the sequence (order) of messages (events) exchanged between the elements is recorded internally in the SMILE-X notation. Manual adjustment of the timing parameters (service times) of the elements' actions, as well as of the priority of events, is part of the normal analysis and refinement process that occurs between simulation runs. In particular scenarios (using a particular set of simulation parameters), the sequence of messages exchanged may become different from the intended one. SMILE-X can detect situations like this. Moreover, the SMILE-X notation allows the event sequences to be specified manually by the tool user. It is not necessary that all events are described, just the key ones. This enables the designers to ensure that particular events appear in order. For example, we may want to make certain that the DOOR_OPEN event always occurs after the LIFT_ARRIVED event.
4.12 Deadlocks

Deadlock is a condition in which two or more software objects (processes) are waiting for each other to release a resource, or are waiting for resources in a circular fashion. Typically, deadlocks are a widespread problem in multiprocessing, where multiple processes share a specific type of mutually exclusive resource known as a lock. In state machines, deadlocks can be identified by an object which cannot leave a particular state even though it is receiving events that should cause a transition. This typically happens when a guard condition fails to hold either every time the event arrives or for a long succession of event arrivals. SMILE-X can detect and report these situations.
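The detection heuristic just described can be sketched as counting, per object, consecutive event arrivals whose guard fails without any state change; the threshold and trace format below are assumptions made for illustration.

```python
# Flagging objects stuck in a state despite incoming events (illustrative).

STUCK_THRESHOLD = 100   # consecutive guard failures before reporting

def find_stuck_objects(trace):
    """Trace entries: (object, event, guard_held, state_changed)."""
    rejected, stuck = {}, set()
    for obj, event, guard_held, state_changed in trace:
        if state_changed:
            rejected[obj] = 0                     # progress resets the count
        elif not guard_held:
            rejected[obj] = rejected.get(obj, 0) + 1
            if rejected[obj] >= STUCK_THRESHOLD:
                stuck.add(obj)                    # possible deadlock
    return stuck

trace = [("Lift1", "MOVE", False, False)] * 100
print(find_stuck_objects(trace))                  # -> {'Lift1'}
```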
4.13 Properties that do not hold

Often, we would like to reason about invariants in the context of state transitions, in the form of a guard condition which holds true throughout the entire transition. The evaluations of such guards are performed at the following points in time: (i) at the arrival of the event; (ii) after each transition action is executed; (iii) just before the change of state. SMILE-X can monitor these situations, and detect and report when such guards fail to hold.
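The three evaluation points translate into three checks around the execution of a transition. The sketch below uses an invariant drawn from the case study (a lift must not move with its doors open); the transition representation is our own.

```python
# Monitoring a guard invariant at the three evaluation points (illustrative).

def fire_transition(variables, actions, invariant, change_state):
    failures = []
    if not invariant(variables):            # (i) at the arrival of the event
        failures.append("on event arrival")
    for action in actions:
        action(variables)
        if not invariant(variables):        # (ii) after each transition action
            failures.append("after action " + action.__name__)
    if not invariant(variables):            # (iii) just before the state change
        failures.append("before state change")
    change_state()
    return failures

state = {"doors_closed": True, "moving": False}
def start_motor(v): v["moving"] = True
def open_doors(v): v["doors_closed"] = False      # violates the invariant

invariant = lambda v: v["doors_closed"] or not v["moving"]
print(fire_transition(state, [start_motor, open_doors], invariant,
                      change_state=lambda: None))
# -> ['after action open_doors', 'before state change']
```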
5. CONCLUSIONS AND FUTURE WORK

We believe that model integration at the architectural level is of great importance. It provides a way of detecting and resolving issues that would not otherwise become apparent until very late in the project, during the system integration phase. This approach may not only reduce the typical risks associated with the integration of systems whose components are developed in a distributed fashion, but can also substantially reduce development costs.

We have identified deficiencies and fragmentation in the current approaches and have provided an initial framework that checks for incompatibilities at both the structural and the behavioural level. Here, we have focused on the latter and described how we have identified some of the more important incompatibilities in behavioural descriptions of models. We directed our attention to a specific system type, the reactive system, which represents a large group of complex, large-scale software systems today. Behaviour can be described in various ways. UML defines several different kinds of behaviour diagrams, two of which are most commonly used: sequence diagrams and state machine diagrams. In this paper, we have focused on exactly these types of behavioural descriptions.

We have extended our existing SMILE platform to include SMILE-X, the behavioural component. SMILE-X builds on the previous structural component (SMILE-S) by providing a mechanism to add semantics to the existing structural elements. SMILE-X comprises a language and a tool. The language supports interchange between differing model descriptions of system behaviour using the standard XML format. SMILE-X models are not compiled but simulated. Our tool provides a way of loading the input models, refining and extending behavioural descriptions in the SMILE-X format, and specifying various simulation parameters; it also includes an execution engine which enables users to run simulations.

We have identified seven different generic behavioural incompatibilities related to state machine descriptions and to the sequential communication between components of the system. We have used a case study which models system components in different, vendor-specific versions of UML. In this proof-of-concept study, we transformed the input models into the SMILE-X interchange format, and manipulated, refined and glued these models together in order to perform meaningful simulations. There are a number of interesting directions in which to go next. One would be to identify, through the observation of particular behavioural attributes such as events or states, general-purpose patterns for automatically detecting some of the behavioural incompatibilities. Another opportunity is to see whether causal paths can be uncovered. These would ideally (in the state machine scenario) include paths that lead to the same states and events, as well as the ability to show alternative paths between states (if such paths exist).

6. ACKNOWLEDGMENTS

This work was undertaken at SSEI (Software Systems Engineering Initiative), an MOD (Ministry of Defence) funded strategic initiative intended to enhance through-life capability management for software-intensive defence systems.
7. REFERENCES

1. openArchitectureWare. 2008. http://www.openarchitectureware.org/.

2. ATLAS Group. Atlas Model Weaver (AMW). INRIA, Eclipse, 2008. http://www.eclipse.org/gmt/amw/.
3. Balasubramanian, K., Schmidt, D.C., Molnár, Z., and Lédeczi, Á. System Integration using Model-Driven Engineering. In P.F. Tiako, Designing Software-Intensive Systems: Methods and Principles. IGI Global, 2009, 474-504.

4. Basu, A., Bozga, M., and Sifakis, J. Modeling Heterogeneous Real-time Components in BIP. Fourth IEEE International Conference on Software Engineering and Formal Methods (SEFM'06), (2006), 3-12.

5. Batini, C., Lenzerini, M., and Navathe, S. A Comparative Analysis of Methodologies for Database Schema Integration. ACM Computing Surveys 18, 4 (1986), 323-364.

6. Chen, D. and Doumeingts, G. European initiatives to develop interoperability of enterprise applications—basic concepts, framework and roadmap. Annual Reviews in Control 27, 2 (2003), 153-162.

7. Czarnecki, K. and Helsen, S. Feature-based survey of model transformation approaches. IBM Systems Journal 45, 3 (2006), 621-645.

8. Davis, J. GME: The Generic Modeling Environment. OOPSLA '03: Companion of the 18th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, ACM (2003), 82-83.

9. Eclipse. VIATRA. 2008. http://dev.eclipse.org/viewcvs/indextech.cgi/gmthome/subprojects/VIATRA2/index.html.

10. Eclipse. Eclipse Modeling Project. 2010. http://www.eclipse.org/modeling/.

11. Hardebolle, C. and Boulanger, F. Exploring Multi-Paradigm Modeling Techniques. Simulation 85, 11-12 (2009), 688-708.

12. Harel, D. and Rumpe, B. Meaningful Modeling: What's the Semantics of "Semantics"? Computer 37, 10 (2004), 64-72.

13. Jouault, F., Piers, W., and Wagelaar, D. ATLAS Transformation Language. Eclipse, 2008. http://www.eclipse.org/m2m/atl/.

14. Kolovos, D., Paige, R., Rose, L., and Polack, F. The Epsilon Book. 2010.

15. Kong, C. and Alexander, P. The Rosetta meta-model framework. 10th IEEE International Conference and Workshop on the Engineering of Computer-Based Systems (2003), 133-140.

16. Lara, J.D. and Vangheluwe, H. AToM3: A Tool for Multi-formalism and Meta-modelling. Fundamental Approaches to Software Engineering, Springer (2002), 174-188.

17. Lara, J.D., Vangheluwe, H., Posse, E., Indrani, A.V., Provost, M., and Liang, W. AToM3. 2010. http://atom3.cs.mcgill.ca/index_html.

18. Letkeman, K. Comparing and merging UML models. IBM Rational Software Architect, IBM developerWorks, (2005).

19. Melnik, S., Rahm, E., and Bernstein, P. Rondo: A Programming Platform for Generic Model Management. SIGMOD, ACM, (2003).

20. Microsoft. .NET Framework 3.5. 2008. http://msdn.microsoft.com/en-us/library/w0x726c2.aspx.

21. Miller, J. and Mukerji, J. Model Driven Architecture (MDA). 2001.

22. OMG. Model Driven Architecture (MDA). 2001, 1-31.

23. OMG. Interactive Objects and Project Technology, MOF Query/Views/Transformations. 2003.

24. OMG. MOF QVT Final Adopted Specification. 2005.

25. OMG. Common Object Request Broker Architecture. 2006.

26. OMG. MOF 2.0/XMI Mapping, Version 2.1.1. 2006.

27. OMG. Meta Object Facility (MOF) Core Specification (version 2.0). 2006.

28. Pottinger, R. and Bernstein, P. Merging Models Based on Given Correspondences (Technical report). 2003.

29. Vangheluwe, H. and Lara, J.D. An Introduction to Multi-Paradigm Modelling and Simulation. AI, Simulation and Planning in High Autonomy Systems (AIS 2002), Society for Modeling and Simulation International (SCS) (2002), 163-169.

30. Vernadat, F. Enterprise Modeling and Integration: Principles and Applications. Springer, 1996.

31. Wilson, R.J. Introduction to Graph Theory. Longman, 1985.

32. World Wide Web Consortium (W3C). XML Schema 1.1. 2006. http://www.w3.org/XML/Schema.

33. de Lara, J., Levendovszky, T., and Mosterman, P.J. Guest Editorial: Special Issue on Multi-paradigm Modeling. Simulation 85, 11-12 (2009), 685-687.