Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
5791
Stephen H. Edwards Gregory Kulczycki (Eds.)
Formal Foundations of Reuse and Domain Engineering
11th International Conference on Software Reuse, ICSR 2009
Falls Church, VA, USA, September 27-30, 2009
Proceedings
Volume Editors

Stephen H. Edwards
Department of Computer Science, Virginia Tech
Blacksburg, VA, USA
E-mail: [email protected]

Gregory Kulczycki
Department of Computer Science, Virginia Tech, Northern Virginia Center
Falls Church, VA, USA
E-mail: [email protected]
Library of Congress Control Number: 2009934448
CR Subject Classification (1998): D.2.13, D.2, D.3, D.1, D.3.3
LNCS Sublibrary: SL 2 – Programming and Software Engineering
ISSN 0302-9743
ISBN-10 3-642-04210-4 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-04210-2 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.
springer.com
© Springer-Verlag Berlin Heidelberg 2009
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
SPIN: 12752911 06/3180 543210
Preface
ICSR is the premier international conference in the field of software reuse. The main goal of ICSR is to present the advances and improvements within the software reuse domain, as well as to promote interaction between researchers and practitioners. The 11th International Conference on Software Reuse (ICSR 2009) was held during September 27–30, 2009 in Falls Church, VA, USA.
2009 was the year that ICSR went back to its roots. The theme was “Formal Foundations of Reuse and Domain Engineering.” We explored the theory and formal foundations that underlie current reuse and domain engineering practice and looked at current advancements to get an idea of where the field of reuse was headed. Many of the papers in these proceedings directly reflect that theme.
The following workshops were held in conjunction with ICSR 2009:
– Second Workshop on Knowledge Reuse (KREUSE 2009)
– RESOLVE 2009: Software Verification – the Cornerstone of Reuse
– First International Workshop on Software Ecosystems
– International Workshop on Software Reuse and Safety (RESAFE 2009)
Aside from these workshops and the papers found here, the conference also included five tutorials, eight tool demos, and a doctoral symposium. Links to all of this information and more can be found at the ICSR 11 conference website at icsr11.isase.org.

September 2009
Stephen Edwards Gregory Kulczycki
Welcome Message from the General Chairs

Welcome to ICSR11. This conference began as an International Workshop on Software Reuse in 1991, and became a full conference in 1994. ICSR is currently managed by ISASE, the International Society for the Advancement of Software Education. The ISASE website—www.isase.us—contains information about past ICSR proceedings and is a good place to check for future ICSRs.

September 2009
John Favaro Bill Frakes
Organization
General Chairs: John Favaro (INTECS, Italy) and Bill Frakes (Virginia Tech CS, USA)
Program Chairs: Stephen Edwards (Virginia Tech CS, USA) and Greg Kulczycki (Virginia Tech CS, USA)
Demos Chair: Cláudia Werner (Federal University of Rio de Janeiro, Brazil)
Tutorials Chair: Stan Jarzabek (National University of Singapore)
Workshops Chair: Murali Sitaraman (Clemson University CS, USA)
Doctoral Symposium Chair: Jason Hallstrom (Clemson University CS, USA)
Corporate Donations Chairs:
  Asia – Kyo Kang (Pohang University, Korea)
  South America – Cláudia Werner (Federal University of Rio de Janeiro, Brazil)
  Europe – Jaejoon Lee (University of Lancaster, UK) and Davide Falessi (University of Rome Tor Vergata, Italy)
  US – Okan Yilmaz (NeuStar Inc., USA)
Program Committee
Colin Atkinson, University of Mannheim, Germany
Don Batory, University of Texas Austin, USA
Ted Biggerstaff, Software Generators, USA
Martin Blom, Karlstad University, Sweden
Cornelia Boldyreff, University of Lincoln, UK
Steve Edwards, Virginia Tech, USA
Davide Falessi, University of Rome Tor Vergata, Italy
John Favaro, INTECS, Italy
Bill Frakes, Virginia Tech, USA
Jason Hallstrom, Clemson University, USA
Stan Jarzabek, National University of Singapore, Singapore
Kyo Kang, Pohang University, Korea
Greg Kulczycki, Virginia Tech, USA
Jaejoon Lee, University of Lancaster, UK
Chuck Lillie, ISASA, USA
Wayne Lim, Infosys, USA
Juan Llorens, University of Madrid Carlos III, Spain
Brian Malloy, Clemson University, USA
Hong Mei, Peking University, China
Maurizio Morisio, Politecnico di Torino, Italy
Dirk Muthig, Fraunhofer IESE, Germany
Jefferey Poulin, Lockheed, USA
Ruben Prieto-Diaz, University of Madrid Carlos III, Spain
Michael (Eonsuk) Shin, Texas Tech University, USA
Alberto Sillitti, University of Bolzano, Italy
Murali Sitaraman, Clemson University, USA
Bruce Weide, Ohio State University, USA
Cláudia Werner, Federal University of Rio de Janeiro, Brazil
Okan Yilmaz, NeuStar Inc., USA
Sponsoring Institutions
Virginia Tech, Falls Church, VA, USA
International Society for the Advancement of Software Education, USA
National Science Foundation, USA
Software Generators, USA
Reuse Software Engineering, Brazil
Pohang University of Science and Technology, Korea
Table of Contents
Component Reuse and Verification

Consistency Checking for Component Reuse in Open Systems ....................... 1
Peter Henderson and Matthew J. Henderson

Generating Verified Java Components through RESOLVE ........................... 11
Hampton Smith, Heather Harton, David Frazier, Raghuveer Mohan, and Murali Sitaraman

Increasing Reuse in Component Models through Genericity ....................... 21
Julien Bigot and Christian Pérez

Verifying Component-Based Software: Deep Mathematics or Simple Bookkeeping? ... 31
Jason Kirschenbaum, Bruce Adcock, Derek Bronish, Hampton Smith, Heather Harton, Murali Sitaraman, and Bruce W. Weide

Feature Modeling

Extending FeatuRSEB with Concepts from Systems Engineering .................... 41
John Favaro and Silvia Mazzini

Features Need Stories ......................................................... 51
Sidney C. Bailin

An Optimization Strategy to Feature Models' Verification by Eliminating Verification-Irrelevant Features and Constraints ... 65
Hua Yan, Wei Zhang, Haiyan Zhao, and Hong Mei

Reusable Model-Based Testing .................................................. 76
Erika Mir Olimpiew and Hassan Gomaa

Generators and Model-Driven Development

A Case Study of Using Domain Engineering for the Conflation Algorithms Domain ... 86
Okan Yilmaz and William B. Frakes

Model Transformation Using Graph Transactions ................................. 95
Leila Ribeiro, Luciana Foss, Bruno da Silva, and Daltro Nunes

Refactoring Feature Modules ................................................... 106
Martin Kuhlemann, Don Batory, and Sven Apel

Variability in Automation System Models ....................................... 116
Gerd Dauenhauer, Thomas Aschauer, and Wolfgang Pree

Industry Experience

A Case Study of Variation Mechanism in an Industrial Product Line ............. 126
Pengfei Ye, Xin Peng, Yinxing Xue, and Stan Jarzabek

Experience Report on Using a Domain Model-Based Extractive Approach to Software Product Line Asset Development ... 137
Hyesun Lee, Hyunsik Choi, Kyo C. Kang, Dohyung Kim, and Zino Lee

Reuse with Software Components - A Survey of Industrial State of Practice ..... 150
Rikard Land, Daniel Sundmark, Frank Lüders, Iva Krasteva, and Adnan Causevic

Product Lines

Evaluating the Reusability of Product-Line Software Fault Tree Analysis Assets for a Safety-Critical System ... 160
Josh Dehlinger and Robyn R. Lutz

Feature-Driven and Incremental Variability Generalization in Software Product Line ... 170
Liwei Shen, Xin Peng, and Wenyun Zhao

Identifying Issues and Concerns in Software Reuse in Software Product Lines ... 181
Meena Jha and Liam O'Brien

Reuse of Architectural Knowledge in SPL Development ........................... 191
Pedro O. Rossel, Daniel Perovich, and María Cecilia Bastarrica

Reuse and Patterns

Introducing Motivations in Design Pattern Representation ...................... 201
Luca Sabatucci, Massimo Cossentino, and Angelo Susi

The Managed Adapter Pattern: Facilitating Glue Code Generation for Component Reuse ... 211
Oliver Hummel and Colin Atkinson

Reusing Patterns through Design Refinement .................................... 225
Jason O. Hallstrom and Neelam Soundarajan

Service-Oriented Environments

Building Service-Oriented User Agents Using a Software Product Line Approach ... 236
Ingrid Nunes, Carlos J.P. de Lucena, Donald Cowan, and Paulo Alencar

DAREonline: A Web-Based Domain Engineering Tool ............................... 246
Raimundo F. Dos Santos and William B. Frakes

Extending a Software Component Repository to Provide Services ................. 258
Anderson Marinho, Leonardo Murta, and Cláudia Werner

A Negotiation Framework for Service-Oriented Product Line Development ......... 269
Jaejoon Lee, Gerald Kotonya, and Daniel Robinson

Ranking and Selecting Services ................................................ 278
Alberto Sillitti and Giancarlo Succi

A Reusable Model for Data-Centric Web Services ................................ 288
Iman Saleh, Gregory Kulczycki, and M. Brian Blake

Author Index .................................................................. 299
Consistency Checking for Component Reuse in Open Systems

Peter Henderson1 and Matthew J. Henderson2

1 Electronics and Computer Science, University of Southampton, Southampton, UK
[email protected]
2 Mathematics and Computer Science, Berea College, Berea, KY 40404, USA
matthew [email protected]
Abstract. Large scale Open Systems are built from reusable components in such a way that enhanced system functionality can be deployed, quickly and effectively, simply by plugging in a few new or revised components. At the architectural level, when new variations of a system are being planned by (re)configuring reusable components, the architecture description can itself become very large and complex. Consequently, the opportunities for inconsistency abound. This paper describes a method of architecture description that allows a significant amount of consistency checking to be done throughout the process of developing a system architecture description. An architectural design tool is described that supports consistency checking. This tool is designed to support component reuse, incremental development and collaborative working, essential for developing the architecture description of large systems.
1 Introduction

Systems Architecture is that branch of Information System design that determines the overall structure and behaviour of a system to be built. Typically, an architecture is captured as an evolving set of diagrams and specifications, put together by a team of System Architects and iteratively refined over a period of consultation with the customers for the solution. The extent to which the architecture represents a buildable or procurable solution depends a great deal on the consistency and completeness of the architecture description and the extent to which it can be validated prior to commitment to procure. Validation of the architecture as early as possible in the process of development is important. This aspect of System Engineering is not well supported by tools. In this paper we advocate an approach to architecture description that lends itself to validation throughout architecture development.
Open Systems have modular, or component-based, architectures based on a catalogue of reusable components with publicly maintained interfaces. Devising a new configuration involves selecting reusable components from the catalogue, devising new ones or variations of existing ones, and plugging them together according to the architecture description. Opportunities for inconsistent reuse of existing components are particular pitfalls which need to be avoided eventually, but which need, for pragmatic
reasons, to be tolerated during design and development of a complex architecture description. The key idea is that the architects define a metamodel that enumerates the types of entities they are going to use in their description, along with the relationships between these entities. For example, as in this paper, they might choose to describe their architecture in terms of components and interfaces. As part of the metamodel, the architects will also specify constraints which a valid description must satisfy. Validation of the architecture description comprises checking the extent to which these constraints are satisfied. We have developed a tool, WAVE, to support this approach [1].
2 Background

In Systems Engineering, in particular for software intensive systems, the design of a solution is normally developed around a system architecture. The description of this architecture is a shared model around which the architects work to understand the customer requirements and how these can be mapped on to programs and databases to realise a solution.
The field of architecture description is now relatively mature [2,3,4,5,6,7,8]. Specific approaches to architecture description have, in many respects, found their way into standard representations such as UML [9] and SysML [10], so it is now quite common to find these notations in use in industrial projects. There are other approaches to architecture description [11,12,13,14,15,16,17,18,19,20,21,22] but these are generally ideas that are readily adopted as specialisations of the standard notations. Indeed, the Model Driven Architecture approach to system development [23] effectively assumes that a notation needs to be open to extension and specialisation.
The need for architects to share a model, and for this model to evolve, immediately introduces the realisation that for most of its life the architecture description will be incomplete and probably inconsistent. Looking for inconsistencies in the architecture description is the principal means of validating it early in its life [24,25,26,12,27]. The research reported here builds on those ideas. In particular, like others, we take an approach to architecture description based on relational models [11,24]. We capture the details of an architectural description in UML or SysML, but support this with a precise specialised metamodel that has been expressed relationally. This means that we can capture the consistency constraints very precisely in relational algebra [28,29] and formally validate the metamodel as we develop the architecture description.
The consequences for the research reported here are that we have a method of capturing an architectural metamodel, of capturing an architecture according to this metamodel in UML and a means of presenting that architecture for consistency checking throughout its life. We claim that this method (and our tool) supports component reuse, incremental development and collaborative working, for which we will give evidence in a later section. In order to describe the method, we begin with an example of architecture description based on UML Component diagrams.
Fig. 1. A consistent structure
3 Components and Interfaces

Large scale Open Systems benefit from having their architecture described using Component diagrams in UML. These diagrams use Components to denote units of functionality and they use Interfaces to show how these units are plugged together to form assemblies with greater functionality. Components often denote quite substantial units of functionality (such as web servers, database servers etc.). Moreover, in large Open Systems there will often be dozens or hundreds of these replaceable components of various sizes and in many versions.
Figure 1 shows a (simplified) Component diagram in which we see that Component P has nested within it two further components A and B. Strictly speaking, because it shows nesting, this is actually an example of a UML 2.0 Composite Structure diagram [9], where that diagram has been specialised to show nesting of Components. The interfaces in Figure 1 are shown by the ball-and-socket notation. For example, Component A shows that it requires an Interface IB by its use of a socket with that label. Fortunately, also nested within Component P is a Component B which supplies such an interface, shown by the ball with that label. Normally, in a UML 2.0 diagram, this association between requires and supplies would be shown by an arrow from one to the other. Rather than do that here, since our diagrams are so simple, we have relied upon the reader's ability to associate the two Components through their reference to an Interface by name.
We see that all the components in Figure 1 supply or require interfaces. Normally, a component will both require and supply many interfaces, not just one of each as shown in this simple example. We say that the example in Figure 1 is consistent because all of the interface requirements are satisfied. The fact that Component B requires Interface IC is satisfied by this being brought to the outside and shown as a required interface on the parent P. Similarly, that Component P supplies Interface IA is satisfied by the fact that it contains a nested Component A which is the source of this interface.
In contrast, Figure 2 shows an inconsistent Component diagram. This time Component Q contains Components A and D, which leads to some mismatches. Most obviously, we see that Component A, in this context, does not have its requirement for Interface IB satisfied, because there is no sibling that supplies that interface, nor has it been brought to the outside and made a required interface of the parent Q. We refer to this missing connection as a dangling requires. We say that, within Component Q, there is a dangling requirement for Interface IB.
Fig. 2. An inconsistent structure
Moreover, Figure 2 shows an example of what we will call a dangling supplies. This is because Component Q supplies Interface IX but that is not one of the available interfaces supplied by one of its nested members. Again, note that this is a consistency constraint which is specialised from the metamodel that we are adopting and this will be explained later. Further, while we say that Component Q has a dangling-supplies of Interface IX, we do not consider the unused Interface ID of Component D to be a problem (again, a decision of the specialised metamodel).
So far, what we have presented is an example of the type of Architecture Description that we advocate. It is based on a metamodel that we will introduce in a little while. The metamodel determines what types of entities will be described (here Components and Interfaces) and the consistency constraints that they must satisfy. In general, the metamodel will be defined by the Systems Architects for the specific system being designed and may use quite different entities and/or consistency constraints. We will discuss other metamodels later, but first we will show how this one is formalised.
4 Specialised Metamodels

Consider the way in which components and interfaces are conventionally described in a design notation such as UML or in a programming language such as Java or Python. A component will normally supply a number of interfaces and also make use of a number of interfaces supplied by other components. Figure 3 shows the entities and relations introduced by this description. It is the beginning of the metamodel against which we will check our system descriptions. The rest of the metamodel comprises the consistency constraints among these entities. Of course, this simple metamodel also needs to be extended with additional entities to be sufficiently useful in practice, and these entities in turn will require further constraints.
We will formalise the constraints that a correct design must obey in terms of the basic relations shown on the metamodel diagram. We denote the (natural) join operator of two relations by a dot (as, for example, in Alloy [30]). It forms the relational composition of its operands. Thus, for example,

  contains.requires

denotes a binary relation formed from the composition (join) of two existing binary relations. That is, contains.requires denotes the relationship between Components and Interfaces that we would describe as “Component c contains an unnamed Component that requires Interface i”.
Fig. 3. A metamodel
It is worth noting at this point that this focus on the whole relation leads to a holistic approach to the analysis of Systems Architectures, which is something we will return to in the section on Pragmatics. It is almost always the case that our relations are many-to-many. The relational algebraic approach affords a way of “reading off” from the diagram the derived relations that will be constructed as consistency rules in our metamodel. The way in which one uses relational algebra, as we will illustrate in the next section, is to construct predicates and challenge the consistency checker to construct the set of entities that fail to pass the test.
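To make this relational reading concrete, the following minimal Java sketch (purely illustrative — it is not part of the WAVE tool, and every class and method name here is our own invention for the example) shows one way a binary relation over named entities, together with the join (composition) operation, could be represented:

import java.util.*;

/** A binary relation over entity names, e.g. contains, supplies or requires. */
final class Relation {
    private final Set<List<String>> pairs = new HashSet<>();

    /** Record that the pair (from, to) is in the relation. */
    void add(String from, String to) {
        pairs.add(List.of(from, to));
    }

    /** The dot operator: this.join(other) = { (a, c) | (a, b) in this and (b, c) in other }. */
    Relation join(Relation other) {
        Relation result = new Relation();
        for (List<String> p : pairs) {
            for (List<String> q : other.pairs) {
                if (p.get(1).equals(q.get(0))) {
                    result.add(p.get(0), q.get(1));
                }
            }
        }
        return result;
    }

    /** Read-only view of the relation as a set of (from, to) pairs. */
    Set<List<String>> asPairs() {
        return Collections.unmodifiableSet(pairs);
    }
}

With the metamodel of Figure 3, contains.join(requires) would then relate, say, P to IB whenever P contains some Component that requires IB — exactly the reading given above.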
5 Consistency and Completeness

The Architecture Description technique that we advocate assumes that the System Engineer will specify a metamodel and record the design against that metamodel. The metamodel will comprise entities, relationships and constraints. This section describes two such metamodels, shown respectively in Figure 3 and Figure 4. We develop consistency constraints that, according to the System Engineer who designed these metamodels, are among those that need to be satisfied if the Architecture being described is to be consistent. We also address a notion of completeness.
We will assume that during development of the Architecture Description, interim models will not be consistent. The consequence for us here is that the constraints will be specified as sets of inconsistencies. The designer's eventual objective is to achieve a design in which these sets of inconsistencies are eliminated. This approach to design supports both incremental and cooperative working.

5.1 Dangling Requires

As a first example of a consistency rule, let us define the example we discussed in an earlier section, dangling-requires. Using the relations and entities illustrated in Figure 3, we can construct the relation dr as follows:

  dr = contains.requires - contains.supplies - requires

Here, in addition to using relation join (composition) denoted by dot, we have used set difference, denoted by minus. This expression defines a relation dr which relates Components to Interfaces. The relation contains.requires contains all pairs (c,i) with the property that c contains a Component that requires i. Similarly, contains.supplies contains all pairs (c,i) with the property that c contains a Component that supplies i.
Fig. 4. An extended metamodel
Thus the difference of these two relations contains all pairs where c's requirement for i is not satisfied internally. Finally, by then constructing the difference between this set and the relation requires, we have that dr is the relation between Components c and Interfaces i, where c contains a nested Component that requires i but where that Interface is neither supplied internally nor required by the parent. This is exactly what we meant by dangling-requires.
Constructing the relation dr has two benefits. First, we have accepted that during development a design will be inconsistent and have decided to derive at any point in time a set of inconsistencies that the designer will eventually wish to remove. Second, by constructing a relation, we have taken a holistic approach, addressing the whole architecture description with our analysis rather than just looking at local inconsistencies.

5.2 Dangling Supplies

We described informally, earlier, what we mean by dangling supplies. Formally, in terms of our metamodel we can specify this as follows:

  ds = dom(contains) <: supplies - contains.supplies
The operator <: is domain-restrict. The first term in the definition of ds is just the relation supplies restricted to the domain of contains, which is just the relationship between composite Components and the Interfaces they supply. By constructing the difference between this relation and contains.supplies, we get ds, which relates composite Components to Interfaces that they supply but which are not supplied by any of their children. This is precisely what we meant by dangling-supplies when we introduced it informally, earlier.

5.3 Replacements

As a final example of a consistency constraint imposed by a metamodel on an Architecture, consider the situation when our Architecture is for an Open System, where we have potentially alternative suppliers of interchangeable Components. A system is Open if its interfaces are fully defined and available for exploitation, in that alternative suppliers
can produce replacement or enhanced Components that plug into slots vacated by other components. How can we determine which Components are potential replacements for others? Consider:

  canReplace = { (c1, c2) | supplies[c2] <= supplies[c1] and requires[c1] <= requires[c2] }
As before, this is a binary relation (<= denotes subset and [] denotes relational image). It is the relationship between Components with the property that if (c1, c2) is in canReplace then c1 can replace c2, wherever it might occur, simply because it supplies all the Interfaces that c2 must supply and requires only Interfaces supplied in the location that c1 would occupy. The way that this computed relation is used in practice is that when (as in Figure 2) there is a mismatch, we can use canReplace to determine possible candidates to be used in place of D.
This means that we have, in this metamodel, taken a particular view of what we mean by an Interface. An entity which represents an Interface by name effectively encodes in that name both the syntax and semantics of the Interface. This is not unusual in practice, but it does leave undeveloped here how unequal but related Interfaces are to be handled in our metamodel. This is beyond the scope of this paper.

5.4 Completeness

In addition to rules for checking consistency of an architecture as it is developing, there will be many rules that specify completeness of the description. An example of such a rule for the metamodels used in this paper might be that every Component should have at least one Interface that it either requires or supplies. When constructing a constraint for completeness we will work in the same way as we have for consistency and report incompletenesses during development. For example, we might report the set of Components for which there are, as yet, no Interfaces either required or supplied. In other systems for which we have developed metamodels, the kinds of completeness rules we have developed include constraints such as every Component/Requirement pair should have at least one TestCase (see, for example, Figure 4) or the constraint that every entity should have at least one Documentation fragment attached to it. Reports generated of architectures in development would then include sections listing incompletenesses alongside those listing inconsistencies. The Architect's objective would be, eventually, to eliminate these sections.
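As a hedged illustration only, and reusing the Relation sketch from Section 4 (again, none of these names come from the WAVE implementation), the inconsistency sets dr, ds and canReplace could be computed along the following lines:

import java.util.*;

final class ConsistencyChecks {

    /** Set difference of two relations viewed as sets of (from, to) pairs. */
    static Set<List<String>> minus(Set<List<String>> a, Set<List<String>> b) {
        Set<List<String>> result = new HashSet<>(a);
        result.removeAll(b);
        return result;
    }

    /** dr = contains.requires - contains.supplies - requires. */
    static Set<List<String>> danglingRequires(Relation contains, Relation supplies, Relation requires) {
        Set<List<String>> dr = minus(contains.join(requires).asPairs(),
                                     contains.join(supplies).asPairs());
        return minus(dr, requires.asPairs());
    }

    /** ds = dom(contains) <: supplies - contains.supplies. */
    static Set<List<String>> danglingSupplies(Relation contains, Relation supplies) {
        Set<String> composites = new HashSet<>();
        for (List<String> p : contains.asPairs()) {
            composites.add(p.get(0));               // dom(contains)
        }
        Set<List<String>> restricted = new HashSet<>();
        for (List<String> p : supplies.asPairs()) {
            if (composites.contains(p.get(0))) {    // domain restriction <:
                restricted.add(p);
            }
        }
        return minus(restricted, contains.join(supplies).asPairs());
    }

    /** canReplace: (c1, c2) is included when supplies[c2] <= supplies[c1] and requires[c1] <= requires[c2]. */
    static Set<List<String>> canReplace(Set<String> components, Relation supplies, Relation requires) {
        Set<List<String>> result = new HashSet<>();
        for (String c1 : components) {
            for (String c2 : components) {
                if (image(supplies, c1).containsAll(image(supplies, c2))
                        && image(requires, c2).containsAll(image(requires, c1))) {
                    result.add(List.of(c1, c2));
                }
            }
        }
        return result;
    }

    /** Relational image r[c] of a single element. */
    static Set<String> image(Relation r, String c) {
        Set<String> result = new HashSet<>();
        for (List<String> p : r.asPairs()) {
            if (p.get(0).equals(c)) {
                result.add(p.get(1));
            }
        }
        return result;
    }
}

Reports generated during development would simply list the members of these sets; a completeness check such as “every Component has at least one Interface” can be expressed in the same style by reporting the Components that appear in neither dom(supplies) nor dom(requires).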
6 Pragmatic Issues

The example we have developed in the paper is rather simple. In practice, Architecture Descriptions of this sort can be very large. They will normally be developed incrementally and collaboratively by a team. They will also be constrained by the fact that they are building from legacy components and/or devising a modular architecture that comprises reusable components. Tools to support this type of development process must be able to deal with the consequences of these observations.
6.1 Tools

The WAVE tool allows its users to describe Architectures [1]. It will produce documentation based on these descriptions, including reports on the inconsistencies. A prototype implementation of WAVE is available at http://ecs.soton.ac.uk/˜ph/WAVE. It comprises a collection of scripts which transform among representations of the architecture descriptions and which support the merging of independently developed models. WAVE also supports the inclusion of specialised metamodels. The prototype assumes that descriptions made according to these models will have an XML representation.
In practice, we have used WAVE as an adjunct to a UML tool (Sparx Systems Enterprise Architect) from which the architecture description can be dumped as an XMI file. This XMI description is first turned into a relational model, capturing data from the XML according to the chosen metamodel. The consistency checking is then done by a script that computes the “inconsistency” relations such as dr, ds and canReplace as described here. A further script then constructs the documentation of the architecture, including the reports on inconsistencies. We plan to replace this XML based implementation with one based on a conventional relational database in the near future, in order to inherit the robustness and transactional properties of those systems. The architect will then be able to publish their incremental description and share it with others via that shared database.

6.2 Collaborative Working, Incremental Development and Component Reuse

The architects work by capturing new parts of the description and then using the tool to generate interim documentation. The inconsistencies are highlighted. So they continue to develop, all the while trying to remove inconsistencies they have added.
Collaborative working is supported because the architects can work independently as follows. Each publishes their models to the others. When running consistency checking, each architect can merge their work with the work of others. The generated interim documentation contains inconsistencies that are now across the whole architecture. It will be apparent to each architect which inconsistencies are their responsibilities. They continue to work on their independent parts, while following this process of incremental development.
Where the architecture is something being developed around a catalogue of reusable components (which is the norm these days), the component descriptions can be held with the component in such a way that integrating them into a new “build” (i.e., build of the architecture) is straightforward. Thus the plug-and-play that is afforded by component reusability at the implementation stage is mirrored at the design stage by having reusable component-descriptions.
7 Conclusions

We contend that architecture description is an important problem for Information Systems development, especially for large Open Systems with many architects, many engineers and many legacy components.
We have observed that such an architecture description is likely to be incomplete and inconsistent for much of its life. We have argued that capturing the architecture description according to a precise specialised metamodel introduces the kind of redundancy in description that allows inconsistencies to be detected early in the life of an architecture description. We have given examples of the kinds of metamodels that can be developed. We have described tools based on UML that allow a pragmatic approach to developing both metamodels and architecture descriptions that are compliant to those metamodels.
The particular example we have chosen to illustrate the method used Components and Interfaces, which are particularly appropriate in the context of component reuse. We have shown how the method lends itself to reuse of component descriptions, as well as to reuse of components.
In the future, we will refine our methods and tools in particular ways, not least in integrating the architecture description with its documentation based on narrative structure [27]. We plan to publish detailed metamodels that we have developed along with collaborators. We recognise that many of the more complex consistency constraints that we specify have analogues in graph algorithms and wish to pursue this potentially rich theme. In particular, it might allow us to investigate the kinds of architectural complexity that Alexander described in the 60s and then deprecated in the 70s [31] or put more flesh on the bones of interesting new developments such as the Algebra of Systems [26].
The method we have presented is now quite mature and is being applied in practice by our industrial collaborators. The tool is usable and integrates well with established tools. We believe that the method and tools together constitute a sound and practical method of enhancing architecture description.
References

1. Henderson, P., Henderson, M.J.: Collaborative development of system architecture - a tool for coping with inconsistency. In: 21st International Conference on Software Engineering and Knowledge Engineering (SEKE 2009). Knowledge Systems Institute, Skokie (2009)
2. Kruchten, P.: Architectural Blueprints - the 4+1 view model of software architecture. IEEE Software 12(6), 42–50 (1995)
3. Shaw, M., Garlan, D.: Software Architecture - Perspectives on an emerging discipline. Addison-Wesley, Upper Saddle River (1996)
4. Maier, M.W., Rechtin, E.: The Art of System Architecting, 2nd edn. CRC Press LLC, Boca Raton (2002)
5. Henderson, P.: Laws for dynamic systems. In: International Conference on Software Re-Use (ICSR 1998). IEEE, Los Alamitos (1998)
6. Henderson, P., Yang, J.: Reusable web services. In: 8th International Conference on Software Reuse (ICSR 2004). IEEE, Los Alamitos (2004)
7. Rozanski, N., Woods, E.: Software Systems Architecture. Addison-Wesley, Upper Saddle River (2005)
8. Shaw, M., Clements, P.: The golden age of software architecture. Software, 31–39 (March/April 2006)
9. OMG: Unified Modeling Language, superstructure (2007), http://www.uml.org
10. OMG: OMG Systems Modeling Language (2006), http://www.uml.org
11. Holt, R.C.: Binary relational algebra applied to software architecture. In: CSRI Technical Report 345. University of Toronto (1996)
12. Hadar, E., Hadar, I.: Effective preparation for design review - using UML arrow checklist leveraged on the guru's knowledge. In: OOPSLA. ACM, New York (2007)
13. Balasubramanian, K., Balasubramanian, J., Parsons, J., Gokhale, A., Schmidt, D.C.: A platform independent component modeling language for distributed real-time and embedded systems. J. Comput. Syst. Sci. 73(2), 171–185 (2007)
14. Dekel, U., Herbsleb, J.D.: Notation and representation in collaborative object-oriented design: an observational study. In: SIGPLAN Notices, vol. 42(10). ACM, New York (2007)
15. Egyed, A.: UML/analyzer: A tool for the instant consistency checking of UML models. In: ICSE 2007: Proceedings of the 29th international conference on Software engineering. ACM, New York (2007)
16. Nejati, S., Sabetzadeh, M., Chechik, M., Easterbrook, S., Zave, P.: Matching and merging of statecharts specifications. In: 29th International Conference on Software Engineering (ICSE 2007), pp. 54–64 (2007)
17. Balzer, B.: Tolerating inconsistency. In: ICSE 1991: Proceedings of the 13th international conference on Software engineering. ACM, New York (1991)
18. Egyed, A.: Instant consistency checking for the UML. In: ICSE 2006: Proceedings of the 28th international conference on Software engineering, pp. 381–390. ACM, New York (2006)
19. Egyed, A.: Fixing inconsistencies in UML design models. In: ICSE 2007: Proceedings of the 29th international conference on Software engineering. ACM, New York (2007)
20. Sabetzadeh, M., Nejati, S., Liaskos, S., Easterbrook, S., Chechik, M.: Consistency checking of conceptual models via model merging. In: RE, pp. 221–230 (2007)
21. Sabetzadeh, M., Nejati, S., Easterbrook, S., Chechik, M.: Global consistency checking of distributed models with tremer. In: 30th International Conference on Software Engineering (ICSE 2008), Formal Research Demonstration (2008) (to appear)
22. Nuseibeh, B., Easterbrook, S., Russo, A.: Making inconsistency respectable in software development. Journal of Systems and Software 58, 171–180 (2001)
23. OMG: MDA guide (2003), http://www.uml.org
24. Beyer, D., Noack, A., Lewerenz, C.: Efficient relational calculation for software analysis. Transactions on Software Engineering 31(2), 137–149 (2005)
25. Chang, K.N.: Consistency checks on UML diagrams. In: International Conference on Software Engineering Research and Practice, SERP 2007. IEEE, Los Alamitos (2007)
26. Koo, B.H.Y., Simmons, W.L., Crawley, E.F.: Algebra of systems: An executable framework for model synthesis and evaluation. In: Proceedings of the 2007 International Conference on Systems Engineering and Modeling. IEEE, Los Alamitos (2007)
27. Henderson, P., de Silva, N.: System architecture induces document architecture. In: 20th International Conference on Software Engineering and Knowledge Engineering (SEKE 2008). IEEE, Los Alamitos (2008)
28. Date, C.: Database in Depth - Relational Theory for Practitioners. O'Reilly Media Inc., Sebastopol (2006)
29. Beyer, D.: Relational programming with CrocoPat. In: ICSE. IEEE, Los Alamitos (2006)
30. Jackson, D.: Software Abstraction. MIT Press, Cambridge (2006)
31. Alexander, C.: Notes on the Synthesis of Form (with 1971 preface). Harvard University Press, Cambridge (1964)
Generating Verified Java Components through RESOLVE

Hampton Smith1, Heather Harton1, David Frazier2, Raghuveer Mohan1, and Murali Sitaraman1

1 School of Computing, Clemson University, Clemson, SC 29634, USA
2 Computer Science Department, East Tennessee State University, Johnson City, TN 37614, USA
{hamptos,hkeown,rmohan,murali}@cs.clemson.edu, [email protected]
Abstract. For software components to be reused with confidence, they must be correct. Unlike testing, formal verification can be used to certify that a component will behave correctly regardless of context, as long as that context satisfies component assumptions. Some verification systems for developing correct components in languages such as Java are simplified to be practical, but are not complete. Other systems that account for necessary semantic complications arising from underlying reference behavior demand non-trivial specification and verification. This paper describes an alternative. Under this approach, reusable components are specified, implemented, and verified in RESOLVE, a language with clean semantics, and are translated to Java. To improve confidence in the verification process, we are currently re-engineering the RESOLVE verification system itself with generated verified components.
1 Introduction

A fundamental goal of modern object oriented programming is to improve software engineering through the development and assembly of reusable software components. However, precisely that which is the greatest strength of object orientation—reuse—increases the necessity for verifying components. If the details of a component are to be kept hidden, and its behavior abstracted, then the client must be able to have confidence that the component works as described [1].
As a result, the push to apply formal methods to specify and verify both programs and properties of programs continues to accelerate, not just for mission-critical software but for reusable components in general. Several of these efforts have focused on extending existing programming languages with formal specifications. Examples of this are Spec# for C# [2] and JML for Java [3]. Unfortunately, popular object-oriented languages contain features that make formal verification difficult [4]. These difficulties can be formalized in the notion of clean semantics [5]. A language has clean semantics if it 1) allows for a state space defined only by named variables that represent mathematically rich values, and 2) allows code to affect the values of only those variables explicitly named. When the semantics of a language are clean, specifying components is more straightforward and thus verification is easier. RESOLVE
[6] is an integrated specification and implementation language that has been designed to have clean semantics.
At ICSR 2000 [1], we explained that, in principle, it is possible to specify and verify software components in a modular fashion. An operation to reverse a List component was used as an example. Since then, we have built the RESOLVE Verifying Compiler [7, 8], which is able to perform mechanical verification and generate code for a class of programs, including the example in [1]. The compiler allows users to generate verified Java components by developing and verifying components in RESOLVE, then generating Java code from verified RESOLVE software. When used alongside unverified Java code, these verified components can increase the confidence in the overall system and narrow the search for bugs to unverified code. Because the compiler itself ultimately needs to be trustworthy, we take a step toward this goal by exploring a re-engineering case study in which correct-by-construction Java components are incorporated into the RESOLVE verifying compiler itself.
The paper is organized as follows. In section 2 we compare and contrast our method with related efforts for achieving verified Java components. In section 3, we give an overview of the RESOLVE compiler and how it can be used to generate reusable, verified Java components. In section 4, we present a case study in which we replace an existing Java component with a verified one. In section 5 we will present challenges, our conclusions, and some thoughts for future work.
2 Approaches for Verified Java Components

This section contains a summary of related work in developing correct Java components.

2.1 Jahob

Jahob verifies programs written in a subset of the Java programming language using Isabelle as a specification language [9]. Jahob's goal is to provide static verification and guard against a class of errors, rather than to provide a mechanism for full functional verification. Jahob uses shape analysis to assist in automatically determining invariants [10, 11].

2.2 ESC/Java

ESC/Java uses a series of static checks to attempt to identify such run-time errors as unhandled exceptions, incorrectly-used locks, and accesses outside of the bounds of an array [12]. The VCs generated by ESC/Java are passed to the automated prover Simplify [13]. The specification language used by ESC/Java has been developed in conjunction with JML [14] and is very similar. The next generation of ESC/Java, ESC/Java2, is used to perform static checks for JML [15]. In addition to ESC, there are other JML Static Verification Tools [16, 17, 17, 19].
ESC/Java discourages aliasing. JML, however, uses a model that only allows the “owner” of an object to modify the object. These approaches limit the developer to a subset of Java. In addition, as noted in [20], JML has yet to catch up to developments in the Java programming language added in Java 1.5, such as generics and enums.
2.3 Why

The system Why provides a mechanism to verify C, ML, and Java programs. The programs are converted into a Why internal language which is similar to ML [21]. As an example, Java programs annotated in JML are translated into Why by a program called Krakatoa. Krakatoa therefore represents another verification system for Java programs with JML specs [18]. Why generates VCs for various existing proof tools including Coq, PVS, Mizar, and HOL Light. Why also uses the Simplify [13] and haRVey [22] decision procedures to perform proofs. In general, Why disallows aliasing; however, Krakatoa models the Java heap so that once converted to Why, there is no aliasing. However, there are still portions of Java not handled by Krakatoa [18].
3 Verified Java Component Generation

3.1 Input Components

The input to the RESOLVE verifying compiler consists of specifications in the RESOLVE integrated specification language and code in the RESOLVE programming language. In Figure 1, we see part of the List component defined in a concept called List_Template. RESOLVE concepts are specified by modeling objects using a variety of mathematical entities, such as numbers, strings, sets, and functions. In List_Template, to capture ordered collections of elements in which there is a cursor marking the insertion position, a List is modeled as an ordered pair of mathematical strings: the elements in the first string precede the cursor (to the left of the cursor using the terminology of [1]), and those in the second string remain, following the cursor (to the right).

Concept List_Template(type Entry);
    uses Std_Integer_Fac, String_Theory;
    Family List is modeled by Cart_Prod
            Prec, Rem: Str(Entry);
        end;
        exemplar P: List;
        initialization ensures P.Prec = empty_string and
                               P.Rem = empty_string;
    Operation Advance_to_End(update P: List);
        ensures P.Prec = #P.Prec o #P.Rem and P.Rem = empty_string;
    Operation Insert(clears New_Entry: Entry; updates P: List);
        ensures P.Prec = #P.Prec and P.Rem = <#New_Entry> o #P.Rem;
    Operation Remove(replace Entry_Removed: Entry; update P: List);
        requires P.Rem /= empty_string;
        ensures P.Prec = #P.Prec and #P.Rem = <Entry_Removed> o P.Rem;
    (* Other operations omitted for brevity *)
end List_Template;

Fig. 1. A part of the List_Template specification

The use of mathematical entities allows for a formal conceptualization of programming objects without referring to object implementation details. Stacks, Queues, Lists, and other objects that represent ordered collections can all be modeled mathematically using strings and may thus avail themselves of associated notations and theorems from mathematical string theory. Reuse of theories allows for the effort invested in establishing various theorems to be reused [23].
As an example for explaining verification, consider the Reverse operation on List proposed in [1], for which both a specification and realization are given in Figure 2. The function rev in the specification is a mathematical operator that reverses a string, and |X| is a notation for the length of a string X. Here, the Advance_to_End operation is used in place of the original Advance used in [1] because the replacement eased mechanical verification.

Enhancement Reverse_Capability for List_Template;
    Operation Reverse(updates S: List);
        requires |S.Prec| = 0;
        ensures S.Prec = rev(#S.Rem) and |S.Rem| = 0;
end Reverse_Capability;

Realization Reverse_Realiz for Reverse_Capability of List_Template;
    Procedure Reverse(updates S: List);
        decreasing |S.Rem|;
        Var temp: Entry;
        if Length_of_Rem(S) > 0 then
            Remove(temp, S);
            Reverse(S);
            Insert(temp, S);
            Advance_to_End(S);
        end;
    end Reverse;
end Reverse_Realiz;

Fig. 2. Specification and realization of List Reverse_Capability
3.2 Verification Output

In order to verify a component correct, the RESOLVE compiler first generates a series of logical implications called verification conditions or VCs. These VCs are generated such that proving them is necessary and sufficient to demonstrate that the program is correct. To verify the code in Figure 2, VCs must be generated for the preconditions of each operation (e.g., Advance requires that the remaining string of a List not be empty), for the implementation itself (to guarantee that it meets the ensures clause of the specification for Reverse), and for the progress metric, which states that the length of S.Rem must decrease with each recursive call in order to terminate. The VCs produced by the compiler can be output in the syntax accepted by the Isabelle proof assistant, in a more user-friendly syntax for human inspection, or passed
directly to RESOLVE's integrated prover. As an example, the code in Figure 2 generates seven different VCs intended to guarantee both functional correctness and termination. The integrated RESOLVE Prover is able to dispatch each. However, for more complicated code, the generated VCs are often outside the reach of either Isabelle or the RESOLVE integrated prover. Exploring techniques for writing more easily-provable code, generating more straightforward VCs, and proving resulting VCs are active areas of our research.

3.3 Output Java Components

For the given example, the compiler generates Java code for List_Template, its implementation, the Reverse enhancement, and the Reverse implementation. A snippet of the generated code for the Reverse implementation is shown in Figure 3. This code can be compiled and used to extend Java List components compiled from RESOLVE implementations of the List_Template specification [24]. Since the source RESOLVE code has been verified correct, the generated Java code is correct by construction, insofar as the verification and translation processes can be trusted. Increasing the trustworthiness of the verification process is discussed in Section 4.

3.4 Challenges

Because RESOLVE is imperative and object oriented, the process of writing RESOLVE code should be straightforward to programmers familiar with languages like Java. However, because RESOLVE is designed to be suitable for verification, it has some requirements that are likely to be unfamiliar to such programmers. Specifically, many programmers will be unfamiliar with writing formal specifications for methods and annotating loops with invariants. However, we have data from our experience teaching RESOLVE to undergraduate students that suggests this transition is fairly straightforward. This data is the topic of [25, 26, 27].
Fig. 3. Snippet of Java code generated from RESOLVE implementation of Reverse
4 Re-engineering the Verifying Compiler

A faulty verification tool may assert that a component is correct with respect to its specification even when it is not, so confidence in the tool itself matters. One important method for increasing that confidence is, of course, to encourage the use of existing, mature components in our verification tools. A discussion of mature Java runtime component usage in the RESOLVE verifying compiler is the topic of section 4.1.
Another option is to begin with a tool implemented using well-specified components that have not been verified, but in which our confidence is high due to their maturity, then to gradually replace these components with equivalent ones verified and generated by the tool itself. As we re-engineer our tool in this way, we increase our confidence that the tool is sound. Clearly, a faulty tool may verify and generate faulty components, but each re-engineering iteration is a new chance to reverify components under a new, and presumably better, set of circumstances, as well as a new opportunity to test these components and expose problems.
While the ultimate goal is a system (in this case, a compiler) constructed of 100% verified code, in this paper we seek to establish that using verified libraries is a tenable next step in software engineering, where mature, vetted libraries are already used as a method to increase confidence that overall systems are correct. The application of these ideas to the RESOLVE verifying compiler is the topic of section 4.2.

4.1 Summary of Reusable Component Usage in the Current Compiler

The RESOLVE Verifying Compiler uses many mature components, such as those found in the Java runtime library. We have identified which components from the Java runtime are used most frequently. These components are prime candidates for being replaced with verified components. The results of this investigation are found in Table 1. Counts of the String Java component have been omitted as being, if nothing else, fairly boring. Since the List component appears so often, it is an excellent candidate for being replaced with a verified alternative.

Table 1. Instances of three often-used Java runtime components in the RESOLVE Verifying Compiler
Module         List   StringBuffer   Map
Analyzer       145    0              0
Translator     181    103            10
VCGenerator    201    12             1
4.2 Re-engineering Issues

Because of differences in the way RESOLVE and Java represent objects, care must be taken when designing the interfaces of components that are to be re-engineered. For instance, RESOLVE's primary mechanism for moving an object is to swap it with another object, as opposed to Java-style reference assignment. As a result,
RESOLVE components, when translated to Java, implement a special interface that provides methods called getRep() and setRep(), which essentially allow the data associated with a particular object (its representation) to be retrieved and set in a wholesale manner. This means that not only do components such as List need to implement these methods from the start if they are to be re-engineered, but also those components that will be inserted into the List. This restriction only applies to classes that are an explicit part of the public interface of the component. To facilitate such an interface, we created a simple wrapper class that allows arbitrary Java objects to be wrapped into a type acceptable to RESOLVE components.
Using this class, we inserted a RESOLVE implementation of the List component into RESOLVE's integrated prover. It should be noted that while we are able to verify certain operations on Lists, we are not yet able to perform full mechanical verification on an implementation of List. However, since translation is independent of verification, we may integrate the component first and fully verify it later.
In Figure 4, we present a snippet of code before and after the new component was added. pastStates was originally a Java doubly-linked list, being used here as a stack. It has been replaced in this example with a List generated from RESOLVE code. The List used here is a bounded version of the one presented in Section 3. That is, it has a maximum capacity. The unbounded version was presented to be consistent with the example in [1]. Because RESOLVE uses swapping to pass objects, inserting the contents of workingRESOLVEObject requires that another valid object fill the representation of that variable. Thus workingRESOLVEObject remains a valid, non-null RESOLVE object that does not cause an alias with the data stored in pastStates.

/* Code before */
pastStates.push(vC);
attemptStep(vC, curLength, metrics, pastStates);
pastStates.pop();

/* Code after */
Rtype workingRESOLVEObject = new RtypeWrapper(vC);
One_Way_List_Facility.Insert(workingRESOLVEObject, pastStates);
attemptStep(vC, curLength, metrics, pastStates);
One_Way_List_Facility.Remove(workingRESOLVEObject, pastStates);
Fig. 4. Snippet from the integrated prover before and after re-engineering
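To make the wrapper idea concrete, the following minimal sketch shows one shape such an interface and wrapper class might take. It is illustrative only: the method names getRep() and setRep() and the class names Rtype and RtypeWrapper are taken from the discussion and from Fig. 4, but the exact signatures are our assumptions rather than the compiler's actual generated code.

// Illustrative sketch only; signatures are assumed, not taken from the
// actual RESOLVE-to-Java translation.
public interface Rtype {
    Object getRep();          // retrieve the object's representation wholesale
    void setRep(Object rep);  // replace the object's representation wholesale
}

public class RtypeWrapper implements Rtype {
    private Object rep;

    public RtypeWrapper(Object wrapped) {
        this.rep = wrapped;
    }

    public Object getRep() {
        return rep;
    }

    public void setRep(Object rep) {
        this.rep = rep;
    }
}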
With the wrapper class, replacing a Java component with a more readily verifiable reusable component is straightforward. However, one issue we encountered is that the translation into Java was originally implemented under the assumption that the source code would represent a full RESOLVE program, rather than a component that might be mixed with existing Java code. As a result, the getRep() and setRep() functions each manipulate Java objects of type Object and thus provide no type safety. Indeed, while the RESOLVE List must be parameterized with a RESOLVE type when it is initialized, our wrapper class would subsequently allow Java objects of any kind to be inserted, and this could cause a verified component to fail when it attempts to manipulate an internal object of an unexpected type. An obvious solution to this would be to allow the RType interface to be parameterized by a Java generic so
that getRep() and setRep() operate on objects of the appropriate type. We intend to modify the translation system to allow for type-safe integration of generated components into Java code.
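A minimal sketch of the intended direction follows; the generic signatures are our illustration of the proposal, not the current translator's output.

// Hypothetical type-safe variant of the interface; illustration only.
public interface RType<T> {
    T getRep();
    void setRep(T rep);
}

// With such a parameterization, a wrapper around a specific Java type no
// longer exposes its representation as a raw Object, so ill-typed insertions
// are rejected at compile time. For example, assuming vC has some class type
// VC (a hypothetical name), one would write:
//     RType<VC> workingRESOLVEObject = new RTypeWrapper<VC>(vC);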
5 Conclusions

Programming languages that are designed with verification of object-based software in mind, such as RESOLVE, enable software engineers to create verified reusable components even in languages that are not designed to facilitate verification, such as Java. With this technique, a library of verified components can be generated in any language and used alongside components that have not been formally verified. With care, the interfaces to hand-written components in such a language can be designed such that they can be replaced at a later date with equivalent verified components. This technique can be applied to the verification tools themselves to increase our confidence in them. This paper presents a case study in which we have applied these techniques to the RESOLVE verifying compiler. While our example of a List component in this paper was chosen for its simplicity, the intention is that larger, more richly featured components will be written and verified in RESOLVE to be used alongside mature Java components.

Obstacles remain to the development of a library of verified, reusable components. Development of the verifying compiler, both in terms of the understanding necessary to automate the process fully and in terms of implementation, requires considerable additional research. In addition, the translation process itself cannot be verified until adequate formalisms to specify, develop, and verify components for compilation and verification are conceived. Even as these long-term problems remain, this paper identifies a way to generate trustworthy components in languages such as Java in the short term, starting with RESOLVE.
Acknowledgments

This research is funded in part by NSF grants CCF-0811748 and DMS-0701187. We thank the members of the RESOLVE/Reusable Software Research Group (RSRG) at Clemson and Ohio State for discussions on the topics presented in this paper.
References

1. Sitaraman, M., Atkinson, S., Kulczycki, G., Weide, B.W., Long, T.J., Bucci, P., Heym, W.D., Pike, S.M., Hollingsworth, J.E.: Reasoning about software-component behavior. In: Frakes, W.B. (ed.) ICSR 2000. LNCS, vol. 1844, pp. 266–283. Springer, Heidelberg (2000)
2. Barnett, M., Leino, K.R.M., Schulte, W.: The Spec# Programming System: An Overview. In: Barthe, G., Burdy, L., Huisman, M., Lanet, J.-L., Muntean, T. (eds.) CASSIS 2004. LNCS, vol. 3362, pp. 49–69. Springer, Heidelberg (2005)
3. Burdy, L., Cheon, Y., Cok, D., Ernst, M., Kiniry, J., Leavens, G.T., Leino, K.R.M., Poll, E.: An overview of JML tools and applications. STTT 7(3), 212–232 (2005)
4. Weide, B.W., Heym, W.D.: Specification and Verification with References. In: Proc. SAVCBS, pp. 50–59 (2001)
5. Kulczycki, G.: Direct Reasoning. Ph.D. Dissertation, Clemson University (2004)
6. Edwards, S.H., Heym, W.D., Long, T.J., Sitaraman, M., Weide, B.W.: Part II: specifying components in RESOLVE. SIGSOFT Softw. Eng. Notes 19(4), 29–39 (1994)
7. Harton, H.K., Sitaraman, M., Krone, J.: Formal Program Verification. In: Wah, B. (ed.) Wiley Encyclopedia of Computer Science and Engineering. John Wiley & Sons, Chichester (2008)
8. Sitaraman, M., Adcock, B., Avigad, J., Bronish, D., Bucci, P., Frazier, D., Friedman, H.M., Harton, H., Heym, W., Kirschenbaum, J., Krone, J., Smith, H., Weide, B.W.: Building a Push-Button RESOLVE Verifier: Progress and Challenges. Technical Report RSRG09-01, School of Computing, Clemson University, Clemson, SC (2009)
9. Kuncak, V., Rinard, M.: An Overview of the Jahob Analysis System: Project Goals and Current Status. In: Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, p. 323 (2006)
10. Wies, T.: Symbolic Shape Analysis. Master's thesis, Universität des Saarlandes, Saarbrücken, Germany (September 2004)
11. Wies, T., Kuncak, V., Lam, P., Podelski, A., Rinard, M.: Field Constraint Analysis. In: Proc. Int. Conf. Verification, Model Checking, and Abstract Interpretation (2006)
12. Flanagan, C., Leino, K.R.M., Lillibridge, M., Nelson, G., Saxe, J.B., Stata, R.: Extended Static Checking for Java. In: Proc. ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation, Berlin, pp. 234–245 (2002)
13. Detlefs, D., Nelson, G., Saxe, J.B.: Simplify: A Theorem Prover for Program Checking. J. ACM 52(3), 365–473 (2005)
14. Leavens, G.T., Baker, A.L., Ruby, C.: Preliminary Design of JML: A Behavioral Interface Specification Language for Java. ACM Software Engineering Notes 31, 1–38 (2006)
15. Poll, E., Kiniry, J., Cok, D.: Introduction to JML, http://secure.ucd.ie/products/opensource/ESCJava2/ESCTools/papers/CASSIS2004.pdf
16. Burdy, L., Requet, A., Lanet, J.: Java applet correctness: A developer-oriented approach. In: Araki, K., Gnesi, S., Mandrioli, D. (eds.) FME 2003. LNCS, vol. 2805, pp. 422–439. Springer, Heidelberg (2003)
17. Ahrendt, W., Baar, T., Beckert, B., Bubel, R., Giese, M., Hähnle, R., Menzel, W., Mostowski, W., Roth, A., Schlager, S., Schmitt, P.H.: The KeY tool. Software and Systems Modeling 4, 32–54 (2005)
18. Marché, C., Paulin-Mohring, C., Urbain, X.: The Krakatoa Tool for Certification of Java/JavaCard Programs Annotated in JML. Journal of Logic and Algebraic Programming 58(1-2), 89–106 (2004)
19. Chalin, P., James, P.R., Karabotsos, G.: JML4: Towards an Industrial Grade IVE for Java and Next Generation Research Platform for JML. In: Shankar, N., Woodcock, J. (eds.) VSTTE 2008. LNCS, vol. 5295, pp. 70–83. Springer, Heidelberg (2008)
20. Cok, D.: Adapting JML to generic types and Java 1.6. In: Proc. SAVCBS, pp. 27–35 (2008)
21. Filliâtre, J., Marché, C.: The Why/Krakatoa/Caduceus Platform for Deductive Program Verification. In: Damm, W., Hermanns, H. (eds.) CAV 2007. LNCS, vol. 4590. Springer, Heidelberg (2007)
22. Ranise, S., Deharbe, D.: The haRVey decision procedure, http://www.loria.fr/~ranise/haRVey/
23. Smith, H., Roche, K., Sitaraman, M., Krone, J., Ogden, W.F.: Integrating math units and proof checking for specification and verification. In: Proc. SAVCBS, pp. 59–66 (2008)
24. Hunt, J.M., Sitaraman, M.: Enhancements – Enabling Flexible Feature and Implementation Selection. In: Bosch, J., Krueger, C. (eds.) ICSR 2004. LNCS, vol. 3107, pp. 86–100. Springer, Heidelberg (2004)
25. Leonard, D., Hallstrom, J., Sitaraman, M.: Injecting Rapid Feedback and Collaborative Reasoning in Teaching Specifications. In: Proc. ACM SIGCSE 2009 (2009)
26. Long, T.J., Weide, B.W., Bucci, P., Gibson, D.S., Sitaraman, M., Hollingsworth, J.E., Edwards, S.H.: Providing Intellectual Focus To CS1/CS2. In: Proc. 29th SIGCSE Technical Symposium on Computer Science Education, pp. 252–256. ACM Press, New York (1998)
27. Sitaraman, M., Long, T.J., Weide, B.W., Harner, J., Wang, C.: A Formal Approach to Component-Based Software Engineering: Education and Evaluation. In: Proc. ICSE 2001, pp. 601–609. IEEE, Los Alamitos (2001)
Increasing Reuse in Component Models through Genericity

Julien Bigot¹ and Christian Pérez²
¹ LIP/INSA Rennes  ² LIP/INRIA
{julien.bigot,christian.perez}@inria.fr
Abstract. A current limitation to component reusability is that component models are designed to describe a deployed assembly and thus bind the behavior of a component to the data-types it manipulates. This paper studies the feasibility of supporting genericity within component models, including component and port types. The proposed approach works by extending the meta-model of an existing component model. It is applied to the SCA component model; a working prototype shows its feasibility.
1 Introduction
Component-based software engineering is a very interesting approach to increase code reusability. Component models are used in a variety of domains such as embedded systems (Fractal [1]), distributed computing (CCM [2], SCA [3]), and even high performance computing (CCA [4]). There is however usually a direct mapping between component instances and execution resources as well as between components and the data-types they manipulate. This means that a component implementation binds together three distinct concerns: the behavior of the component, the data-types it manipulates and the execution resources it is targeted to. Separating those three concerns would greatly increase reusability as each aspect could be selected independently to be combined later. While there is some work on automatic mapping of components onto resources, there is little work on abstracting component models in the way addressed by generic programming, where algorithms and data-types can be parameters.

This paper presents an attempt to support genericity in component models in order to validate the feasibility of this idea and evaluate its advantages on an example. This is achieved by extending an existing component model (SCA) with concepts to support genericity and by implementing a tool that transforms applications written in this extended model back to the original one.

The remainder of this paper is organized as follows. Section 2 describes an example of a component that would benefit from the introduction of genericity. Section 3 analyzes some related work. An approach to introduce genericity is described in Section 4 and applied to SCA in Section 5. Then, Section 6 evaluates how this generic SCA can be used to implement the example described earlier through a prototype. Finally, Section 7 concludes and presents some future work.
2 A Motivating Example: The Task Farm
Algorithmic skeletons are constructs that describe the structure of recurring composition patterns [5]. Some skeletons have been identified for the case of parallel computing, such as the pipeline (computation in stages), the task farm (embarrassingly parallel computations), the map and reduce (data-parallel apply-to-all and sum-up computations), the loop (determinate and indeterminate iterative computations) and the divide & conquer skeletons.

As an example, the task farm skeleton shown in Fig. 1 takes a data-stream as input and outputs a processed version of this stream. The parallelism is obtained by running in parallel multiple instances of workers (W in Fig. 1), each one processing a single piece of data at a time. A dispatcher (D in Fig. 1) handles the input and chooses the worker for each piece of data, and a collector (C in Fig. 1) reorders the outputs of the workers to generate the farm output.

Fig. 1. The task farm skeleton

A typical component-based implementation of the task farm will wrap each of the three roles in a component. The farm itself will be a composite containing instances of these components. To increase the reusability of this composite, it should be possible to use it for various processing applied to the data, for various types of data in the stream and for various numbers of workers. It is thus interesting to let the type of the data stream, the implementation of the workers as well as their number be parameters of the composite. When knowledge about the content of the manipulated data or about the behavior of the workers makes it possible to provide optimized implementations of the dispatcher or collector, it should be possible to use these implementations. This should however not complicate the usage for cases where the default implementation is sufficient. It is thus interesting to let the implementations of these two components be parameters of the composite, with some default values.

Hence, the task farm is a good example of a component that could benefit from the support of genericity. Kinds of needed parameters include data-values (the number of workers), data-types (the data stream) and component implementations (dispatcher, worker and collector), with the possibility to provide default values.
3 Related Work

3.1 Languages with Support for Genericity
Genericity [6] is ubiquitous in object-oriented languages. For example, Ada, C++, C#, Eiffel and Java all support it [7]. Classes, methods and in some cases procedures can accept parameters. Parameters can be data-types or in some cases data-value constants. A typical usage is to implement type-safe containers where the type of the contained data is a parameter.
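As a brief illustration of our own (not an example from the paper), a type-safe container in Java might look as follows; the bound on the type parameter is also an instance of the explicit constraints discussed next.

// Illustrative sketch: a type-safe container whose element type is a
// parameter. The bound "T extends Comparable<T>" is an explicit constraint
// on the values the parameter may take.
import java.util.ArrayList;
import java.util.List;

class SortedBag<T extends Comparable<T>> {
    private final List<T> items = new ArrayList<T>();

    void add(T item) {
        int i = 0;
        // insert while keeping elements in non-decreasing order
        while (i < items.size() && items.get(i).compareTo(item) < 0) {
            i++;
        }
        items.add(i, item);
    }

    T smallest() {
        return items.get(0);  // assumes at least one element has been added
    }
}

A SortedBag<Integer> then accepts only Integers, and an attempt to add a String is rejected at compile time.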
There are two main approaches for handling the validity of parameter values. In some languages such as Java, explicit constraints on the values of parameters [8] restrict their uses in the implementation. In other languages such as C++, the uses of the parameters in the implementation restrict the values they can be bound to [9]. Explicit constraints ease the writing and debugging of applications, as invalid uses of generic concepts can be detected using their public interface only. Describing the minimal constraints on parameters can however prove to be a very complex task. The upcoming C++0x [10] takes a mixed approach: constraints are expressed as use patterns of the parameters (this is called "concepts" in the C++0x terminology), but this does not prevent uses of parameters in the implementation that were not covered by a "concept".

In some languages such as C++, explicit specializations can be provided for specific values of the parameter. This makes it possible to provide optimized implementations for these cases. This also makes the language Turing-complete and enables template meta-programming [11].

As far as we know, there is no component model with support for genericity (except for HOCs, further discussed in Sec. 3.3). The closest features found in most models are configuration properties, which are values that can be set to configure the behaviour of components. In some models such as CCM these configuration properties can be modified at run-time. In other models such as SCA, they can only be set in the assembly, making them more similar to generic parameters. Unlike generic parameters, however, properties are only used to carry data-values, not types.
3.2 Algorithmic Skeletons
As seen in the previous section, the implementation of algorithmic skeletons is an example where genericity brings great advantages. Models bringing together components and skeletons have already been described, for example in [12]. These models are very similar to a component model supporting genericity from the point of view of a user of skeletons: skeletons are instantiated and the implementations of the components they contain are passed as parameters. In these models, however, skeletons are supported by keywords of the assembly language and their implementation is generated by a dedicated compiler. From the point of view of the developer of skeletons this means that supporting new skeletons or new implementations of existing skeletons requires modifications of this compiler, which can be difficult and strongly limits reusability.
3.3 Higher Order Components
Higher Order Components (HOCs) [13] is a project based on the Globus grid middleware. With HOCs, a Globus service implementation S can accept string parameters identifying other service implementations. At run-time, S can create instances of these services and use them, thus addressing the issue of reusable assembly structure. However, type consistency cannot be statically checked, as the instantiation and use of services are deeply hidden in S's implementation. Another
limitation is that only service types can be passed as parameters; data-types cannot. For the task farm implementation, this leads to a distinct implementation for each data-type processed in the stream.
4 An Approach to Introduce Genericity in Component Models

4.1 Overview
Introducing genericity in a component model means making some of its concepts generic. A generic concept accepts parameters and defines a family of concepts: its specializations. Each combination of parameter values of the generic concept defines one specialization. Supporting generic concepts means that when one is used, the values of its parameters must be retrieved and the correct specialization must be used. This can either be done at run-time (as has been done for C#, for example) or through a compilation phase (as has been done for C++, for example). The compilation approach has the advantage of requiring no modification of the run-time. It can also lead to a more efficient result since the computation of the specializations to use has already been done. On the other hand, this approach makes it impossible to dynamically instantiate specializations that were not statically used in the initial assembly.

This paper studies the compilation approach: it describes a transformation that takes as input a set of generic components and that generates its non-generic equivalent. The transformation is based on Model Driven Engineering (MDE). It manipulates two distinct component meta-models: B, a basic (i.e. non-generic) component meta-model, and G(B), the corresponding generic component meta-model. The proposed algorithm to transform instances of G(B) into semantically equivalent instances of B is presented in Section 4.3. The next section describes a pattern to derive a meta-model G(B) from a basic component model whose meta-model is B.
4.2 Genericity Pattern
As a first step, the concepts of B that will be given as parameters in G(B) and those that will accept parameters (be generic) must be chosen. An example of application of the pattern described here is shown in Fig. 2. As G(B) is an extension of B, all the elements of B belong to G(B); this section describes the additions made to the meta-model to support genericity.

For each concept that can be given as a parameter (e.g. PortType in Fig. 2), a meta-class with a "name" attribute is created to model such parameters (e.g. PortTypeParameter in Fig. 2). For each concept that is turned generic (e.g. ComponentType in Fig. 2), attributes (lines in the figure) are added to its meta-class to model its parameters. For each concept that can be given as a parameter, an additional meta-class that references a parameter and inherits from the initial concept is created
(e.g. GenericPortType in Fig. 2). This meta-class can now be used wherever the concept given as parameter is used (e.g. Port references a PortType in Fig. 2). For each concept that can be given as a parameter, an argument meta-class is created to reference both the parameter and its value (e.g. PortTypeArgument in Fig. 2). Each meta-class that references the concept made generic (Port in Fig. 2) has an argument attribute added.

These are the minimal additions to G(B) required to support genericity. Other additions not shown in Fig. 2 can however be interesting. To support default values for parameters, an attribute referencing the value must be added to the parameter meta-classes. For example, the PortTypeParameter will get a PortType attribute.

To support constraints on parameter values, two approaches can be used: either adding a constraint attribute to the parameter meta-classes or adding it directly to the generic meta-classes. As applying constraints to the parameter meta-classes prevents the expression of constraints that depend on more than one parameter, the second approach is chosen. A root meta-class for constraints must be added. The kinds of constraints that can be expressed depend on the kind of parameter. For example, for a data constant parameter, a range of values can be an interesting constraint, while for an object interface, the interfaces it extends can be constrained. For each kind of constraint, a meta-class that inherits from the root meta-class must be added. In addition, meta-classes modeling the various logical combinations of other constraints must be added.

To support explicit specializations, a specialization meta-class must be added for each generic concept. This meta-class has a constraint attribute that specifies in which cases it must be used. It also has a copy of all attributes modeling the implementation of the generic concept. The meta-class modeling the generic concept, on the other hand, must have a specialization attribute added that models its explicit specializations. For example, the generic ComponentType meta-class will contain a ComponentTypeSpecialization attribute. This meta-class will contain a Constraint attribute as well as the content of the ComponentType: ports, implementation, etc.
4.3 Transformation from G(B) to B
This section describes an algorithm that transforms an application described in a generic component model into its equivalent in the basic component model. The transformation algorithm takes an instance i of G(B) (a set of meta-objects of G(B)) as input and computes a semantically equivalent instance of B.
Fig. 2. Example of modifications to make ComponentType generic and to let PortType be given as parameter
The algorithm relies on a recursive function that takes a meta-object o of G(B) and a context c (bindings between generic parameters and their values) as input and returns a meta-object of B semantically equivalent to o. The main function of the algorithm iterates through all components of i. If a component can be instantiated in an empty context (at the root of an application), the recursive function is used to generate its equivalent in B. This equivalent is added to the output meta-model instance.

The recursive function generates the equivalent of o using one of the four following behaviors, depending on the kind of concept the meta-class of o models. If the modeled concept is a generic concept (such as ComponentType in Fig. 2), c is filled with the default values for parameters that have not been previously bound to a value. Then, the constraint on parameter values is checked: if it is not fulfilled, the transformation is aborted with an error. The constraints of each explicit specialization are checked. If one is fulfilled, the result of the function applied to each meta-object contained by this specialization is added to the result. Otherwise, the function is applied to the default content. If the modeled concept references a parameter (such as GenericPortType in Fig. 2), the value of this parameter is looked up in c and the application of the function to this value is returned. If there is no binding for this parameter in c, the transformation is aborted with an error. If the modeled concept references a generic concept (such as ComponentInstance in Fig. 2), a new context is created and filled with the arguments contained by o. Then, the result of the function applied to each meta-object contained by o with this new context is added to the result. If the modeled concept does not belong to any of the previous categories, the result of the function applied to each meta-object contained by o is added to the result.

An instance of G(B) is valid if it conforms to the meta-model and leads to a valid instance of B when the algorithm is applied. An example of an instance that conforms to the meta-model but is invalid is a composite containing (possibly transitively) an instance of itself, as it may lead to infinite recursion. The recursion can be broken if the composite accepts parameters that are used in the constraints of an explicit specialization. As genericity with recursion and selection of explicit specializations is very likely to be Turing-complete, the termination problem is expected to be undecidable. C++ compilers, facing the same problem, fix a limit on the recursion depth after which an error is emitted.
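The sketch below, ours and written against a deliberately simplified stand-in for the meta-model, illustrates the case analysis of this recursive function. The real transformation operates on ecore meta-objects and also handles default parameter values, constraints and explicit specializations, all of which are omitted here.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified stand-in for a G(B) meta-object.
class MetaObject {
    enum Kind { GENERIC, PARAMETER_REF, GENERIC_REF, PLAIN }

    final Kind kind;
    final String name;                       // concept name, or referenced parameter name
    final Map<String, MetaObject> arguments; // used only when kind == GENERIC_REF
    final List<MetaObject> children;

    MetaObject(Kind kind, String name,
               Map<String, MetaObject> arguments, List<MetaObject> children) {
        this.kind = kind;
        this.name = name;
        this.arguments = arguments;
        this.children = children;
    }

    // Returns a semantically equivalent, non-generic meta-object, given a
    // context binding parameter names to their values.
    MetaObject resolve(Map<String, MetaObject> context) {
        switch (kind) {
            case PARAMETER_REF: {
                MetaObject value = context.get(name);
                if (value == null) {
                    throw new IllegalStateException("unbound parameter: " + name);
                }
                return value.resolve(context);
            }
            case GENERIC_REF: {
                // a use of a generic concept: open a fresh context from its arguments
                Map<String, MetaObject> inner = new HashMap<String, MetaObject>(arguments);
                return new MetaObject(Kind.PLAIN, name, null, resolveChildren(inner));
            }
            case GENERIC:   // default values, constraints and specializations omitted
            case PLAIN:
            default:
                return new MetaObject(Kind.PLAIN, name, null, resolveChildren(context));
        }
    }

    private List<MetaObject> resolveChildren(Map<String, MetaObject> context) {
        List<MetaObject> out = new ArrayList<MetaObject>();
        for (MetaObject child : children) {
            out.add(child.resolve(context));
        }
        return out;
    }
}

The main loop of the compiler would then call resolve with an empty context on every component that can be instantiated at the root of the application.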
5 Case Study: Turning SCA Generic

5.1 Overview of SCA
SCA [3] (Service Component Architecture) is a component model specification. It aims at easing the development of service-oriented applications by making it possible to describe them as component assemblies. It defines two types of ports: services and references, both typed by an interface. Interfaces can be extracted
from various descriptors such as a Java interface, a WSDL interface, etc. SCA also supports configuration properties as part of the external interface of SCA components. Components can have two kinds of implementations: composite implementations provided by an assembly, or native implementations (such as Java or C++ classes).
5.2 A Meta-Model for Generic SCA
The pattern described in Section 4.2 has been applied to the SCA meta-model in order to create a meta-model for generic SCA. The SCA meta-model described as part of the "eclipse SCA Tools project¹" has been used for this purpose. The concepts made generic are composites and native components; generic Java classes are supported. Parameters can be implementations, interfaces, data-types and data-values for composites, data-types and data-values for native components, and Java types for Java classes. This required the addition of eight additional meta-classes: GenericImplementation, ImplementationParameter, ImplementationArgument, GenericInterface, InterfaceParameter, InterfaceArgument, JavaTypeParameter and JavaTypeArgument. All the parameter meta-classes support default parameter values. No GenericJavaType meta-class has been created as Java types are simply identified by a string containing their name. No modifications have been made to support data-value parameters as SCA already has the concept of configuration properties.

Support for constraints on parameter values of composites and native components has been added with a root meta-class for constraints: Constraint. As configuration properties are referenced by XPath expressions in SCA XML documents, a constraint meta-class that supports boolean XPath expressions has been added: XpathConstraint. The constraints supported on other kinds of parameters are currently limited to exact equality constraints, supported by the meta-classes ImplementationEqConstraint, InterfaceEqConstraint and JavaTypeEqConstraint. Three constraints that support logical combinations of other constraints have also been added: ConjonctionConstraint, DisjunctionConstraint and NegationConstraint.

Support for explicit specialization of composites has been added with the addition of a CompositeSpecialization meta-class, which duplicates the content of the Composite meta-class.
5.3 Implementation
A prototype implementation of a generic SCA to plain SCA transformation engine has been developed. It implements the algorithm described in Section 4.3. As a special case, support for generic Java classes simply consists in checking Java type parameters for compatibility and erasing them. This is due to the fact that Java handles generics by type erasure: type parameters are used at compile time for checking validity and then removed from the generated class file.
¹ http://www.eclipse.org/stp/sca/
The meta-models of SCA and generic SCA are written in the ecore modeling language. A first implementation attempt was made with a Domain Specific Language (DSL) for model transformations: QVT. The support of this language for the transformation of ecore meta-models is however not yet satisfactory, and the algorithm has finally been coded in plain Java.

Java classes corresponding to the ecore meta-classes of generic SCA have been automatically generated. Those provided as part of the eclipse SCA Tools project and corresponding to the meta-classes of plain SCA have been used. The code used to instantiate these classes by parsing generic SCA XML files and to dump them in plain SCA XML files is also automatically generated thanks to annotations in the meta-model. More than 50,000 lines of Java code have been generated; the same amount from the eclipse SCA Tools project is reused. The implementation of the transformation algorithm requires around 750 lines of Java. Most of them simply copy attributes from classes modeling concepts of generic SCA to the attribute with the same name in classes modeling plain SCA (the last case of the algorithm). These could also have been automatically generated if QVT had been used. The real logic of the algorithm only requires around 100 lines of Java; this is however only an estimate as it is mixed with the attribute-copying part.
6 Generic Task Farm Component in Generic SCA
This section examines the definition and implementation of a generic task farm in generic SCA. It aims at showing the feasibility of the approach.
6.1 Generic Farm Component
The Farm composite implements the task farm and accepts six parameters. Two Java type parameters, I and O, define the type of the input and output of the farm respectively. There are three implementation parameters, D, W and C, that define the types of the dispatcher, workers and collector respectively; and an integer parameter N that defines the number of workers. Its implementation is shown in Fig. 3. It simply instantiates the D and C components and relies on the Replication composite, further described in the next subsection, to instantiate multiple instances of W. These instances are connected by data streams simulated with a generic Java interface DataPush<T> with a single asynchronous method void push(T data). This interface is used with I as argument before the workers and with O after.
Fig. 3. The Farm composite
The D and C parameters have default values provided: RRDispatcher and SimpleCollector, which dispatch the data using a round-robin algorithm and collect them with no reordering. These are generic Java implementations that do not depend on the data type manipulated.
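As a rough illustration (our sketch, under assumed signatures; the actual components are SCA component implementations whose wiring differs from this plain-Java form), the stream interface and a round-robin dispatcher in the spirit of RRDispatcher might look as follows.

import java.util.List;

// DataPush<T> and the method push(T) are named in the text above; everything
// else here (the constructor, the worker list) is an assumed simplification.
interface DataPush<T> {
    void push(T data);
}

class RRDispatcher<T> implements DataPush<T> {
    private final List<DataPush<T>> workers;  // one reference per worker instance
    private int next = 0;

    RRDispatcher(List<DataPush<T>> workers) {
        this.workers = workers;
    }

    public void push(T data) {
        workers.get(next).push(data);        // forward to the currently selected worker
        next = (next + 1) % workers.size();  // round-robin selection
    }
}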
6.2 Generic Replication Component
The Replication composite implements the replication of a given component and accepts four parameters. Two Java type parameters (I and O) define the type of its input and output. An implementation parameter (C) defines the type of the replicated component. An integer parameter (R) defines the number of replications. Its implementation, shown in Fig. 4, relies on meta-programming and recursion. It contains one instance of C called "additional" and one instance of Replication with the value of R decreased by one. The base case of the recursion is provided by an explicit specialization used when the value of R reaches one.

Fig. 4. The Replication composite

In the non-specialized implementation of the composite, the "in" service promotes two services. This is not allowed by the SCA specification. As a workaround, a concept of multiple service has been added to generic SCA that can only be connected to a reference with multiplicity "0..n" or "1..n". At the transformation phase, instances of multiple services are replaced by multiple instances of classical services.
6.3 Evaluation
As seen in this example, the behaviour of generic assemblies relying on meta-programming can become rather hard to analyse. It remains the duty of the component developer to describe the behaviour of such composites so that they can be used as black boxes, exactly as in the case of primitive components.

This implementation of the task farm has been used to compute pictures of the Mandelbrot set. Two kinds of workers have been written and used with the generic task farm: one that computes the value of a single pixel at a time and another that computes whole tiles. Each version has been used in the farm with one, two and four workers. The transformation phase takes between one and two seconds, and most of this time is spent parsing the input files. The resulting components have been successfully run using tuscany-java-1.4 on multicore hosts.

The meta-model for generic SCA, the compiler and the source code for these components can be found at http://graal.ens-lyon.fr/~jbigot/genericSCA.
7 Conclusion
This paper has studied the feasibility of increasing reusability in component models thanks to genericity. To make use of existing models, a generic meta-model was derived from an existing one, and an algorithm was provided to transform generic component applications into non-generic ones. This has been applied to SCA and validated with an image rendering application based on a generic task farm component.

Future work includes the application of this approach to models other than SCA, the comparison of the implementation of skeletons using genericity with classical skeletons, and the support for dynamic instantiation of generic components. Another interesting problem is the automatic choice of values for some parameters (for example those related to available execution resources), including the case of sets of parameters interacting with each other.
References

1. Bruneton, E., Coupaye, T., Stefani, J.B.: The Fractal Component Model, version 2.0.3 draft. The ObjectWeb Consortium (February 2004)
2. Object Management Group: Common Object Request Broker Architecture Specification, Version 3.1, Part 3: CORBA Component Model (January 2008)
3. Open Service Oriented Architecture: SCA Service Component Architecture: Assembly Model Specification Version 1.00 (March 2007)
4. Allan, B.A., et al.: A Component Architecture for High-Performance Scientific Computing. International Journal of High Performance Computing Applications 20(2), 163–202 (2006)
5. Cole, M.: Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming. Parallel Comput. 30(3), 389–406 (2004)
6. Musser, D.R., Stepanov, A.A.: Generic Programming. In: Gianni, P. (ed.) ISSAC 1988. LNCS, vol. 358, pp. 13–25. Springer, Heidelberg (1989)
7. Garcia, R., Jarvi, J., Lumsdaine, A., Siek, J.G., Willcock, J.: A comparative study of language support for generic programming. In: OOPSLA, pp. 115–134. ACM, New York (2003)
8. Bracha, G.: Generics in the Java Programming Language (July 2004)
9. Stroustrup, B.: The C++ Programming Language, 3rd edn. Addison-Wesley Longman Publishing Co., Boston (2000)
10. Gregor, D., Järvi, J., Siek, J.G., Stroustrup, B., Reis, G.D., Lumsdaine, A.: Concepts: linguistic support for generic programming in C++. In: OOPSLA, pp. 291–310 (2006)
11. Abrahams, D., Gurtovoy, A.: C++ Template Metaprogramming: Concepts, Tools, and Techniques from Boost and Beyond. C++ in Depth Series. Addison-Wesley Professional, Reading (2004)
12. Aldinucci, M., Bouziane, H., Danelutto, M., Pérez, C.: Towards Software Component Assembly Language Enhanced with Workflows and Skeletons. In: Joint Workshop on Component-Based High Performance Computing and Component-Based Software Engineering and Software Architecture, CBHPC/CompFrame 2008 (October 2008)
13. Gorlatch, S., Dünnweber, J.: From Grid Middleware to Grid Applications: Bridging the Gap with HOCs. In: Future Generation Grids. Springer, Heidelberg (2005)
Verifying Component-Based Software: Deep Mathematics or Simple Bookkeeping?

Jason Kirschenbaum¹, Bruce Adcock¹, Derek Bronish¹, Hampton Smith², Heather Harton², Murali Sitaraman², and Bruce W. Weide¹

¹ The Ohio State University, Columbus, OH 43210, USA
{kirschen,adcockb,bronish,weide}@cse.ohio-state.edu
http://www.cse.ohio-state.edu/rsrg
² Clemson University, Clemson, SC, USA
{hamptos,hkeown,msitara}@clemson.edu
http://www.cs.clemson.edu/~resolve/
Abstract. Anecdotal experience constructing proofs of correctness of code built from reusable software components reveals that they tend to be relatively trivial bookkeeping exercises: they rarely require a substantive mathematical deduction. A careful empirical analysis of hundreds of verification conditions (VCs) for a library of component-client code shows the level of sophistication each proof requires, and suggests how to use the results to characterize a notion of mathematical “obviousness.”
1 Introduction
Perhaps the most powerful tool of modern programming languages—and a feature that makes new languages enticing to prospective developers—is a rich library of well-documented software components. Not only is the code in these catalogs (or "APIs" or "libraries") reusable, but code that makes use of them tends to be reusable as well, because it builds on a cleanly-modularized foundation: programmers use components to build more specialized and sophisticated components. Thus, when considering the task of verifying the correctness of software, one logical place to begin is with verification of code that is written from the perspective of clients using these "catalog components." Sitaraman et al. [1] have explained how to reason about and formally verify the execution-time behavior of reusable software components. The use of programming-by-contract with abstract mathematical models, along with corresponding sound proof systems, allows software developers to reason about the correctness of newly-created code that uses components. One limitation of this process, however, is its reliance on people to check the verification conditions (VCs) that result from the syntax-driven proof system. Given a human being's propensity for error, one natural solution is to automate the reasoning process; VCs can be discharged automatically in a "push-button" manner. The verifying compiler grand challenge [2,3] is to build a compiler that both generates executable code and performs the reasoning process described above.
This tool would give developers more confidence in the correctness of implemented software components, providing the ability to reuse software using only the contracts (or "interfaces") of the components. One stumbling block in the path of meeting this challenge, of course, is the difficulty of fully automating mathematical theorem proving. In the general case, the problem of automatically proving or disproving a VC is undecidable [4]. We proceed on the notion that, because software engineers write code that is motivated by practical mathematical insight, real software will not normally result in VCs that are inherently undecidable. Indeed, our thesis is that most VCs generated from most programs are "obvious." A precise definition of obvious is a long-term research question that is important for software verification; we would like to be able to characterize what makes a program automatically provable. Automated provers will always require sufficient justification, as part of the program itself, to make correctness manifest.

In this paper we examine VCs generated from code that builds on reusable software components, and classify them based on the reasoning methods needed for their proofs—a first step towards an eventual rigorous characterization of obviousness. This work is, to our knowledge, the first that classifies VCs in accordance with these objectives and makes a case for the simplicity—as opposed to the oft-assumed difficulty—of automatically verifying "everyday" programs. Based on our earlier work [5], we generate VCs from a battery of programs that make use of a basic component library. Of course, the results of this analysis are arguably dependent on the particular proof system and programming language used. To a lesser extent, the analysis may also be dependent on the particular components used and implementations verified, but we argue that the conclusions generalize along these dimensions—a testable thesis offered as a challenge to the community.

Section 2 describes RESOLVE, the specification and programming language used in this work. In Section 3 we elaborate our taxonomy of VC classification. Section 4 presents our analysis of one example for illustration, and more general results gathered across a large component catalog. Section 5 discusses related work and Section 6 contains concluding remarks and directions for future work.
2 RESOLVE: VC Generation and an Example
The language used in this work is RESOLVE [6], an imperative language designed with fostering reusable, component-based software as the foremost objective. RESOLVE provides a catalog of abstract data type interfaces (e.g., Queue, Set, Map) on which client code can rely. The client code is written in a programming-by-contract discipline with model-based mathematical specifications, and thus adheres to a strict encapsulation boundary between abstract interface contracts and concrete implementation details: only the contracts, which formally define a component's behavior, are necessary to write and reason about client code. More specifically, RESOLVE enforces so-called "two-level thinking": reasoning about the correctness of a component implementation requires only the specification of that component along with the specifications of any components used
in the implementation. This clear delineation of roles and emphasis on mathematical formalism facilitates modular verification of RESOLVE programs via a two-stage process of (1) VC generation and (2) VC proof. The purpose of the proof system is to transform, syntactically, a RESOLVE program into a set of logical formulas whose validity is equivalent to the correctness of the program.

One other major difference between RESOLVE and other languages is the use of swap, denoted by :=:, as the primary data movement operator. Swapping serves a role similar to assignment in other languages, and has been argued to be a better choice for data movement, particularly for the purposes of reasoning about code [7]. The statement x :=: y exchanges the values of x and y. A final key attribute of RESOLVE is that the language includes syntactic slots for mathematical code annotations, which allow for a sound and relatively complete proof system [8,9]. These annotations include loop invariants, termination metrics for loops and recursive operations, and indeed arbitrary mathematical assertions, which must themselves be proved, that may "assist" the automated reasoning process in discovering mathematical insights underlying the code. Our hope is to minimize the number of extra programmer-supplied assertions necessary to verify code, and in fact none of the code verified in the work for this paper required any extra assertions beyond standard loop invariants and progress metrics (which are always required).

We generate VCs automatically in accordance with the RESOLVE program proof system defined formally in [9] and informally in [1]. As outlined in [1], this process can best be characterized as filling out a symbolic tracing table. We generate one VC for each line in the program where the next statement requires a pre-condition to be satisfied or a loop invariant/progress metric property to be upheld. We also generate a VC which states that, at the end of the code being verified, the implementation has met its obligation as stated in its contract. Note the modularity of this approach: requirements on legally calling operations are proven to be met on the client's side, whereas the correct behavior of program units assuming satisfied pre-conditions is verified once-and-for-all in isolation.

The VC generator works in two phases. The by-rote VC generation of phase one introduces a large number of mathematical variables: one for each program variable in each program state (roughly speaking, we consider the execution of each line of code in the program to define a new "program state" or "program point"). For instance, x_i stands for the value of program variable x in state i. In a program with n variables and m statements, the VCs contain a total of nm distinct mathematical variables at the end of phase one, and of course each variable may actually occur multiple times.

The simplification phase of the VC generator then applies a few theory-independent logical restructuring rules to each VC. The most obvious and useful simplification is basic substitution. A majority of the nm-many variables are removed by this step. For instance, the hypotheses in the original VCs often contain clauses which indicate that the value of some variable was preserved from one program state to the next, e.g., x_10 = x_9. Upon encountering this situation we replace x_10 by x_9 throughout the VC and remove this clause altogether.
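As a small illustration of this substitution step (our example, not one drawn from the component catalog), a raw VC of the form

    x_10 = x_9 ∧ |s_9| > 0 ∧ s_10 = s_9
    ⇒ |s_10| > 0

would be reduced by substituting x_9 for x_10 and s_9 for s_10 and dropping the equality clauses, leaving simply |s_9| > 0 ⇒ |s_9| > 0.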
Any effective back-end automated prover would certainly be able to apply such substitutions to achieve the same effect, but the VC generator does this before handing the VCs to an automated prover so the human-readable output of the VC generator is more concise. The second phase also makes various structural changes to VCs in an effort to render them more tractable for processing by an automated theorem prover. An example is the explicit introduction of case analysis. In our experience, useful case analysis can rarely be performed spontaneously by a back-end automated prover. Requiring human advice about appropriate cases seems to be the norm, and certainly is necessary in general. However, the kinds of case splits required for discharging VCs, rather than arbitrary mathematical sentences, are relatively simple and can usually be deduced from the structure of the code. The VC generator therefore makes case splits explicit. The simplification phase divides each VC into appropriate cases based on the control flow of the code, thus obviating the need for any special insight or assistance on the prover’s behalf. The net result of the two-phase VC generation process is that, while there are normally many VCs even for a relatively short piece of code, based on anecdotal experience, each VC tends to be relatively small and “prover-friendly.” To aid our explanation of the VC classification in Section 3, we present an example software component contract along with a component extension, and code that purports to implement that extension. The software component is a stack and the extension defines an additional operation that reverses a stack. The contract is given in Fig. 1(b), and describes the behavior of the stack in terms of a mathematical model, namely a mathematical string. Each operation contract, for example Push, describes the effects of the operation on the model via requires and ensures clauses (the pre- and post-conditions of the operation, respectively). The symbol # in the ensures clause denotes the incoming value of a parameter. The contract for the Reverse operation is depicted in Fig. 1(c). This contract uses a mathematical function reverse in the ensures clause; its meaning should be unsurprising. Finally, realization code that is purported to implement the Reverse operation is given in Fig. 1(a). Note the use of the swap operator :=: in this code. Also, this code illustrates the requisite code annotation in the form of a loop invariant (the maintains clause), and a loop progress metric (the decreases clause).
3 Classification of VC Proof Techniques
We use the implementation of a stack reverse extension shown in Fig. 1(a) to illustrate the VC classification process. Figures 3 through 6 show example VCs generated by the method discussed in Section 2. Note that a VC always takes the form of a logical implication whose antecedent is a conjunction of zero or more clauses. We refer to clauses of the antecedent as “hypotheses” and the consequent as the “goal”. We represent our VCs by showing all of the antecedent’s clauses conjoined above the solid line, with the consequent written below the line.
realization Iterative implements Reverse for StackTemplate
    procedure Reverse (updates s: Stack)
        variable tmp: Stack
        loop
            maintains reverse (s) * tmp = reverse (#s) * #tmp
            decreases |s|
        while not IsEmpty (s) do
            variable x: Item
            Pop (s, x)
            Push (tmp, x)
        end loop
        s :=: tmp
    end Reverse
end Iterative

(a) Stack Reverse Implementation

contract StackTemplate (type Item)
    math subtype STACK_MODEL is string of Item
    type Stack is modeled by STACK_MODEL
        exemplar s
        initialization ensures s = empty_string
    procedure Push (updates s: Stack, clears x: Item)
        ensures s = <#x> * #s
    procedure Pop (updates s: Stack, replaces x: Item)
        requires s /= empty_string
        ensures #s = <x> * s
    function IsEmpty (restores s: Stack) : control
        ensures IsEmpty = (s = empty_string)
end StackTemplate

(b) Stack Contract

contract Reverse enhances StackTemplate
    procedure Reverse (updates s: Stack)
        ensures s = reverse (#s)
end Reverse

(c) Stack Reverse Contract

Fig. 1. Stack Example
Our classification of the difficulty of the proof requirements of VCs is presented in Fig. 2(a). Note that the categories (i.e., sets of VCs) in Fig. 2(a) have a natural structure. The L category is a subset of every other category. Each LHi is a subset of LHi+1 , each MHi is also a subset of MHi+1 , and each DHi is a subset of each DHi+1 . Finally, each LHi is a subset of MHi ; also, each MHi is a subset of DHi . This structure forms a lattice shown in Fig. 2(b). Our methodology for analyzing the RESOLVE component catalog is to instrument an in-house VC prover, SplitDecision, to analyze each proven VC to
Label   What is needed in the proof
L       Rules of mathematical logic
Hn      At most n hypotheses from the VC needed (n > 0)
M       Knowledge of mathematical theories used in the specifications
D       Knowledge of programmer-supplied definitions based on mathematical theories above

(a) VC Classification
DH2 MH2 LH2
DH1 MH1 LH1
D M L
(b) Lattice of the VC categorization
Fig. 2. Table of VC Categorization and Diagram of Category Relationships
determine the "lowest" category the VC is in. We do this by checking if the goal is a tautology, checking to see how many hypotheses are needed in the proof, etc. The automatic nature of this can only guarantee we find an upper bound for any VC. For example, a particular VC categorized as MH1 might in fact be M or LH1, but certainly not MH2 or DH1. For VCs that are true but are not proven automatically (12.5% of VCs in the study), we manually examine those VCs to determine the category.

Figure 3 shows a VC that falls into the L category. Its proof relies only on an axiom of first order logic with equality: ∀x. x = x. We note that VCs of type L are the "most obvious" kind: no knowledge of the underlying mathematical theory nor even any of the hypotheses of the VC are needed to establish the goal. Any rigorous, technical definition of "obvious" should include all VCs of this type.

    ⇒ reverse(s_0) * Λ = reverse(s_0) * Λ

Fig. 3. Example from category L

A VC that falls into an LHn category is given in Fig. 4. More specifically, we label this VC as LH1 because it is provable by noting simply that the goal is one of the hypotheses. LH1 VCs clearly should be considered "obvious," whereas other LHn VCs may or may not be obvious, depending on the amount of logical deduction necessary to actually establish the implication.

      s_2 = Λ
    ∧ reverse(s_2) * tmp_2 = reverse(s_0) * Λ
    ∧ |s_2| ≥ 0
    ∧ is_initial(x_3)
    ⇒ s_2 = Λ

Fig. 4. Example from category LH1

      |s_2| ≥ 0
    ⇒ s_0 ≠ Λ → |s_0| > 0

Fig. 5. Example from category M

A VC in category M is given in Fig. 5. This VC depends only on the validity of the goal in the underlying mathematical theory. In this case, the goal is a direct consequence of
a useful theorem in the mathematical theory of strings; it can be derived using the proof rules of first order logic, plus the theory of integers to account for 0 and >. If the theorem is known and available, then this VC is surely obvious. If not, it might seem obvious to humans on semantic grounds, but a symbolic proof is several steps long.

Finally, in Fig. 6 we include a VC from an implementation of the well-known Egyptian multiplication algorithm to illustrate the D category. Its proof requires three hypotheses. Without knowledge of the programmer-supplied definition of IS_ODD—either an expansion of the definition or the algebraic properties thereof—the VC cannot be proven.

      ...[9 irrelevant hypotheses]
    ∧ n_0 * m_0 = n_3 * m_3 + p_3
    ∧ m_3 = m_8 + m_8 ∨ m_3 = m_8 + m_8 + 1
    ∧ IS_ODD(m_3)
    ⇒ n_0 * m_0 = (n_3 + n_3) * m_8 + p_3 + n_3

Fig. 6. Example from category DH3

The distinction between the D and M categories is based on the current state of the components. As more theories are developed, it is possible that programmer-supplied definitions are moved into the mathematical theories because of their general utility. However, it is also clear that there will always be examples of mathematical definitions that are programmer-supplied, for example the format of a particular type of input file.
4 Analysis of Verification Conditions

The VCs analyzed for the paper are generated from roughly fifty components that comprise a total of roughly two thousand lines of source code. The code ranges from simple implementations of arithmetic using unbounded integers, to sorting arbitrary items with arbitrary orderings, and includes classical code such as binary search using bounded integers. Figure 7 shows classification data for all VCs generated from a sample catalog of RESOLVE component client code that relies on existing, formally-specified components to implement extensions,
LH2 2 0.2%
DH2 5 0.6% MH2 84 9.7%
LH1 78 9.0%
MH1 222 25.7%
DH1 2 0.2%
D 2 0.2%
M L 183 21.2%
207 23.9%
Fig. 7. VCs as categorized
which add additional functionality (e.g., the aforementioned Stack Reverse). Here, the area of each bubble in the lattice is proportional to the fraction of the 865 VCs that fall into that category. The number and percentage of all VCs falling into each category is shown in or near its bubble. Several interesting features are exhibited in Fig. 7. First, over 30% of the VCs can be proved without the use of any mathematics or more than one hypothesis. Moreover, over 75% of the VCs can be proved using no more than general mathematical knowledge along with at most one hypothesis. However, programmer-supplied definitions do tend to result in more sophisticated proofs; more hypotheses tend to be needed than in the general case. It is rare for a VC to require only a programmer-supplied definition. The programmersupplied definitions tend to encapsulate complexity; this complexity is apparent from the larger number of hypotheses needed to prove such VCs. While the number of hypotheses needed to prove VCs is interesting, the metric may mask some other properties of the VCs. For example, a VC whose proof requires a larger percentage of the hypotheses might be considered “less obvious” than a VC whose proof requires a small fraction of the hypotheses. Based on the results, this is fairly rare; the majority of VCs (over 90%) can be proved using only 30% of the hypotheses.
5 Related Work
We are unaware of any prior work that empirically examines the structure or proof difficulty of VCs. Work in this area has focused instead on the initial subproblems of generating VCs and creating tools to prove them. For example, the problem of generating VCs has been tackled using the Why methodology [10], which involves a simplified programming language, annotated with logical definitions, axioms, pre-conditions, post-conditions and loop invariants, from which VCs can be generated. A subset of both C (with annotations) and Java (with JML specifications) can be translated into the simplified programming language, such that the VCs generated are claimed to represent the correctness of the original C or Java code. The translation process from C or Java must explicitly capture the memory model of the original source language (C or Java). As a result of using RESOLVE, we do not need an explicit memory model, dramatically simplifying the generated VCs. Also addressing the problem of generating VCs, the tool Boogie [11] takes as input annotated compiled Spec# [12] code and generates VCs in the BoogiePL language. This language has support for mathematical assumptions, assertions, axioms and function definitions along with a restricted set of programming language constructs. The BoogiePL representation is used by Boogie to generate first order mathematical assertions. The method [13] used for generating VCs is similar to the method presented in Section 2. Addressing both the problem of generating VCs from program source and of creating tools that can prove the generated VCs automatically, Zee et al. [14] have used a hybrid approach of applying both specialized decision procedures
and a general proof assistant to prove that code purporting to implement certain data structure specifications is correct. However, the use of Java as a starting language requires that the specifications use reference equality for comparison. Our approach proves properties that depend on the values of the objects instead. To repeat, none of the above papers nor any others we know about present empirical data about the sources of proof difficulty across even a small selection of VCs, let alone hundreds as reported here.
6 Conclusion and Future Work
This paper has presented the first, to our knowledge, examination of the structure of VCs generated for client code built on reusable software components. The statistics support our hypothesis that VC proofs are mostly simple bookkeeping. Only a few VCs are structurally complicated. Moreover, the vast majority of VCs can be proved using at most three hypotheses.

We have done some work using formal proof rules underlying a goal-directed approach, discussed in [8]. Preliminary results suggest that the goal-directed proof system generates fewer VCs, and that they are comparably simple, according to the taxonomy and metrics, to those produced by the tabular method of VC generation. Further investigation of this phenomenon is in order.

Our goal is to extend this work toward a more rigorous definition of "obvious" with a finer-grained evaluation of the current VCs that is more directly connected to a particular type of prover. Performing a more rigorous comparison of different VC generation strategies, along with a comparison of VCs generated using different programming languages, should provide valuable additional information about the sources of difficulty of VC proofs.
Acknowledgments

The authors are grateful for the constructive feedback from Paolo Bucci, Harvey M. Friedman, Wayne Heym, Bill Ogden, Sean Wedig, Tim Sprague and Aditi Tagore. This work was supported in part by the National Science Foundation under grants DMS-0701187, DMS-0701260, CCF-0811737, and CCF-0811748.
References

1. Sitaraman, M., Atkinson, S., Kulczycki, G., Weide, B.W., Long, T.J., Bucci, P., Heym, W.D., Pike, S.M., Hollingsworth, J.E.: Reasoning about software-component behavior. In: Frakes, W.B. (ed.) ICSR 2000. LNCS, vol. 1844, pp. 266–283. Springer, Heidelberg (2000)
2. Hoare, C.A.R.: The verifying compiler: A grand challenge for computing research. J. ACM 50(1), 63–69 (2003)
3. Woodcock, J., Banach, R.: The verification grand challenge. Journal of Universal Computer Science 13(5), 661–668 (2007), http://www.jucs.org/jucs_13_5/the_verification_grand_challenge
4. Enderton, H.: A Mathematical Introduction to Logic. Harcourt/Academic Press (2001)
5. Weide, B.W., Sitaraman, M., Harton, H.K., Adcock, B., Bucci, P., Bronish, D., Heym, W.D., Kirschenbaum, J., Frazier, D.: Incremental benchmarks for software verification tools and techniques. In: Shankar, N., Woodcock, J. (eds.) VSTTE 2008. LNCS, vol. 5295, pp. 84–98. Springer, Heidelberg (2008)
6. Ogden, W.F., Sitaraman, M., Weide, B.W., Zweben, S.H.: Part I: the RESOLVE framework and discipline: a research synopsis. SIGSOFT Softw. Eng. Notes 19(4), 23–28 (1994)
7. Harms, D., Weide, B.: Copying and swapping: Influences on the design of reusable software components. IEEE Transactions on Software Engineering 17(5), 424–435 (1991)
8. Krone, J.: The Role of Verification in Software Reusability. PhD thesis, Department of Computer and Information Science, The Ohio State University, Columbus, OH (December 1988)
9. Heym, W.D.: Computer Program Verification: Improvements for Human Reasoning. PhD thesis, Department of Computer and Information Science, The Ohio State University, Columbus, OH (December 1995)
10. Filliâtre, J.C., Marché, C.: The Why/Krakatoa/Caduceus platform for deductive program verification. In: Damm, W., Hermanns, H. (eds.) CAV 2007. LNCS, vol. 4590, pp. 173–177. Springer, Heidelberg (2007)
11. Barnett, M., Chang, B.Y.E., DeLine, R., Jacobs, B., Leino, K.R.M.: Boogie: A modular reusable verifier for object-oriented programs. In: de Boer, F.S., Bonsangue, M.M., Graf, S., de Roever, W.-P. (eds.) FMCO 2005. LNCS, vol. 4111, pp. 364–387. Springer, Heidelberg (2006)
12. Barnett, M., Leino, K.R.M., Schulte, W.: The Spec# programming system: An overview, http://citeseer.ist.psu.edu/649115.html
13. Barnett, M., Leino, K.R.M.: Weakest-precondition of unstructured programs. In: PASTE 2005: Proceedings of the 6th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, pp. 82–87. ACM, New York (2005)
14. Zee, K., Kuncak, V., Rinard, M.: Full functional verification of linked data structures. SIGPLAN Not. 43(6), 349–361 (2008)
Extending FeatuRSEB with Concepts from Systems Engineering

John Favaro and Silvia Mazzini

Intecs SpA, via E. Giannessi, 5y – I-56121 Pisa, Italy
{John.Favaro,Silvia.Mazzini}@intecs.it
Abstract. FeatuRSEB is a method for domain modeling of software system families using the industry standard notation of the Unified Modeling Language. FeatuRSEB/Sys is an extension of FeatuRSEB with constructs from the SysML profile for systems engineering, augmenting it with analysis and modeling mechanisms that are upstream from the original method, while nevertheless conserving the advantages of an industry standard notation and semantics.

Keywords: Domain analysis, reuse, feature, UML, SysML, systems engineering, requirements, analysis, modeling, architecture, trade studies.
1 Introduction

James Neighbors introduced the term domain analysis into the literature in 1984 in an article [1] on his pioneering Draco system. Another milestone occurred in 1990 with the introduction [2] of Feature-Oriented Domain Analysis (FODA). FODA became a catalyst for the development of subsequent methods due to a number of characteristics including a straightforward scheme for capture and classification of domain knowledge that is easily understandable by humans. These characteristics have led to the development of several FODA variants over the years [3].

The FeatuRSEB [4] variant of FODA is a feature-oriented extension of an application family development method, the Reuse Oriented Software Engineering Business (RSEB) [5], arising out of collaboration between Hewlett-Packard Research and Intecs. It has two characteristics of particular interest:

• It was one of the first domain analysis methods to employ the notation of the Unified Modeling Language [6]. Since then, other feature-oriented methods using UML notation have appeared [7]. These approaches have the advantage that an industry-standard modeling notation is used instead of an ad hoc, proprietary notation, and the semantics of core modeling concepts (such as classes and relationships) are not interpreted in an ad hoc fashion. This makes it easier to focus on the value-adding characteristics of the method.
• Due to its origins in the Reuse Driven Software Engineering Business, non-technical, business-oriented factors are given more explicit emphasis with respect to purely technical factors.
Intecs works with clients in the aerospace, automotive, space, and telecommunications industries, where we advise them on development methodologies and provide specialized toolsets such as HRT-UML for the modeling of hard real-time systems [8]. This work has exposed us to the discipline of systems engineering, which is central to the development of large systems in those industries. Based upon this experience, we have extended FeatuRSEB with modeling concepts and constructs from the Systems Modeling Language (SysML) profile [9] of the UML in order to extend its capabilities for high-level analysis for domain modeling of large software system families, while preserving the advantages of an industry-standard notation and semantics.
2 Relating Systems Engineering to Domain Engineering

Systems engineering as a recognized discipline dates back to the 1950s, and has been applied with success to large and complex systems in contexts ranging from defense and air traffic control to commercial products. Systems engineering essentially consists of three major concurrent activities (Fig. 1).

Fig. 1. Three concurrent activities in systems engineering: requirements elicitation and documentation; analyses (feasibility studies, relationships, constraints, …); high-level system functional design
• Requirements elicitation and documentation is the core activity of systems engineering.
• Numerous types of analyses are carried out, related to the specific disciplines involved, and may include engineering and economic feasibility studies, documentation of relationships among components, and identification and documentation of physical constraints on system elements. In particular, they may include calculations and formulas related to non-functional requirements.
• High-level system functional analysis identifies the principal subsystems, their interfaces, and dataflows through those interfaces.

Although each of these activities superficially appears to have a direct counterpart in the software engineering life cycle, their nature and purpose are different in systems engineering. Requirements are first-class citizens in systems engineering (in the sense that they are elicited, documented, and managed as separate entities) because they are an output of the systems engineering process, whereas in software engineering processes, requirements tend to be managed as an input. In fact, it would be only a slight exaggeration to state that systems engineering is mostly concerned with eliciting requirements of systems and about analyzing the feasibility of designs. Requirements are elicited with the support of analyses that in turn elicit relationships and constraints among different parts of the system. The documentation of these analyses, trade-off studies, and associated decisions is likewise considered a first-class citizen in systems engineering, rather than an afterthought as so often happens in
software systems design (often something as poor as a simple text field to "describe rationale in 256 characters or less"). This elevation of rationale and decision-making documentation to first-class citizenship is in line with recent awareness in the software engineering community of its importance, e.g. through the addition of the Decision View in the standard 4+1 views model of software architectural description [10]. It is even more important in domain analysis, where the recording of trade-off and decision-related information for entire families of systems is crucial for informing the feature selection process for subsequent domain engineering of systems.

The term "functional analysis" in systems engineering has a more general meaning than in software engineering. This is so in large degree because it is essential for determining subcontracting boundaries: systems engineers construct the product tree of all identified components of the system (including the software component) and then each such component is contracted out to others to construct (it is not unusual for a systems engineering organisation to have no construction capability at all).

Requirements elicitation, analysis and decision-recording, and system component management mechanisms are much less well developed in software engineering processes because they are upstream from where software engineering processes generally start. Indeed, in the ECSS-E40 standard of the European Space Agency for software engineering [11], systems engineering activities are explicitly defined to be precedent to the software engineering activities. This is where the relationship of system engineering to domain engineering emerges: domain engineering is also upstream from the traditional (single-system) software engineering process. The outputs of domain engineering also serve as inputs to the activities for engineering (single) software systems. Therefore, in retrospect it is not surprising that we found precisely these mechanisms from systems engineering to add significant value to the domain modeling process.
3 Introducing SysML into Domain Analysis: The Domain Models

The table in the Appendix summarizes the models and approach of FeatuRSEB/Sys, describing also the ways in which each model handles variability. In this section we discuss in more depth those models which integrate systems engineering concepts.

3.1 Context Model

Context analysis was introduced in original FODA. The purpose of the context model is to define the scope of the domain that is likely to yield exploitable domain products. FeatuRSEB/Sys context diagrams are represented as SysML block diagrams. Standard UML is biased toward the concrete development of software systems: it naturally forces thinking in terms of objects and methods, quickly driving toward design. (Indeed it has been suggested by the inventors of FODA that it has been pushed beyond its original intention of analysis into design). At the highest levels of domain engineering, though, interfaces are much more important than specific "methods" on classes. With SysML blocks (thought of as "whited-out classes") the domain analyst is freed from the "tyranny of the methods." As in other FODA
variants, context analysis provides the high level connections among major components related to the target domain. The exchange of information (dataflow) with external actors (peer domains) and lower domains is indicated at block boundaries by means of SysML flow ports. The SysML blocks and flow ports help to pull the context analysis up to a higher level of abstraction more suited to domain engineering.

3.2 Domain Dictionary

The domain dictionary consolidates the definitions of all terms used in the domain and is updated over time. In FeatuRSEB the feature model did "double duty" as a domain dictionary, and this is also true in FORM [12] to a degree. In FeatuRSEB/Sys the domain dictionary is separate and explicit, serving the same role as in systems engineering of establishing a common basis for communication among all participants in the domain implementation activities.

3.3 Domain Requirements Model

In FeatuRSEB the use case model alone represented user requirements, relieving the feature model of FODA from doing double duty to represent both user requirements and reuser features. In FeatuRSEB/Sys a similar phenomenon occurs: the use case model is relieved from representing both use scenarios and implicitly the requirements they embody, by adding a requirements model to model the requirements explicitly.

Requirements reuse has received some attention in the literature [13], but its integration into domain analysis methods tends to take the form of implicit inclusion in domain use case models and to some extent feature models. We argue that a separate domain requirements model is a useful addition to the set of domain analysis models, for at least two reasons: first, although use case models are often referred to as "requirements models", we subscribe to the point of view currently argued by many that use cases are at a lower level of abstraction than the requirements themselves – rather, use cases generally represent the results of requirements analysis; second, an explicit requirements model is particularly useful for documenting many types of technical, organizational, economic, and legal non-functional requirements, which are not handled well by use case techniques with their bias toward functional requirements, but which a domain analysis must capture in order to be complete. In short, explicit modeling of domain requirements contributes to keeping the level of abstraction of the domain analysis as high as possible.

SysML provides an explicit requirements diagram (Fig. 2), which can serve as an important starting point for reusable requirements for new systems in a domain. For example, it gives the domain analyst a place to document the cost of implementing reusable requirements [14]. Furthermore, the requirements model allows requirements management tools to link from the requirements to their implementation in domain systems. This supports requirements traceability, and thereby provides support not only for the construction phase of domain engineering, but also for V&V, which has a heavier footprint in larger, more complex systems.
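As a rough sketch of the traceability idea, the snippet below shows one plausible shape for an explicit, reusable domain requirement with cost information and links to the model elements that satisfy it. The identifiers, field names, and cost figure are invented for illustration; in FeatuRSEB/Sys this information lives in SysML requirement diagrams and requirements management tools, not in code.

```python
# An explicit domain requirement with traceability links (all identifiers invented).
from dataclasses import dataclass, field
from typing import List

@dataclass
class DomainRequirement:
    req_id: str
    text: str
    kind: str                       # e.g. "functional", "performance", "legal"
    estimated_cost: float           # cost of implementing the reusable requirement
    satisfied_by: List[str] = field(default_factory=list)   # use cases / blocks

req = DomainRequirement(
    req_id="REQ-BILL-01",
    text="Invoicing shall support time-period based tariffs",
    kind="functional",
    estimated_cost=12.0,            # e.g. person-weeks, purely illustrative
    satisfied_by=["UC-Invoice-Call", "Block-Billing"],
)

# A simple traceability query: which requirements does a given block help satisfy?
def tracing_to(block, requirements):
    return [r.req_id for r in requirements if block in r.satisfied_by]

print(tracing_to("Block-Billing", [req]))   # ['REQ-BILL-01']
```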
Fig. 2. Two requirements from the domain requirements model corresponding to the telecommunications example in [4]. The requirement on the right-hand side could not be easily captured and modeled using only use case techniques.
3.4 Analysis Models

In the UML, analysis has a rather narrow meaning, involving the transition from use cases to object design. FeatuRSEB inherited Jacobson robustness analysis from the RSEB for this purpose. But in systems engineering, analysis has a more general meaning, and is more about exploring feasibility, relationships and constraints at high system levels (relationships between large blocks of system functions), and in general doing supporting analysis to ensure that a system functional architecture is sound and cost-effective. In UML, there are no standard ways to document such characteristics. Some progress is being made with improved mechanisms for describing constraints, but in general the user is left to his own devices.

In SysML, a new diagram has been introduced solely for documentation of constraints and relationships among system parts. Parametric diagrams are used to capture the properties of the applications or subsystems of the domain. They can express powerful constraints that abstract away specific constraints of individual systems into more general constraints (e.g. expressed in terms of formulas) characterizing aspects of system families. For example, perhaps a «performanceRequirement» can only be satisfied by a set of constraints along with an associated tolerance and/or probability distribution, or a «functionalRequirement» might be satisfied by an activity or operation of a block. Likewise, cost constraints can be explicitly documented in parametric diagrams.
(The parametric diagram chains three constraint blocks, eq1: CallDuration {d = tf – ts}, eq2: RateCalc {r = lookup(rt, ts)}, and eq3: CostCalc {C = r * d + b}, and refines the «requirement» Time-period based invoicing.)
Fig. 3. Parametric diagram for billing feature (simplified) in model of [4], to support analysis and identification of input parameters that must be made available by a domain architecture to support implementation of a requirement for time-period based call invoicing
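To make the constraint chain of Fig. 3 concrete, the following sketch evaluates the three constraint blocks for a single call. It is our own illustration, not part of FeatuRSEB/Sys or any SysML tooling; the rate table, the fixed charge b, and the call times are invented values used only to show how the parameters fit together.

```python
# Illustrative evaluation of the constraint blocks sketched in Fig. 3.
# The rate table, fixed charge, and call times are invented; only the structure
# (d = tf - ts, r = lookup(rt, ts), C = r*d + b) follows the diagram.

ILLUSTRATIVE_RATE_TABLE = [          # (start hour, end hour, rate per minute)
    (0, 7, 0.02),                    # "late night"
    (7, 19, 0.10),                   # "daytime"
    (19, 24, 0.05),                  # "evening"
]

def lookup(rate_table, ts_hour):
    """RateCalc: r = lookup(rt, ts) -- pick the rate for the period containing ts."""
    for start, end, rate in rate_table:
        if start <= ts_hour < end:
            return rate
    raise ValueError("no rate period covers the call start time")

def invoice(ts_hour, tf_hour, rate_table=ILLUSTRATIVE_RATE_TABLE, b=0.10):
    d = tf_hour - ts_hour                 # CallDuration: d = tf - ts (in hours here)
    r = lookup(rate_table, ts_hour)       # RateCalc
    return r * (d * 60) + b               # CostCalc: C = r*d + b, with d in minutes

# A 30-minute daytime call starting at 09:00 under the invented tariff:
print(round(invoice(9.0, 9.5), 2))        # -> 3.1
```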
Parametric diagrams help the domain analyst to analyze constraints on the domain architecture needed to satisfy particular requirements. For example, in [4] one feature variant for the telephone call invoicing subsystem had a requirement to be able to support invoicing based upon time periods (e.g. "late night," "daytime," etc.). A parametric diagram (Fig. 3) helps document an analysis to identify the input parameters for such a feature that an architecture will have to support. Such an analysis supports refinement of the requirement and feasibility and cost studies.

Feature modeling permits the documentation of reusable functionality – but what about reusable "non-functionality"? Domain analysis methods have not traditionally provided strong support for a reuse-oriented documentation of non-functional aspects of system families. For example, the theory and methods used to analyze performance constraints in families of real-time systems form an important part of their construction and documentation. The constraint blocks of SysML make it possible to package and document non-functional aspects as reusable artifacts (Fig. 4). Using the SysML concept of allocation, variability can be modeled with such mechanisms – for example, two different algorithms could be allocated to different system variants.
(The package contains two constraint blocks: an Earliest Deadline First Model with constraint U = Σ Ci/Ti ≤ 1, and a Rate Monotonic Model with constraint U = Σ Ci/Ti ≤ n(2^(1/n) − 1), each parameterized by T: Real [*] {ordered, unique}, C: Real [*] {ordered}, n: Integer, and /U: Real.)
Fig. 4. Reusable analysis models for hard real-time system families packaged as constraint blocks
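As a rough illustration of how such packaged constraint blocks might be exercised, the sketch below checks an invented task set against the two utilization bounds of Fig. 4. It is our own example, not part of FeatuRSEB/Sys or SysML tooling.

```python
# Utilization tests corresponding to the two constraint blocks of Fig. 4.
# C[i] is the worst-case execution time and T[i] the period of task i;
# the task set below is invented purely for illustration.

def edf_schedulable(C, T):
    """Earliest Deadline First Model: U = sum(Ci/Ti) <= 1."""
    U = sum(c / t for c, t in zip(C, T))
    return U, U <= 1.0

def rm_schedulable(C, T):
    """Rate Monotonic Model: U = sum(Ci/Ti) <= n * (2**(1/n) - 1)."""
    n = len(C)
    U = sum(c / t for c, t in zip(C, T))
    bound = n * (2 ** (1.0 / n) - 1)
    return U, U <= bound

C = [1.0, 2.0, 3.0]      # invented execution times
T = [10.0, 15.0, 35.0]   # invented periods

print(edf_schedulable(C, T))   # (~0.319, True)
print(rm_schedulable(C, T))    # (~0.319, True) -- the bound for n=3 is ~0.780
```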
It is important to emphasize that, especially in a context of domain engineering of large and complex systems in the sectors in which our clients operate, those decisions which concern the scope of system families and the number of variations supported in a domain architecture are primarily driven by business considerations: the effort, time and costs required for domain design and implementation must be less than the effort, time and costs that would be required for the expected number of domain applications that may reuse the domain engineering results. Here FeatuRSEB/Sys inherits the business decision mechanisms from the Reuse Oriented Software Engineering Business upon which original FeatuRSEB was based. The difference is that the means to document these decisions is now provided in the new mechanisms from SysML. In particular, the parametric diagram mechanism supports the domain analyst in what in systems engineering are called trade (or trade-off) studies, whereby so-called measures of effectiveness ("moes") are defined within an analysis context (Fig. 5).
(The analysis context defines a PhoneLine abstraction with measures of effectiveness «moe» for throughput, reliability, quality, and services, specialized by VOIP and POTS variants that each carry their own moe values.)
Fig. 5. SysML analysis context for trade analysis for the phone line quality feature variants in [4] according to selected measures of effectiveness («moe»). Here another mechanism for handling variability is illustrated: the PhoneLine abstraction captures the variability between VOIP and POTS alternatives. Different objective functions will place emphasis on different characteristics (e.g. superior service offering flexibility on VOIP line or higher reliability of POTS line), allowing the domain engineer to select feature variants based upon the needs of the particular system.
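A minimal sketch of the kind of trade analysis the caption describes: weighted objective functions applied to measures of effectiveness for the two phone-line variants. The moe values and weights below are illustrative placeholders of our own, not the values from the original study; only the mechanism (scoring feature variants against an objective function) follows the text.

```python
# Illustrative trade analysis over measures of effectiveness («moe»).
# All numbers are placeholders.

variants = {
    "VOIP": {"throughput": 0.8, "reliability": 0.7, "quality": 0.7, "services": 0.9},
    "POTS": {"throughput": 0.6, "reliability": 0.9, "quality": 0.7, "services": 0.4},
}

def score(moes, weights):
    """Weighted-sum objective function over the moes."""
    return sum(weights[k] * moes[k] for k in weights)

# An objective that prizes service flexibility favors VOIP ...
flexible = {"throughput": 0.2, "reliability": 0.2, "quality": 0.2, "services": 0.4}
# ... while a dependability-driven objective favors POTS.
dependable = {"throughput": 0.1, "reliability": 0.6, "quality": 0.2, "services": 0.1}

for name, weights in [("flexible", flexible), ("dependable", dependable)]:
    best = max(variants, key=lambda v: score(variants[v], weights))
    print(name, {v: round(score(variants[v], weights), 2) for v in variants}, "->", best)
```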
3.5 High Level Generic Architecture

SysML block diagrams are used to describe the top level partitioning of the domain system capabilities into functional blocks, as refinements of the context model structure diagrams; these blocks are defined as black boxes by putting the emphasis on provided and required interfaces. These interfaces do not require a full signature, but mainly describe named entities as SysML dataflows.

Dataflow entities exchanged between domains and data that are handled by top-level components (blocks) in the domain are described, categorized (by inheritance) and clustered (by aggregation) using object-oriented technology, in order to provide their data structure. For this purpose UML class diagrams are used in a separate data package, thus describing the data model for information exchanged in the domain. The package may correspond to a library of data types available at domain level.

The functional model identifies the functional commonalities and differences of the applications in the domain. The elements of the specification of a functional model can be classified in two major categories: First, specification of functions describes the structural aspects of the top level partitioning of the applications in terms of inputs, outputs, internal data, logical structures and data flow among them. Second, specification of the functional and dynamic behavior describes how an application behaves in terms of events, inputs, states, conditions and state transitions, sometimes defined as control flow. Activity diagrams and UML StateCharts are generally used to describe functional models, but other discipline-specific methods and languages may also be used, such as those used to model mechanical or control aspects of systems.
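A toy rendering of the data-package idea, with invented telecom-flavored type names: entities categorized by inheritance and clustered by aggregation, independent of any particular block's operations. This is only a sketch of the concept, not a prescribed representation.

```python
# A toy domain data package: entities categorized by inheritance and
# clustered by aggregation. All type names are invented for illustration.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataflowEntity:                 # common ancestor for data exchanged between domains
    entity_id: str

@dataclass
class CallRecord(DataflowEntity):     # categorization by inheritance
    duration_s: float

@dataclass
class InvoiceLine(DataflowEntity):
    amount: float

@dataclass
class Invoice(DataflowEntity):        # clustering by aggregation
    lines: List[InvoiceLine] = field(default_factory=list)

invoice = Invoice("inv-1", lines=[InvoiceLine("line-1", 3.10)])
print(sum(line.amount for line in invoice.lines))   # 3.1
```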
3.6 Feature Model

The feature model in FeatuRSEB/Sys remains the "+1" model as described in original FeatuRSEB, summarizing the results of domain analysis and tracing the features back into the respective domain models, whereby the traces are extended to the other, additional models that have been introduced (e.g. the requirements model). In original FeatuRSEB a notation for the feature model was devised by extending the UML with stereotypes for describing the features and their relationships. This notation remains valid for FeatuRSEB/Sys, retaining the advantage of using the UML.
4 Conclusions and Further Work

Systems engineering places emphasis on activities that are upstream from those of software engineering, whereby outputs considered valuable include requirements, feasibility analyses and trade studies, and high level functional breakdown at a higher level of abstraction than in software system architecture. The same is true of domain engineering, and the new concepts and diagrams of SysML help to raise the level of abstraction in domain analysis back to where it was originally conceived. It also provides another advantage: the feature model in the feature-oriented approaches is in its essence a mechanism for organization and traceability. Systems engineering provides extended mechanisms for traceability – of requirements, of analyses, and the like – that make it possible to broaden the scope of traceability and provide a more complete documentation of the domain analysis.

FeatuRSEB/Sys originated in the COrDeT project of the European Space Agency, where it was used in the modeling of onboard Space system families [15]. Related efforts in modeling techniques are underway in other contexts – for example, the EAST-ADL2 profile for an automotive architectural modeling language that incorporates concepts and mechanisms from SysML [16].

Tool support is an important aspect of feature-oriented domain modeling, and significant progress has been made since its introduction [17]. The choice of UML and SysML is important not only for notational uniformity and familiarity, but for its implications for tool support. The combination of the improved UML Meta Object Facility (MOF) and its support by the Eclipse framework has made model based development of specialized tools feasible with reasonable costs. The provision of complete tool support for FeatuRSEB/Sys will be the focus of future efforts.
References

1. Neighbors, J.: The Draco Approach to Constructing Software from Reusable Components. IEEE Trans. Software Eng. 10(5), 564–574 (1984)
2. Kang, K., Cohen, S., Hess, J., Nowak, W., Peterson, S.: Feature-Oriented Domain Analysis (FODA) Feasibility Study. Technical Report CMU/SEI-90-TR-21. Software Engineering Institute, Carnegie Mellon University, Pittsburgh (1990)
3. Frakes, W., Kang, K.: Software Reuse Research: Status and Future Directions. IEEE Transactions on Software Engineering 31(7) (July 2005)
4. Griss, M., Favaro, J., D’Alessandro, M.: Integrating Feature Modeling with the RSEB. In: Fifth International Conference on Software Reusability, pp. 76–85. IEEE Press, New York (1998)
5. Jacobson, M., Griss, M., Jonsson, P.: Software Reuse: Architecture, Process and Organization for Business Success. Addison-Wesley-Longman, Menlo Park (1997)
6. Rumbaugh, J., Jacobson, I., Booch, G.: The Unified Modeling Language Reference Manual, 2nd edn. Addison-Wesley, Boston (2004)
7. Gomaa, H.: Designing Software Product Lines with UML: From Use Cases to Pattern-based Software Architectures. Addison-Wesley, Boston (2004)
8. Mazzini, S., D’Alessandro, M., Di Natale, M., Domenici, A., Lipari, G., Vardanega, T.: HRT-UML: taking HRT-HOOD into UML. In: Proc. 8th Conference on Reliable Software Technologies Ada Europe (2003)
9. The Object Management Group, Systems Modeling Language (OMG SysML), Final Adopted Specification, document ptc/06-05-04 (2006), http://www.omg.org
10. Kruchten, P., Capilla, R., Dueñas, J.C.: The Decision View’s Role in Software Architecture Practice. IEEE Software 26(2), 36–42 (2009)
11. European Space Agency, ECSS-ST-40C – Software General Requirements (March 6, 2009), http://www.ecss.nl
12. Kang, K., Kim, S., Lee, J., Kim, K., Shin, E., Huh, M.: FORM: a feature-oriented reuse method with domain-specific reference architectures. Annals of Software Engineering 5(1), 143–168 (1998)
13. Lam, W.: A case-study of requirements reuse through product families. In: Annals of Software Engineering, vol. 5, pp. 253–277. Baltzer Science Publishers, Bussum (1998)
14. Favaro, J.: Managing Requirements for Business Value. IEEE Software 19(2), 15–17 (2002)
15. Mazzini, S., Favaro, J., Rodriguez, A., Alaña, E., Pasetti, A., Rohlik, O.: A Methodology for Space Domain Engineering. In: Proceedings Data Systems in Aerospace (DASIA), Palma de Majorca, Spain (2008)
16. Lönn, H., Freund, U.: Automotive Architecture Description Languages. In: Navet, N., Simonot-Lion, F. (eds.) Automotive Embedded Systems Handbook. CRC Press, Boca Raton (2009)
17. Antkiewicz, M., Czarnecki, K.: FeaturePlugin: Feature Modeling Plug-in for Eclipse. In: Eclipse 2004: Proceedings of the 2004 OOPSLA Workshop on Eclipse Technology eXchange, OOPSLA, pp. 67–72. ACM Press, New York (2004)
Appendix: Summary of FeatuRSEB/Sys Models and Method

Domain model: Context model
  Sub-components and notation(s): Context diagrams – SysML block diagrams; Use case diagrams – SysML use case diagrams
  Utilization: Positioning of the target domain with respect to higher, peer and lower domains (variability expressed through different diagrams); Description of domain capabilities (variability expressed through different diagrams)

Domain model: Domain dictionary
  Sub-components and notation(s): Text
  Utilization: Updated throughout the domain analysis

Domain model: Requirement model
  Sub-components and notation(s): Requirement diagrams – SysML Requirement diagrams
  Utilization: Link to Use cases and Blocks. Required variability has to be explicitly specified by link to related use cases and/or blocks and by link or requirement annotation.

Domain model: High level generic architecture
  Sub-components and notation(s): Top-level partitioning (into applications) – SysML Block diagrams; Data package (library of data types/structures) – UML class diagrams; Application capabilities: Application Use cases – SysML Use case diagrams, Application Behaviors – SysML Interaction (sequence) diagrams
  Utilization: Refinement of the context model structure diagrams; Data flow entities exchanged between domains and data handled by top level components are described, categorized (by inheritance) and clustered (by aggregation); Refinement of context level Use cases

Domain model: Functional model
  Sub-components and notation(s): Specification of functions – SysML activity diagrams; Specification of dynamic behavior – SysML StateChart; Parametric diagram – SysML Parametric diagram
  Utilization: Structural aspects of top level partitioning (inputs, outputs, internal data, logical structures and data flow among applications); Application behavior (events, inputs, states, conditions and state transitions); Capture functional and non-functional requirements through formulation of constraints, and document trade-off studies on architectural variants, including cost and performance factors

Domain model: Feature model
  Sub-components and notation(s): Feature model – UML variability notation from FeatuRSEB
  Utilization: Synthesis of domain analysis as emerging from the above models
Features Need Stories

Sidney C. Bailin

Knowledge Evolution, Inc., 1221 Connecticut Ave. NW, STE 3B, Washington DC 20036
[email protected]
Abstract. We present an extension of the notion of feature models. An ontology of stories clarifies the meaning of potentially ambiguous features in situations where there is no unified domain model, and where candidates for reuse have varied provenance.

Keywords: features, domain model, story, ontology, evaluation.
1 Introduction

This paper argues that feature models, while an essential tool for reuse in any domain, are not in themselves sufficient for the capture and communication of knowledge about variability. We argue that features carry a relatively small amount of semantics, if any at all, and that much of the intended semantics remain implicit in naming conventions, or tacit in experience, or undocumented in hearsay. We suggest that the missing semantics, addressing fine-grained questions of context, interface, function, performance, and rationale, can be usefully conveyed through stories attached to features.

By story we mean something relatively specific, not simply a chunk of text. There are several attributes that make something a story, but one of the key attributes is that of being situated. In a story, context is (almost) everything. Examples of the kinds of stories we are looking for already exist in the user feedback sections of many web-based product catalogues. User feedback contributions, despite their unstructured form, tend to be highly situated and, thus, are often good examples of the notion of story we are advocating. Consumer assistance product reviews, on the other hand, are often much less situated; for this reason, they often fall short both in their sense of urgency and in the amount of crucial background information they provide.

We are not simply advocating for stories. That is an argument we have made elsewhere [1]. Nor are we arguing here for a particular kind of feature model: that, too, we have addressed elsewhere [2]. In this paper we are specifically interested in the way in which stories can be used systematically to enrich the semantics of a feature model. At the core of this vision lies interplay between the formal and the informal.

The bulk of the paper consists of a case study that involved the evaluation of alternative technologies for a particular function within a complex system of systems. We argue in Section 1.1 that the evaluation of alternative candidates is the essence of
the reuse process. Our goal is not, however, to present a methodology for trade studies, but rather to explore the kind of contextual information that is necessary in order to perform such evaluations, and to draw conclusions from this about the structure and contents of domain models. Our conclusions are applicable to virtually any domain modeling method.

1.1 Features and Engineering Analysis

Many opportunities for reuse occur in situations lacking a well-developed domain model, and in which the resources required for a full-scale domain analysis (not just money but also time) are unavailable. Such opportunities cannot take advantage of feature-based reuse methods [3] and product-line architectures [4]. Yet they are far more systematic than what is often termed opportunistic reuse. We will call this real-world reuse.

These situations are often opportunities for knowledge reuse: the primary decision is to choose an existing, well-defined technical approach to solve a problem. The technical approach is typically selected through an engineering analysis involving goals, priorities, and tradeoffs. Thus, knowledge reuse is closely related to the disciplines of engineering analysis. Such situations, however, often invite reuse not just of knowledge but of actual software, for the simple reason that the technical approaches are realized in software. Thus, there is a deep connection between software reuse and the process of engineering analysis—evaluating alternative technologies to meet a set of requirements.

Conversely, any software reuse decision ought to involve an evaluation of alternatives. At a minimum, the question of make vs. buy (develop vs. reuse) should be addressed. Beyond that, if there are multiple candidates for reuse, or multiple features that may or may not be selected, then a systematic evaluation of tradeoffs ought to occur. But real-world reuse, lying somewhere between systematic and opportunistic, illustrates a limitation of feature models. Because it occurs without a unified domain model, the features considered in the tradeoff studies are not necessarily comparable from product to product, let alone from study to study. There is no controlled vocabulary with which to describe and evaluate the alternative solutions.

1.2 Features and Ontologies

In a previous paper we described the use of ontologies to represent KAPTUR-style domain models: problems, alternative solutions, their features, rationales, and tradeoffs [5]. We noted that ontology mismatches arise when there are many parochial models of a domain rather than one unified model, and we described a process of progressive resolution of mismatches. In the LIBRA method, this process of resolving models-in-conflict is viewed as the very essence of reuse [6].

Aspects of the parochial contexts can be formalized to provide enriched semantics for their respective feature sets. Using ontologies, which are intrinsically open-ended, we can add concepts for disambiguating features that have slightly different meanings in different contexts (see Section 4.1). But there is a point of diminishing returns at
which formalization may not be appropriate, at least not until the domain reaches a certain level of maturity. The appropriate medium for capturing this context is, in such cases, narrative.

1.3 What Is a Story?

By story we mean a communication that speaks of characters responding to some situation. The characters need not be people; they can be organizations, systems, computers, software programs or even designs. Something about the initial situation is unstable and causes the characters to act. The actions change the situation, and what follows is a succession of situations requiring action, and actions taken in response. This is the forward movement of the story. The emphasis is on causation, or, from another point of view, rationale.

Stories are an especially effective mode of communication because they draw the reader into the story world. This only happens, of course, if what is at stake for the characters is also a concern for the reader. When this happens, the immersive nature of the story and its forward movement give the communication an urgency and sense of reality that ordinary communications lack.
2 Case Study

We illustrate our arguments with a real case study that involved selecting software for use in a large, complex system of systems (SoS). The primary tradeoff was a make vs. buy decision; but, as we shall see, each of the alternatives involves some form of reuse. Salient features of the SoS, which will remain anonymous in this paper, include life-critical functions, high cost, and a complex network of stakeholders including multiple engineering and development organizations.

The application we consider here is a relatively small part of the SoS, responsible for transmitting large volumes of structured data between two major SoS components. The data stream contains sequences of fields in many different data types and occurring with different frequencies. Different fields originate from different sources, and are routed to other component systems belonging to different stakeholders. This is illustrated in Figure 1.
Fig. 1. A data transmission application (between System A and System B) serves as a case study
We called this a compression trade study because the primary challenge is to transmit the data efficiently by compressing it into a compact form, then decompressing it upon receipt.

2.1 Evaluation Criteria

Table 1 shows the criteria used to evaluate the alternative compression technologies.

Table 1. Evaluation criteria used in the case study
Encoding / decoding speed – Speed at which messages can be encoded into their compressed format and then decoded by the receiving end.
Encoding / decoding software cost – Cost of acquiring and maintaining the software that performs the encoding and decoding.
Transmission speed – Direct measure of the size of the compressed messages, which must be averaged over the likely variations in message types and content.
Interoperability – Availability of the proposed encoding method on different systems, and the expected level of confidence that the implementations on different systems are compatible.
Learning curve / programming complexity – Complexity of an encoding method has a direct impact on the cost of any project-developed software implementing the method, but it may also limit the range of available COTS implementations, and lessen confidence that the implementations are correct.
Maturity – Assessed both temporally (how long the method has been in existence) and spatially (how widely it has been used).
Tool availability – Availability of software to perform the selected functions.
Tool reliability – The level of confidence that the tool will perform its functions continuously and correctly, where continuity includes the traditional systems engineering notion of availability, while correctness refers to the satisfaction of functional and performance requirements (throughput/response time).
Evolvability – Includes the ability of the selected technology to handle evolution in the data stream contents, and the potential impact of evolution in the compression method itself (as in the evolution of standards).
Risk – General unknowns.
The evaluation criteria are not mutually independent, but each has a unique emphasis that warrants its being considered explicitly.

2.2 Alternatives Considered

Table 2 identifies the alternative methods that were evaluated:

Table 2. Alternative solutions considered in the case study
Custom packing maps – The format of the data stream messages is defined explicitly in terms of the admissible fields, their sequencing, and their respective length.
Abstract Syntax Notation (ASN.1) Packed Encoding Rules (PER) – ASN.1 is an international standard for describing structured data [7]; the PER is part of the standard that provides for a compact binary encoding of such data.
Fast Infoset [8] – An international standard for binary encoding of data represented in the Extensible Markup Language (XML).
Lempel-Ziv (LZ) compression [9] – A family of compression techniques used by tools such as WinZip and Gzip.
Efficient XML Interchange (EXI) – An emerging recommendation of the World-Wide Web Consortium (W3C) for binary encoding of XML data [10].

These alternatives represent the gamut from the conventional approach for such systems as previously developed (custom packing maps), through international standards (ASN.1 and Fast Infoset) and widely used file compression techniques (LZ), to a recent W3C recommendation for binary XML transfers (EXI). The alternatives vary significantly in their levels of maturity, but there is a logic to their consideration. The custom packing map approach is the organization's legacy approach for this type of system. ASN.1 and XML are the two primary standards for expressing structured data outside of programming and formal specification languages. There are several standard encodings for ASN.1, but PER is the primary encoding for efficient binary data transfer. Similarly, Fast Infoset is a current standard for binary encoding of XML, while EXI is an emerging standard for the same purpose, whose definition has itself taken into account many of the alternatives. Finally, LZ compression is widely used in a variety of tools familiar to the engineers responsible for designing and implementing the system.

Further complicating the study is the role of another international standard that would govern the way in which a custom packing map would be specified. To some extent this mitigates the "custom" nature of this alternative (as does the fact that it, alone among the alternatives, draws heavily from the organization's past experience).
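To make the first alternative more tangible, here is a small sketch of a custom packing map in the spirit described above. The three-field layout is invented purely for illustration; real packing maps for the SoS would be far richer, and this is not the organization's actual format.

```python
# A toy "custom packing map": the message layout (field order, types, widths)
# is fixed in advance, so only the raw values travel on the wire.
# The field layout below is invented for illustration.
import struct

PACKING_MAP = ">IHd"   # big-endian: 4-byte id, 2-byte status code, 8-byte measurement

def encode(msg_id, status, measurement):
    return struct.pack(PACKING_MAP, msg_id, status, measurement)

def decode(payload):
    return struct.unpack(PACKING_MAP, payload)

wire = encode(42, 3, 98.6)
print(len(wire))          # 14 bytes, versus a much larger self-describing XML message
print(decode(wire))       # (42, 3, 98.6)
```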
3 Reuse in the Case Study

The case study does not sound like a typical reuse scenario. There is no repository of reusable components, and no domain model from which systems can be instantiated through a selection of appropriate features. Nevertheless, the case study illustrates what we are calling real world reuse.

It is an example of reuse because each of the alternative solutions is a form of reuse. This is clear in the case of the off-the-shelf candidates; perhaps less obvious, but equally true, is reuse in the case of the custom packing map, which is the approach the organization had used in all previous systems. At the very least, if a custom packing map were to be adopted, fundamental design concepts from prior systems would be reused; more likely, there would be adaptation of existing source code as well.

Why, then, does the case study appear to be more like a conventional engineering design analysis than a case of reuse? One reason is that the evaluation was performed not at the level of functionally equivalent alternatives (for example, alternative implementations of the ASN.1 PER), but rather at the level of alternative functional approaches. Formally, however, this is no different from conventional reuse: we have a specification at a certain level of abstraction, and the candidates for reuse are alternative realizations of the specification.

In this case, we lack a unified feature model for characterizing the alternative solutions. But the evaluation criteria serve, in effect, as features. Conversely, in a feature-based reuse process, the features may be considered evaluation criteria in the selection of a solution. Viewed in this way, every tradeoff analysis creates a kind of domain model, but each model is usually unique to the analysis; they are not easily compared, which makes it difficult to base decisions on prior analyses that were performed in different contexts.

3.1 Features with Values

One objection to viewing evaluation criteria as features is that the criteria are attributes with values, while features are often viewed as things to be selected, i.e., the candidate either has the feature or it does not. This effectively restricts features to attributes with values in a finite enumeration set (such as Boolean-valued attributes). It is an arbitrary limitation that dispenses with information essential to specifying or evaluating reusability.

For example, an efficiency feature could take values in an enumeration set consisting of the time-complexity classes such as Logarithmic, Linear, Polynomial, etc. But it might equally well take the form of more detailed testing results that graph specific response times against different inputs. The studies we consulted follow the latter approach—rightly so, because they tested different classes and distributions of input. The efficiency feature carries implicit semantics that must be made explicit for proper comprehension.
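A small sketch of the distinction being drawn here, using invented names: a feature treated as a bare Boolean selection, versus features that carry values, including one whose value is a set of measured data points from a specific test run.

```python
# Features as selections versus features with values (names are illustrative only).
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple, Union

class TimeComplexity(Enum):
    LOGARITHMIC = "logarithmic"
    LINEAR = "linear"
    POLYNOMIAL = "polynomial"

@dataclass
class Feature:
    name: str
    # A bare selection (True/False), an enumerated value, or measured results
    # such as (input size, response time) pairs from a specific test run.
    value: Union[bool, TimeComplexity, List[Tuple[int, float]]]

candidate_features = [
    Feature("supports_streaming", True),                          # classic on/off feature
    Feature("efficiency_class", TimeComplexity.LINEAR),           # enumerated value
    Feature("measured_response", [(1_000, 0.4), (10_000, 3.9)]),  # detailed test data
]

for f in candidate_features:
    print(f.name, "->", f.value)
```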
4 Integrating Multiple Feature Models

Having identified the alternative solutions and the criteria for evaluating them, we studied the literature to see what was already known. There was a substantial body of
prior work on just this problem, but none of the studies we found used exactly the same set of candidate solutions and none used exactly the same evaluation criteria. The studies varied in their scope, in the level of rigor, and in the detailed definition of features (especially performance features). Not surprisingly, the studies also differed from each other in their conclusions. As an example, Table 3 lists the criteria from a study by Qing&Adams along with their respective priorities [11].

Table 3. Evaluation criteria considered by Qing&Adams
High Compression Ratio – Required
Low Processing Overhead – Required
No Semantic Ambiguity – Required
3rd Party API Support – Desirable
These criteria resemble ours but are somewhat different. High compression ratio is effectively synonymous with our transmission speed criterion, while low processing overhead corresponds to our encode/decode speed. We implicitly required that all the candidates be semantically unambiguous, but we did not specify this explicitly. Finally, their 3rd party API support could facilitate our interoperability, and could improve our tool availability; but their criterion is more specific and does not necessarily imply ours.

Table 4. Candidate solutions considered by Qing&Adams
Gzip and Bzip – Bzip is an alternative compression algorithm using Burrows-Wheeler methods [12].
wbXML – WAP Binary XML, developed by the Open Mobile Alliance [13].
ASN.1 – Apparently the PER encoding, although this is not clear.
wbXML + Zip – wbXML encoding followed by Gzip or Bzip compression.
ASN.1 + Zip – ASN.1 encoding followed by Gzip or Bzip.
XML Binary Optimized Packaging (XOP) – W3C recommendation dating from 2005 [14].
Message Transmission Optimization Mechanism (MTOM) – W3C recommendation for binary message transmission especially in the context of the Simple Object Access Protocol (SOAP) [15].
XMill – Open source XML compressor that claims better performance than (for example) Gzip [16].
The alternative solutions considered by Qing&Adams are also similar but not identical to our set, as shown in Table 4. Of the four evaluation criteria in Table 3, the latter two (no semantic ambiguity and 3rd party API support) were quickly disposed of as being satisfied by all candidates, leaving compression ratio and processing overhead as the focus of the study. Most of the other studies focused on these two criteria, as well. The meaning of the terms, however, varied from study to study.

Now we look at some of the criteria used by the EXI committee. Table 5 shows 7 out of the 21 criteria they applied: those that correspond or are closely related to our criteria [17].

Table 5. Subset of evaluation criteria used by the W3C EXI committee
Compactness – Amount of compression a particular format achieves when encoding data model items.
Forward compatibility – Supports the evolution of data models and allows corresponding implementation of layered standards.
Implementation cost – How much time does it take for a solitary programmer to implement sufficiently robust processing of the format (the so-called Desperate Perl Hacker measure).
Platform neutrality – Not significantly more optimal for processing on some computing platforms or architectures than on others.
Processing efficiency – Speed at which a new format can be generated and/or consumed for processing. It covers serialization, parsing and data binding.
Transport independence – Only assumptions of transport service are "error-free and ordered delivery of messages without any arbitrary restrictions on the message length".
Widespread adoption – Has been implemented on a greater range and number of devices and used in a wider variety of applications.
Table 6 illustrates the relationships between the criteria used in our case study and those used by the EXI committee. Some of them—such as widespread adoption—are intrinsically "soft" in that they admit of varying interpretations. For these, the value of elaborating the meaning with stories may be obvious. For example, what were the experiences of projects that adopted the technology? To drive our argument home, however, we focus on one particular feature that should, on the face of it, be relatively well-defined: compactness.
Table 6. Relationships between our evaluation criteria and those of the EXI study
Our Criterion – Relation – EXI Criterion
Encoding / decoding speed – Synonymous with – Processing efficiency
Encoding / decoding software cost – Includes off-the-shelf purchase cost as alternative to – Implementation cost
Transmission speed – Effectively synonymous with – Compactness
Interoperability – Enhanced by – Platform neutrality, Transport independence
Learning curve / programming complexity – Increases – Implementation cost
Maturity – Indicated by – Widespread adoption
Tool availability – Indicated by – Widespread adoption
Tool reliability – Weakly indicated by – Widespread adoption
Evolvability – Partially supported by – Forward compatibility
4.1 Semantics of the Compactness Feature

The term compactness is ambiguous because it must cover a wide range of inputs. If we had only one input instance, compactness could be reduced to a positive integer-valued attribute, namely the size of the output. In reality, though, it must represent an aggregate assessment over a set of inputs. Resolving the ambiguity requires deciding on the sample inputs to use, the classes in which to group them, and the kind of aggregation to perform, e.g., averages, distributions, peaks, or some other measure.

For example, Qing&Adams used as input 9 files from the Extensible Access Control Markup Language (XACML) conformance test suite [18], ranging in size from 2KB to 1MB [11]. Mundy et al created a custom application for exchanging attribute certificates to compare XML and ASN.1 [19], while Cokus&Winkowski created test sets to resemble the exchange of binary formatted military messages [20]. The EXI study involved a detailed definition of widely varying use cases [21], while Augeri et al created a corpus modeled on the widely used Canterbury corpus for compression evaluations [22, 23].

How do we compare, contrast, and aggregate such various tests? Viewing a feature model as part of an ontology [5], we can create placeholders in which to add the relevant information. For example, the alternative candidates can have a property test results, which are then linked to the relevant features of the candidate as well as to the test set employed, as illustrated in Figure 2. This provides a unified view of the past studies.

It does not, however, provide much help in comparing the different results. How should one understand the differences in test sets? The EXI tests, for example, correspond to a set of 18 use cases, including Metadata in broadcast systems, Floating point arrays in the energy industry, X3D graphics compression, serialization, and transmission, Web services for small devices, Web services within the enterprise, etc.
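To make the measurement and aggregation decisions described at the start of this subsection concrete, the following sketch computes and aggregates compression ratios over a tiny invented sample set, with a stock LZ-based compressor standing in for whichever encoding is under evaluation. It is not drawn from any of the cited studies; the sample documents, the grouping, and the choice of mean versus worst case are exactly the parameters a study must fix.

```python
# Aggregating a "compactness" measurement over a set of inputs.
# The sample documents are invented; gzip is only a stand-in encoder.
import gzip
from statistics import mean

samples = {
    "repetitive": b"<row><a>1</a><b>2</b></row>" * 200,
    "mixed":      b"<doc>" + bytes(range(256)) * 20 + b"</doc>",
}

def compression_ratio(data: bytes) -> float:
    return len(gzip.compress(data)) / len(data)   # lower means more compact

ratios = {name: compression_ratio(data) for name, data in samples.items()}
print({k: round(v, 3) for k, v in ratios.items()})
print("mean:", round(mean(ratios.values()), 3), "worst:", round(max(ratios.values()), 3))
```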
(Figure 2 depicts a fragment of the ontology: a Candidate hasFeature Compactness and hasTestResults Test Results; the Test Results were obtainedUsing a Test Set and indicate the Compactness feature.)
Fig. 2. The open-world character of an ontology allows us to add contextual information to a feature model
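A minimal rendering of the Fig. 2 fragment as explicit triples, to show the kind of trace a re-user could follow from a feature back to its evidence. The class and property names follow the figure; the candidate and test-set instances are invented.

```python
# The Fig. 2 fragment as (subject, predicate, object) triples.
# Instance names ("EXI_candidate", "exi_results_2009", ...) are invented.
triples = [
    ("EXI_candidate",    "hasFeature",     "Compactness"),
    ("EXI_candidate",    "hasTestResults", "exi_results_2009"),
    ("exi_results_2009", "obtainedUsing",  "W3C_EXI_test_corpus"),
    ("exi_results_2009", "indicates",      "Compactness"),
]

def related(subject, predicate):
    return [o for s, p, o in triples if s == subject and p == predicate]

# Trace back from a feature of interest to the evidence behind it:
for s, p, o in triples:
    if p == "indicates" and o == "Compactness":
        print(s, "was obtained using", related(s, "obtainedUsing"))
```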
We could add use cases as first-class objects and associate them with the corresponding test sets. Figure 3 illustrates part of the resulting ontology.
(Figure 3 extends the ontology with Use Case, Justification, Domain, Stakeholder, and Scenario classes, connected by relations such as supports, hasRationale, occursIn, describedBy, and involves; for example, a Test Set supports a Use Case.)
Fig. 3. The feature model can be enriched with an endless amount of contextual information
We can take this further by elaborating the Scenario class in Figure 3 with properties specifying the functions performed, their relative importance (e.g., required vs. desirable), and their performance attributes. But this does not necessarily help in comparing the results of the different studies. The Canterbury corpus, for example, is not defined in terms of use cases, and its classification of input files is quite unlike the EXI classification. They are really different ontologies. Simply aggregating them provides little insight into the significance of using one test corpus vs. another.

A similar issue arises in representing the test results. Some of the studies we consulted present a set of curves—one per candidate—graphing compressed size against original size. But Augeri et al average the results over all test sets and present additional analysis of the test set attributes that were contributing factors [22]. Should these distinctions be modeled in the Test Results class of Figure 2?

A further source of ambiguity is the instrumentation software itself. Different studies may yield different results in part because of different instrumentation software. A discussion in one of the EXI documents illustrates the issues:
"The results presented here come from a stored snapshot that has been judged sufficiently stable for making decisions. Even so, there is still some variation in the results. This is a concern especially with the C-based candidates that may exhibit enormous and inexplicable variance between different test documents. It is therefore likely that there are still some problems with how the test framework's Java code interacts with the C code of the candidates. Accordingly, the results from the C-based candidates were filtered more carefully, by eliminating results for individual documents that were obviously incorrect. In addition, the stableness review of the results paid careful attention to any improvements in these candidates in particular." [24]

We could address this by adding classes of instrumentation, linking them to the test results in a manner similar to Figure 2. But should we? There will always be important contextual information that has not been modeled, that could in principle be modeled, but that would be expensive to model. The alternative is to decorate the feature model with informal elaborations: commentary, explanation, and discussion. The relationships between the formal entities serve to coordinate these informal elaborations into a coherent story that answers questions of the form, Who, Why, What, and How.
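One plausible shape for such a decoration, sketched as a record that keeps the Who, Why, What, and How alongside the formal entity it elaborates. The record structure is our own illustration; the sample content paraphrases the EXI quotation above, and the entity identifier is invented.

```python
# Attaching an informal, situated story to a formal entity of the feature model.
# The record structure and identifiers are illustrative, not prescribed by the paper.
from dataclasses import dataclass

@dataclass
class Story:
    attached_to: str   # identifier of the formal entity being elaborated
    who: str
    why: str
    what: str
    how: str

stories = [
    Story(
        attached_to="exi_results_2009",
        who="W3C EXI working group",
        why="C-based candidates showed inexplicable variance across test documents",
        what="A stored snapshot of results judged stable enough for decision making",
        how="Obviously incorrect per-document results were filtered before review",
    ),
]

for s in stories:
    print(f"[{s.attached_to}] who={s.who!r}; why={s.why!r}")
```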
5 Elaborating the Feature Model with Stories
Deciding where to draw the line between formalization and informal elaboration must take account of the goals of the model. In our case study, the purpose of the elaborated feature model is to help us choose one of the alternatives listed in Table 2. So let us see how this plays out. Transmission speed being one of the criteria, we consult the Compactness feature, which is effectively equivalent to transmission speed. But Compactness cannot be summarized in a single value. Instead, from the Compactness feature of a given candidate, we can trace back along the indicates relationship in Figure 2 to see those test results that tell us something about this candidate’s level of success in compressing various inputs. There may be many such test results for this one candidate, including tests by different organizations or individuals, using different test sets, different methodologies, and different instrumentation. Each test result should point to this information. In the simplest case, a single candidate outperforms all the other candidates, i.e., it consistently produces the smallest output over all test sets in all of the recorded tests. In that case, we can easily say that this candidate scores highest with respect to transmission speed. But of course that is not likely. It is even less likely that we can also identify a clear second-place candidate, third-place candidate, etc., which we would like to do because transmission speed is only one of ten evaluation criteria. Instead, we have to start asking questions, the answers to which start to provide us with a coherent story. For example:
• Which test sets (if any) resemble our expected data?
• Are the results consistent for those test sets?
• Is the provenance of the test results credible?
If the answers to any of these questions are less than conclusive—for example, if we are uncertain whether any test set closely resembles our data, or the results are not consistent, or the provenance is doubtful—then we have to delve deeper, asking more questions:
• What was the context of a test?
• Who did the test, and what do we know of their agenda?
• Was the methodology rigorous?
• What is the source of inconsistency between this and the tests from other sources?
The answers to such questions may or may not be found in the elaborated feature model shown in Figure 2 and Figure 3, depending on how far one takes the model elaboration. The questions, however, are definitely not modeled explicitly.
[Figure 4 tags elements of the elaborated feature model with story-related links: the Feature is assessedBy an Evaluation, which is performedBy Stakeholders, motivatedBy Goals, employs a Method, and yields Test Results, with mayCauseRevisionOf links back to the Method and the Goals. The tags point to W3C documents including http://www.w3.org/TR/xbcproperties/#compactness, http://www.w3.org/TR/2009/WD-exi-evaluation-20090407/, http://www.w3.org/XML/EXI/public-status.html, http://www.w3.org/TR/2009/WD-exi-evaluation-20090407/#compactnessresults, http://www.w3.org/TR/xbcmeasurement/, and http://www.w3.org/2003/09/xmlap/xml-binary-wgcharter.html.]
Fig. 4. By incorporating story tags in the elaborated feature model we can guide the potential re-user directly to relevant information
This
illustrates a common problem with ontologies: they may be rich in information, but they tend not to be self-explanatory. Many ontologies suffer from an absence of documentation about why the information is there, and how to use it. This is where stories enter, because the emphasis in a story is on rationale. The persistent question, Why? is what distinguishes a story from a mere report. For example, rather than just telling us that a particular test set was modeled on the Canterbury corpus, it would tell us why the Canterbury corpus was used. Without knowing this we cannot fairly assess whether the test results are relevant to our problem. More generally, we would like the elaborated feature model to guide us to the following types of information:
• Who are the stakeholders?
• What are they trying to achieve? Why?
• Why is it non-trivial—what barriers must they overcome?
• What did they do to overcome these barriers?
• Were they successful? Why?
• Did some barriers remain, or new ones appear?
The last three questions may be iterated until, eventually, success (or, perhaps, failure) is declared. For example, the Compactness feature for EXI can point us to the URI for the EXI evaluation, which in turn points to earlier documents addressing the above questions. This is the skeletal structure of a story. But more than that is possible. The questions actually suggest an ontology of stories, consisting (for example) of actors, goals, barriers, actions, and rationales. Discussions about candidates for reuse can be marked up to indicate explicitly the actors, goals, barriers, etc. that have been involved in developing, using, and reusing the candidates. The feature model can use such markup to point us to the information we need. This is illustrated in Figure 4 for the Compactness feature of the EXI alternative.
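To suggest how such story markup might look, the sketch below encodes the story elements named above (actors, goals, barriers, actions, rationales) as plain data attached to a feature. The class names, element texts, and the helper method are illustrative assumptions, not part of the EXI documents or of any existing markup scheme.

from dataclasses import dataclass, field
from typing import List

@dataclass
class StoryElement:
    kind: str            # "actor", "goal", "barrier", "action", or "rationale"
    text: str
    source: str = ""     # URI of the discussion the markup points to, if any

@dataclass
class Story:
    feature: str
    elements: List[StoryElement] = field(default_factory=list)

    def rationales(self):
        # the persistent "Why?" that distinguishes a story from a mere report
        return [e for e in self.elements if e.kind == "rationale"]

# illustrative story markup for the Compactness feature of the EXI alternative
exi_story = Story("Compactness (EXI)", [
    StoryElement("actor", "Working group evaluating binary XML formats"),
    StoryElement("goal", "Reduce the transmitted size of structured data"),
    StoryElement("barrier", "Results vary with test sets and instrumentation"),
    StoryElement("rationale", "Corpus chosen for comparability with prior compression studies"),
])
print([e.text for e in exi_story.rationales()])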
6 Conclusion
Feature models are a powerful way to achieve reuse in well established domains, such as those admitting of a product line architecture. But much reuse occurs in situations where there is no unified domain model; rather, the candidates for reuse and the available knowledge about them have varied provenance. In such situations, features are less well-defined, and their meanings need to be elaborated through additional context. Stories, by definition, are situated in context and emphasize rationale. They therefore provide an effective way of enriching a feature model with the necessary context. We have illustrated this through a case study involving the transmission of structured data between two systems.
References
[1] Bailin, S.: Diagrams and design stories. Machine Graphics and Vision 12(1) (2003)
[2] Bailin, S., Moore, M., Bentz, R., Bewtra, M.: KAPTUR: Knowledge acquisition for preservation of tradeoffs and underlying rationales. In: Proceedings of the Fifth Knowledge-Based Software Assistant Conference. Rome Laboratory (September 1990)
[3] Kang, K., Cohen, S., Hess, J., Novak, W., Peterson, A.: Feature-Oriented Domain Analysis (FODA) Feasibility Study. Carnegie Mellon University Software Engineering Institute Technical Report CMU/SEI-90-TR-021
[4] van der Linden, F., Schmid, K., Rommes, E.: Software Product Lines in Action: The Best Industrial Practice in Product Line Engineering. Springer, Heidelberg (2007)
[5] Bailin, S.: Software reuse as ontology negotiation. In: Bosch, J., Krueger, C. (eds.) Software Reuse: Methods, Techniques, and Tools. LNCS, vol. 3107, pp. 242–253. Springer, Heidelberg (2004)
[6] Bailin, S., Simos, M., Levine, L., Creps, R.: Learning and Inquiry-Based Reuse Adoption (LIBRA). Wiley-IEEE Press (2000)
[7] Abstract Syntax Notation: http://www.asn1.org
[8] Sandoz, P., Triglia, A., Pericas-Geersten, S.: Fast Infoset, http://java.sun.com/developer/technicalArticles/xml/fastinfoset/
[9] Ziv, J., Lempel, A.: A Universal Algorithm for Sequential Data Compression. IEEE Transactions on Information Theory 23(3), 337–343 (1977)
[10] Efficient XML Interchange: http://www.w3.org/XML/EXI/
[11] Qing, X., Adams, C.: A comparison of compression techniques for XML-based security policies in mobile computing environments. In: Ottawa Workshop on New Challenges for Access Control (April 27, 2005)
[12] Bzip: http://www.bzip.org/
[13] WAP Binary XML: http://www.w3.org/TR/wbxml/
[14] XML-binary Optimized Packaging: http://www.w3.org/TR/xop10/
[15] SOAP Message Transmission Optimization Mechanism: http://www.w3.org/TR/soap12-mtom/
[16] XMill: an Efficient Compressor for XML, http://www.liefke.com/hartmut/xmill/xmill.html
[17] Bournez, C.: Efficient XML Interchange Evaluation, W3C Working Draft (April 7, 2009), http://www.w3.org/TR/exi-evaluation
[18] OASIS eXtensible Access Control Markup Language (XACML) TC, http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xacml
[19] Mundy, D.P., Chadwick, D., Smith, A.: Comparing the Performance of Abstract Syntax Notation One (ASN.1) vs. eXtensible Markup Language (XML). In: Terena Networking Conference 2008 / CARNet Users’ Conference 2008, Zagreb, Croatia (May 19-22, 2008)
[20] Cokus, M., Winkowski, D.: XML Sizing and Compression Study For Military Wireless Data, XML (2002)
[21] XML Binary Characterization Use Cases: http://www.w3.org/TR/2005/NOTE-xbc-use-cases-20050331/
[22] Augeri, C.J., Bulutoglu, D.A., Mullins, B.E., Baldwin, R.O.: An Analysis of XML Compression Efficiency. In: Proceedings of the 2007 Workshop on Experimental Computer Science, San Diego, California (2007)
[23] The Canterbury Corpus: http://corpus.canterbury.ac.nz/
[24] Efficient XML Interchange Measurements Note: http://www.w3.org/TR/2007/WD-exi-measurements-20070725/
An Optimization Strategy to Feature Models’ Verification by Eliminating Verification-Irrelevant Features and Constraints
Hua Yan, Wei Zhang, Haiyan Zhao, and Hong Mei
Key Laboratory of High Confidence Software Technology, Ministry of Education of China, Institute of Software, School of Electronics Engineering and Computer Science, Peking University, Beijing, 100871, China
{yanhua07,zhangw,zhhy}@sei.pku.edu.cn, [email protected]
Abstract. Feature models provide an effective approach to requirements reuse. One important problem related to feature models is the verification problem, which is NP-complete in theory. The existing approaches to feature models' verification mostly focus on how to automate the verification of feature models using third-party tools, while these tools are usually designed to resolve general kinds of problems. However, by simply using these third-party tools, large feature models still can hardly be verified within acceptable time. We argue that, to improve the efficiency of verification, the problem itself should first be optimized. In this paper, we propose an optimization strategy to feature models' verification, in which verification-irrelevant features and constraints are eliminated from feature models and the problem size of verification is therefore reduced. We prove the correctness of this strategy, and experiments show its effectiveness. Keywords: Feature model, Verification, Problem size, Reduction.
1 Introduction
In software reuse, feature models provide an effective approach to modeling and reusing requirements in a specific software domain. The modeling responsibility of a feature model is achieved by encapsulating requirements into a set of features and clarifying possible dependencies among features. The reusing responsibility of a feature model is carried out through customization - that is, selecting a subset of features from a feature model, while maintaining those constraint dependencies among features. One important problem related to feature models' customization is the verification problem. The verification of a feature model has two purposes. Before customization, the verification aims to find possible conflicts in constraints among features. After customization, the verification is intended to find possible conflicts in customizing decisions on a feature model. Based on the discovery that constraints among features can be formalized as propositional formulas [4,7,1], the verification of a feature model can be formalized correspondingly as a set of propositional satisfiability (SAT)
problems, which are NP-complete in general. Because of the NP-complete nature, the verification of a complex feature model inevitably suffers from the state space explosion problem. For example, for a feature model with 1000 features and many complex constraints, the state space of the verification problem is as large as 2^1000. Therefore, it is often infeasible to verify complex feature models without adopting any optimization strategy. In the existing research on feature models' verification, most work focuses on how to automate the verification of feature models using third-party tools (such as SAT solvers, CSP solvers, and model checkers). However, to the best of our knowledge, very little work focuses on how to optimize the verification of feature models based on the characteristics of feature models' verification (except some of our previous work [7,8]). Since those third-party tools are often designed to resolve general kinds of problems, it is impossible for them to incorporate any optimization strategy specific to feature models' verification. In this paper, we propose an optimization strategy to feature models' verification by eliminating verification-irrelevant features and constraints from feature models, and prove the correctness of this strategy. This strategy is developed based on the observation that most feature models contain some features and constraints that are irrelevant to feature models' verification (in the sense that these verification-irrelevant features and constraints can be safely removed without changing the results of feature models' verification), while the problem size of a feature model's verification is exponential in the number of features and constraints in the feature model. Therefore, eliminating verification-irrelevant features and constraints from feature models will reduce the problem size of verification, and alleviate the state space explosion problem. Experiments have shown that this strategy improves both the efficiency of and the capability for feature models' verification.
The rest of this paper is organized as follows. Based on some preliminary knowledge introduced in Section 2, Section 3 presents the optimization strategy to feature models' verification, and gives its correctness proof. Experiments and analysis are shown in Section 4. Related work is discussed in Section 5. Finally, Section 6 concludes this paper with a short summary.
2 Preliminary
In this section, we first give a notation for feature models with formal semantics of constraints among features, and clarify two kinds of constraint in feature models according to the source of constraints, and then introduce the criteria for feature models' verification, which are proposed in our previous research [7].
2.1 A Notation for Feature Models
The symbols in this notation are listed in Table 1 and explained from the viewpoint of customization. The formal definitions of constraints among features are given in Table 2 and Table 3.
Table 1. Symbols in the notation for feature models
- Mandatory feature (e.g., a feature named "X"): a mandatory feature must be selected in a customizing result if its parent feature is selected or it has no parent feature. If its parent is removed, it must be removed.
- Optional feature (e.g., a feature named "Y"): an optional feature can either be selected in or be removed from a customizing result if its parent feature is selected or it has no parent. If its parent is removed, it must be removed.
- Refinement relation between two features: a refinement connects two features. The feature connecting to the non-arrow end is called the parent of the other feature. A feature can have at most one parent feature.
- Requires constraint between two features: a requires constraint connects two features. The feature connecting to the non-arrow end is called the requirer, and the other the requiree. This constraint means that if the requirer is selected, the requiree must be selected.
- Excludes constraint between two features: this constraint means that the two features should not both be selected in the same customizing result.
- Binding predicate among a set of features and binding predicates: the left end connects to a composite constraint or to one of the right ends of another binding predicate; the right ends connect to a set of features or binding predicates. We define three types of binding predicate: and (denoted by ∧), or (denoted by ∨), and xor. See Table 2 for the formal definitions of binding predicates.
- Composite constraint between two binding predicates: we define two types of composite constraint: requires and excludes. See Table 3 for their formal definitions.
Table 2. The formal definitions of binding predicates. In this table, A and B denote features, and p and q denote binding predicates. For a feature F, bind(F) is a predicate; it is true if F is selected, false otherwise. In our notation, we only use binding predicates as constituent parts of the composite constraints, but not as individual constraints.
- and(A, …, B, …, p, …, q): bind(A) ∧ … ∧ ¬bind(B) ∧ … ∧ p ∧ … ∧ ¬q
- or(A, …, B, …, p, …, q): bind(A) ∨ … ∨ ¬bind(B) ∨ … ∨ p ∨ … ∨ ¬q
- xor(A, …, B, …, p, …, q): bind(A) ⊗ … ⊗ ¬bind(B) ⊗ … ⊗ p ⊗ … ⊗ ¬q
Table 3. The formal definitions of composite constraints. In this table, p and q denote binding predicates. In the situation that p and q only contain one feature, the two types of composite constraints become the requires and the excludes constraints between two features.
- requires(p, q): p → q
- excludes(p, q): p → ¬q
2.2 Implicit Constraints and Explicit Constraints
Constraints in feature models can be classified into two kinds according to their source. The first kind is the implicit constraints that are imposed by refinement relations between features. In feature models, each refinement relation implicitly imposes a requires constraint between the involved two features – that is, the child feature requires the parent feature. The second kind is the explicit constraints that are explicitly added into feature models by model constructors. Figure 1 shows a feature model example, in which (a) is the refinement view, consisting of features and refinements between features, and (b) is the constraint view, consisting of those explicit constraints added by model constructors. Figure 1(c) shows all the constraints in the form of propositional formulas. The constraints 1 to 6 are implicit constraints derived from the refinement view, and the constraints 7 and 8 are explicit constraints.
[Figure 1: (a) the refinement view shows feature A refined into B, C, D, and E, with C further refined into F and G; (b) the constraint view shows the explicit constraints among B, D, E, F, and G; (c) the propositional formula view lists the implicit constraints 1–6 (bind(B) → bind(A), bind(C) → bind(A), bind(D) → bind(A), bind(E) → bind(A), bind(F) → bind(C), bind(G) → bind(C)) and the explicit constraints 7 (relating B, D, and E to F and G) and 8 (relating B and G).]
Fig. 1. A feature model example with explicit and implicit constraints
2.3 Three Criteria for Feature Models' Verification
In our previous research [7], we proposed three criteria for feature models' verification. According to the deficiency framework for feature models [10], the three criteria can detect all the anomaly and inconsistency deficiencies. Due to space limitation, we just list the three criteria as follows without further explanation.
1. There exists at least one set of customizing decisions to all undecided features in a feature model that will not violate any constraints in the feature model.
2. Each undecided feature in a feature model has the chance to be selected without violating any constraints in the feature model.
3. Each undecided feature in a feature model has the chance to be removed without violating any constraints in the feature model.
More information about what kinds of deficiencies exist in feature models and how the three criteria detect these deficiencies can be found in [10] and [7].
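For concreteness, the three criteria amount to satisfiability questions over the feature-binding variables. The sketch below checks them by brute-force enumeration on a toy model; it is only an illustration under the assumption that constraints are given as Boolean predicates, whereas the verifier used later in the paper is BDD-based [8].

from itertools import product

# A brute-force check of the three verification criteria over a toy feature model.
def satisfiable(features, constraints, fixed=None):
    # true if some assignment of the undecided features satisfies all constraints
    fixed = fixed or {}
    undecided = [f for f in features if f not in fixed]
    for values in product([True, False], repeat=len(undecided)):
        bind = dict(fixed, **dict(zip(undecided, values)))
        if all(c(bind) for c in constraints):
            return True
    return False

def verify(features, constraints):
    c1 = satisfiable(features, constraints)                                      # criterion 1
    c2 = all(satisfiable(features, constraints, {f: True}) for f in features)    # criterion 2
    c3 = all(satisfiable(features, constraints, {f: False}) for f in features)   # criterion 3
    return c1 and c2 and c3

# example: B requires A, and B and G must not both be selected
features = ["A", "B", "G"]
constraints = [lambda b: (not b["B"]) or b["A"],     # bind(B) -> bind(A)
               lambda b: not (b["B"] and b["G"])]    # B excludes G
print(verify(features, constraints))  # True: every feature can still be selected or removed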
3 Eliminating Verification-Irrelevant Features and Constraints
In this section, we present the optimization strategy to feature model's verification, and give the correctness proof of this strategy.
3.1 The Optimization Strategy
During feature models' verification, one kind of error (called implicit constraint violations) can be easily detected and fixed first. This kind of error occurs when a child feature is selected while its parent is not. By checking the binding states of the two features in each refinement, implicit constraint violations can be detected. Then, these violations can be fixed by removing all the offspring features of the removed features and selecting all the ancestor features of the selected features. We have observed that certain features and constraints are irrelevant to the verification after handling implicit constraint violations. That is, these features and constraints can be safely eliminated without changing the verification results. Based on this observation, we develop an optimization strategy to feature models' verification. This strategy is based on the following two concepts.
Definition 1. Verification-irrelevant features. A feature is verification-irrelevant iff this feature does not appear in any explicit constraint, and none of its offspring features appears in any explicit constraint.
Definition 2. Verification-irrelevant constraints. A constraint is verification-irrelevant iff at least one feature involved in this constraint is verification-irrelevant.
Figure 2 shows the verification-irrelevant features and constraints in a feature model example. After eliminating these verification-irrelevant features and constraints, this feature model is reduced to the feature model in Figure 1.
[Figure 2: (a) the refinement view extends the model of Figure 1 with features I1 to I8 (I1 under B; I2 and I3 under E; I4, I5, and I6 under F; I7 and I8 under I2); (b) the constraint view shows the same explicit constraints as Figure 1; (c) the propositional formula view lists the implicit constraints 1–14 (each child feature requires its parent) and the explicit constraints 15 (relating B, D, and E to F and G) and 16 (relating B and G).]
Fig. 2. A feature model example. The verification-irrelevant features and constraints are shown in the grey area. There are 8 verification-irrelevant features (I1 to I8 in (a)) and 8 verification-irrelevant constraints (7 to 14 in (c))
Based on the above two definitions, the optimization strategy can be expressed as follows. Given a feature model without implicit constraint violations, eliminating those verification-irrelevant features and constraints from this feature model will not influence the verification result of this feature model.
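A minimal sketch of this reduction is given below, under assumed data structures (a parent-to-children map, explicit constraints given as the sets of features they mention, and implicit constraints as child-parent pairs). It is not the authors' implementation, but it reproduces the Figure 2 example, in which eliminating I1 to I8 yields the model of Figure 1.

# Reduce a feature model by removing verification-irrelevant features and constraints.
def reduce_feature_model(children, explicit, implicit):
    """children: dict mapping a feature to its child features
    explicit: list of sets, each holding the features mentioned by one explicit constraint
    implicit: list of (child, parent) pairs, one per refinement"""
    mentioned = set().union(*explicit) if explicit else set()

    def relevant(f):
        # a feature is relevant if it, or some offspring, appears in an explicit constraint
        return f in mentioned or any(relevant(c) for c in children.get(f, []))

    features = set(children) | {c for cs in children.values() for c in cs}
    irrelevant = {f for f in features if not relevant(f)}
    kept_features = features - irrelevant
    kept_implicit = [(c, p) for (c, p) in implicit
                     if c not in irrelevant and p not in irrelevant]
    return kept_features, kept_implicit

# The model of Figure 2: I1..I8 are irrelevant, so the reduction yields the model of Figure 1.
children = {"A": ["B", "C", "D", "E"], "C": ["F", "G"], "B": ["I1"],
            "E": ["I2", "I3"], "F": ["I4", "I5", "I6"], "I2": ["I7", "I8"]}
explicit = [{"B", "D", "E", "F", "G"}, {"B", "G"}]
implicit = [(c, p) for p, cs in children.items() for c in cs]
kept, kept_implicit = reduce_feature_model(children, explicit, implicit)
print(sorted(kept))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']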
Following this strategy, we can deduce that the feature model in Figure 2 and the one in Figure 1 are equivalent from the verification point of view if the feature model in Figure 2 contains no implicit constraint violations.
3.2 The Correctness Proof
We prove the correctness of this strategy by two theorems. In the first theorem, we show that, after handling implicit constraint violations, if the reduced feature model satisfies the three verification criteria, the original feature model will also satisfy these criteria. To prove this theorem, three lemmas are first introduced. In the second theorem, we show that, if the original feature model satisfies the three verification criteria, the reduced feature model will also satisfy them. Based on the two theorems, the correctness of this strategy can be naturally deduced.
In the following, we use two pairs <F, C> and <F′, C′> to denote the original feature model and the reduced feature model, respectively. In the two pairs, F and F′ denote the feature sets of the two feature models, and C and C′ denote the constraint sets. In addition, we use <F, C−C′> to denote the feature model consisting of the feature set F and the set of verification-irrelevant constraints C−C′ that are removed from the original feature model. The three verification criteria (see Section 2.3) on a feature model x are denoted by VerifyC1(x), VerifyC2(x) and VerifyC3(x), respectively. The conjunction of the three criteria is denoted by Verify(x).
Lemma 1. ├ (VerifyC1(<F′, C′>) ∧ VerifyC1(<F, C−C′>)) → VerifyC1(<F, C>)
Proof: After handling implicit constraint violations, when VerifyC1(<F′, C′>) is true, the assignment of F′ can be expanded to an assignment of F by giving each undecided eliminated feature a removing customizing decision, which will not cause any conflict.
Lemma 2. ├ (VerifyC2(<F′, C′>) ∧ VerifyC2(<F, C−C′>)) → VerifyC2(<F, C>)
Proof: Model reusers can select all the undecided eliminated features without violating any constraint.
Lemma 3. ├ (VerifyC3(<F′, C′>) ∧ VerifyC3(<F, C−C′>)) → VerifyC3(<F, C>)
Proof: Model reusers can remove all the undecided eliminated features without violating any constraint.
Theorem 1. Verify(<F, C−C′>) ├ Verify(<F′, C′>) → Verify(<F, C>)
Proof: From Lemma 1, Lemma 2 and Lemma 3 we can deduce that
├ (Verify(<F′, C′>) ∧ Verify(<F, C−C′>)) → Verify(<F, C>).
Thus, Verify(<F, C−C′>) ├ (Verify(<F′, C′>) ∧ Verify(<F, C−C′>)) → Verify(<F, C>), and therefore Verify(<F, C−C′>) ├ Verify(<F′, C′>) → Verify(<F, C>).
Theorem 2. ├ Verify(<F, C>) → Verify(<F′, C′>)
Proof: Since F′ ⊆ F and C′ ⊆ C, ├ Verify(<F, C>) → Verify(<F′, C′>).
Corollary. Verify(<F, C−C′>) ├ Verify(<F, C>) ↔ Verify(<F′, C′>)
4 Experiments
In this section, we first introduce an algorithm of generating random feature models, and then apply our optimization strategy to the verification of a set of randomly generated feature models to demonstrate the effectiveness of this strategy.
4.1 An Algorithm of Generating Random Feature Models as Test Cases
To provide test cases for our experiments, we design an algorithm of generating random feature models (see Algorithm 1).
Algorithm 1. An algorithm of generating random feature models. In this algorithm, GetRandomInt(n) is a function that returns a random integer from 0 to n.
Input:
  fN  : The number of features of the feature model.
  cN  : The number of explicit constraints of the feature model.
  mNC : The maximum number of features in the constraints.
  mW  : The maximum number of children of a feature. Its default value is 10.
  p   : The percentage of verification-irrelevant features in the feature model. Its default value is -1, which means that the percentage of verification-irrelevant features is random.
Output: The generated random feature model
generate_random_fm(int fN, int cN, int mNC, int mW=10, int p=-1){
  FeatureModel fm = new FeatureModel();
  Queue queue = new Queue();
  Feature f = new Feature();
  queue.push(f);
  int counter = 1;
  while(!queue.isEmpty()){
    Feature parent = queue.pop();
    for(int i = 0; i < GetRandomInt(mW); i++){
      Feature child = new Feature();
      parent.addChild(child);
      fm.featureSet.add(child);
      queue.push(child);   // enqueue the child so the refinement tree keeps growing
      counter++;
      if (p != -1 && counter <= (1-p)*fN){
        Constraint constraint = new Constraint(child, parent, "requires");
        fm.addExplicitConstraint(constraint);
      }
      if (counter == fN) break L;
    }
  }
  L: for(int i = 0; i < cN; i++){
    Set source = new Set(), target = new Set();
    for(int i = 0; i
It should be noticed that in a feature model generated by Algorithm 1, all features are optional. This is because, in our experiments, we assume feature models have been optimized by the atomic-set technique proposed in our previous research [7]. By applying this technique, mandatory features can be eliminated from feature models.
4.2 Experiments and Analysis
To make the experiments reflect the effectiveness of our strategy for feature models with different complexity, we generate three groups of feature models by varying the parameters of Algorithm 1. We use a BDD-based feature models' verifier [8] to verify the optimized feature models. The environment for our experiments is a computer with an Intel Core DUO 2.66GHz CPU, 2GB of memory and a Windows XP OS.
[Figure 3: verification time in seconds, with and without eliminating verification-irrelevant features and constraints, plotted against the percentage of verification-irrelevant features.]
Fig. 3. Experiment results of the first group of test cases. This group has 9 feature models. All of them contain 500 features and 50 explicit constraints, and the percentage of verification-irrelevant features varies from 0% to 80% with an increment of 10%.
[Figure 4: verification time in seconds, with and without eliminating verification-irrelevant features and constraints, plotted against the number of features/number of explicit constraints (100/10 to 700/70); the +∞ marks indicate runs that could not be completed.]
Fig. 4. Experiment results of the second group of test cases. This group has 7 feature models. The number of features in each test case varies from 100 to 700 with an increment of 100, the number of explicit constraints in each test case varies from 10 to 70 with an increment of 10, and the percentage of verification-irrelevant features is random.
Figure 3 shows the experiment results of the first group, from which we can see that our strategy improves the efficiency of verification for this group of test cases. Moreover, our strategy becomes more effective as the percentage of verification-irrelevant features increases. Figure 4 shows the experiment results of the second group. We can see that although both the solid curve and the dashed curve rise as the number of features and explicit constraints increases, the growth rate of the solid curve is lower than that of the dashed curve, which means that our strategy decreases the time for feature models' verification. The experiment results also show that, for this group of test cases, our strategy increases the capability for feature models' verification.
[Figure 5: verification time in seconds, with and without eliminating verification-irrelevant features and constraints, plotted against the number of features/number of explicit constraints (100/20 to 1900/56); the +∞ marks indicate runs that could not be completed.]
Fig. 5. Experiment results of the third group of test cases. This group has 19 feature models. The number of features in each test case varies from 100 to 1900 with an increment of 100. The number of explicit constraints in each test case varies from 20 to 56 with an incremental change of 2. The percentage of verification-irrelevant features is random.
Figure 5 shows the experiment results of the third group. We can see that, by applying our strategy, a feature model that contains 1500 features can be verified within 10 seconds. For this group of test cases, this strategy improves both the capability for and the efficiency of feature models' verification.
5 Related Work
The verification problem of feature models has been noticed since feature models were first proposed [3]. Existing research on feature models' verification can be classified into three categories: specification and formalization of verification criteria, formalization of feature models, and automation of verification. In the research on specification and formalization of verification criteria, von der Maßen and Lichter [10] proposed a deficiency framework of feature models. In our previous work [7], we proposed three formalized verification criteria. Our investigation [9] shows that the three criteria can detect all kinds of anomaly and inconsistency deficiency in von der Maßen and Lichter's deficiency framework. In the research on formalization of feature models, Mannion [4] proposed a first-order logic based method for the
formalization of feature models' constraints. In [7], we classified constraints in feature models into several types, and clarified their formal semantics based on propositional logic. In the research on automation of verification, several verification methods using third-party tools have been proposed. For example, Batory proposed an LTMS-based method [1]. In Czarnecki's research [2], a commercial BDD package is used. White et al. proposed a CSP-solver-based approach [6]. However, little existing research has addressed how to optimize feature models' verification at the problem level. One exception is the atomic-set technique proposed in our previous work [7]. Segura gave a quantitative analysis of the effectiveness of the atomic-set technique [5]. The strategy proposed in this paper can be integrated with the atomic-set technique through sequential composition. That is, a feature model can be first reduced by the atomic-set technique, and then be further reduced through the strategy in this paper. Furthermore, these two optimization techniques can also be seamlessly integrated with the existing approaches to feature models' verification by equipping these approaches with an optimization preprocessing step.
6 Conclusions
In this paper, we proposed an optimization strategy to feature models' verification, and proved the correctness of this strategy. This strategy provides a way to eliminate features and constraints that are irrelevant to feature models' verification. Experiment results demonstrate that by applying this strategy, the verification efficiency of feature models is improved, and the verification capability is enhanced as well.
Acknowledgments. This work is supported by the National Grand Fundamental Research 973 Program of China under Grant No. 2009CB320701, the Science Fund for Creative Research Groups of China under Grant No. 60821003, the Hi-Tech Research and Development Program of China under Grant No. 2006AA01Z156, and the Natural Science Foundation of China under Grant No. 60703065 and 60873059.
References
1. Batory, D.: Feature Models, Grammars, and Propositional Formulas. In: Obbink, H., Pohl, K. (eds.) SPLC 2005. LNCS, vol. 3714, pp. 7–20. Springer, Heidelberg (2005)
2. Czarnecki, K., Kim, C.H.P.: Cardinality-Based Feature Modeling and Constraints: A Progress Report. In: OOPSLA 2005 International Workshop on Software Factories (2005)
3. Kang, K.C., Cohen, S.G., Hess, J.A., Novak, W.E., Peterson, A.S.: Feature-Oriented Domain Analysis Feasibility Study. Technical report, Software Engineering Institute, Carnegie Mellon University (1990)
4. Mannion, M.: Using First-Order Logic for Product Line Model Validation. In: Chastek, G.J. (ed.) SPLC 2002. LNCS, vol. 2379, pp. 176–187. Springer, Heidelberg (2002)
5. Segura, S.: Automated Analysis of Feature Models using Atomic Sets. In: Workshop on Analyses of Software Product Lines, in Conjunction with the 12th Software Product Line Conference (2008)
6. White, J., Benavides, D., Schmidt, D.C., Trinidad, P., Ruiz-Cortés, A.: Automated diagnosis of product-line configuration errors in feature models. In: Proceedings of the 12th Software Product Line Conference (2008)
7. Zhang, W., Zhao, H., Mei, H.: A Propositional Logic-Based Method for Verification of Feature Models. In: Proceedings of 6th International Conference on Formal Engineering Methods, pp. 115–130 (2004)
8. Zhang, W., Yan, H., Zhao, H., Jin, Z.: A BDD-Based Approach to Verifying Clone-Enabled Feature Models' Constraints and Customization. In: Mei, H. (ed.) ICSR 2008. LNCS, vol. 5030, pp. 186–199. Springer, Heidelberg (2008)
9. Zhang, W., Mei, H., Zhao, H.: Feature-Driven Requirements Dependency Analysis and High-Level Software Design. Requirements Engineering Journal 11(3), 205–220 (2006)
10. von der Maßen, T., Lichter, H.: Deficiencies in feature models. In: Workshop on Software Variability Management for Product Derivation, in Conjunction with the 3rd Software Product Line Conference (2004)
Reusable Model-Based Testing
Erika Mir Olimpiew and Hassan Gomaa
Department of Computer Science, George Mason University, Fairfax, VA
[email protected], [email protected]
Abstract. A reusable model-based testing method for software product lines (SPL) is used to create test specifications from use case and feature models, which can then be configured to test individual applications that are members of the SPL. This paper describes a feature-oriented model-based testing method for SPLs that can be used to reduce the number of reusable test specifications created to cover all use case scenarios, all features, and selected feature combinations of a SPL. These test specifications can be automatically selected and configured during feature-based test derivation to test a given application derived from the SPL. This paper also addresses what application configurations to test and how to configure test specifications for these applications. This model-based testing method was applied and evaluated on two SPL case studies. Keywords: Reuse, model-based testing, requirements, software product lines, feature model, use case model, activity diagrams, decision tables.
1 Introduction
Reusable software requirement models help to minimize redundancy and inconsistency in an application's requirements. Reusable software requirement models are used in a Software Product Line to proactively design a family of applications with similar characteristics, in order to reuse common features across the members of a SPL and also to distinguish between the features, or requirements, that differentiate these applications [1]. Reusable requirements model-based testing describes how to create reusable test specifications from an application's requirement models, such as use case and feature models. In a SPL, these concepts are extended to enable the systematic reuse of test specifications across the members of the SPL. Several use case-based approaches have been developed to systematically create reusable test specifications for a SPL [2, 3, 4]. For some evolving and customizable SPLs, which have a large number of features and possible application configurations, model-based testing for SPLs describes the combinations of features that should be selected to describe a set of application configurations to test [5, 6]. For example, a family of mobile phone applications can have several optional features, such as call waiting and text messaging, which can be individually selected and configured on-demand for a customer. A possible test strategy would be to select a set of application configurations that covers all combinations of
optional features with basic phone capabilities. However, in practice, it may be impractical to test all combinations of features. This paper builds on previous research by describing a comprehensive model-based testing method for SPLs, which combines feature-oriented testing with a use case-oriented test method to derive reusable test specifications. The test coverage criterion for this method is to cover all use-case scenarios and all features, as well as selected feature combinations including all feature dependencies and feature interactions. This paper expands on earlier research [7]. During application derivation, the reusable test specifications are customized for each application configuration. Applying this model-based testing method can reduce the number of reusable test specifications that need to be created to cover all use case scenarios, all features, and selected feature combinations of a SPL. The paper is organized as follows. Section 2 describes related work; Section 3 describes CADeT, a feature-oriented model-based testing method for SPLs. Section 4 describes the results of validating the testing method. Section 5 contains the conclusions and further study.
2 Related Research
2.1 Software Product Lines
Software Product Line development consists of SPL engineering and application engineering (Fig. 1). Model-based SPL engineering consists of the development of requirements, analysis and design models for a family of systems that comprise the application domain. During application engineering, an application configuration is derived from the SPL, which includes all the common features and selected optional and alternative features. The requirements, analysis, and design models, as well as component implementations, are also customized based on the features selected for that application. Any unsatisfied requirements, errors and adaptations are addressed iteratively in SPL engineering. Product Line UML-Based Software Engineering (PLUS) is a feature-based multiple-view modeling design method for software product lines. In the Requirements phase of SPL engineering, PLUS uses feature modeling to model variability and use case modeling to describe the SPL functional requirements [1]. The emphasis in feature modeling is in characterizing the SPL variability, as given by optional and alternative features, since these features differentiate one member of the family from the others. Use cases, on the other hand, are a means of describing the functional requirements of an SPL. The relationship between features and use cases is explicitly modeled by a feature/use case dependency table. A feature model [8] distinguishes between the members of the SPL in terms of their commonality (expressed as common features) and variability (optional and alternative features), feature dependencies, and feature groups that define constraints in combining features.
[Figure 1: Software Product Line Engineering develops the Product Line Requirements and Analysis Models, the Product Line Software Architecture, and Customizable Test Models, which are stored in a Software Product Line Repository; Software Application Engineering takes Application Requirements from the Customer and, with the Application Engineer, derives an Executable Application; Unsatisfied Requirements, Errors, and Adaptations are fed back to SPL engineering.]
Fig. 1. SPL development processes
2.2 Model-Based Testing
Model-based testing creates test specifications from formal or semi-formal software models of a software application. Use case-based testing methods have been extended for a SPL in [2, 3, 4]. Reuys et al. [4] developed the ScenTED technique (Scenario-based TEst case Derivation), where test specifications are traced from the activity diagrams of a SPL to satisfy the branch coverage testing criterion, and then customized for an application derived from the SPL. Other software models, such as a decision tree, have also been used to apply systematic reuse to the test assets of a SPL. Geppert et al. [9] re-engineer a legacy system test suite and then use a decision tree to configure this test suite for an application of the SPL. McGregor first introduced the problem of selecting representative application configurations to test from the potentially large configuration space of an SPL using combinatorial testing techniques [5]. Scheidemann [6] describes a method of selecting a minimal set of representative configurations, such that successful verification of this set implies the correctness of the entire SPL.
3 The CADeT Approach
Customizable Activity Diagrams, Decision Tables and Test Specifications (CADeT) is a feature-oriented model-based functional test design method for SPLs. CADeT is incorporated within the PLUS method described above. CADeT can also be integrated with other SPL development methods that use both feature and use case models to describe the SPL requirements. CADeT extends PLUS to create activity diagrams and decision tables from the use case and feature models of a SPL. The decision tables are used to generate reusable, functional system test specifications for a SPL. Fig. 2 shows how CADeT (shaded in gray) is integrated with the PLUS method [1]. During SPL engineering, PLUS is used to create feature and use case requirements models, and CADeT is used to develop customizable activity diagrams, decision tables, and test specifications from these
models. During application engineering, PLUS is used to derive one or more applications from the SPL, and CADeT is used to select and customize the test specifications for each application. Each derived application is then tested using the test specifications derived using the same feature model.
Fig. 2. Impact of CADeT on PLUS
In Fig. 2, SPL Test Modeling consists of creating activity diagrams from use cases to provide greater precision in the use case descriptions, creating decision tables to formalize the test specifications, and defining a feature-based test plan that provides test coverage of all use case scenarios, all features, and selected feature combinations of a SPL. Feature-based Test Derivation consists of deriving the test specifications for the derived application, selecting the test data, and testing the application. The following sections describe these steps in more detail.
3.1 SPL Engineering: Create Activity Diagrams
Functional models, such as activity diagrams, can be used to make the sequencing of activities in a use case description more precise for analysis and testing. Decision points identify where alternative scenarios diverge from the main scenario. An activity diagram is developed from each use case description in the use case model, and then activities in the activity diagrams are associated with the features in the feature model. An activity node can be used to represent different granularities of functional variations, ranging from a fine granularity of functional variation, such as a parameter in a use case step, to a coarse granularity of functional variation, such as a set of use cases. Further, fine-grained functional variations tend to be dispersed and repeated across the use case activity diagrams of a SPL, while coarse-grained variations are represented by an entire use case activity diagram, or a group of use case activity diagrams. Thus, managing fine-grained variability requires more sophisticated techniques to group related functions and to minimize redundancy. CADeT uses UML stereotypes to distinguish between different granularities of functional variability in the activity
diagrams of a SPL, and feature conditions to relate this variability to features in the feature model. Some of the following stereotypes are used in an activity node to distinguish between different levels of functional abstraction:
• A «use case» activity node, which describes a use case.
• An «extension use case» activity node, which describes an extension use case.
• An «input step», which describes an input event from an actor to an application in a use case description.
• An «output step», which describes an output event from an application to an actor in a use case description.
The feature to use case relationship table of PLUS [1] is used together with reuse stereotypes and feature conditions of CADeT to analyze the impact of common, optional, and alternative features on the activity diagrams. A feature to use case relationship table associates a feature with one or more use cases or variation points, where a variation point identifies one or more locations of change in the use cases. A reuse stereotype is a UML notation that classifies a modeling element in a SPL by its reuse properties [1]. CADeT reuse stereotypes are applied to activity nodes rather than decision nodes as in [4], since activity nodes can be abstracted or decomposed to represent different levels of functional granularity. CADeT contains the following reuse stereotypes to describe how an activity node is reused in the applications derived from the SPL:
• A «kernel» activity node, which corresponds to a «common» feature in the feature model.
• An «optional» activity node, which corresponds to an «optional» feature in the feature model.
• A «variant» activity node, which corresponds to an «alternative» feature in the feature model.
• An «adaptable» activity node, which identifies an activity node that is associated with a use case variation point.
Besides reuse stereotypes, feature conditions are added to associate the variability in the control flow of an activity diagram with a feature in a feature model. The values of a feature condition represent possible feature selections. An optional feature in the feature model is associated with a Boolean feature condition with two possible values. An alternative feature in the feature model is associated with a feature condition that represents its feature group, where each alternative is a possible value for that feature condition. Setting the value of a feature condition enables or disables the activities associated with the feature in the activity diagram of an application derived from the SPL.
Banking System SPL example
A Banking System SPL provides ATM services and optional online services to its customers. The banking system SPL can be used to derive a configuration of a banking application that provides ATM services only, or an application that provides both ATM and online services. In addition, a banking application can be configured for different languages, pin formats, and maximum number of pin attempts (Fig. 3).
Fig. 3. Banking System SPL feature model
[Figure 4 shows the Validate Pin activity diagram: 1 Display welcome message (vpLanguage, vpGreeting); 2 Insert card; 3 Process pin input, comprising 3.1 Prompt for pin (vpLanguage), 3.2 Enter pin (vpPinFormat), and 3.3 Display error message and reprompt (vpLanguage); 4 Prompt for transaction type (vpLanguage); 5 Display max attempts error (vpLanguage); 6 Eject card. Guards on ValidPin, numAttempts, and maxAttempts select among steps 3.3, 4, and 5/6, and local pre/postconditions reference ATM.state = Idle, WaitingForPin, and WaitingForTransaction.]
Fig. 4. Activity diagram for Validate Pin kernel use case
Table 1 shows the feature conditions and feature selections associated with features of the Banking System SPL. Fig. 4 shows a simplified activity diagram created from the Validate Pin kernel use case description. The Spanish, French and English language features correspond to the vpLanguage variation point in the feature to use case relationship table, which impacts all output steps in the activity diagram of Fig. 4. Each of these activity nodes is stereotyped as «adaptable», and each references
another sub-activity diagram that shows the feature conditions, feature condition values, and variant display prompts associated with the Spanish, French and English language features (Fig. 5).
Fig. 5. A sub-activity diagram for a display prompt adaptable output step
Table 1. Feature conditions of Banking System SPL
- BankingSystemKernel: T
- onlineBanking: {T, F}
- Language: {English, Spanish, French}
- pinFormat: [3..10]
- maxAttempts: [1..5]
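As an illustration of how such feature conditions drive customization, the sketch below enables or disables adaptable steps according to the feature selections made for one derived application. The step names, the attribute layout, and the "Online transfer menu" step are hypothetical; the real selection is performed by CADeT's tooling rather than by hand.

# Feature selections for one derived application, following Table 1.
feature_conditions = {
    "BankingSystemKernel": True,
    "onlineBanking": False,
    "Language": "Spanish",
    "pinFormat": 4,
    "maxAttempts": 3,
}

# Each adaptable step names the feature condition it depends on and the values under
# which it is included (None means "any selected value"). The steps are illustrative.
adaptable_steps = [
    {"step": "3.1 Prompt for pin", "condition": "Language", "values": None},
    {"step": "Online transfer menu", "condition": "onlineBanking", "values": [True]},
]

def included(step, selections):
    value = selections[step["condition"]]
    if isinstance(value, bool) and not value:
        return False                       # optional feature was not selected
    return step["values"] is None or value in step["values"]

for step in adaptable_steps:
    print(step["step"], "->", "included" if included(step, feature_conditions) else "excluded")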
3.2 SPL Engineering: Create Decision Tables and Test Specifications
In CADeT, decision tables are used to represent and organize the associations between features and test specifications created to cover the use case scenarios in a SPL. A decision table is created from each activity diagram in the SPL so that test specifications can be associated with each use case scenario. The precondition, feature conditions, execution conditions, postconditions, and activity nodes of an activity diagram are mapped to condition rows in the decision table. Simple paths are traced from an activity diagram for each use case scenario and then mapped to a reusable test specification in a column in the decision table. A simple path is a sequence of unique, non-repeating activity nodes traced from an activity diagram. A simple path begins at a precondition or postcondition, and ends at the next precondition or postcondition state in the activity diagram. Each simple path is mapped to a reusable test specification in a column in the decision table. A feature can be associated with a test specification created for a use case scenario, which represents a unit of coarse-grained functionality, or a feature can be associated with a variation point in that test specification, which represents a unit of fine-grained functionality. As in an activity diagram, a variation point in a test specification in a decision table is represented using the «adaptable» stereotype.
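The path-tracing step can be sketched as a small graph search, shown below for the Validate Pin diagram. The node names, the encoding of preconditions and postconditions as boundary nodes, and the distinct boundary node after the re-prompt step are assumptions made to keep the example acyclic; they are not part of CADeT itself.

# Trace simple paths between boundary (pre/postcondition) nodes of an activity graph.
edges = {
    "pre:Idle": ["1"], "1": ["2"], "2": ["post:WaitingForPin"],
    "pre:WaitingForPin": ["3.1"], "3.1": ["3.2"],
    "3.2": ["4", "3.3", "5"],               # valid pin / retry / too many attempts
    "4": ["post:WaitingForTransaction"], "3.3": ["post:Reprompt"],
    "5": ["6"], "6": ["post:CardEjected"],
}
boundary = lambda n: n.startswith(("pre:", "post:"))

def simple_paths(node, path=()):
    # a simple path visits no node twice and stops at the next boundary node
    path = path + (node,)
    if boundary(node) and len(path) > 1:
        yield path
        return
    for nxt in edges.get(node, []):
        if nxt not in path:
            yield from simple_paths(nxt, path)

for start in ["pre:Idle", "pre:WaitingForPin"]:
    for p in simple_paths(start):
        print([n for n in p if not boundary(n)])
# prints the four columns of the decision table:
# ['1', '2'], ['3.1', '3.2', '4'], ['3.1', '3.2', '3.3'], ['3.1', '3.2', '5', '6']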
CADeT distinguishes between the binding times of coarse-grained functional variability (feature to test specification) and fine-grained variability (feature to variation point). The feature selections of feature conditions associated with test specifications are bound during SPL engineering, while the feature selections of feature conditions associated with variation points are bound during feature-based test derivation. Delaying the binding of the fine-grained variability improves the reusability of the test specifications by reducing the number of test specifications that need to be created and maintained for a SPL. The simplified decision table for the Validate Pin use case contains four test specifications that correspond to simple paths (columns) traced from the activity diagram in Fig. 4 for three use case scenarios: an initialization sequence {1, 2}; a valid pin scenario {3.1, 3.2, 4}; an invalid pin < max pin attempts scenario {3.1, 3.2, 3.3}; and an invalid pin >= max pin attempts scenario {3.1, 3.2, 5, 6}. The numbers correspond to the numbers of the activity nodes in Fig. 4, and each simple path begins at a precondition/postcondition and ends at another precondition/postcondition. This allows simple paths to be concatenated to form more complex test sequences. Further, each simple path contains customizable «adaptable» "Display" or "Prompt" nodes, which are impacted by the alternative English, French, and Spanish language features. These nodes will be customized during application engineering when one of the language features is selected for an application derived from the SPL.
3.3 SPL Engineering: Define Feature-Based Test Plan
Next, a test plan is created to describe a set of application configurations to test that will cover all features, selected feature combinations and all use case scenarios of a SPL. First, the feature model is analyzed to limit the number of application configurations to test. Feature dependencies, such as one feature requires another feature, must be tested together. Feature grouping constraints, such as mutually exclusive group, limit the number of possible feature combinations in a SPL. Parameterized features describe a range of values, which must be defined during application derivation. The boundary-value test selection criterion can be applied to select discrete values for the parameterized features of a SPL. A feature interaction is a functional behavior that is enabled for a feature combination selected for an application derived from the SPL, but that is not enabled when any feature of the combination is selected separately. In an activity diagram, a feature interaction is represented as an activity or data value that is enabled by a combination of two or more features, but is not enabled when any feature of the combination is selected separately. Test specifications correspond to simple paths in an activity diagram, which can be analyzed to detect feature interactions. A feature interaction exists in a test specification if that specification is impacted by more than one feature, and if the combination of features enables a test step that is not enabled when any feature of the combination is selected separately. Combinatorial testing techniques for single applications [10] are extended for SPLs by applying the notion of a configuration parameter with possible parameter values to a feature condition with possible feature selections.
The largest number of feature combinations to test is used to determine a minimum n-way feature-based combinatorial coverage criterion for that SPL, where an n-way combinatorial coverage criterion covers combinations of at most n features.
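As an illustration of such an n-way criterion, the sketch below greedily selects configurations until every 2-way combination of feature-condition values is covered. The feature conditions and the boundary values are borrowed from the Banking System example, and the greedy loop is only a stand-in for dedicated combinatorial test generators such as those cited in [10].

from itertools import combinations, product

# Feature conditions and their selectable values (boundary values for maxAttempts).
conditions = {
    "onlineBanking": [True, False],
    "Language": ["English", "Spanish", "French"],
    "maxAttempts": [1, 5],
}

names = list(conditions)
pairs = {((a, va), (b, vb))
         for a, b in combinations(names, 2)
         for va in conditions[a] for vb in conditions[b]}

all_configs = [dict(zip(names, vals)) for vals in product(*conditions.values())]
configs, covered = [], set()
while covered != pairs:
    # greedily pick the configuration that covers the most uncovered pairs
    best = max(all_configs, key=lambda c: len({((a, c[a]), (b, c[b]))
                                               for a, b in combinations(names, 2)} - covered))
    configs.append(best)
    covered |= {((a, best[a]), (b, best[b])) for a, b in combinations(names, 2)}

print(len(all_configs), "possible configurations;", len(configs), "cover all 2-way combinations")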
An example of a feature-based test plan is shown for the simplified Banking System SPL in Fig. 3. The SPL in Fig. 3 has a total of 7 features: one common feature, the Banking System Kernel; one optional Online Banking feature; three alternative features, English, Spanish, and French, which are part of an exactly-one-of Language feature group; and two parameterized features, Pin Format and Pin Attempts. The total number of possible feature combinations is 240 = 1 x 2 x 3 x 8 x 5. All test specifications created for the use cases associated with the Banking System Kernel and optional Online Banking feature are impacted by the language features. Each combination of a language feature with the Banking System Kernel and optional Online Banking feature enables language-specific display prompts and output test steps for that feature. Thus, at least a 2-way combinatorial testing criterion is needed to generate a set of applications to test that will cover all 2-way feature combinations.
3.4 Application Engineering: Derivation of Test Specifications
Next, the test specifications of the SPL are customized for each representative application using several custom tools. A test generator tool is used to select and configure the test specifications based on the feature selections for an application configuration, and create a test specification document that contains the test specifications for the application. A test procedure tool is used to read the same feature selections, and create a test execution graph that describes the possible sequences in which the test specifications can be executed for an application derived from the SPL. Finally, a system test generator tool is used to generate a system tests document for the application that is used by the test engineer to define input data values and expected output values for the system tests, execute the system tests against the application, and verify whether the observed test results match the expected output values.
4 Validation

CADeT was used to create functional models and customizable test specifications for two SPL case studies: a Banking System SPL and an Automated Highway Toll System (AHTS) SPL. A set of applications to test was selected for each SPL, and then the decision tables and test specifications were customized for each application configuration of each SPL. The largest number of features in a feature interaction in any test specification in either case study was two. Table 2 shows that a 2-way combinatorial feature coverage criterion was applied to create 13 application configurations for the Banking System SPL and 8 for the AHTS SPL, compared with 864 and 224 possible application configurations, respectively.

Table 2. Number of configurations and test specifications for each case study
                     #Features  All combinations  2-way combinations  #Use case scenarios  #Test specifications  No reuse  Some reuse
Banking System SPL   12         864               13                  21                   23                    299       127
AHTS SPL             16         224                8                  28                   30                    149        52
Table 2 also shows that CADeT was applied to create 23 and 30 reusable test specifications for the two case studies. With no reuse, 299 and 149 test specifications would have been needed to cover the 13 and 8 application configurations. With copy-and-paste reuse (some reuse), these numbers would have been reduced to 127 and 52, still considerably more test specifications than with CADeT. Thus, using CADeT required considerably fewer test specifications to cover all use case scenarios and 2-way feature combinations in each case study.
5 Conclusions

This paper has described CADeT, a comprehensive model-based testing method for SPLs that combines feature-oriented and use case-based functional testing approaches for SPLs. Reusable test specifications are created from SPL models to cover all use case scenarios, and a set of application configurations is generated to cover selected feature combinations. Then, the test specifications of the SPL test suite are customized during feature-based test derivation for each application configuration. This method was evaluated on two case studies, and the results of each case study showed that the method could be used to substantially reduce the number of test specifications created for a SPL. Future research will investigate integrating different variability mechanisms with this approach, such as separation of concerns, and extending the approach to address integration testing of SPL architectures.
References

1. Gomaa, H.: Designing Software Product Lines with UML: From Use Cases to Pattern-based Software Architectures. The Addison-Wesley Object Technology Series. Addison-Wesley, Reading (2005)
2. Bertolino, A., Gnesi, S.: PLUTO: A Test Methodology for Product Families. In: Software Product-Family Engineering: 5th Int’l Workshop, Siena, Italy (2003)
3. Nebut, C., et al.: A Requirement-Based Approach to Test Product Families. In: Software Product-Family Engineering: 5th Int’l Workshop, Siena, Italy (2003)
4. Reuys, A., et al.: Model-Based Testing of Software Product Families. In: Pastor, Ó., Falcão e Cunha, J. (eds.) CAiSE 2005. LNCS, vol. 3520, pp. 519–534. Springer, Heidelberg (2005)
5. McGregor, J.D.: Testing a Software Product Line, SEI (2001)
6. Scheidemann, K.: Optimizing the Selection of Representative Configurations in Verification of Evolving Product Lines of Distributed Embedded Systems. In: 10th Int’l Software Product Line Conference. IEEE Computer Society Press, Baltimore (2006)
7. Gomaa, H., Olimpiew, E.: Managing Variability in Reusable Requirement Models for Software Product Lines. In: Proc. 10th International Conference on Software Reuse, Beijing, China (May 2008)
8. Kang, K.: Feature Oriented Domain Analysis. Software Engineering Institute, Pittsburgh (1990)
9. Geppert, B., Li, J., Roessler, F., Weiss, D.M.: Towards Generating Acceptance Tests for Product Lines. In: 8th Int’l Conf. on Software Reuse, Madrid, Spain. Springer, Heidelberg (2004)
10. Cohen, D.M., et al.: The AETG System: An Approach to Testing Based on Combinatorial Design. IEEE Transactions on Software Engineering 23(7), 437–444 (1997)
A Case Study of Using Domain Engineering for the Conflation Algorithms Domain Okan Yilmaz and William B. Frakes Computer Science Department Virginia Tech {oyilmaz,wfrakes}@vt.edu
Abstract. In this study we used domain engineering as a method for gaining deeper formal understanding of a class of algorithms. Specifically, we analyzed 6 stemming algorithms from 4 different sub-domains of the conflation algorithms domain and developed formal domain models and generators based on these models. The application generator produces source code for not only affix removal but also successor variety, table lookup, and n-gram stemmers. The performance of the generated stemmers was compared with the stemmers developed manually in terms of stem similarity, source, and executable sizes, and development and execution times. Five of the stemmers generated by the application generator produced more than 99.9% identical stems with the manually developed stemmers. Some of the generated stemmers were as efficient as their manual equivalents and some were not. Keywords: Software reuse, domain analysis, conflation algorithms, stemmers, application generator.
1 Introduction

In the early 1980s software companies started the systematic reuse process through domain engineering to improve software productivity and quality. There has been insufficient empirical study of the domain engineering process and domain products such as reusable components and generators. This paper addresses this problem by documenting and empirically evaluating a domain engineering project for the conflation algorithms domain. This domain is important for many types of systems such as information retrieval systems, search engines, and word processors. The application generator developed for this study extends the domain scope compared to previous ones.

1.1 Conflation Algorithms Domain

Conflation algorithms are used in Information Retrieval (IR) systems for matching the morphological variants of terms for efficient indexing and faster retrieval operations. The conflation process can be done either manually or automatically. The automatic conflation operation is also called stemming. (Frakes W. B., 1992) categorizes stemming methods into four groups: affix removal, successor variety, n-gram and table lookup. Affix removal is the most intuitive and commonly used of these algorithm types.
In order to determine the stem, affix removal algorithms remove suffixes and sometimes also prefixes of terms. Successor variety and n-gram methods analyze a word corpus to determine the stems of terms. Successor variety bases its analysis on the frequency of letter sequences in terms, while n-gram conflates terms into groups based on the ratio of common letter sequences, called n-grams. Table lookup based methods use tables which map terms to their stems. We did a domain analysis for the semantic automatic conflation algorithms domain. We analyzed 3 affix removal stemmers, a successor variety stemmer, an n-gram stemmer, and a table lookup stemmer. Based on this analysis, we created a generic architecture, developed reusable components, and designed and developed a little language and an application generator for this domain. We compared the performance of the automatically generated algorithms with their original versions and found that the automatically generated versions of the algorithms produced nearly the same results as the original versions.
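As a small illustration of the n-gram idea (a generic sketch, not the code produced by the application generator discussed later), two terms can be conflated when the Dice coefficient over their shared bigrams exceeds a chosen cutoff:

    def bigrams(word):
        """Set of adjacent two-letter sequences in a word."""
        return {word[i:i + 2] for i in range(len(word) - 1)}

    def dice(a, b):
        """Dice similarity: 2 * |shared bigrams| / (|bigrams of a| + |bigrams of b|)."""
        ba, bb = bigrams(a), bigrams(b)
        if not ba and not bb:
            return 1.0
        return 2 * len(ba & bb) / (len(ba) + len(bb))

    # Words whose similarity exceeds the cutoff are placed in the same conflation group.
    CUTOFF = 0.6
    print(dice("statistics", "statistical"))   # high similarity -> conflated
    print(dice("statistics", "retrieval"))     # low similarity  -> kept apart

The cutoff value is a design parameter; as noted below, the effectiveness of such corpus-based methods depends on the corpus and the similarity threshold chosen.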
2 Related Work

In this study we used DARE, the Domain analysis and reuse environment (Frakes, Prieto-Diaz, & Fox, 1998). DARE draws on several DE research threads including faceted classification, feature modeling, automated IR, and artificial intelligence. The developers of DARE tried to use the best tools and methods available, including the domain engineering book metaphor. Since then, other tools and methods have been developed. These, along with earlier methods still in use, are surveyed in (Frakes & Kang, 2005). Using the DARE method, we performed domain engineering in two phases, domain analysis and domain implementation, and collected all domain related information in a domain book.

In information retrieval systems there is a need for finding related words to improve retrieval effectiveness. This is usually done by grouping words based on their stems. Stems are found by removing derivational and inflectional suffixes via stemming algorithms. The first affix removal stemming algorithm was developed by Lovins (Lovins, 1968). This algorithm did stemming by iteratively removing the longest suffixes satisfying predefined suffix removal rules. Several other longest match affix removal algorithms have been developed since (Salton, 1996) (Dawson, 1974) (Porter, 1980) (Paice, 1990). The Porter algorithm is most commonly used because of its simplicity of implementation and compactness. Later Paice proposed another compact algorithm. Hafer and Weiss (Hafer & Weiss, 1974) took a different approach in their successor variety stemming algorithm and proposed a word segmentation algorithm which used successor and predecessor varieties to determine fragmentation points for suffixes. Successor and predecessor varieties are the numbers of unique letters after and before a substring in words in a corpus. Their algorithm applied several rules to identify the stem from the substrings of each word that appeared in a corpus. The successor variety algorithm has the advantage of not requiring affix removal rules that are based on the morphological structure of a language. However, the effectiveness of this algorithm depends on the corpus and on threshold values used in word segmentation. Adamson and Boreham (Adamson & Boreham, 1974) developed the N-gram algorithm that uses the number of distinct and common n-character
substrings to determine if two or more corpus words can be conflated. Similar to successor variety, the strength of this algorithm depends on the corpus and the cutoff similarity values chosen. More recently, Krovetz (Krovetz, 1993) developed the Kstem algorithm that does a dictionary lookup after applying affix removal rules to remove inflectional suffixes. (Frakes & Fox, 2003) analyzed four stemming algorithms in terms of their strength and similarities. They used the Hamming distance measure as well as other commonly used measures. (Fox & Fox, 2002) reported an application generator using finite state machines for longest match stemming algorithms. They generated computationally efficient stemmers for Porter, Paice, Lovins and S-removal stemming algorithms (Harman, 1991) and compared their performance with the developed versions of these algorithms. This paper extends the scope of analysis to other sub-domains of conflation algorithms by analyzing not only affix removal but also successor variety, n-gram, and dictionary lookup types of algorithms. For this paper we analyzed Lovins, Porter, and Paice as examples of longest match affix removal, and Successor Variety, N-gram, and K-stem as instances of the remaining three types of conflation algorithms. As the result of the domain analysis, we developed an application generator and generated stemmers for Porter, Paice, Lovins, successor variety, S-removal, and K-stem algorithms and compared Porter, Paice, Lovins, S-removal, and K-stem algorithms developed via this automated approach with the corresponding algorithms developed by humans.
3 Conflation Algorithms

In order to develop a better domain model, we analyzed at least one example for each conflation algorithms subdomain. We analyzed:

─ Porter, Paice, Lovins as examples of Longest Match Affix Removal algorithms,
─ K-Stem as an example of Table Lookup stemming algorithms,
─ successor variety algorithm, and
─ n-gram stemming algorithm.
4 DARE Domain Analysis Method

In this study we used the DARE domain analysis method and organized the domain information of conflation algorithms in a DARE domain book. The major sections of the domain book were as follows:

─ Source information subsection contained documents related to the conflation algorithms domain: source code, system descriptions, system architectures, system feature tables, and source notes of the six conflation algorithms that we analyzed. Table 1 shows the system feature table of the conflation algorithms domain.
─ Domain scope subsection contained inputs, outputs, functional diagrams of conflation algorithms that were analyzed as well as a generic functional diagram that we developed as a result of domain analysis.
─ Vocabulary analysis subsection had basic vocabulary information, a facet table for the domain, a synonym table, a domain template, a domain thesaurus document, and vocabulary notes.
─ Code analysis subsection showed source code analysis results for the conflation algorithms that were analyzed.
─ Architecture analysis subsection contained a generic architecture diagram.
─ Reusable components subsection contained the components that were determined as reusable as the result of domain analysis.
─ Little language subsection contained a domain specific language represented in Backus-Naur form.
─ Application generator subsection contained application generator notes and the source code produced as a product of the conflation algorithms domain analysis.

Table 1. System Feature Table of the Conflation Algorithms

Algorithm Name     Corpus Usage  Dictionary Usage  Natural Language  Type                           Stem Generation  Strength
Porter             No            No                English           Longest Match Affix Removal    Yes              Medium
Paice              No            No                English           Longest Match Affix Removal    Yes              High
Lovins             No            No                English           Longest Match Affix Removal    Yes              Medium
Successor Variety  Yes           No                Any               Successor Variety              Yes              Low
N-Gram             Yes           No                Any               N-Gram                         No               N/A
K-Stem             No            Yes               English           Dictionary based Inflectional  Yes              Medium
One main goal of the DARE method is to develop a generic architecture that describes all systems in the domain. This architecture is formed after identifying commonalities and variabilities of these exemplar systems. The DARE method starts with scoping the domain that will be analyzed (Frakes, 2000). First the systems in the domain are described verbally. Then the verbal description is translated into a mathematical formalism, for example set and function notation. Once domain scoping is done, then domain exemplars are gathered and analyzed. After the scoping process, a functional diagram for each exemplar system is created, and then these diagrams are merged into a generic functional diagram which helps to identify sub-domains. Vocabulary analysis follows the domain scoping and creation of functional diagrams for domain exemplars. In the vocabulary analysis process, the domain documents and domain expert information are used to create an initial word set. From this set, via automatic and/or manual analysis a domain keyword set is
determined. These keywords are clustered into groups according to commonalities among them. From these word clusters a facet table is created for the domain. Each cluster is identified with a facet name and will become a facet category in the facet table. After that a domain template describing the domain verbally is created by using the facet categories. In the later stages of domain analysis and domain implementation the domain template and facet table are used to identify reusable components. In the architectural analysis the system architecture diagrams of each exemplar system are merged into a generic system architecture that can express variabilities as well as commonalities of all systems.

Table 2. Reusable Components of Conflation Algorithms Domain

Reusable Component Category             Operations
Hash Table operations                   initialize, search and retrieve, add, delete
Text file operations                    open, close, read line
String manipulation operations          substring, string comparison, lowercase, uppercase, string length
String/character validation operations  is AlphaNumeric, is Vowel, is Consonant, shorter than, longer than
File processing/storage operation       read and store each word from an input file (e.g. corpus, dictionary)
Word verification operations            check the size, check if it is alphanumeric, etc.
Suffix removal rules                    remove a suffix if it is equal to a morpheme
Suffix recode rules                     replace a suffix if it is equal to a morpheme

Table 3. Variable Components of Conflation Algorithms Domain

Algorithm Name     Operations
N-Gram             initialize, create, display n-grams for each word, determine common n-grams of two words, cluster words based on their common n-grams, merge or display clusters, add, remove words from clusters, etc.
Successor variety  create and analyze prefix hash table, calculate entropy, add, reset, display segments
K-Stem             lookup command which checks the existence of a word in the dictionary hash
Paice              iterate and intact commands
Porter             isMeasure command
Lovins             recode and partial matching functions
Code analysis requires programming language knowledge and expertise. In this process the code of exemplar systems is analyzed in terms of common and different functionalities, and several software metrics such as those for complexity and maintainability. The goal of this process is to identify the reusable software components that can be used in domain implementation and to determine the important architectural elements. During domain implementation, reusable components are created, a little language is generated, and ultimately an application generator may be developed. Table 2 shows the reusable components that we determined by analyzing the generic architecture, the facet table and the domain template. We also summarized the variable components in Table 3. As another domain product, a programming language called a little language was developed specifically for the domain. Finally, an application generator was developed for the domain by implementing the reusable components and a code generator supporting the domain specific language.
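To give a flavour of what a rule-driven affix removal component can look like, the following minimal sketch applies the longest matching suffix removal rule from a rule table; the rules shown are illustrative assumptions and are not the little language or the rule files used in this study.

    # Each rule maps a suffix to its replacement; rules are tried longest suffix first,
    # mirroring the longest-match strategy of affix removal stemmers.
    SUFFIX_RULES = [
        ("ies", "y"),
        ("ing", ""),
        ("ed", ""),
        ("es", "e"),
        ("s", ""),
    ]

    def stem(word, rules=SUFFIX_RULES, min_stem=3):
        """Apply the first (longest) matching suffix rule, keeping a minimum stem length."""
        for suffix, replacement in sorted(rules, key=lambda r: -len(r[0])):
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
                return word[: len(word) - len(suffix)] + replacement
        return word

    for w in ["parties", "engineering", "stemmed", "reuse"]:
        print(w, "->", stem(w))

An application generator in this style can keep such a rule table in an external file, so that a new longest-match stemmer only requires a new rule file rather than new code.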
5 Evaluation of Generated Stemmers

We evaluated the application generator we developed by comparing the stemmers generated with the stemmers developed by humans in terms of the following characteristics of stemmers:

• Similarity of stems produced
• Time spent during development
• Size of the executable of the stemmer
• Number of lines of code (LOC)
• Total execution time
We also compared the box plots of elapsed stemming times of each word in the test set for each version of the analyzed stemmers in Fig. 1.

5.1 Evaluation Method

To evaluate the performance of stemmers we needed a test data set. We created a corpus containing 1.15 million words by combining about 500 articles from Harper's Magazine (Harpers Magazine), Washington Post Newspaper (Washington Post New Paper), and The New Yorker (New Yorker Magazine) with a sample corpus of spoken, professional American-English (Sample Corpus of Professional Spoken English). We generated a test file with 45007 unique words by using the text analysis functionality of the application generator. We evaluated developed and generated versions of Porter, Paice, Lovins, S-removal, and K-stem stemmers. All these algorithms were in the Perl programming language except for the developed version of the K-stem algorithm, which was in C. While the code generated by the application generator was object oriented, the developed versions of these algorithms were not. During the evaluation process we verified the stems generated by these stemmers and fixed bugs in the developed code and in rule files for the application generator.
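Producing such a test file of unique words is straightforward; the following sketch shows the idea (the actual corpus handling of the application generator is not described in the paper, and the file names are hypothetical).

    import re

    def unique_words(corpus_paths):
        """Collect the set of distinct lower-cased words from a list of text files."""
        words = set()
        for path in corpus_paths:
            with open(path, encoding="utf-8", errors="ignore") as fh:
                words.update(re.findall(r"[a-z]+", fh.read().lower()))
        return sorted(words)

    # e.g. unique_words(["articles.txt", "spoken_corpus.txt"]) would yield the
    # distinct words that each stemmer is then run on.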
Table 4. Comparison results for developed and generated stemmers

Alg. Name   Stem similarity         Devel. time (hours)     Exec. size (bytes)      Number of LOC           Exec. time (seconds)
            Identical   Different   Generated   Developed   Generated   Developed   Generated   Developed   Generated   Developed
Porter      45006       1           4           12          849247      390778      453         126         3.03        1.52
Lovins      45006       1           3           NA          900851      398528      1180        555         6.58        1.73
Paice       45007       0           6           NA          874039      393640      1069        1039        2.72        6.66
S-removal   45007       0           0.5         0.5         839315      387443      142         36          0.70        0.44
K-stem      44970       37          6           NA          856334      334689      719         2035        3.41        1.02
5.2 Evaluation Results

Table 4 summarizes the evaluation results. All five stemmers generated by the application generator produced more than 99.9% identical stems with the developed stemmers. Preparing the rule file for the Porter algorithm took 4 hours, while developing the same algorithm took 12 hours. Since S-removal is a very simple stemming algorithm, both developing it and generating rule files for it took about half an hour. For the rest of the algorithms we report the rule file generation time, since we did not have information about their actual development time. Executables generated from the Perl scripts of all generated stemmers were at least twice as big as those of the developed stemmers. Among all algorithms, developed K-stem had the smallest executable. This was partly because it was developed in C rather than Perl. On the other hand, for the same reason developed K-stem had the highest LOC among all stemmers. The generated stemmers were slower than the developed ones except for the Paice algorithm. We did not find a statistically significant difference between the generated and developed stemmers in terms of LOC and execution time due to the limited number of algorithms tested.

5.3 Analysis of Elapsed Stemming Times of Generated and Developed Stemmers

Stemming times per word stemmed are reported in the box plot for each stemmer in Fig. 1. Developed K-Stem and Developed Paice had the lowest and highest average elapsed stemming times, respectively. Generated stemmers for Paice and S-removal performed a little better than the developed ones. On the other hand, developed Porter, Lovins, and K-stem performed much better than the generated versions of these algorithms. Although the total time spent by developed K-Stem was more than for the developed and generated versions of the S-removal stemmers, the average elapsed time for each word stemmed turned out to be much shorter. This was because the time spent during the dictionary reading was not included in the elapsed time for stemming each word. Fig. 1 shows many outliers for each stemmer. We stemmed the data set several times and compared the outliers in each case to determine the characteristics of these outliers. We saw that in each run we had different outliers and concluded that the
outliers were not caused by the stemming algorithms or stemmers. Stemming operations were normally very fast, taking less than 100 microseconds on average. When the test was running, the other operations performed by the Windows operating system were affecting our results by causing outliers.
Fig. 1. Box Plot Diagrams for Elapsed Stemming Times of Stemmers in Log. Scale (elapsed time in microseconds for the developed and generated versions of the Porter, Paice, Lovins, S-removal, and K-stem stemmers)
6 Conclusion and Future Work

In this paper we presented a case study of using domain analysis for the semantic conflation algorithms domain. We compared the performance of stemmers generated by the application generator with the corresponding stemmers developed by humans in terms of identical stem generation, development times, size of executables, number of LOC, and the total time spent to stem all terms in our test set. We created and used a corpus with 45007 words to test the stemmers. Our results indicated that the generated and developed stemmers produced identical stems for more than 99.9% of the words evaluated. We also determined that stemmers produced by application generators have bigger executables than the stemmers developed by humans. We did not find a statistically significant difference between the generated and developed stemmers in terms of LOC and the total time spent to stem all terms in the test set, due to the limited number of algorithms tested. We also analyzed the elapsed stemming times of these developed and generated stemmers. We presented a box plot diagram for each stemmer in terms of the elapsed stemming times. We determined that generated stemmers performed better in some cases and worse in other cases on this measure.
In this study we carried out a domain engineering project for the affix removal, successor variety, n-gram, and table lookup types of stemming algorithms and generated code for all types other than the N-gram algorithm. In future work, we plan to generate a stemmer for N-gram as well. We also did not compare the generated successor variety stemmer with a successor variety stemmer developed by humans, but hope to do this in the future.
References

(n.d.) New Yorker Magazine (retrieved April 12, 2007), http://www.newyorker.com
(n.d.) Sample Corpus of Professional Spoken English (retrieved April 12, 2007), http://www.athel.com/sample.html
(n.d.) Harpers Magazine (retrieved April 12, 2007), http://www.harpers.com
(n.d.) Washington Post New Paper (retrieved April 12, 2007), http://www.washingtonpost.com
Adamson, G., Boreham, J.: The use of an association measure based on character structure to identify semantically related pairs of words and document titles. Information Storage and Retrieval, 253–260 (1974)
Dawson, J.L.: Suffix removal and word conflation. ALLC Bulletin, 33–46 (1974)
Fox, B., Fox, C.J.: Efficient Stemmer generation. Information Processing and Management: an International Journal, 547–558 (2002)
Frakes, W.B.: Stemming Algorithms. In: Frakes, W.B., Baeza-Yates, R. (eds.) Information Retrieval: Data Structures and Algorithms. Prentice-Hall, Englewood Cliffs (1992)
Frakes, W.B., Fox, C.J.: Strength and similarity of affix removal stemming algorithms. SIGIR Forum, 26–30 (2003)
Frakes, W.: A Method for Bounding Domains. In: IASTED International Conference Software Engineering and Applications 2000, Las Vegas, NV (2000)
Frakes, W., Kang, K.: Software Reuse Research: Status and Future. IEEE Transactions on Software Engineering, 529–536 (2005)
Frakes, W., Prieto-Diaz, R., Fox, C.J.: DARE: Domain analysis and reuse environment. Annals of Software Engineering, 125–141 (1998)
Hafer, M., Weiss, S.: Word segmentation by letter successor varieties. Information Storage and Retrieval, 371–385 (1974)
Harman, D.: How Effective is Suffixing? Journal of the American Society for Information Science, 7–15 (1991)
Krovetz, R.: Viewing morphology as an inference process. In: 16th ACM SIGIR conference, Pittsburgh, PA, pp. 191–202 (1993)
Lovins, J.B.: Development of a stemming algorithm. Mechanical Translation and Computational Linguistics, 22–31 (1968)
Paice, C.D.: Another Stemmer. SIGIR Forum, 56–61 (1990)
Porter, M.: An algorithm for suffix stripping. Program, 130–137 (1980)
Salton, G.: Automatic information organization and retrieval. McGraw Hill, New York (1996)
Model Transformation Using Graph Transactions Leila Ribeiro, Luciana Foss, Bruno da Silva, and Daltro Nunes Universidade Federal do Rio Grande do Sul, Instituto de Informática, Av. Bento Gonçalves, 9500, 91501-970, Porto Alegre - RS, Brazil
Abstract. Model transformations are central to model driven software engineering. The main aim of defining a model transformation is to reuse this model by adapting it to a new situation or context (aims of transformation include synthesis, reverse engineering, migration, optimization, refactoring, etc.). Given two metamodels T1 and T2, a model transformation takes as input a model of T1 and delivers as result a corresponding model with respect to T2. Since many modelling languages are diagrammatic (like class diagrams, message sequence charts, state charts), it is natural to use graphs as a formal basis to describe metamodels of these languages. Rules that transform graphs can be used to describe the transformation process. Here we propose the use of graph grammars with transactions to describe model transformations. The notion of (graph) transaction can be very useful in proving essential properties of model transformation, like termination, confluence and correctness.
1 Introduction
Software systems are always evolving. Besides changes in the requirements a system has to satisfy, there may be other reasons for evolution, like the need for new architectural models for optimization or adaptation to new platforms; the use of updated versions of databases, maybe with different database schemas; adaptation to new paradigms, like service-oriented architectures; the use of new languages, etc. One of the problems of software engineering is how to cope with such (sometimes drastic) changes, while assuring the quality of the software system. Quality is usually achieved by the use of solid software development and analysis techniques in all phases. These techniques require the existence of abstract descriptions of the system capturing different views, and also different levels of abstraction of a same view. They can be seen as models of the system under construction. Ideally, all models of a software system should be consistent. Only then can one guarantee, for example, that all analysis that was done at the specification level is valid for the generated code. But how can one cope with evolution in such a scenario? The direct inclusion of some new functionality by just changing code artifacts may introduce design flaws and deviations from the specification, compromising the quality of the system. However, it is also not feasible to reengineer the whole system.
This work was partially supported by CNPq.
One of the approaches to cope with evolution by reusing software design and even code is Model Driven Software Engineering (MDE) [1]. The idea is that the modeling techniques used to construct a system should allow precise descriptions that are called metamodels. A model is thus an instance of the corresponding metamodel. Changes can be specified at metamodel level by rules that can be applied to all instances of these metamodels. These changes may be endogenous, or within the same metamodel, or exogenous, changing the metamodel description of the system [2]. They can be used to describe, for example, refinements or optimizations (endogenous) or synthesis or migration (exogenous). In any case, transformations allow the reuse of existing models, adapting them to new requirements, platforms, languages, levels of abstraction, etc.

Model transformations may be simple (specifying just a small change) or very complex (specifying a change to a completely different kind of model or an involved refinement step). The transformation is usually described by a set of rules and, to assure that suitable outputs are obtained, the process should terminate and be confluent (no matter the order in which the rules that govern the transformation are applied, the result is the same). One may also desire to analyze whether different transformations (for example, adding two different functionalities to a system) may interfere with each other. Moreover, it is also highly desirable that the transformation is semantically correct: depending on the kind of transformation being performed, we may require that the semantics of the new model is equivalent, a conservative extension, or related in some other way to the semantics of the original model.

Since many specification languages are visual (like most UML diagrams), it is natural to consider that model transformations are based on rules that transform graphs. Indeed, there are lots of approaches that use graph transformation to model various kinds of transformation [3,4]. The basic idea is that the metamodel corresponding to a diagram can be described by a graph and, given metamodels M1 and M2, one can define rules to transform any instance of M1 into an instance of M2 as graph transformation rules. The area of graph grammars or graph transformations [5] offers a lot of results concerning various kinds of analysis (like termination, confluence, independence). A notion of transaction was proposed in [6,7], which allows relating a sequence of steps to a more abstract step that describes the effect of the whole sequence. In this paper, we propose an extension of the notion of graph transactions to specify complex model transformations. In our approach, given a set of rules that specify a transformation, we can construct another set of rules that describes the abstract behavior of the transformation rules. These abstract rules can then be analyzed to check desired properties as well as to validate the transformation itself. In contrast to other approaches of using graph grammars to model transformation, our proposal is suitable for endogenous as well as for exogenous transformations. Moreover, we do not need restriction operators to obtain the resulting model (as in [4]).

This paper is organized as follows: Sect. 2 introduces the basic concepts of graph transformation to specify model transformation; Sect. 3 shows how to use transactions for model transformation. Final remarks are presented in Sect. 4.
2 Graph Transformation for Model Transformation

2.1 Using Graph Transformation to Describe Model Transformation
Many kinds of diagrams used to specify static as well as dynamic aspects of computational systems can be suitably represented as graphs. Metamodels describe a class of models, defining which diagrams can be seen as models of some type. To formalize this notion of metamodel, the notion of type graph can be used. A type graph defines types of vertices and edges that may appear in instance graphs. In this paper, we propose to enrich the definition of type graph by considering a structured type graph, in which one can distinguish the types of elements belonging to the different metamodels involved in the transformation. A model transformation is a description of how instances of one metamodel T1 can be transformed into instances of another metamodel T2. Any procedure to describe such a transformation must deal with two metamodel descriptions (T1 and T2). Moreover, since the transformation is actually an algorithm, it may be necessary to use auxiliary data structures. In a graph transformation approach, all data types are described by graphs, and therefore these auxiliary structures will also be graph elements (vertices or edges). The basic idea is that the transformation starts with a graph that is a model M1 of type T1. Then, while items of M1 are removed, items of type T2 and/or auxiliary items may be generated. Finally, when all auxiliary items are removed and only items of type T2 remain, the process is finished. Auxiliary items will be called unstable items. Depending on the kind of transformation being performed, there may be types that belong both to T1 and T2. The transformation process has 2 phases:

Phase 1 (Local): During this phase, all items of type T1 that should not belong to the resulting model will be treated. This means that they will be either (i) deleted or (ii) marked by some unstable arc that will be treated in phase 2. (i) is usually used for arcs, while (ii) is needed to treat deletion of vertices properly, because it is not possible to delete a vertex that is still being referenced by some arc (we first have to deal with all references in a suitable way). In this phase, items of type T2 as well as unstable items may be created, but it is only allowed to introduce unstable links between new elements and elements that are marked for deletion in the next phase.

Phase 2 (Global): The aim of this phase is to treat all links to/from items marked for deletion, substituting them by links to appropriate items of type T2. When all links have been handled, items marked for deletion may be removed.

According to these 2 phases, there are 2 kinds of unstable items: the ones that should be dealt with in phase 1, and the ones that are handled in phase 2. The former will be called unstable.1 and the latter, which correspond to deletion marks on vertices and links between old and new elements, will be called unstable.2. Figure 1 illustrates the proposed transformation process, where the first phase of the transformation creates elements of type T2 and unstable items, generating as a result a graph typed over T2 plus unstable.2 items, and the second phase handles all unstable.2 items, giving rise to the transformed model (completely typed over T2).
98
L. Ribeiro et al.
Fig. 1. Graph Transformation Process (phase 1 transforms a model M1 typed over T1 through intermediate graphs G1, ..., Gi typed over T1+Unstable.1+Unstable.2+T2 into a graph Gn typed over Unstable.2+T2; phase 2 then yields the model M2 typed over T2)
2.2 Formal Definitions
A structured type graph will be the basis for the graph transformation. A graph is a set of vertices and a set of arcs plus two functions assigning source and target vertices to each arc. In order to relate two graphs we will use graph morphisms. A graph morphism is a mapping that associates all elements of the source graph to elements of a target graph, where the mapping of arcs must respect their source and target vertices.

Definition 1 (graph and graph morphisms). A graph is a tuple G = ⟨VG, AG, sG, tG⟩, where VG and AG are sets of vertices and arcs, and sG, tG : AG → VG are total functions, called source and target functions. A (total) graph morphism f : G → G′ is a pair of functions (fV : VG → VG′, fA : AG → AG′) such that fV ◦ sG = sG′ ◦ fA and fV ◦ tG = tG′ ◦ fA.

A type graph to transform from a metamodel T1 to a metamodel T2 is a structured graph T that contains both metamodels T1 and T2 plus auxiliary items that can be used in phases 1 and 2 of the transformation (TP1 and TP2).

Definition 2 (MT-type graph). An MT-type graph T is a tuple ⟨TP1, TP2, T1, T2⟩ such that there exist three inclusions, with T1 included in TP1, T2 included in TP2, and TP2 included in TP1, where T1 is the source metamodel, T2 is the target metamodel, TP2 includes the auxiliary items of phase 2, and TP1 also includes the auxiliary items of phase 1. Given a fixed MT-type graph T, a T-typed graph GT is given by a graph G and a graph morphism tG : G → TP1. A morphism of T-typed graphs f : GT → G′T is a graph morphism f : G → G′ that satisfies tG′ ◦ f = tG.

A graph typed over an MT-graph T as defined above is actually a graph typed over TP1, and therefore it may contain elements of metamodels T1 and T2 plus unstable (.1 and .2) elements. In any phase of the transformation, elements of metamodels T1 and T2 will be considered as stable. Items of kind unstable.1 are unstable in phase 1 (they do not occur in phase 2). Unstable.2 elements will be considered stable in phase 1 and unstable in phase 2 (because these items are not meant to be dealt with in phase 1).
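To make Definition 1 concrete, the following sketch (an illustrative Python encoding, not tied to any graph transformation tool) represents a graph by its source and target functions and checks that a typing morphism commutes with them:

    class Graph:
        def __init__(self, vertices, arcs, src, tgt):
            self.V = set(vertices)     # vertex names
            self.A = set(arcs)         # arc names
            self.src = dict(src)       # arc -> source vertex
            self.tgt = dict(tgt)       # arc -> target vertex

    def is_morphism(f_v, f_a, g, h):
        """Check that (f_v, f_a): g -> h maps arcs compatibly with source/target functions."""
        total = all(v in f_v for v in g.V) and all(a in f_a for a in g.A)
        commutes = all(
            f_v[g.src[a]] == h.src[f_a[a]] and f_v[g.tgt[a]] == h.tgt[f_a[a]]
            for a in g.A)
        return total and commutes

    # A Paper vertex with an 'a1' arc to a title vertex is typed over the
    # type graph Class --attr--> String.
    G = Graph({"Paper", "title"}, {"a1"}, {"a1": "Paper"}, {"a1": "title"})
    T = Graph({"Class", "String"}, {"attr"}, {"attr": "Class"}, {"attr": "String"})
    typing = ({"Paper": "Class", "title": "String"}, {"a1": "attr"})
    print(is_morphism(*typing, G, T))   # True: G is a T-typed graph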
Given an MT-graph (a graph mapped to TP1), it is always possible to get the corresponding projections to types T1, T2 and TP2 by forgetting elements not typed with the desired types (this is formally defined by a known universal construction called pullback).

The behavior of a graph transformation system is determined by the application of graph rewriting rules [5]. A rule is composed of three graphs: the left-hand side L, the right-hand side R, and an interface K. This interface specifies the contact points in which the newly created items will be connected to the context (graph) in which the rule is being applied. A rule specifies that, once an occurrence of graph L is found in the current (graph) state, it can be replaced with graph R, preserving K. Therefore, items associated to K are preserved, those in L that are not in K are consumed, and those in R that are not in K are created. An MT-rule is a special graph rule that cannot consume elements exclusively in T2 and cannot create elements exclusively in T1. We will only consider rules that consume something. A model graph transformation is defined as a set of MT-rules typed over an MT-type graph.

Definition 3 (rules and model graph transformation - MGT). A T-typed MT-rule is a tuple q : Lq ← Kq → Rq, given by morphisms lq : Kq → Lq and rq : Kq → Rq, where q is the name of the rule, lq is a strict inclusion, rq is an injective morphism, and Lq, Kq and Rq are T-typed graphs, such that ∀a ∈ Lq (tLq(a) ∈ (T2 − T1) → a ∈ rng(lq)) and ∀b ∈ Rq (tRq(b) ∈ (T1 − T2) → b ∈ rng(rq)). A model graph transformation is a tuple M = ⟨T, P⟩, where T is an MT-type graph and P is a set of T-typed graph rules.

The application of a rule is given by a direct derivation, which deletes all items that shall be consumed and includes all items that shall be created. A direct derivation exists only if, for all deleted vertices, there are no arcs pointing to/from them. A derivation is a (possibly infinite) sequence of direct derivations, denoted by G0 ⇒ G1 ⇒ G2 ⇒ ···, where each step applies a rule pi at an occurrence mi to Gi−1, resulting in Gi. If i ≤ n, i.e., the derivation is finite, we denote by G0 and Gn its initial and final graphs, respectively.
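The "dangling" side condition of a direct derivation, namely that a vertex can only be deleted when no remaining arc references it, can be checked with a few lines; the encoding below is an illustrative sketch, not an implementation of the full derivation construction.

    def can_delete_vertices(arcs, vertices_to_delete, arcs_to_delete):
        """Return True if no arc that survives the rule still touches a deleted vertex
        (the dangling condition for a direct derivation)."""
        for arc, (src, tgt) in arcs.items():
            if arc in arcs_to_delete:
                continue
            if src in vertices_to_delete or tgt in vertices_to_delete:
                return False
        return True

    # An Old.2-marked class that still carries arcs cannot be removed yet; once the
    # phase-2 rules have redirected or deleted those arcs, removal becomes possible.
    arcs = {"attr1": ("Paper", "title"), "dep1": ("Publisher", "Paper")}
    print(can_delete_vertices(arcs, {"Paper"}, set()))               # False
    print(can_delete_vertices(arcs, {"Paper"}, {"attr1", "dep1"}))   # True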
2.3 Example
In Figure 2 we show an example of model transformation. This transformation takes a partial entity-relationship class diagram (left side of Figure 2) and generates a detailed design (right side). Due to space limitations we illustrate the transformation for only one entity (Paper). The transformation applies two well-known design patterns in the JavaEE community - DAO (Data Access Object) and DTO (Data Transfer Object) [8]. For each class stereotyped as Entity in the source model, a DAO interface and a DAO implementation class with default methods are generated. Moreover, a DTO class is created, including the Entity attributes and get/set accessors, and keeping other existing methods. The motivation to use these design patterns in a detailed design class diagram depends on design decisions, such as implementation platform, persistence mechanism and so on.
Fig. 2. A model transformation example: the source class diagram (left) contains the entity class Paper with attributes paperId, title, version, publisher, and authors and a getPaperInfo() method; the generated detailed design (right) contains the PaperDTO class with these attributes and get/set accessors, the PaperDAO interface with createPaper, deletePaper, findPaper, and updatePaper operations, and the PaperDAOImpl class realizing that interface
The explanation of those two design patterns used in the example goes beyond the scope of this paper. The illustrated situation is an example of an endogenous and vertical software transformation, taking a source model in UML (class diagram) and generating a target model, which is also a UML class diagram. It can be viewed as a PIM (Platform Independent Model) to PSM (Platform Specific Model) transformation in the MDA (Model-Driven Architecture) context [9]. The structured type graph for this example is shown in Figure 3. Metamodel T1 has classes that may be stereotyped with Entity. In the transformed model, we will have 3 new kinds of stereotypes: DTO, Interface and DAO. Since we are describing an endogenous transformation, the basic modelling mechanism (class diagram) remains the same (shared by T1 and T2). Unstable items are dashed (the only stable items that are dashed are the relations Rea and Dep), and identified with .1 or .2 depending on the phase in which they should be treated. These items belong neither to T1 nor to T2. We adopt a graphical notation with special shapes for vertices and arcs and sometimes do not draw vertices and arcs explicitly (this is the case for vertex Stereotype, written inside the square representing Class vertices in the rules). The rules that govern the transformation are depicted in Figures 4 and 5 (Phase 1) and Figure 6 (Phase 2). Rules in Figures 5 and 6 are generic (may be used with any T1-model), but the rule in Figure 4 is specific to this model. In our example, we have one class with stereotype Entity, and therefore we have one start rule, which is the first rule to be applied in the transformation.
Fig. 3. MT-Type Graph (TP1 = T1+T2+Unstable.1+Unstable.2, TP2 = T1+T2+Unstable.2)

Fig. 4. Start Rule – Phase 1
Fig. 5. Rules – Phase 1

The start rule generates an unstable class vertex containing all attributes and methods of the class to be transformed. Moreover, it generates an Old.2 mark in the original class, marking it for deletion. The rules also mark all classes that are not of stereotype Entity by connecting them via unstable arcs to a new unstable node (called NotEnt.2). We have to insert as many of these edges as the degree of each class vertex (each will be used to handle one of the references to this vertex). Other rules of phase 1 are shown in Figure 5. These rules create the new classes with new stereotypes, move all attributes and methods of original classes

Fig. 6. Rules – Phase 2
to these new classes, create the required relationship between these new classes, and finish by removing the unstable entity node and the stereotype Entity (these rules can only be applied if there is no arc referencing these vertices). Then we proceed to Phase 2, depicted in Figure 6: all links connected to old classes are moved to new classes and the process finishes by removing all old classes.
3 Graph Transactions
The notion of graph transaction was introduced in [7] to describe derivations that accomplish some task. Here, we extend this concept to handle MT-graphs, and thus transactions will characterize complete model transformations. A transaction always starts in states with elements of type T1 and finishes in states with elements of TP2 (condition 1 in the next definition). During the execution of a transaction, no stable state is reached (except the final state), that is, a transaction cannot be split into other transactions (condition 2). The start state of a transaction contains exactly what is needed to accomplish the transaction (condition 3) and all elements of the target metamodel (T2), once created, cannot be consumed within the transaction (condition 4). MT-transactions are transactions of phase 2, reaching a state in which all elements are of type T2.

Definition 4 (transaction). A transaction is a derivation ρ = G0 ⇒ ... ⇒ Gn, applying rules q1, ..., qn at occurrences m1, ..., mn, which satisfies the following properties: (1) G0 is typed over T1 and Gn is typed over TP2; (2) any intermediate graph Gi (i ≠ 0, n) is not stable (with respect to TP2); (3) for all a ∈ G0, there is x ∈ Lqi such that mi(x) = a; and (4) for each derivation step δ (in ρ), if a ∈ Gi is created by δ such that tGi(a) ∈ T2, then there is no derivation step in ρ consuming a. If all items in Gn are typed over T2, ρ is an MT-transaction.

An mgt can be seen at three levels of abstraction. It can be viewed as a model transformation where both stable and unstable items of states are visible, but we can also abstract away from the unstable.1 states and observe only complete transactions of phase 1. Formally, this gives rise to another mgt, where the rules correspond to all transactions of the original mgt. Analogously, we can obtain complete transactions of phase 2. This definition requires the notion of the rule induced by a derivation sequence, a known construction in the literature [5], which basically builds a rule using the initial and final graphs of the derivation. Considering an mgt M, we can build an abstract mgt A(M): the type graph is the stable type graph of M, that is, TP2; the set of rules contains all abstract rules corresponding to transactions of M. In A(M) we have the abstract mgt associated to the first phase of the transformation, because we forget all unstable.1 elements and can execute each transaction of M in only one step. By taking A(A(M)), we have the abstract mgt associated to the second phase of the transformation, because we forget unstable.1 and unstable.2 elements and can execute each MT-transaction of A(M) in only one step.
Definition 5 (Abstract MGT). Let M = ⟨TP1, TP2, T1, T2, P⟩ be an mgt. The abstract mgt associated to M, denoted by A(M), is the mgt ⟨TP2, TA, T1, T2, P′⟩, where P′ is the set of abstract rules corresponding to transactions ρ of M and TA is the graph T1 ∪ T2. We denote by A1(M) = A(M) and A2(M) = A(A(M)) the abstract mgts associated, respectively, to the first and second transformation phases of M.

Transactions of phase 1 describe local transformations of each class, without considering interconnections between classes. Transactions of phase 2 (or MT-transactions) show the global view of the transformation process. In our example, there is one transaction performing the local transformation of class Paper, leaving all references to other classes to be solved in the next phase. This transaction is given by the application of rules startPaper, genDTO, genAttr (5 times), genTarget, genDAO, genInter and removeTempEnt. The abstract rule associated to the MT-transaction (phase 2) is the one shown in Figure 2.

The main theoretical results concerning transactions can be adapted to model graph transformations. An abstract rule represents in a concise way essentially the same transformation as the corresponding sequence of (concrete) rules. Indeed, the abstract system of an mgt has the same transactions as the concrete mgt, i.e., their behaviors are the same when we forget the auxiliary (unstable) elements. This fact was proved in [7] by means of the construction of an adjunction between the two views (abstract and concrete) of the system. The proof has been carried out for graph transformation systems with simple type graphs. These results can be lifted to model graph transformations, allowing the use of abstract rules as a basis for analysis of model transformations (instead of using all concrete ones). This is possible due to the restrictions imposed on MT-type graphs. These restrictions also allow the use of the procedure defined in [6] to construct all transactions of a graph transformation system, obtaining automatically the abstract system of a model transformation.

Now we discuss how transactions can be used to aid analysis of model transformations.

Termination: The transformation process finishes when there is no graph rule that can be applied and the current state is typed over T2. By finding the abstract rules that represent the transformation, we can be sure that it is possible for the process to terminate. To guarantee that this always happens, we have to perform proofs by induction based on the initial model and the rules that describe the transformation using, for example, approaches like [10] and the AGG tool [11].

Confluence: A model transformation is confluent if, for each source model, the process of transformation results in a unique target model. For graph transformations, confluence is well studied [5]. In the proposed approach, proving confluence reduces to the proof of the existence of only one rule in the abstract system of the second phase, i.e., that the mgt has only one MT-transaction.

Semantical Correctness: A model transformation can be considered correct if it preserves the semantics of the system in the desired way. An approach to prove semantical correctness of model transformations by using graph transformation
rules to define the operational behavior of visual models was presented in [12]. In [13] a formal definition of compositionality for mappings from typed graphs to semantic domains was proposed. Besides, several works have investigated behavior preservation of model refactoring using graph transformations, such as [3], [14]. These approaches deal with very simple transformations, which can be defined by only one rule, or provide ad hoc methods to obtain the abstract rule that corresponds to a whole transformation. For complex transformations, it is usually not true that each rule preserves semantics, since the target model is completed only at the end of a series of rule applications. By generating the abstract rules corresponding to a model transformation, we provide a formal basis for the use of existing approaches in the context of complex transformations.
4 Conclusion
The main aim of this paper is to present an approach to enable reuse of models via formally defined transformations. We provide a framework in which complex transformations may be described in a well-founded way, by characterising explicitly the elements that are needed as auxiliary items during the transformation process, and by proposing two phases to deal with different aspects of the transformation process (local vs. global transformation). Abstract rules associated to transactions can be used as a basis for analysis. Due to space limitations, it was not possible to illustrate the proposed concepts with a larger example, nor to show the results formally; we rather focused on presenting the main ideas and potential of the proposed approach. As future work, we plan to extend this approach to attributed graphs, as well as to use negative application conditions [5]. We also intend to extend existing tools to implement automatic generation and analysis based on transactions.
References

1. Schmidt, D.C.: Model-driven engineering. IEEE Computer 39(2), 25–31 (2006)
2. Mens, T., Gorp, P.V.: A taxonomy of model transformation. ENTCS, vol. 152, pp. 125–142 (2006)
3. Mens, T., Demeyer, S., Janssens, D.: Formalising behaviour preserving program transformations. In: Corradini, A., Ehrig, H., Kreowski, H.-J., Rozenberg, G. (eds.) ICGT 2002. LNCS, vol. 2505, pp. 286–301. Springer, Heidelberg (2002)
4. Ehrig, H., Ehrig, K.: Overview of formal concepts for model transformations based on typed attributed graph transformation. ENTCS, vol. 152, pp. 3–22 (2006)
5. Rozenberg, G.: Handbook of Graph Grammars and Computing by Graph Transformation, Foundations, vol. 1. World Scientific, Singapore (1997)
6. Foss, L.: Transactional Graph Transformation Systems. PhD thesis, Federal University of Rio Grande do Sul, Porto Alegre, Brazil (2008)
7. Baldan, P., Corradini, A., Foss, L., Gadducci, F.: Graph transactions as processes. In: Corradini, A., Ehrig, H., Montanari, U., Ribeiro, L., Rozenberg, G. (eds.) ICGT 2006. LNCS, vol. 4178, pp. 199–214. Springer, Heidelberg (2006)
8. Alur, D., et al.: Core J2EE Patterns: Best Practices and Design Strategies. Core Design Series. Sun Microsystems, Inc. (2003)
9. OMG: MDA Guide Version 1.0.1 (2003)
10. Costa, S.A., Ribeiro, L.: Formal verification of graph grammars using mathematical induction. In: Brazilian Symposium on Formal Methods, pp. 161–176 (2008)
11. Taentzer, G.: AGG: A graph transformation environment for modeling and validation of software. In: Pfaltz, J.L., Nagl, M., Böhlen, B. (eds.) AGTIVE 2003. LNCS, vol. 3062, pp. 446–453. Springer, Heidelberg (2004)
12. Ehrig, H., Ermel, C.: Semantical correctness and completeness of model transformations using graph and rule transformation. In: 4th Int. Conference on Graph Transformations, pp. 194–210. Springer, Heidelberg (2008)
13. Bisztray, D., Heckel, R., Ehrig, H.: Compositionality of model transformations. ENTCS, vol. 236, pp. 5–19 (2009)
14. Rangel, G., Lambers, L., König, B., Ehrig, H., Baldan, P.: Behavior preservation in model refactoring using dpo transformations with borrowed contexts. In: Ehrig, H., Heckel, R., Rozenberg, G., Taentzer, G. (eds.) ICGT 2008. LNCS, vol. 5214, pp. 242–256. Springer, Heidelberg (2008)
Refactoring Feature Modules

Martin Kuhlemann1, Don Batory2, and Sven Apel3

1 University of Magdeburg, Germany
[email protected]
2 University of Texas at Austin, USA
[email protected]
3 University of Passau, Germany
[email protected]
Abstract. In feature-oriented programming, a feature is an increment in program functionality and is implemented by a feature module. Programs are generated by composing feature modules. A generated program may be used by other client programs but occasionally must be transformed to match a particular legacy interface before it can be used. We call the mismatch of the interface of a generated program and a client-desired interface an incompatibility. We introduce the notion of refactoring feature modules (RFMs) that extend feature modules with refactorings. We explain how RFMs reduce incompatibilities and facilitate reuse, and report our experiences on five case studies.
1
Introduction
In feature-oriented programming, a feature is an increment in program functionality and is implemented by a feature module [1]. Feature modules can add new classes to a program, add new members, and extend members of existing classes. It is common for a program composed from feature modules to be used by another program [2], which we call an environment. An environment expects a composed program to have names of classes or methods that can be different from what was generated. We call the non-matching of expectations an incompatibility between the composed program and its environment. Incompatibilities occur frequently and hinder reuse [9,20,14]. In this paper, we concentrate on refactorings to eliminate incompatibilities. A refactoring alters the structure of a program but not its behavior [22,6]. Existing approaches can be used to integrate a program – also with refactorings – but they have problems: To adapt a program composed from feature modules using contemporary refactoring engines like Eclipse [7], the program has to be composed first and then refactorings are applied to it. The key problem is that if there are n optional features in producing a program and m optional refactorings, then
Martin Kuhlemann was supported and partially funded by the DAAD Doktorandenstipendium (No. D/07/45661). Batory’s work was supported by NSF’s Science of Design Project #CCF-0724979. Sven Apel’s work was supported in part by the German Research Foundation (DFG), project number AP 206/2-1.
2^(n+m) program variants are possible. Hence, brute force is not an option [10]. Wrappers (a.k.a. adapters), as a second approach, increase program complexity as they introduce additional methods and classes [9], and meta-programs require developers to guarantee the resulting program can be compiled. We later discuss these approaches and others in detail. In contrast to prior work, we aim at a unification of features and refactorings in order to establish a general model of configurable and reusable software based on transformations. We propose that object-oriented refactorings be included in feature modules, called refactoring feature modules (RFMs). We illustrate how RFMs automate recurring tasks that eliminate incompatibilities. When an off-the-shelf program is moved into a feature module then RFMs help automate its integration with other programs. We demonstrate the practicality of our approach with five case studies.
2
Background
Feature-Oriented Design. In Figure 1, we show three feature modules implemented in Jak, a superset of Java that supports feature modularity and feature composition [1]. When feature modules are selected in a configuration process they add classes or class refinements to a given program. A class refinement, which is indicated by the keyword refines, adds members to and extends methods of existing classes. Feature module Base defines class Container. Module LimitedSize refines class Container by adding field depth and method setElements. Existing methods are extended by overriding, e.g., method insert_front (Lines 12-15) refines method insert_front of class Container (Lines 3-5) via an inheritance-like mechanism. This method refinement adds statements and calls the refined method using Jak's keyword Super (Line 13). Feature module ContainerAsDeque adds a new class Deque. The result of composing Base, LimitedSize, and ContainerAsDeque includes both classes Container and Deque. In this example, Deque is a wrapper class (a.k.a. adapter class) for Container, i.e., by delegating methods it makes Container objects accessible under the name Deque and makes method insert_front of Container accessible under the name add_front of Deque.

Feature Module Base
 1  public class Container {
 2    List elements;
 3    void insert_front(Element e){
 4      elements.add(e);
 5    }
 6  }

Feature Module LimitedSize
 7  refines class Container {
 8    int depth;
 9    void setElements(List newElems){
10      elements=newElems;
11    }
12    void insert_front(Element e){
13      Super.insert_front(e);
14      depth= elements.size();
15    }
16  }

Feature Module ContainerAsDeque
17  public class Deque {
18    Container c;
19    void add_front(Element e){
20      c.insert_front(e);
21    }
22    void setElements(List newElems){
23      c.setElements(newElems);
24    }
25  }

Fig. 1. Feature-oriented design of a container library
Fig. 2. Refactoring Container with ’Rename Class’ and ’Rename Method’: (a) class Container with _elements, _depth, insert_front(), setElements(); (b) after the ContainerToDeque refactoring, class Deque with _elements, _depth, insert_front(), setElements(); (c) after the InsertToAdd refactoring, class Deque with _elements, _depth, add_front(), setElements()
Refactoring. A refactoring is a transformation that alters the structure of a program without altering its observable behavior [22,6]. One of the uses of refactorings is to remove incompatibilities among programs in order to increase reuse [22]. Two common refactorings are ’Rename Method’ and ’Rename Class’.1 We use them as examples throughout the paper. In Figure 2a, we depict class Container that has been composed from the feature modules Base and LimitedSize of Figure 1. Figure 2b shows the result of performing the refactoring ContainerToDeque. ContainerToDeque renames class Container into Deque and adjusts all references. Figure 2c shows the resulting class after the refactoring InsertToAdd. InsertToAdd renames method insert front into add front and adjusts all calls. Refactorings have parameters that define the target program elements [19]. For example, the parameters of a ’Rename Method’ refactoring are (1) the qualified name of the method to rename and (2) the new method name. We use the term refactoring type for the template that expects parameters. For example, ’Rename Class’ and ’Rename Method’ are refactoring types. Once its parameters are provided, the refactoring is fully specified and can be applied. Fully specified refactorings can have names, e.g. the ’Rename Class’ refactoring that renames Container to Deque is called ContainerToDeque (see Fig. 2).
3
Refactoring Feature Modules (RFMs)
A refactoring feature module (RFM) integrates refactorings with feature modules. The basic idea is to define refactorings in refactoring units that become elements of feature modules. By packaging one refactoring per feature module, a particular sequence of refactorings can be applied to a program, just like feature module sequences are composed to build programs. That is, program generation and restructuring are integrated with RFMs. Concept. Every refactoring type has an interface, which contains a getter method for each parameter of the refactoring type. A refactoring unit is a class-like module that implements a refactoring interface. It implements each getter method by returning the value for a designated parameter. Together the parameter values of a refactoring unit fully specify a particular refactoring. We choose to represent refactorings as class-like modules because this is similar to feature-oriented refinements, and technically it allows us to reuse tool support.
1 ’Rename Method’ changes the name of a method and ’Rename Class’ changes the name of a class [6].
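The refactoring interfaces themselves are not shown in the paper. As a minimal sketch only, assuming plain Java and using the getter names that appear in Figure 3, the interface of the ’Rename Class’ refactoring type might look like this:

// Hypothetical sketch of a refactoring-type interface (not the authors' actual code).
// A refactoring type exposes one getter per parameter of the refactoring.
interface RenameClassRefactoring {
    // qualified name of the class to rename
    String getOldClassId();
    // new name of the class
    String getNewClassName();
}

A concrete refactoring unit then implements this interface and hard-codes its parameter values, as the unit MyRenameClass in Figure 3 does.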
Figure 3 depicts a sample RFM, called ContainerToDeque, which encapsulates a refactoring unit MyRenameClass. MyRenameClass defines a ’Rename Class’ refactoring, i.e., it implements the interface RenameClassRefactoring and defines the getter methods getOldClassId and getNewClassName. The getters of MyRenameClass return the qualified name of the class to rename and the new class name. Refactorings of other types than ’Rename Class’ are defined analogously.

Feature Module ContainerToDeque
refactoring MyRenameClass implements RenameClassRefactoring {
  String getOldClassId(){return "Container";}
  String getNewClassName(){return "Deque";}
}

Fig. 3. Refactoring unit that renames class Container into Deque

In Figure 4, we show a design in which RFMs are applied successively (in top-down order) to the composition of the two feature modules Base and LimitedSize. InsertToAdd is composed after ContainerToDeque and renames method Deque.insert_front into add_front. When all modules are selected in a configuration process the result is the same as composing the feature modules of Figure 1 (class Container is accessible under the name Deque; method Container.insert_front is accessible under the name add_front of class Deque).

Fig. 4. Sequence of RFMs: feature module Base (class Container: _elements, insert_front()), feature module LimitedSize (Container: _depth, insert_front(), setElements()), RFM ContainerToDeque (refactoring unit MyRenameClass: getOldClassId(), getNewClassName()), RFM InsertToAdd (refactoring unit MyRenameMethod: getOldMethodId(), getNewMethodName())

Control the scope of RFMs. A transformation is applied when an RFM is composed [1]. That is, the program that is synthesized by composing modules prior to an RFM is transformed by that RFM. In Figure 4, the classes refactored by the RFMs ContainerToDeque and InsertToAdd are limited to classes created by feature modules these RFMs follow. That is, ContainerToDeque refactors the code added by the feature modules Base and LimitedSize but not code added/changed by InsertToAdd as InsertToAdd is composed after ContainerToDeque. If an additional feature module NewContainer would apply after ContainerToDeque and introduce a second class Container, like feature module Base, this class would not be affected by ContainerToDeque because it would be added after ContainerToDeque. With RFMs, program elements can be both added and deleted (e.g., renaming can be represented as a sequence of deleting and creating code elements). After an RFM renames a method, the old method no longer exists. If a subsequent feature module or RFM references the renamed method by its old name, an error is reported. To guarantee the absence of such errors in all feature compositions is possible with techniques of safe composition [12], another topic that we are investigating.
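The refactoring unit inside the InsertToAdd module is not listed in the paper; Figure 4 only names its getters. Purely as an illustration, and assuming an interface RenameMethodRefactoring analogous to RenameClassRefactoring (both the interface name and the format of the qualified method name are assumptions), that unit might look roughly like this:

Feature Module InsertToAdd (hypothetical sketch)
refactoring MyRenameMethod implements RenameMethodRefactoring {
  // qualified name of the method to rename (exact format assumed)
  String getOldMethodId(){return "Deque.insert_front";}
  // new method name
  String getNewMethodName(){return "add_front";}
}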
Fig. 5. Composition results: (a) from Fig. 1 without RFMs – class Container (_elements, _depth, insert_front(), setElements()) plus wrapper class Deque (add_front(), setElements()); (b) from Fig. 4 with RFMs – a single class Deque (_elements, _depth, add_front(), setElements())
RFMs in Action. In the introduction, we observed that generated programs often do not have the correct structure for them to be reused as-is in an environment (e.g., legacy code) [9]. Here is where RFMs can eliminate incompatibilities and promote reuse without altering the functionality of the generated program. RFMs allow us to avoid forwarding methods and classes (commonly used for integration), and thus simplify the resultant program. As Figure 5 shows, the composed program in Figure 5b only encapsulates a class with the desired name Deque and no obsolete class Container as in Figure 5a. Tool Support. We have implemented RFMs as an extension to the Jak language, which adds support for feature modules to Java [1]. We use the AHEAD tool suite [1] to compose feature modules. We extended AHEAD with a plugin mechanism that encapsulates a template program of one refactoring type, e.g., the ’Rename Method’ refactoring type is implemented in its own plugin. Refactoring units refer to a plugin with their interface declaration and parameterize the refactoring template program with their getters. More details are given in a technical report [11].
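The plugin mechanism is only described in prose here (details are in the technical report [11]). Purely to illustrate the division of labor described above – a plugin holds the transformation logic for one refactoring type, while a refactoring unit supplies the parameters through its getters – a plugin hook could be imagined along the following lines; every name and signature in this sketch is hypothetical, not the AHEAD implementation.

// Hypothetical sketch only.
interface RefactoringPlugin<R> {
    // R is the refactoring interface handled by this plugin,
    // e.g. RenameClassRefactoring for the 'Rename Class' plugin.
    // The plugin reads the parameters via the unit's getters and
    // rewrites the program composed so far.
    ComposedProgram apply(ComposedProgram programSoFar, R refactoringUnit);
}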
4
Case Studies
We report on experiences with RFMs using two larger and three smaller case studies. We (1) transformed an off-the-shelf library in order to be able to reuse it in an incompatible database engine; (2) integrated variants of configurable libraries using RFMs with a legacy environment; (3) used uncommon refactorings to integrate a library of graph data types (GDTs); (4) integrated a large Eclipse library using RFMs with minimal effort, and (5) integrated a library of abstract data types (ADTs) and thereby removed wrapper classes and methods that became obsolete with RFMs. In Table 1, we show the transformed programs and the refactorings applied to them. Logging libraries. The off-the-shelf logging library Log4J2 cannot be used as-is by the database SmallSQL3 (∼20K lines of source code) due to incompatibilities. To standardize logging in SmallSQL we applied three RFMs to restructure Log4J such that it can be used in SmallSQL.
2 http://logging.apache.org/log4j/
3 http://www.smallsql.de/
Table 1. Information on case studies

Program               #SLOC*  Refactorings
Log4J                 ∼12K    1x Move Class, 2x Rename Method
ZipMe                 ∼3K     2x Move Class, 1x Rename Class
Raroscope             ∼250    2x Move Class, 2x Rename Class
TrueZip               ∼13K    2x Move Class, 2x Rename Class, 1x Rename Method
GDT                   ∼1K     4x Move Class, 2x Rename Class, 2x Rename Method, 6x Encapsulate Field, 2x Extract Interface
Workbench.texteditor  ∼16K    1x Rename Class, 2x Rename Field
ADT library           59      1x Rename Class, 1x Rename Method
*lines of source code
One RFM moves class org.apache.log4j.Logger into the SmallSQL package smallsql.database and two RFMs rename methods into SmallSQL-compatible names. The RFMs transform single code elements and a number of references to these elements automatically. For example, to make class Logger compatible, we did not have to know and enumerate those 144 points in 38 Log4J classes (distributed over 10 packages) that reference the moved class and must be transformed; we also did not have to find the numerous members that needed to be qualified as public when we moved the class – the ’Move Class’ RFM performs these transformations automatically. We observed that one incompatibility could not be eliminated by refactoring Log4J. With a feature-oriented refinement we introduced a single default constructor into the Logger class that calls setters. This way, RFMs do not replace refinements but complement them to integrate programs. As a result, we can now select either the informal SmallSQL logging engine or the Log4J standard logging library for the SmallSQL database in a configuration process. RFMs allow Log4J (and future releases of it) to be reused in the formerly incompatible SmallSQL environment. We defined the adaptation changes once. We found the effort to define RFMs small, and the code changes that the selected RFMs must perform are applied automatically (hidden from us). Configurable compression libraries. ZipMe4 is a library to access ZIP archives, Raroscope5 is a library to access RAR archives, and TrueZip6 can access TAR archives. The used versions of ZipMe and Raroscope are configurable, i.e., different library variants can be composed for each of them from selectable features like Checksum. Furthermore, we developed a graphical tool that used an old library to analyze files inside ZIP archives (file names, last modification date, uncompressed footprint) and decompress them. We wanted to replace the old library with ZipMe, Raroscope, and TrueZip to also analyze RAR and TAR archives with our tool. But all variants of these libraries were incompatible with our tool.
4 http://sourceforge.net/projects/zipme/
5 http://code.google.com/p/raroscope/
6 https://truezip.dev.java.net
We applied a number of RFMs to automatically restructure variants of the libraries such that they can be reused in our tool (e.g., in ZipMe we renamed class ZipArchive into ZipFile). We observed that some incompatibilities cannot be eliminated by refactoring the library variants. This was the case for creating archive representations – our tool passes a File argument but all variants of the libraries take a FileInputStream or String argument. We added a feature module with a single factory method to each library and call the methods in order to bridge this gap. For TrueZip the feature module also encapsulates a method to access streams of single archive entries with certain parameters. Raroscope provides no such streams, so we disabled decompression here (still we can analyze archives). Again, RFMs do not replace refinements but complement them to integrate programs. Technically, the version of ZipMe can be composed from 13 features to 26 different variants. The version of Raroscope can be composed from 5 features to 24 different variants. We composed different variants of the configurable ZipMe and Raroscope libraries and all were compatible with our tool automatically when we selected the refactoring features.7 Interestingly, the variants were compatible although only the fully-fledged versions had been composed before. After we applied RFMs to TrueZip, its implementation became compatible with our tool. We now also can analyze and decompress TAR archives with our tool. Besides renaming and moving the TrueZip representations of archives and archive entries, we had to rename the archive method getArchiveEntries into entries because our tool expects this name. For TrueZip, RFMs automate adaptation changes when new versions of TrueZip are released. Graph library. We integrated a configurable library of GDTs [17] (15 features, 55 library variants) with RFMs into an incompatible environment that used originally the graph library OpenJGraph8. Besides renaming and moving Graph and Vertex classes, we had to encapsulate 6 fields in these classes with access methods using RFMs and had to extract interfaces for these classes. With refinements we added five methods. With RFMs, we can now configure multiple GDT variants to be compatible with the OpenJGraph client. Eclipse library. Dig et al. reported on incompatible environments of the Eclipse library ’workbench.texteditor’ (16K lines of source code) [5]. We applied three RFMs to automatically restructure ’workbench.texteditor’ such that it can be reused in these environments. One RFM renames class Levenshtein into Levenstein because this name was expected in the environment and two RFMs rename fields from levenshtein into levenstein. In this study, three simple RFMs automatically integrate the large library (and its future releases) with aforesaid environments. Abstract data types. Our running example of Figure 1 (Container class with its wrapper class Deque) leans on a configurable library of ADTs (5 features, 7 library variants) [2].
7 Informally, we performed primitive performance tests and found that our tool decompressed ZIP archives ∼5% faster with (fully-fledged) ZipMe than with the replaced old library, i.e., integrating ZipMe was beneficial.
8 http://sourceforge.net/projects/openjgraph/
We reimplemented this feature-oriented design with RFMs as we have shown in Figure 4. With RFMs, we can now automatically integrate differently configured library variants just by selecting refactoring features. In this study, we removed wrapper classes and methods that provided access to classes and methods under a different name but became obsolete with RFMs. Summary. RFMs integrate well with feature-oriented refinements. RFMs allow libraries to be reused in environments they were incompatible with before. Specifically, RFMs can apply (sequences of) pre-defined refactorings to hand-written or synthesized programs automatically. After defining RFMs, any number of variants of a configurable library can be configured to be compatible with an environment. While renaming appears most important, RFMs in our perspective may encapsulate any transformation which affects structure but not semantics, e.g., the ’Extract Interface’ refactoring [6] (cf. GDT study). We observed that RFMs complicate debugging because the refactored classes of the debugged program differ from the developed classes inside the feature modules. Hence, we need advanced debugging tools that keep track of the performed refactorings such that changes to the program’s classes are triggered back to the feature modules automatically.
5
Related Work
Different styles of wrapper modules (a.k.a. adapters) forward method calls to wrapped objects in order to integrate incompatible code and to increase reuse, e.g., [8,14,4]. Wrappers exist simultaneously with their wrapped objects and so a wrapper is a second way of accessing a wrapped object. RFMs transform bodies of classes such that there is no second way to access objects of a transformed class. However, RFMs avoid problems that wrappers have: Wrappers increase implementation and maintenance effort when they add methods and classes [18,9]. The forwarding methods of wrappers impact negatively on performance and footprint of the resultant program [9]. Wrappers are complex because (a) wrapper objects have different identities than the wrapped object [5,23,9] and (b) they cause type problems as their location in a type hierarchy differs from the location of the wrapped class (redundant hierarchies emerge) [9,23]. Meta-programming approaches like [20,25,3] restructure programs beyond refactoring and generally do not guarantee that generated programs are compilable. Refactoring units parameterize pre-defined meta-programs implemented globally in plugins of our composer. Therefore, developers of RFMs do not care whether generated programs are compilable (ensured by our composer). Some researchers propose refactoring meta-programs [26,13] or refactorings as language concepts [15]. They all do not integrate refactoring with feature transformations and do not provide a general model of configurable and reusable software. In order to adapt a program composed from feature modules using contemporary refactoring engines like Eclipse [7] every program variant has to be composed first and then refactorings are applied to each variant. Since possibly many (up to millions) combinations of feature modules can be composed this approach is not feasible [10]. Re-applying a common set of refactorings with such engines
on constantly updated incompatible programs is error-prone and laborious as well [18,9]. RFMs automate refactorings and sequences of refactorings. RFMs are selected in a configuration process and thus apply at the time a program is composed (in case their feature is selected) – thus, RFMs make refactored programs available even if refactorings were not recorded for them individually. ReBA [5] helps to integrate a library into environments that use the library but rely on an outdated interface of that library. ReBA uses a trace of edits and refactorings, which lead from the old to the evolved library version. Using this trace, ReBA adds code which allows using the evolved instead of the old version, e.g., elements, that are deleted in the evolved version, are added back when referenced. RFMs can bridge incompatibilities that occur when a library A is replaced by a completely different library B. Thereby, in general no helpful trace is available, which maps all code of A to all code of B. In KIDS, users select correctness-preserving code transformations that improve performance and footprint, e.g., partial evaluation [24]. Refactoring transformations keep correctness too. When selected, RFMs restructure a program to simplify its reuse. Note, by inlining methods and classes with refactorings, RFMs may also improve performance and footprint of a composed program. Feature-oriented refactoring [16] and aspect-oriented refactoring [21] decompose a program into feature modules of a feature-oriented design and aspects respectively. In contrast, RFMs perform object-oriented refactorings on a program which is composed from features.
6
Conclusion
The structure and features (increments in functionality) of a program are important for the program to be reused by an environment. When the interface of a generated program and a client-desired interface mismatch, the generated program cannot be reused by this client. In current technology, transformations to alter the structure of programs (e.g., refactorings) and to alter the features of programs (e.g., feature modules) are still treated as disjoint concepts. In this paper, we have introduced refactoring feature modules (RFMs), which integrate feature modules with refactorings. An RFM automatically alters the structure of programs, which are composed from feature modules. We have implemented support for RFMs and demonstrated in a number of case studies that RFMs can help to reuse programs. Specifically, we showed that with RFMs the studied programs can be integrated automatically and reused in environments they were incompatible with before, i.e., RFMs simplify the reuse of code.
References
1. Batory, D., Sarvela, J.N., Rauschmayer, A.: Scaling step-wise refinement. TSE 30(6), 355–371 (2004)
2. Batory, D., Singhal, V., Sirkin, M., Thomas, J.: Scalable software libraries. In: Anderson, R. (ed.) FSE 1993. LNCS, vol. 809, pp. 191–199. Springer, Heidelberg (1994)
3. Biggerstaff, T.J.: A new architecture for transformation-based generators. TSE 30(12), 1036–1054 (2004)
4. Bosch, J.: Design patterns as language constructs. JOOP 11(2), 18–32 (1998)
5. Dig, D., Negara, S., Mohindra, V., Johnson, R.: ReBA: Refactoring-aware binary adaptation of evolving libraries. In: ICSE, pp. 441–450 (2008)
6. Fowler, M.: Refactoring: Improving the design of existing code. Addison-Wesley Longman Publishing Co., Inc., Amsterdam (1999)
7. Fuhrer, R.M., Keller, M., Kieżun, A.: Advanced refactoring in the Eclipse JDT: Past, present, and future. In: WRT (2007)
8. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design patterns: Elements of reusable object-oriented software. Addison-Wesley, Reading (1995)
9. Hölzle, U.: Integrating independently-developed components in object-oriented languages. In: Nierstrasz, O. (ed.) ECOOP 1993. LNCS, vol. 707, pp. 36–56. Springer, Heidelberg (1993)
10. Krueger, C.W.: New methods in software product line practice. CACM 49(12), 37–40 (2006)
11. Kuhlemann, M., Batory, D., Apel, S.: Refactoring feature modules. Technical Report 15, Faculty of Computer Science, University of Magdeburg (2008)
12. Kuhlemann, M., Batory, D., Kästner, C.: Safe composition of non-monotonic features. In: GPCE (2009)
13. Lämmel, R.: Towards generic refactoring. In: Workshop on Rule-Based Programming, pp. 15–28 (2002)
14. Lau, K.-K., Ling, L., Ukis, V., Velasco Elizondo, P.: Composite connectors for composing software components. In: Lumpe, M., Vanderperren, W. (eds.) SC 2007. LNCS, vol. 4829, pp. 266–280. Springer, Heidelberg (2007)
15. Lewis, J.R., Shields, M.B., Meijer, E., Launchbury, J.: Implicit parameters: Dynamic scoping with static types. In: POPL, pp. 108–118 (2000)
16. Liu, J., Batory, D., Lengauer, C.: Feature-oriented refactoring of legacy applications. In: ICSE, pp. 112–121 (2006)
17. Lopez-Herrejon, R.E., Batory, D.: A standard problem for evaluating product-line methodologies. In: GCSE, pp. 10–24 (2001)
18. Mattsson, M., Bosch, J.: Framework composition: Problems, causes and solutions. In: Marie, R., Plateau, B., Calzarossa, M.C., Rubino, G.J. (eds.) TOOLS 1997. LNCS, vol. 1245, pp. 203–214. Springer, Heidelberg (1997)
19. Mens, T., Eetvelde, N.V., Demeyer, S., Janssens, D.: Formalizing refactorings with graph transformations: Research articles. Journal of Software Maintenance and Evolution: Research and Practice 17(4), 247–276 (2005)
20. Mezini, M., Seiter, L., Lieberherr, K.: Component integration with pluggable composite adapters. Kluwer Academic Publishers, Dordrecht (2000)
21. Monteiro, M.P., Fernandes, J.M.: Towards a catalog of aspect-oriented refactorings. In: AOSD, pp. 111–122 (2005)
22. Opdyke, W.F.: Refactoring object-oriented frameworks. PhD thesis, University of Illinois at Urbana-Champaign (1992)
23. Sekaraiah, K.C., Ram, D.J.: Object schizophrenia problem in modeling Is-Role-Of inheritance. In: Inheritance Workshop (2002)
24. Smith, D.R.: KIDS: A knowledge-based software development system. In: Automating Software Design, pp. 483–514 (1991)
25. Tatsubori, M., Chiba, S., Killijian, M.-O., Itano, K.: OpenJava: A class-based macro system for Java. In: Cazzola, W., Stroud, R.J., Tisato, F. (eds.) Reflection and Software Engineering. LNCS, vol. 1826, pp. 117–133. Springer, Heidelberg (2000)
26. Verbaere, M., Ettinger, R., de Moor, O.: JunGL: A scripting language for refactoring. In: ICSE, pp. 172–181 (2006)
Variability in Automation System Models

Gerd Dauenhauer, Thomas Aschauer, and Wolfgang Pree

C. Doppler Laboratory Embedded Software Systems, University of Salzburg
Jakob-Haringer-Str. 2, 5020 Salzburg, Austria
[email protected]
Abstract. Model driven engineering as well as software product line engineering are two approaches that increase the productivity of creating software. Despite the rather mature support of the individual approaches, tools and techniques for their combination, promising product specific customization of models, are still inadequate. We identify core problems of current approaches when applied to automation system models and propose a solution based on an explicit notion of variability embedded in the core of the modeling language itself.
1 Introduction

Model driven engineering (MDE) is becoming increasingly popular for developing complex software intensive systems. Prominent examples include the Object Management Group’s Model Driven Architecture [1] initiative, targeted mainly at generating executable software, and MATLAB/Simulink [2], which is widely used for example in the automotive industry for designing control algorithms. Both approaches allow the user to define the behavior of a system in terms of a high-level model which is then transformed into low-level implementation code. Besides executable code, MDE may also be targeted at other artifacts such as configuration files. Our group, for example, cooperates with a provider of a specific kind of automation systems, called testbeds, used for example in the automotive industry for developing combustion engines. Due to the ever changing measurement tasks during engine development, testbeds must be highly flexible and customizable. This flexibility is achieved through configuration parameters of the automation system software. Instead of the laborious and error prone process of manually configuring the automation system – a typical configuration comprises tens of thousands of individual parameter values – we apply MDE to let the users work with models of testbeds and to automatically derive configuration data. Figure 1 shows the tool chain. Since a testbed can be used for different measurement tasks, its automation system software has to be configured accordingly. Think for example of a testbed for diesel and gasoline engines. If a diesel engine is operated, all gasoline related hardware and software parts of the testbed must be disabled, and vice versa. In order to derive configuration parameters, the model must be modified each time a different task is to be performed. For a typical testbed, however, different usage scenarios can be anticipated. Instead of manually modifying the testbed model each time the measurement task changes, the testbed model could already incorporate these usage scenarios, so the model would allow choosing between predefined model variants.
Fig. 1. Model driven engineering for configuration parameter generation
Expressed in terms of software product line engineering (SPLE) [3], such a testbed model represents a product line, from which specific products can be generated by making decisions at variation points. While the product line comprises the union of all possible testbed model variants, a product represents a specific testbed model from which configuration parameters can be generated. Variation points in our case are for example the choices between diesel and gasoline fuel. Products are created from reusable assets according to selections made at variation points; in our case these assets are for example model fragments representing diesel and gasoline fuel subsystems. Assets may also contain variation points, i.e. they may be parameterized. For example a diesel fuel subsystem may support weight based or flow based measurement of fuel consumption. We will use these terms throughout the rest of the paper. SPLE is concerned with two major aspects: (a) the technical representation of variability, i.e. representing assets and variation points, and (b) feature modeling, i.e. the representation of dependencies among variation points in terms of conditional expressions. Assets used in SPLE often are source code fragments, and variation points are represented as #IFDEF-like annotations within the source code. No generally applicable equivalent however is available for the case of graphical models. In the rest of the paper we thus focus on the representation of variability in a modeling environment for testbed automation systems and only briefly touch feature models. We describe commonly used workarounds for combining SPLE and MDE and then present our own approach, which we think is applicable to other domains, too.
2 Problems of Current Approaches

Although software product line engineering techniques are already applied to model driven engineering, we consider the current approaches inappropriate for automation system models. The problems stem mainly from the fact that SPLE is implemented as an add-on to existing MDE tools. This section uses examples to describe two conceptual approaches of applying SPLE approaches to MDE, and highlights their shortcomings when it comes to modeling both software and hardware aspects, which is essential for our domain. In testbed models, software aspects are represented in a graphical dataflow model; hardware aspects are represented in an electrical wiring model.

2.1 Using Dedicated Model Elements to Represent Variation Points

This approach is based on positive variability where a model comprises the union of all model elements used in any of the testbed’s variants. Decisions made in the feature model are used to configure the model. Feature selection may be done through parameterization of the model in place or by creating a new model from a subset of the existing source model through some model transformation. Modeling a union
model however is not always straightforward as shown in figure 2 a). The semantics of dataflow models for example usually forbids connecting multiple output signals to one input signal, so we cannot simply define both connections and choose between the connections depending on the feature model. We must find workarounds instead. An existing model can be parameterized in place through removing or changing individual model elements, for example by setting the output value of a constant block in a dataflow model to a certain value. Model elements representing variation points may be explicitly marked as fixed, optional, or variable. In case of UML, for example, stereotypes may be used, as in the PLUS approach [4]. Whole parts of a model can be enabled or disabled through model elements representing switches, for example routing output signals of multiple source model elements into one input signal of a target model element as in the Koala approach [5]. Figures 2 b) and c) show how variant specific constant values in a dataflow model can be represented in both ways. As a result of the feature selection, the existing model is altered at its variation points, reflecting the choices made.
Fig. 2. A dataflow model with variation points
Figure 2 d) shows how the final model for the example could look if value “2” was chosen in the feature model. If the source model was figure 2 b), the constant y would be removed, along with the now unnecessary switch. If the source model was figure 2 c), the unspecified value “C” would simply be replaced by “2”. Although figure 2 b) is a possible solution to the problem of representing variability within the model, it requires additional blocks that do not stem from modeling the domain, but from the particular technical representation of variability. Such additional blocks increase the size of the model and are likely to lead to obfuscated models or “accidental complexity”, as Brooks calls it [6]. The second solution shown in figure 2 c) is less complex, but one still cannot fully understand the model until the choices made in the feature model are clear, i.e. until the value “2” is specified. In the case of dataflow models, however, this approach still is a practically used solution [7]. In contrast to the dataflow model before, we now consider an example specifying a testbed’s electrical wiring. Suppose a testbed supports two operation modes, one requiring a pressure sensor, and a second one requiring a temperature sensor. Both sensors would be connected to the same plug of an I/O device. In reality, of course only one sensor may be connected to the I/O device at a time. Similarly to the case of the dataflow model, a modeling environment might prevent multiple connections to a single electrical plug in the model. As a consequence, the model fragment in figure 3 a) could not express the fact that both connections are valid in principle, but not at the same time, similarly to the dataflow example in figure 2 a).
Fig. 3. An electrical wiring model with variation points
In contrast to the dataflow example before, it is problematic to represent the variation point by introducing an artificial switch component as shown in figure 3 b), since the model would not reflect the structure of the physical testbed anymore. This is particularly important since testbed models are not just used to derive configuration data, but also to document the system’s current state to provide guidance for testbed maintenance. Even if we would not require a hardware model to accurately represent its real world counterpart, modeling an artificial switch still introduces accidental complexity.

2.2 Creating Products by Merging Assets in Multiple Model Fragments

An alternative technique for representing positive variability is using multiple assets representing model fragments. One specific testbed model can then be created by merging these fragments into a single model, as illustrated in figure 4.
Fig. 4. Dataflow models created by merging multiple fragments
We use the dataflow example from the previous section; the electrical wiring example can be modeled analogously. Figure 4 a) represents a fragment containing the common functionality where one of the controller’s inputs is not yet connected; figures b) and c) are two additional fragments containing the variant specific elements to be merged in. The complete model shown in figure 4 d) is the result of merging fragment a) and b), while model e) is created from fragment a) and c). In order to be able to merge fragments automatically, e.g. by a dedicated external SPLE tool, models of the fragments must contain some uniquely identifiable shared elements. In our example, fragment b) and c) both contain a PID controller. Since the common model fragment a) also contains an equal controller block, they can be merged unambiguously by partially replacing the definition of the controller in the common fragment. If no such shared element could be identified, merging could not be performed automatically. As a consequence, fragments must be kept in sync, which may not always be straightforward if they are stored in different files, edited by different users. Another consequence of using multiple, technically independent assets is that the “big picture” of the model is lost. Consistency checks for example, e.g. compatibility
between dataflow signals, can be performed only locally for each fragment; the overall consistency can not be checked until they are merged together according to a particular selection in the feature model. The resulting cycle of “feature selection, model merge, and consistency check” is cumbersome. In practice, this situation usually is avoided by encoding such technical requirements into the feature model, leading to complex feature dependencies that are intermixed with business decisions. Maintaining the feature model then becomes difficult and error-prone, since in-depth technical knowledge as well as business-specific knowledge is required.
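The merging step described in this section depends on uniquely identifiable shared elements such as the PID controller. As a rough illustration of that idea only – the paper states that merging would typically be performed by a dedicated external SPLE tool, and this Java sketch with a map keyed by element identifiers is purely an assumption – merging two fragments could look like this:

// Hypothetical sketch: merge two model fragments keyed by unique element ids.
// An element found in both fragments (e.g. the PID controller) acts as the
// shared anchor; merging fails if no such shared element exists.
import java.util.*;

final class FragmentMerge {
    static Map<String, String> merge(Map<String, String> common,
                                     Map<String, String> variant) {
        boolean shared = variant.keySet().stream().anyMatch(common::containsKey);
        if (!shared) {
            throw new IllegalArgumentException("no shared element; cannot merge automatically");
        }
        Map<String, String> result = new HashMap<>(common);
        result.putAll(variant); // variant-specific elements complete the shared ones
        return result;
    }
}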
3 Modeling Language with Built-In Support for Variability

In the previous section we used examples to identify problems that arise when using a modeling language that does not provide direct support for representing variability. We now describe how a modeling language could support variability from a user’s perspective. Our approach is related to the modeling approach using multiple model fragments as described in section 2.2. Instead of using technically independent assets, such as different files, we treat them as conceptual entities defined within one single model, in order to avoid the disadvantages of that approach. We start with a model defining the common behavior. We then explicitly define the variation points with additional fragments and define their behavior as an extension of the common model. Picking up the example from figure 2, figure 5 shows how our approach can be applied to define the dataflow model: a) describes the common behavior, b) and c) are each defined by incrementally defining the variations.
Fig. 5. Explicit variations in the dataflow models
Note that the first input port of the PID controller in fragment a) is not yet connected. Incremental fragments b) and c) only define the constant block and the connection to the PID controller in the common model. The PID controller drawn in dashed lines and any other model elements defined by the common fragment are fixed since the model fragment for a variation may only introduce additional behavior. Instead of introducing fragment specific model elements as used in figure 5 that differ only in their parameter values, we explicitly provide a means for representing fragment specific parameter values. Figure 6 shows how our example could be represented more concisely. Fragment a) again is the common behavior, now however it already contains a constant block with a default value explicitly marked as modifiable in variants. Fragments b) and c) only redefine the parameter value. They do not introduce additional model elements anymore.
Fig. 6. Explicit variation of parameter values
Figure 7 shows a hardware example corresponding to figure 3. Note that no artificial switch component from figure 3 b) is needed anymore, and both wire connections from sensors x and y to the I/O device’s input plug can be represented without contradictions. Again, a) represents the common behavior, where models b) and c) represent the fragments using different sensors, where the common behavior is fixed.
Fig. 7. Explicit variation in the electrical wiring model
The main difference to the merging approach described in section 2.2 is that in our approach the common model and the fragments all share the same model elements. There are no model elements that must be kept in sync in order to merge fragments correctly. In our example, variations 3 and 4 both reference the same single I/O device element in the model. Another major difference is that in our approach, the fragments describing the variations extend the common model fragment and as such more consistency checks can be performed in context. For example, in variations 3 and 4, not only the fact is modeled that sensor x or y is connected to the I/O device, but in addition, the whole signal chain from the sensors to the automation system can be traced and checked for consistency, e.g. for proper encoding of sensor data on bus messages between the I/O device and the automation system. By keeping all fragments within the same model, we however face the disadvantage that individual fragments can not be used outside their anticipated scope. Additionally, model fragments can not be created in a distributed environment without further support of the modeling environment. We think that such restrictions are acceptable, though.

3.1 Variation Points and Feature Modeling

Although feature modeling is not the main focus of this paper, we briefly describe how our modeling approach affects feature modeling. Similarly to the approach in section 2.2, we create a complete testbed model by merging together multiple model fragments. The fragments to merge may be selected manually or they may be defined by choices made in the feature model. Feature models are used to express logical dependencies among variation points in a model. Fragments from figure 4 b) and c) for example cannot be merged, since they contain contradicting definitions for constant values. The same holds true for our fragments in figures 6 b) and c).
While in the conventional approaches for combining SPLE with MDE such dependencies have to be modeled explicitly, in our approach these technical conflicts can already be derived from the model. Table 1 shows the tabular representation of model elements and fragments from our example in figure 5 and 6. Table 1. Variants in tabular form describing their model elements
Model element      common  + var. 1  + var. 2  + var. 3  + var. 4
constant           4       2         5         4         4
controller         ✓       ✓         ✓         ✓         ✓
sensor x                                       ✓
sensor y                                                 ✓
I/O device         ✓                           ✓         ✓
automation system  ✓                           ✓         ✓
…                  …       …         …         …         …
Model elements such as constant, controller, I/O device, and automation system are used in the common fragment, but sensor x and sensor y are not. The constant is used also in variation 1 and 2, representing dataflow model fragments with a value of “2” and “5” respectively. It is also implicitly used in variation 3 and 4 with the default value defined in the common fragment. From these definitions, one can automatically derive that variation 1 and 2 cannot be used simultaneously. The controller is used unmodified in variation 1 and 2 and again implicitly used in variation 3 and 4. Note that the connections between model elements representing signal flows or electrical wires are also model elements but are skipped here; they would also be explicitly marked present or missing in each of the fragments. Note that the model with its variation points does not yet define a complete feature model. For example we described “common + variation 1”, as well as “common + variation 3”, but did not make statements about whether these variations can be active simultaneously. But since there are no conflicts visible in the table, these fragments may technically both be used together. We currently do not care whether merging these fragments makes sense from e.g. a business perspective though. We think that manually creating a testbed model from fragments like this, i.e. by enabling “columns” from the table, is a possible first step towards full-featured product line support in our modeling environment. We thus consider presenting such a tabular view to the users as the means for configuring testbed models; a feature model, however, could be defined on top of this technical basis later on.

3.2 Variation Points of Model Elements

So far we have described how a model can be created by enabling model fragments, i.e. by selecting which model elements to use, how to connect them and what their parameter values should be. As motivated in the introduction, model elements may themselves come in multiple variants. Their variants are chosen in the same way as described above. After all, model elements and the testbed model are not different in a technical sense; a testbed model itself is also just a model element that could in principle be used in a model of a factory.
As an example, consider a testbed model that contains a diesel fuel system model element. This system may come with weight or flow based measurement of fuel consumption and as such defines one variation point.
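To make the conflict detection discussed around Table 1 concrete, here is a minimal Java sketch of the kind of check that could be derived from such a table; representing a variant as a map from model element to enabled value is an assumption made for illustration only, not the implementation used in the authors' modeling environment.

// Hypothetical sketch: two variants conflict if they enable different values
// for the same model element (e.g. constant=2 in var. 1 vs constant=5 in var. 2).
import java.util.Map;

final class VariantConflicts {
    static boolean conflict(Map<String, String> a, Map<String, String> b) {
        for (Map.Entry<String, String> e : a.entrySet()) {
            String other = b.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                return true; // contradicting definitions, cannot be used simultaneously
            }
        }
        return false; // no visible conflict; technically mergeable
    }
}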
4 Model and Variability Representation

In order to sketch how we implement variability, we first introduce the language’s existing implementation core. Similar to other modeling languages such as the UML [8] with MOF [9] as its core, our modeling language uses a set of core primitives. Our language, however, is based on a unification of classes and objects known as clabjects [10, 11]. White boxes in figure 8 represent a simplified view on our modeling language core, which is sufficient for our discussion here.
Fig. 8. Clabject based modeling language core: boxes Variant, Clabject, Connector, and Field; Connector has source and target relations to Clabject; Clabject and Connector contain Fields; Variant has enables relations to clabjects, connectors, and fields, and a requires relation to other variants
Clabjects are used to represent model elements that users work with, for example sensors and I/O devices. A clabject may represent either a type or an instance. A clabject can be associated with another clabject by means of a connector, for example an I/O device may contain electrical plugs. A connector also may represent either a type, e.g. specifying cardinality, or an instance, i.e. a link between two clabjects representing instances. A connector may either represent composition of a clabject and its contained clabjects, or it may represent general associations between clabjects. Both, clabject and connector can have fields for representing parameter values, for example the value of the constant in our dataflow example. Again, a field may be either a type specifying e.g. the data type, or an instance specifying a value. Each of the basic elements, clabject, connector, and field, has another relation that allows defining subtype and instantiation relations between elements; these relations are skipped here to reduce clutter. Although the modeling language core does not define semantics of a specific domain, our modeling environment however ensures that the models are consistent, i.e. that for example the connector instances are established between compatible clabject instances and that the model adheres to the multiplicity constraints. The examples in section 3 informally introduced the kinds of variability we support in our modeling language. These are: (1) enabling/disabling connections, for example between the sensors and the I/O device in figure 7, (2) enabling/disabling model elements, for example the different sensors in figure 7, (3) enabling/disabling variant specific field values, for example the constant values in figure 6, and (4) enabling/disabling of variants of contained model elements, for example the fuel consumption measurement. The basic model is thus extended with an explicit notion of one or more variants; one variant is always implicitly defined as the common variant. The clabject representing e.g. a whole testbed thus contains the union of all clabjects, connectors and fields used in any of these variants. The enables relations between a
variant and a subset of the clabjects, connectors and fields now explicitly define which of these parts it requires. Two additional relations are defined for the variant: an enables relation is used to choose between variants of a contained clabject, while the requires relation is used to represent the DAG of variant dependencies. Using this basic mechanism, we can represent the different kinds of variation easily: (1) can be represented straightforwardly by an enables relation between a variant and a connector instance. (2) can be represented in the same way by an enables relation between a variant and a connector instance; note that the containment of clabjects is represented by connectors, too. (3) can be represented by an enables relation between the variant and a specific field value. (4) can be represented by an enables relation between the clabject’s variant and a variant of a contained clabject.
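As a rough Java illustration of the core just described – clabjects, connectors, and fields, plus variants that enable subsets of them – one could imagine data structures along the following lines; all class names and members here are assumptions for illustration, not the authors' implementation.

// Hypothetical sketch of the clabject-based core with explicit variants.
import java.util.*;

class Field { String name; String value; }          // type or instance; value set for instances

class Clabject {
    String name;
    List<Field> fields = new ArrayList<>();
}

class Connector {                                    // association or containment between clabjects
    Clabject source, target;
    List<Field> fields = new ArrayList<>();
}

class Variant {
    Set<Clabject> enabledClabjects = new HashSet<>();        // via enables relations
    Set<Connector> enabledConnectors = new HashSet<>();
    Map<Field, String> enabledFieldValues = new HashMap<>(); // variant-specific values
    Set<Variant> requires = new HashSet<>();                 // DAG of variant dependencies
}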
5 Related Work

Voelter and Groher [12] use the terms negative and positive variability to describe how models are constructed. Negative variability uses a model from which unnecessary features are selectively removed to get a specific model. Positive variability, in contrast, uses a core model to which features are added. SPLE in MDE often is done by configuring models according to the choices made in an external tool. pure::variants Connector is such a tool for MATLAB/Simulink [13]. It imports Simulink blocks as assets into the separate feature modeling tool. For a certain feature selection the corresponding Simulink blocks are added, removed, or their parameters are set, and also signals, i.e. connections between blocks, can be created or deleted. Thus this product supports positive as well as negative variability. BigLever provides an analogous commercial Bridge solution [14] for integrating Telelogic’s Rhapsody [15] UML and SysML modeling tool into their Gears SPLE tool. Elements in a Rhapsody model are turned into variation points, managed by Gears. Thus this commercial product supports negative variability. Creating models from fragments seems to be less well supported. Voelter and Groher [12], for example, describe a solution based on positive variability using aspect oriented software development. Among their assets are models that are merged or woven together according to a feature model. Straw et al. [17] show how UML-like class models can be merged, while Herrmann et al. [16] present an algebraic view on model composition.
6 Conclusion

In this paper we motivated why a modeling environment for automation systems should support variability of models. We first described why we consider existing SPLE approaches insufficient in this context. We introduced our alternative approach from a user perspective first and briefly outlined how it could be integrated seamlessly with our clabject based modeling language core. We already have implemented the modeling environment and demonstrated its applicability to the domain. We did however not yet implement the variability support. Although we described variability using examples from the engine testbed domain, we expect that our approach can be applied to other sufficiently complex domains as well.
References 1. OMG Model Driven Architecture, http://www.omg.org/mda 2. The MathWorks MATLAB/Simulink, http://www.mathworks.com/products/simulink 3. Clements, P., Northrop, L., Northrop, L.M.: Software Product Lines: Practices and Patterns. Addison-Wesley Professional, Reading (2001) 4. Gomaa, H.: Designing Software Product Lines with UML: From Use Cases to PatternBased Software Architectures. Addison Wesley, Redwood City (2004) 5. van Ommering, R., van der Linden, F., Kramer, J., Magee, J.: The Koala Component Model for Consumer Electronics Software. Computer 33(3) (2000) 6. Brooks, F.P.: No Silver Bullet Essence and Accidents of Software Engineering. In: Computer, vol. 20/4, pp. 10–19. IEEE Computer Society Press, Los Alamitos (1987) 7. Dziobek, C., Loew, J., Przystas, W., Weiland, J.: Von Vielfalt und Variabilität – Handhabung von Funktionsvarianten in Simulink-Modellen. In: Elektronik Automotive, vol. 2, pp. 33–37. WEKA Fachmedien GmbH (2008) 8. Object Management Group: Unified Modeling Language Superstructure, v 2.1.2 (2007) 9. Object Management Group: Meta Object Facility, http://www.omg.org/mof 10. Atkinson, C., Kühne, T.: The Essence of Multilevel Metamodeling. In: Proceedings of the 4th International Conference on the Unified Modeling Language, Modeling Languages, Concepts, and Tools, pp. 19–33. Springer, Heidelberg (2001) 11. Aschauer, T., Dauenhauer, G., Pree, W.: Multi-Level Modeling for Industrial Automation Systems. 35th Euromicro Conference on Software Engineering and Advanced Applications (to appear, 2009) 12. Voelter, M., Groher, I.: Product Line Implementation using Aspect-Oriented and ModelDriven Software Development. In: Proceedings of the SPLC 2007, pp. 233–242. IEEE Computer Society, Los Alamitos (2007) 13. pure systems: pure:variants Connector for MATLAB®/Simulink® (2009), http://www.pure-systems.com 14. BigLever Telelogic Rhapsody® GearsTM Bridge (2009), http://www.biglever.com/extras/Rhapsody_Gears_Data_Sheet.pdf 15. Telelogic Rhapsody® (2009), http://modeling.telelogic.com/products/rhapsody 16. Herrmann, C.A., Krahn, H., Rumpe, B., Schindler, M., Völkel, S.: An algebraic view on the semantics of model composition. In: Akehurst, D.H., Vogel, R., Paige, R.F. (eds.) ECMDA-FA. LNCS, vol. 4530, pp. 99–113. Springer, Heidelberg (2007) 17. Straw, G., Georg, G., Song, E., Ghosh, S., France, R., Bieman, J.: Model Composition Directives. In: Baar, T., Strohmeier, A., Moreira, A., Mellor, S.J. (eds.) UML 2004. LNCS, vol. 3273, pp. 87–94. Springer, Heidelberg (2004)
A Case Study of Variation Mechanism in an Industrial Product Line
Pengfei Ye1, Xin Peng1, Yinxing Xue2, and Stan Jarzabek2
1 School of Computer Science, Fudan University, Shanghai, China
{072021110,pengxin}@fudan.edu.cn
2 School of Computing, National University of Singapore, Singapore
{yinxing,stan}@comp.nus.edu.sg
Abstract. Fudan Wingsoft Ltd. developed a product line, the Wingsoft Financial Management System Product Line (WFMS-PL), providing web-based financial services for employees and students at universities in China. The company used a wide range of variation mechanisms such as conditional compilation and configuration files to manage WFMS variant features. We studied this existing product line and found that most variant features had fine-grained impact on product line components. Our study also showed that different variation mechanisms had different, often complementary, strengths and weaknesses, and that their choice should be mainly driven by the granularity and scope of feature impact on product line components. We hope our report will help companies evaluate and select variation mechanisms when moving towards the product line approach.
1 Introduction The goal of this paper is to evaluate strengths and weaknesses of variation mechanisms used in the existing Wingsoft Financial Management System1 Product Line (WFMS-PL), developed by Fudan Wingsoft Ltd., a small software company in China. We took the following steps in this study. We first analyzed WFMS-PL variant features [7] and presented them as a feature diagram [10]. Then, we studied variation mechanisms in WFMS, namely Java conditional compilation2, commenting out feature code, design patterns [4], parameter configuration files, and the build tool Ant3. Finally, we analyzed how the granularity and scope of feature impact on WFMS components affect the effectiveness of variation mechanisms. We distinguish two types of features according to the granularity of their impact: fine-grained features, which affect many system components at many variation points, and coarse-grained features, whose code is usually contained in files that are included into a custom product that needs such features. Mixed-grained features involve both fine- and coarse-grained impact.
1 WFMS for Shanghai Jiaotong University: http://www.jdcw.sjtu.edu.cn/wingsoft/index.jsp
2 Java does not formally have conditional compilation, but a similar function can be implemented: http://c2.com/cgi/wiki?ConditionalCompilationInJava
3 http://ant.apache.org/
Most of the WFMS features were fine-grained features, managed with conditional compilation and/or by manually commenting out the feature code. Our study shows that different variation mechanisms have different, often complementary, strengths and weaknesses, and that their choice should be mainly driven by the granularity and scope of feature impact on product line components. Fundamental differences in the capabilities of variation mechanisms justify the use of multiple variation mechanisms. For example, Ant is strong in configuring coarse-grained features, but weak in configuring fine-grained features. Parameter configuration files define environmental variables and variant feature options, but require yet other mechanisms to perform the actual customizations in product line components. Design patterns reduce the coupling in code, making it easier to add, remove or change a variant feature; however, we found only a few opportunities to apply design patterns in WFMS. Overloading fields in order to use the same field for different purposes usually helps only in configuring the database schema. Conditional compilation is used as the main mechanism to control fine-grained variant features in Java source code, while commenting out feature code is heavily used in HTML and JSP files. In some situations, we suggest possible remedies to weaknesses of the variation mechanisms used in WFMS-PL. In particular, multiple variation mechanisms must be used to manage each of the mixed-grained features. Our study reveals that while it is natural to match feature granularity with the proper variation mechanism, over time the interplay between multiple variation mechanisms may become difficult to comprehend. Variation mechanisms used in WFMS-PL are simple, freely available, and commonly used in Software Product Lines (SPL) to complement component/architecture-based approaches. As yet we do not have enough material to compare them with more advanced SPL approaches, such as GEARS4, Pure5 or XVCL [6], which may possibly give better results. We are going to conduct experiments to facilitate such a comparison. In past years, there have been several case studies on variation mechanisms. These studies, however, usually focused on variability implementation with particular techniques such as AspectJ [13], FOP [1] or XVCL [6]. The industrial case study presented in [17] aims at architecture-based variability realization in large companies. In this paper, we analyze a real product line using a mixed set of light-weight variability mechanisms in a small company. We believe Wingsoft's choice of variation mechanisms was typical, and other small companies may find the experiences reported in this paper useful when moving towards the product line approach.
2 An Overview of WFMS WFMS was developed in 2003 and has evolved into an SPL with more than 100 customers today, including major universities in China such as Fudan University, Shanghai Jiaotong University, and Zhejiang University. During its evolution, Wingsoft set up a product architecture and adopted variation mechanisms such as Java conditional compilation, Ant, parameter configuration files, and design patterns to manage product variability.
4 GEARS: http://www.biglever.com/
5 Pure::Variants: http://www.pure-systems.com/
The core assets of the WFMS-PL were designed and implemented by a few domain engineers. Domain engineers sometimes also played the role of application engineers, responsible for the initial, program-level customization of core assets for a custom product. Service engineers, familiar with the financial business but with little or no programming knowledge, did final customer-side customizations and deployment, using readable parameter configuration files. Usually, application engineers only provided in-office application-specific implementations and responded to requests from service engineers. Domain engineers maintained WFMS products for many customers, delegating routine work to service engineers. WFMS consists of four subsystems, namely the Financial Management Subsystem (FMS), Salary Management Subsystem (SMS), Reward Management Subsystem (RMS), and Tuition Management Subsystem (TMS). We selected TMS for our case study, as it involved types of variability and variation mechanisms that were representative of the whole WFMS. TMS is a web-based portal for students to pay their tuition fees online, with functions such as login, fee browsing, online payment, payment detail generation, and bank settlement. The code of TMS is 25% of the whole WFMS system, comprising 58 Java source files, 99 JSP web pages, and several configuration files. A WFMS feature diagram is shown in Fig. 1. The minimum and maximum choices of OR-features are shown as numbers surrounded by square brackets. 80% of the 32 variant features can be selected for a custom TMS. However, there are also some feature interactions. For example, the selection of InitPayMode depends on the number of selected variant features under FeeItemSelection, and the selection of Settlement depends on whether the selected banks require settlement. TMS features include fine-, coarse-, and mixed-grained features.
Fig. 1. The feature diagram of TMS
3 Variation Mechanism in TMS 3.1 Review of Variation Mechanism in TMS Five variation mechanisms shown in Table 1 were used to manage variant features in TMS. In the table, column “# Features” indicates the number of features whose
customizations involved a given technique. Ant, conditional compilation, and commenting out variant feature code were most commonly used. Java conditional compilation and commenting out code: In Java, conditional compilation is realized with final-boolean variables. If a final-boolean variable's value is false, then the code in the statements under the if is not compiled into the generated bytecode file. The effect is similar to the #define and #ifdef C/C++ preprocessor directives [15]. Fig. 2 illustrates the usage of final-boolean variables to manage variant features in the TMS class FeatureConfiguration. The limitation of Java's conditional compilation is that it can only be used on inner-method statements. It cannot handle the inclusion or exclusion of class methods or attributes. In WFMS, such cases were handled by manually commenting out the code that was not required in a given product variant. Commenting out was also used in non-Java files such as JSP files or SQL scripts. The main reason for this practice was that the engineers at Fudan Wingsoft could not find flexible tools to manage variability in these files at that time.
Table 1. Feature numbers for variability techniques used in TMS

#  Techniques                           # Features
1  Conditional compilation & comment    31
2  Ant                                  19
3  Overloading fields                   13
4  Configuration items                  12
5  Design pattern & reflection          3
public class FeatureConfiguration {
    // Configuration items
    public static final boolean DelegationLock = true;
    public static final boolean OperationLock = true;
}

public class FeeInfo {
    ...
    public void initInfo(FeeUser user, boolean isPaidFeeInfo) throws Exception {
        // get each year's fee items
        for (int i = 0; i < yearTemp.size(); i++) {
            if (FeatureConfiguration.DelegationLock && FeatureConfiguration.OperationLock) {
                // Code when both features are selected
            } else if (FeatureConfiguration.DelegationLock) {
                // Code when DelegationLock is selected
            } else if (FeatureConfiguration.OperationLock) {
                // Code when OperationLock is selected
            }
        }
    }
}
Fig. 2. Managing variant features with Java’s final-boolean mechanism
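The mechanism works because the flags in FeatureConfiguration are compile-time constants: a branch guarded by a statically false final boolean is omitted from the generated bytecode, which is what gives Java an effect similar to #ifdef. The following minimal sketch illustrates this; the flag name is an assumption made only for illustration and would be flipped per product variant.

// Minimal sketch of Java's final-boolean "conditional compilation" effect.
public class ConditionalCompilationDemo {
    // Compile-time constant; assumed flag name, set per product variant.
    static final boolean OPERATION_LOCK = false;

    public static void main(String[] args) {
        if (OPERATION_LOCK) {
            // When the flag is false, this block is absent from the compiled .class file.
            System.out.println("operation-lock handling");
        }
        System.out.println("base behaviour");
    }
}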
Design patterns and reflection: [14] describes the use of the abstract factory pattern in SPLs. It also extends this concept into a dynamic abstract factory pattern, in which concrete factories can be adapted to support new concrete products at run-time by adding Register and UnRegister operations to the abstract factory for each abstract product.
public class FeeOrder {
    private Initializer initializer;

    public void init(FeeUser user, FeeInfo info, HttpServletRequest request) {
        Class c;
        try {
            c = Class.forName(user.getPayMode());
            initializer = (Initializer) c.newInstance();
            initializer.init(...);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Fig. 3. Reflection used in strategy pattern
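The concrete strategy classes behind Fig. 3 are not shown; the sketch below only illustrates the combination described in the text, with an Initializer strategy interface instantiated by name via reflection. The class name PayByItemInitializer and the simplified init() signature are assumptions made for illustration; in WFMS the name would come from the paymode configuration parameter and Ant would decide whether the class file is shipped at all.

// Sketch of the strategy + reflection combination; names are illustrative, not from WFMS.
interface Initializer {
    void init(); // the real method takes FeeUser, FeeInfo and the servlet request
}

// One concrete strategy per payment mode; included or excluded from a product build by Ant.
class PayByItemInitializer implements Initializer {
    public void init() {
        System.out.println("initializing a fee order item by item");
    }
}

public class InitializerLoader {
    public static void main(String[] args) throws Exception {
        // In WFMS the class name would come from the <paymode> parameter of the
        // configuration file (Fig. 5); it is hard-coded here to keep the sketch runnable.
        String payMode = "PayByItemInitializer";
        Initializer initializer = (Initializer) Class.forName(payMode).newInstance();
        initializer.init();
    }
}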
Although the most frequently used design patterns in TMS were Abstract Factory with Factory Method and Strategy, a reflection mechanism, instead of operations returning the name of a product, was also used to dynamically instantiate the proper concrete instances according to configuration options, so that the specific class names could be abstracted from the source code. Ant could then be used to control the inclusion and exclusion of a strategy subclass. Fig. 3 shows the use of the Strategy pattern in TMS. Overloaded Fields: Variant features affect the TMS database schema. Overloading table fields helps to contain some of those impacts. For example, a table may have several fields named spec_1, spec_2 ... spec_n, and the same field may be used to store a bank card number in one product variant and an ID card number in another one. There were also tables and fields that make sense for some product variants, but are useless for others. With overloaded fields, all the products could share the same DB schema, but still support the different data structures required for variant features. Overloading fields was adopted for the WFMS-PL database.
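A minimal sketch of the idea, with assumed names: each product variant maps its own logical fields onto the shared overloaded columns spec_1 ... spec_n, so the common schema stays untouched while the interpretation of a column varies per product.

import java.util.Map;

// Sketch only; the field and variant names are invented for illustration.
public class OverloadedFieldMapping {
    // Variant for a university that stores bank card numbers in spec_1
    static final Map<String, String> VARIANT_A = Map.of("bankCardNo", "spec_1");
    // Variant for a university that stores ID card numbers in the same column
    static final Map<String, String> VARIANT_B = Map.of("idCardNo", "spec_1");

    public static void main(String[] args) {
        System.out.println("variant A reads bankCardNo from " + VARIANT_A.get("bankCardNo"));
        System.out.println("variant B reads idCardNo from " + VARIANT_B.get("idCardNo"));
    }
}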
<project name="webfee" basedir="." default="main"> <project>
Fig. 4. Using Ant to include optional features
Ant: An important class of configuration parameters was managed by Ant. [3] used Ant and configuration files to differentiate variability in process and variability in product. Lower layers always override the build.xml file of upper layers, by which only the build.xml of the leaf layer will take effect. By reading layer orders in configuration files, the leaf layer can be determined. In WFMS, Ant was useful as a
<webFee>
    <paymode>PayByItem</paymode>
    <!-- bank entries (ICBC, CCB, CMB), banking-service URL, key path, and other parameters -->
    <merchantid>440220500001</merchantid>
    <!-- DownloadDetail flag set to true -->
</webFee>
Fig. 5. Using configuration files
variation mechanism for coarse-grained variant features, as Ant can be used to control the inclusion/exclusion of not only Java source files but also JSP files and security certificates. For instance, the optional feature DownloadPaymentDetail of TMS was managed by Ant as shown in Fig. 4. This feature was implemented by a Java class and a JSP file. The inclusion of this feature in a customized product was implemented by moving the relevant files from the core asset path to the path of the javac command in the Ant configuration file. Parameter Configuration Files: In TMS, self-defined configuration files were also employed as a variation mechanism, as shown in Fig. 5. Configuration files contained both data and control parameters. Data parameters, such as URLs of banking services and the key path, were also widely used in single products. Control parameters indicate the features to be selected for a custom product, while other variation mechanisms were used to perform the actual selection. For example, the parameter paymode in Fig. 5, indicating the right subclass to be initialized, worked together with the reflection and the strategy pattern shown in Fig. 3 (see the underlined part). Another example of parameters working with other variation mechanisms is the parameter DownloadDetail in Fig. 5. A simple tool was implemented to read this parameter and generate the Ant script shown in Fig. 4 if its value is true. 3.2 Summary of Variation Mechanism in TMS Fig. 6 shows which variant features were managed using which variation mechanisms. We do not show overloading fields, which was used only for variants in the database table schema and did not overlap with other mechanisms. More than 80% of the features (26 of 32) were managed by more than one variation mechanism: 13 features were managed by three mechanisms and three features by four mechanisms. Design patterns were always used together with other mechanisms. Another interesting observation is that almost all features involved the use of conditional compilation and/or commenting out feature code, because in WFMS, as in many other SPLs, we saw many fine-grained features.
Fig. 6. Variation mechanisms per feature
For mixed-grained features, a combination of several variation mechanisms was usually used. For example, when we include a source file for a selected variant feature with Ant, we still need conditional compilation to configure the corresponding caller in the base code. Table 2 summarizes the usage, scope, merits and drawbacks of the various variation mechanisms used for WFMS-PL variant features. It is interesting to compare traditional variation mechanisms, such as those described in this paper, with more advanced ones such as XVCL [6], AspectJ [13] or FOP [1]. This will also be the next step of our work.
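A minimal sketch of that coordination problem, under assumed names (the DownloadPaymentDetail flag and the PaymentDetailDownloader class are invented here for illustration): the optional feature lives in a separate file that Ant includes only for products that select it, while the call site in the base code is guarded by a final boolean flag that must be kept consistent with the file inclusion.

// Sketch only; names are illustrative, not taken from the WFMS code base.
class FeatureConfig {
    // Fine-grained variation point: flipped per product variant.
    static final boolean DownloadPaymentDetail = true;
}

// Coarse-grained variation point: this class is compiled and shipped only when Ant
// includes its source file for the selected product variant.
class PaymentDetailDownloader {
    static void download() {
        System.out.println("generating downloadable payment detail");
    }
}

public class PaymentPage {
    public static void main(String[] args) {
        // Caller in the base code: the flag and the Ant file selection must agree,
        // which is exactly the coordination issue discussed above.
        if (FeatureConfig.DownloadPaymentDetail) {
            PaymentDetailDownloader.download();
        }
    }
}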
4 Evaluation of the WFMS-PL and Possible Improvements Projecting experiences from the TMS study, we now evaluate the WFMS-PL variability management strategies from the perspectives of feature granularity, ease of application, readability, and managing the consistency between WFMS core assets and custom products. Overall, we found that the current WFMS-PL strategies for variability management properly matched the feature granularity. Newcomers to the team could easily understand when and how to apply them. However, over time, as the impact of many features accumulated, the readability of the WFMS-PL suffered, and it became difficult to trace features to code and to manage features consistently. The detailed reasons and suggested remedies are given in the following paragraphs.
Table 2. Summary of variation mechanisms in WFMS-PL

Conditional compilation & comments
  Usage: using the final-boolean mechanism in Java and natural-language comments
  Scope: the final-boolean method is only used on inner-method statements; comments can be adapted to all places
  Merits: easy to learn and use
  Drawbacks: all maintenance work and configuration is manual; sub-paths explode as the number of tangled features increases

Design pattern
  Usage: to gain good modularization in OO source code; used together with Java reflection and Ant
  Scope: class or method level; best for gaining class-level flexibility in the OOP part of a product line
  Merits: elegant code, high readability, good extendibility
  Drawbacks: scope of application is narrow and always needs the aid of other techniques

Overloading fields
  Usage: making all the customized products share the same database table schema
  Scope: attributes in the database
  Merits: avoids the trouble of changing the names of attributes
  Drawbacks: hard to maintain; easy to cause confusion about the meaning of the fields

Configuration file
  Usage: implementing the configuration of various parameters according to the variant feature selection and environmental change
  Scope: giving parameters for the variant feature selections and the environment
  Merits: a good mechanism to do feature configuration
  Drawbacks: needs to cooperate with other methods and introduces the non-traceability issue; many inter-dependent configuration parameters

Ant
  Usage: conditionally compiling Java source files and building deployments
  Scope: system-level customization; dealing with all kinds of files
  Merits: powerful and popular build tool, flexible to deliver product variants
  Drawbacks: only file-level variants
Feature granularity: Feature granularity is a critical factor that guides the selection of variation mechanisms, as feature characteristics must be matched by the capabilities of the variation mechanism(s) used to manage a given feature. In WFMS-PL, fine-grained features were managed by conditional compilation in Java code, and by commenting out code sections in other WFMS artifacts. Ant was used to manage coarse-grained features at the level of package or class inclusion/exclusion. Design patterns played the role of a class or method extension mechanism. In Table 3, we show the number of WFMS variation points for each feature impact granularity level. Fine-grained impacts of features required small changes in Java expressions, statements, method signatures, comments (in Java code, JSP or HTML), database table scripts, and parameter configuration files. Medium-grained impacts required changes of Java methods or attributes, changes of database table scripts, and changes of configuration items in parameter configuration files. Coarse-grained impacts required the inclusion or exclusion of product-specific source files. Fine-grained features trigger most of the problems. Conditional compilation and commenting out feature code were used to manage fine-grained impacts. A big problem is how to trace variant features down to the many variation points relevant to them. This problem is aggravated when multiple variation mechanisms are used to manage a given feature. Some feature enhancements involve changes at many variation points that must be properly coordinated. WFMS engineers often encountered the problem of inconsistent product releases, e.g., a product variant deployed with an incorrect database schema. Fine-grained and coarse-grained features were most common. We analyze the reasons as follows: Coarse-grained impacts are easy to configure. Whenever possible, domain engineers tried to contain the variant feature code in separate files which could then be included into custom products that required those features. Wizards could be implemented to allow application/service engineers to easily include such features into
custom products. Fine-grained impacts are attributed to the variability of the business flow or logic of the application itself and of the programming languages. However, fine-grained features made it more difficult to consistently manage the overall product configuration. They often affected coarse-grained features, and new variation points had to be injected into the source code of coarse-grained features.
Table 3. The number of variation points per impact granularity level

Granularity   #Java   #JSP   #Conf. File   #DB Schema   #Total
Finest        14      0      0             0            14
Fine          67      43     3             3            116
Medium        18      0      7             5            30
Coarse        40      57     9             0            106
#Total        139     100    19            8            266
Ease of application: Customizations of core assets by configuring parameters and database schemas could be managed by service engineers, who were in charge of deploying a customized product at the customer site. Service engineers were familiar with the general financial domain, user requirements, and basic deployment operations, but did not know much about programming and the internals of the WFMS-PL. Wizard-supported parameter configuration files provided an easy-to-use configuration capability for service engineers. Ease of application without involvement of any unconventional or proprietary techniques was the most important reason for Fudan Wingsoft to adopt simple and commonly available variation mechanisms for WFMS-PL. This reduced the learning curve and the staff training cost, important factors for any small or middle-sized company. Unless the current mechanisms were found totally ineffective, Wingsoft would be chary of adopting new ones. WFMS-PL was constructed in a lightweight, reactive way, which has been in line with the company's interests so far. Readability: Design patterns and Ant did not hinder readability, but conditional compilation, commenting out feature code, and overloading fields made code difficult to understand for application engineers and even domain engineers. In our project, 30% of the code in class FeeOrder, 20% of the code in FeeInfo and 35% of the code in FeeUser was managed by Java conditional compilation. Given that there are no other techniques to manage fine-grained features, this problem is very hard to solve. If we keep variant feature code embedded in the base code, the code is bound to become hard to read. One can consider Aspect-Oriented Programming (AOP) [13] or Feature-Oriented Programming (FOP) [1] to separate features from the base code, but these approaches pose new problems, as demonstrated in [11] and [12]. To improve the readability of the code, a promising approach is to resort to visualization tool support such as CIDE [12] or [8]. Traceability and extensibility: Traceability between features and their respective variation points has to do with both feature reuse and evolution. Here are some examples of problems:
• Each feature may be addressed at many variation points scattered through many SPL core components. To reuse or modify the feature we must find and analyze code at all these points.
• One SPL core component is usually affected by many features that may be managed by different, possibly overlapping, variation mechanisms. To reuse or modify the feature we must understand interactions among these mechanisms.
Variation mechanisms described in this paper provide a workable but not perfect solution for traceability problems. Table 4 shows features that involved several variation mechanisms, with their respective variation points spread across different WFMS-PL core components. How to manage these variation points consistently was the issue of traceability. The difficulty in traceability also brought in the problem of product extensibility at those variation points.
Table 4. The number of variation points in example features
Variant Feature       Preprocessing   Conf. Files   Ant   Total
WebService-Payment    6               2             2     10
ABC                   2               1             3     6
CCB                   1               1             2     4
CMB                   2               1             2     5
ICBC                  1               2             3     6
Yet another traceability problem has to do with the two-way propagation of changes between SPL core components and customized product variants [15]. This problem often hinders reuse, and we believe it is difficult to address within the frame of the variation mechanisms described in this paper. A meta-level representation of the SPL core components paves the way for more effective solutions to these problems [9]. Such representations can capture and synchronously manage the overall impact of features on SPL core components [5].
5 Conclusion In this paper, we evaluated strengths and weaknesses of variation mechanisms used in Wingsoft Financial Management System Product Line (WFMS-PL), developed by Fudan Wingsoft Ltd. Feature characteristics must be matched by the capabilities of variation mechanism(s) used to manage a given feature. Fundamental differences in capabilities of variation mechanisms justify the use of multiple variation mechanisms. Our study confirmed that different variation mechanisms have different, often complementary, strengths and weaknesses. Their choice should be mainly driven by the granularity and scope of feature impact on product line components. In some situations, we suggested possible remedies to weaknesses of variation mechanisms used in WFMS-PL. Our study revealed that while it was natural to match feature granularity with the proper variation mechanism, over time the inter-play between multiple variation mechanisms may become difficult to comprehend. Variation mechanisms used in WFMS-PL are simple, practical, commonly used in SPLs to complement component/architecture-based approaches. We hope our report
will help companies make more informed decisions when moving towards the product line approach. In a follow-up study, we compare the original WFMS-PL with a representation built with XVCL [6] as the variation mechanism, and we plan to extend our study to other SPL approaches. Acknowledgement. This work was supported by Fudan University grant (the National Natural Science Foundation of China under Grant No. 60703092, the National High Technology Development 863 Program of China under Grant No. 2007AA01Z125, and Shanghai Leading Academic Discipline Project under Grant No. B114) and National University of Singapore grant R-252-000-336-112.
References 1. Batory, D., Sarvela, J.N., Rauschmayer, A.: Scaling Step-Wise Refinement. IEEE Transactions on Software Engineering 30(6) (2004) 2. Bosch, J., Florijn, G., Greefhorst, D., Kuusela, J., Obbink, H., Pohl, K.: Variability Issues in Software Product Lines. In: van der Linden, F.J. (ed.) PFE 2002. LNCS, vol. 2290, p. 13. Springer, Heidelberg (2002) 3. Díaz, Ó., Trujillo, S., Anfurrutia, F.I.: Supporting Production Strategies as Refinements of the Production Process. In: Obbink, H., Pohl, K. (eds.) SPLC 2005. LNCS, vol. 3714, pp. 210–221. Springer, Heidelberg (2005) 4. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns. Addison-Wesley Professional, Reading (1995) 5. Jarzabek, S.: Effective Software Maintenance and Evolution: Reuse-based Approach. CRC Press Taylor & Francis, Boca Raton (2007) 6. Jarzabek, S., Bassett, P., Zhang, H., Zhang, W.: XVCL: XML-based variant configuration language. In: ICSE 2003 (2003) 7. Jarzabek, S., Ong, W.C., Zhang, H.: Handling Variant Requirements in Domain Modeling. Journal of Systems and Software 68(3) (2003) 8. Jarzabek, S., Zhang, H., Lee, Y.P., Xue, Y., Shaikh, N.: Increasing Usability of Preprocessing for Feature Management in Product Lines with Queries. Accepted for ICSE 2009 poster 9. Jirapanthong, W., Zisman, A.: XTraQue: Traceability for Product Line Systems. Software and System Modeling 8(1) (2009) 10. Kang, K.C., Lee, J., Donohoe, P.: Feature-oriented Product Line Engineering. IEEE Software 19(4) (2002) 11. Kästner, C., Apel, S., Batory, D.: A Case Study Implementing Features Using AspectJ. In: SPLC 2007 (2007) 12. Kästner, C., Apel, S., Kuhlemann, M.: Granularity in Software Product Lines. In: ICSE 2008 (2008) 13. Kiczales, G., Lamping, J., Mendhekar, A., Maeda, C., Lopes, C.V., Loingtier, J.-M., Irwin, J.: Aspect-Oriented Programming. In: Aksit, M., Matsuoka, S. (eds.) ECOOP 1997. LNCS, vol. 1241, pp. 220–242. Springer, Heidelberg (1997) 14. Linden, F., Schmid, K., Rommes, E.: Software Product Lines in Action: The Best Industrial Practice in Product Line Engineering. Springer, Heidelberg (2007) 15. Pohl, K., Böckle, G., Linden, F.J.: Software Product Line Engineering: Foundations, Principles and Techniques. Springer, Heidelberg (2005) 16. Spencer, H., Collyer, G.: #ifdef Considered Harmful, or Portability Experience with C News. In: Summer 1992 USENIX Conference (1992) 17. Svahnberg, M., Gurp, J., Bosch, J.: A Taxonomy of Variability Realization Techniques. Software: Practice and Experience 35(8) (2005)
Experience Report on Using a Domain Model-Based Extractive Approach to Software Product Line Asset Development*
Hyesun Lee1, Hyunsik Choi1, Kyo C. Kang1, Dohyung Kim2, and Zino Lee2
1 Computer Science and Engineering Department, Pohang University of Science and Technology (POSTECH), Pohang, Korea
{compial,nllbut,kck}@postech.ac.kr
2 Alticast Corp., Hana Capital Bldg., Seocho-dong, Seocho-gu, Seoul, Korea
{dynaxis,zino}@alticast.com
Abstract. When we attempted to introduce an extractive approach to a company, we were faced with a challenging project situation: the legacy applications did not have many commonalities among their implementations, as they had been developed independently by different teams without sharing a common code base. Although there were not many structural similarities, we expected to find similarities if we viewed the applications from the domain model perspective, as they were in the same domain and were developed with the object-oriented paradigm. Therefore, we decided to place the domain model at the center of extraction and reengineering, thus developing a domain model-based extractive method. The method has been successfully applied to introduce a software product line to a set-top box manufacturing company. Keywords: product line engineering, component extraction, domain model, feature model.
1 Introduction The extractive approach to product line engineering (PLE) capitalizes on existing systems to initiate a product line [1]. It can provide an efficient way for an organization to transform from traditional development to product line-based development without a large upfront investment. There have been few research efforts reported on the extractive approach, however. There are tools [2], [3] on the market, but their technical details are not known through publication. There are publications on reengineering and refactoring [4], [5], but most of these are not in the context of PLE. The extractive approach that we introduced to Alticast Corp. presented us with interesting research challenges. Since its founding in 1999, Alticast Corp. has been marketing several types of digital set-top boxes1. Because of the diversity of markets and
* This research was supported by Korea SW Industry Promotion Agency (KIPA) under the program of Software Engineering Technologies Development and Experts Education.
1 A set-top box is a device that is connected to a television and a digital network to restore the original signals from a video server to displayable hardware, i.e., a TV.
different types of set-top box products, development and maintenance of software systems has been extremely costly for the company. To address these problems, we introduced PLE to the company, targeting the set-top box domain. Alticast Corp. already had a number of legacy set-top box products that have been sold in the marketplace. Therefore, for an efficient transition to product line-based development, we adopted an extractive approach. We targeted our project at the middleware and Electronic Program Guide (EPG) subsystems of the product line. EPG was a quite interesting product line as it presented difficult research challenges. Due to time-to-market pressure, frequent releases with new features, and the diversity of markets, the development of EPG products was done in parallel by separate teams, without code cloning or sharing common assets, even though the products shared a large number of common features. When we analyzed the code, each application had its own naming and design styles that were quite different from the others, and using tools to compare similarities was difficult. It was difficult to extract and reuse code without a considerable amount of laborious manual effort. To address this problem, we developed a domain model-based component extraction approach to PLE. Section 2 introduces the underlying concepts of our method. The overall process and the detailed steps of our method are discussed in Section 3. As an illustrative example, the EPG product line is used. Section 4 explains the application engineering process to validate the developed assets, and Section 5 introduces related work and compares it with our method. Section 6 concludes this paper with a summary of this project and future work.
2 Rationales behind Our Approach In this section, we discuss the rationales behind our method. We first explain the problem and technical challenges we had, and then explain our approach to addressing those challenges. We attempted to introduce a product line to Alticast Corp. by extracting and reusing components from three legacy products. When we analyzed the legacy products, however, we were immediately faced with a very difficult problem: the products had been developed independently by different development teams, each having its own naming conventions and design structure. On the surface, it was difficult to see commonalities, and finding reusable components in the legacy systems was a very difficult challenge. Despite this problem, we expected to see some common design elements, as all products share largely the same functionality. To compare and evaluate the functional commonalities, we needed to go beyond what we could find on the surface through syntactic name comparison. We needed to find the functions each class, method or component implemented, and then compare these across different design models. In order to perform this systematically, we: (1) defined a domain model (domain object model) of the product domain that, we believed, represented the domain most appropriately, and (2) compared each design model against the domain model (not against each other).
This domain model-based extractive approach had the following advantages:
– Although each design model has a partition into components and classes that is different from the others, we can compare them through the intermediation of the domain model, and find the design model that most closely embodies the domain concepts.
– In addition, the number of comparisons we have to make can be reduced substantially, from n(n - 1)/2 to n comparisons for n products.
After comparing and analyzing the legacy products, we need to select a reusable architecture(s) and components from the products, and then reengineer them if necessary for the product line. As applications of the product line share the product line architecture and components, we must design them based on the domain concepts that are common across all products. Therefore, we decided to extract design elements (classes, components) that are closest to the domain model. To do this, we evaluated design models in terms of coverage and conformance.
– Coverage measures the extent to which domain concepts (represented as domain entities and associations between entities in the domain model) are covered by a design model or a component.
– Conformance measures the degree to which design elements (classes or components) of a design model realize closely related domain concepts. Each design element should implement a cohesive set of domain concepts.
Details of these metrics are discussed in section 3.3. Based on these ideas, the method processes were defined; they are outlined and explained in the following section.
3 Engineering Process We define a method consisting of four engineering processes: recovering design models from existing products; analyzing the domain and creating a product line domain model; selecting reuse candidates that are “most appropriate” (discussed in section 3.3.) for reuse for the product line; and reengineering those reusable candidates to create the product line asset components (Fig. 1). Details of the processes and artifacts from each of these processes are discussed in the following subsections.
Fig. 1. Engineering process
3.1 Design Recovery We want to reuse legacy software as much as is economically viable in the development of product line assets. For this to happen, we need to recover and/or create design models, and collect the information necessary for asset development. In this process we recover architectures, components, and class models from the legacy applications. We used a commercial UML CASE tool, Enterprise Architecture [6], to recover class models and components (as packages). Architectures were recovered manually. 3.2 Domain Analysis In this process we decide the product line domain boundary that covers all instances of the legacy applications we have and possible future products in the product plan, and then create a feature model [7] and a product line domain model. For the product line, we need to create asset components with embedded variation points that can be instantiated for each product in the product line with appropriate variants. Each product of a product line may have a different coverage of the domain. To support domain model-based development of product line assets, we create a domain model that covers all products in the product line but with embedded variation points and variants for adaptation to specific products. We refer to this domain model as the product line domain model (PLDM). The personal video recorder (PVR) feature in Fig. 2 records broadcasts to the hard disk so that they can be played back, and Tuner is an operating environment feature that can be either Single Tuner or Multi Tuner. In Fig. 3, "<<●PVR>> Recorded Item" indicates that Recorded Item is related entirely to the optional feature PVR; "<<△Multi Tuner>> quantity" represents that the attribute 'quantity' is related to the alternative feature Multi Tuner, and if we select or unselect Multi Tuner, the value of 'quantity' changes.
Fig. 2. Feature model of the EPG product line
With domain experts, we create a domain model by capturing entities in the domain boundary, as typically done in object-oriented development; create a feature model (Fig. 2) by capturing commonalities and variabilities among existing and planned products; and refine the domain model by embedding variation points and variants to create a PLDM (Fig. 3) using the feature model. Creation of these models is an iterative incremental process continuously comparing one model with the other,
(Fig. 3 depicts the PLDM: entities User, Window (viewType), Recorded Item (programID, recordedTime, recordedData), Reservation (reservationType, programID, reservedTime), and Tuner (quantity), connected by associations such as Watches, Records, Plays, Plays-reserved-program-by, Reserves, Reserved-records, Checks-for-use, Requires, and Uses-for-record, with <<●PVR>>, <<●RecordingReservation>>, and <<△Multi Tuner>> variation-point annotations.)
Fig. 3. PLDM of the EPG domain
especially focusing on maintaining consistency of variabilities between the two models. The availability of domain and marketing experts is most critical in this process. In this project, we spent about a month creating the feature model and PLDM of the EPG domain with domain and marketing experts. The PLDM is used in the next section to evaluate the design models of each legacy application of the product line. 3.3 Design Evaluation and Selection In this process we compare and evaluate design models with the PLDM and identify reuse candidates. It consists of three activities: model comparison, design evaluation, and reuse candidate selection. 1) Model Comparison: In this activity, we compare each design model with the PLDM and construct a mapping table. As the capability of most tools for comparing models is limited to the syntactic level, we have to perform both automatic and manual comparisons. We employed the following strategies in the comparison:
– Perform a structural comparison between a design model and the PLDM. We want to find out how close each design structure is to that of the PLDM. We believe that the closer the structure of a design model is to that of the PLDM, the more likely it is that the design model closely embodies the domain concepts and is more reusable than otherwise.
– Compare attributes of classes of each design model with those of the PLDM. We want to understand the functional cohesiveness of the components and classes of each design. Again, we believe a design model that is more cohesive than others is more likely to be reusable, as closely related concepts are "packaged" together.
Entities, attributes, and associations of the PLDM are compared with the corresponding components, classes, attributes, and methods of each design model. We compared the PLDM with the three design models of the EPG products (Table 1). For example, an attribute reservationType defined for the Reservation entity of the PLDM is defined as
reserveType in design model 1 (Design1), but there were no corresponding attributes in Design2 and Design3. These results are used as important information for evaluating and selecting reuse candidates from the legacy software in the activities that follow. 2) Design Evaluation: Reuse candidates are reengineered (if necessary) and used in the product line construction, so identifying proper reuse candidates is important. In this activity, we evaluate design models to find the reuse candidates. We evaluate design models at three levels: the conceptual, architecture, and implementation levels. A. Conceptual Level At the conceptual level, as discussed in section 2, we evaluate the coverage and conformance of a design model to measure the conceptual affinity between the domain model and the design model.
Table 1. PLDM-design model mapping table
Product Line Domain Model:  Entity | Attribute | Association
Design1:                    Component | Class | Attribute/method
Design2:                    Component | Class | Attribute/method
Design3:                    Component | Class | Attribute/method
A.1) Coverage: We measure the coverage of each design model and component using the comparison results (Table 1) and counting the components, classes, and associations of a design model that correspond to entities and associations of the PLDM. We first identify the set of covered domain concepts CC, and the set of covered domain attributes CA, for each design model or component. For example, CC(Design1) = {Reservation, RecordedItem, Tuner, Purchase(User-Program)}; CA(Design1) = {reservationType, programID(in Reservation), programID(in RecordedItem), quantity}; CC(Design2) = {Reservation, Purchase(User-Program)}; and CA(Design2) = {programID} (Table 1). The algorithm used is as follows: Let T be a mapping table; DC be a set of domain concepts (domain entities and associations) of a domain model; dci be a domain concept in DC; A(dci) be a set of attributes of dci (if dci is an association); and X be a design model or a component to be evaluated. For a conceptual element (i.e., an entity, association, or attribute), y, of a domain model, the correspondence of y to X is 1 if T maps y to an element of X, and 0 otherwise. The set of domain concepts covered by X, CC(X), then consists of the concepts in DC whose correspondence to X is 1, and the set of domain attributes covered by X, CA(X), consists of the attributes of concepts in DC whose correspondence to X is 1.
To measure the coverage of domain concepts by a design model or a component, we used the metric COV, based on CC and CA. The algorithm used is as follows: For dci, the ratio of domain attributes covered by X, ra(dci, X), is the fraction of the attributes in A(dci) that appear in CA(X). The coverage of domain concepts by X, COV(X), is then obtained by aggregating ra(dci, X) over the domain concepts in DC.
For the example given in Table 1, COV(Design1) = 1, and COV(Design2) and COV(Design3) = 0.375. The COV values of the design models of the three EPG products were 0.83, 0.73, and 0.66, respectively. We can also measure coverage for classes in the same manner. A.2) Conformance: For high-quality assets, cohesive responsibilities must be allocated together to design entities. We can expect that, if a design model is consistent with the domain model, the design classes of the design model will most likely be cohesive. Therefore, for each component in a design model, we check conformance at the entity (component, class) level by comparing allocated attributes and methods. We measured the conformance of each component using the sets CC and CA. To measure the conformance of each design entity (a component or a class) Z in a design model X, we used the following procedure:
– For each domain concept dci ∈ CC(Z), count the number of entities in the design model X that collectively cover dci. If there are "many" entities covering dci, we consider that Z has a lower conformance to dci than otherwise, because dci is scattered across "many" design entities (a small sketch of this counting step follows the list).
– For each pair of domain concepts (dcj, dck) ∈ CC(Z), identify the closeness between dcj and dck. If they are not connected to each other through associations, Z has a lower conformance than otherwise, because Z realizes unrelated domain concepts together.
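The counting step of the first rule can be made concrete with a small sketch. The data below is invented purely for illustration (the concept and entity names do not come from the EPG models); the sketch only assumes that the PLDM-design mapping table records, for each domain concept, which design entities realize some part of it.

import java.util.*;

// Sketch of the conformance counting step; all names and data are illustrative.
public class ConformanceCheck {
    public static void main(String[] args) {
        // For each domain concept, the design entities of model X that realize part of it,
        // as recorded in the PLDM-design mapping table.
        Map<String, Set<String>> coveredBy = new HashMap<>();
        coveredBy.put("Reservation", new HashSet<>(Arrays.asList("ReservationMgr")));
        coveredBy.put("RecordedItem", new HashSet<>(Arrays.asList("RecordMgr", "PlaybackCtrl")));

        // Domain concepts realized by the design entity Z under evaluation.
        Set<String> ccOfZ = new HashSet<>(Arrays.asList("Reservation", "RecordedItem"));

        for (String concept : ccOfZ) {
            int n = coveredBy.getOrDefault(concept, Collections.emptySet()).size();
            // The more design entities share a concept, the more it is scattered,
            // and the lower the conformance of Z to that concept.
            System.out.println(concept + " is realized by " + n + " design entities");
        }
    }
}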
B. Architecture Level We must reuse an appropriate architecture(s) for the product line. To identify a candidate architecture(s) for reuse, we need to evaluate architectural quality attributes. In this project, we used the Architecture Tradeoff Analysis Method® (ATAM®) [8]. One quality attribute that is not directly addressed by ATAM but is very important in the product line approach is adaptability. (Maintainability encompasses adaptability. However, we did not see the need to generate detailed scenarios as we had a good understanding of variability through feature modeling.) We used the feature model to identify variable features and evaluate how easily each design model can accommodate the variability. C. Implementation Level For reuse of components, the quality of implementation is important. At the implementation level, we evaluate the quality of the implemented code using various code
Table 2. Quality attribute evaluation results of the EPG design models by using IBM Rational Software Analyzer®
Quality attributes evaluated: Cohesion (S), Coupling (V), Understandability (S)

Evaluation Metric2   Design1   Design2   Design3
LCOM1 (V)            2.00      3.00      2.00
LCOM2 (V)            0.77      0.89      0.80
CC (V)               3.08      3.39      2.53
MI (S)               231.2     204.68    217.27
ANOCOM (S)           118.0     73.43     81.0
ABD (V)              1.80      1.88      1.69
S: The higher, the better; V: The lower, the better
analysis metrics such as complexity, cohesion, inheritance, and dependency. We used a commercial tool, IBM Rational Software Analyzer® [9]. Design1 had higher cohesion and a lower coupling value, and was more understandable than the others (Table 2). (Details of the analysis results are discussed in section 6.) 3) Reuse Candidates Selection: Considering the evaluation results from the previous activity together, we select an architecture(s) and components that are most suitable for reuse in asset development. We employed the following strategy in the selection:
– The candidate design model should have a higher coverage (higher COV value) than others.
– The candidate components should have a higher conformance than other comparable components.
– The candidate architecture should satisfy the important quality attributes represented in the feature model.
– The implementation of the candidate design model or component should have better code quality than other design models or components that have the same responsibilities.
If we can select an architecture and a large portion (more than 70%) of candidate components and classes from a certain design model, that design model is called the "base design model." If we have a base design model, we may lower the 'adoption barriers' [1], as we have developers familiar with the design. Reengineering of the selected candidates is discussed in section 3.4. 3.4 Product Line Reengineering The purpose of this process is to create a product line design model (PLdeM, i.e., asset architecture(s), components, and classes with embedded variation points) and implement it using the reuse candidates, PLDM, and the feature model created in the earlier activities (Fig. 1). This process consists of three activities: architecture
2 LCOM1 (Lack of Cohesion of Methods 1), LCOM2 (Lack of Cohesion of Methods 2), CC (Cyclomatic Complexity), MI (Maintainability Index), ANOCOM (Average Number of Comments Metric), and ABD (Average Block Depth)
modification; component modification; and implementation with embedded variabilities (i.e., variation points and variants). 1) Architecture Modification: The first task is to create the architecture for PLdeM that satisfies the quality attributes represented in the feature model and is adaptable for products in the product line. We can modify the candidate architecture, or create a new one if modification of the candidate architecture requires an excessive effort. To assess and improve the architecture, we can use ATAM [8] and Attribute Driven Architecture Design (ADD) [10] in addition to design patterns and tactics [11]. 2) Component Modification: Based on the architecture, we modify the reuse candidate components, if necessary. Some components may be reused without modification, but to satisfy the required quality attributes and/or the functional requirements, other components might need to be modified. While modifying them, we continue to evaluate the quality attributes and any improvement actions that can happen in refactoring can also happen here. 3) Implementation with Embedded Variabilities: To meet the variability requirements of the product line, we must insert variation points into component implementations. We can use several implementation techniques such as templates, dynamic binding, option tables, and macro processing. The candidate architecture from Design1 implemented the Program Purchase functionality inside EPG Controller component. To improve reliability and maintainability, we decided to separate the functionality from EPG controller to a new component. Also, we reused the Purchase Manager component from Design2 and allocated the Program Purchase responsibility to it (Fig. 4). We used a macro language (e.g., $IF(;$RecordingReservation[…]) in Fig. 5) in addition to mechanisms, such as inheritance and templates, that come with the programming language.
Fig. 4. Product line design model of the EPG domain
In ProgramReservation Component:
Class ReservedProgram <;$RecordingReservation> {
    …
    restoreFromPref(str String) {
        $IF(;$RecordingReservation)[
            if (reserveType == "RecordingReservation")
                strBuf.append(reserveType).append(DELIMITER);
            …
        ]
        …
Fig. 5. A part of implementation of the EPG application
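How such $IF blocks are resolved during asset adaptation is not spelled out here; the following is only a rough sketch of the idea, assuming a simple textual expansion in which the guarded region is kept when its feature is selected and dropped otherwise. The macro syntax is taken from Fig. 5, but the expansion rules and all other names are assumptions.

import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a feature-conditional macro expansion; not the actual EPG tooling.
public class MacroExpander {
    // Matches $IF(;$Feature)[ body ]
    private static final Pattern IF_MACRO =
            Pattern.compile("\\$IF\\(;\\$(\\w+)\\)\\[(.*?)\\]", Pattern.DOTALL);

    static String expand(String source, Set<String> selectedFeatures) {
        Matcher m = IF_MACRO.matcher(source);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String feature = m.group(1);
            String body = m.group(2);
            // Keep the guarded body only if the feature was selected in the feature model.
            m.appendReplacement(out, Matcher.quoteReplacement(
                    selectedFeatures.contains(feature) ? body : ""));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String src = "restore(); $IF(;$RecordingReservation)[ handleRecording(); ] done();";
        System.out.println(expand(src, Set.of("RecordingReservation"))); // body kept
        System.out.println(expand(src, Set.of()));                       // body dropped
    }
}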
4 Application Engineering for Asset Validation To validate the reengineered assets, we developed two EPG applications running in different set-top boxes. One was a low-end set-top box providing only basic tuning and viewing features, while the other was a high-end set-top box supporting PVR-related features such as Recording Reservation, Play Control, and Multi Tuner (Fig. 2). We performed the following application engineering procedures to develop these EPG applications and to check if the asset-based products work correctly in the product line environment. 1) Feature Selection. We used the ASADAL CASE tool [12] to build an EPG feature model during the product line reverse engineering, and also to select features for each EPG application during application development. The tool made feature selection easy by automatically applying feature dependency rules (“require” and “exclude” relationships), preventing selection conflicts. We selected all PVR-related features for the high-end set-top box, whereas only mandatory and a small number of alternative features for the low-end one. The ASADAL tool generated information on the selected features in an XML file, which was used for asset adaptation. 2) Asset Adaptation. We generated source code of each EPG application from the reengineered assets by processing variation points in macros and inserting variants that correspond to the selected features specified in the XML file. A variant might be a component, a class, or a code segment. 3) Code Generation. We compiled the generated source code of each EPG application, and linked it with other set-top box software modules and some libraries encapsulating set-top box devices. The resulting file was an executable image for the set-top box. 4) Application Test. We uploaded each executable image onto the corresponding set-top box, connected the box to cable television networks, and checked whether the box provided the selected features correctly. We tested some features manually and the others with automatic test suites. Both the low-end and the high-end settop boxes operated correctly, and we could conclude that those reengineered assets related to basic tuning, watching, and PVR features were validated. We were also able to generate all three EPG applications (Design1, Design2, and Design3 in Table 1) that we used to create product line assets. Although the structure of generated source code was different from that of the original one, their functional-
ities were the same. Also, there was no visible performance difference between the asset-based products and the original ones.
5 Related Work There has been active research in the reengineering and refactoring fields [4], [5]. Although many researchers have focused on reusing legacy systems to reduce development cost and effort, there have been a few methods extending these techniques to software product line engineering. J. DeBaud et al. [13] introduced a reengineering method using a domain model. Although their method was not in the product line context, they emphasized the importance of a domain model. They used a domain model to define and understand the context of a program, but we create a domain model to define the context of a family of programs, to extract and reengineer components from legacy software, and to create a product line. With the emergence of the software product line paradigm, researchers have performed research on reengineering in the context of product line engineering. J. Bayer et al. [14] presented the 'RE-PLACE' framework to support transition of existing software assets to a product line architecture. They performed asset reengineering, product line design, and architecture modeling by extracting features. R. Kolb et al. [15] introduced the PuLSE™-DSSA process and applied it to reengineer an image memory handler component for reuse in a software product line. Although both introduced reengineering for asset component development, they did not analyze and manage commonalities and variabilities of the domain model in a systematic way. We believe that a domain model-based approach can easily build good software quality attributes into product line assets. Since feature-oriented domain analysis [7] was introduced, some researchers have performed reengineering and refactoring from the feature-oriented perspective. K.C. Kang et al. [16] reengineered legacy home service robot applications into product line assets using a feature-oriented method [17]. J. Liu et al. [18] introduced the feature-oriented refactoring (FOR) process, which decomposes a system into features, and reengineered the system based on the features. They reengineered an open source database system implemented in Java using the FOR process. Although these researchers performed reengineering in terms of features, they did not consider extracting code components from a family of legacy applications for use in creating an asset base for a product line. Their methods are not in the context of an extractive approach. V. Alves et al. [19] created a mobile games product line in an extractive approach. They introduced a method for extracting and evolving a product line using the aspect-oriented method. Although they extracted product lines with simple refactoring laws, they assumed a high level of design similarity among products. Thus, if there are few structural similarities, the method is difficult to apply. Our method of inserting variabilities into a domain model based on a feature model is somewhat related to research in the model transformation field. K. Czarnecki et al. [20] introduced a template-based approach for mapping a feature model to activity diagrams. F. Heidenreich et al. [21] presented 'FeatureMapper', a tool for defining mappings of features to model elements by specifying feature realizations.
148
H. Lee et al.
Unlike their focus on model-based transformation, we focus on the model-based component extraction from a family of related systems.
6 Conclusion and Future Work

In this project, we have shown how a domain model can play a key role in an extractive PLE approach. By analyzing the design models of different systems against a domain model, we could easily find commonalities among the designs of a family of related systems. Moreover, by selecting reuse candidates that are similar to the domain model, effective reengineering became possible. Some of the interesting findings from this project are:

• Effectiveness of the Domain Model-Based Approach: Through domain modeling, we gained a clear understanding of the domain and of what to look for in a design model. Without this, comparing one design model with another was like comparing apples with oranges. However, there is an extra cost for developing these models: we spent about a month on domain modeling.

• Base Design Model: In Section 3.3 we discussed the base design model. If we have a base design model, we can transition from product-based development to product line-based development more efficiently, as we can reuse the design itself, which implies that we can also reuse many of the components included in the design. In our project, Design1 was the base design model we selected, and the architecture and most of the components of Design1 were used.

• Conformance at the Conceptual Level and Quality Attributes at the Implementation Level: We found interesting correlations between the level of conformance of a design model and the quality of the implemented code. A design model with high conformance at the conceptual level also had high quality assessments at the implementation level. When we evaluated the quality attributes of the three design models using the tool, Design1 (which had the best conformance value) turned out to have the best quality values (Table 2). A good design lends itself to a quality implementation.

The feasibility of our method was demonstrated in the EPG domain. To quantitatively analyze the effectiveness of the method, we plan to continuously monitor how much time and effort are required to develop applications from the product line assets, and how well these applications meet functional and quality requirements. In this project, we could select one design model as the base design model and reuse the architecture and components of that design model. However, there can be situations where we need to create a new product line architecture and reuse components from several legacy products. We plan to explore other project situations and improve our method accordingly.
References

1. Krueger, C.W.: Easing the Transition to Software Mass Customization. In: Proc. of the 4th International Workshop on Product Family Engineering, October 2001, pp. 282–293 (2001)
2. GEARS, BigLever Software, Inc., http://www.biglever.com/
3. Pure:Variants, Pure-systems, Inc., http://www.pure-systems.com/
4. Fanta, R., Rajlich, V.: Reengineering Object-Oriented Code. In: Proc. of ICSM, November 1998, pp. 238–246 (1998)
5. Tahvildari, L., et al.: Quality-Driven Software Re-Engineering. J. Systems and Software 66(3), 225–239 (2003)
6. Enterprise Architect, Sparx Systems, Inc., http://www.sparxsystems.com/products/ea/
7. Kang, K.C., et al.: Feature Oriented Domain Analysis (FODA) Feasibility Study. CMU/SEI-90-TR-21, SEI, CMU (November 1990)
8. Kazman, R., et al.: ATAM: Method for Architecture Evaluation. CMU/SEI-2000-TR-004, SEI, CMU (August 2000)
9. Rational Software Analyzer, IBM, Inc., http://www.ibm.com/software/awdtools/swanalyzer/
10. Wojcik, R., et al.: Attribute-Driven Design (ADD), Version 2.0. CMU/SEI-2006-TR-023, SEI, CMU (November 2006)
11. Bachmann, F., et al.: Illuminating the Fundamental Contributors to Software Architecture Quality. CMU/SEI-2002-TR-025, SEI, CMU (August 2002)
12. Kim, K., et al.: ASADAL: A Tool System for Co-Development of Software and Test Environment Based on Product Line Engineering. In: Proc. of the 28th ICSE, May 2006, pp. 780–786 (2006)
13. DeBaud, J., Rugaber, S.: A Software Re-engineering Method Using Domain Models. In: Proc. of the ICSM, pp. 204–213 (1995)
14. Bayer, J., et al.: Transitioning Legacy Assets to a Product Line Architecture. In: Proc. of ESEC/FSE, September 1999, pp. 446–463 (1999)
15. Kolb, R., et al.: A Case Study in Refactoring a Legacy Component for Reuse in a Product Line. In: Proc. of the 21st ICSM, September 2005, pp. 369–378 (2005)
16. Kang, K.C., et al.: Feature-Oriented Re-engineering of Legacy Systems into Product Line Assets – a Case Study. In: Proc. of the 9th SPLC, September 2005, pp. 45–56 (2005)
17. Kang, K.C., et al.: Feature Oriented Product Line Engineering. IEEE Software 19(4), 58–65 (2002)
18. Liu, J., et al.: Feature Oriented Refactoring of Legacy Applications. In: Proc. of the 28th ICSE, May 2006, pp. 112–121 (2006)
19. Alves, V., et al.: Extracting and Evolving Mobile Games Product Lines. In: Proc. of the 9th SPLC, September 2005, pp. 70–81 (2005)
20. Czarnecki, K., Antkiewicz, M.: Mapping Features to Models: A Template Approach Based on Superimposed Variants. In: Proc. of the 4th International Conference on Generative Programming and Component Engineering, September–October 2005, pp. 422–437 (2005)
21. Heidenreich, F., Kopcsek, J., Wende, C.: FeatureMapper: Mapping Features to Models. In: Proc. of the 30th ICSE, May 2008, pp. 943–944 (2008)
Reuse with Software Components – A Survey of Industrial State of Practice

Rikard Land1, Daniel Sundmark1, Frank Lüders1, Iva Krasteva2, and Adnan Causevic1

1 Mälardalen University, School of Innovation, Design and Engineering, Västerås, Sweden
2 Faculty of Mathematics and Informatics, Sofia University, Sofia, Bulgaria
{rikard.land,daniel.sundmark,frank.luders,adnan.causevic}@mdh.se, [email protected]
Abstract. Software is often built from pre-existing, reusable components, but there is a lack of knowledge regarding how efficient this is in practice. In this paper we therefore present qualitative results from an industrial survey on current practices and preferences, highlighting differences and similarities between development with reusable components, development without reusable components, and development of components for reuse. Component reuse does happen, but the findings are partly disappointing: currently, many potential benefits are not achieved. Still, the findings are also encouraging: there are indeed good, reusable components that are properly verified and documented, and mature organizations that manage to reuse these components efficiently, e.g., by leveraging previous component verification. We also find that replacing one component with another is not necessarily complicated and costly.
1 Introduction

The paradigm of component-based software engineering (CBSE) has a number of perceived benefits [1] [2]: components may be developed independently of each other and interact only through explicit interfaces, which opens up the possibility of component reuse in new contexts. It provides a framework for defining architectures and easing integration, both when using pre-existing components and in top-down design decomposition during system development [3]. By selecting pre-existing components that have been proven in use and enhanced over time, it should be possible to construct high-quality systems more rapidly than ever. Moreover, research is progressing towards the vision that system behaviour can be predicted from component behaviour [4] [5], which would make reuse even more attractive, as the consequences of selecting a particular component would be known in advance.

However, in practice, software reuse through components is difficult and not entirely successful, for several reasons. First, components do not always live up to expectations, partly because it is inherently very difficult to verify a component without a context. Second, it is seldom easy to exchange one component for another, even when (part of) the interface is identical or similar. Thus, at least some of the development time saved through reusing a component needs to be spent on the selection, evaluation, and verification of components, and on explicit management of the relationships with component vendors. This typically also leads to vendor lock-in, and reuse is thus often degraded to only an initial event in a system's history.

We set out to study the state of the practice in the following general software development activities, from a software component reuse perspective: requirements elicitation and customer interaction, design and implementation, verification, and component selection and evaluation. For this purpose, we constructed a web-based survey and invited organizations reusing and integrating existing software components, organizations not reusing components, and component builders as respondents. The focus was on technical staff (developers, testers, architects, etc.). This paper presents the results of this survey, thereby providing insight into how well CBSE supports software reuse in current practice, how and to what extent components are verified in isolation, and how component users test and evaluate components before selecting them.

There are two main research questions reflected in the structure of this paper. First, in Section 4, we investigate whether there are any differences in how development activities are performed depending on whether software development includes the reuse of components or not; developers of components for reuse are included as a third group in this comparison. Then, in Section 5, we investigate how component selection and evaluation is performed by projects developing software (partially) by integrating reusable components. First, however, in Section 2 we describe the background, and in Section 3 we present the research method used to perform and analyse the survey. Section 6 concludes the paper and presents ideas on future work.
2 Background and Related Work

Although software reuse has clear potential benefits, practice has shown a great many challenges, and not only technical aspects must be mastered: any serious software reuse attempt must permeate the organization and allow existing processes and practices to be modified [6]. Other empirical studies of software reuse have been conducted (see e.g. [7] for a review), including some focusing specifically on reuse with components [8] [9]. The study presented here adds to this body of work by investigating some specific questions, in particular related to verification and component selection.

Other related work is referred to in context throughout the rest of the paper. Section 4 describes existing approaches to requirements and customer interaction [10] [11], design and implementation [11] [12], and verification [3] [13] [14], which have a bearing on component reuse. Section 5 relates to literature with suggested methods for Off-the-Shelf (OTS) component selection and evaluation [15] [16] [12] [17] [18] [19] [20] [21] [22]. In these sections, we describe suggested methods, practices, and previous observations found in the literature, and relate our empirical survey results to them, to investigate the extent to which the suggested guidelines are adopted in practice.
3 Research Method

To study how the component-based software paradigm supports reuse in practice, we constructed a web-based questionnaire. Invitation emails were sent to companies that were part of our joint research projects, such as FLEXI (http://www.flexi-itea2.org/) and NESSI (http://www.nessi-europe.com/), among others. We received a total of 93 responses, 30 of which appear to have quit the questionnaire after providing only some background information. We believe the main reason is that they perceived the questionnaire as taking too long, and we cannot know whether this poses a particular threat to validity, i.e., whether some particular types of answers were thus systematically excluded. Since the respondents are anonymous, we cannot know how many organizations they represent. Also, as we sent the invitation to several email lists and encouraged every recipient to spread it further, we know neither the response rate nor exactly which organizations are represented. Hence, during any statistical treatment of the data we must bear in mind the limitations that this type of convenience sampling imposes on the external validity of the results. More information about the questionnaire, as well as all data, is available as a technical report [23].

In much of our analysis, we explore differences between development with reusable components, development without reusable components, and development of components for reuse. These three groups are defined as illustrated in Fig. 1, based on three specific questions in the questionnaire: we consider two complementary subsets of development projects, with or without reusable components. Orthogonal to this division, we also consider projects developing components for reuse (which we study, as indicated in the figure), and projects developing (non-reusable) products and systems used by end users.
Fig. 1. Groups of respondents
For each respondent, based on the responses to some mandatory initial questions in the questionnaire, some later sections of questions were shown or hidden. As this caused the number of respondents to vary between sections, the number of respondents in each survey section is specified in Table 1, both per group and in total (the total is the sum of the first two columns, i.e., the groups representing development with and without reusable components).

Table 1. Number of responses in each respondent group for each section in the survey

Survey Section                                 With reusable   Without reusable   Of components   Total
                                               components      components         for reuse
Agile practice preferences                     32              18                 8               50
Testing                                        24              12                 6               36
Component development                          8               5                  8               13
System development with reusable components   29              0                  5               29
System development                             0               25                 6               25
4 Development with, without, and for Reuse

In this section, we analyze the activities of requirements elicitation and customer interaction, design and implementation, and verification. In particular, we explore the differences, if any, between development with reusable components, development without reusable components, and component development for reuse.

4.1 Requirements Elicitation and Customer Interaction

Interaction with customers and their feedback [11] affect how requirements are formulated, how fixed they are, and how often deliveries are made. Generally, our results show that, regardless of the level of component reuse in development, incremental delivery is a widespread practice, but requirements handling and the collection of customer feedback vary between development of, with, and without reusable components.

Regular interaction. For development without reuse, regular interaction between developers and customers/business people is in general encouraged by management, while for development with and for reuse there is no consensus. However, there is a consensus among the respondents that they would like such regular interaction to be increased.

Changing requirements. For development with reusable components, there is a slight tendency to discourage customers from changing requirements once they are specified. For development without reusable components, the tendency is the opposite: customers have more opportunities to change their requirements. A possible explanation is that once a decision has been made to use a reusable component, requirement changes may have a larger impact on the existing design [10] [12]. However, both groups seem to be dissatisfied with the current state: respondents in the development with reusable components group would like to allow their customers to change their requirements, while respondents in the development without reusable components group think customers should be allowed to change less. For the above questions, the development of components for reuse group provides answers without any clear preferences.
Incremental delivery. In all groups, the general practice is to deliver software to customers incrementally, and all respondents think this practice should be even more emphasized. All groups in general also provide users with early (alpha/beta) versions of the software, but this tendency is stronger in system development than in component development for reuse. One interpretation of this difference is that more useful feedback can be collected from end users using an incomplete or buggy user interface application than from component users using an unreliable component.

Delivery of source code. Sometimes the software is delivered as source code and sometimes in binary format, without any particular tendency or any difference between the groups. This may indicate that the domain determines what is convenient, rather than, for example, different types of business relationships in the groups or any difference in the desire to keep the implementation secret.

Customer feedback. In development of systems for end users, the respondents almost uniformly state that end customer feedback is collected and evaluated through different mechanisms. This can be contrasted with development of components, where the responses are more varied, but still with a slight overweight in support of this practice. One partial interpretation is that, at least for reusable components developed for the mass market, the distance to customers is large (although this distance can be decreased: we are aware of one COTS vendor that presents the current state to its key customers in web conferences every second week and allows interaction in these virtual meetings).

4.2 Design and Implementation

This section reports on findings related to design and implementation from the perspective of the three groups defined above. Our findings indicate that incremental design and coding is a preferred practice among the respondents, but also that there are differences between the preferred and the actual practice.

Interleaving of design and programming. The responses vary widely as to what degree programming should be allowed to start before design is completed. The current practice varies across the scale for all groups of respondents, although those doing development without reuse have a slight tendency towards being more permissive about starting programming early. When asked about their preference, all three groups are less permissive than current practice, although this is not the case for each individual respondent.

Incremental design and coding. This is often viewed as a good way to discover design problems early and to get early customer feedback [11]. Our findings show that it is widely used in current practice, independently of whether development is done with, without, or for reuse. When asked about their preference, the respondents unanimously agreed that the incremental approach is desirable.

Return on investment of designing components for maintainability. The group of respondents representing development of software components for reuse unanimously agreed that if enough effort is not spent up front on a good, maintainable component design, the cost of change for the component becomes very high. Respondents outside this group were not asked this question.
Redesigning component-based systems. Design lock-in has been identified as a potential side effect of building systems from pre-existing software components [12]. However, the majority of respondents agreed that redesigning a system is not a big issue when building a system out of components. The group representing development without software components was not asked this question.

4.3 Verification

Ease of verification is one of the main arguments for software reuse through components [3] [14]. The main idea is that components that have been verified in previous settings and deployments will not require as much verification effort as software developed from scratch. Such savings would be highly relevant, since verification is widely known to consume a significant portion of the resources in software development projects [13]. In this section, we investigate system and component verification from the perspective of current practice in software development with, without, and for reuse.

General opinions. Regardless of whether the system is built with or without reusable components, most respondents find themselves having less time for testing than they would like. Looking at the ideal verification practices, in the eyes of the respondents, unit testing still has a high degree of preference. Moreover, respondents generally feel that both functional black-box testing and testing based on code analysis should be increased compared to current practice, and functional black-box testing is preferred over testing based on code analysis.

Unit testing and component testing. In system development with and without reuse, most respondents report a high level of use of unit testing. The same goes for functional black-box testing of components. This trend is even more apparent for functional black-box testing at the system level. Answers are similar for performance and security testing, but we feel that these types of testing are too domain-specific to consider generally. In component development for reuse, both functional black-box testing and testing based on code analysis (e.g., statement or path coverage) are present in some projects. For all these verification methods, there is a noticeable difference between the current practice and the perceived ideal level of usage, which in general is significantly higher.

Integration testing. To a large extent, respondents find themselves in projects that allow code changes during integration testing. Interestingly, respondents developing systems with reusable components find this less problematic than those developing systems without reuse. In addition, the respondents do not consider it easier to test systems built out of reusable components that have previously been tested in isolation than to test systems built without reuse.

Testing of documentation. In all groups, testing of documentation is perceived to be largely neglected, and most respondents, except those developing components for reuse, would like a significant increase in this practice. However, of the 8 respondents developing components for reuse, only 2 explicitly agree that the documentation provided with the components is sufficient for the needs of the component users.
In-house vs. Subcontracted vs. OTS. Among the respondents, there are stronger explicit demands on the documentation and verification of subcontracted components than on the documentation and verification of in-house or OTS components. Component creators and component users both think that the current state of documentation and verification fulfils the needs of component users for subcontracted and OTS components, but not for in-house components. This is also reflected in a stronger dissatisfaction with the documentation and verification of in-house components, compared to that of subcontracted or OTS components. One possible explanation is that the distance from a subcontractor or OTS vendor to the component user is greater, and also that the amount (and quality) of documentation is regulated by contracts (for subcontractors) or implicitly required in order to have an attractive product (for OTS vendors). Whatever the reason, this points in a direction where component reuse could be improved by providing more efficient and practically useful documentation.
5 Component Selection and Evaluation

In this section, we describe the current state of practice concerning the selection of reusable components to use during software development, and the challenges of evaluating reusable components in a system context.

5.1 Component Selection

The commonly suggested practice for OTS selection is to first filter out many component candidates in a high-level evaluation phase, based on information and documentation about the components, and only later perform a hands-on prototyping evaluation of a final few components by writing test cases and creating prototypes [15] [16].

Roles involved in component selection. The survey responses indicate that in some projects only the development unit is involved in the component evaluation and selection process, while other projects heavily involve customers or internal staff whose responsibility is to know the market and the customers. Although it is true that some components are not directly visible to customers and end users, more often than not the decision to use a specific component does have a business impact; it may, for example, strongly affect the possibilities for future extensions of the system [12] [17]. Thus, it appears that, in some companies, the current state of practice needs to be improved.

Interleaving system requirements elicitation and component selection. The respondents tend to formulate requirements on components fully prior to evaluation and selection. However, they generally find it difficult to break down system requirements into component requirements. This indicates that many organizations have not yet implemented the practice of interleaving component selection with the requirements elicitation process [15], as suggested by, e.g., the methods PORE (Procurement-Oriented Requirements Engineering) [18], CRE (COTS-Based Requirements Engineering) [19], and CARE (COTS-Aware Requirements Engineering) [20]. However, as noted, the majority of the respondents assert that customers or business people are involved during component selection and evaluation. One interpretation is therefore that requirements elicitation and component selection and evaluation are often interleaved in practice, albeit not formalized as a process.

5.2 Component Evaluation

Prototyping evaluation. After an initial, high-level evaluation based on information about OTS components (or existing knowledge of the potential components) [8], the suggested practice is to create prototypes or simulate the system's usage of the component through testing [15]. There are two main goals for this: to examine technology or architecture [15] [16], where the survey results clearly show that this type of prototyping activity is widely performed in practice; and to evaluate component assemblies rather than individual components [15] [16] [21] [22], where the results vary with no clear tendency.

Usage of provided test cases. The responses vary concerning whether test cases provided with the components are used to evaluate them. The respondents who use the provided test cases report that they also develop their own test cases for components in order to evaluate them, and, surprisingly, those who do not use the provided test cases do not write their own either. Even more surprising is perhaps that this is true not only for subcontracted or in-house developed components – where one could expect the detailed functionality, level of quality, and responsibility for quality assurance to be specified by contracts – but also for OTS components.

Insufficient evaluation. High-level component evaluation and prototyping evaluation complement each other; however, if the components to select from are already known, it may be sufficient to do a brief hands-on evaluation in the new context [8], which could partly explain why some of our respondents do not evaluate components prior to selection. However, some of the respondents who do not test their components believe testing is more efficient than documentation (this is true also for all respondents who do use test cases), which makes us lean towards the following conclusion: there are organizations and projects where OTS components are selected without proper evaluation – and they are aware of this. However, there are also organizations that perform systematic evaluation of OTS components.
6 Conclusion and Future Work

This paper presented an empirical, qualitative study of reuse with software components. Our data indicate that reuse of components does not make design decisions as permanent as might be feared. The findings on the impact of requirements changes are inconclusive. Regarding verification, the general opinion in our study is that it is not done to a sufficient extent, independent of component reuse. Separate verification of reusable components in isolation does not in general make system verification or component evaluation easier. Known good practices for component selection and evaluation are implemented in some organizations, but not in all.

In conclusion, as for the current state of the practice of component reuse in industry, we can claim that components are in fact built for reuse, and those components are in fact being reused. The main reasons (which we have not studied) are probably cost and time for system development: through component reuse, systems can be built more cheaply and more quickly. However, some other potential benefits (which we have studied) are not generally experienced: in particular, system verification is not necessarily made easier, and requirements engineering, and ultimately the ways system developers interact with their customers, need to change further than is generally the case today. Nevertheless, our study clearly shows that there are organizations where these benefits are indeed experienced, but this is apparently hard to achieve without explicit attention and effort.

Further research includes studying the organizations that manage component reuse best in order to identify good practices and how to implement them in different circumstances. Many such practices and potential benefits are already known but are, according to our results, not yet widely adopted in industrial practice. As this generally confirms previous studies, it adds to the body of knowledge and may provide additional insights.
Acknowledgements. This work was partially supported by the Swedish Foundation for Strategic Research (SSF) via the strategic research centre PROGRESS, the Bulgarian Ministry of Education and Science, and FLEXI. Thanks also to all the questionnaire respondents and the people who have been involved in earlier phases of this research.
References

1. Szyperski, C.: Component Software, 2nd edn. Addison-Wesley, Reading (2002)
2. Wallnau, K., Hissam, S., Seacord, R.: Building Systems from Commercial Components. Addison-Wesley, Reading (2001)
3. Crnkovic, I., Chaudron, M., Larsson, S.: Component-based Development Process and Component Lifecycle. In: International Conference on Software Engineering Advances (ICSEA 2006), Tahiti (2006)
4. Hissam, S., Moreno, G., Stafford, J., Wallnau, K.: Packaging Predictable Assembly with Prediction-Enabled Component Technology, Pittsburgh (2001)
5. Land, R., Carlson, J., Larsson, S., Crnkovic, I.: Towards Guidelines for a Development Process for Component-Based Embedded Systems. In: Workshop on Software Engineering Processes and Applications (SEPA), Yongin, Korea. LNCS (2009)
6. Karlsson, E.-A.: Software Reuse: A Holistic Approach. John Wiley & Sons Ltd., Chichester (1995)
7. Mohagheghi, P., Conradi, R.: Quality, Productivity and Economic Benefits of Software Reuse: A Review of Industrial Studies. Journal of Empirical Software Engineering 12(5), 471–516 (2007)
8. Li, J., Torchiano, M., Conradi, R., Slyngstad, O., Bunse, C.: A State-of-the-Practice Survey of Off-the-Shelf Component-Based Development Processes. In: Morisio, M. (ed.) ICSR 2006. LNCS, vol. 4039, pp. 16–28. Springer, Heidelberg (2006)
9. Li, J., Conradi, R., Bunse, C., Torchiano, M., Slyngstad, O., Morisio, M.: Development with Off-The-Shelf Components: 10 Facts. IEEE Software 26(2), 80–87 (2009)
10. Cooper, K.: Can Agility be Introduced into Requirements Engineering for COTS Component Based Development? In: International Workshop on Software Product Management, IWSPM (2006)
11. Beck, K.: EXtreme Programming EXplained: Embrace Change. Addison-Wesley, Reading (1999)
12. Krasteva, I., Branger, P., Land, R.: Challenges for Agile Development of COTS Components and COTS-Based Systems – A Theoretical Examination, Funchal, Portugal (2008)
13. Tassey, G.: The Economic Impacts of Inadequate Infrastructure for Software Testing (2002)
14. Aoyama, M.: New Age of Software Development: How Component-Based Software Engineering Changes the Way of Software Development. In: Proceedings of the International Workshop on Component-Based Software Engineering (1998)
15. Land, R., Blankers, L., Chaudron, M., Crnkovic, I.: COTS Selection Best Practices in Literature and in Industry. In: Mei, H. (ed.) ICSR 2008. LNCS, vol. 5030, pp. 100–111. Springer, Heidelberg (2008)
16. Oberndorf, P., Brownsword, L., Morris, E., Sledge, C.: In: Workshop on COTS-Based Systems (1997)
17. Krasteva, I., Land, R., Sajeev, A.: Being Agile when Developing Software Components and Component-Based Systems – Experiences from Industry. In: EuroSPI, Madrid, Spain (2009)
18. Maiden, N., Ncube, C.: Acquiring COTS Software Selection Requirements. IEEE Software 15(2) (1998)
19. Alves, C., Castro, J.: CRE: A Systematic Method for COTS Components Selection. In: Proceedings of the XV Brazilian Symposium on Software Engineering (SBES), Rio de Janeiro (2001)
20. Chung, L., Cooper, K.: Defining Goals in a COTS-Aware Requirements Engineering Approach. Systems Engineering 7(1) (2004)
21. Burgués, X., Estay, C., Franch, X., Pastor, J., Quer, C.: Combined Selection of COTS Components. In: Dean, J., Gravel, A. (eds.) ICCBSS 2002. LNCS, vol. 2255, pp. 54–64. Springer, Heidelberg (2002)
22. Bhuta, J., Boehm, B.: A Method for Compatible COTS Component Selection. In: Franch, X., Port, D. (eds.) ICCBSS 2005. LNCS, vol. 3412, pp. 132–143. Springer, Heidelberg (2005)
23. Causevic, A., Krasteva, I., Land, R., Sajeev, A., Sundmark, D.: An Industrial Survey on Software Process Practices, Preferences and Methods (2009)
24. Land, R., Alvaro, A., Crnkovic, I.: Towards Efficient Software Component Evaluation: An Examination of Component Selection and Certification. In: Euromicro SEAA SPPI Track, Parma, Italy (2008)
Evaluating the Reusability of Product-Line Software Fault Tree Analysis Assets for a Safety-Critical System

Josh Dehlinger1 and Robyn R. Lutz2

1 Department of Computer and Information Sciences, Towson University, 7800 York Road, Towson, Maryland, USA 21252
[email protected]
2 Department of Computer Science, Iowa State University, 226 Atanasoff Hall, Ames, Iowa, USA 50011 and Jet Propulsion Laboratory / Caltech
[email protected]
Abstract. The reuse of product-line assets enables efficiencies in development time and cost. Safety analysis techniques for Software Product-Line Engineering (SPLE) construct safety-related, non-code artifacts with the aim of reusing these assets for new product-line members. In this paper we describe results from the construction and reuse of a key safety-analysis technique, Product-line Software Fault Tree Analysis (PL-SFTA), and its supporting tool, PLFaultCAT. The main contribution of this work is the evaluation of PL-SFTA and PLFaultCAT for the reuse of safety analysis assets in a product line. The context is a safety-critical product line of spacecraft developed as a multi-agent system. Keywords: reusable safety analysis assets, safety aspects of reuse, product-line software fault tree analysis, multi-agent system product lines.
1 Introduction

Software product-line engineering (SPLE) is a key enabling technology for reuse. It provides a proactive and systematic approach for the design and development of systems to create a set of similar products, a product line, from reusable assets. A software product line (SPL) is defined as "a set of software-intensive systems sharing a common, managed set of features that satisfy the specific needs of a particular market segment or mission and that are developed from a common set of core assets in a prescribed way" [1]. SPLE supports reusability by developing a set of products that share core commonalities and differ via a set of managed variabilities [13] and has been shown to be able to reduce the design, development and production time and cost of systems through the reuse of code and non-code artifacts [1].

Engineering for safety-critical systems, such as cardiac pacemakers [10] and medical imaging systems [11], has adopted a SPLE approach to take advantage of reusable artifacts during design and development. To maintain the safety properties requisite for critical systems, safety analysis techniques and tools specific to SPLE had to be created to accommodate the variability inherent in a SPL while also producing reusable artifacts that can be used for all product-line members [10].
Product-Line Software Fault Tree Analysis (PL-SFTA) [2] [6] extends traditional Software Fault Tree Analysis (SFTA) [8] by incorporating SPLE to produce reusable safety analysis assets. Unlike traditional SFTA, PL-SFTA develops fault trees that incorporate the variabilities amongst the SPL members to provide reusable safety analysis assets for the entire SPL. PL-SFTA is supported by a tool, PLFaultCAT [6], that enables this reuse by automatically producing the fault tree for each product-line member from the PL-SFTA. The goal is to support the safety analysis needed for a new product-line member with a less costly and more automated process.

This paper furthers the argument that PL-SFTA and PLFaultCAT supply a beneficial safety analysis technique and tool. Specifically, the contribution of this paper is an evaluation of the degree to which the generated reusable safety analysis assets can be directly applied to new product-line members within SPLE, including:

• A description of how PL-SFTA and PLFaultCAT systematically capture and reuse non-code safety analysis assets for safety-critical product lines
• An evaluation of the degree to which the safety analysis assets developed using PL-SFTA can be reused for new product-line members, using a significant case study based on a NASA-proposed multi-agent system product line (MAS-PL)
• An assessment of the degree to which the PLFaultCAT tool reduces the effort needed to construct the safety analysis products for a new member from the reusable product-line safety analysis assets
• A discussion of the implications of the evaluation of PL-SFTA and PLFaultCAT for the degree to which software engineering artifacts can be reused for MAS-PLs compared to traditional SPLs

This work is part of a larger effort that investigates how safety-critical SPLs can be designed and developed with the support of SPL-specific, reusable safety analysis techniques. The long-term goal is to provide safety analysis assets for new product-line members in a timely and cost-efficient manner.

The remainder of the paper is organized as follows. Section 2 reviews related work. Section 3 summarizes the PAM MAS-PL case study used here for evaluation. Section 4 describes PL-SFTA in the context of SPLE. Section 5 provides an analysis of our empirical results regarding the reusability of the PL-SFTA safety analysis assets. Section 6 discusses the results from our experimental evaluation and their implications for MAS-PLs and SPLs. Finally, Section 7 provides concluding remarks.
2 Related Work

SPLE supports the systematic planning, design and development of a family of software systems through understanding, controlling and managing their common characteristics and differences. SPLE develops a family of products and relies on the analysis of the commonalities and variabilities of the product-line members prior to their development. Following Weiss and Lai [13], we specify a SPL using a Commonality and Variability Analysis (CVA) to document the SPL's commonalities (i.e., requirements of the entire product line), variabilities (i.e., specific requirements not contained in every member of the product line) and dependencies (i.e., constraints on the selection of the variable features).
In previous work [2] [3], we have integrated the Family-Oriented Abstraction, Specification and Translation (FAST) SPLE methodology [13] into Agent-Oriented Software Engineering (AOSE) to enable the analysis and design of MAS-PL. In the domain engineering phase, the MAS-PL's requirements are defined and specified using our Gaia-PL methodology [2]. The application engineering phase then reuses the software engineering artifacts developed in the domain engineering phase to build new product-line members, in this case software agents within the MAS-PL.

There has been little related work to date specifically in safety-critical SPLs. Prior work in software safety [8] and in reuse for safety-critical systems [9] has not dealt directly with SPLE safety analyses. On the other hand, prior work in SPLE has not addressed the additional, high-assurance needs of safety-critical SPLs [1] [13]. To provide safety assurances for critical SPLs, we developed PL-SFTA and its tool support, PLFaultCAT [5] [6], to allow for the creation of fault trees for a SPL while supporting the reuse inherent in SPLE. A fault tree is a directed AND/OR graph that represents a hazard and its contributing causes. Each node is an event or condition that can contribute to the occurrence of the hazard [8]. In addition to supporting the development and reuse of safety analysis artifacts for critical SPLs, PLFaultCAT provides additional, automated safety analyses to identify failure points and safety-critical requirements [5]. These safety analysis results, when developed during the domain engineering phase, can be applied to all product-line members.

Other SPLE-specific safety analysis techniques, including [4] [7] [10], have provided systematic approaches for conducting safety analyses on critical SPLs without studying the extent to which the resulting artifacts can be reused.
3 The Prospecting Asteroid Mission

To evaluate this work, we used requirements based on the Prospecting Asteroid Mission (PAM), a NASA-proposed concept mission based on the Autonomous NanoTechnology Swarm (ANTS) technology to explore the asteroid belt [12]. This mission will consist of up to 1,000 spacecraft that can autonomously form subswarms to investigate asteroids of interest. Except for a spacecraft's scientific instrumentation specialties, each PAM spacecraft has identical hardware. Each PAM spacecraft will be designated as a leader, a messenger or a worker [12]. A leader will determine the types of asteroids and data to pursue and will coordinate the efforts of worker spacecraft. A messenger will coordinate communication among spacecraft and with the Earth. Each worker will perform scientific investigation using its specialized equipment (e.g., a spectrometer).

Within the PAM swarm, there is significant redundancy for each spacecraft type, since 60–70% of the PAM spacecraft could be lost over the duration of the mission due to failures, collisions, etc. To preserve mission-critical requirements, additional capabilities (i.e., product-line variabilities) are given to some spacecraft to achieve redundancy at the swarm level through reconfiguration and adaptation. For example, some spacecraft may be able to switch at runtime from messenger to leader in response to loss or failure of other spacecraft, or may be tasked with monitoring for an impending solar storm (an optional variability).

The design and development of the PAM spacecraft as a MAS-PL would allow for the reuse of software engineering assets during design and development. In the context of SPLE, the safety-related commonality requirements include the navigation and guidance capabilities, collision avoidance, solar storm protection, etc. Additionally, each of the PAM spacecraft will have similar requirements related to self-coordination, self-healing and self-optimization behaviors.
4 Product-Line Software Fault Tree Analysis Using PLFaultCAT

This section describes the creation, analysis and reuse of a PL-SFTA for the safety-critical MAS-PL described in Section 3. The creation of the PL-SFTA occurs during the domain engineering phase of SPLE and constructs the safety analysis artifacts reused for specific product-line members in the application engineering phase.

4.1 Domain Engineering – Development of Reusable Safety Analysis Artifacts

The development of the PL-SFTA using PLFaultCAT consists of three steps:

Step 1. Identify the root node hazard(s) and develop an intermediate node tree. The root node of a fault tree represents a potential hazard that the system design and implementation should mitigate. This hazard may be known from an existing Preliminary Hazard Analysis [8] or from a product-line Software Failure Modes, Effects and Criticality Analysis (SFMECA) [4]. From the root node, a backward search is done to find causal, contributing events. By gathering the causal events, an intermediate node tree is constructed to establish the cause–event hierarchy. The intermediate node tree, while not strictly necessary for the construction of a PL-SFTA, jump-starts the organization and analysis of the PL-SFTA and serves as the input to PLFaultCAT. Essentially, the intermediate node tree represents a typical fault tree without the Boolean logic gate relationships between causal events and effects.

Step 2. Refine the intermediate node tree and document it in PLFaultCAT. The intermediate node tree may contain nodes that do not reflect the level of detail needed. Thus, domain expertise, and/or the use of other safety analysis artifacts such as a SFMECA [4], may be needed to analyze the tree for completeness, capture additional events leading to a failure (e.g., events from the environment) and refine nodes. Domain expertise is additionally needed to determine the logical combination of children nodes necessary to cause the parent node, as is done in traditional fault tree analysis [8]. As with any FTA, the reliance on domain expertise means that the PL-SFTA is only as good as the accuracy and completeness of the information used to create it.

Step 3. Consider the influence of commonality and variability requirements on all leaf nodes. This step employs a bottom-up approach to analyze each leaf node of the intermediate SFT and determines which commonality and/or variability requirements contribute to causing the root node event. In doing this, we associate the range of commonality and variability choices for any individual product-line member with how it might influence a particular hazard. Not every commonality or variability will have an influence or appear within any given fault tree. However, every leaf event node should have an associated commonality, variability, and/or basic (primary) event (e.g., an environmental or user input). Considering the influence of a present or absent variability on an event is straightforward; we analyze the influence of the variability being present within the product and not functioning as designed. If, however, the node relates to a commonality rather than a variability, we link the fault tree's leaf node with the appropriate commonality. PLFaultCAT makes associating a commonality and/or variability with a failure node straightforward: its interface, shown in Figure 1, allows "Basic Event" nodes, depicted as circles, to be labeled as a Commonality or a Variability and allows a label or ID to be defined for the variability (i.e., the textbox under the heading "Variability ID").

Fig. 1. Associating SPL requirements to the leaf nodes of a PL-SFTA in PLFaultCAT

The application of these three steps, using the SPL requirements documented in a CVA and a list of hazards to be mitigated, constructs a PL-SFTA that encapsulates the safety analyses of all product-line members. Figure 2 provides the PL-SFTA for the PAM MAS-PL for the hazard "Spacecraft to Asteroid Collision", depicting the failure nodes and the associated SPL requirement leaf nodes.

4.2 Application Engineering – Derivation and Reuse of Safety Analysis Artifacts

The development of a PL-SFTA, described in Section 4.1, enables the reuse of this safety analysis artifact to derive the SFT for a specific product-line member during the application engineering phase. We describe this process in this section.

Step 1. Select the variabilities for a new product-line member. A new product-line member is defined by selecting which variabilities or features from the CVA to include. A product-line member is created by selecting the variabilities that it will contain and defining their values. PLFaultCAT supports the selection of a new product-line member's variability requirements by providing a checkbox window that presents all possible variabilities for the SPL. PL-SFTA does not itself enforce or check the dependencies prescribed in the CVA; other tools, such as DECIMAL [5], are capable of enforcing the dependencies and constraints detailed in the CVA for large, complex SPLs. PLFaultCAT is used after the choice of variabilities has been determined to be legal.
Fig. 2. PL-SFTA for the PAM MAS-PL hazard “Spacecraft to Asteroid Collision”
Step 2. Generate the product-line member fault tree using PLFaultCAT. After establishing and verifying a product-line member, we prune the product-line SFTA to create a baseline SFTA for the new system. The pruning process, described fully in [2] [6], first uses a depth-first search to automatically mark the subtrees that have no impact on the product-line member being considered, and then relies on a small amount of domain knowledge to further collapse and prune the SFTA. A subtree within the PL-SFTA has no impact if all the leaf nodes of that subtree contain variability requirements not included in the specific product-line member. The pruning errs on the side of caution when deriving the product-line member's SFT from the PL-SFTA, since it only marks the subtrees that can be removed without review and does not actually do any pruning. This is advantageous from a safety perspective because it simply indicates those subtrees in which neither commonalities nor selected variabilities can be found among the children nodes. The algorithm then defers the actual pruning to the domain experts, as described in the next step. (A simplified sketch of this marking pass is given at the end of this section.) Figure 3 shows a portion of the resulting product-line member SFT derived from the PL-SFTA shown in Figure 2 for the PAM MAS-PL hazard "Spacecraft to Asteroid Collision".

Step 3. Apply domain knowledge. After marking the subtrees that have no bearing on the product-line member under consideration in Step 2, the fault tree may be further pruned and/or collapsed within PLFaultCAT. However, this step requires domain knowledge and illustrates the limit of fully automated PL-SFTA reuse. Removal of subtrees will often leave orphaned logic gates or other opportunities to safely simplify the fault trees of a new product-line member. When removing an orphaned OR gate with only one causal event remaining, we collapse the lower event into the parent event; if only one commonality or variability leaf node remains, we attach it to the parent event and remove the OR gate. When AND gates are involved, more caution is required. Intuitively, if at least one input line to an AND gate is removed, the output event is impossible. However, it was found that this is not always the case, so each removal of an AND gate warrants further scrutiny. The clean-up of the derived SFTs for the new product-line member(s) in this step is a manual process and must be pursued with care. Enough information should be retained within the product-line member's fault tree for future hazard analysis and mitigation strategies. The application of domain knowledge helps in the derivation of the SFTs for a new product-line member by removing extraneous nodes and focusing attention on nodes that may contribute to failures in the specific product-line member. This provides additional assurance for the reused safety analysis asset.

Fig. 3. Derived SFT from the PL-SFTA for a specific spacecraft in the PAM MAS-PL

The following section evaluates the claim that the application of these three steps, which reuse the PL-SFTA to derive specific safety analysis assets for a product-line member, provides safety analyses more efficiently and at a reduced cost.
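The following Python sketch illustrates the conservative marking pass referred to in Step 2. It is an illustration only, not PLFaultCAT's actual implementation: the class and field names are hypothetical, and actual node removal (especially around AND gates) is deliberately left to the analyst, mirroring the process described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """A PL-SFTA node: either a gate (AND/OR) over children, or a leaf event
    linked to a commonality or variability requirement."""
    label: str
    gate: Optional[str] = None          # "AND", "OR", or None for a leaf event
    children: List["Node"] = field(default_factory=list)
    requirement: Optional[str] = None   # e.g. "C1" (commonality) or "V_STORM" (variability)
    is_commonality: bool = False
    prunable: bool = False              # set by the marking pass; nothing is auto-deleted

def mark_prunable(node: Node, selected_variabilities: set) -> bool:
    """Depth-first marking: a subtree is prunable only if every leaf in it is tied
    to a variability that the new product-line member did not select. Leaves tied
    to commonalities (or to selected variabilities) keep the whole subtree."""
    if not node.children:  # leaf event
        node.prunable = (not node.is_commonality
                         and node.requirement is not None
                         and node.requirement not in selected_variabilities)
        return node.prunable
    # A gate is marked prunable only if all of its subtrees are prunable; the
    # AND/OR distinction is deliberately ignored here and left to the analyst.
    node.prunable = all([mark_prunable(c, selected_variabilities) for c in node.children])
    return node.prunable

# Example: a hazard whose causes mix a commonality leaf and a variability leaf.
hazard = Node("Hazard", gate="OR", children=[
    Node("Navigation fault", requirement="C1", is_commonality=True),
    Node("Solar storm monitor fails", requirement="V_STORM"),
])
mark_prunable(hazard, selected_variabilities={"V_SPECTROMETER"})
print([(c.label, c.prunable) for c in hazard.children])
# -> [('Navigation fault', False), ('Solar storm monitor fails', True)]
```

An analyst would then review each marked subtree, taking particular care near AND gates as discussed in Step 3, before any nodes are actually removed.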
5 Evaluation of Safety Analysis Asset Reuse

The application of the PL-SFTA to the PAM case study developed four fault trees for the hazards: 1. Spacecraft to Spacecraft Collision; 2. Spacecraft to Asteroid Collision; 3. Spacecraft Solar Storm Damage; and 4. Failure to Detect Impending Solar Storm. The PL-SFTA associated 85.7% of the commonalities and 72.5% of the variabilities with at least one of the leaf nodes in the set of fault trees.

In the domain engineering phase, PLFaultCAT did not provide significant advantages over other tools in PAM beyond the additional opportunity to embed textual hazard analysis information in the fault tree. This allowed a cross-check of the information provided in the fault tree against previously derived safety requirements.

In the application engineering phase, PLFaultCAT provided significant reuse advantages by exercising the pruning method outlined in Section 4. In the case study, approximately 54% of the failure nodes in the PL-SFTA were found to be common to all 160 unique spacecraft of the PAM MAS-PL. That is, the minimum expected reuse of the PL-SFTA safety analysis asset for any given PAM spacecraft would be 54%.
Table 1. Results of the Application of PL-SFTA to the PAM Case Study

Hazard                               Total Failure   Common Failure   % Commonality   Core     PLFaultCAT
                                     Nodes           Nodes            Requirements    Reuse    Automation
Spacecraft to Asteroid Collision     82              64               88.5%           78.0%    72.2%
Spacecraft to Spacecraft Collision   84              61               63.8%           72.0%    82.1%
Spacecraft Solar Storm Damage        87              52               60.0%           59.8%    72.2%
Failure to Detect Solar Storm        91              13               6.7%            14.3%    93.8%
Table 1 provides the results for each of the hazards examined using PL-SFTA for the PAM MAS-PL case study. The "Hazard" column gives the root node hazard of a fault tree; the "Total Failure Nodes" column shows the total number of failure nodes of a fault tree after Steps 1–3 described in Section 4; the "Common Failure Nodes" column gives the number of failure nodes that are common to all product-line members of the PAM MAS-PL; "% Commonality Requirements" is the percentage of the requirements associated with the leaf nodes of a fault tree that are commonality requirements; the "Core Reuse" column is the percentage of the failure nodes that are common to all product-line members of the PAM MAS-PL; and, finally, the "PLFaultCAT Automation" column shows the percentage of the nodes that could be safely and automatically pruned from the PL-SFTA using PLFaultCAT.

Although the overall reuse of the PL-SFTA in this study is approximately 54%, in most cases the reuse potential of a fault tree in the PL-SFTA was even higher, in the 60–80% range. The only exception was the "Failure to Detect Solar Storm" fault tree, which had a 14% minimum reuse potential. The reason is that the reuse potential of a fault tree in the PL-SFTA depends largely on how its root hazard relates to the requirements. The fault trees for "Spacecraft to Spacecraft Collision", "Spacecraft to Asteroid Collision" and "Spacecraft Solar Storm Damage" all had root nodes directly related to product-line commonalities. Since each of these requirements obliges a PAM spacecraft to prevent the hazard outlined in its fault tree, all PAM spacecraft will be equipped with the functionality to prevent those hazards. As a result, a large portion of the PL-SFTA could be reused regardless of the specific configuration of the spacecraft. However, for a hazard that stems from a product-line variability, such as the "Failure to Detect Solar Storm" hazard, the reuse potential is much lower, since the failure of the product-line variability to mitigate the hazard is found in only a subset of the product line's members.

This case study also found that, of the failure nodes that could be safely pruned for a new product-line member, PLFaultCAT was able to perform a minimum of 72% of the trimming automatically without losing necessary information; thus, 28% of the work was left to be done manually. This metric reflects the effort saved by reuse of the PL-SFTA safety analysis assets. The amount of automated pruning PLFaultCAT achieves for a specific member is sensitive to the number of Boolean AND gates in the fault tree: as a result of the conservative pruning approach, PLFaultCAT will not automatically remove AND gates, as a safety precaution, so it does more automated pruning for fault trees with fewer AND gates. Despite this, in the PAM case study the ratio of automated to manual effort was at least 3:1.
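The two reuse metrics above are simple ratios. The following snippet merely restates them for illustration (the function names are not from PLFaultCAT) and checks one row of Table 1.

```python
def core_reuse(common_failure_nodes: int, total_failure_nodes: int) -> float:
    """Fraction of a fault tree's failure nodes shared by every product-line member."""
    return common_failure_nodes / total_failure_nodes

def automation_ratio(auto_pruned_nodes: int, safely_prunable_nodes: int) -> float:
    """Fraction of the safely prunable nodes that the tool marked automatically."""
    return auto_pruned_nodes / safely_prunable_nodes

# Spacecraft to Asteroid Collision row of Table 1:
print(f"{core_reuse(64, 82):.1%}")  # 78.0%
```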
168
J. Dehlinger and R.R. Lutz
These results compare favorably to those of an initial case study we performed in [6] on the Floating Weather Station (FWS) product line [13]. Unlike the study presented here, the FWS case study was a smaller, traditional SPL (i.e., not agent-based). In the FWS study, we found that a smaller portion of the PL-SFTA nodes, 45%, was common to all products of the product line. However, like the PAM case study, the FWS study found that PLFaultCAT was able to automatically prune 70% of the nodes that could safely be pruned. The difference in the percentage of common failure nodes in the two studies (45% for the FWS; 54% for PAM) is likely due to the difference in the types of applications in the case studies. That is, the results reported in the FWS study reflect the application of PL-SFTA to a single fault tree for a case study consisting of fewer than 20 requirements. More importantly however, is that the product-line members of the FWS did not share the same safety concerns as they did in the PAM study. Shared safety concerns seem to result in more reuse.
6 Discussion and Implications The agent characteristics of the PAM case study as well as the types of variabilities that were present had a significant impact on our results. The PAM spacecraft will have the responsibility for protecting and healing itself from the possible risks of space exploration. For this reason, each spacecraft is to be equipped with the behavior to protect itself from the types of hazards modeled in the PL-SFTA. Further, the variabilities of the PAM MAS-PL reflected the differing types of scientific investigation possible on the spacecraft and had only a minor impact on the leaf nodes. The implication of this result is that a PL-SFTA may best suit a MAS-PL rather than a traditional SPL since the agents of a MAS-PL will typically also include selfprotecting and self-healing characteristics as commonalities and may have variabilities that are less likely to be safety-critical. For traditional SPLs that have few variabilities that will impact the safety of a system, a PL-SFTA can be applied to achieve comparable results found in this study. For traditional SPLs with a large number of variabilities that can impact the safety of a system, such as the FWS, the reuse of the PL-SFTA safety analysis assets is far more efficient, especially for large SPLs, than to serially construct SFTAs for each of the desired product-line members. A concern for performing safety analysis on critical SPLs is whether the technique is scalable as the SPL grows more complex by incorporating more variabilities and product-line members. From the experience of applying the PL-SFTA to the PAM case study, it appears that our method and tool will scale adequately. This is because most of the added complexity in a large SPL lies in the domain engineering phase when the PL-SFTA is constructed. In [4], we provide a structured process to construct the SFMECA for a MAS-PL from the requirement specifications of the Gaia-PL methodology [2] [3]. Since the construction of the PL-SFTA relies heavily on the aid of a SFMECA, the scalability is at least as robust as that of the SFMECA.
7 Concluding Remarks This paper described and evaluated the structured reuse of SFTA documents for a product line of collaborating spacecraft developed as a multi-agent system. Results
Evaluating the Reusability of Product-Line Software Fault Tree Analysis
169
showed that reuse of the PL-SFTA assets across this safety-critical product line reduced effort over serial construction of SFTAs for each spacecraft, with 54% of the failure nodes common to the 160 unique spacecraft. The implication of this for other safety-critical product lines is that SPLs built as multi-agent systems, which often incorporate adaptive and self-healing behaviors as commonalities, fit especially well with PL-SFTA reuse. More generally, the PL-SFTA evaluation recorded advantages in expanded reuse of safety-analysis assets across a safety-critical SPL. Acknowledgements. This research was supported by the National Science Foundation under grants 0204139, 0205588 and 0541163.
References [1] Clements, P., Northrop, L.: Software Product Lines. Addison-Wesley, Boston (2002) [2] Dehlinger, J.: Incorporating Product-Line Engineering Techniques into Agent-Oriented Software Engineering for Efficiently Building Safety Critical Multi-Agent Systems, Ph.D. Thesis. Iowa State University (2007) [3] Dehlinger, J., Lutz, R.R.: A Product-Line Approach to Promote Asset Reuse in MultiAgent Systems. In: Garcia, A., Choren, R., Lucena, C., Giorgini, P., Holvoet, T., Romanovsky, A. (eds.) SELMAS 2005. LNCS, vol. 3914, pp. 161–178. Springer, Heidelberg (2006) [4] Dehlinger, J., Lutz, R.R.: Bi-Directional Safety Analysis for Product-Line, Multi-Agent Systems. In: ACM SIGBED Review: Special Issues on Workshop Innovative Techniques for Certification of Embedded Systems, vol. 3(4) (2006) [5] Dehlinger, J., Humphrey, M., Padmanabahn, P., Lutz, R.R.: Decimal and PLFaultCAT: From Product-Line Requirements to Product-Line Member Software Fault Trees. In: 29th International Conference on Software Engineering Companion, Minneapolis, MN, pp. 49–50 (2007) [6] Dehlinger, J., Lutz, R.R.: PLFaultCAT: A Product-Line Software Fault Tree Analysis Tool. Automated Software Engineering Journal 13(1), 169–193 (2006) [7] Feng, Q., Lutz, R.R.: Bi-Directional Safety Analysis of Product Lines. Journal of Systems and Software 78(2), 111–127 (2005) [8] Leveson, N.G.: Safeware: System Safety and Computers. Addison-Wesley, Boston (1995) [9] Leveson, N.G., Weiss, K.A.: Making Embedded Software Reuse Practical and Safe. In: ACM SIGSOFT Software Engineering Notes, pp. 171–178 (2004) [10] Liu, J., Dehlinger, J., Lutz, R.: Safety Analysis of Software Product Lines Using StateBased Modeling. Journal of Systems and Software 80(11), 1879–1892 (2007) [11] Schwanke, R., Lutz, R.: Experience with the Architectural Design of a Modest Product Family. Journal of Software Practice and Experience 34(13), 1273–1296 (2004) [12] Sterritt, R., Rouff, C., Rash, J., Truszkowski, W., Hinchey, M.: Self-* Properties in NASA Missions. In: Proceedings International Conference on Software Engineering Research and Practice, Las Vegas, NV, pp. 66–72 (2005) [13] Weiss, D.M., Lai, C.T.R.: Software Product Line Engineering: A Family-Based Software Development Process. Addison-Wesley, Boston (1999)
Feature-Driven and Incremental Variability Generalization in Software Product Line Liwei Shen, Xin Peng, and Wenyun Zhao School of Computer Science, Fudan University, Shanghai 200433, China {061021062,pengxin,wyzhao}@fudan.edu.cn
Abstract. In the lifecycle of a software product line (SPL), incremental generalization is usually required to extend the variability of existing core assets to support the new or changed application requirements. In addition, the generalization should conform to the evolved SPL requirements which are usually represented by a feature model. In this paper, we propose a feature-driven and incremental variability generalization method based on the aspect-oriented variability implementation techniques. It addresses a set of basic scenarios where program-level JBoss-AOP based reference implementations respond to the feature-level variability generalization patterns. It also provides the corresponding guidance to compose these patterns in more complex cases. Based on the method, we present a case study and related discussions.
1 Introduction Software product line (SPL) engineering [1] promises to improve time-to-market, cost, productivity, and quality in a proactive mode, i.e. predicting and implementing all the application variations in advance through domain engineering. However, the proactive approach only suits organizations that can predict their product line requirements well into the future and have the time and resources for a long waterfall-like development cycle [2]. Most of the real SPLs are developed and maintained in a mixed mode of proactive and reactive approaches. Usually, an initial SPL platform involving one or several product variations will be developed first. After that, the SPL will evolve in an incremental and iterative mode. In each iteration, more product variations are recognized and included, and the SPL platform is extended and improved. In the SPL lifecycle, incremental generalization is usually required to extend the variability of the existing core assets to support the new or changed application requirements. The term generalization named by Thum et al. [4] indicates the case when new products are added and no existing products are removed. It is a part of the term refactoring defined by Alves et al. [3] as “a change made to the structure of a SPL in order to improve its configurability, make it easier to understand, and cheaper to modify without changing the observable behavior of its original products”. In our work, we focus on variability increments, e.g. introducing new variation point or variant. It is the generalization that usually aims to add more variations based on the existing core assets. On the other hand, the generalization should conform to the evolved SPL requirements, which are usually represented by a feature model [4, 5, 6, 12, 17]. S.H. Edwards and G. Kulczycki (Eds.): ICSR 2009, LNCS 5791, pp. 170–180, 2009. © Springer-Verlag Berlin Heidelberg 2009
Feature-Driven and Incremental Variability Generalization in Software Product Line
171
In this paper, we propose a feature-driven and incremental variability generalization method based on the aspect-oriented variability implementation techniques. We identify a set of basic scenarios which address the generalization patterns on the feature model, representing the requirement-level variability evolution, and provide program-level reference implementation based on JBoss-AOP for each of the pattern. Then, we can obtain incremental generalization guidance for more complex variability evolution by composing the basic feature model generalization patterns and deriving the integrated reference implementations. The paper is structured as follows. Section 2 presents the background of variability generalization in the SPL. Section 3 introduces the basic generalization scenarios and their composition guidance. Section 4 demonstrates the method with a case study. Later in Section 5 and Section 6 we present discussions and related work respectively. Finally we conclude the paper and plan the future work in Section 7.
2 Background: Variability Generalization in SPL In our method, we use the standard feature model as shown in Figure 1(a) to capture the feature-level variability evolutions. The three types of variation points are optional, alternative and OR features. On the program level, we employ JBoss-AOP as the reference implementation technique for the variations, since aspect-oriented programming (AOP) [7] has been proven an efficient way for the variability implementation [16]. Moreover, the obliviousness characteristic of AOP [8], which implies that the developer of the initial core assets need not to be aware of and prepare for the future variability evolutions, supports stepwise variability increment without intruding the existing programs. JBoss-AOP [9, 10] is one of the most popular tools in the AOP community. In JBoss-AOP, an aspect is defined by an interceptor and a pointcut. The interceptor is the same as the advice and it is programmed as a Java class (implements org.jboss.aop.advice.Interceptor). The pointcut is defined in XML (jboss-aop.xml) and related interceptors are bound to it. The advantage of applying JBoss-AOP as the implementation technique will be discussed in section 5.1. Feature tangling and scattering complicate the traceability from features in a feature model all along to the program level units, usually classes or methods in an object-oriented language. It means there may be several program units, especially methods, contributing to a single feature, while a single program unit, especially a class, may contain implementations for several features. In our paper, we assume that each feature is implemented by one or more methods in the same class or in different classes, i.e. a one-to-many relationship between features and methods. The variability implementation with JBoss-AOP can be seen in Figure 1(b). For the sake of simplicity, we assume further that one feature corresponds to one method. A mandatory feature is connected through the normal method and constructor invocations. An optional feature is implemented by weaving the method into the base program through an interceptor. The implementation for an alternative feature adopts the inheritance approach, i.e., each method corresponding to a variant should reside in a class that inherits the base class containing the base method and all of them ought to keep the same method signature. Then the feature can be implemented
172
L. Shen, X. Peng, and W. Zhao
by one of its variants through instantiating the variant-related class when customized. We call it object replacement mechanism, which is handled by an aspect. The implementation for an OR feature also follows the object replacement mechanism but it is replaced at runtime. Therefore a selecting program is necessary in the interceptor which helps to determine the variant. In addition, the interceptor codes for implementing alternative and OR features are undetermined at design time since the variant to be chosen and the variants included cannot be decided until the product derivation phase. So, these two kinds of interceptors are suggested to follow the certain implementation templates and they will be reified to support the object replacement when the SPL is customized. Feature model generalization is a transformation performed to increase the variability of a SPL. In [3], Alves et al. propose twelve types of unidirectional feature model refactorings. Their work concentrates on feature-level variability refactoring only, while we focus on feature-driven program-level variability generalization. On the other hand, in this paper we only consider the generalizations which are part of the feature refactoring types that add the variability incrementally, i.e. improve the configurability by introducing new variation points or variants.
Fig. 1. (a) Standard feature model
(b)Variability implementation with JBoss-AOP
3 Basic Generalization Scenarios and Their Composition We identify six basic generalization scenarios, each of which includes a feature generalization pattern and the corresponding reference implementation. Based on the basic scenarios, we provide guidance for scenario composition to handle the more complex cases. Furthermore, we still follow the assumption that one feature corresponds to one method in describing the basic scenarios in Section 3.1, while feature tangling and scattering are taken into consideration in Section 3.2. 3.1 Basic Generalization Scenarios The scenarios are illustrated in Figure 2, where the legends follow those in Figure 1. In particular, the squares filled with color indicate new or modified entities. Besides the graphs, the snippets of the reference implementations when deriving a product are illustrated in Figure 3. The snippets include the method body of method invoke in the interceptor class as well as the XML segment of the aspect binding in jboss-aop.xml.
Feature-Driven and Incremental Variability Generalization in Software Product Line
173
Fig. 2. Six basic generalization scenarios
3.1.1 AddOptional This scenario describes adding a new optional feature B into the feature model. In the program level, a new aspect is introduced. The pointcut is the method (ClassA.foo()) which intends to invoke the new method implementing the added feature. The interceptor thus includes the invocation to the new method which can only be executed before or after the base method. The implementation snippets when feature B is included can be seen in Figure 3(a). 3.1.2 MandatoryToOptional This scenario indicates transforming an existing mandatory feature B to an optional one. It’s another situation to implement optional features besides AddOptional. In the program level, an aspect is introduced to weave into the existing programs. The pointcut points to the method (ClassB.bar()) which corresponds to the optional feature. The interceptor then overrides the target method (not really modify it) by writing an
174
L. Shen, X. Peng, and W. Zhao
Fig. 3. AOP snippets for the atomic generalization patterns
empty method body or simply returning a nonsensical value if the method should have a return value. As a result, the aspect takes effect when the new optional feature is removed in a product, and vice versa. The implementation snippets when feature B is removed are shown in Figure 3(b). 3.1.3 SingleAddVariant This scenario describes adding a variant to an existing single feature, i.e. the feature without variants, to make it alternative. We cannot implement the new alternative feature using the method in Section 2 where there is a super-class acting as a placeholder at the beginning. Explaining the reason using the graph, we can see that the invocation from ClassA to ClassB is fixed, so it’s not practical to alter ClassA to invoke a class created as a super-class and then make ClassB inherit the new class. Therefore, we adopt a new method based on the object replacement mechanism. ClassC is firstly defined as a subclass of ClassB where the method bar() should be overridden. Secondly an aspect is created. Its pointcut is ClassB’s constructor while its interceptor instantiates ClassC to get an object taking the place of ClassB. Once feature C is selected, the aspect should be bound in jboss-aop.xml. Contrarily, the aspect won’t be woven if the original feature B is selected. The interceptor code and xml segments when feature C is bound are shown in Figure 3(c). 3.1.4 AlternativeAddVariant This scenario is captured when a new variant is added to an existing alternative feature. There are two approaches to implement an alternative feature: SingleAddVariant (it becomes alternative after generalization) and the idea from section 2 (it is alternative in the beginning). They are different in the pointcut definition and are illustrated in the second and third rows respectively. However, the underlying generalization method for the two situations is unique, both following the object replacement mechanism. We define the new class (ClassD) as a subclass of the one (ClassX and ClassB in two styles respectively) invoked by ClassA. The pointcut remains the same, pointing to the constructor. The interceptor codes are undetermined since the variant choice decision is unknown at design time. However, the generalization
Feature-Driven and Incremental Variability Generalization in Software Product Line
175
pattern provides the implementation template for the interceptor, which is reified when the SPL is customized. The interceptor template for the reference implementation can be concluded from Figure 3(c). The intercepted class and the chosen variant are the variable points in the template. 3.1.5 AlternativeToOR This scenario transforms an existing alternative feature to an OR feature, i.e. more variants can be included in a product and the choice is made at runtime through a selecting program. Similar with AlternativeAddVariant, two approaches can implement the alternative feature. However, no matter which approach is adopted, the only thing is to modify the interceptor following a new template which contains the selecting logic for the future included variants. There is a set of if clauses each of which has a condition and a corresponding object instantiation sentence. Sometimes the application requires the human interaction to do the decision, so the interceptor may have to provide a user interface to acquire the human’s choice. The reference implementation snippets after SPL customization are shown in Figure 3(d) and the template can be concluded from them. 3.1.6 ORAddVariant This scenario indicates adding a new variant to an existing OR feature. The reference implementation containing the pointcut definition and the interceptor template keeps the same with AlternativeToOR. If the new variant is to be included in the derived product, we just need to ensure that the selecting program in the interceptor embodies the if clause involving the new variant. 3.2 Guidance for the Composition of Basic Generalization Scenarios In reality work, we will come up against situations that cannot be solved based on a single basic generalization scenario but their integration. This is caused by the continuous SPL variability evolution as well as the feature tangling and scattering. In this section, we will put forward an incremental generalization guidance for more complex generalization scenario cases by composing the basic feature model generalization patterns in order and deriving the integrated reference implementation. The phases are as following: (1) Identify the set of ordered basic feature model generalization patterns for each variation point. Under the circumstances of a complicated generalization scenario, the variability evolution may emerge in the different points, while in a certain point the evolution continues. Hitherto, it is an ad hoc phase for the developers to address them. (2) For each variation point, adopt the corresponding reference implementations. In this phase, the pointcuts as well as the interceptors are defined in the program level. One feature model generalization may cause the implementation evolution in several parts due to the feature tangling. Furthermore, the implementations for the last three scenarios involve the same pointcut (the constructor) and the same interceptor, whereas the interceptor implementation is of different templates. In this situation, the interceptor should follow the last template within the set and the pointcut is preserved. (3) Find out the conflicts in reference implementations. Conflicts may appear when the reference implementations of different variant points involve the same pointcut
176
L. Shen, X. Peng, and W. Zhao
and they cannot be arranged in order. The typical situation is that two features with alternative or OR variability map to the methods in the same class, then the object replacement will bring mistakes when the two interceptors are both woven, i.e., the object can only be replaced by the later bound one. Under such circumstances, the implementations cannot be carried out and the developers should be informed to do the additional undesirable work such as modifying the base programs. (4) Customize the variability. When there is no conflict in the reference implementations, the binding states of the feature model variability can be determined, e.g. the optional feature is included or not, which variant is chosen for the alternative feature, etc. In the program level, it indicates the decision whether an interceptor will be bound to a pointcut (first three scenarios), or determine the implementation codes of an interceptor (last three scenarios).
4 Case Study In this section, a simplified case study is presented based on the composition guidance. The typical generalization case is described in the following sentences. Before generalization: an initial library management system provided the free service of borrowing books for the campus students. After generalization: the system was applied for the social use. Public readers were charged and they had to refill money to their accounts. In the beginning, the system received cash only. Along with the diversity in payment manner, the library system allowed readers to refill by credit card or check. Thus readers could choose one of the three kinds of payment when refilling. The generalization scenario is explicit then the feature model generalizations are easy to identify. Figure 4(a) represents the generalization related part in the feature model. We can see that a new optional feature with three variants has been added to the domain. The whole transformation is combined with a set of basic feature model generalization patterns in the following path: AddOptional + SingleAddVariant + AlternativeAddVariant + AlternativeToOR. In the program level illustrated in Figure 4(b), a new class CashRefill and a new interceptor NeedRefill_Interceptor are introduced first according to AddOptional. The interceptor not only helps to connect AccountMngUI and CashRefill, but also adds a widget in the UI that triggers the method invocation (codes in Figure 5a). In the process of the next three reference implementations, another two classes are introduced and both inherit CashRefill. In addition, the interceptor RefillType_Interceptor is initially introduced by SingleAddVariant and at last its implementation code follows the template in AlternativeToOR. In particular, the interceptor includes an inner class providing the UI for selecting. Since the simple traceability in the example, there is no conflict found in the implementations. The variation points are going to be customized. We assume that the feature AccountRefill is bound in a product, and the three variants are all included. Thus the codes of NeedRefill_Interceptor and RefillType_Interceptor as well as the aspect binding in jboss-aop.xml are illustrated in Figure 5a/b/c respectively.
Feature-Driven and Incremental Variability Generalization in Software Product Line
177
Fig. 4. (a) Feature model generalizations (b) reference implementation
Fig. 5. Interceptor codes and XML segments
5 Discussion 5.1 JBoss-AOP versus AspectJ The adoption of JBoss-AOP as the reference implementation technique rather than AspectJ [11], which is another widely used AOP tool, includes three reasons. Firstly, aspects in AspectJ are software entities that define pointcuts and advice codes which have their own syntaxes and keywords. Contrarily, the interceptor (advice) of JBossAOP is written in regular Java classes which are easily coded and reused. Secondarily, the pointcuts of JBoss-AOP are defined and centralized in an XML file. Operating on this unique file can be regarded as a central configuration mechanism for the SPL. However, the pointcuts of AspectJ are defined in the aspects and they are dispersed in the programs. Thus they cannot be managed efficiently. Thirdly, JBoss-AOP supports the dynamic AOP while AspectJ doesn’t. It allows modifying the bindings (the removal of aspects and the weaving of new aspects) at runtime without recompiling. It
178
L. Shen, X. Peng, and W. Zhao
will be a crucial character in our later work that the SPL generalization can be automatically performed at runtime. 5.2 Limitations Limitation of the work can be found in the program-level reference implementation based on JBoss-AOP. Firstly, the last three scenarios cannot handle the static method invocation. Since the classes embodying the static methods don’t need to be instantiated, the object replacement cannot take effect. Secondly, we assume that a class which embodies the methods representing a variant should be either a super-class or a subclass since it’s the premise for object replacement at runtime. Thus, the variantrelated classes have to be predefined with inheritance when the variants are introduced into the domain. Thirdly, we have noticed that there are two implementation approaches in the last three scenarios. It will make the generalization difficult to perform, especially when it grows larger. Hitherto, we have taken note to them and plan to work out solutions in the future work.
6 Related Work Alves et al have explored SPL adoption strategies at the feature model level [3] and at the implementation level [13]. Our work is inspired from [3] while the difference lies in the fact that we only focus on the generalizations which add variability to a SPL incrementally. In [13], the authors propose a method and a tool for extracting a product line and evolving it in the implementation level. The work includes a set of simple programming laws that adopt AspectJ to handle with the variations. Compared with their mechanism, we utilize JBoss-AOP to implement variability. On the other hand, the reference implementations are associated with the feature model generalization patterns in our paper. However, their papers are independent. A feature-oriented refactoring (FOR) approach [14, 15] is proposed to decompose a program into a set of features, which are considered as the increments in program functionality. The purpose of their work is to specify the relationships between features and their implementing modules, especially to describe the SPL variability as program refinements added to the base program. Our work applies generalization to address the configurability increments in a SPL, which is different from the idea of FOR. Howsoever, their formal theory on the program level is believed to be complementary to our future work on automatic SPL generalization.
7 Conclusion and Future Work Software product lines cannot be stable all the time, thus incremental generalization is indispensable to support new or changed application requirements. In this paper, we propose a feature-driven and incremental variability generalization method. In it, a set of basic generalization scenarios are introduced which contain the feature model generalization patterns and program-level reference implementations based on JBossAOP. Composition guidance is also presented to handle more complex generalization cases. In addition, a case study and some discussions are posed.
Feature-Driven and Incremental Variability Generalization in Software Product Line
179
It’s our elementary research so we have concluded our future work to improve the method. Firstly, we will try to solve the limitations mentioned in section 5.2. Secondly, we will improve the flexibility and scalability of our method to handle with other software artefacts like components under a comprehensive traceability support. Thirdly, an automatic SPL generalization is desired to replace the ad hoc approach. A formal basis for the variability generalization is critical. In addition, a supporting tool is expected that it can support the feature modeling for a SPL and perform the program-level evolution driven by the generalizations in feature model automatically. Acknowledgments. This work is supported by National Natural Science Foundation of China under Grant No. 60703092, National High Technology Development 863 Program of China under Grant No. 2007AA01Z125, and Shanghai Leading Academic Discipline Project under Grant No. B114.
References 1. Clements, P., Northrop, L.: Software Product Lines: Practices and Patterns. AddisonWesley, Reading (2002) 2. Krueger, C.: Eliminating the Adoption Barrier. IEEE Software 19, 29–31 (2002) 3. Alves, V., Gheyi, R., Massoni, T.: Refactoring Product Lines. In: Proceedings of the 5th International Conference on Generative Programming and Component Engineering (GPCE 2006), pp. 201–210 (2006) 4. Thum, T., Batory, D., Kastner, C.: Reasoning about Edits to Feature Models. In: Proceedings of the 31th International Conference on Software Engineering, ICSE 2009 (2009) 5. Kang, K.C., Cohen, S., Hess, J., Nowak, W., Peterson, S.: Feature-oriented domain analysis (FODA) feasibility study. Technical Report CMU/SEI-90-TR-21, Software Engineering Institute. Carnegie Mellon University, Pittsburgh, PA (1990) 6. Kang, D.C., Kim, S., Lee, J., Kim, K., Kim, G.J., Shin, E.: FORM: A feature-oriented reuse method with domain-specific architecture. Annals of Software Engineering 5, 143–168 (1998) 7. Kiczales, G., Lamping, J., Mendhekar, A., Maeda, C., Lopes, C., Loingtier, J.M., Irwin, J.: Aspect-Oriented Programming. In: Aksit, M., Matsuoka, S. (eds.) ECOOP 1997. LNCS, vol. 1241, pp. 220–242. Springer, Heidelberg (1997) 8. Filman, R., Friedman, D.: Aspect-oriented programming is quantification and Obliviousness. In: Proceedings of the Workshop on Advanced Separation of Concerns, in conjunction with OOPSLA 2000 (2000) 9. Khan, K.: JBoss-AOP (2008), http://www.jboss.org/jbossaop/ 10. Pawlak, R., Seinturier, L., Retaille, J.: Foundations of AOP for J2EE Development. Apress (2005) 11. Kiczales, G., Hilsdale, E., Hugunin, J., Kersten, M., Palm, J., Griswold, W.: Getting Started with AspectJ. Communications of the ACM 44, 59–65 (2001) 12. Batory, D.: Feature models, grammars, and propositional formulas. In: Obbink, H., Pohl, K. (eds.) SPLC 2005. LNCS, vol. 3714, pp. 7–20. Springer, Heidelberg (2005) 13. Alves, V., Matos, P., Cole, L., Borba, P., Ramalho, G.: Extracting and evolving mobile games product lines. In: Obbink, H., Pohl, K. (eds.) SPLC 2005. LNCS, vol. 3714, pp. 70– 81. Springer, Heidelberg (2005)
180
L. Shen, X. Peng, and W. Zhao
14. Liu, J., Batory, D., Lengauer, C.: Feature Oriented Refactoring of Legacy Applications. In: Proceedings of the 28th International Conference on Software Engineering (ICSE 2006), pp. 112–121 (2006) 15. Trujillo, S., Batory, D., Diaz, O.: Feature Refactoring a Multi-Representation Program into a Product Line. In: Proceedings of the 5th International Conference on Generative Programming and Component Engineering (GPCE 2006), pp. 191–200 (2006) 16. Peng, X., Shen, L.W., Zhao, W.Y.: Feature Implementation Modeling based Product Derivation in Software Product Line. In: Mei, H. (ed.) ICSR 2008. LNCS, vol. 5030, pp. 142–153. Springer, Heidelberg (2008) 17. Peng, X., Zhao, W.Y., Xue, Y.J., Wu, Y.J.: Ontology-Based Feature Modeling and Application-Oriented Tailoring. In: Proceedings of the 8th International Conference on Software Reuse (ICSR 2006), pp. 87–100 (2006)
Identifying Issues and Concerns in Software Reuse in Software Product Lines Meena Jha1 and Liam O’Brien2 1
CQUniversity and the University of New South Wales, Sydney, NSW 2052, Australia [email protected] 2 National ICT Australia Limited and the Australian National University, Canberra, ACT 2601, Australia [email protected]
Abstract. One of the reasons for introducing software product lines (SPL) is the reduction of costs through reusing common assets for different products. Developing assets to be reused in different products is often not easy. Increasing complexity due to the multitude of different functions and their interactions as well as a rising number of different product variants are just some of the challenges that must be faced when reusing software and other assets. In an attempt to understand the obstacles to implementing software reuse in SPL we have conducted a survey to investigate how software reuse is adopted in SPL so as to provide the necessary degree of support for engineering software product line applications and to identify some of the issues and concerns in software reuse. This survey also gathers information from SPL practitioners on what influences the selection of software to reuse within a software product line. This paper reports the results of that survey. Keywords: Software Reuse, Software Product Line, Domain Engineering.
1 Introduction The study of software product line addresses the issues of engineering software system families, or collections of similar systems [2, 5, 6]. The objective of a software product line is to reduce the overall engineering effort required to produce a collection of similar systems by capitalizing on the commonality among the systems and by formally managing the variation among the systems. This is a classic software reuse problem [8]. Software reuse has been practiced since computer programming began. Reuse as a distinct field of study in software engineering, however is often traced to McIlroy’s paper which proposed a software industry on reusable components [1]. The primary focus of software product line research has been on a number of issues such as domain analysis and modelling, architecture modelling, process definition, etc [3]. Research in software product lines [9, 10, 11] shows that there are success stories, but there are no strong rules that can be derived from these success stories about how software is reused. Systematic reuse is one of the goals of software product lines, meaning that components must not only be copied but actually shared among several S.H. Edwards and G. Kulczycki (Eds.): ICSR 2009, LNCS 5791, pp. 181–190, 2009. © Springer-Verlag Berlin Heidelberg 2009
182
M. Jha and L. O’Brien
subprojects. Also a software product will not only consist of shared components for the whole software product line, but some of the components will be specific to a single product. This means that the ability to reuse software depends on the commonality and variability requirements. Software Product Line practitioners come across many software reuse problems. Up to now the software reuse issues and concerns have not been surveyed, compared and documented in a systematic way. We conducted a survey to determine if domain knowledge is one of the keys to software reuse in SPL, to identify areas of concern for software reuse within SPL and to identify if the reliability of software assets/components is an issue for reuse in SPLs. This survey builds upon our previous survey on software reuse in the conventional software community [4] to identify issues and concerns in software reuse. The remainder of the paper is structured as follows: Section 2 discusses the methodology, activities and milestones adopted in undertaking the survey. Section 3 gives an overview of the questions presented to software engineers, developers, managers and researchers working in SPLs. The key results and analysis are described in detail in Section 4. Section 5 concludes the paper and outlines potential future work derived from the survey.
2 Methodology The survey was email based and carried out in mid 2008. The development and administration of the survey questionnaire and the analysis of the results were conducted in five different activities. These activities are as follows: • Step 1: Gather good candidate questions; • Step 2: Select questions that were of most interest; • Step 3: Determine how the questions would be asked and in which order; • Step 4: Select the people from SPL community whom we would like to contribute to our survey and send them the survey; • Step 5: Analyze the results by putting all responses together to point out areas where the respondents identified issues and concerns.
3 Presentation A total of 51 questions were put together and organized into five subsections: general type (2 questions), reuse measurement type in SPL (16 questions), reuse technical issues in SPL (10 questions), testing and the reliability of reused software in SPL (10 questions), and development environment in SPL for reuse (13 questions). The general type questions were asked to capture the educational and experience level of the respondents. Reuse related questions included, for example, identifying the level of reuse being done in SPL, measuring the reuse level, testing of reused code, development environment used, etc. The survey was targeted specifically at the software product line community and people active in software product line efforts.
Identifying Issues and Concerns in Software Reuse in Software Product Lines
183
4 Result and Analysis 4.1 Section 1: General Questions Of the 29 survey respondents 17% had less then 5 years experience in SPL, 37% had experience of 5-10 years, 37% had experience of 10-20 years and only 6% had more than 20 years experience. The survey result shows the years of experiences for the majority of the respondents ranges from 5 to 20 years. 65% of the respondents were researchers & managers in SPL, 20% were product line architects, 6% each were core asset architects, and 9% were R&D project leaders. Simply put these numbers could mean that we have a more mature population in the sense of working experience in an SPL context. This could indicate that we have gathered a good sample population that could answer our software reuse questions, thus reflecting the average software product line engineer and developer. 4.2 Section 2: Software Reuse Management and Measurement Before an organization fully commits itself to actually convert into a functioning product line organization, the advantages, disadvantages and factors influencing reuse in software product line should be addressed. This section of the survey has captured answers about reuse management and measurement, its advantages, disadvantages and factors influencing reuse in the software product line community. The key benefits of reuse in SPL have been widely accepted and our survey result reflects the same. 100% of the respondents believe that reuse in SPL will achieve increased quality, planned productivity, capture of domain knowledge, and cost reduction. The advantage thus is that reuse is considered to be of strategic importance in organizations adopting SPL practices and tools. Our respondents believe that SPL necessitates effective strategic planning and product line road mapping; market analysis; change from bespoke customer relationships and projects toward market, product and service orientation; effective requirements, scope, and release management of products and services. Moreover, the scope of reuse is broadened from software (code) reuse to the reuse of all domain artifacts. Disadvantages of Software Reuse in SPLs However, the disadvantages associated with reuse in SPL were maintenance cost and start up cost. The major disadvantages respondents have highlighted are complexity – the “gravity” of software engineering: reuse can add complexity by creating dependencies between previously autonomous organizational units. Some of the problems with the dependencies identified by one of the respondents are web of dependencies, coordination cost, cost of offering integration, process and tool divergence. Is Software Reuse domain based? Respondents believe that software reuse “is not sufficiently domain based”. Some of the comments from the respondents include “Documentation may be missing, the trace between requirements and design artifacts may be missing, and the understanding of the interplay between different functions may be missing”, “There is no real way of translating practice into theory”, “There can be widespread reusable software
184
M. Jha and L. O’Brien
with inappropriate design”, “People reuse software without solving architecture mismatches”, “There is too much reusable software that is large grained such as in Service Oriented Architecture (SOA). Loosely coupled software modules are hard to be developed due to the difficulty to define the right interfaces, since SOA usually erodes other quality attributes such as performance and scalability”, and “Software developers are interested in developing software for-reuse but not developing a product with reusable software components”. Reuse might also hinder new ideas and the making of innovations. Balance between innovation and reuse should be determined by the company’s strategy. The opinion that ‘copy and paste’ is THE reuse strategy should be removed from SPL. Respondents are of the opinion that to maintain asset health, regular investments should be done to keep code healthy and keep the number of variations manageable. The Impact of Product Line Engineering Participants were also asked if they feel current software product line engineering practice is influencing reuse. Responses to this question (which are negative) should change the way we do reuse currently. 87% of the respondents feel reuse education will definitely help them learn more about recent reuse technology in making reuse possible in their organization. Also companies need to have development policies that mandate reuse. The companies that are serious about SPL must educate all levels (technical management, higher level management, and developers) about what SPL is, how the company plans to achieve it and what their role will be in making the transition successful. However, more then half of the respondents (55%) feel that work may be increased when they are reusing other’s code. 20% feel that reusing other’s code will not increase their work. However, 26% believe that it will definitely increase their work. 100% of the respondents believe that there is increased recognition of reuse, but software reuse will not be achieved with just education. Reuse Planning The respondents felt that SPL reuse has to be planned in advance. In their view, the central factor is to anticipate common and variable artifacts in the domain of the product line to provide useful reusable artifacts. The alignment of four views of SPL (business, architecture, process and organization) is fundamental for achieving benefits from SPL based product development. According to the respondents reuse is part of SPL, and its facilitators are: high expertise in domain knowledge, technical skills in SPL techniques, tool support, the number of software units produced in an SPL, motivated personnel, and high management commitment and support for integrating domain and technical expertise. Long term strategies and commitments are required. Standards should be established on how to communicate all properties of reusable components. Respondents said that the reuse policy should be tracked and managed by engineering managers and/or quality managers and/or configuration managers; include planned variation and variation mechanisms; include mechanisms to deal with change control conflicts on assets; include rotation of product team members among different products, and/or core asset teams. Respondents agreed that SPL is very challenging to institutionalize as long as the costs of implementing an organization-wide SPL
Identifying Issues and Concerns in Software Reuse in Software Product Lines
185
program are perceived to outweigh the benefits. The most important factor is to achieve the break-even point between costs and benefits as quickly as possible. The Influence of Software Engineering Practices 85% of the respondents felt that software engineering practice influences reuse. Using core assets, the configuration management philosophies, the amount and type of documentation, requirements elicitation, and many other software engineering practices all influence reuse. 90% of the respondents felt the increased recognition of reuse as the software product line success stories have an impact on companies. It was felt that the use of a common software process would promote reuse in a single organization. There is no real benefit in having a common process across companies but there is some benefit in having a common process across products within an SPL. A common process will help if there is an equal emphasis on the technology and the product. A Reuse Repository The responses to the use of a reuse repository improving code reuse were mixed. A clear idea behind the reuse repository in SPL could not be gathered. Several of the respondents commented that a reuse repository would improve code reuse as long as there are proper methods and practices around its use. The fundamental premise of software product lines is that code repositories without the context provided by a software product line do not work. Influence of Organization or Project Size on Reuse The majority of the respondents believe that a company’s divisions or project’s size is not predictive of organizational reuse. People are of the opinion that the size has nothing to do with reuse. This is very contradictory with the theory already established where the complexity of the reuse increases with the size of the project [10]. People are also of the opinion that product line approaches must be tailored to organizations. Size is one of the customization factors; but far from being the only one. A rich (large) domain has potential for reuse but usually has more challenges to coordinate. A small company developing sophisticated and customized products does not benefit from reuse. But a small company that is developing products which have 50-80% of the same components, 20-50% for customized software, reuse and SPL are beneficial. Products targeted into mass-markets have high potential for reusable software components. So, business and target markets are more important than the size of organization. Reuse and Software Quality The majority of the respondents felt that software quality does not inhibit reuse. Most code in each product is already of high quality as it is reused across multiple products. Quality concerns should actually support SPLs. But several of the respondents felt that conflicts on core assets (usually arising because of different stakeholders’ quality concerns) can limit the modification and/or use of core assets within products. Especially for mass marketed, deeply embedded systems where memory and CPU power are very limited due to cost constraints, reuse (and even modular software design) is inhibited. Also for some real-time systems with specific performance requirements reuse might be inhibited. So performance in terms of efficiency and response time are
186
M. Jha and L. O’Brien
most inhibitive for reuse. Of course all the domain artifacts that are explicitly designed (typically in an SPL context) to be reused must be tested extensively to uncover both functional and non-functional flaws. In an environment where reuse is not explicitly and strategically planned, quality problems may seriously hamper reuse. Domain Knowledge and Software Reuse It has been shown that domain knowledge is still the key to reuse of software [10]. All of our respondents also agreed on this question. Without domain knowledge common and variable artifacts cannot be anticipated. Actually in SPL, the domain knowledge is the thing that is reused, not code. However, not every application engineer must have that knowledge if the reuse infrastructure supports the production process adequately. Several of the respondents believe that domain knowledge is necessary but not sufficient. Organizations that have successfully adopted an SPL approach tend to report the solutions to their problems and underplay what they were already doing well. It has been seen that SPL efforts generated from management, architects, and process engineers are ultimately necessary. Domain knowledge is a significant factor, but not the main key. Management commitment and process discipline to follow through on a reuse agenda that’s driven by business goals are more significant. 4.3 Section 3: Software Reuse Technical Aspects in SPL Many respondents believe that software engineers need to change their programming language to promote reuse. For instance, classes and components are key aspects for reuse. The use of high-level programming languages makes it easier to develop reusable software and use it. However, opinion on the importance of the choice of programming language is divided. The first major reuse store were math libraries written in FORTRAN. The large volume of literature on reuse CASE tools and their growing market show that many organizations regard CASE tools as a way to improve reuse. To study this question, participants were asked whether they agreed with the statement, “CASE tools have promoted reuse across projects in their organization”. The data shows that 50% of the respondents generally feel that CASE tools have not promoted reuse across projects in their organization; while 50% agree that they do. Model Driven Tools and environments are the way to go and less so with the traditional CASE tools. In Model-Drive Development (MDD) and Model-Driven Architecture (MDA) reuse happens at the model level. CASE tools can help reuse design artifacts. Tools with round trip reengineering can help reverse engineer code to better understand the properties of software to enable reuse. We conclude that CASE tools are not currently very effective in promoting reuse. 50% of the respondents said the SPL community is using CASE tools and the other 50% said they are not using them. However approaches such as SOA and some product line tools promote reuse [10]. When asked if given an opportunity to build from scratch or reuse respondents believe that in a product line organization reuse is built in and is clearly not an option. Arbitrarily reusing assets that have not been prepared and proactively planned to be reused for a given task is of course problematic as we know from ad hoc reuse experiences. Respondents are of the opinion that if they find reusable software that fits to the architecture they would prefer to reuse rather than build from the scratch. So, the
Identifying Issues and Concerns in Software Reuse in Software Product Lines
187
architecture is the main key issue that facilitates reuse. However one very interesting response was received: “Reuse can be done effectively if we know (1) which domain artifacts are available and (2) we can trust the domain artefacts to do the things we want them to do, why not reuse them? The issue then is do we know and how can we know these issues.” This is a very challenging question posed to SPL community. All the participants are of the opinion that domain knowledge is the key to reuse of software. One cannot decide what software to build as reusable software without understanding the domain and identifying common functionality/features that can be developed as reusable software. According to our respondents the main advantages of the SPL approach are: • The product line’s wide engineering vision can be shared among the projects easily • Development knowledge and corporate expertise can be utilized efficiently across projects, and • Assets can be managed systematically. The SPL approach often requires a large upfront capital investment to create an organizational structure dedicated to implementing a reuse program and it takes time to see a return on investment [9]. 4.4 Section 4: Testing and the Reliability of Reused Software In spite of the enthusiasm of the components community, finding in a library of components, the precise one that will solve the problem at hand is nearly an impossible task. The problem is two fold. Firstly, we cannot find the perfect component in the library of reusable parts. Secondly even if we find the perfect component, is the component reliable enough to use. Having a documented architecture of the software can improve understanding of a system and show how system reliability depends on the components reliability and its interfaces. The behavior of the software with respect to the manner in which different modules of the software interact is defined through the software architecture. We considered several responses in our survey in investigating this issue. Participants were asked to rate their agreement with the statement, “Software developed elsewhere is reliable” and were also asked, “Do you test a component in any way before you reuse it?” Our survey has shown a strong correlation between these variables suggesting that quality concerns are very much related to amount of external reuse. Almost 63% of our respondents stated that they had some element of reuse in their code but only 40% of these claimed that they tested the reused code in any way. Usually people spend more time testing product specific components. And usually the problems that are discovered are related to reusable components that were not properly adapted for a specific product. When the participants were asked if they “understand system and component reliability, their interactions and the process to identify critical components?”, the respondents said that the architecture of the system helps in the understanding of the above mentioned attributes. There are several piece of research that document some of the methods typically used (and demanded by customers) in certain domains for example in the automotive area. Examples include FMEA (Failure Mode and Effects Analysis), FTA (Fault Tolerance Analysis), and the upcoming ISO WD 26262. In
188
M. Jha and L. O’Brien
some cases scenario-based approaches like ATAM (Architecture Tradeoff Analysis Method) [5] have been used to identify critical components of a system. In most SPL teams the identification of critical component is based on experience. The understanding of the critical components is mainly derived from source code and documentation, or actual testing of the component. There is no systematic/formal process to understand system and component reliability and the components interactions. System reliability is determined by (field or internal) defect reports – component reliability is determined by how those defect reports are caused by any component. Informal architectural reviews are used to analyze systems and component interactions. The majority of the survey respondents have stated that to find reliable components an architecture review is the answer. When the participants were asked if they “test a single component, class or core component in any way?” the respondents said that single classes are typically not tested. Most of the respondents said they do not test single components by themselves. Sometimes single components are tested, but most of the time some integration with other components is already done and the integrated components are tested. Sometimes unit testing is performed on an assembly of components, as the simplest “unit” of test. Unit testing is not widespread. One of the respondents said that perhaps 10% of the components have some sort of unit test. JUnit, Window Tester, httpUnit, and Rational Test Manage are the main test frameworks used in SPL. A traditional type of structured approach is used when testing software which involves unit and integration testing. Unit testing focuses on internals of a component. Integration testing is carried out on the combined usage of components. The same testing is done at the model and code levels. Some respondents also said that they don’t do component testing as they have trust in the Eclipse process. Some of the respondents are of the view that, because a component is always developed from the requirements of a product, it must be tested to determine if it fulfils those requirements. Reusing software components may also alter time allocated in the software development phases. If a component is already available for reuse then the analysis and design phase time can be minimized. Test cases can be reused if documented properly and in sync with the current version of the core asset. The results we have collected say that between 5%-20% of the time is spent in the analysis phase, 5%-60% of the time is spent in the design phase, 0%-50% in the implementation phase and 10%-30% for the testing phase. Some of the respondents said they spend 0% of time in implementation and 50% in testing. 4.5 Section 5: Development Environments for Reuse Survey participants were asked what sort of development environment they usually use. The majority of the respondents preferred Java Eclipse, Netbeans or a simple programming IDE, e.g. JCreator. Some of the respondents use a model based approach, UML2 as a standard modelling language extended by ontology oriented design models and tools. Nowadays, Eclipse is used quite a lot as a tooling platform. A specific environment for embedded systems/microcontrollers e.g., Java projects Eclipse and for PC-based applications typically Visual Studio is used. One of the responses from a project designer was that the environment or programming language
depends on the Programmable Logic Controller (PLC) used, as every PLC brand has its own environment for the software created for that PLC. When asked if, in their opinion, the choice of framework (e.g., .NET, EJB, CORBA, etc.) affects the possibility for the software to be easily upgradeable in the long term, the responses we received were mixed. One of our respondents was not familiar with these frameworks, some did not agree, and the rest believed that the mainstream frameworks would probably be more upgradeable than others. Most of our respondents also agreed that the complexity of a component does not affect the decision to develop or reuse the component. If a component has been previously tested, it is used regardless of its complexity. However, one of our respondents believed that complexity and domain knowledge could have an effect on such decisions, but this really depends on the business strategy. We received a very positive response on the need for a well documented software architecture of the system in order to reuse code. Well documented software/system architectures are very important to support decision making about reuse and to correctly integrate the different components so as to reduce testing time. However, a few of the respondents believe that as long as domain expertise remains adequate it is not essential to have architectural details. Experts are good at storing domain knowledge. Architecture helps a lot in reuse, but this may not always be the case. This leads us to the question of what happens if the experts leave an organization. This is an important unsolved issue with legacy systems, where experts have left an organization and the documentation is out of sync [7]. All of our respondents believe that most software product lines must be maintained and must evolve over time. Core asset architect respondents believe that maintaining and evolving core assets and integrating core assets in application engineering are the biggest problems for SPL.
5 Conclusion and Future Work

In this paper, the issues and concerns of software reuse in SPL were gathered from a survey of members of the SPL community. Some of the key findings are that domain knowledge is not the only key to reuse in SPL. While many assets within the SPL can be reused, priority should be given to the product line architecture. It is not essential to have a well documented architecture for reuse in SPL, as SPL has a well defined reuse process; however, an SPL process should enforce documentation of the architecture. One issue with reuse is that variability should be described in domain requirements artefacts, and adequate traceability of the artefacts within and between the interacting domain and application requirements engineering life-cycles needs to be maintained. Also, there is no systematic/formal process to understand system and component reliability and the interactions of components. Software architecture may help identify critical components. Existing tools and techniques do not address some of the most important practical needs in software product lines. We believe this paper will be of interest to software product line practitioners as well as those organizations struggling with reuse, as they will gain the following from the paper:
• Software architecture can be used in a product line setting to support better identification, reuse, and integration of reliable components. Documentation of software architecture requires a substantial amount of effort.
• The list of issues and concerns given in this paper can be used to implement a new software reuse process in SPL.
• Current product engineering and derivation processes in SPL can be analyzed for systematic reuse of software in the SPL.

Today's software product lines can be so large and have such complex variability that they must be supported by a Knowledge-Based Software Reuse System (KBSRS); otherwise a systematic approach to software reuse may not be possible. Part of our future work is based on the development of a KBSRS that will be used to capture expertise on software reuse within organizations. We have previously conducted a similar survey to find the issues and concerns of software reuse in the conventional software engineering community. We will compare the findings of this survey with those of the previous one in order to contrast both communities and to identify lessons that can be shared across them.
References

1. McIlroy, D.: Mass Produced Software Components. In: Buxton, J.M., Naur, P., Randell, B. (eds.) Software Engineering Concepts and Techniques: Proceedings of the NATO Conferences, Petrocelli/Charter, pp. 88–98 (1969)
2. Hoffman, D.M., Weiss, D.M.: Software Fundamentals: Collected Papers by David Parnas. Addison-Wesley, Reading (2001)
3. Torkar, R., Mankefors, S.: A Survey on Testing and Reuse. In: Proceedings of the IEEE International Conference on Software-Science, Technology & Engineering (2003)
4. Jha, M., O'Brien, L., Maheshwari, P.: Identify Issues and Concerns in Software Reuse. In: Proceedings of the Second International Conference on Information Processing (ICIP 2008), Bangalore, India (2008)
5. Clements, P., Northrop, L.: Software Product Lines: Practices and Patterns. Addison-Wesley, Boston (2002)
6. Birk, A., Heller, G., et al.: Product Line Engineering: The State of the Practice. IEEE Software 20(6), 52–60 (2003)
7. Jha, M., Maheshwari, P.: Reusing Code for Modernization of Legacy Systems. In: IEEE International Workshop on Software Technology and Engineering Practice (STEP), Budapest, Hungary, September 24–25 (2005)
8. Krueger, C.W.: Software Reuse. ACM Computing Surveys 24, 131–183 (1992)
9. Weiss, D.M., Lai, C.R.: Software Product Line Engineering: A Family-Based Software Development Process. Addison-Wesley, Reading (1999)
10. Pohl, K., Böckle, G., Linden, F.v.d.: Software Product Line Engineering: Foundations, Principles, and Techniques, 1st edn. Springer, New York (2005)
11. Linden, F.v.d.: Software Product Families in Europe: The ESAPS & CAFÉ Projects. IEEE Software 13(3), 41–49 (2002)
Reuse of Architectural Knowledge in SPL Development

Pedro O. Rossel1,2, Daniel Perovich1, and María Cecilia Bastarrica1

1 CS Department, Universidad de Chile
{dperovic,cecilia}@dcc.uchile.cl
2 Dept. Ingeniería Informática, Univ. Católica de la Santísima Concepción
[email protected]
Abstract. Software Product Lines (SPL) promote reuse within an application domain in an organized fashion. Preimplemented software components are arranged according to a product line architecture (PLA). Balancing possibly conflicting quality attributes of all potential products makes PLA design a challenging task. Moreover, if quality attributes are part of the variabilities of the SPL, then a unique PLA may turn out to be highly inconvenient for particular configurations. We consider the PLA as a set of architectural decisions organized by the features in the Feature Model. A particular product architecture (PA) is defined as the subset of decisions associated with the chosen features for the product. Architectural knowledge is then reused among products and when new features are required in the SPL. Variability at the quality attribute level will impact the style of the resulting architecture, so choosing different quality features will produce PAs following different styles, even within the same SPL. We use MDE techniques to operationalize this procedure and we illustrate the technique using the case of a Meshing Tool SPL.
1 Introduction
Software product lines (SPL) are frameworks for organized and planned reuse of core assets within a particular application domain [5]. Software components are typical reusable assets, but there are several others as well: software requirements, documentation, test cases, and most importantly the product line architecture (PLA). A PLA identifies the variabilities of the SPL at the design level: some components are optional, some others are mandatory, and for others there is a series of different alternatives that may be chosen. A PLA also defines the structure that is shared by all products in the SPL, and as such, it has a determinant impact on the quality attributes the products exhibit. We have worked with a meshing tool SPL for a couple of years and we have defined a PLA for it [1], considering variability at the functional level, as most authors
The work of Pedro O. Rossel was partially supported by grant No. UCH 0109 from MECESUP, Chile and by grant of the Departamento de Postgrado y Postítulo de la Vicerrectoría de Asuntos Académicos of Universidad de Chile. The work of Daniel Perovich has been partly funded by CONICYT Chile.
do. This PLA seems to work well for most cases, and its tiers style [3] makes it nicely portable and extensible, not necessarily penalizing its performance [6]. But there are circumstances where variability should be at the quality attribute level, and then an explicitly defined PLA may not be the best solution for all the products in the SPL. For example, in the case of meshing tools, we have found that when parallel implementations for the algorithms are required [9,17], then not only a different deployment view is required to show the distributed setting, but also new components in charge of dividing the mesh among different processors and synchronizing the results are required as part of the tiers architecture too. Therefore the PLA cannot be reused directly as it is, and to the best of our knowledge there is no systematic approach for adapting the PLA to new circumstances derived from changes in quality attributes once it is designed. In this paper, we assume that the domain model is defined using a Feature Model (FM) [8,11], and each product in the SPL can be characterized with a feature configuration model (FCM), i.e. a FM where all variabilities are resolved. Features in the FM correspond to critical functional and quality requirements documented in separate artifacts, hence, we consider variability at both functional and quality level. Following the approach in [18], we consider the PLA as a set of architectural decisions organized by the features in the FM, and each particular product architecture (PA) is defined as the subset of decisions associated to the chosen features for the product [15]. Architectural knowledge is then reused among products and when new features are required in the SPL. Variability at the quality attribute level will impact the style of the resulting architecture, thus choosing different quality features will produce PAs following different styles, even within the same SPL. We code these decisions as model transformations, so each PA could be built as a sequence of model transformations associated to the corresponding FCM [7]. The case of the meshing tool SPL is used along the paper to illustrate our proposal. The case study was implemented extending the toolset developed in a previous work [16] using the same technologies. Section 2 discusses related work. Section 3 describes how the PLA is defined as a set of model transformations, and how particular PAs are derived. Section 4 summarizes some conclusions and presents ongoing and future work.
2 Related Work
In a previous work [15] we built software architectures by resolving requirements one by one and recording the resulting rationale as a sequence of model transformations. In [16] we applied this approach to automating the complete application engineering stage in the context of software product lines. We dealt with variability at the functional level exclusively, so the resulting PLAs were guided by a single architectural style shared among all products. In this paper, we extend this previous work by considering variability at the quality level. Then, different architectural styles may guide the design structure of different products, making apparent the need to structure and document architectural knowledge so that reuse is feasible.
Matinlassi presents the QAMT technique in [13] for transforming architecture models into new models according to defined quality requirements. This approach is similar to ours, but it does not explicitly document which requirements are changing, and therefore this knowledge is kept only by the architect. In [21] a systematic approach for dynamic reconfiguration of the PA for mobile settings is presented, and MDE techniques are applied for automating this approach [20]. However, dynamically reconfiguring the PLA does not, for them, mean changing the PLA style, since quality attributes are not change drivers. Also, the binding time for mobile applications is runtime, while for meshing tools it is design time, and thus we have the opportunity to adapt the PA to the particular needs in advance. The approach followed in [4] is similar to ours because they also use feature models and transformations for product derivation, but they focus on functionality configuration, directly building products without stressing architectures. Our goal is to build the PA that best fits the selected quality attributes. In [12], a procedure is provided for building a PLA based on quality attributes and using architectural patterns as we do, but they do not focus on how particular quality requirements impact the architecture. On the other hand, in [14] the QRF method is proposed for establishing traceability between quality attributes and the PLA, but mainly based on goals. In [2], the Category Theory concept of commuting diagram is used to expose the foundation of MDE, SPL and Computational Design. A set of arrows is associated with features, and a feature model indicates which combinations of feature usage are valid. They apply MDE to define a DSL that eases the generation of some of these arrows. While we use feature models for the same purpose, in our approach artifacts are actually models, and we define the arrows that build the PAs in terms of model transformations, not as meta-transformations that generate the arrows. Combining SPL and MDE is also our goal. [19] applies Feature-Oriented MDD to architectural synthesis. They use an XML-based language to represent decisions, they associate a model transformation to each feature, and particular architectures are synthesized by composing transformations. In contrast to our approach, they use features to represent only functionality, and they require transformations to respect commutativity diagrams. In our work, we require that transformation rules are coded considering only the context of the associated feature.
3 Documenting the PLA Implicitly
The goal of our approach is to enable architectural knowledge reuse and to automate the development of the Application Design stage.

3.1 Assumptions and Rationale
Features represent functionality and quality attributes. Commonalities and variabilities in a SPL usually define the expected functionality of the products while the expected quality attributes are shared among all products. We also consider
quality attributes as part of the feature model since in several settings they may also be considered as variabilities. In all cases, we assume that requirements represented by each feature are specified in a separate artifact.

Features lead architecture construction. Domain Design traditionally focuses on the construction of the PLA that embodies the critical design decisions that address functionality and quality, and also commonalities and variabilities of the SPL. In our approach, we organize these decisions in terms of the features in the Feature Model, which in turn guide the compositional structure of the product architecture. Each feature that may be selected as part of a product inspires a set of architectural decisions that guides the construction of the part of the PA that includes that feature. Decisions are made locally to each particular feature, mainly considering its close context, i.e. its direct member features or siblings. The more local to the feature the decision, the more reusable. However, certain architectural decisions may depend on features which are not close, creating a kind of dependency among the involved features.

Record the architecting activity, not the architecture. In our approach, we record the product line architecting activity instead of the PLA. For each feature in the Feature Model, we preserve the set of decisions involved in providing this feature in the architecture. Such decisions are explicitly recorded as the set of actions that must be performed on a PA to support the feature. These actions are described in terms of model transformation rules that output a fragment of the PA model when the particular feature is present in the product. Then, the whole set of model transformation rules corresponding to the features in the Feature Configuration Model constitutes the core of the model transformation that produces a particular PA.

Incrementally develop the SPL. In our process, the (implicit) PLA can be built incrementally. While a complete Feature Model is usually built during Domain Analysis, the associated design decisions can be produced incrementally by addressing only those features that are required by each particular product under development; afterward they can be reused in subsequent products including the same features. Our modularization strategy not only favors incrementality, but also evolvability, as changes in the SPL scope have restricted impact on other developed artifacts besides the Feature Model itself. The development effort is greater for the first products, as decisions for all commonalities must be developed since they will participate in all products.
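To make this idea concrete, the following is a small Java sketch of our own (the authors' actual tooling uses feature models and ATL model transformations, described in the next subsection; all class names and feature names here are invented for illustration): architectural decisions are recorded per feature, and a product architecture is derived by replaying only the decisions of the features selected for a product.

```java
import java.util.*;
import java.util.function.Consumer;

// Illustrative sketch only: decisions are recorded per feature, not as one monolithic PLA.
class ProductArchitecture {
    final Set<String> components = new LinkedHashSet<>();
    final Set<String> connectors = new LinkedHashSet<>();
    void component(String name)          { components.add(name); }
    void connect(String from, String to) { connectors.add(from + " -> " + to); }
    @Override public String toString()   { return components + " / " + connectors; }
}

class ArchitectingKnowledge {
    // Each feature owns the architectural decisions (here: simple build steps) that realize it.
    private final Map<String, Consumer<ProductArchitecture>> rulesByFeature = new LinkedHashMap<>();

    void record(String feature, Consumer<ProductArchitecture> decision) {
        rulesByFeature.put(feature, decision);
    }

    // Product Design: replay only the decisions of the features chosen in the configuration.
    ProductArchitecture derive(Collection<String> configuration) {
        ProductArchitecture pa = new ProductArchitecture();
        for (String feature : configuration) {
            rulesByFeature.getOrDefault(feature, x -> {}).accept(pa);
        }
        return pa;
    }
}

public class FeatureDrivenDerivation {
    public static void main(String[] args) {
        ArchitectingKnowledge k = new ArchitectingKnowledge();
        k.record("Mesh",        pa -> pa.component("Mesh"));
        k.record("Geometry",    pa -> { pa.component("Geometry"); pa.connect("Geometry", "GenerateInitialMesh"); });
        k.record("GenerateInitialMesh", pa -> pa.component("GenerateInitialMesh"));
        k.record("Distributed", pa -> { pa.component("MasterTool"); pa.component("SlaveTool");
                                        pa.connect("MasterTool", "SlaveTool"); });

        // Two configurations of the same SPL lead to differently styled product architectures.
        System.out.println(k.derive(List.of("Geometry", "GenerateInitialMesh", "Mesh")));
        System.out.println(k.derive(List.of("Geometry", "GenerateInitialMesh", "Mesh", "Distributed")));
    }
}
```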
3.2 Model-Driven Architecture Development
Domain Analysis. The metamodel we use for building the Feature Model is that proposed by Czarnecki et al. [8]; we depict it in Figure 1. All Features in the Feature Model have distinct names and may have attributes and composing members. Root features are used to modularize the model; they cannot be members of other features, and exactly one of them must be marked as main in the model. Solitary and Grouped features represent those that are
Fig. 1. Feature Model Metamodel proposed by Czarnecki et al. [8]
Fig. 2. Feature Model for Meshing Tools
ungrouped and grouped, respectively. Members of composed features can be Solitary, Reference to a particular Root feature, or Group. A Group consists of a set of Grouped or Reference features. Variability is represented by the cardinality: for Solitary features, cardinality indicates how many times it can be used to compose the owner feature, and for Groups, cardinality indicates how many members can be actually used. Figure 2 describes a FM including six functional areas involved in Meshing Tools. The User Interface feature represents all possible user interfaces for a product. Geometry indicates different mechanisms to load into the tool a representation of the object to be modeled, in different input formats. Generate initial mesh provides several algorithms for transforming an input geometry to a Mesh. These algorithms generate either 2D or 3D meshes. The initial mesh could need to be modified, both in quality and size of its elements; here we can use different Algorithms. Finally, the mesh can be saved in different Output Formats. We also included two quality attribute requirements: Mesh Processing Response Time and Mesh Processing Distribution. The former
Fig. 3. Feature-to-Architecture Transformation Rule Metamodel
is mandatory because all products must satisfy a specified mesh processing response time, and the latter implies a mandatory choice between Distributed and Non-Distributed, i.e. a variability at the quality level. We use the FeaturePlugin to define this Feature Model and we developed a text-to-model transformation which transforms the XML file produced by the FeaturePlugin into the corresponding model of the metamodel in Figure 1.

Product Analysis. The goal of Product Analysis is the selection of the desired features for a particular product. These features are selected from those provided by the SPL, considering variability constraints. Thus, a Feature Configuration Model (FCM) defines which configuration of the FM represents the product to be developed and consists of Features composed of subfeatures which are valid with respect to all the constraints of the FM. For the meshing tool case we will consider a 2D mesh with one input and one output format. We will choose the Improve, Refine and Optimize algorithms. We will also consider both cases for distribution, as two different tool configurations.

Domain Design. The goal of Domain Design is to make the critical decisions about the PLA. Architectural patterns are used in order to address the quality and functional requirements. Provided that features in the FM represent functional and quality aspects, we follow the tree-structure of such a model to modularize architectural decisions. Our approach records, for each feature, the architectural decisions that are made to address the functionality and quality variability represented by such a feature in the architecture. The architectural decisions made during Domain Design are recorded as fragments of a compound model transformation. Each fragment consists of a set of rules encapsulating the knowledge of how to build the PA when the feature is present in the FCM. Domain Design builds a Feature-to-Architecture Transformation Rule artifact, expressed in terms of the metamodel in Figure 3. A PLA element is formed by a set of declarations and a top feature. Each Declaration corresponds to a general declaration that can be used by the rules attached to each feature. Features have distinct names, and are organized in a tree-structure in-
spired by the FM. The name of the Feature is used for matching purposes with the features in an input FCM. Each Feature has a set of rules to indicate how to affect an output PA when the given feature is present in a FCM. The Declaration and Rule metaclasses are abstract for portability purposes. Specializations of the metamodel can be made, targeting different model transformation technologies. In Figure 3, we also illustrate one such specialization targeting the AtlanMod Transformation Language (ATL) [10]. An ATLDeclaration can include either a CalledRule or a Helper, both metaclasses of the ATL metamodel. A particular ATLRule consists of: (i) a filter OCLExpression to distinguish among different cases of the input feature (e.g. whether a particular child feature is present or not), (ii) various RuleVariableDeclarations for rule-specific constants, (iii) an OutPattern indicating the elements in the target Product Architecture model that must be present, and (iv) an ActionBlock for imperative actions of the rule. These metaclasses are defined in the ATL metamodel and they constitute the core composing elements of a general ATL rule in such a metamodel.

Product Design. The goal of Product Design is to define the PA for the particular product being developed, considering its desired features defined in the FCM. The architectural decisions made during Domain Design must be used to produce the PA; the set of model transformation rules corresponding to the features included in the product under development is used to derive the PA. To this end, we developed and applied a meta-transformation that transforms the particular Feature-to-Architecture Transformation Rule artifact developed during Domain Design, producing a Feature-to-Architecture Transformation. This meta-transformation is independent of any particular SPL project, and only depends on the MDE technology used to express the rules attached to features. Although defining this meta-transformation requires considerable effort, once developed it can be reused. The derived Feature-to-Architecture Transformation is then applied to the FCM to obtain the particular PA. By this means, the Product Design activity is fully automated.

The particular rules associated with the features in the meshing tool FCM are as follows. Mesh generates a component and an interface, with the component's internals according to the chosen subfeatures. Similarly, GenerateInitialMesh and OutputFormat generate a component and an interface. Algorithms generates a component, and its chosen subfeatures, in this case Refine, Improve and Optimize, add an interface for each of them. The Geometry feature generates a Geometry component and connects it to GenerateInitialMesh following the client-server pattern. UI generates a component for the UI and adds the corresponding required interfaces depending on the interfaces provided by Algorithms. The rule for Non-Distributed organizes the components according to the 3-tiers and blackboard patterns by connecting UI with the interfaces in the second tier (the knowledge sources), and the second tier with the third one, which in this case is formed by the Mesh playing the role of the blackboard. The resulting PA is the one shown in Figure 4, which is similar to the one in [1]. If the Distributed feature is chosen instead, almost all rules may be reused, and only the one referring to distribution itself would be different, organiz-
Fig. 4. Non-distributed product architecture
Fig. 5. Distributed product architecture
ing the PA following another pattern. In this case a relaxed 4-tier architecture is built, where a master-slave pattern [3] is followed in the second tier, and the Mesh, as a blackboard in the fourth tier, is connected with knowledge sources in both the second and the third tiers. The rule would then add a Master Tool component in charge of dividing the mesh, distributing it among the Slave Tools, and combining the results. It also adds the Slave Tool component that is in charge of applying the algorithms to a part of the Mesh. There may be several of these components at runtime, and as such there may be several of these processes in a process view. The rule also connects the UI with the non-distributed knowledge
sources and with the Master Tool, and the Master Tool with the Slave Tools, which are connected with each other and with Algorithms. It also connects the knowledge sources with the Mesh, applying the blackboard pattern. The resulting PA is depicted in Figure 5. In the case where the distributed subfeature is chosen, the rule not only adds elements to the C&C view, but also modifies the process and deployment views. In the latter, a Master Node and a Slave Node are added, where the Master Node, as the only client, is connected one-to-many with Slave Nodes. All components but the Slave Tool are deployed to the Master Node, and the Slave Tool is deployed to the Slave Node. There will be a unique process running in the Master Node and one process in each Slave Node.
4 Conclusions and Future Work
Most traditional approaches to SPLs build the PLA using the FM as a basis, but not necessarily associating features with concrete software modules [8]. We follow a similar approach, but we also include quality attributes as potentially variable features. Designing a PLA that considers functional variabilities of all potential products in the SPL is a complex task, and considering variabilities at the quality level is even more complex. We provide a systematic approach to defining an implicit PLA that considers both functional and quality variabilities, and that also enables automatic PA generation. This implicit PLA can be incrementally built. Only those features that are part of already built products need to have their associated rules. This characteristic also allows for evolution in the SPL scope without losing the design effort already invested: as new features are added or modified, only their rules need to be added or updated, respectively. However, in certain cases, these changes may require reviewing the rules affecting other features, but the divide-and-conquer strategy used for the design phase makes this task easier. In all cases, the architectural knowledge associated with each feature can be more easily reused. One of the drawbacks of our approach is that the implicit PLA cannot be assessed with traditional methods. Therefore, we can only assess particular PAs, and this may be risky and costly. We have already addressed the automation of the complete application engineering stage in a previous work, but only considering variability at the functional level. We still need to complete this stage when qualities vary too. Improved tooling is also needed.
References

1. Bastarrica, M.C., Hitschfeld-Kahler, N., Rossel, P.O.: Product Line Architecture for a Family of Meshing Tools. In: Morisio, M. (ed.) ICSR 2006. LNCS, vol. 4039, pp. 403–406. Springer, Heidelberg (2006)
2. Batory, D.S., Azanza, M., Saraiva, J.: The objects and arrows of computational design. In: Czarnecki, K., Ober, I., Bruel, J.-M., Uhl, A., Völter, M. (eds.) MODELS 2008. LNCS, vol. 5301, pp. 1–20. Springer, Heidelberg (2008)
3. Buschmann, F., Meunier, R., Rohnert, H., Sommerlad, P., Stal, M.: Pattern-Oriented Software Architecture. A System of Patterns. John Wiley & Sons, Chichester (1996)
4. Cirilo, E., Kulesza, U., de Lucena, C.J.P.: A Product Derivation Tool Based on Model-Driven Techniques and Annotations. Journal of Universal Computer Science 14(8), 1344–1367 (2008)
5. Clements, P., Northrop, L.: Software Product Lines: Practices and Patterns. SEI Series in Software Engineering. Addison-Wesley, Reading (2001)
6. Contreras, F.: Adapting a 3D Meshing Tool to a New Architecture. Master's thesis, Computer Science Department, Universidad de Chile (2007) (in Spanish)
7. Czarnecki, K., Antkiewicz, M., Kim, C.H.P., Lau, S., Pietroszek, K.: Model-Driven Software Product Lines. In: OOPSLA 2005 Companion, pp. 126–127 (2005)
8. Czarnecki, K., Helsen, S., Eisenecker, U.W.: Staged Configuration Using Feature Models. In: Nord, R.L. (ed.) SPLC 2004. LNCS, vol. 3154, pp. 266–283. Springer, Heidelberg (2004)
9. Jones, M.T., Plassmann, P.E.: Parallel algorithms for adaptive mesh refinement. SIAM Journal on Scientific Computing 18, 686–708 (1997)
10. Jouault, F., Kurtev, I.: Transforming Models with ATL. In: MoDELS Satellite Events, pp. 128–138 (2005)
11. Kang, K.C., Cohen, S.G., Hess, J.A., Novak, W.E., Peterson, A.S.: Feature-Oriented Domain Analysis (FODA) Feasibility Study. Technical Report CMU/SEI-90-TR-21, Software Engineering Institute (November 1990)
12. Kim, J., Park, S., Sugumaran, V.: DRAMA: A framework for domain requirements analysis and modeling architectures in software product lines. Journal of Systems and Software 81(1), 37–55 (2008)
13. Matinlassi, M.: Quality-Driven Software Architecture Model Transformation. In: WICSA 2005, pp. 199–200 (2005)
14. Niemelä, E., Immonen, A.: Capturing quality requirements of product family architecture. Information and Software Technology 49(11-12), 1107–1120 (2007)
15. Perovich, D., Bastarrica, M.C., Rojas, C.: Model-Driven Approach to Software Architecture Design. In: SHARK 2009, May 2009, pp. 1–8 (2009)
16. Perovich, D., Rossel, P.O., Bastarrica, M.C.: Feature Model to Product Architectures: Applying MDE to Software Product Lines. In: WICSA/ECSA 2009, September 2009. IEEE CS Press, Los Alamitos (2009)
17. Rivara, M.C., Calderón, C., Fedorov, A., Chrisochoides, N.: Parallel decoupled terminal-edge bisection method for 3d mesh generation. Engineering with Computers 22(2), 111–119 (2006)
18. Taylor, R.N., Medvidovic, N., Dashofy, E.M.: Software Architecture. Foundations, Theory, and Practice. John Wiley and Sons, Chichester (2009)
19. Trujillo, S., Azanza, M., Diaz, O., Capilla, R.: Exploring Extensibility of Architectural Design Decisions. In: SHARK-ADI 2007, May 2007, p. 10 (2007)
20. White, J., Schmidt, D.C.: Model-Driven Product-Line Architectures for Mobile Devices. In: Proceedings of the 17th Annual Conference of the International Federation of Automatic Control, Seoul, Korea (July 2008)
21. White, J., Schmidt, D.C., Wuchner, E., Nechypurenko, A.: Automating Product-Line Variant Selection for Mobile Devices. In: SPLC 2007, pp. 129–140 (2007)
Introducing Motivations in Design Pattern Representation

Luca Sabatucci1, Massimo Cossentino2, and Angelo Susi1

1 Fondazione Bruno Kessler IRST, Via Sommarive, 18 I-38050 Trento, Italy
{sabatucci,susi}@fbk.eu
2 ICAR-CNR, Consiglio Nazionale delle Ricerche, Palermo, Italy
[email protected]
Abstract. Design pattern formalization is aimed at encouraging the use of design patterns during the design phase. Many approaches focus on providing solutions with a graphical notation and complementary text, typically composed of a static and a dynamic definition. The weak point is the lack of flexibility when customizing the generic solution to the specific context of use. This paper proposes a criterion to motivate design pattern selection and reuse. The designer is supported with a technique for balancing pattern and context forces when selecting among alternative implementations. The provided representation summarizes and organizes the relevant information found in the classical informal pattern documentation.

Keywords: Design Patterns, Goal-Oriented Modeling.
1 Introduction

The informal description provided by the Gamma et al. (GoF) book [4] is very rich in details and is perfectly suitable for communicating successful experience. Despite the clarity of the exposition and the leading example that gives further clarification and improves the global understanding, this format is not the best way to quickly and properly handle a pattern at design time. The description is long-winded and many important pieces of information are scattered across subsections (about 10 pages for each pattern), thus the reasoning process on the design problem is not properly supported. In addition, during the design phase many implementation details can be lost, so patterns get poorer than in the original intent. In order to better handle pattern complexity, many representations in the literature use the concept of pattern role [3,10,8]. Originally conceived as a shortcut to talk about generic collaborating elements [4], the pattern role turned into a holder of responsibilities [10], thus drawing patterns closer to social organizations [6]. This concept is the base for some preliminary works on design patterns, such as [2], where a catalogue of design patterns for agent-oriented design was presented, and [11], dealing with design pattern composition. These previous experiences raised the need for an instrument for creating a representation where the original design pattern topics, such as motivations, applicability, consequences and implementation issues, are maintained but where information is kept in a compact and manageable form.
The aim of this work is to propose a semi-formal representation for design patterns that considers the motivations underlying the solution structure as the basic key for describing and reusing design patterns. This representation is different from many approaches in the literature, which provide a detailed specification of "what" is to be done when reusing a pattern. Here the pattern is primarily concerned with exposing "why" certain choices for behaviour and/or structure were made or constraints introduced. The approach is based on the i* framework, which uses goal-oriented analysis for modeling and reasoning on strategic relationships among multiple actors of a domain. The approach derives from a mapping between the design pattern domain and goal-oriented analysis, creating a framework for reasoning on force balancing and thus allowing the designer to better understand and customize the solution for the specific reuse context. The proposed representation summarizes and reorganizes information taken from the informal textual description, with the benefit of presenting relevant data in a compact format. The semi-formal nature of the representation is due to two factors: (i) the i* framework provides a language for defining system requirements that is not fully semantically formalized, and (ii) a close connection is maintained between the i* pattern formalization and its informal textual description. The paper is organized as follows: Section 2 provides a brief introduction to the i* framework. Section 3 describes the representation, from the designer's needs to the implementing solution. Section 4 discusses the approach, reasoning on benefits, understandability and reuse issues. Some related works are analyzed and compared in Section 5, and finally some conclusions are given in Section 6.
2 Background: The i* Framework

The i* framework [12] supports goal-oriented modeling and reasoning about functional and non-functional requirements. It is a conceptual framework for modeling social domains (in which both humans and software systems coexist) and provides constructs for expressing concepts that appear during the requirements process: actors, intentional elements (such as goals, softgoals, tasks and resources) and relationships among those concepts. A relevant characteristic of the approach is that it offers means for representing not only the requirements of the system-to-be, but also the motivations for the underlying design choices. The intentional elements and the relationships between them allow answering questions such as why particular behaviours, informational and structural aspects have been chosen to be included in the system requirements, what alternatives have been considered, what criteria have been used to deliberate among alternative options, and what are the reasons for choosing one alternative against the other. This representation supports the analysis of strategies, which help reach the most appropriate trade-offs among (often conflicting) goals and soft-goals. A strategy consists of a set of intentional elements that are given initial satisfaction values. Actors are holders of intentions; they are the active entities in the system or its environment who want goals to be achieved, tasks to be performed, resources to be available and softgoals to be satisfied. Actors in a system collaborate in order to address common goals.
Intentional Elements. The Goal is a condition or state in the world that actors would like to achieve. How the goal is to be achieved is not specified, allowing alternatives to be considered. The Soft Goal is similar to a goal, but there are no clear-cut criteria for whether the condition is achieved, and it is up to the subjective judgment and interpretation of the modeler. Soft goals are often used to describe qualities and non-functional aspects such as security, robustness, performance and usability. The Task specifies a particular way of doing something. Tasks can also be seen as the solutions in the target system, which will address (or operationalize) goals and softgoals. These solutions provide operations, processes, data representations, structuring, constraints, and agents in the target system to meet the needs stated in the goals and soft goals. The Resource represents a physical or informational entity, for which the main concern is whether it is available. Actors and intentional elements are connected by different types of structural and intentional relationships. Several modelling perspectives allow specifying relationships between concepts.

The Actor Modeling Perspective focuses on the identification of the actors that participate in a social organization. This activity is mainly concerned with identifying high-level dependency relationships among actors. A Dependency describes how a source actor (the depender) depends on a destination actor (the dependee) for an intentional element (the dependum). The dependum is expressed by an intentional element so as to specify the nature of the dependency and its motivation.

The Goal Modeling Perspective is focused on detailing an actor's boundary, defining its intentional elements according to various techniques. The Decomposition analysis allows refining goals or plans into sub-goals or sub-plans, generating a goal/plan hierarchical decomposition. In the AND decomposition, all of the decomposing intentional elements are necessary for the target intentional element to be satisfied, whereas the OR decomposition provides a description of alternative ways of satisfying a target intentional element. The Means-ends analysis allows representing the operationalization of a goal by a task through the specification of means-ends relationships. The Contribution analysis defines the level of impact that the satisfaction of a source intentional element has on the satisfaction of a destination intentional element. The i* framework defines a standard set of contributions: "−−"/"−", strong/weak negative, the intentional element is sufficiently/partially dissatisfied; "++"/"+", strong/weak positive, the intentional element is sufficiently/partially satisfied.
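As a rough illustration of how such a model can be processed (a simplified sketch of our own, not part of the i* framework or its tooling; all element names and weights are invented), the following Java fragment encodes AND/OR decomposition and contribution links, and checks whether a goal is satisfied by a given selection of tasks:

```java
import java.util.*;

// Simplified sketch of an i*-style goal graph: AND/OR decomposition plus
// contribution links toward softgoals. All element names are illustrative.
class Element {
    enum Kind { AND, OR, LEAF }
    final String name;
    final Kind kind;
    final List<Element> children = new ArrayList<>();
    boolean chosen;                                  // only meaningful for LEAF tasks

    Element(String name, Kind kind) { this.name = name; this.kind = kind; }

    boolean satisfied() {
        switch (kind) {
            case LEAF: return chosen;
            case AND:  return children.stream().allMatch(Element::satisfied);
            default:   return children.stream().anyMatch(Element::satisfied);
        }
    }
}

public class GoalGraphDemo {
    public static void main(String[] args) {
        Element t1 = new Element("task T1", Element.Kind.LEAF);
        Element t2 = new Element("task T2", Element.Kind.LEAF);
        Element g1 = new Element("goal G1", Element.Kind.OR);   // either task operationalizes G1
        g1.children.addAll(List.of(t1, t2));
        Element g0 = new Element("goal G0", Element.Kind.AND);  // G0 needs all of its sub-elements
        g0.children.add(g1);

        // Contribution links: each task impacts softgoals with a weight (+1 = "+", +2 = "++", -1 = "-").
        Map<String, Map<String, Integer>> contributions = Map.of(
                t1.name, Map.of("softgoal SG1", +2, "softgoal SG2", -1),
                t2.name, Map.of("softgoal SG2", +1));

        t1.chosen = true;                            // the modeler picks one alternative
        System.out.println("G0 satisfied: " + g0.satisfied());
        System.out.println("Impact of " + t1.name + ": " + contributions.get(t1.name));
    }
}
```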
3 Pattern Intentions: From Purpose to Solution

The introduction of the GoF book [4] reports: "Design patterns goal is to capture design experience in a form that people can use effectively. Design patterns help you choose design alternatives that make a system reusable and avoid alternatives that compromise reusability". The context gives a dimension of pattern usability, by providing the motivations that justify its reuse and the consequences of its application to the system. This is classically done by using a leading example that illustrates a concrete design problem and how the class and object structures in the pattern can solve the problem. Goal-oriented analysis can be profitably employed to describe pattern motivations, forces and consequences in a semi-formal way, as Gross and Yu already did
Table 1. Association map between pattern domain terms and i* concepts used in this work
in [6], even if limited to non-functional requirements. Here, the proposed approach is based on abstracting the process of modeling a system with support for reusing design patterns.

3.1 Designer Needs

The proposed abstraction considers the designer as the main Actor of the design activity domain, whose job is to balance design forces coming from the system under modeling. The pattern reuse activity is elicited by the designer's needs, which are: (i) Design Goals, to solve specific design problems emerging during the development of the system, related to the correct distribution of responsibilities among classes of the system, and (ii) Non-Functional Requirements (soft goals), which emerge during the analysis phase and specify qualities of the system-to-be. These needs define conditions in the model that the designer would like to achieve; non-functional requirements are similar to designer's needs, but there are no clear-cut criteria for whether their conditions are achieved. The concept of Pattern Role is the core of the proposed representation. Riehle (in [10]) introduces role diagrams that focus on the collaboration and distribution of responsibilities between objects of the system. Roles are holders of responsibilities, whereas the notion of class becomes an implementation construct only. The current work proposes to enrich this concept of role by giving it the responsibility to handle a piece of the pattern solution. This upgrades the role from a passive template for the solution to an active reasoning element for achieving the solution. In this vision, a design pattern is the delegation of some design choices to the experience of expert designers. It is composed of: (i) Actors, that represent active entities of the system who want goals to be achieved, tasks to be performed, resources to be available and softgoals to be satisfied. The abstraction considers the designer as the main actor of this domain, whose job is to balance forces coming from the context in order to address some design (functional/non
Fig. 1. Actor diagram for the Proxy pattern
functional) objectives. The pattern contains some roles that represent the proactive parts used to allocate the solution to elements of the context. (ii) Goals encapsulate the intentional part of solving the problem. A design goal is a condition or state in the model that designers would like to achieve; how the goal is to be achieved is not specified, allowing alternatives to be considered. (iii) Soft Goals are similar to goals, but there are no clear-cut criteria for whether the condition is achieved, and it is up to the subjective judgment and interpretation of the modeler. Softgoals are used to describe forces coming from the context that represent specific qualities of the system-to-be. (iv) Tasks specify a particular way of doing something. Tasks are the atomic components of pattern solutions, addressing design goals and soft goals. Tasks provide operations, processes, data representations, structuring and constraints to meet the needs stated in the goals and softgoals. The whole pattern solution is the synthesis of the actions, guidelines and techniques described in pattern tasks. (v) Resources represent physical or informational entities of the system, for which the design activity is executed. Resources in a pattern are elements of the system, such as data to introduce into the system in order to obtain a good solution. Summarizing, the proposed mapping considers a pattern role as an actor that encapsulates a piece of the well-tested experience of an expert. The i* framework models this situation so that, when reusing a pattern, the designer delegates a part of his/her duty to pattern roles. Thus each role addresses some goals and proposes a strategic plan to achieve them. The actor diagram is the instrument to represent the pattern collaboration view. Table 1 outlines the mapping between the terms of these two domains. Figure 1 depicts the responsibility organization for the Proxy pattern. The i* visual notation represents actors as circles with an associated balloon representing their internal rationale; design goals are represented by rounded rectangles, whereas clouds are used to specify context forces. The main actor is always the designer of the system, who orchestrates the pattern roles by delegating to them some design responsibilities. Delegation is a dependency represented as an intentional element (goal, soft-goal, task or resource) that connects two actors. The direction of the arrows indicates who is the original handler and who is the receiver of the responsibility. For instance, in the Proxy pattern, the designer needs [to create a smart reference between two objects] (SG1) and [to de-couple these two objects] (SG2); this couple of soft-goals represents the main motivation for the use of this pattern. The commitment of these objectives requires a
delegation of responsibility: the proxy is responsible [to introduce a level of indirection between these two objects] (G1), and the real subject must be able [to provide the functionality] (G2). Finally, a third role, the client, delegates [the access to real subject functionality] (T1) to the proxy.

3.2 Alternative Solution Implementations

Design patterns are descriptions of communicating objects and classes that are customized to solve a general design problem in a particular context. One of the common limits of most pattern representation techniques in the literature is that solutions are provided as rigid templates that solve problems in a unique and invariant way. A pattern description should maintain its original informative content: applicability, pitfalls, hints, criticism and, above all, implementation alternatives and design issues. This informative core is maintained by using the i* framework goal modeling perspective. This perspective uses goal decomposition to explore the motivations for each specific implementation detail. Each role has at least a main goal to address and a collection of forces concerning qualities of the system. The decomposition analysis is performed by using AND/OR and means-end analysis techniques that generate a hierarchy of goals, sub-goals, tasks and resources for addressing role main goals.

Goal Model Hierarchy. Role main goals describe the motivations for applying a specific pattern in a context. By using the AND decomposition, goals are iteratively refined into other sub-goals, thus creating a tree hierarchy. The AND operator implies that the achievement of all decomposing goals is necessary for the target goal to be satisfied. The means-end relationship is also used for introducing tasks into the goal hierarchy as the operationalization of goals. Tasks are atomic steps for modifying the current system as a consequence of pattern instantiation. The means-end link indicates that the achievement of the task implies the full satisfaction of the target goal. The whole solution is provided by executing all selected tasks. Some tasks can be connected to resources that represent elements that will be introduced into the system in order to generate the solution. They can be structural elements (attributes, methods, abstract classes, interfaces) or behavioral elements (events, method calls, and so on).

OR Decomposition. The main strength of this representation is its natural capability to represent alternative paths for the implementing solution. In fact, each OR decomposition introduces a decision point in the goal model that provides a description of mutually exclusive ways of satisfying a target goal. Each alternative refers to a specific design choice that the designer can select in order to customize the solution for the context. This analysis uses contributions to give the designer an instrument to balance these trade-offs. Contributions are a technique for specifying the impact goals and tasks have on context forces, in order to detail pattern applicability or eventual drawbacks. A (weak/strong) positive impact means the task introduces benefits to the specified force, whereas a (weak/strong) negative impact indicates a conflict against that force. Figure 2 shows a slice of the goal/plan model for the Proxy pattern, built on the GoF book's specifications (the client and real-subject roles are omitted). The proxy role
Fig. 2. Goal/Plan diagram for the main role in the Proxy pattern
is responsible for [to represent the real object] (G3). This goal is satisfied by the achievement of two sub-goals: [to receive requests from clients] (G4) and [to communicate to the real subject] (G5). G4 is achieved by the [implement the subject interface] (T2) task, which introduces a new interface (R1) used by clients to access some methods. On the other hand, the communication between proxy and real subject (G5) is possible in two alternative ways: [maintain a direct reference to a real subject] (T3) or [handle a protocol to send requests to the real subject] (T4). Solution T3 is more suitable for cases in which the proxy controls the real subject life-cycle (to create objects on demand, or to protect access to the object), whereas solution T4 is required when the real subject is a remote object.

Resolution of Forces. Given this kind of pattern representation, the solution is provided by choosing among design alternatives and then selecting the corresponding tasks. Design issues have to be balanced according to the specific application context. Context forces are the means for giving different weights to pattern objectives. The design pattern will solve the context problem when all main role goals are fully addressed by the task selection. AND/OR decompositions, means-end links and contribution links
are fundamental to check this property. As shown in the previous example (Figure 2), by trading with the contextual forces the designer can choose at least two different solutions (summarized in Table 2), one suitable for local proxies and one for remote proxies. This example considers the proxy role only, but in fact the Proxy pattern admits more alternatives than the two described here.

Table 2. Two possible implementing solutions for the Proxy pattern
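To make the two alternatives of Table 2 concrete, here is a minimal Java sketch of our own (not code from the GoF catalogue or from this paper; all class names are invented for illustration): LocalProxy realizes T3 by holding a direct reference and controlling the real subject's life-cycle, while RemoteProxy realizes T4 by forwarding requests through a protocol abstraction.

```java
// Illustrative sketch of the two Proxy alternatives (T3 vs. T4); class names are invented,
// and the remote "protocol" is reduced to a simple function for brevity.
interface Subject {                         // R1: subject interface implemented by the proxy (T2)
    String request();
}

class RealSubject implements Subject {
    public String request() { return "real work"; }
}

// T3: the proxy keeps a direct reference and controls the real subject's life-cycle (G6),
// e.g. creating it on demand (SG4) or guarding access to it (SG5).
class LocalProxy implements Subject {
    private RealSubject subject;
    public String request() {
        if (subject == null) subject = new RealSubject();   // create on demand
        return subject.request();
    }
}

// T4: no direct reference; requests are encoded and forwarded through a protocol,
// as needed when the real subject is remote (SG3).
class RemoteProxy implements Subject {
    private final java.util.function.Function<String, String> protocol;
    RemoteProxy(java.util.function.Function<String, String> protocol) { this.protocol = protocol; }
    public String request() {
        return protocol.apply("request");   // marshal and send to the remote real subject
    }
}

public class ProxyAlternativesDemo {
    public static void main(String[] args) {
        Subject local = new LocalProxy();
        Subject remote = new RemoteProxy(msg -> "remote(" + msg + ")"); // stand-in for a real transport
        System.out.println(local.request());
        System.out.println(remote.request());
    }
}
```

Which variant is appropriate is exactly the force-balancing decision discussed above: life-cycle control (SG4, SG5) favors T3, while a remote real subject (SG3) forces T4.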
4 Approach Analysis and Argumentation

The main idea of this goal-based representation is to focus on the design pattern rationale rather than on the solution structure only. This provides an instrument, complementary to the traditional textual description, that summarizes pattern motivations together with all implementation details, thus improving understanding and reuse.

Understandability Issues. The traditional textual description (typically found in the GoF pattern catalogue [4]) includes a lot of information spread across various sections. The comprehension of a design pattern requires a great effort in studying an average of 10 pages for each pattern. This work proposes to replace or complement the very detailed description with a couple of compact diagrams reporting the most relevant information. The actor diagram provides an explicit structure where intent, applicability and consequences are highlighted. This aids in quickly searching a catalogue for the pattern that best fits the specific context problem. The explicit reference to intent and applicability clarifies the actual difference among patterns that present very similar structures; for instance, the State and Strategy patterns have an identical structure that is inexpressive for understanding their totally different intents. Figure 3 highlights their differences. The pair of actor and goal/plan diagrams, instead, provides means for deeply understanding the rationale of each implementation detail, and eventually selecting the best alternative solution to apply.
Fig. 3. Comparison of two common patterns from the GoF book that present a very similar UML structure
Reuse Issues. The approach provides useful support for forward engineering and design traceability that is totally independent of the kind of design methodology the user is following. This means that the approach can be exploited in a traditional design methodology as well as in a goal-oriented one. The approach covers all perspectives of pattern documentation and reuse, from motivation, applicability and consequences to the implementation, in a compact and readable form. It is not limited to the final solution to reuse in a specified context, but also captures the whole reasoning that led to the pattern definition. The explicit documentation of pattern rationale raises the reusability of the pattern, allowing the design choices leading to the final desired result to be considered. This issue has already been successfully explored by Gross and Yu (in [6]), and the proposed approach can be considered complementary to the cited paper. It is worth noting that there is no direct relationship between functional requirements and pattern goals, because they belong to different domains: design is the activity of realizing objectives that emerged during problem analysis, thus requirements give indications about desired functionality, whereas design patterns solve problems of design.
5 Related Work

Several works have been proposed for specifying design pattern solutions and for improving traceability and maintenance; such problems become even greater when several patterns are used in composition. Pattern specification languages that utilize mathematical notation provide the needed formality, but often at the expense of usability. Mikkonen [9] applies rigorous formalization to pattern solutions, in a way that allows reasoning about pattern temporal behaviors in terms of high-level abstractions of communication. Many approaches, including the original GoF diagrams [4], use a subset of the UML notation for pattern formalization. UML is very good at communicating designs, and it is also continuously evolving for better expressiveness. Class, sequence and activity diagrams are the most frequently used for representing structure and collaboration views. Riehle introduces role diagrams [10], using a notion of role that is more abstract. These diagrams define roles played by objects and thus the views objects hold on each other. Riehle relies on roles mainly to address boundary conditions in recursive structures, explicitly focusing on developing and documenting object collaboration patterns. Sabatucci et al. [11] address the problem of design pattern composition by introducing a fine-grained description of the static and dynamic aspects of a pattern solution. Composition is provided by a small set of operators working on solution elements, tracing transformations before pattern instantiation into the system. They stress the importance of representing the pattern semantics for increasing the consistency and reusability of multi-patterns. In [6], the authors propose considering non-functional requirements (NFRs), coming from the analysis phase, during the design. These are treated as design goals, leading the designer to explore a design history of alternative choices by reasoning on the impact of each pattern over non-functional requirements. An extension of this approach was carried out by Weiss [1], who introduces a rigid form of soft-goal hierarchy to reason about a pattern. This structure is built starting from a standard set of NFRs crossed with other NFRs coming from the problem context.
210
L. Sabatucci, M. Cossentino, and A. Susi
6 Conclusions and Future Work

Some interesting directions can lead to future developments. The first issue concerns independence from a domain. The emphasis placed on representing properties and structures of a solution against a specific need makes the approach domain-independent. It is interesting to investigate whether the same approach could be extended to support different categories of patterns, for example analysis patterns, architectural patterns, agent-oriented patterns, aspect-oriented patterns and so forth. The second issue concerns the semantics of the solution. The nature of this representation is semi-formal: instructions inside each task are given in natural language, so they may be unclear, redundant or incomplete. The research question is whether it is possible to identify a formal semantics to describe these tasks. Some directions that will be explored are: (i) the use of meta-modeling for defining an ontology of solutions [3] and (ii) the integration of the approach with aspect-oriented programming in order to allow automatic generation of aspectized design patterns (as proposed in [7,5]). Finally, this representation can be extended to include pattern composition operators, given the importance of considering the reciprocal force influences that occur when multiple patterns are used to solve a conjoined problem [11]. Resolution of forces must consider possible conflicts and semantically inconsistent situations. It is interesting to analyze (semi-automatic) reasoning techniques and tools for identifying these problems.
References

1. Araujo, I., Weiss, M.: Linking Patterns and Non-Functional Requirements. In: Proc. of the 9th Conference on Pattern Language of Programs, PLoP (2002)
2. Cossentino, M., Sabatucci, L., Chella, A.: Patterns reuse in the PASSI methodology. In: ESAW, pp. 294–310 (2003)
3. France, R.B., Kim, D.-K., Ghosh, S., Song, E.: A UML-based pattern specification technique. IEEE Trans. Softw. Eng. 30(3), 193–206 (2004)
4. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Publishing Company, New York (1995)
5. Garcia, A., Sant'Anna, C., Figueiredo, E., Kulesza, U., Lucena, C., von Staa, A.: Modularizing design patterns with aspects: a quantitative study. In: Proc. of AOSD 2005, pp. 3–14. ACM Press, New York (2005)
6. Gross, D., Yu, E.: From non-functional requirements to design through patterns. Requirements Engineering 6(1), 18–36 (2001)
7. Hannemann, J., Kiczales, G.: Design pattern implementation in Java and AspectJ. In: Proceedings of OOPSLA 2002, pp. 161–173. ACM Press, New York (2002)
8. Mak, J., Choy, C., Lun, D.: Precise modeling of design patterns in UML. In: Proc. of ICSE 2004, Washington, DC, USA, pp. 252–261. IEEE Computer Society, Los Alamitos (2004)
9. Mikkonen, T.: Formalizing design patterns. In: Proceedings of ICSE 1998, Washington, DC, USA, pp. 115–124. IEEE Computer Society, Los Alamitos (1998)
10. Riehle, D.: Describing and composing patterns using role diagrams. In: Mätzel, K.-U., Frei, H.-P. (eds.) 1996 Ubilab Conference, Zürich, Germany, June 1996, pp. 137–152 (1996)
11. Sabatucci, L., Garcia, A., Cacho, N., Cossentino, M., Gaglio, S.: Conquering fine-grained blends of design patterns. In: Mei, H. (ed.) ICSR 2008. LNCS, vol. 5030, pp. 294–305. Springer, Heidelberg (2008)
12. Yu, E.S.-K.: Modelling strategic relationships for process reengineering. PhD thesis, University of Toronto, Toronto, Canada (1996)
The Managed Adapter Pattern: Facilitating Glue Code Generation for Component Reuse

Oliver Hummel and Colin Atkinson

Software Engineering Group, University of Mannheim, 68161 Mannheim, Germany
{hummel,atkinson}@informatik.uni-mannheim.de
Abstract. The adapter or wrapper pattern is one of the most widely used patterns in software engineering since the problem of reconciling unsuitable component interfaces is so ubiquitous. However, the classic adapter pattern as described by the Gang of Four has some limitations which rule out its use in certain situations. Of the two forms of the pattern, only the object adapter form is usable with common programming languages not supporting multiple inheritance (such as Java or C#), and this is not able to adapt interfaces of classes whose own type is used in one or more of their operations. This makes it impossible for a tool to automatically generate “glue code” for such components and forces developers to come up with some non-trivial (and typically invasive) workarounds to enable clients to use them. In this paper we present an enhanced form of the adapter pattern which solves this problem by extending the way in which an adapter stores and manages adaptees. We therefore call it the Managed Adapter Pattern. After describing the pattern in the usual Gang of Four-oriented way, we describe its application in the system that initially motivated its development – a test-driven component search engine which is able to retrieve reusable assets based on their semantics. A key challenge in the implementation of this engine was developing a flexible glue code generator that was able to automate the creation of adapters for all the kinds of components delivered by the underlying component repository. Keywords: Software Engineering, Software Reuse, Software Components, Web Services, Adapter Pattern, Glue Coding.
1 Introduction

For many years the biggest obstacle to component-oriented software reuse [1] was the high level of effort involved in searching for components and the low chance of finding ones with suitable functionality [2]. However, as the number of available components has increased [5] and the technologies for indexing [3] and searching over them (see e.g. [4] and [6]) have matured, the relative importance of this obstacle has diminished and the problems involved in using components (and services) once found have started to assume greater significance. One of the most common problems is that components with the right functionality for a particular purpose often do not have the right interface to be used “as is” in a given environment. Therefore, they need to be “adapted” to meet the interface expectations of the using application.
The idea of creating glue code or a so-called “adapter” to make the interface of one component meet the expectations of another is a commonly used concept in software engineering, and is the focus of one of the most well-known patterns in the Gang of Four (GoF) pattern catalog [9]. It is also found in some other pattern systems under the name “wrapper”. As this name implies, the basic idea is to wrap the interface offered by a reuse candidate with an adapter similar to those used in the physical world to make power outlets work with inappropriate plugs from foreign countries. Analogously, the idea of applying a software adapter is motivated by the desire to have a non-invasive solution for making two parties work together that would otherwise require changing the provided interface of the retrieved reuse candidate (and most likely its code as well). The adapter pattern as discussed by the GoF comes in two main variants, namely the object adapter, which delegates requests to adaptee classes, and the class adapter, which in turn is based on subclassing the adaptee classes. Unfortunately, as we will discuss in the next section, both variants of this pattern share several constraints that limit their practical applicability in many situations. As evidenced by the inclusion of the adapter in common pattern catalogs, the application of adapters for gluing components together is a long-practiced approach in the component-based and service-oriented community and therefore has already been investigated from various viewpoints. From the perspective of component-based reuse, the ultimate goal of adapter creation is to automate this process as far as possible in order to reduce the costs of integrating externally acquired (off-the-shelf) components. To this end [15] proposed the so-called type-based adaptation approach that provides for an additional repository of adapters in the reuse process in order to automate the adaptation of components. However, the downside of this approach is that an adapter repository first needs to be filled with manually created adapters, which ultimately only adds an extra layer of complexity and shifts the effort for adapter creation to an earlier point in the development process. Unfortunately, the authors do not elaborate on how the adapters are created, which is certainly the most labor-intensive aspect of their approach. To address this problem and to provide an approach to adaptation that lends itself better to automated rather than just human application, in this paper we present an enhanced adapter pattern which we call the “managed adapter”. This is not only able to cope with a wider range of adaptation challenges, but is also compatible with all object-oriented programming languages. After first describing the traditional adapter pattern in section 2 and explaining the need for more flexible and efficient adaptation facilities in section 3, we present an implementation example and the generic structure of our improved pattern variant in section 4. After that, in section 5, we elaborate on some important scalability properties of our approach. The following section, section 6, then introduces an interesting application of the pattern – in fact, exactly the scenario that first motivated us to extend the original pattern – namely a component search engine driven by ordinary unit tests in which managed adapters are created to determine whether syntactically suitable components pass user-defined test cases, and thus are also suitable reuse candidates.
The final section 7 wraps up the paper with a short summary and an outlook on potential benefits and ongoing developments.
2 Background

As mentioned above, the Gang of Four adapter pattern comes in two forms – a static variant called the class adapter which is based on multiple inheritance and a dynamic variant known as the object adapter which is based on delegation. For developers used to working with today’s most widespread object-oriented languages such as Java and C#, the more intuitive variant of the two is perhaps the object adapter. We therefore explain this first. The UML class diagram below depicts the following situation: the Client class on the left hand side needs to work with the component on the right hand side (the Adaptee) providing the offered interface (we prefer to use this term instead of “provided interface” since it clearly has a different meaning). This interface is typically different to the specified (Target) interface desired by the client. The role of the ObjectAdapter class is thus to implement Target by forwarding requests to the Adaptee in order to enable the Client to use the Adaptee.
Fig. 1. Object adapter pattern as envisaged by the Gang of Four
Obviously, the Client can also be programmed to use the ObjectAdapter directly, but this would be considered a bad design in many cases. In order to perform its function, the adapter needs to maintain a reference to an instance of the Adaptee which has to be created during its initialization. When receiving a service request, an ObjectAdapter instance forwards all received parameters to the corresponding method of the instance of the Adaptee it has created and then hands back the received result to the Client. In other words, the adapter essentially “wraps” the Adaptee and delegates incoming method invocations to it.

Fig. 2. Generic structure of the class adapter pattern

In contrast to the object adapter pattern, the class adapter pattern follows a slightly different approach. As shown in figure 2, instead of delegating requests to an Adaptee instance the ClassAdapter inherits from the Adaptee in order to implement the Target interface using it. However, the basic idea remains the same: the class adapter also forwards incoming calls to a corresponding method of the Adaptee class. The difference is just that this time the adapter does not need to maintain a reference to the Adaptee, since it has inherited all the methods from it and thus can use the super reference to forward requests. In general, both patterns have their advantages and disadvantages: while the object adapter is supposed to be more flexible, the class adapter is considered easier to implement [16]. Nevertheless, the object adapter is straightforward to implement in Java, C# and all other object-oriented languages as well, but unfortunately it is not possible to fully implement the class adapter in languages that do not support multiple inheritance. Although there exists a simple workaround to circumvent the lack of multiple inheritance by replacing the Target class with an interface in Java, for instance, this approach remains very limited since it does not support the adaptation of constructors, for example. Thus, the object adapter is typically the preferred solution in most of today’s widespread programming languages.
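For readers who prefer code to diagrams, a minimal Java sketch of the classic object adapter just described might look as follows; the role names follow the GoF terminology, while the method signatures and bodies are merely illustrative.

// Classic GoF object adapter, sketched in Java (illustrative names and bodies).
interface Target {
    int request(int value);                 // the interface the Client expects
}

class Adaptee {
    int specificRequest(int value) {        // the incompatible interface actually offered
        return value * 2;
    }
}

class ObjectAdapter implements Target {
    private final Adaptee adaptee = new Adaptee();   // created during initialization
    public int request(int value) {
        return adaptee.specificRequest(value);       // delegate to the wrapped adaptee
    }
}

class Client {
    int useTarget(Target t) { return t.request(21); }   // works against Target only
}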
3 Motivation

For the reasons explained above, the object adapter pattern at first sight seems to be the ideal candidate to use as a connector in component-based development and is of course recommended for this purpose by the GoF. Unfortunately, however, the classic version of the object adapter pattern described above is only able to adapt a limited number of interfaces – namely, those that only contain parameters of primitive and predefined types. The GoF adapter pattern is not able to handle situations in which the class to be adapted contains a reference to its own type or other self-defined types and the programming language used does not support multiple inheritance. Or, as the GoF say in their own words: “A potential problem with adapters is that they aren’t transparent to all clients.” [9]. The GoF realized that so-called “two-way adaptation”, in which an adapter needs to offer more than one interface, is a potential problem for their pattern. In our context, the problem of adapting classes with self-referencing parameters can be seen as a simple variant of the two-way adaptation problem. Although the GoF briefly demonstrate a solution (p. 143), it is based on multiple inheritance again and thus not applicable to the object adapter, and hence to today’s most commonly used programming languages. To establish a better understanding of this issue, we use a concrete example of an interface containing references to its own type in order to demonstrate how the classic object adapter needs to be extended. The chosen example is the BinaryTree data type since this is a widely known data structure having the appropriate characteristics. It plays the role of the ObjectAdapter in the standard object adapter pattern shown in figure 1, and would implement a Target interface with the same profile. However, we don’t show this interface here since it is unimportant to the discussion.
public class BinaryTree {
    public BinaryTree(int value, BinaryTree left, BinaryTree right) {}
    public BinaryTree getLeft() {}
    public BinaryTree getRight() {}
    public void setLeft(BinaryTree bt) {}
    public void setRight(BinaryTree bt) {}
}
The critical elements of the code fragment above are the parameters of the methods setLeft and setRight as well as the return values of their corresponding getter methods getLeft and getRight. The same problem holds true for the constructor because it also contains self-referencing parameters. Given the above interface and an arbitrary reuse candidate, the classic implementation of the adapter (i.e. the BinaryTree in this case) would result in the situation sketched in the following diagram:
BinaryTreeAdaptee bta = new ...   // to be instantiated in the constructor

public void setLeft(BinaryTree bt) {
    bta.setLeftChild(bt);
}

public BinaryTree getLeft() {
    return bta.getLeftChild();
}
Fig. 3. A situation in which the basic implementation of an object adapter will fail
The compiler will of course detect this problem. As can be seen on the left hand side of the figure, the BinaryTree’s (i.e. the adapter’s) set-method expects a parameter of type BinaryTree which would be delivered by the client and normally passed on directly to a BinaryTreeAdaptee (shown on the right) as suggested by the underlined variable reference in the note. Of course, the adaptee object (as well as its method) only knows its own type and is not aware of the need for adaptation. Hence, it would naturally expect an object of type BinaryTreeAdaptee to be delivered to its setLeftChild method and not a BinaryTree instance as in this case. Thus, a parameter with an incorrect type would be passed to BinaryTreeAdaptee and the adaptation would fail. However, as we shall see in the next section this issue is rather
simple to solve, at least compared with the inverse situation where a value is returned from the adapter (as is the case in the getter methods, for example). Our so-called managed adapter pattern, explained in the next section, was designed to provide a solution to both these challenges.
4 Exemplary Implementation and Generic Structure

As discussed before, simply forwarding a BinaryTree object as shown in figure 3 will not work in this situation because the associated instance of BinaryTreeAdaptee needs to be forwarded instead. Since creating an adaptee object within each adapter object is the usual way of implementing this pattern anyway, the only additional overhead involved at this point is adding a getAdaptee method that returns the adaptee instance of its object (despite the fact that the object has direct access to this attribute, accessing it via a getter method is a cleaner implementation). Using this approach, the extended implementation of the setLeft method could have the following form in a Java-like syntax:

public void setLeft(BinaryTree left) {
    bta.setLeftChild((left != null) ? left.getAdaptee() : null);
}
We have used Java’s ternary operator in order to check whether or not the delivered reference equals null and simply pass on the null reference if this is the case. Otherwise we pass on the adaptee instance associated with the adapter. All other methods that expect a BinaryTree as parameter have to be adapted in the same way. This, of course, also applies to the constructor, which has two further important responsibilities in this context. First, it has to instantiate the adaptee object. This could be made to work in the following way if we apply the same strategy as above:

bta = new BinaryTreeAdaptee(value,
        (left != null) ? left.getAdaptee() : null,
        (right != null) ? right.getAdaptee() : null);
Second, the constructor needs to be responsible for storing the newly created instance in order to support the correct delivery of BinaryTree objects from methods such as getLeft etc. An easy way to solve this is to create a static Hashtable object in class BinaryTree that keeps track of all existing adapter-adaptee relations by simply using adaptee objects as the key:

private static Hashtable adaptees = new Hashtable();
The next step to make this work is of course to put the newly created instance into the Hashtable and thus to extend the code of the constructor with the following line:

adaptees.put(bta, this);
Now, we merely need to wrap all method calls delivering an adaptee object with an appropriate call to the get method of the Hashtable. Since the retrieval of the adapter from the Hashtable involves some logic, it makes sense to place this code in a getAdapter method as shown in the following snippet taken from the getLeft method:

return getAdapter(bta.getLeftChild());
The logic required in this method is rather simple again, but it is important to distinguish between null references that are stored as such in the adaptee and null references that occur due to adaptees that cannot be found in the Hashtable.

private BinaryTree getAdapter(BinaryTreeAdaptee bta) {
    BinaryTree bt = null;
    if (bta != null) {
        bt = adaptees.get(bta);
        if (bt == null)
            bt = new BinaryTree(bta);
    }
    return bt;
}
The latter is typically the case when an adaptee object has created a new object of type BinaryTreeAdaptee. Thus, initially, there will be no adapter object for this instance. This is not a problem as long as this object stays within the boundary of the adaptee. If it does not, however, we need to create a new placeholder for this object inside the adapter (i.e. a BinaryTree object the Client could work with). Since the initialization parameters for the new adaptee object have already been set by the adaptee that created it, it is sufficient to establish the connection between adapter and adaptee through a new (and private) constructor in the adapter expecting the orphan adaptee instance as parameter.

4.1 Structure

Putting the above pieces together yields the generic representation of the managed adapter pattern as shown in the class diagram below, where the ObjectAdapter has become a ManagedAdapter and has been extended with a Hashtable for storing the relations to the adapted objects as well as with a constructor and two methods for accessing it. As already demonstrated in the previous section by means of a concrete example, from an implementation point of view it is straightforward to create such an adapter once the desired interface, the potentially reusable component and the mapping between the two are available. However, in more complex systems and in the case of programming languages with automated garbage collection such as Java, even more challenges arise, as we will discuss in the next section.
Fig. 4. Extended version of the object adapter managing the wrapping and unwrapping of adaptees automatically
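Assembling the fragments shown above, a consolidated sketch of the BinaryTree managed adapter might look as follows. This is our own reconstruction, not code from the paper: it uses a generic Hashtable, registers orphan adaptees in the private constructor, shows only the left-hand accessors (the right-hand ones are analogous), and includes a hypothetical stub for the reuse candidate BinaryTreeAdaptee so the sketch is self-contained.

import java.util.Hashtable;

public class BinaryTree {
    // maps each adaptee to the adapter that currently wraps it
    private static Hashtable<BinaryTreeAdaptee, BinaryTree> adaptees = new Hashtable<>();

    private BinaryTreeAdaptee bta;

    public BinaryTree(int value, BinaryTree left, BinaryTree right) {
        bta = new BinaryTreeAdaptee(value,
                (left != null) ? left.getAdaptee() : null,
                (right != null) ? right.getAdaptee() : null);
        adaptees.put(bta, this);
    }

    // wraps "orphan" adaptees created inside the adaptee itself
    private BinaryTree(BinaryTreeAdaptee bta) {
        this.bta = bta;
        adaptees.put(bta, this);
    }

    BinaryTreeAdaptee getAdaptee() { return bta; }

    private BinaryTree getAdapter(BinaryTreeAdaptee bta) {
        BinaryTree bt = null;
        if (bta != null) {
            bt = adaptees.get(bta);
            if (bt == null) bt = new BinaryTree(bta);
        }
        return bt;
    }

    public void setLeft(BinaryTree left) {
        bta.setLeftChild((left != null) ? left.getAdaptee() : null);  // unwrap before forwarding
    }

    public BinaryTree getLeft() {
        return getAdapter(bta.getLeftChild());                        // wrap the returned adaptee
    }
    // setRight/getRight are analogous
}

// Hypothetical stand-in for the retrieved reuse candidate (not part of the paper's code).
class BinaryTreeAdaptee {
    private int value;
    private BinaryTreeAdaptee left, right;
    BinaryTreeAdaptee(int value, BinaryTreeAdaptee left, BinaryTreeAdaptee right) {
        this.value = value; this.left = left; this.right = right;
    }
    BinaryTreeAdaptee getLeftChild() { return left; }
    void setLeftChild(BinaryTreeAdaptee l) { left = l; }
}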
5 Consequences

Another interesting issue that may come up in relation to the managed adapter pattern as well as with the classic object adapter is the question of scalability, or in other words, what happens in a more complex setting when the adaptee has dependencies on other classes requiring adaptation, too? The first step required to address this question is of course the creation of an adapter class for each dependency using the managed adapter pattern that we have just introduced. However, as soon as such an adapted object is passed from an adapter to an adaptee or vice versa, we need to wrap or unwrap it in the same manner described above for adapters with self-referencing classes. Thus, we need to increase the visibility of the getAdapter and getAdaptee methods of each adapter from “private” to “package wide” in order to enable other adapter classes (assumed to be in the same package) to perform the wrapping and unwrapping as necessary. The following example illustrates this scenario in more detail using a simple ShoppingCart component of the kind that is often used in online shopping applications: When the Client wants to create a new Product object in order to put it into the cart, the Product (adapter) instantiates an appropriate Item transparently on behalf of the Client. As soon as the Client calls e.g. an addProduct method of the ShoppingCart (adapter), passing on the Product in its original form would result in an interface mismatch since Cart expects an Item instead. To overcome this problem, the ShoppingCart needs to retrieve the appropriate Item from the Product’s Hashtable. As soon as an Item is returned by the Cart (for example, when its getItem method is called) the ShoppingCart is responsible for turning it into a Product by calling the getAdapter method of Product. In principle, this approach makes the managed adapter scale well to components (or in this case “class assemblies” is perhaps a better term) of arbitrary size.
Fig. 5. Structure of a more complex example of the managed adapter pattern (business operations omitted for the sake of clarity)
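A possible coding of the wrapping and unwrapping between two such cooperating adapters is sketched below. The class and method names (ShoppingCart, Product, Cart, Item, addProduct, getItem) follow the example in the text, but the remaining signatures and bodies are our assumptions; in particular, getAdapter is made static here so that ShoppingCart can call it without already holding a Product instance, whereas the paper presents it as an instance method.

import java.util.ArrayList;
import java.util.Hashtable;

// Product adapts Item; ShoppingCart adapts Cart.
class Product {
    private static Hashtable<Item, Product> adaptees = new Hashtable<>();
    private Item item;

    Product(String name, double price) { item = new Item(name, price); adaptees.put(item, this); }
    private Product(Item item) { this.item = item; adaptees.put(item, this); }

    Item getAdaptee() { return item; }               // package-visible for other adapters

    static Product getAdapter(Item item) {           // package-visible for other adapters
        if (item == null) return null;
        Product p = adaptees.get(item);
        return (p != null) ? p : new Product(item);
    }
}

class ShoppingCart {
    private Cart cart = new Cart();

    public void addProduct(Product p) {
        cart.addItem((p != null) ? p.getAdaptee() : null);   // unwrap before passing on
    }
    public Product getProduct(int index) {
        return Product.getAdapter(cart.getItem(index));      // wrap the returned Item again
    }
}

// Hypothetical adaptee classes (the reuse candidates), included only to keep the sketch self-contained.
class Item {
    String name; double price;
    Item(String name, double price) { this.name = name; this.price = price; }
}

class Cart {
    private ArrayList<Item> items = new ArrayList<>();
    void addItem(Item i) { items.add(i); }
    Item getItem(int index) { return items.get(index); }
}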
Unfortunately, even in the form just described, the practical usability of the managed adapter pattern might be limited by the automated garbage collection of modern languages like Java, especially where large-scale systems are concerned. The reason is simple: since a reference to each adapter object will always be stored in the Hashtable, the garbage collector would never be able to recognize this object as unused and would thus never be able to free the memory resources that it consumes. Thus, using a large number of such objects in a long-running system could cause serious memory leaks. Our solution to this problem in Java is inspired by C#, in which the IDisposable interface declares a dispose method that can be used to explicitly delete objects. If the ManagedAdapter class were to implement such a method, clients that are aware of this feature would be able to delete adapter objects by calling their respective dispose method. Following the implementation example introduced in the last section, the code for this method would simply remove the adapter’s adaptee from the Hashtable. Of course, the client needs to delete all its references to the adapter object as well. The class diagram of this yet again extended version of the pattern is shown in the following figure.

Fig. 6. Further extended version of the managed adapter implementing an IDisposable interface

Finally, one might ask whether the presented version of the pattern is also able to cope with static methods and class attributes. Fortunately, forwarding requests to a static method is even simpler than to a non-static one since no adaptee instance is required in this case. (Static) attributes in the adaptee are also covered by this approach as long as they are accessible via methods (and ideally are only accessible in this way). In other words, such variables are supposed to be private, which should be good practice in object-oriented programming anyway. If this is not the case for some reason, they can still be wrapped by getter and setter methods in the adapter and are thus no problem for our approach. Even final public attributes (i.e. constants) in the adaptee can be handled by our approach, as their values can be copied into the adapter during its initialization and are in this way directly accessible through the adapter. However, if in some very rare cases direct access to volatile public attributes of the adaptee should be required, the only feasible solution we see at this time is using an arbitrary framework for aspect-oriented programming (such as [13]) which would be able to intercept access to the attribute(s) in the adapter appropriately.
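To make the disposal mechanism described above concrete, a possible shape for it in the BinaryTree example is sketched below; the Disposable interface name and the method body are our assumptions, inspired by C#'s IDisposable as discussed in the text.

interface Disposable {
    void dispose();
}

public class BinaryTree implements Disposable {
    // fields, constructors and adapted methods as in the consolidated sketch above
    private static java.util.Hashtable<BinaryTreeAdaptee, BinaryTree> adaptees =
            new java.util.Hashtable<>();
    private BinaryTreeAdaptee bta;

    public void dispose() {
        if (bta != null) {
            adaptees.remove(bta);   // the garbage collector may now reclaim adapter and adaptee
            bta = null;
        }
    }
}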
6 Usage Example

An approach that relies heavily on the managed adapter pattern is test-driven reuse as first introduced in [7]. Based on the notion of test-driven development (TDD) promoted by Beck [8] and others, test-driven reuse (TDR) uses specifications in the form of ordinary unit tests to evaluate the degree to which reuse candidates match a potential reuser’s needs. In TDD, as opposed to traditional waterfall-like development approaches, testing is carried out by the developer of a unit whilst he is coding it. Experienced purists take this idea even further: they define the test cases for a unit before it is actually created in order to derive its interface description from the test case. There are two obvious benefits of this approach: first, testing (and thus potentially expensive fault detection and correction) is not deferred to the very end of the development process and, second, there exists a clear measure of when the coding of a unit is complete – namely, as soon as all interfaces are covered by appropriate test cases and they can be successfully executed. In other words, TDD introduced a practical criterion for determining the semantic acceptability of a code unit which can just as well be used in order to evaluate the acceptability of reuse candidates with TDR. Although a “unit” typically corresponds to a class in object-oriented programming languages, the applicability of this idea has already been taken to higher levels of granularity by [10] in order to promote acceptance tests for whole (sub-)systems. Since we do not want to enter the ongoing debate about what exactly a component is [1], for this paper we simply regard all programming units that provide their functionality through a well-defined interface and hide their implementation details as components. Thus, TDD is well-suited for the various types of objects, classes and components that occur in today’s programming languages and can even be used for the acceptance testing of (web) services.
As indicated before, test-driven reuse is also strongly related to the idea of semantically validating components through test cases. As in TDD, a TDR-based development project starts by defining a test for a component. Consider the following excerpt of a simple JUnit test case used to validate a class supporting matrix calculations:

public void testMatrixMultiplication() {
    Matrix mtx1 = new Matrix(2, 3);
    Matrix mtx2 = new Matrix(3, 2);
    mtx1.set(0, 0, 1.0); mtx1.set(0, 1, 2.0);
    mtx1.set(1, 0, 2.0); mtx1.set(1, 1, 3.0);
    mtx1.set(2, 0, 1.0); mtx1.set(2, 1, 4.0);
    mtx2.set(0, 0, 1.0); mtx2.set(0, 1, 2.0); mtx2.set(0, 2, 3.0);
    mtx2.set(1, 0, 3.0); mtx2.set(1, 1, 2.0); mtx2.set(1, 2, 1.0);
    mtx1 = mtx1.mul(mtx2);
    assertEquals(mtx1.get(0, 0), 7.0);
    assertEquals(mtx1.get(1, 1), 10.0);
    assertEquals(mtx1.get(2, 1), 10.0);
    assertEquals(mtx1.get(2, 0), 13.0);
}
Starting from a test case like this, a tool such as Code Conjurer [4] (available under the GPL3 license at http://www.code-conjurer.org) is able to extract the interface of the matrix component the JUnit code is supposed to test. In Java terms, code that provides this interface has the following form:

public class Matrix {
    public Matrix(int rows, int cols) {}
    public double get(int row, int col) {}
    public void set(int row, int col, double val) {}
    public Matrix mul(Matrix m) {}
}
The nice thing about the above test case is that it not only contains the syntactical description of the required component’s interface, it also gives a precise description of the semantics of its constructor in interaction with its methods. In other words, such a simple test case typically contains all the information about a component’s specification required by component-based development approaches such as KobrA [11]. In practice, of course, the test case needs to be more elaborate, but we stay with this simple example here for the sake of brevity. As documented in [4] for this example, the Code Conjurer tool is able to automatically test the ten reuse candidates delivered by the merobase.com component search engine that exactly match the required interface. This means all their names and signatures conform to the interface specified in the test case. Out of these ten candidates two fully tested and thus fully functional matrix components are delivered in less than 30 seconds. However, a search based on pure signature matching [12] (i.e. ignoring the method and class names) would have resulted in a total of 137 reuse candidates for the same example and thus in a potentially much larger result set which might deliver a much greater choice of reusable components. Obviously, there existed a gap between the interface specified by the JUnit test case and the interfaces provided by 127 of these 137 candidates which
hindered compilation and made immediate reuse impossible. Thus, tools (or developers performing this task manually) clearly need to perform an additional adaptation step in order to cut this “Gordian knot”. To address this problem we have implemented an automated adaptation engine using the managed adapter pattern introduced in this paper in order to improve a test-driven reuse environment comprising Code Conjurer and the merobase search engine. Building on the matrix example from before, the left hand side of the following diagram shows the interface extracted from the JUnit test case and the right hand side shows a potential reuse candidate and the mapping from the former to the latter automatically discovered by our adaptation engine based on the test case:
Fig. 7. Automatically discovered adaptation: from the desired interface to a reuse candidate
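To give a flavor of the glue code such an engine might emit, the following skeleton shows a managed adapter for the Matrix example. The candidate class MatrixCandidate and its method names are invented for illustration; they do not reproduce the actual candidate or the mapping shown in Figure 7.

import java.util.Hashtable;

// Hypothetical adapter a glue-code generator might emit for the Matrix interface above.
public class Matrix {
    private static Hashtable<MatrixCandidate, Matrix> adaptees = new Hashtable<>();
    private MatrixCandidate mc;

    public Matrix(int rows, int cols) { mc = new MatrixCandidate(rows, cols); adaptees.put(mc, this); }
    private Matrix(MatrixCandidate mc) { this.mc = mc; adaptees.put(mc, this); }

    public double get(int row, int col) { return mc.getValue(row, col); }       // renamed methods are
    public void set(int row, int col, double val) { mc.setValue(row, col, val); } // simply forwarded

    public Matrix mul(Matrix m) {
        MatrixCandidate result = mc.mul((m != null) ? m.mc : null);   // unwrap the argument
        if (result == null) return null;
        Matrix adapter = adaptees.get(result);                        // wrap the result again
        return (adapter != null) ? adapter : new Matrix(result);
    }
}

// Stand-in for the retrieved reuse candidate; entirely hypothetical.
class MatrixCandidate {
    private final double[][] data;
    MatrixCandidate(int rows, int cols) { data = new double[rows][cols]; }
    double getValue(int r, int c) { return data[r][c]; }
    void setValue(int r, int c, double v) { data[r][c] = v; }
    MatrixCandidate mul(MatrixCandidate other) {
        MatrixCandidate res = new MatrixCandidate(data.length, other.data[0].length);
        for (int i = 0; i < res.data.length; i++)
            for (int j = 0; j < res.data[0].length; j++)
                for (int k = 0; k < other.data.length; k++)
                    res.data[i][j] += data[i][k] * other.data[k][j];
        return res;
    }
}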
As demonstrated by this example, the implementation we created was able to increase the number of successfully tested components significantly, to 26. We are currently working on publishing further details of the underlying automated adaptation engine; the overall efficiency of our test-driven reuse environment using it has already been described in [4].
7 Conclusion As already realized by the GoF themselves, there are sometimes situations in which the well-known adapter pattern as introduced in [9] is not able to adapt components. To address this problem, in this paper we have presented an approach which can be used to adapt classes and components in order to integrate them into a given environment. Based on the idea of wrapping and unwrapping objects of the adapted type as soon as they are passed into, or returned by, the adapter, the approach provides a very versatile solution for component adaptation and increases the applicability of the adapter pattern considerably. The presented pattern shares some basic principles with other approaches such as [14], but is much more highly optimized for the automated and non-invasive adaptation of classes and components in the context of componentbased reuse. Since our approach is absolutely non-invasive, it is fully transparent for the client as well as for the adapted class and thus can easily be used with all kinds of
artifacts, ranging from simple Java (source-code) classes to commercial-off-the-shelf (COTS) components or even (web) services where altering the source code is usually not possible. Combined with the notion of test-driven reuse as introduced in [7] our approach also enables the fully automatic creation of adapter classes based on the upfront creation of test cases that are a feature of test-driven development. Since this has not been feasible with other approaches such as [15] so far, it is another central benefit of our approach. However, the automated creation of adapters not only requires the extended version of the adapter pattern discussed in this paper, but also a so-called permutation engine capable of trying out the various internal wirings of the adapter when more than one method with the appropriate signature exists (as for example in figure 7 with the add, sub and mul methods). Since this is an interesting challenge in its own right, we are preparing to elaborate on this on another occasion. Furthermore, as we have already proposed in [17], integrating the capability of automatic adaptation based on test cases directly into components would allow for an even higher degree of automation in component-based and service-oriented development as it would enable components to automatically adapt their provided interfaces to those required by a given deployment environment. Acknowledgments. We wish to thank our colleagues Daniel Brenner and Werner Janjic for stimulating discussions.
References

1. Szyperski, C.: Component Software, 2nd edn. Addison-Wesley, Amsterdam (2002)
2. Mili, A., Mili, R., Mittermeir, R.: A Survey of Software Reuse Libraries. Annals of Software Engineering 5 (1998)
3. Frakes, W.B., Pole, T.P.: An Empirical Study of Representation Methods for Reusable Software Components. IEEE Transactions on Software Engineering 20(8) (1994)
4. Hummel, O., Janjic, W., Atkinson, C.: Code Conjurer: Pulling Reusable Software out of Thin Air. IEEE Software 25(5) (2008)
5. Hummel, O., Atkinson, C.: Using the Web as a Reuse Repository. In: Morisio, M. (ed.) ICSR 2006. LNCS, vol. 4039, pp. 298–311. Springer, Heidelberg (2006)
6. Inoue, K., Yokomori, R., Fujiwara, H., Yamamoto, T., Matsushita, M., Kusumoto, S.: Ranking Significance of Software Components Based on Use Relations. IEEE Transactions on Software Eng. 31(3) (2005)
7. Hummel, O., Atkinson, C.: Extreme Harvesting: Test Driven Discovery and Reuse of Software Components. In: International Conference on Information Reuse and Integration. IEEE Press, New York (2004)
8. Beck, K.: Test-Driven Development by Example. Addison-Wesley, Amsterdam (2003)
9. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, Amsterdam (1995)
10. Mugridge, R., Cunningham, W.: FIT for Developing Software: Framework for Integrated Tests. Prentice Hall, Upper Saddle River (2005)
11. Atkinson, C., Bayer, J., Bunse, C., Kamsties, E., Laitenberger, O., Laqua, R., Muthig, D., Paech, B., Wust, J., Zettel, J.: Component-based Product Line Engineering with UML. Addison Wesley, Amsterdam (2002)
12. Zaremski, A.M., Wing, J.M.: Signature Matching: A Tool for Using Software Libraries. ACM Transactions on Software Engineering and Methodology 4(2) (1995)
13. Kiczales, G., Hilsdale, E., Hugunin, J., Kersten, M., Palm, J., Griswold, W.G.: An overview of AspectJ. In: Knudsen, J.L. (ed.) ECOOP 2001. LNCS, vol. 2072, pp. 327–353. Springer, Heidelberg (2001)
14. Seiter, L., Mezini, M., Lieberherr, K.: Dynamic Component Gluing. In: Czarnecki, K., Eisenecker, U.W. (eds.) GCSE 1999. LNCS, vol. 1799, p. 134. Springer, Heidelberg (2000)
15. Gschwind, T.: Type Based Adaptation: An Adaptation Approach for Dynamic Distributed Systems. In: van der Hoek, A., Coen-Porisini, A. (eds.) SEM 2002. LNCS, vol. 2596. Springer, Heidelberg (2003)
16. Freeman, E., Freeman, E., Bates, B., Sierra, K.: Head First Design Patterns. O'Reilly, Sebastopol (2004)
17. Atkinson, C., Hummel, O.: Reconciling Reuse and Trustworthiness through Self-Adapting Components. In: International Workshop on Component-Oriented Programming, WCOP (2009)
Reusing Patterns through Design Refinement

Jason O. Hallstrom 1 and Neelam Soundarajan 2

1 School of Computing, Clemson University, [email protected]
2 Computer Sci. & Eng., Ohio State University, [email protected]
Abstract. Refinement concepts, such as procedural and data refinement, are among the most important ideas of software engineering. In this paper, we investigate the idea of design refinement, the process of refining a set of design patterns to arrive at application-specific design components, and ultimately, to system implementations. The approach also enables designers to refine a given pattern to arrive at more specialized versions of that pattern —sub-patterns— thus enabling the creation of pattern hierarchies. We present three contributions: (i) We explore the concept of design refinement and consider what it means for such a refinement to be correct, in the sense of being faithful to the pattern being refined. (ii) We describe a two-part formalism for documenting patterns and subpatterns. A pattern contract captures the requirements and behavioral guarantees associated with a given pattern, while a subcontract captures the ways in which the pattern is specialized for use in a particular application or sub-pattern. Contracts and subcontracts serve as the basis for validating the correctness of a given refinement. (iii) We consider how related patterns may be organized into suitable hierarchies based on the notion of design refinement. We focus on variations of the standard Observer pattern as examples. A key feature of our formalism is that while it enables us to specify patterns and sub-patterns precisely, it allows us to do so without compromising their flexibility.
1 Introduction
Refinement has been a central theme in software engineering since the inception of the field. Development techniques based on procedural refinement and data refinement provide a powerful set of methods for designing and implementing software. Equally important, they provide a foundation for ensuring software correctness. For each refinement technique, suitable reasoning methods and/or calculi [1,2] have been developed to help software practitioners validate the correctness of their refinement steps. The result has been a dramatic improvement in software quality. Our work is based on the observation that there is another form of refinement, design refinement, that has become increasingly important during the past decade as the use of design patterns has become ubiquitous in software practice. Design refinement corresponds to the process of transforming a set of design patterns into system design components, and ultimately, to system
implementation components that exhibit specific behavioral properties — provided that the refinement steps applied respect the requirements dictated by the pattern. In this paper, we investigate the principles of design refinement and explore its application in capturing hierarchies of related patterns in a manner that enables designers to reuse the effort involved in understanding them. We additionally consider how to ensure that the particular refinement steps applied in specializing a pattern satisfy the requirements associated with its correct usage. While there is extensive literature documenting various aspects of patterns and the advantages of using them (e.g., [3,4,5]), questions related to precisely specifying the requirements associated with applying patterns and associated techniques for checking if those requirements are met —i.e., ensuring design correctness— have not been fully addressed. Our goal is to develop such techniques. In our approach, the requirements that must be met when applying a pattern and the consequent behaviors that are expected as a result are expressed in the form of a pattern contract. Details concerning the specialization of the pattern as used in a particular system are expressed in the form of a corresponding pattern subcontract. While pattern formalization can be expected to provide the usual benefits, such as eliminating ambiguity and serving as the basis for ensuring correctness, the process runs the risk of compromising pattern flexibility [3]. This is a serious concern; much of the power of patterns and the driving force behind their broad adoption derives from the flexibility they afford in applying them. The notion of abstraction concepts, an essential part of the formalism, helps preserve this flexibility while simultaneously achieving specification precision. Each abstraction concept corresponds to a dimension of flexibility that must be preserved. Indeed, the process of identifying these abstraction concepts can help to identify latent dimensions of flexibility missing from standard pattern descriptions [6]. Thus, in our approach, a pattern contract specifies the requirements that must be satisfied to ensure the correct application of a given pattern, with the abstraction concepts used in its definition allowing for appropriate variations based on the needs of particular systems. The design refinement process effectively “pins down” these variations by providing suitable definitions for the abstraction concepts, while ensuring that the requirements dictated by the contract are satisfied. These definitions are supplied in a corresponding subcontract. To summarize, the information contained in a pattern contract applies to all possible uses of a given pattern, while the information contained in a subcontract captures how the pattern was specialized for use in a given application. In some cases, however, it may be desirable to leave some of the abstraction concepts undefined, with the corresponding flexibility dimensions unbound. In this case, the subcontract will not capture a pattern application, but rather, a more specialized version of the original pattern — a sub-pattern. The contract for this new pattern is formed by the original contract as specialized by the subcontract for this refinement. This approach introduces an interesting possibility: Patterns related through a series of refinements can be classified in the form of a pattern hierarchy. The benefits of doing so are two-fold. First, pattern
hierarchies can highlight the interconnections among related patterns, aiding developers in the pattern selection process. Second, pattern hierarchies enable designers to reuse the reasoning effort involved in understanding a given pattern when reasoning about sub-patterns of that pattern. We illustrate these points by considering variations on the standard Observer pattern [4]. Although some of the variations in the resulting hierarchy have been documented in the literature, others, equally natural from the point of view of design refinement, have not. The work reported in this manuscript represents a substantial revision and extension of our earlier work in pattern specification [6,7]. Although the earlier work was also based on the idea of pinning down flexibility dimensions when documenting pattern applications, the kinds of abstraction concepts used in the formalism were limited. Most important, the formalism did not support variation in the interaction sequences among participating objects. Hence, the associated flexibility was also limited. Further, and partly as a result of this, the formalism could not help identify or characterize relations among patterns, nor organize them into suitable hierarchies. Paper Organization. The remainder of the manuscript is organized as follows. Section 2 surveys related work in pattern formalization. Section 3 introduces the principles of design refinement, including the three types of abstraction concepts at its core. Section 4 summarizes the basic structure of pattern contracts and subcontracts. Section 5 demonstrates the principles of design refinement by constructing a hierarchy of patterns based on the standard Observer pattern. Finally, Section 6 concludes with a summary of contributions.
2 Related Work

A number of authors have investigated issues related to pattern formalization. Structural properties have been an important focus. Eden [8,9] presents an approach to specifying the structural properties of patterns using a higher-order logic notation. Each set of pattern formulae specifies the participating classes, methods, and inheritance hierarchies, and the corresponding relations among them. Kim and Carrington [10] present an Object-Z-based formalization of patterns using role concepts. Each role concept describes a pattern participant, such as a class, class feature, or like element. The resulting formalizations capture, in Object-Z, the structural relations among role concepts. Sunye et al. [5] and Dong [11] consider UML extensions used to model the structural aspects of patterns. Lano [12] also focuses on structural issues, using model transformations to formalize patterns. His work shows how a pattern can be viewed as a transformation from a given set of classes to another set of classes with the desired pattern properties. In contrast to the work of these authors, our focus is on behavioral properties — which are not readily captured using any of the above approaches. Mikkonen et al. [13,14] focus on behavioral properties using an action system notation that abstracts over the flow of control among participants. Superposition is used to support pattern refinement. And while the approach has been shown to be useful in reasoning about the temporal aspects of pattern behavior, the
flexibility enabled by our abstraction concepts is richer. It is worth noting that Taibi and Ngo [15] combine the action system approach of Mikkonen et al. with the higher-order logic approach of Eden. While the resulting formalism is more comprehensive, the behavioral portion of the formalism suffers the same flexibility limitations as Mikkonen et al. ’s approach. Closest to our work is that of Helm et al. [16], published before the seminal patterns book [4]. While the authors consider some structural issues, they focus on capturing behavioral properties. The specification notation includes support for refining object interactions and for arriving at application-specific behaviors. But the formalism’s expressivity is limited. For example, the notion of a call sequence as a mathematical object is underdeveloped. It is impossible, for instance, to quantify over a call sequence to require that a particular method be invoked exactly once. There is also nothing similar to our use of concept constraints to prevent incorrect concept refinements. Nor can conditions be imposed on behaviors of methods not named in the pattern being specified. As a result, these other methods might nullify behaviors implemented by the named methods. We should also mention the work on generative reuse [17,18,19]. Although not based on design patterns, the type of refinement that underlies this work is, in some ways, similar to design refinement. Hence our approach may also be applicable to reasoning about generative software. Before concluding this section, it is interesting to note that aspects of our approach are related to important issues identified by authors who use an informal approach to documenting patterns. According to Buschmann et al. [3], “You should be able to reuse the pattern in many implementations, but so that its essence is still retained. . . . After applying a pattern, an architecture should include a particular structure that provides for the roles specified by the pattern, but adjusted and tailored to the specific needs of the problem at hand.” What is the essence of a pattern and what types of “adjusting and tailoring” of roles are allowed? The answer to the former question is provided by pattern contracts, the answer to the latter by the notion of design refinement.
3 Design Refinement and Abstraction Concepts
A key benefit of any refinement-based approach is the flexibility it provides in the form of abstractions that may be realized in various ways. While design refinement builds upon the ideas of procedural and data refinement, it affords much greater flexibility via more powerful types of abstractions that are unique to patterns. These abstractions can be classified into structural abstraction, state-relation abstraction, and interaction abstraction. Consider the standard Observer pattern [4], which defines two roles, Subject and Observer. The pattern's intent is to maintain consistency between the state of the object playing the Subject role and the state(s) of the object(s) playing the Observer role. The subject maintains a set, obs, of references to the observers attached to the subject. Subject provides attach() and detach() methods for attaching and detaching observers, respectively. The subject must also provide a notify()
method, which must be invoked whenever there is a significant change in the subject’s state. notify() is required to invoke update() on each attached observer, which must in turn update the observer’s state to make it consistent with the new state of the subject. In a system built using Observer, a developer need not implement classes named Subject and Observer; application-specific names are likely to be more appropriate. Role methods, such as update() and notify(), may also be suitably renamed. Thus, in the example that Gamma et al. [4] consider, the subject is a spreadsheet, the observers being windows, each displaying the information in the spreadsheet in different formats such as a bar graph, pie chart, etc. The corresponding classes and their methods will be named appropriately. Further, in implementing application-specific versions of update() and notify(), the corresponding method signatures need not match those prescribed by the pattern. An application might, for instance, require additional parameters as part of the update() signature to pass state components from the object playing the Subject role to those playing the Observer role. Structural abstraction affords this flexibility. The role maps, corresponding to the various roles of the pattern, in the pattern subcontract for a given system, will specify the details of these refinements. Consider now the update() method defined by the Observer role. As noted above, when this method is invoked on an observer, it must make the observer’s state consistent with the current state of the subject. But what precisely does this mean? One possibility is that the observer makes a copy of the subject’s state. While this is what some standard descriptions of the pattern suggest, it is inappropriate if an observer needs partial information about the subject. It is even possible that in an application, instances of two different classes, both playing the Observer role, might be simultaneously attached to a subject and maintain information about different aspects of the subject’s state. The solution is to treat the notion of consistent as a state-relation abstraction concept —henceforth relation abstraction concept— between the states of the subject and the observers. The definition of the Consistent() concept will be tailored to suit the needs of particular applications or sub-patterns. Another relation abstraction used in specifying this pattern corresponds to the notion of significant modification in the state of the subject. The relation, Modified(), is defined between two states of the subject and used to determine which subject state changes trigger calls to notify(). The pattern contract will require that if Modified(s1 , s2 ) is true (respectively, false), and the subject’s state changes from s1 to s2 , then notify() must (respectively, need not) be invoked. The subcontract for a system built using the pattern will provide definitions, applicable to that system, for both concepts. Finally, consider the implementation of the notify() method. When notify() is called, it is required, according to standard descriptions, to invoke update() on each attached observer. While this strategy will achieve consistency with all the observers, there are other ways to accomplish this goal. For example, the observers might be arranged in a chain, with each observer maintaining a reference to the next. When update() is invoked on a given observer, it would then update its state and propagate the call to its successor. 
In this case, notify() need only invoke update()
on the first observer in the chain. Alternatively, the observers might be arranged in clusters, with one member of each cluster responsible for invoking update() on the others. In this case, notify() must invoke update() only on the designated cluster head within each cluster. One could argue that such variations are not allowed by the Observer pattern, given its standard descriptions. But that is simply an issue of terminology. One could introduce a new pattern, General Observer, which only requires that notify() invoke update() on appropriate observers to ensure consistency of the entire observer set. Then Standard Observer and the variations described above would be legitimate refinements of that pattern. We call this type of refinement interaction refinement since it is the sequence of interactions among the objects that is being refined. In the pattern contract for Observer, interaction abstraction concepts are used to capture these points of flexibility. The subcontract for an application would provide definitions for these concepts applicable to that system. Similarly, the subcontract for a sub-pattern such as Standard Observer would provide definitions for these concepts, without, however, providing definitions for the other abstraction concepts, such as Consistent().
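To make the distinction concrete, the sketch below shows one way the General Observer roles might look in code, with Consistent() and Modified() left as pluggable relation abstraction concepts and the notification strategy overridable as an interaction refinement. The interfaces and names are our own illustration, not the contract notation used in the paper.

```java
// Illustrative sketch only: role interfaces and pluggable "abstraction
// concepts"; these names are ours, not the paper's formal notation.
import java.util.ArrayList;
import java.util.List;

interface ObserverRole<S> {
    // Role method: bring this observer's state in line with the subject.
    void update(S subjectState);
}

// Relation abstraction concepts left open for a subcontract to define.
interface ConsistencyConcept<S, O> {
    boolean consistent(S subjectState, O observerState);
}
interface ModificationConcept<S> {
    boolean modified(S before, S after);
}

class SubjectRole<S> {
    private S state;
    private final List<ObserverRole<S>> observers = new ArrayList<>();
    private final ModificationConcept<S> modifiedConcept;

    SubjectRole(S initial, ModificationConcept<S> modifiedConcept) {
        this.state = initial;
        this.modifiedConcept = modifiedConcept;
    }

    void attach(ObserverRole<S> o) { observers.add(o); }

    void setState(S newState) {
        S old = state;
        state = newState;
        // The contract requires notify() whenever the change is "significant",
        // i.e., whenever Modified(old, new) holds.
        if (modifiedConcept.modified(old, newState)) {
            notifyObservers();
        }
    }

    // Standard Observer refinement: call update() on every attached observer.
    // Chained or Clustered Observer would refine this interaction differently.
    protected void notifyObservers() {
        for (ObserverRole<S> o : observers) {
            o.update(state);
        }
    }
}
```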
4 Contracts and Subcontracts
Suppose that a system S is constructed using a pattern P . During the execution of S, there will be zero or more groups of objects interacting according to P . Each such group, pi , is an instance of P , and each object in pi is enrolled to play a role R in P . Further, each such object may be simultaneously enrolled in other instances. We use a ghost variable, playersi[], to denote the set of objects currently enrolled in pi . P ’s contract will include a role contract for each role. Role R’s contract consists of an abstract data model for R and pre and post specifications of its methods. If a class C of S plays role R in some pi , the subcontract will specify how the state of C maps to R’s model and how the methods of C map to the methods of R. Correctness of this refinement requires that these methods of C, under the mappings specified in the role contract, satisfy the corresponding method specifications in the role contract. R’s contract will include an others clause that must be satisfied by any remaining (unmapped) methods of C to prevent those methods from violating the intent of the pattern. The role contract also includes enrollment and disenrollment clauses that specify how objects enroll in and dis-enroll from the role, respectively; we omit these details. Finally, P ’s contract will specify an invariant over the objects in the players[] array for each pi . Ensuring appropriate relations between these objects is the purpose of applying P , thus the invariant is a key part of the pattern contract. The specifications of the role methods and invariant will be in terms of the models of P ’s roles and will include clauses involving the relation abstraction concepts of P , representing some of the ways in which P can be refined. With each pi , we associate an instance trace τi , a ghost variable that records information about the method invocations involving objects in pi . At runtime,
when such a call is made, an element is added to τi that records the method name, the identity of the target object, the calling object, and any parameter/return values; the states of the caller and callee are also included in the record. A similar post-conditional record is added to τi when the invocation completes. Between the pre record and the post record, additional records may be added to τi , corresponding to calls made from the original method, calls from within those methods, etc. As long as the calls involve objects in pi , they will be recorded on τi . These traces provide a pattern-centric view of the object interactions within S. For example, examining the records in τi that correspond to methods that enroll/dis-enroll objects in various roles yields the set of objects currently in pi . Both the pattern invariant and the role methods defined by P ’s contract may include conditions on τi . For example, the role method specification of notify() in the Standard Observer contract will require that upon completion, τi be extended by calls to update() on each observer. In general, these clauses will involve the interaction abstraction concepts of P , representing the ways in which the interactions of P may be refined. The contract will typically impose constraints on these concepts, as well as on the relation abstraction concepts, that govern the allowable definitions that may be supplied in a subcontract, lest the pattern invariant (and the correctness of the refinement) be violated. Now consider the subcontract corresponding to S. It specifies how the abstractions of P are refined to satisfy the requirements of S. Structural refinement is achieved through the role maps defined by the subcontract. For each class C of S that plays a role R of P , the subcontract specifies a role map that maps the concrete state and methods of C to the abstract model and methods of R. These methods must satisfy, under the defined mappings, the corresponding method specifications defined by R’s role contract. The subcontract also provides definitions for each relation abstraction concept and interaction abstraction concept. The definitions must satisfy the constraints specified in P ’s contract. Now suppose that PS is a specialized sub-pattern of P , and R is a role of P . The simplest case is when the role contract for R is inherited by PS from P . More interesting is the case when the data model for R is inherited, but the specification of one or more methods is strengthened by weakening the precondition and/or strengthening the post-condition. The role model may also be different, in which case the role map would be similar to that for a class playing role R. Two roles of PS might also play role R. The role map for each would define mappings from the respective role models to R’s model and might provide strengthened specifications for some of the methods, inheriting the rest from R. Another possibility is that PS may include a new role that does not map to any role of P . The corresponding role contract is not constrained by P ’s contract. The interaction traces for instances of PS will record method calls on objects playing such additional roles. When checking whether the assertions of PS imply the corresponding assertions of P —in particular, when dealing with the assertions over the traces— we effectively project out these elements. The PS subcontract may also introduce new abstraction concepts and include constraints on these concepts; the constraints may involve the concepts inherited from P . 
(Of course,
all constraints in the contract of P are inherited.) The precise syntax for the various elements defined within pattern contracts and subcontracts is part of our ongoing work. We omit these details due to space limitations, but illustrate some of the most important ideas in the next section.
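The paper deliberately leaves the contract syntax open; purely as an illustration of the trace machinery described above (class and field names are ours, not the authors'), an instance trace can be pictured as a list of call and return records that a runtime monitor appends to:

```java
// Hypothetical sketch of an instance trace (the ghost variable tau_i).
// Structure and names are ours; the paper does not fix a concrete syntax.
import java.util.ArrayList;
import java.util.List;

final class TraceRecord {
    enum Kind { CALL, RETURN }
    final Kind kind;
    final String methodName;
    final Object caller;
    final Object target;
    final Object[] arguments;      // parameter values (and return value, if any)
    final Object callerState;      // snapshot of the caller's abstract state
    final Object targetState;      // snapshot of the callee's abstract state

    TraceRecord(Kind kind, String methodName, Object caller, Object target,
                Object[] arguments, Object callerState, Object targetState) {
        this.kind = kind;
        this.methodName = methodName;
        this.caller = caller;
        this.target = target;
        this.arguments = arguments;
        this.callerState = callerState;
        this.targetState = targetState;
    }
}

final class InstanceTrace {
    private final List<TraceRecord> records = new ArrayList<>();

    void append(TraceRecord r) { records.add(r); }

    // Analogue of @CurrState(o, tau): the state of o in the most recent
    // record on the trace that involves o.
    Object currentStateOf(Object o) {
        for (int i = records.size() - 1; i >= 0; i--) {
            TraceRecord r = records.get(i);
            if (r.target == o) return r.targetState;
            if (r.caller == o) return r.callerState;
        }
        return null; // o has not appeared on the trace yet
    }
}
```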
5 Observer Hierarchy
Consider General Observer, the generalized Observer pattern intended to serve as the specialization base for (i) Standard Observer, (ii) Chained Observer (with the observers arranged in a chain), (iii) Clustered Observer (with the observers arranged in multiple clusters), and other sub-patterns. The contract for General Observer cannot require that notify() directly invoke update() on each attached observer since some of these specializations would not meet this requirement. But it would not be sufficient to simply require that when notify() finishes, each attached observer's state be Consistent() with the subject's state. This could be satisfied by, for example, simply resetting the subject's state back to its preconditional value (i.e., before the change that triggered notify()) rather than updating the observers' states. Another point related to the flexibility of the Observer pattern is worth noting. Many standard treatments of the pattern require that the other methods of the Observer role not make any changes to the observer's state, lest it become inconsistent with the subject's state. This is too restrictive since, for example, it doesn't allow an observer to change the format used to display information about the subject. In [6], we relaxed this to allow other methods of Observer to make changes as long as those changes left the observer in a state consistent with the same subject state that held at the start of the modifying Observer method. While this improves flexibility, it is still not flexible enough. For example, in the MVC architecture [3], both View and Controller play the Observer role. While View's other methods meet this requirement, Controller's methods do not. Indeed, in some MVC-based systems, the only way for a user to modify the subject's (i.e., model's) state is via these methods. So, to maintain consistency, the state of the controller (and other observers) would have to be updated, not left unchanged. To allow for the types of variation described above in regard to how update() is invoked on the various attached observers, we introduce the AllObsUpdated() interaction abstraction concept. The concept is defined over the subject state and the interaction trace and represents the notion of whether all attached observers have been updated, as necessary, to make them consistent with the current state of the subject. This intuition is captured in the following constraint, declared as part of the General Observer contract:

AllObsUpdated(s, τ) ⇒ [¬Modified(s, @CurrState(s, τ)) ∧ ∀ob ∈ s.obs : Consistent(s, @CurrState(ob, τ))]

@CurrState() is an auxiliary function that returns the state of the specified object in the most recent record in τ involving the object. Hence, the first clause
of the consequent requires that the current subject state be unmodified from s, which represents the state of the subject at the start of the notify() call. That is, the clause requires that the subject’s state not be modified while the observers are updated. The second clause of the consequent requires that for each observer in the obs set, the most current state recorded in τ be consistent with the subject’s state. This allows the various updating strategies used in the sub-patterns, while ensuring that all the observers are updated. Thus, we use AllObsUpdated() in the specification of notify() in the contract of General Observer. This ensures, given the above constraint, the intended behavior of the method. The subcontract for each specialization of General Observer will provide an appropriate definition for AllObsUpdated(). For example, the subcontract for Standard Observer will define AllObsUpdated() to be true if τ contains a sequence of calls to update() on each element of obs, and false otherwise. The definitions corresponding to the subcontracts for Chained Observer and Clustered Observer will be more complex since they must account for the richer structure of the associated interaction sequences. Interestingly, the variation in the behavior of the other methods of Observer, as in the Controller of MVC, can be represented without additional abstraction concepts. The others specification in the base pattern contract will require that changes in the observer state during execution of these methods must be due to intervening calls to update(), which themselves are due to changes in the subject state that result in calls to notify(). This requirement will be imposed by specifying suitable constraints on elements of τ as part of the others clause.
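For example, under the Standard Observer subcontract, AllObsUpdated() might be evaluated over such a trace roughly as follows. This reuses the illustrative classes sketched in the earlier sections and shows the intent only; it is not the authors' formal definition.

```java
// Hypothetical check of AllObsUpdated() for the Standard Observer subcontract:
// true iff, since notify() began, the trace shows a completed update() call on
// every attached observer and the subject's state has not been modified again.
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class StandardObserverSubcontract {
    private final ModificationConcept<Object> modified;      // sketched earlier
    private final ConsistencyConcept<Object, Object> consistent;

    StandardObserverSubcontract(ModificationConcept<Object> modified,
                                ConsistencyConcept<Object, Object> consistent) {
        this.modified = modified;
        this.consistent = consistent;
    }

    boolean allObsUpdated(Object subject, Object subjectStateAtNotify,
                          Set<Object> attachedObservers,
                          List<TraceRecord> recordsSinceNotify) {
        Set<Object> updated = new HashSet<>();
        for (TraceRecord r : recordsSinceNotify) {
            if (r.kind != TraceRecord.Kind.RETURN) {
                continue; // judge states at method completion
            }
            if (r.target == subject
                    && modified.modified(subjectStateAtNotify, r.targetState)) {
                return false; // the subject changed again mid-notification
            }
            if ("update".equals(r.methodName)
                    && attachedObservers.contains(r.target)
                    && consistent.consistent(subjectStateAtNotify, r.targetState)) {
                updated.add(r.target);
            }
        }
        return updated.containsAll(attachedObservers);
    }
}
```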
6 Conclusion
We have described a new form of refinement that plays a fundamental role in the design and implementation of object-oriented systems. Design refinement complements traditional refinement concepts and corresponds to the process of transforming a set of design patterns into design components, and finally, into implementations. Further, design refinement allows us to refine existing patterns to arrive at specialized sub-patterns and pattern hierarchies. This not only aids in pattern selection, but enables reuse of the effort involved in reasoning about a given pattern when reasoning about its variants. We presented three contributions. First, we developed the idea of design refinement, including the three types of abstraction at its core. Second, we described an approach to specifying pattern requirements and behavioral guarantees in the form of pattern contracts, and to specifying pattern subcontracts that correspond to particular refinements of a pattern. A key consideration was to ensure that the flexibility of the pattern being specified was not compromised; the three types of abstraction concepts supported by the formalism ensure this. Indeed, a natural result of developing pattern contracts is that the contracts suggest partial refinements that correspond to specialized patterns and hierarchies. Thus, as the third contribution of the paper, we explored a hierarchy of Observer patterns. Although the Observer pattern has been discussed widely in the literature, and various authors have suggested variations, our work seems to be the first
to investigate them systematically. We were able to do so because the notion of design refinement, as well as the contracts for the various Observer variants, provided a natural foundation on which to base the hierarchy. In our future work, we intend to investigate other pattern hierarchies. This should be of great help to developers since each pattern in the hierarchy will be clearly specified.
Acknowledgments
This work was supported in part by the National Science Foundation (CNS[CAREER]-0745846). The authors gratefully acknowledge the NSF for its support.
References
1. de Roever, W., Engelhardt, K.: Data Refinement: Model-Oriented Proof Methods and their Comparison, Cambridge (2001)
2. Morgan, C.: The specification statement. ACM Transactions on Programming Languages and Systems 10, 403–419 (1988)
3. Buschmann, F., Meunier, R., Rohnert, H., Sommerlad, P., Stal, M.: Pattern-Oriented Software Architecture: A System of Patterns. John Wiley & Sons, Chichester (1996)
4. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, Reading (1995)
5. Sunyé, G., Guennec, A.L., Jézéquel, J.: Design patterns application in UML. In: The 14th European Conference on Object-Oriented Programming, pp. 44–62. Springer, Heidelberg (2000)
6. Soundarajan, N., Hallstrom, J.: Responsibilities and rewards: Specifying design patterns. In: The 26th International Conference on Software Engineering, pp. 666–675. IEEE Computer Society, Los Alamitos (2004)
7. Hallstrom, J., Soundarajan, N., Tyler, B.: Amplifying the benefits of design patterns. In: The 9th International Conference on Fundamental Approaches to Software Engineering, pp. 214–229. Springer, Heidelberg (2006)
8. Eden, A.: Formal specification of object-oriented design. In: The International Conference on Multidisciplinary Design in Engineering (2001)
9. Eden, A.: LePUS: a visual formalism for object-oriented architectures. In: The 6th World Conference on Integrated Design and Process Technology, pp. 149–159. IEEE Computer Society, Los Alamitos (2002)
10. Kim, S., Carrington, D.: Using integrated metamodeling to define OO design patterns with Object-Z and UML. In: The 11th Asia-Pacific Software Engineering Conference, pp. 257–264. IEEE Computer Society, Los Alamitos (2004)
11. Dong, J.: UML extensions for design pattern compositions. Journal of Object Technology 1, 151–163 (2002)
12. Lano, K.: Formalising design patterns as model transformations. In: Design Pattern Formalization Techniques, pp. 156–182. IGI Publishers (2007)
13. Mikkonen, T.: Formalizing design patterns. In: The 20th International Conference on Software Engineering, pp. 115–124. IEEE Computer Society, Los Alamitos (1998)
14. Helin, J., Kellomäki, P., Mikkonen, T.: Patterns of collective behavior in Ocsid. In: Design Pattern Formalization Techniques, pp. 73–93. IGI Publishers (2007)
15. Taibi, T., Ngo, D.: Formal specification of design patterns – a balanced approach. Journal of Object Technology 2, 127–140 (2003)
16. Helm, R., Holland, I., Gangopadhyay, D.: Contracts: Specifying behavioral compositions in object-oriented systems. In: The European Conference on Object-Oriented Programming, pp. 169–180. ACM Press, New York (1990)
17. Batory, D., Singhal, V., Thomas, J., Dasari, S., Geraci, B., Sirkin, M.: The GenVoca model of software-system generators. IEEE Software 11, 89–94 (1994)
18. Biggerstaff, T.: A perspective of generative reuse. Annals of Software Engineering 5, 169–226 (1998)
19. Neighbors, J.: Draco: A method for engineering reusable software systems. In: Software Reusability: Concepts and Models, vol. 1, pp. 295–319. ACM Press, New York (1989)
Building Service-Oriented User Agents Using a Software Product Line Approach

Ingrid Nunes1, Carlos J.P. de Lucena1, Donald Cowan2, and Paulo Alencar2

1 PUC-Rio, Computer Science Department, LES - Rio de Janeiro, Brazil
{ionunes,lucena}@inf.puc-rio.br
2 University of Waterloo - Waterloo, Canada
{dcowan,palencar}@cs.uwaterloo.ca
Abstract. This paper presents an approach to develop service-oriented user agents using the Software Product Line (SPL) engineering paradigm. The approach comprises activities and models to support building service-oriented customized agents that automate user tasks based on service orchestration involving multiple agents in open environments, and takes advantage of the synergy of Service-oriented Architectures, Multi-agent Systems and SPLs. The domain-based process involves extended domain analysis with goals and variability, domain design with the specification of agent services and plans, and domain implementation.

Keywords: Multi-agent Systems, User Agents, Service-oriented Architectures, Software Product Lines, Personalization.
1 Introduction
An agent-based method is often the approach of choice in many of the existing software tools that support applications such as web-based supply-network management, medical-record processing, and e-commerce [1]. These systems are typically open, highly interactive, autonomous and context-aware, and need to support customized and flexible user services. In contrast, Service-oriented Architectures (SOAs) [2] and related approaches are often used to deliver application functionality as reusable services to end-user applications or to build other supporting services. Service-oriented systems follow many of the ideas from research conducted in Agent-oriented Software Engineering (AOSE), but there are several challenges that still need to be faced in terms of their combination [3]. The integration of these two approaches has been used in domains such as e-commerce, in which users have agents acting on their behalf to automate their tasks. However, given that agents represent individuals in these scenarios, there remains a need to personalize an agent to meet specific needs of the users and to support their implementation in an automated way. A potential solution for this problem is the development of families of agents. In this context, SPLs [4] are a software engineering trend that promotes reduced development costs, shorter time-to-market and higher quality, when developing families of systems by the exploitation of the common features among family
members. Existing approaches do not address the problem of generating these user component services in a SPL fashion, and fail to provide reusable multi-agent service components as well as suitable representations and processes that support automated software generation based on common and variable features within a domain. Moreover, most agent-based approaches do not take into account the adoption of extensive reuse practices that bring both reduced time and costs to software development [5]. In this paper we present a domain engineering process-oriented approach to build customized service-oriented user agents using the SPL engineering paradigm. An agent is an encapsulated software system that may offer business services and collaborate with other agents in interesting business service engagements. The approach comprises activities and models to support the development of service-oriented customized agents that automate user tasks based on service orchestration involving multiple agents in open environments such as the Web. Although the idea of providing agents to act on behalf of users has been introduced in previous work [6], there is a clear need to provide personalized agents in large numbers given that they represent individuals. This issue leads us to consider automated generation using a SPL approach. Therefore, our approach takes advantage of the synergy of SOA, Multi-agent Systems (MAS) and SPL. The remainder of this paper is organized as follows. Section 2 gives an overview of our approach, whose phases are detailed in Sections 3 to 5. Section 6 details the derivation process of customized user agents. Section 7 discusses some relevant issues that emerged from our study, and in Section 8 we describe related work. Finally, in Section 9 we present our conclusions.
2 Approach Overview
Our approach aims at defining activities and models to address the development of customized agents, deployed in a MAS. These agents achieve their goals by the execution of plans that may use other agents' services, and they can provide services to other agents or users. The approach incorporates principles and concepts of SOA, MAS and SPL, which are described in the next three paragraphs. Problem Decomposition into MAS Concepts. AOSE provides several concepts for understanding and modeling a complex and distributed system. Each agent of a MAS may be classified from two different perspectives [7]: (i) internally as a software system with its own purpose (intra-agent); (ii) externally as part of a society interacting with other individuals (inter-agent). This classification is illustrated in Figure 1(a). Our approach focuses on developing a single agent to be part of an existing MAS, detailing its internal structure and interaction with other agents. This agent is structured according to the belief-desire-intention (BDI) model [8], which supports modeling cognitive agents and whose advantages include: relative maturity, having been used successfully in large-scale systems; support by several agent platforms, such as Jadex and Jason; and a basis in solid philosophical foundations. Service Analysis and Orchestration. Even though AOSE is based on a powerful abstraction for modeling complex systems, most AOSE methodologies
(a) Canonical view of a complex distributed system: e-Marketplace
(b) Approach Overview
Fig. 1. Approach Overview and its Illustrating Example
focus on the development of closed systems, in which agents are known at design time. A key advantage of SOAs is that they enable services to be selected and integrated dynamically at runtime, thus enabling system flexibility and adaptation. In our approach, we define specific activities for identifying and specifying services provided by agents. Analysis and Implementation Support for Variability. Given that there are agents representing users in the MAS, there is the requirement of representing their specific needs. Our approach aims at addressing a SPL of agents through which we can systematically derive customized agents. Our approach contemplates variability analysis in order to capture variations and different possible configurations of the user agents and provide implementation support to build reusable assets with the aim of allowing automatic derivation of agents. Figure 1(b) depicts the phases and activities performed in our approach, the sequence in which they must be performed and their output artifacts. Activities are categorized into the typical phases of domain engineering processes, and they result in reusable artifacts that support the identified variability. The Agent Derivation and Deployment activity is part of the application engineering process, which enables a systematic assembly of a selected collection of artifacts for building a customized application, which in this case is an agent.
2.1 e-Marketplace Case Study
The e-Marketplace is the case study used to illustrate our approach. Providing applications to automate commerce is one of the application domains of MAS [1]. Currently, commerce is almost entirely driven by human interactions. However, some commercial decision-making can be placed in the hands of agents. Figure 1(a) depicts the overall structure of the case study. It is a MAS in which there are: (i) agents/organizations representing stores that sell products and provide support services; (ii) agents/organizations that support the buying
process, such as credit card companies and PayPal; and (iii) user agents that automate the activities performed by users to buy products. Our focus is not to address the development of the whole MAS. Organizations representing stores and other companies are already deployed in the system and ready to interact and provide services to other agents. Furthermore, this existing MAS already provides an ontology giving a formal representation for a set of concepts, which includes the messages exchanged by agents within the domain and protocols defining how messages are exchanged.
3 Domain Analysis
Given that agents are proactive and have goals, it is natural to consider using goals to describe requirements. Goals capture, at different levels of abstraction, the various objectives the system should achieve. However, SPL engineering is distinguished from single-system engineering by its focus on the systematic analysis and management of the common and variable characteristics of a set of software systems [9]. In SPL approaches, there are typically two ways of modeling variability. One of them is based on the use of the feature model to capture common and variable features and the explicit documentation of variability along SPL artifacts [10]. The other approach, used in [4,9], proposes the use of a separate model to express variability. We have adopted this second method primarily for two reasons: (i) modeling common and variable features in the same model can lead to huge and complex models; (ii) given that our target is to develop agents that follow the BDI model, it is more natural to make a goal-oriented domain analysis instead of using a feature-oriented approach. Thus, in our domain analysis, two activities are performed in parallel and are complementary to each other. The Goal-oriented Domain Analysis is responsible for capturing the SPL goals. This is performed at different levels of abstraction by the goal decomposition. Figure 2(a) illustrates the goal model of the user agent SPL to buy products in the e-Marketplace case study. At certain points of the goal analysis, some variable traits can be identified in the domain. The activity responsible for analyzing and documenting variability within the domain is the Domain Variability Analysis. In order to document variability, we adopted the notation proposed in [4], because it documents only the variable aspects of the SPL. It describes variation points (what varies) and variants (how it varies). In addition, it also contains constraints. These constraints ensure that only valid combinations of the variations are selected. The variability model for the user agent in the e-Marketplace case study is presented in Figure 2(b). In the model, there are two kinds of variation points: (i) the optional variation point is one that may or may not be present in the agent; (ii) the alternative variation point is associated with a set of variants, some of which must be chosen. There is a variation point named Services, because we intend to extend the user agent in the future to incorporate new services for users, but for the moment the only service provided is Buy Product. The variability identified and documented in the Domain Variability Analysis is used for two purposes: (i) to indicate optional and alternative parts that have to
Fig. 2. Approach Artifacts
be supported along all the subsequent models and code assets; and (ii) to provide a way for users to choose the appropriate configuration of their agent. Besides the two reasons presented for documenting variability in a separate model, this practice helps to keep consistency among the models. However, even though this information is not explicitly shown in the goal model, it is important to know what is variable in this model. As a consequence, there is another model that provides traceability links of the variability, ensuring the consistent definition of the common and variable traits of the SPL throughout all artifacts. Our traceability model consists of a mapping between goals and variability expressions (Figure 2(c)). For instance, the goal Pay is optional, given that if the Payment Type is Pay upon Pick up the agent must not pay for the product because the user is responsible for paying when the product is collected at the store. So the valid variability expression for this goal is either !Pay upon Pick up or Credit Card | PayPal.
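To illustrate how such traceability entries can be evaluated during derivation, the sketch below encodes the Pay example as executable checks. The variant and goal names follow the e-Marketplace example, but the classes themselves are our own illustration, not part of the authors' tooling.

```java
// Illustrative sketch only: evaluating variability expressions such as
// "!Pay upon Pick up" or "Credit Card | PayPal" against a user's selection.
// Variant and goal names come from the e-Marketplace example; the classes
// are hypothetical, not the authors' implementation.
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

enum Variant { CREDIT_CARD, PAYPAL, PAY_UPON_PICKUP }

final class TraceabilityModel {
    // Maps each goal to the variability expression that guards its inclusion.
    private final Map<String, Predicate<Set<Variant>>> guards = new LinkedHashMap<>();

    void map(String goal, Predicate<Set<Variant>> guard) { guards.put(goal, guard); }

    boolean included(String goal, Set<Variant> selection) {
        // Goals with no guard are mandatory.
        return guards.getOrDefault(goal, sel -> true).test(selection);
    }

    public static void main(String[] args) {
        TraceabilityModel model = new TraceabilityModel();
        // Pay is included when payment is by credit card or PayPal,
        // i.e., when "Pay upon Pick up" was not selected.
        model.map("Pay", sel -> sel.contains(Variant.CREDIT_CARD)
                             || sel.contains(Variant.PAYPAL));

        Set<Variant> config = EnumSet.of(Variant.PAY_UPON_PICKUP);
        System.out.println(model.included("Pay", config)); // prints false
    }
}
```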
4 Domain Design
In this phase, agents are designed by the identification and specification of their services and plans to achieve goals. The definition of agents that offer business services has several advantages. First, agent services can be discovered and used by other agents. Agents that are in charge of several responsibilities have a set of goals, beliefs and plans to accomplish each of these responsibilities. Nevertheless, no technique is used to provide modularization of these concepts in order to provide a better understanding of their purpose. Besides providing modularization, encapsulating them into a service improves reuse in MASs. Moreover, using this service-oriented loosely-coupled approach brings to business process models a structure that can significantly improve the flexibility and agility with which processes can be remodeled, thus helping to deal with variability. Most MAS methodologies address the development of closed MASs. Thus, discovering agents and their services is not a concern. To deal with open MASs,
Fig. 3. Agent Model
in which agents are not known a priori, our approach is to define the Agent Service Specification activity. Their specification may require the use of other “black-box” services provided by other agents. Based on the goal model generated in the previous activity, we have identified the services provided by the user agent (Figure 3). It can be seen that there are four different atomic services, which are lower level services and do not use other internal services. On the other hand, the Buy Product service is a composite service, given that it is composed of atomic services. Services are considered atomic from an agent internal viewpoint, however these services can use services provided by other agents. After identifying services, their workflows are specified with UML 2.0 activity diagrams (Figure 4). The workflow contains different paths, which are based on the variability of the SPL. The workflow specification shows actions to be performed to achieve service goals and invocations of other agents’ services. In addition, it supports the identification of plans to achieve lower-level goals. The artifact that describes which plans are used to achieve which goals is the Agent model (Figure 3). It is divided into two layers: (i) Goal Layer – it shows agent’s goals and their decomposition into sub-goals. It is structured according to the Goal model; and (ii) Plan Layer – it shows agent’s plans. There are links between goals and plans indicating that a certain plan achieves a certain goal. For some goals, different plans can be used. After specifying an agent’s services and plans, traceability links must be provided between these concepts and the variation points, similar to the way it was done with goals. This is to ensure that, for instance, if the Payment Type chosen is Credit Card, the selected plan for the Check Payment goal is Check Credit Card Acceptance and for the Pay goal is Pay with Credit Card. So the Traceability model is refined in order to incorporate these new mappings. This model facilitates systematic and consistent reuse, and allows the application engineering to be performed efficiently.
5 Domain Implementation
During the implementation of code assets, there is a need to adopt techniques to support the defined variability and allow the derivation of customized agents. We have identified some types of variability and we adopted guidelines to implement
Fig. 4. Buy Service Workflow Specification
them. These guidelines are specific for Jadex, an agent platform based on the BDI model. Jadex supports programming software agents in Agent Definition Files (ADFs) (XML files), and Java. While agent’s beliefs, goals and plans are declared in ADFs, a plan’s body is implemented in Java classes. In addition, Jadex provides the capability concept, which is an encapsulated agent module composed of beliefs, goals, and plans that can be reused wherever it is needed. All goals, whether they are top level goals or sub-goals, need to be declared in the ADF. However, some of these goals are optional and alternative. The condition for the Pay goal to be present in a derived agent is according to the variability expression related to it (Credit Card | PayPal). So, we adopted a translation from variability expressions to tags, which are put in the code in order to make conditional compilation possible (see Figure 2(d)). Goals in the agent model can be achieved by either a plan or through its sub-goals. In the first case, similar to goals, plans can be optional or alternative. As a consequence, we adopted the same strategy used for goals to support the variability in plans. In the second case, a plan is created for dispatching the appropriate sub-goals. The plan is declared in the ADF, and a Java class is created, which extends the Plan class provided by Jadex. Into the overridden body method of the plan, the sub-goals are dispatched. Some of the sub-goals are mandatory; while others can be optional or alternative. In the same way, we adopt tags to delimit variable sub-goals into the plan that dispatches them. The identified agent services with their respective agent concepts are encapsulated into capabilities. In addition, Jadex allows the specification of messages to be sent and received in the ADFs, to be later used in plans. By defining a service in a capability, the goals of the services and messages that it can send and receive are explicitly defined. Moreover, this capability can be easily reused either by agents or other capabilities, and it can be (un)plugged easily as well.
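As a rough illustration of the second case, a dispatching plan might look as follows. The goal names and the tag syntax are our own reading of the e-Marketplace example, and the API identifiers (Plan, createGoal, dispatchSubgoalAndWait) follow the Jadex BDI releases of that period as we recall them; treat them as approximate rather than authoritative.

```java
// Illustrative sketch of a Jadex plan that dispatches sub-goals, with
// comment tags marking an optional fragment for conditional compilation.
// API names are approximate; goal names are hypothetical.
import jadex.runtime.IGoal;
import jadex.runtime.Plan;

public class BuyProductPlan extends Plan {

    public void body() {
        // Mandatory sub-goal: locate a store agent offering the product.
        IGoal search = createGoal("search_product");
        dispatchSubgoalAndWait(search);

        // Mandatory sub-goal: negotiate and confirm the purchase.
        IGoal purchase = createGoal("purchase_product");
        dispatchSubgoalAndWait(purchase);

        // <variability expr="CreditCard | PayPal">
        // Optional sub-goal: only present when the derived agent must pay
        // electronically; removed when "Pay upon Pick up" is selected.
        IGoal pay = createGoal("pay");
        dispatchSubgoalAndWait(pay);
        // </variability>
    }
}
```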
6 Application Engineering
The reusable infrastructure generated in previous activities is used in the Agent Derivation and Deployment activity, which is part of the application engineering
process. Based on the models and code assets generated in previous activities, it is possible to select and customize the code assets manually according to a configuration of the variability model and thus deploy the derived agent. However, we are working toward developing a tool to automate this process. The first version of the tool is specific to the e-Marketplace case study. Basically, our tool selects and provides conditional compilation for the code assets. Traditional conditional compilation could not be used because the Jadex platform defines agents based on XML files. In addition, the tool has a web-based interface through which the user can configure an agent and deploy it into the e-Marketplace MAS. The agent derivation process and deployment comprise the following steps: (i) the user describes through the web application interface the product that he wishes to buy and the agent configuration. The agent configuration is made through the selection of optional and alternative variation points and variants; (ii) with the data in step (i) and the traceability model, our tool selects the appropriate code assets; (iii) the tool manipulates the code assets in order to derive customized code for the agent. The tool translates the selected variation points and variants to tags used in the code, and removes optional and alternative fragments of the code that were not selected; (iv) the classes are compiled; and (v) the agent is instantiated and deployed into the e-Marketplace MAS using the Jadex platform.
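Step (iii) can be pictured as a simple tag-driven filter over the code assets. The sketch below shows one possible way to strip unselected fragments; the tag syntax and class names are ours, chosen only for illustration, and only the disjunctive ("|") case is handled.

```java
// Hypothetical sketch of step (iii): removing optional/alternative fragments
// whose variability tags do not match the selected variants.
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AssetDeriver {
    // Matches blocks of the form <variability expr="..."> ... </variability>.
    private static final Pattern BLOCK = Pattern.compile(
            "<variability expr=\"([^\"]+)\">(.*?)</variability>", Pattern.DOTALL);

    /** Keeps a tagged fragment only if its expression names a selected variant. */
    static String derive(String asset, Set<String> selectedVariants) {
        Matcher m = BLOCK.matcher(asset);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String expr = m.group(1);
            String body = m.group(2);
            boolean keep = false;
            for (String alternative : expr.split("\\|")) {
                if (selectedVariants.contains(alternative.trim())) {
                    keep = true;
                    break;
                }
            }
            m.appendReplacement(out, Matcher.quoteReplacement(keep ? body : ""));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String adf = "<goal name=\"buy\"/>"
                + "<variability expr=\"CreditCard | PayPal\"><goal name=\"pay\"/></variability>";
        System.out.println(derive(adf, Set.of("PayUponPickup"))); // pay goal removed
    }
}
```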
7 Discussion
In this section, we discuss some relevant issues that arose during our study, which gives directions for future work. Granularity Levels. Granularity in SPLs refers to the degree of detail and precision of a variability as produced by a design or implementation element. In our approach, we have considered variability at three different levels of granularity: (i) goals, (ii) plans; and (iii) services. Services correspond to coarse-grained variability. They can be orchestrated in different ways in order to provide a composite service. In addition, services can present some fine-grained variability as realized by alternative or optional goals and plans. As an evolution of the approach presented here, we aim at exploring other granularity levels, such as agents and roles. Dynamic Agent Adaptation. Because of the dynamic nature of the Web, the availability of any resource cannot be guaranteed at any single moment. One of the main advantages of SOAs is to promote the automation of flexible and highly adaptive business processes through the orchestration of loosely coupled services. In the e-Marketplace, flexibility is provided in the sense that the user agent discovers store agents to achieve their Buy Product goal; but adaptation in this process is not yet explored. Furthermore, we aim at providing a dynamic adaptation of the user agent in order to incorporate new services and change existing ones. Tool Support. The derivation process of user agents in the e-Marketplace case study is accomplished by our application-specific tool. However, there is
a general-purpose model-based derivation tool, GenArch [11], which was developed in our research group and was recently extended to incorporate a new domain-specific model that addresses the Jadex platform [12]. The main advantage of this model is that it allows mapping features to higher-level concepts, e.g., agents and plans, so that they do not become diffused into the code as occurs with a generic tool. This GenArch extension is not fully compatible with our approach; however, we are aiming to combine the two approaches in order to automate the derivation process.
8 Related Work
In this section we describe approaches to personalization of systems that use agent-based solutions, and approaches that promote reuse in MASs. A personalized recommendation system with multi-agents based on web intelligence is proposed in [13]. The approach provides an intelligent user agent, which is in charge of interacting with the users and receiving feedback. The development of a personalized time manager, represented by an agent, is addressed in [14]. The agent is designed to assist a human user in managing her time commitments. Both approaches provide personalized assistance for users by means of agents that use artificial intelligence techniques. The idea is to use cognition to provide customized information, but the agent configuration and behavior are not adjusted to user needs, as our approach proposes. Recent research proposes the use of SPL techniques to promote reuse in MASs. [15] proposes an approach that uses goal-oriented requirement documents, role models, and traceability diagrams to build a first model of the system. In the approach, variabilities are analyzed after modeling the MAS, and this may lead to high coupling between mandatory and optional variabilities and their inadequate modularization. An extensible agent-oriented requirements specification template for distributed systems that supports safe reuse is proposed in [16], but this approach covers only the requirements engineering phase. A domain engineering process for Multi-agent Systems Product Lines (MAS-PLs) is presented in [17]; however, it is a feature-oriented process and focuses on deriving closed MASs.
9 Conclusion
Inspired by human societies, several service-oriented multi-agent applications are developed by representing users as agents, which act on the users’ behalf and automate their tasks. The approach presented in this paper advances the development of these service-oriented user agents by providing a systematic method to derive customized versions of the agents. Additionally, it increases the application domain of SPLs and promotes reuse in MASs. The main goal of the approach is to tailor service provision to the preferences and circumstances of the user requesting the service. The domain-based process proposed involves extended domain analysis with goals and variability, domain design with the
specification of agent services and plans, and domain implementation. The approach takes advantage of the interplay of SOA, MAS and SPL, which are not mutually exclusive, but complementary, as shown by our approach. This paper addresses ongoing research on the customization of service-oriented MASs and the integration of SOA, MAS and SPL. We aim to extend our approach in several directions, which includes the provision of dynamic adaptation of agents and the integration with the GenArch tool to support automation of the derivation process. In addition, we intend to use our approach in other case studies in order to provide further evaluation of it.
References
1. Jennings, N.R., Wooldridge, M.: Applications of intelligent agents. In: Agent technology: foundations, applications, and markets, pp. 3–28. Springer, Heidelberg (1998)
2. Erl, T.: Service-Oriented Architecture: Concepts, Technology, and Design. Prentice-Hall, Englewood Cliffs (2005)
3. Huhns, M.N., et al.: Research directions for service-oriented multiagent systems. IEEE Internet Computing 9(6), 65–70 (2005)
4. Pohl, K., Böckle, G., van der Linden, F.J.: Software Product Line Engineering: Foundations, Principles and Techniques (2005)
5. Girardi, R.: Reuse in agent-based application development. In: SELMAS 2002 (2002)
6. Krulwich, B.: Automating the internet: Agents as user surrogates. IEEE Internet Computing 1(4), 34–38 (1997)
7. Zambonelli, F., Jennings, N., Omicini, A., Wooldridge, M.: Agent-Oriented Software Engineering for Internet Applications, pp. 326–346 (2000)
8. Rao, A., Georgeff, M.: BDI-agents: from theory to practice. In: ICMAS 1995 (1995)
9. Muthig, D., Atkinson, C.: Model-driven product line architectures. In: SPLC 2, pp. 110–129 (2002)
10. Gomaa, H.: Designing Software Product Lines with UML: From Use Cases to Pattern-Based Software Architectures. Addison-Wesley, Reading (2004)
11. Cirilo, E., Kulesza, U., Lucena, C.: A Product Derivation Tool Based on Model-Driven Techniques and Annotations. JUCS 14, 1344–1367 (2008)
12. Cirilo, E., Nunes, I., Kulesza, U., Nunes, C., Lucena, C.: Automatic product derivation of multi-agent systems product lines. In: SAC 2009 (2009)
13. Huang, L., Dai, L., Wei, Y., Huang, M.: A personalized recommendation system based on multi-agent. In: WGEC 2008, pp. 223–226 (2008)
14. Berry, P., Peintner, B., Conley, K., Gervasio, M., Uribe, T., Yorke-Smith, N.: Deploying a personalized time management agent. In: AAMAS 2006, pp. 1564–1571 (2006)
15. Pena, J., et al.: Building the core architecture of a multiagent system product line: with an example from a future NASA mission. In: Padgham, L., Zambonelli, F. (eds.) AOSE VII / AOSE 2006. LNCS, vol. 4405, pp. 208–224. Springer, Heidelberg (2007)
16. Dehlinger, J., Lutz, R.R.: A Product-Line Requirements Approach to Safe Reuse in Multi-Agent Systems. In: SELMAS 2005 (2005)
17. Nunes, I., Lucena, C., Kulesza, U., Nunes, C.: On the development of multi-agent systems product lines: A domain engineering process. In: AOSE 2009 (2009)
DAREonline: A Web-Based Domain Engineering Tool

Raimundo F. Dos Santos and William B. Frakes

Virginia Polytechnic Institute and State University, Computer Science Department
7054 Haycock Rd, Falls Church, VA 22043 USA
{rdossant,wfrakes}@vt.edu
Abstract. DAREonline is a web-based tool for domain engineering. It supports the DARE framework in a centralized platform-independent environment. Our approach leverages concepts of Service-Oriented Architecture (SOA) to aggregate data and functionality from diverse sources that can be helpful in domain engineering. In this paper, we describe DAREonline's architecture and implementation, and its use in a graduate course on software design and quality. Initial results indicate that DAREonline can be a valuable resource for domain analysts and can achieve acceptance at similar levels to DARE COTS.
1 Introduction

DARE – Domain Analysis and Reuse Environment – is a comprehensive framework that assists analysts of different levels in understanding the reuse potential of software systems. Complex software arises as a consequence of growth in private corporations, government, and academic circles. Because advanced technology tends to become more accessible over time, there is a strong demand for systems that can accomplish more in more efficient ways, and at a lower cost. Software reuse meets many of these needs: it leverages existing systems by pointing out common functions as well as varying behavior; promotes the utilization of shared libraries that have been optimized; shortens the software lifecycle; and lowers the overall burden (human, financial, or otherwise) associated with creating a system. Effective reuse comes about when system stakeholders perform proper domain engineering for an intended target environment. For this purpose, DARE provides facilities where domain information can be collected from technical files, architectural blueprints, and source code, among other documents. Along with that, human experts and exemplar systems complete an essential list of resources targeted by DARE. The original version of DARE [1] was designed on a UNIX platform with the C language. It implemented the concept of a domain book with text manipulation tools for lexical analysis, term clustering, word frequency calculations, synonym definitions, etc. The second prototype was conceived under Windows 3.1 using Visual Basic 3. This Rapid Application Development (RAD) environment allowed more flexibility and power with graphical elements while easing the development effort previously encountered in the C-Unix platform. The second version not only included the original domain book capability and text tools, but also incorporated several new pieces of functionality, such as forms for domain experts, facet and feature tables, and
an editor for architectural diagrams. The first attempt at an online version of DARE was proposed by Alonso and Frakes [7]. However, the most current DARE approach uses commercial off-the-shelf (COTS) tools as it allows users the greatest flexibility in choosing the most appropriate tools to assist with domain analysis [8]. With the advent of Service-Oriented Architecture (SOA), the next logical step is to introduce an online version of DARE. In the past few years, SOA has enabled the reuse of services across distributed providers, allowing the sharing of data and functions among enterprises. Distributed computing has been identified as a research topic for software reuse [13]. Current internet technologies have made it easier to implement a domain analysis system with many advantages: platform and OS independent functions; standard cross-browser implementation; centralized deployment; easy maintenance of existing functions; and fast go-live of new releases. DAREonline avoids some of the disadvantages of the initial versions, such as being tied to a specific platform or development language. In fact, as web technologies allow, it can support a wide array of software solutions from competing vendors, mashup functions from independent contributors, server and client-side processing, and service-oriented architectures for access to external systems that are SOA compliant. However, as with any web solution, DAREonline still faces some of its own challenges. At a minimum, it must implement the basic original functions of previous versions while maintaining the same level of utility and ease of use; it must overcome data processing limitations from service providers; and it must overcome security limitations presented by various computing technologies. One of the goals of DAREonline is to leverage open-source tools that can accomplish tasks around domain analysis. Of particular interest is the utilization of web services and their reuse potential. Moreover, DAREonline serves as a repository where analysts can store information during a domain analysis project. In the following sections, we describe the architecture of DAREonline. Section 2 presents the overall design of the framework, while providing details of the underlying implementation. An evaluation is given in Section 3, outlining survey results from a group of students. In Section 4 we give directions for future work where DAREonline may benefit the most according to current limitations, and present our conclusions.
2 Design and Implementation

Some of the most desirable characteristics of a web-based framework are its ease of use, completeness, and self-guidance. Ease of use requires simplicity while at the same time providing efficient functionality. Simplicity encourages usability, but runs against increasing complexity in modern software. DAREonline incorporates a menu-driven flow with which most parts of the framework can be accessed and jumped into and out of quickly. Completeness dictates that the system contains most of the tools necessary to accomplish a domain analysis study. Refactoring DARE as a web application was carried out over a 7-month period. Most of the effort was concentrated around designing the object model. As different approaches were attempted, many changes were made to account for required relationships and to avoid duplication of work (e.g., establishing a one-to-many relationship between systems and experts). The Java implementation and database were less cumbersome once the object model
became stable. Defect resolution covered 3 categories: user interface and validation (40% of issues), Java logic (40%), and database operations (20%). As DAREonline evolves, it will undergo functional enhancements, but as currently set, it already provides several tools for domain analysis, which are described below. In addition, another goal of this system is to provide a certain amount of guidance to the user. This can be accomplished with informational notes that appear on different pages as well as with messages that alert the analyst about missing components, next steps, and best practices. Domain Engineering is a rich field that cannot be covered in one place. As an efficient alternative, we describe below our online tool for domain analysis.

2.1 Object Model

DAREonline employs common concepts of domain engineering as a foundation for its components. As mentioned previously, DARE uses the analogy of a domain book where each chapter of the book is filled in as the analysis process evolves. The parent component is a domain analysis object, which serves as a reference around which other components can aggregate. A domain analysis allows many children and grandchildren components depending on how extensively the analysis process is done. Thus, if the analysis is about stemming algorithms [10], it may include several systems addressing different stemmers, such as Porter, Paice, and Lovins. Domain analyses often use techniques of information retrieval that are helpful in understanding specific vocabularies [2]. Figure 1 provides a snapshot of the DAREonline object model. The arrows denote a "has" relationship. For instance, a domain analysis has a scope, which has a statement in addition to one or more equations. The larger boxes represent collections of objects. In this paper, we use the word "component" to denote any part that composes the overall system, whether it be an object or just an attribute of an object. A domain analysis uses the scope object to establish coverage of what can and cannot be done in a particular instance of the domain engineering process [9]. The statement is a textual definition that briefly explains the purpose and limits of the software systems in question, and what it attempts to accomplish. Equations are derived as a means of relating variables, functions, and sets to establish a logical representation of the domain in question. For example, the goal of Information Retrieval (IR) is to search for and find documents according to some search criteria. Given the set of all documents D in a corpus, the IR system uses a translation function M to find D' documents according to query Q. The equivalent equation can be represented as

D' = M(D, Q)    (1)
DAREonline allows the analyst to store the variables, functions, and sets above in order to establish equations for a particular domain bounded by its scope. There is no limit to the number of equations that can be represented. In addition, more can be added or deleted as necessary, or edited on the go. Systems represent the next component of the domain book implemented by DAREonline. Apart from scope, systems should be defined early in the process, possibly before other analysis tasks go too far. Most of the other components (e.g., documents and experts) must have a direct relation to at least one of the systems.
Fig. 1. DAREonline Object Hierarchy
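To make the "has" relationships of Figure 1 concrete, the sketch below models a fragment of the object hierarchy in plain Java. The class and field names are our own reading of the description in this subsection, not DAREonline's actual source code.

```java
// Illustrative sketch of part of the object hierarchy of Figure 1.
// Class and field names are our own guesses, not the tool's real code.
import java.util.ArrayList;
import java.util.List;

class Scope {
    String statement;                                   // brief textual definition
    final List<String> equations = new ArrayList<>();   // e.g., "D' = M(D, Q)"
}

class Expert {
    String name;
    String position;
    String familiarity;
    int yearsOfExperience;
}

class SystemEntry {                                     // one exemplar system
    String name;
    String description;
    final List<String> documents = new ArrayList<>();   // document sources
    final List<Expert> experts = new ArrayList<>();
}

class DomainAnalysis {                                  // the parent component
    Scope scope = new Scope();
    final List<SystemEntry> systems = new ArrayList<>();

    boolean hasEnoughExemplars() {
        // At least three exemplar systems are generally expected.
        return systems.size() >= 3;
    }
}
```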
DAREonline does not restrict the analyst in the number of systems that must be included. In domain engineering, it is generally accepted that at least three exemplar systems be specified for the domain analysis to proceed. At the object level, a system is simply composed of a name, a free-text description, and an identifying number. Its other aspects are given deeper in the object hierarchy by the other components, such as document sources. DAREonline has two form-based components. The first one, system characteristics, describes aspects such as architectural style, QA methods, and the types of users they target, and is given by Frakes [1]. Forms are collected on a one-per-system basis, allowing the analyst to quickly toggle among different ones for a quick snapshot of what each system entails. The second form-based component is related to experts. System experts represent an essential source of information in the domain engineering field, and therefore must be identified according to their qualifications and relevance to the system in question. For each named professional, the expert object tracks a name, position, familiarity, and years of experience with the system. Systems and experts are directly linked to one another. The final component of the DAREonline object hierarchy is the document. Throughout the domain analysis process, analysts are bound to find a wide variety of files describing various systems, such as feature tables, activity logs, and source code. Along with knowledge conveyed by system experts, documentation presents in detail the inner workings of software systems. For each system, DAREonline allows the analyst to select among predefined types of documents and create a compilation of the sources of information that have been found during the domain analysis process.

2.2 Tools

The DAREonline object model described above is not the only piece that makes up the domain book. There are also several browser-based programs of different capabilities. The original DARE framework provides facilities for the domain analyst to perform tasks such as creating architectural diagrams, vocabulary analysis, and facet tables.
Similarly, DAREonline implements several tools that mimic those capabilities, albeit not on a one-to-one basis. More information, including details of the tools in [3], [4], and [5], can be obtained at http://208.29.54.207:8080/dareonline/tools.htm.
Fig. 2. DAREonline 3-tier Architecture
The JavaScript modules, namely lexical analysis, word count and frequencies, and stemming, are fully contained within the JSP pages and do not require any additional download by the user. This is a desirable feature of the system, as DAREonline was envisioned to be as self-contained as possible. We also strive to reuse components in order to avoid any extra network traffic or processing overhead on either the client or the server side. The Synonyms module, however, demands more of the overall infrastructure. It is set up as a standard web service that allows similar words to be found as in a thesaurus, and it activates the following process. When a query is issued from the browser, an AJAX request is submitted to an intermediary JSP page. In turn, this JSP page retrieves the query parameters from the incoming request and issues a SOAP call to the web service. Internally, the web service calls a stored procedure in the database, which then returns a list of words corresponding to the synonyms (or antonyms), as requested by the original call. The next set of tools available under DAREonline makes use of Sun's Java Web Start technology [6] and the Java Network Launch Protocol (JNLP). This format allows programs to be downloaded to the client's machine, installed, and executed in a single step. The download occurs only on the first access, not on subsequent uses. DAREonline utilizes four such programs: Javadocs, FindBugs, CPD, and diagrams. Because these pieces of software are actually installed on the machine (as opposed to
code running solely on the browser), they have full use of the Java Runtime Environment and its powerful capabilities. Moreover, because Java implements a strong security model [11], Java Web Start programs raise fewer security concerns, such as accessing the local file system to store and retrieve files. Java Web Start programs can be easily incorporated into the DAREonline application as we find third-party software that is deemed useful in domain engineering.
2.3 Data Model
DAREonline is implemented on a MySQL 5.1 database, which offers robust functionality and is open source. MySQL 5.1 supports stored procedures for data manipulation. In addition, it is released for both Windows and Unix/Linux environments, which is helpful for developers on a standalone Windows computer, as well as for a production environment on Unix/Linux. Our production site uses Ubuntu 8.04. The DAREonline data model is composed of seven tables. There are currently 15 stored procedures that handle various tasks, such as querying for synonyms, storing fields, and deleting data items (tables are not accessed directly from the web server). There are several advantages to this approach: separation of SQL scripts from Java classes; better database response since code is precompiled; easier identification of bugs; and changes that can be made in one module without affecting others. Since the initial release of DAREonline there have been several modifications to the data model in order to accommodate the relationships among named systems, system characteristics, documents, experts, and other components. Since this is an evolving application, it is expected that more changes will be implemented to enhance existing capabilities as well as to introduce new functions. The DAREonline data model was designed with scalability in mind. The tables are kept simple so that queries remain light in terms of processing. While this approach may increase the number of joins in some queries, it has proven efficient with the current setup. Figure 2 displays a logical model for the relationships among JSP front-end pages, Java classes, and stored procedures. Note that there is a strong correspondence between JSP pages (which provide the GUI), Java classes (which provide the business logic), and stored procedures (which provide the data). The bidirectional arrows denote the flow of information across the different layers. Within the application, most of the queries are related to one or more systems. Therefore, the systems table is widely accessed across the site. Any domain analysis should have no fewer than three exemplar systems and, ideally, should have several others. Each system can be linked to several document sources (in fact, there is no limit to this number). In addition, each system can be linked to any number of experts and characteristics. As can be seen, a simple domain analysis can be made up of an increasing number of data items (systems, experts, characteristics, documents, etc.) that the database must handle efficiently. In order to keep query processing fast, we have limited the number of related fields and foreign keys. While systems are related to most other components of DAREonline, there are fields that do not share a direct link with any other objects. For example, systems are related to experts and systems are related to documents. However, there is no direct link between documents and experts.
Table 1. DAREonline Performance under Different Data Loads

# Domain analyses | # Systems per domain analysis | # Experts per system | # Documents per system | # System characteristics per system | Query time + page transition (secs)
300   | 3   | 5   | 25   | 2  | 25
400   | 4   | 6   | 50   | 3  | 30
500   | 5   | 6   | 50   | 4  | 33
600   | 6   | 7   | 60   | 5  | 38
700   | 7   | 8   | 70   | 8  | 41
800   | 8   | 8   | 80   | 10 | 42
900   | 10  | 9   | 100  | 10 | 50
1000  | 15  | 20  | 120  | 12 | 62
3000  | 20  | 50  | 300  | 15 | 80
5000  | 50  | 80  | 500  | 20 | 98
10000 | 100 | 100 | 1000 | 33 | 360
Note that this idea applies at the database level, but may not necessarily hold in terms of domain engineering. The domain analyst may well want to associate experts and documents, though we have not implemented it that way. Avoiding this relationship prevents multi-join queries (e.g., systems --> experts --> documents) and allows us to perform quicker database input/output (I/O) on a maximum of two tables at a time. In terms of performance, our synonym queries on the WordNet database comprise by far the heaviest data processing task of the entire system. This comes about as a consequence of two factors. First, WordNet is split into several large tables: synsets, hypernyms, hyponyms, and antonyms, all of which represent in excess of 250,000 records (in fact, WordNet is much larger than that, but we only use a smaller portion of it). Second, we made a design decision to query all four tables in a single statement and aggregate all items in a single result set. This extra processing takes longer to complete, but has the advantage of presenting the user with a single operation, instead of forcing the user to perform several queries. Current searches can vary considerably in time. For instance, the query for the word "extensive" takes no more than 2 seconds to complete (it only returns 1 record), while the query for "system" takes approximately 15 seconds (it brings back in excess of 120 records). In all, performance is considered acceptable as we have tested the scenarios in Table 1. For example, the first line of the table indicates that when the system has accumulated 300 domain analysis projects (say, from different
users), a single user should be able to open one of his/her domain analysis projects and view all of its items in approximately 25 seconds (query time + page transition). This number reflects the amount of time to display the page with all of its associated items, but may be higher when users decide to spend time on each page. This number also assumes that the 300 projects each comprise 3 exemplar systems, 5 experts per system, 25 document sources per system, 2 system characteristics, and 1 scope statement. Graph 1 shows that the query time for the site grows significantly with an increasing number of domain analysis projects. With 10000 domain analysis items, it would take the user approximately 360 seconds to open and view one single project among those 10000. On closer inspection, however, it becomes apparent that the number of exemplar systems is the factor that dictates performance, rather than the overall number of domain analysis projects that users have introduced into the system. This can be explained by the fact that systems can carry a heavy load of associated items, such as names of experts and lists of document sources, as mentioned previously. Our server runs Linux as the OS and Apache Tomcat 5.5 as the web server. Both the web pages and the database are currently hosted on the same machine.
Graph 1 - Query Time
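The synonym lookup discussed above is the heaviest query in the system. The JDBC sketch below shows how such a single-statement call might look from the Java side; the stored procedure name, its signature, and the connection settings are assumptions made for illustration and are not the actual DAREonline schema.

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class SynonymLookup {
    // Hypothetical stored procedure that unions the synset, hypernym, hyponym, and
    // antonym tables and returns one aggregated result set, as described in the text.
    public static List<String> lookup(String word) throws Exception {
        List<String> related = new ArrayList<>();
        try (Connection con = DriverManager.getConnection(
                 "jdbc:mysql://localhost:3306/dareonline", "user", "password");
             CallableStatement call = con.prepareCall("{call find_related_words(?)}")) {
            call.setString(1, word);
            try (ResultSet rs = call.executeQuery()) {
                while (rs.next()) {
                    related.add(rs.getString(1));   // one related word per row
                }
            }
        }
        return related;
    }
}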
3 Evaluation
Fig. 3. DAREonline Activity Flow
DAREonline was initially tested in a graduate software quality and design class at Virginia Tech. A group of 49 students were given access to the site and instructed to use it in its full capacity, or to just try parts of it as a helping tool while using DARE COTS. Of the 18 respondents, 25% had no more than 1 year of computer
programming experience, 37% had between 1 and 3 years, 13% had 3 to 5, and 25% had more than 5. As this was the first release of DAREonline, some issues had already been documented and certain enhancements had already been envisioned, though not yet worked on. Our goals were to: understand whether the site flow was consistent with the steps an analyst would take while performing domain analysis; verify how easy it was to understand the various pieces of functionality; and obtain an overall satisfaction rating from participants. As shown in Figure 3, the initial page of DAREonline was designed as a sequence of steps that we believed would "make sense" in a domain analysis project. While a domain analysis project can be performed in many different orders, this flow proposes a top-down organization in which the overall scope of the domain should be established early on. The underlying idea is that the analyst should have a good grasp of what the domain in question does or does not do before moving on to the next steps. Our previous study had shown that improper scoping led to serious problems in the domain analysis [12]. However, it must be said that DAREonline does not programmatically enforce that users establish a domain scope as the first step. In this sense, DAREonline takes a "passive" role since users are free to go about the site without any predefined order. One limitation, however, has to do with exemplar systems. A system is a parent element of a document, an expert, or a system characteristic. Therefore, at least one system must be created before any of the aforementioned components are created. This explains why Systems are the second element of the flow in Figure 3. Identifying documentation for software systems is an important part of domain engineering. Documents such as logs, diagrams, and source code can be used with the vocabulary and code analysis tools to mine many pieces of information, such as repeated code structures, common classes, and word frequencies. When some of these items have been identified as features, they can be plugged into feature or facet tables and saved for later use. The last item of the flow is a diagramming tool.
Graph 2 - Survey Responses
Graph 3 - DARE COTS Ratings
In our study, there is no clear method to evaluate the usefulness of the proposed flow. We therefore investigated the number of data items that were created in order to see the extent of usage, as it may provide clues about success or failure. In a period of 30 days, there were 1,780 hits to DAREonline pages that resulted in the creation of 97 domain analysis projects. Out of those, 31 were "empty stubs", meaning that the user created the initial project but did not follow through with it (nothing was added). This points to users who were possibly trying out the system for the first time and experimenting with it. Further, there were 55 projects that were "incomplete", meaning that users added certain data points, but not others. We consider "incomplete" any project that lacks one or more of the following: at least one exemplar system, at least one system characteristics form, at least one piece of documentation, and at least one expert. Finally, 11 projects were created with a minimum number of the items above and are considered "complete". Due to the high number of empty or incomplete projects, we can speculate that most users employed DAREonline as a guiding tool rather than the principal tool for their analysis. DAREonline is clearly more restrictive than DARE COTS, but may be more assistive. In addition, being a new tool, it requires a certain amount of getting used to, which may at first discourage usage. Class participants were asked to grade DAREonline in terms of its ease of use, functionality, and how much it helps in understanding the DARE framework (Graph 2). Most participants agreed that DAREonline is helpful in pointing to what needs to be done, as all answers fell in the fair, good, and excellent categories. As for functionality, the results reveal a certain amount of satisfaction with the various tools, but leave room for much improvement. This was expected. All users rated functionality fair or good. In terms of ease of use, the results are scattered across the board: 12% rated it poor, 12% excellent, with the remainder (38% each) finding ease of use either fair or good. A closer look at the vocabulary analysis tools (data not shown in graphs) indicates that 62.5% of users found the lexical analysis tool to be good, while the word counter received 50% and the stemmer 37.5%. The synonyms tool was less utilized, as 37% did not report having used it. The remainder conveyed mixed satisfaction ranging from poor to good almost equally. When compared with DARE COTS (Graph 3), DAREonline's overall satisfaction falls somewhat short. DARE COTS receives high grades possibly due to its flexibility in the choice of tools that can be used, among other reasons. Users suggested that some tools should be integrated with one another (e.g., allowing the output of one tool to serve as the input to another, in order to facilitate vocabulary analysis). In addition, they reported that DAREonline could benefit from online examples of how to perform various actions.
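The empty/incomplete/complete classification used above follows directly from the rule stated in the text; the short Java sketch below simply restates that rule (the method and parameter names are ours, not part of DAREonline).

// Restatement of the completeness rule used in the evaluation (illustrative only).
class ProjectStatus {
    static String classify(int systems, int characteristicsForms, int documents, int experts) {
        if (systems == 0 && characteristicsForms == 0 && documents == 0 && experts == 0) {
            return "empty stub";   // project created, but nothing was added
        }
        if (systems >= 1 && characteristicsForms >= 1 && documents >= 1 && experts >= 1) {
            return "complete";     // at least one of each required item is present
        }
        return "incomplete";       // some data points added, others missing
    }
}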
4 Conclusions and Future Work
Over time, the DARE framework has been implemented on different operating systems (e.g., Windows and Linux) and supported by several programming languages (e.g., C and VB). We now present DAREonline, a web-based application that assists users in implementing domain engineering tasks under the DARE framework. More specifically, we take the domain book concept and create a centralized environment with tools for domain analysis. In addition, we propose the use of web services to
obtain data and functionality externally. We target SOA components for their strong reuse potential across distributed environments. Our goal is to provide a repository of functions that is helpful in domain analysis while maintaining ease of use throughout the application. DAREonline is designed with open-source tools, some of which are developed and maintained in-house, and others provided by third parties. DAREonline has certain advantages over previous versions: platform independence, as it requires only a browser and a standard Java Runtime Environment; centralized deployment when changes are necessary; easy integration of new tools; a structured and more assistive interface; and fast data processing. This paper describes the object model, the data model, and the tools that currently support DAREonline. This first release was made available to Virginia Tech students, who rated it according to its ease of use, functionality, and helpfulness in understanding the DARE framework. While results were mixed, we envision many enhancements that can be introduced in order to make DAREonline more robust. For example, we plan to investigate how workflow languages (e.g., EBPL, WSFL) can be leveraged to identify and schedule the execution of reusable services as part of our SOA research. We must also understand how to assure software quality given that web services are often disconnected components over which service consumers have little control. In addition, we plan to introduce upload/download capabilities for documents as well as a search tool. We continuously examine tools that can be added to the repository, such as software for code analysis. Our initial results motivate us to continue pursuing intelligent options to support the DARE framework.
References
[1] Frakes, W., Prieto-Diaz, R., Fox, C.: DARE: Domain Analysis and Reuse Environment. Annals of Software Engineering 5, 125–141 (1998)
[2] Frakes, W.B., Baeza-Yates, R.: Information Retrieval: Data Structures and Algorithms. Prentice-Hall, Englewood Cliffs (1992)
[3] http://www.comp.lancs.ac.uk/computing/research/stemming/index.htm (last accessed on May 09, 2009)
[4] http://findbugs.sourceforge.net/ (last accessed on April 12, 2009)
[5] http://www.jfimage.com/ (last accessed on May 06, 2009)
[6] http://java.sun.com/developer/technicalArticles/Programming/jnlp/
[7] Alonso, O., Frakes, W.B.: DARE-Web: Domain Analysis in a Web Environment. In: AMCIS Americas Conference on Information Systems, Long Beach, CA (2000)
[8] Frakes, W., Prieto-Diaz, R., Fox, C.: DARE-COTS: A Domain Analysis Support Tool. In: XVII International Conference of the Chilean Computer Society, Valparaiso, Chile, pp. 73–77. IEEE Press, Los Alamitos (1997)
[9] Frakes, W.: A Method for Bounding Domains. In: IASTED International Conference on Software Engineering and Applications, Las Vegas (2000)
[10] Paice, C.: An evaluation method for stemming algorithms. In: Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland (1994)
[11] Java Security Architecture, http://java.sun.com/j2se/1.4.2/docs/guide/security/spec/security-specTOC.fm.html (last accessed on May 07, 2009)
[12] Frakes, W., Harris, D.: Domain Engineering: An Empirical Study, http://eprints.cs.vt.edu/archive/00000936/01/techReportDomainEngineering.htm (last accessed on May 09, 2009)
[13] Frakes, W., Kang, K.: Software Reuse Research: Status and Future. IEEE Transactions on Software Engineering 31, 529–536 (2005)
Extending a Software Component Repository to Provide Services
Anderson Marinho¹, Leonardo Murta², and Cláudia Werner¹
¹ PESC/COPPE – Universidade Federal do Rio de Janeiro, Rio de Janeiro – RJ – Brazil
{andymarinho,werner}@cos.ufrj.br
² Instituto de Computação – Universidade Federal Fluminense, Niterói – RJ – Brazil
[email protected]
Abstract. A software component repository is a tool that supports component-based development (CBD). Generally, most of the existing repositories only provide components physically, not considering another important way of providing them: as services. Therefore, this paper presents an infrastructure for providing component services in a component repository. This infrastructure deals with access control, utilization monitoring, component-to-service conversion, and composite service generation issues. Keywords: CBD, SOA, component, service, repository, pricing.
1 Introduction
Software reuse is the discipline responsible for creating software systems from existing software [1]. This discipline is considered a way of achieving better indicators of quality and productivity in the construction of complex systems [2]. One of the paradigms used is Component-Based Development (CBD), which aims at the development of interoperable components to be reused when building new systems [3]. In CBD, an important tool is a component repository, which consists of a place for publication, storage, search, and retrieval of components [4]. Normally, components in a repository are physically distributed. This means that, in order to use a component, it is necessary to download its files and deploy them on a specific platform. However, in some scenarios this kind of usage has drawbacks, such as low interoperability, which makes component reuse difficult across different platforms. Moreover, different organizations are adopting outsourcing, transferring their development and production infrastructures to third parties. To overcome these issues, an alternative is to distribute components as services. Services solve interoperability and outsourcing issues since they are not executed on the requester's machine, but on a remote provider [5]. Therefore, there is no need to deploy components on specific platforms. Services also use standard languages and protocols, which are common in any information technology environment. Moreover, services have been considered a major step toward distributed [5] and on-demand [6]
computing. Regarding the former, services may be offered by different enterprises over the Internet, constituting a distributed computing infrastructure. From that infrastructure, new applications can be constructed via the collaboration of different enterprises. Regarding the latter, processing will not be done locally but purchased as services offered on the Web, just like electricity or communication services [6]. For that reason, repositories should be prepared for market needs and be extended to provide components not only physically but also as services, which broadens the reuse perspective. However, this evolution must be made carefully so that services are provided in a similar way to components. Hence, some issues must be considered, such as access control, utilization monitoring, contract establishment, and pricing, among others, which are the basic functionalities in a repository. Even though these functionalities are relatively easy to apply to components, the same is not true for services. The main reason is that services are not something concrete that a repository can keep under control. Services may be hosted on remote providers, out of reach of the repository. For that reason, keeping them under control is a challenge. Another difference between providing components and services is related to the resources required to provide them. A hosting server is necessary to provide services. However, not all users of a repository have this kind of resource available. Therefore, a repository can actively participate in this sense, providing resources to allow users to publish their own services. These resources can be taxed by the repository (e.g., discounting a stipulated percentage of the service price) in order to cover the costs involved in maintaining the services. Hence, a repository can offer three types of mechanisms to help producers create and provide their services: (1) mechanisms for service generation from previously published components, dealing with producers who have no resources to provide services; (2) external service publication mechanisms, dealing with producers who already provide services and want to publish them in a repository; and (3) mechanisms for the generation of services from the composition of existing ones, dealing with producers who have no resources to provide services and no specific component from which to generate a new service, but want to create and publish a new service from the composition of existing ones. The goal of this paper is to describe an infrastructure for service reuse implemented in a component repository, enabling two perspectives of component usage: physical or as a service. This infrastructure deals with access control, utilization monitoring, component-to-service conversion, and composite service generation issues. The remainder of this paper is organized as follows. Section 2 presents the redesign of a component repository to host services. Section 3 presents some related work. Finally, Section 4 concludes the paper, presenting some limitations and future work.
2 Services in a Component Repository
This section presents an approach for extending a component repository to provide services. This approach was implemented in the Brechó repository [7]. Brechó is a web information system that goes beyond a simple repository with basic functionalities such as documentation, storage, publication, search, and retrieval of software components. Brechó provides advanced functionalities that enable its use for both
commercial and non-commercial purposes. Some of these functionalities are: management of contracts that connect producers, consumers, and acquired components; mechanisms for component pricing; mechanisms for component evaluation; and mechanisms for marketing and negotiation. Therefore, some details about Brechó's implementation will be discussed throughout this section for a better understanding of our approach. However, it is important to notice that we used Brechó as a guinea pig to validate the idea of supporting both physical components and services in a repository. Our approach is not restricted to Brechó and can be implemented in other repositories. One of our concerns in transforming a conventional component repository into a service-enabled component repository is to provide services in an analogous manner to physical components. This means that the services published in the repository should be supported with access control and utilization monitoring mechanisms. These mechanisms are important to make the repository proactive, not only for service discovery but also for service utilization. Different functionalities can be derived from the repository's participation at the moment of service utilization. One of them is service charging, which can be easily applied when services have access control. Charging is very important, mostly for commercial repositories, since providing services implies costs. Another functionality is utilization monitoring, which enables the capture of important information about service utilization. In Brechó, for instance, the utilization monitoring mechanism captures information about the service consumer, the operation requested, and the price paid for the service. This kind of information can help both producers and consumers to improve their selling and purchases, respectively, in the repository. For instance, a producer can adjust the service prices according to the usage frequency. On the other hand, a consumer can use this information as a reliability parameter of a service, especially if it is complemented with satisfaction feedback [8]. As mentioned before, we defined three strategies for publishing services in the repository: external services, for already existing services; hosted internal services, for services generated from components previously published in the repository; and hosted composite services, for services constructed from the composition of external and internal services. In a general way, each publication strategy presents some difficulties in its implementation. Regarding hosted internal services, the major issue is related to the generation of services from components. This generation is complex because a component may be written in any programming language, and each language has a distinct way of providing services. On the other hand, the external services issue is related to how to adapt these kinds of services to provide access control and utilization monitoring mechanisms. It is complex because they are deployed outside the repository. Regarding hosted composite services, the main difficulty is the implementation of an engine to execute these services. The engine should be able to instantiate a composite service from a BPEL specification [9] and orchestrate it. Fig. 1 shows the architecture of a component repository extended with the service infrastructure. The main idea of this architecture was to keep only the basic component and service management functionalities in the repository.
The most complex mechanisms for service publishing and generation were transferred to specific modules. These modules can be classified into three types: Language specific modules, which generate services from components written in a specific programming language; the Legacy module, which adapts existing services to work according to the repository's access
control and utilization monitoring mechanisms; and the Orchestrator module, which produces composite services from BPEL descriptor documents. This division of the architecture between basic and complex functionalities is important because it makes it easier to adapt the repository. Therefore, to adopt this approach, a repository only needs to change its core to support service management and the module communication API. The modules are implemented separately, since they do not depend on the repository.
Fig. 1. Component repository extended with services infrastructure
The communication between modules and the repository is done through a services API defined by the repository. This API defines, at a high level, the modules' operation and can be classified into two types: an API for service publishing and an API for service utilization. The first type is related to the communication between module and repository during the service publishing and removal processes. This type of API is defined differently for each module, because the modules have different service publishing processes. The second type of API is related to the service utilization process. It is used by modules to communicate with the repository during a service request, with the purpose of performing the access control and utilization monitoring functionalities. This API, however, does not differ between modules, as the service utilization process is the same for all of them. Fig. 2 shows the service utilization process. In the first step, the consumer requests the service, sending authentication information and service parameters. The service request goes directly to the module where the services are hosted. When the module receives the request, it communicates with the repository to perform the authentication and verifies whether the consumer is authorized to use the service. If so, the service request is registered, and then the module forwards the request to the service. At the end of the process, the module returns the result to the consumer.
Fig. 2. Service utilization process
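To make the flow of Fig. 2 concrete, the sketch below shows the module-side handling of a request in Java. The repository API operations (authorize, registerUsage) are assumptions introduced for illustration; they are not the actual Brechó API.

// Illustrative module-side handling of a service request (all names are hypothetical).
interface RepositoryApi {
    boolean authorize(String consumerId, String credentials, String serviceId);
    void registerUsage(String consumerId, String serviceId, String operation, double pricePaid);
}

class ServiceRequestHandler {
    private final RepositoryApi repository;

    ServiceRequestHandler(RepositoryApi repository) { this.repository = repository; }

    String handle(String consumerId, String credentials, String serviceId,
                  String operation, String payload, double price) {
        // 1. Ask the repository to authenticate the consumer and check authorization.
        if (!repository.authorize(consumerId, credentials, serviceId)) {
            throw new SecurityException("Consumer not authorized for service " + serviceId);
        }
        // 2. Register the request for utilization monitoring (consumer, operation, price paid).
        repository.registerUsage(consumerId, serviceId, operation, price);
        // 3. Forward the request to the hosted service and return its result.
        return invokeHostedService(serviceId, operation, payload);
    }

    private String invokeHostedService(String serviceId, String operation, String payload) {
        return "result";   // placeholder for the actual service invocation
    }
}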
The following sections present the three types of service modules, describing their main functionalities.
2.1 Language Specific Modules
These modules are responsible for generating services from software components. In principle, each module is responsible for generating services from components written in a specific programming language. This design is due to the lack of a generic solution for building services. Therefore, the repository should activate the appropriate module according to specific requirements and generate services from components coded in a specific programming language. For instance, if a repository works only with Java and .Net components, two language specific modules are needed. The architecture of a language specific module can vary for each programming language. For that reason, we chose Java as one example of a language specific module. The Java module architecture is composed of five main elements: the session manager, reflection engine, service generator, web service hosting server, and deployer. The session manager is responsible for controlling the service generation states for each component requested by the repository. The reflection engine uses a reflective API [10] to extract information from component classes to construct the services. In this case, the engine uses the Java reflection API. The service generator and web service hosting server are the most important elements of the Java module. They are responsible for exposing Java classes as services and hosting them to be used by the repository consumers. These two architectural elements can be implemented by the Apache Axis2 tool [11]. Axis2 is widely used in Java development to construct services from Java classes. Apache Axis2 is a good option for the Java module since it provides many features that ensure the operation of the module mechanism. For example, Axis2 enables the classpath configuration of Java applications that will be exposed as services. This feature is important because it allows each component that will be exposed as a service to have its own classpath, eliminating the problem of components that have ambiguous classes but are functionally different. This kind of situation happens in a repository all the time. Another important feature is that Axis2 works with the official web services security specifications, such as WS-Security [12]. This specification is essential to enable the access control and service utilization monitoring mechanisms. The deployer element mediates the communication between the module and Axis2. This element is responsible for preparing a component to be deployed to or undeployed from Axis2. The deployer element creates service configuration files and uses the Axis2 API to perform the communication. Regarding the service generation process, it can be done automatically or semi-automatically. In the first case, all the component functionalities are automatically defined as service operations. However, working in this way, the process ignores the fact that producers may want to define which component functionalities they want to expose as a service. In the second case, the producers participate more actively in the process, defining the operations that will be included in the service. Nevertheless, the process becomes semi-automated. In order to make the service generation process more flexible for producers, the language specific module works in a semi-automatic manner. Thus, the process is characterized by three stages: (1) to analyze and extract
the component operation candidates; (2) to allow the user to select the desired operations; and (3) to generate the service. In order to illustrate the service generation process, Fig. 3 shows the screenshots for creating internal services in Brechó. In the first step, the user should fill in a form, which requires some information to create an internal service, as shown in Fig. 3.a. This information is: name, description, price, and the programming language in which the component has been coded. This last item is needed to allow the selection of the appropriate language specific module to communicate with. When the user submits the form, the repository communicates with the module, sending a component and requesting the operation candidates that can be extracted from it. Within the module, the operations or methods of the component are analyzed and transmitted to the repository (Stage 1). The available operations for the service are then shown in a new screen, as in Fig. 3.b. These available operations are extracted from the component from which the user previously chose to publish the service. In this example, a service is being generated from a Java component. The available operations are shown in tree form, where nodes represent classes and leaves represent methods (the operations to choose from). According to Fig. 3.b, the sine and cosine methods were selected as operations of the service that will be generated (Stage 2).
(a)
(b)
Fig. 3. Internal service registration screens
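Stage 1 of this process relies on the Java reflection API to list operation candidates. The fragment below is a minimal, self-contained illustration of that extraction step, not Brechó's actual reflection engine; it echoes the sine/cosine example by inspecting java.lang.Math.

import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Illustrative extraction of operation candidates from a component class (Stage 1).
class OperationExtractor {
    static List<String> candidateOperations(Class<?> componentClass) {
        List<String> candidates = new ArrayList<>();
        for (Method m : componentClass.getDeclaredMethods()) {
            if (Modifier.isPublic(m.getModifiers())) {
                candidates.add(componentClass.getSimpleName() + "." + m.getName());
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // For a math component, the analyst could then pick sine and cosine as in Fig. 3.b.
        candidateOperations(Math.class).stream()
                .filter(name -> name.endsWith(".sin") || name.endsWith(".cos"))
                .forEach(System.out::println);   // prints Math.sin and Math.cos
    }
}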
After that, Brechó sends this data to the module to initiate the service generation (Stage 3). In the case of the Java module, the deployer communicates with Axis2 to request the creation of a new service. Based on the component and its selected functionalities, the deployer creates the deployment configuration file for the service and communicates with Axis2, sending along the component files. At the end, the service descriptor (i.e., WSDL [13]) of the new service is sent to Brechó to finish the service publication.
2.2 Legacy Module
The Legacy module has the duty of adapting external, or legacy, services to the repository context. This adaptation consists of the insertion of access control and utilization monitoring mechanisms in the service, with the purpose of having the repository participate at service utilization as well as service discovery time. Nevertheless, this adaptation is complex, since these services are hosted on a remote server,
out of reach of the repository. A solution to this problem is to perform this adaptation indirectly through the generation of another service, which works as a broker. The broker service intermediates all the requests to the external service in order to perform the access control and utilization monitoring mechanisms. Fig. 4 shows the broker service operation. The broker service receives the request from the user and communicates with the repository to authenticate and authorize the user. If the user is authorized, the broker service adapts the request message, written in the SOAP protocol [14], to fit the external service specification. This adaptation consists of removing the user authentication header and changing some namespace values. Afterwards, the adapted SOAP message is sent to the external service, which processes it and returns the response to the broker service. In the last step, the broker service receives the response and forwards it to the user.
Fig. 4. Broker service operation
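The adaptation step of Fig. 4 can be pictured with the standard SAAJ API: strip the security header that the repository required, then forward the otherwise unchanged message to the external endpoint. This is a simplified sketch of the idea, not the Legacy module's actual code, and it omits the namespace adjustments mentioned above.

import javax.xml.soap.SOAPConnection;
import javax.xml.soap.SOAPConnectionFactory;
import javax.xml.soap.SOAPHeader;
import javax.xml.soap.SOAPMessage;

// Simplified broker adaptation: drop the authentication header used for the repository's
// access control and forward the request to the external service.
class BrokerForwarder {
    SOAPMessage forward(SOAPMessage incoming, String externalEndpointUrl) throws Exception {
        SOAPHeader header = incoming.getSOAPHeader();
        if (header != null) {
            header.detachNode();          // remove the WS-Security header block
        }
        incoming.saveChanges();
        SOAPConnection connection = SOAPConnectionFactory.newInstance().createConnection();
        try {
            return connection.call(incoming, externalEndpointUrl);   // response from the external service
        } finally {
            connection.close();
        }
    }
}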
Each broker service is constructed based on the characteristics of the external service that it represents. This means that broker services provide the same communication protocols and service operation signatures as the external services. This is important because it allows the external services to be used in the same way that they are used outside the repository. The main difference between the broker service and the external service specification is the addition of the WS-Security protocol to provide the access control mechanism. The Legacy module architecture is composed of three main elements: the web service hosting server, the message adapter, and the broker service generator. The web service hosting server is the element responsible for hosting the broker services that are used to represent the adapted external services. As in the language specific modules, the Axis2 tool can be used to play this role. The message adapter has the task of receiving a request SOAP message and adapting it to the external service specification. In order to work properly for each external service, this component is configured and attached to each broker service. Finally, the broker service generator element is responsible for constructing broker services for each external service. They are constructed based on the external service specification document (WSDL), which contains all the necessary information, such as service operation signatures, interface, and protocol definitions. The broker service generator is responsible for constructing the message adapter application, according to the external service, and the access control and utilization monitoring mechanisms that are used by Axis2 to generate the broker service.
Fig. 5.a shows external services being published in Brechó. This form is similar to the internal service form. The only difference is the replacement of the programming language field by the WSDL field. This field requires the URL location of the WSDL of the external service that the user wants to publish. When the user submits the form, the service is published. However, at this moment, Brechó transparently communicates with the Legacy module to adapt the external service provided by the user.
2.3 Orchestrator Module
Having internal and external services published in the repository, the next step is to enable producers to publish new services from the composition of existing ones in the repository. The Orchestrator module was designed to fulfill this task. This module is responsible for creating and hosting composite services from BPEL specification files. These services can be composed not only of services published in the repository but also of external services that have not been published yet. Furthermore, the composite services generated by this module provide access control and utilization monitoring mechanisms just like the services that are generated or adapted by the other modules. The architecture of the Orchestrator module is composed of four main elements: the service orchestration engine, web service hosting server, deployer, and configurator. The service orchestration engine is responsible for coordinating the execution of a composite service. This engine invokes each service that is part of a composite service following a BPEL specification. This specification describes the service execution order and the data that are exchanged between the services. The web service hosting server works together with the service orchestration engine, hosting the composite services. A tool that can be used to implement these two elements is Apache ODE [15]. This tool creates and hosts composite services from BPEL specifications. Specifically, ODE supports both the current WS-BPEL 2.0 [9] and the legacy BPEL4WS 1.1 [16] specifications. The deployer element is used by the module to communicate with the web service hosting server to perform the service deployment process. The deployer element is constructed according to the web service hosting server and service orchestration engine used. When Apache ODE is used, the deployer is constructed accordingly. Finally, the configurator element is responsible for setting up the composite services to support the access control and utilization monitoring mechanisms. This configuration is made before the service deployment process. In Brechó, Apache ODE is used, hence the composite service generation process in the Orchestrator module is defined by the following steps: (1) setting up the access control and utilization monitoring mechanisms in the composite service; (2) performing the service deployment in Apache ODE; and (3) returning the composite service specification (WSDL) to the repository. Fig. 5.b shows composite services being published in Brechó. This form is very similar to the previous one, differing only by the addition of a ZIP file field. This field requires a zip file that should contain all the documents necessary to generate a composite service. For example, this file must contain the BPEL file, which describes the composite service execution, and WSDL files, describing the composite service interface and the simple services that are used by it. However, this last type of WSDL file is optional, depending on how the BPEL specification is coded.
When the user submits this information, Brechó communicates with the Orchestrator module, requesting the
generation of a new composite service and sending the zip file submitted by the user. When a new request arrives, the Orchestrator module triggers the configurator element to set up the access control and utilization monitoring mechanisms in the composite service. This configuration is done through the generation of scripts that are attached to the service at deployment time in Apache ODE. The scripts will be triggered every time a new request arrives at the service. Once the configuration is done, the deployer element is activated to perform the composite service deployment in Apache ODE. It sends the composite service descriptor files and the scripts generated by the configurator element to Apache ODE. At the end of the process, the WSDL of the new service is sent to Brechó to finish the service publishing.
(a)
(b)
Fig. 5. Legacy and composite service publish screens
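Before deployment, the Orchestrator module has to locate the BPEL process and any WSDL files inside the uploaded archive described above. The sketch below shows that unpacking step with the standard java.util.zip API; it illustrates the idea only and is not Brechó's deployer.

import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Illustrative inspection of the uploaded archive: one BPEL process plus optional WSDL files.
class CompositeServicePackage {
    String bpelEntry;                                   // describes the composite service execution
    List<String> wsdlEntries = new ArrayList<>();       // interfaces of the composite and partner services

    static CompositeServicePackage inspect(String zipPath) throws IOException {
        CompositeServicePackage pkg = new CompositeServicePackage();
        try (ZipFile zip = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".bpel")) {
                    pkg.bpelEntry = name;
                } else if (name.endsWith(".wsdl")) {
                    pkg.wsdlEntries.add(name);
                }
            }
        }
        if (pkg.bpelEntry == null) {
            throw new IOException("The archive must contain a BPEL process definition");
        }
        return pkg;
    }
}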
3 Related Work
There are some related works that deal with the presented problem. A similar work by Lee et al. [17] presents a mechanism for service generation from components in a repository. This work does not provide a solution to the problem of components written in different programming languages. It is limited to Java and C++ components, and the mechanism is not extensible to support other languages. This approach does not provide a solution for access control and, consequently, utilization monitoring of services. In addition, it is not concerned with service composition. Another work, by Smith [18], uses reflection to generate services dynamically. This work, unlike ours, does not present to the user a preview of the possible service operations that can be extracted from an existing application. Moreover, the tool does not generate a WSDL document for the new service. There are some commercial tools for service repositories [19,20], but these tools provide no service generation mechanisms, either from components or from BPEL descriptors. Nor do they handle access control.
4 Conclusions
This paper presented an infrastructure for enabling services in a software component repository. This infrastructure provides specific mechanisms for access control and
utilization monitoring, as well as automated mechanisms for service generation from either software components or BPEL descriptor documents. The access control and utilization monitoring mechanisms are important, but they increase the load on the repository, since all service requests need to be processed in the repository. Nevertheless, this issue can be addressed by distributing the modules across different machines to balance the load. Another issue is the increase in service response time due to the repository's intervention. This issue, however, cannot be solved with the prior strategy. An alternative solution is to make the insertion of these mechanisms optional during service publication. The drawback is that these services will not support all repository functionalities, such as pricing, consumer lists, etc. The current version of the infrastructure only provides support for converting components written in the Java programming language. Besides, the Java module imposes a restriction on the programming style, requiring that all Java component classes follow the JavaBean specification [21]. We are planning to implement additional specific modules for other programming languages, such as .Net, C++, etc. The current version of Brechó, including the service management support described in this paper, can be found at http://reuse.cos.ufrj.br/brecho. It is currently under an initial usage evaluation in an academic setting. However, we intend to submit Brechó to a more systematic evaluation in commercial settings in the near future. Acknowledgments. The authors would like to thank CAPES, CNPq, and FAPERJ for the financial support, and all participants of the Brechó Project.
References
1. Krueger, C.: Software reuse. ACM Comput. Surv. 24, 131–183 (1992)
2. Frakes, W., Kang, K.: Software Reuse Research: Status and Future. IEEE Trans. Softw. Eng. 31, 529–536 (2005)
3. Szyperski, C.: Component Software: Beyond Object-Oriented Programming. Addison-Wesley Professional, Reading (1997)
4. Sametinger, J.: Software Engineering with Reusable Components. Springer, Heidelberg (2001)
5. Papazoglou, M.: Service-oriented computing: concepts, characteristics and directions. In: Proc. of the Fourth International Conference on WISE 2003, Rome, Italy, pp. 3–12 (2003)
6. Smith, M., Engel, M., Friese, T., Freisleben, B.: Security issues in on-demand grid and cluster computing. In: Sixth IEEE International Symposium on Cluster Computing and the Grid Workshops, 2006, Singapore, pp. 14–24 (2006)
7. Brechó Project, http://reuse.cos.ufrj.br/brecho
8. Santos, R., Werner, C., Silva, M.: Incorporating Information of Value in a Component Repository to Support a Component Marketplace Infrastructure. In: 10th IEEE IRI, Las Vegas, USA (to appear, 2009)
9. OASIS: Web Services Business Process Execution Language (WSBPEL) 2.0 (2007)
10. Sobel, J., Friedman, D.: An introduction to reflection-oriented programming. In: Proceedings of Reflection 1996, San Francisco, USA, pp. 263–288 (1996)
11. Apache Axis2, http://ws.apache.org/axis2/
12. OASIS: Web Services Security (WS-Security) 1.1 (2006)
13. W3C: Web Service Description Language (WSDL) 1.1 (2007)
14. W3C: Simple Object Access Protocol (SOAP) 1.2, 2nd edn (2007)
15. Apache ODE, http://ode.apache.org/
16. IBM: Business Process Execution Language for Web Services (BPEL4WS) 1.1 (2007)
17. Lee, R., Kim, H.-K., Yang, H.S.: An architecture model for dynamically converting components into Web services. In: 11th Asia-Pacific Software Engineering Conference, 2004, Busan, Korea, pp. 648–654 (2004)
18. Applied Reflection: Creating a dynamic Web service to simplify code, http://downloads.techrepublic.com.com/download.aspx?docid=264551
19. WebSphere Service Registry and Repository, http://www-306.ibm.com/software/integration/wsrr
20. Logic Library Logidex, http://www.logiclibrary.com
21. Enterprise JavaBeans Technology, http://java.sun.com/products/ejb
A Negotiation Framework for Service-Oriented Product Line Development
Jaejoon Lee, Gerald Kotonya, and Daniel Robinson
Computing Department, InfoLab21, South Drive, Lancaster University, Lancaster, United Kingdom
(tel) +44-1524-510359, (fax) +44-1524-510492
{j.lee,gerald,robinsdb}@comp.lancs.ac.uk
Abstract. Software product line engineering offers developers a low-cost means to rapidly create diverse systems that share core assets. By modeling and managing variability, developers can create reconfigurable software product families that can be specialised at design- or deployment-time. Service-orientation offers us an opportunity to extend this flexibility by creating dynamic product lines. However, integrating service-orientation in product line engineering poses a number of challenges. These include difficulty in i) ensuring product-specific service compositions with the right service quality levels at runtime, ii) maintaining system integrity with the dynamically composed services, and iii) evaluating actual service quality levels and reflecting this information in future service selections. In this paper, we propose a negotiation framework that alleviates these difficulties by providing a means of achieving the dynamic flexibility of service-oriented systems, while ensuring that user-specific product needs can also be satisfied.
1 Introduction
Software product lines are designed to be reconfigured. However, the reconfigurations are largely limited to design-time configurations, where the developer modifies a common core by developing, selecting or adapting components to create a new system, and deploy-time configurations, where a generic system is designed for configuration by a customer [1]. Increasingly, however, there is a need to support software product lines in a dynamic environment of continuously changing user needs and expectations, often at runtime [2]. A possible solution is to integrate service-orientation with software product line engineering. Service-oriented technology provides a framework for realizing dynamic software systems by allowing applications to be composed and reconfigured using services discoverable on the network [3][4]. However, combining service-orientation with product line engineering raises several challenges. These include difficulty in i) ensuring product-specific service compositions with the right service quality levels at runtime, ii) maintaining system integrity with the dynamically composed services, and iii) evaluating actual service quality levels and reflecting this information in future service selections.
What lies at the heart of these challenges is how SOA can facilitate the dynamic composition of services so that a consumer can obtain the right product that he or she expects. Thus, a product line infrastructure is required that supports not just service discovery for dynamic composition, but also considers product-specific needs. Service selection and reconfiguration is likely to vary with consumer needs, the application runtime environment, and the obligations placed on the consumer by the service provider. Therefore, some form of negotiation is required to achieve an acceptable service agreement between the consumer and provider [5][6]. However, current negotiation processes for service-oriented systems offer poor support for important product-line engineering aspects such as variability and product-specific service compositions. Instead they focus on the negotiation of QoS parameters for the creation of a Service Level Agreement (SLA) between the service consumer and provider [7]. Our solution has been to develop a pluggable negotiation framework that integrates product line engineering with service-orientation. The framework is based on a brokerage model [8] that combines pluggable QoS negotiation, QoS ontology, and SLA evaluation with consumer strategies to compute service acceptability. The rest of this paper is organized as follows: Section 2 describes a case study and provides an overview of the development process. Section 3 describes the service analysis activity of the development process. Section 4 describes the negotiation framework. Section 5 illustrates the architecture of a service consumer. Section 6 provides some concluding thoughts.
2 Case Study: A Navigation System Product Line
We developed a medium-size service-oriented Navigation System (NS) consisting of multiple services as a case study. A series of consumer devices that run the application were simulated, along with multiple service providers of different service types. NS is composed of location, maps, traffic, weather, and information services. Providers of simulated location, maps, traffic, weather, and information services also register with the service registry, and each provider establishes a relationship with a broker from the brokerage system. Each provider offers services with different levels of QoS, making certain providers more acceptable than others for different consumers. The consumer devices query the service registry to obtain a consumer broker from the brokerage system and obtain monitors from the monitoring system. The process of developing core assets for a service-oriented product line comprises five activities: feature analysis, service analysis, workflow specification, dynamic service and QoS specification, and system composition. Feature analysis identifies externally visible characteristics of products in a product line and organizes them into a feature model [9]. During service analysis the features are classified into two categories: workflow (i.e., orchestrating) services and dynamic services. The workflow services define the transactions (behaviors) of services, and the dynamic services are the services that should be searched for and used at runtime to execute the workflow services. In the next activities, the workflow services are specified and developed, while the identified dynamic services are specified along with their quality of service levels. The runtime system interacts with service providers through an automated negotiation broker that combines consumer strategy with pluggable QoS
negotiation, QoS ontology and SLA evaluation, and a provider rating system to ensure service acceptability. Fig. 1 shows a feature model of the NS product line. Feature modeling is the activity of identifying externally visible characteristics of products in a product line and organizing them into a model called a feature model. The primary goal of feature modeling is to identify commonalities and differences of products in a product line and represent them in an exploitable form, i.e., a feature model. Common features among different products in a product line are modeled as mandatory features (e.g., Route Calculation), while different features among them may be optional (e.g., Points of Interest) or alternative (e.g., Shortest Path or Major Roads First).
Fig. 1. Feature Model of the Navigation System Product Line
Optional features represent selectable features for products of a given product line, and alternative features indicate that no more than one of the features can be selected for a product. Composition rules supplement the feature model with mutual dependency and mutual exclusion relationships, which constrain the selection of optional or alternative features. In the NS product line, Local Info Display requires Local Info, so they must be selected together. In our approach, we distinguish static relations between features from dynamic ones in order to represent the availability of features explicitly. For example, the availability of Local Info can only be determined at runtime, because we have to search for a service provider and check whether that provider supplies traffic information for a certain location. In such cases, the features are connected with a dotted line, which
indicates a dynamic composition relation between them (see the line between Local Info and Current Events in Fig. 1). This also implies that the dependent features (i.e., Local Info Display and its descendant features, which require Local Info via the composition rule) can only be composed and activated at runtime.
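To illustrate how such composition rules can be enforced when a product configuration is assembled, the following is a minimal sketch; it is not part of the original framework, and the class and rule representation (one required feature per rule) are illustrative assumptions.

import java.util.*;

// Minimal sketch: validating a product configuration against "requires" composition rules.
public class FeatureConfiguration {
    private final Set<String> selected = new HashSet<>();
    private final Map<String, String> requires = new HashMap<>(); // feature -> required feature (one per rule, for brevity)

    public void select(String feature)              { selected.add(feature); }
    public void addRequiresRule(String f, String r) { requires.put(f, r); }

    // A configuration is valid only if every selected feature's required feature is also selected.
    public List<String> violations() {
        List<String> errors = new ArrayList<>();
        for (String f : selected) {
            String needed = requires.get(f);
            if (needed != null && !selected.contains(needed)) {
                errors.add(f + " requires " + needed);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        FeatureConfiguration cfg = new FeatureConfiguration();
        cfg.addRequiresRule("Local Info Display", "Local Info");
        cfg.addRequiresRule("Current Events Display", "Current Events");
        cfg.select("Route Calculation");
        cfg.select("Local Info Display");      // selected without the feature it requires
        System.out.println(cfg.violations());  // prints [Local Info Display requires Local Info]
    }
}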
3 Service Analysis

Once we have a feature model, the features are analyzed further through the service analysis activity. During service analysis the features are classified into two categories: workflow (i.e., orchestrating) services and dynamic services. The workflow services define the transactions (behaviors) of services, while the dynamic services are the services that must be searched for and used at runtime to execute the workflow services. For workflow services, the correctness of their overall control behavior is the foremost concern. For example, providing an expensive color-printing service with proper authorization and billing processes is critical for virtual office service providers. Therefore, adopting a formal framework to specify, validate, and verify them is the most suitable way to develop orchestrating services. In our approach, we adapted a workflow specification language [10] with pre/post conditions and invariants to enhance the reliability of specifications.

dynamic service Local_Info_Display (Local_Info li)
  invariants li.available == true;
  precondition Local_Info_Display.selected == true;
  postcondition li.fee_paid == true;
  option POI_Display;
    binding time runtime;
    precondition POI_Display.selected == true &&
                 li.Points_of_Interests.available == true;
    postcondition none;
  option Current_Events_Display;
    binding time runtime;
    precondition Current_Events_Display.selected == true &&
                 li.Current_Events.available == true;
    postcondition none;
Fig. 2. Local Info Display Feature Specification
On the other hand, the availability of the dynamic service features, which are shared by the workflow services, depends on the service providers, and these services have to be discovered and integrated into the system at runtime. For example, the Local Info service and its optional features (Points of Interest and Current Events) are selected at runtime from among the available service providers (see Fig. 1 for the features with the dynamic-composition relation). Depending on their availability, their dependent dynamic services are bound into the workflow services. This dependency is specified by
using the dynamic service specification template. For example, the Local Info Display specification in Fig. 2 shows the pre/post-conditions and invariants that have to be satisfied for the service to be provided. Note also that one of the preconditions concerns feature selection: it checks whether the feature is selected for the particular product configuration. If a feature is not selected for the product configuration, the service feature cannot be provided even though a service provider is available at runtime.
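The feature-selection precondition described above can be pictured as a simple runtime guard that is evaluated before a dynamic service is bound; the sketch below is illustrative and not taken from the paper, and the names are hypothetical.

import java.util.Set;

// Sketch: a runtime guard corresponding to the "selected" precondition of Fig. 2.
public class DynamicServiceGuard {
    private final Set<String> productConfiguration; // features selected for this product

    public DynamicServiceGuard(Set<String> productConfiguration) {
        this.productConfiguration = productConfiguration;
    }

    // A dynamic service feature may be bound only if it belongs to the product
    // configuration and a provider for it is currently available.
    public boolean canBind(String feature, boolean providerAvailable) {
        return productConfiguration.contains(feature) && providerAvailable;
    }

    public static void main(String[] args) {
        DynamicServiceGuard guard = new DynamicServiceGuard(Set.of("Local Info Display", "POI Display"));
        System.out.println(guard.canBind("Local Info Display", true));     // true
        System.out.println(guard.canBind("Current Events Display", true)); // false: not selected for this product
    }
}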
4 Negotiation Framework

The framework provides automated negotiation of services and service agreements. Its architecture features one or more brokerage systems, which create service brokers for consumers and providers on demand. These brokerage systems are registered with the service registry so that consumers and providers can discover them. An overview of the framework architecture is shown in Fig. 3. Negotiation can be contained within a single brokerage system, or span two or more distributed brokerage systems.
Fig. 3. Negotiation Framework
The brokerage system is designed to support the integration of a variety of different negotiation models and decision support algorithms. The service consumer or provider supplies the brokerage system with templates that describe the negotiation models to use. These templates also contain the details of decision algorithms and service strategies that should be applied during the creation and evaluation of service proposals. The brokerage system is designed to negotiate and manage specific consumer needs. This ensures product-specific compositions with the right service quality levels at runtime. It also provides a mechanism for evaluating actual service quality levels for product-line evolution (i.e. service selection). The negotiation process is initiated and led by the service consumer. Consumer brokers negotiate a single quality at a time. The counter proposal from the provider
broker contains not only its offer for that quality, but also offers for any other qualities related to it. Once the consumer broker has finished negotiating with the set of discovered provider brokers for each service, each possible composition is ranked, and the service proposals for the most acceptable composition are accepted. The remaining proposals are rejected but are recorded in a negotiation cache. The negotiation cache enables brokers to record the negotiation behaviour and proposals of other brokers they have previously interacted with. During negotiation, the consumer proposal engine combines the reputation of a provider with the service proposal from the provider's broker in order to determine a proposal's overall acceptability. Provider proposal engines use the reputation system to avoid the brokers of consumers who have previously given them poor ratings.

4.1 Service Acceptability

The service brokers are configured with utility methods that calculate the acceptability of service proposals and provider reputation against the service strategies of consumers and providers. Each service, service operation and service quality in the strategy template is given a weighting from 0.0 to 1.0 (the ideal QoS). The total quality acceptability Qta is calculated as shown in equation (i).

Qta = (Qa × Pw) + (R × Rw)    (i)
Qa is the proposed quality acceptability; Pw is the weight of the proposal. R represents the total reputation of the proposer's creator and Rw is the weight of the total reputation; Pw + Rw = 1.0. The total acceptability of a single service operation Ota is calculated as shown in equation (ii). The operation's weight Ow, in relation to other operations of the same service, is applied to the acceptability of each operation quality Qta, and the result is divided by the total number of service operations On + 1.

Ota = Σ(i=0..I) |Qta|i × Ow / (On + 1)    (ii)
The total acceptability of a single service Sta is calculated as shown in equation (iii). The service acceptability is the sum of each operation acceptability Ota, plus the summed acceptability of each service-level quality Qta divided by the number of service operations On + 1.

Sta = Σ(i=0..I) |Ota|i + Σ(j=0..J) |Qta|j / (On + 1)    (iii)
The total acceptability of a service composition Cta is calculated as the sum of the total service acceptabilities Sta, after applying the individual service weight Sw to each service.

Cta = Σ(i=0..I) |Sta|i × Sw    (iv)
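To make the acceptability calculation concrete, the following is a small sketch (assuming simple list-based inputs and illustrative weights, not code from the framework) of how equations (i)-(iv) could be evaluated by a consumer broker.

import java.util.List;

// Sketch: evaluating equations (i)-(iv). All weights are assumed to lie in [0.0, 1.0].
public class Acceptability {

    // (i) total quality acceptability from proposal acceptability and provider reputation; Pw + Rw = 1.0
    static double qualityTotal(double qa, double pw, double reputation, double rw) {
        return qa * pw + reputation * rw;
    }

    // (ii) operation acceptability: weighted qualities divided by (number of operations + 1)
    static double operationTotal(List<Double> qta, double ow, int operationCount) {
        double sum = 0.0;
        for (double q : qta) sum += Math.abs(q) * ow;
        return sum / (operationCount + 1);
    }

    // (iii) service acceptability: operation acceptabilities plus service-level qualities / (On + 1)
    static double serviceTotal(List<Double> ota, List<Double> serviceQta, int operationCount) {
        double sum = 0.0;
        for (double o : ota) sum += Math.abs(o);
        for (double q : serviceQta) sum += Math.abs(q) / (operationCount + 1);
        return sum;
    }

    // (iv) composition acceptability: weighted sum of service acceptabilities
    static double compositionTotal(List<Double> sta, List<Double> sw) {
        double sum = 0.0;
        for (int i = 0; i < sta.size(); i++) sum += Math.abs(sta.get(i)) * sw.get(i);
        return sum;
    }

    public static void main(String[] args) {
        double qta = qualityTotal(0.8, 0.7, 0.6, 0.3);                      // equation (i)
        double ota = operationTotal(List.of(qta, 0.5), 0.9, 2);             // equation (ii)
        double sta = serviceTotal(List.of(ota), List.of(0.4), 2);           // equation (iii)
        System.out.println(compositionTotal(List.of(sta), List.of(1.0)));   // equation (iv)
    }
}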
4.2 Negotiation Sessions and Composition Overviews

We have developed a GUI for visualizing the negotiation and composition processes at runtime. The consumers of the application simulate navigation devices with different requirements, such as a mobile phone, an internet tablet, or an automobile navigation system. These differences in requirements mean that different providers are more acceptable to different types of consumers. Each provider offers its services with arbitrary differences in QoS levels and cost. To simulate different provider scenarios, each service provider is randomly "doped" so that it occasionally violates the operational qualities agreed in the SLA with its consumers. We do not have the space to show all the different negotiation and service visualizations; however, Fig. 4 represents a typical scenario. The user can view the contents of any proposal (see Fig. 4 (a)).
Fig. 4. GUI for Visualizing the Negotiation and Composition Process
The example in Fig. 4 shows an accept proposal message from the provider of MapService. The user can also compare the final proposals from each provider of a particular service (Fig. 4 (b)). Overviews of negotiated and runtime composition acceptability are also provided (Fig. 4(c) and Fig. 4(d) respectively). Messages and charts are updated in real-time, as events flow from consumer brokers back to the simulation application.
5 An Architecture Model for the Service Consumer

For the development of the Navigation System product line, we adapt the C2 style [11], which provides flexibility through its layered structure and modular
components, which are called "bricks." A brick can send and receive messages to and from other bricks through its top and bottom ports, with bus-style connectors connecting the ports. In the following, we explain the meta-models of the architectural styles and an instance of the meta-model for the Navigation System. The UML-based presentation of the C2 style proposed in [12] is extended to include two different types of bricks: workflow bricks and dynamic service bricks. A workflow brick is used to deploy workflow services, and a dynamic service brick is used to deploy the proxies for the dynamic services. A dynamic service first needs to be included in a product configuration, and its activation is decided based on service availability at runtime, as a result of negotiation with the available service providers. A configurator brick manages reconfiguration of a deployed product at runtime and maintains the product configuration information for the service negotiation. Fig. 5 shows an instance for an NS product.
Fig. 5. An Instance of the Meta-Model for PDA with an Internet Connection
For example, the product shown in Fig. 5 can access the dynamic services and is capable of providing the Local Info Display service. In this case, the Master Configurator detects contextual changes and, if Local Info is available from a selected service provider, it dynamically binds a corresponding Local Info Display brick and activates it. That is, a configurator brick maintains the connection to the brokerage system in Fig. 3 and negotiates SLAs with it.
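As a rough illustration of this brick-and-connector style (a sketch under assumed names, not code from the paper), the fragment below shows a configurator attaching a dynamic service brick to a bus-style connector once the negotiated provider is available.

import java.util.*;

// Sketch: C2-style bricks exchanging messages through a bus-like connector.
public class BrickDemo {

    interface Brick { void receive(String message); }

    // A very small connector: broadcasts a message to every brick attached below it.
    static class Connector {
        private final List<Brick> below = new ArrayList<>();
        void attach(Brick b)      { below.add(b); }
        void send(String message) { below.forEach(b -> b.receive(message)); }
    }

    // The configurator binds a dynamic service brick only when the negotiated provider is available.
    static class MasterConfigurator {
        void bindIfAvailable(Connector bus, boolean localInfoAvailable) {
            if (localInfoAvailable) {
                bus.attach(msg -> System.out.println("LocalInfoDisplay brick handles: " + msg));
            }
        }
    }

    public static void main(String[] args) {
        Connector bus = new Connector();
        new MasterConfigurator().bindIfAvailable(bus, true); // provider found via the brokerage system
        bus.send("show local info");
    }
}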
6 Conclusions

Adopting the service-oriented paradigm for product-line engineering poses a number of challenges. These include the difficulty of identifying services, determining the configurations of services that are relevant in different user contexts, and maintaining system integrity after configuration changes. Current initiatives in service-oriented software development do not address these challenges adequately. In this paper we have proposed a consumer-centered approach that integrates product line engineering
with service-orientation by adapting feature-oriented product-line engineering to service-oriented development. What lies at the heart of these challenges is how SOA can facilitate the dynamic composition of services so that a consumer receives the product that he or she actually expects. Therefore, some form of negotiation is required to achieve an acceptable service agreement between the consumer and provider. The self-managing brokerage system, which combines consumer-defined service strategies with pluggable negotiations, provides a means of achieving Service-Level Agreements (SLAs) by searching for the best matches between service providers and consumers. The brokerage system also incorporates a reputation system that provides a means to reward and penalize service consumers in accordance with SLA violations. Our approach represents a feasible solution to a set of challenging problems in service-oriented product line development. We are currently exploring better ways of tailoring service granularity to enhance reusability.
References

1. Sommerville, I.: Software Engineering, 8th edn. Addison-Wesley, Reading (2006)
2. Cetina, C., Pelechano, V., Trinidad, P., Cortes, A.R.: An Architectural Discussion on DSPL. In: Proceedings of the 12th International Software Product Line Conference (SPLC 2008), pp. 59–68 (2008)
3. Brereton, P., Budgen, D., Bennett, K., Munro, M., Layzell, P., Macaulay, L., Griffiths, D., Stannett, C.: The Future of Software. Communications of the ACM 42(12), 78–84 (1999)
4. Bennett, K., Munro, M.C., Gold, N., Layzell, P., Budgen, D., Brereton, P.: An Architectural Model for Service-based Software with Ultra Rapid Evolution. In: Proceedings of the IEEE International Conference on Software Maintenance (ICSM 2001), p. 292 (2001)
5. Elfatatry, A., Layzell, P.: A Negotiation Description Language. Software Practice and Experience 35(4), 323–343 (2005)
6. Lee, J., Muthig, D., Naab, M.: An Approach for Developing Service Oriented Product Lines. In: Proceedings of the 12th International Software Product Line Conference, pp. 275–284 (2008)
7. Yan, J., Kowalczyk, R., Lin, J., Chhetri, M.B., Goh, S.K., Zhang, J.: Autonomous Service Level Agreement Negotiation for Service Composition Provision. Future Generation Computer Systems 23(6), 748–759 (2007)
8. Robinson, D., Kotonya, G.: A Self-Managing Brokerage Model for Quality Assurance in Service-Oriented Systems. In: Proceedings of the High Assurance Systems Engineering Symposium (HASE 2008), pp. 424–433 (2008)
9. Lee, J., Muthig, D.: Feature-Oriented Variability Management in Product Line Engineering. Communications of the ACM 49(12), 55–59 (2006)
10. JBoss jBPM 2.0 jPdl Reference Manual, http://www.jboss.com/products/jbpm/docs/jpdl (viewed May 2009)
11. Medvidovic, N., Rosenblum, D.S., Taylor, R.N.: A Language and Environment for Architecture-Based Software Development and Evolution. In: Proceedings of the 21st International Conference on Software Engineering (ICSE 1999), pp. 44–53 (1999)
12. Robbins, J.E., Redmiles, D.F., Rosenblum, D.S.: Integrating C2 with the Unified Modeling Language. In: Proceedings of the 1997 California Software Symposium, UCI Irvine Research Unit in Software, Irvine, CA, pp. 11–18 (1997)
Ranking and Selecting Services

Alberto Sillitti and Giancarlo Succi

Center for Applied Software Engineering, Free University of Bolzano, Via della Mostra 4, Bolzano, Italy
{asillitti,gsucci}@unibz.it
Abstract. Service composition is the most recent approach to software reuse. The interactions among services raise many of the problems already addressed in the composition of software components, while introducing additional issues related to run-time composition, which is very limited in the component world. One such problem is the identification of a set of services that can be integrated to build a complete system. This paper proposes an adaptation of a methodology for ranking and selecting components to a service-based environment. In particular, the methodology has been extended to support the peculiarities of services and to include support for the related technologies.
1 Introduction

Service-oriented development is becoming a mature discipline. The emergence and consolidation of standard protocols and architectures allow integrators to build complex systems through the integration of services. However, researchers and developers have to deal with several challenging problems, including the retrieval of suitable services to develop a complete system. In this paper, we propose the adaptation to a service-based environment of a methodology for selecting components to build an integrated system (Clark et al., 2004), introduced in (Sillitti and Succi, 2008). According to the definitions of SOA (Service Oriented Architecture) provided by the W3C (http://www.w3.org/) and OASIS (http://www.oasis-open.org/), a service has all the characteristics of a component and some more: a) it can be developed using different technologies; b) it can be executed independently in different run-time environments. This last feature allows the development of systems that result from the run-time integration of several pieces owned and run independently. This new kind of integration generates new problems related to dynamic composition: the ability to modify (semi-)automatically the structure of a system at run-time, changing the services involved and/or the integration workflow. A SOA is quite similar to the architecture proposed to support component integration, but the roles and the interaction protocols are defined through W3C standards. The main actors of a SOA are three: 1) a service supplier; 2) a service integrator; 3) a service directory (or registry or broker).
If a huge set of services is available, the main problem is retrieving the ones that satisfy the requirements of the integrator. Therefore, a prerequisite for effective adoption of the service-oriented paradigm is the availability of a smart broker service (Prieto-Diaz and Freeman, 1987). A broker is a mediator between service suppliers and system integrators. Technologies such as UDDI (http://www.uddi.org/), ebXML (http://www.ebxml.org/), and their evolutions have been developed to support the creation of public and private directories in which suppliers can deploy the description of their services using a standard language such as WSDL (Web Service Description Language). However, such technologies present several limitations regarding the description and the discovery of the services stored in the broker. When developing directories and search tools for them, the integrator's perspective – namely, building systems – needs to be kept in mind: a service-based system is built from existing services that work together through integration. The difficulties of single-service selection multiply in the case of selecting many services to build a system. This system perspective is adopted throughout this paper. The paper is organized as follows: section 2 introduces the background of the study; section 3 presents the proposed process in relation to the state of the art; section 4 details the selection process; finally, section 5 draws the conclusions and presents future work.
2 State of the Art

At present, the most common technologies for implementing service directories are two: UDDI (http://www.uddi.org/) and ebXML (http://www.ebxml.org/). UDDI does not force any specific format for service descriptions, and often such a description is just plain text. This approach is suitable if the user querying the directory is a human being, but it is nearly useless if the user is a machine. In UDDI, queries are designed in a way that is very close to the ones used in standard search engines and are based on keyword matching. Again, this kind of approach is suitable for human beings but not for machines. For these reasons, the standard structure offered by UDDI is not suitable for advanced automated queries. ebXML registries provide a higher level of functionality compared to UDDI. ebXML has a more structured approach to service description, including concepts like classification taxonomies (also present in UDDI), XML-based descriptions, and relations/associations among objects in the directory. In this way, it is able to support a wider set of service descriptions that are more suitable for automated inspection and retrieval. However, the querying mechanism is still limited and mostly based on keyword matching. This is because the structure of the directory is quite general and does not provide pre-defined schemas for the definition of the XML documents that describe the services. Therefore, advanced searches that need such information have to download all the descriptions and analyze them locally, without any support from the directory infrastructure. There are a number of research projects focusing on the development of an improved SOA by extending the capabilities of the directory service, providing a more useful description of the services, and supporting their retrieval (e.g., SeCSE – http://secse.eng.it/).
Ostertag et al. (1992) define the ability to reuse components (and therefore services) in four steps: 1) definition; 2) retrieval; 3) adaptation; 4) incorporation. Definition is the capability to represent services. Retrieval is the capability to find and retrieve services according to their definition. Adaptation and incorporation cover the effort needed to customise services and integrate them into a system. In a service environment, customization deals with the development of adapters that enable communication between services that support different data formats and/or communication protocols. The proposed approach fully satisfies the first two steps using an internal classification mechanism (description) and a selection process (retrieval). It partially addresses problems related to integration and customisation by giving integrators useful information about the ability of candidate services to collaborate in building a system. The approach does not fully support adaptation and incorporation because these are tasks on the integrator side, whereas the scope of this work is to build a broker's infrastructure. Typically, suppliers and integrators interact with the broker only through the classification and selection interfaces. There are several ways to define a classification schema (Henninger, 1997): enumerated classification, free-text indexing, and faceted classification. The goal of integrators is to inspect a directory to find services to build a system. If an integrator wants to retrieve a service, he must define a target. The target is a description of the ideal element. The target can be a service, and it allows the broker to find the most similar services in the directory. The selection of a system of services is similar to single-service selection, but now the target describes an entire system. The broker does not look for a single service, but for a set of services able to interact and compose the required system. The task of the broker is to match the integrator's requirements with its internal classification. The steps that compose a system selection are the following:
1. Understanding the integrator's requirements, i.e., understanding the integrator's "target".
2. Defining the integrator's profile. The broker must understand the user profile to better understand the requirements.
3. The searching mechanism. The broker "translates" the requirements into a query to the directory to find candidate services.
4. Ranking a multitude of candidate services. The possibly large number of candidate services can be unmanageable, so the broker must help select the most promising ones.
5. Compatibility. Services must interact with each other to be used in a system, so the broker has to check compatibility among them.
The integrator describes the target system through requirements. Requirements are concerned with the precise definition of the task to be performed by the system (Meyer, 1985). To understand the integrator's target, a broker must address several issues (Maiden and Ncube, 1998). Natural language is the simplest way to express requirements. It is the ideal notation for human communication: it is extremely powerful and, above all, requires little effort from users. Unfortunately, for software requirements it is not the best solution; Meyer (1985) identifies seven faults of natural language, including the presence of useless information, lack of specification, and ambiguity. Another way to express requirements is the use of formalism. Formalism allows a better understanding of requirements, reducing the ambiguity and gaps of natural language. Generally, a formalism is based on a description logic that expresses human thoughts in a mathematical way (Premkumar and Jones, 1997). An example of a formal specification language is Larch, which describes services through definitions of their functionality and pre- and post-conditions. Using formalism is more onerous, but it allows a profitable interaction that reduces misunderstandings about the system. Another possible approach to the problem is the usage of facets (Prieto-Diaz and Freeman, 1987). In the early definition, facets are a set of key-value pairs that describe the properties of a system, including both functional qualities (e.g., data formats supported, functionalities offered, etc.) and non-functional ones (e.g., price, reliability, response time, etc.). Facets allow providers to describe the relevant aspects of a software system in a structured way. Moreover, if a common and meaningful set of key-value pairs is defined, potential users can perform advanced searches inside a repository. Such queries can be more complex than traditional keyword matching over a plain-text description and can exploit the additional semantics available in the facets, such as values in a specific range or in a pre-defined set. In this way, users can design queries specifying conditions such as the support of a specific set of features, a response time below a specific threshold, a price in a certain range, etc. The ability to find a specific service in a large directory is related to the quality of the taxonomy used to define the keys and the quality of the values inserted in the description by the provider. Taxonomies allow the definition of proper keys in a specific (and limited) domain area. For this reason, the usage of different taxonomies to cover different domains is a suitable solution to provide extensive support for facets. However, taxonomies are useless if providers do not use them correctly and do not provide a complete description of their services through them. This approach requires a considerable amount of effort from the provider but is extremely useful from the point of view of the user who is looking for a service. This basic definition of facets is very limited, since it is not able to support complex descriptions, relations among the defined attributes, etc. In many cases, the usage of a single value or a set is not enough, and some properties need to be described in a more expressive way. For this reason, the concept of facet has evolved to include complex structures based on XML technologies (Sawyer et al., 2005; Walkerdine et al., 2007). Facets can be described through a set of XML documents.
A facet is defined as a set that includes a facet type and one or more facet specifications (Fig. 1). A facet type is a label that describes the high-level concept of the facet, such as quality of service or interoperability, while the facet specification is the actual implementation of the concept. Several facet specifications may be associated with a single facet type, providing different ways of describing the high-level concept. Every facet specification includes two documents: an XML schema that defines the structure, and the related XML implementation (Fig. 2).
Fig. 1. Example of facet structure
Fig. 2. Internal structure of the directory
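To make the structure of Fig. 1 and Fig. 2 more tangible, the following is a minimal sketch of a facet type holding two facet specifications, each pairing an XML Schema with its XML instance; the field names and XML fragments are illustrative assumptions, not an API of any directory technology.

import java.util.List;

// Sketch: a facet type with one or more facet specifications, each combining
// an XML Schema (structure) with its XML implementation (content).
public class FacetExample {

    record FacetSpecification(String xsdDocument, String xmlDocument) { }

    record Facet(String facetType, List<FacetSpecification> specifications) { }

    public static void main(String[] args) {
        Facet qos = new Facet("Quality of Service", List.of(
            new FacetSpecification(
                "<xs:schema ...> <xs:element name='responseTimeMs' type='xs:int'/> </xs:schema>",
                "<qos><responseTimeMs>200</responseTimeMs></qos>"),
            new FacetSpecification(
                "<xs:schema ...> <xs:element name='availability' type='xs:decimal'/> </xs:schema>",
                "<qos><availability>0.999</availability></qos>")));
        System.out.println(qos.facetType() + " has " + qos.specifications().size() + " specifications");
    }
}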
The selection process is relevant only inside the target system's domain. Different terminology, different relevance assigned to properties, and the role of the integrator during selection can attach different meanings to the selection. The broker localizes services in a directory. The requirements and the integrator's profile provide all the information needed to compose a query that localizes services in the directory. The retrieval process is not a simple matching between requirements and service properties (Henninger, 1994). The creation of the query must consider the following:
• Classification influences the query composition. For example, enumerated classification requires users to understand the repository structure (Henninger, 1997).
• Users and the repository use different vocabularies. Words have different meanings in different domains, and studies indicate that people name common objects in different ways (Furnas et al., 1987).
• The query evolves over time. The integrator changes the requirements as a consequence of the results presented by the broker (Prieto-Diaz and Freeman, 1987; Henninger, 1994; Henninger, 1997).
• The presence of pre-existing services binds the design. Therefore, an integrator adapts the design to the services or the services to the design. Selection allows the integrator to better understand possible design solutions. The selection of the first set of services affects the choice of subsequent services: the integrator uses this selection to set a general framework for his system, and subsequently selected services fill in the details of the framework. This is a depth-first approach (try to build the whole system), but it is useful to support different "starting points".
A high number of candidate services may discourage the integrator from evaluating them accurately. On the other hand, a small number allows the integrator to
evaluate the services deeply. In both cases, the broker can help rank the candidates according to the integrator's preferences. Ranking is done by considering a set of properties (in a faceted classification), so a decision must be made in the presence of multiple criteria. The decision-making process can be carried out under uncertainty (uncertainty about data values or user preferences) or under multiple criteria (which considers trade-offs among criteria and relationships between objectives) (Fandel and Spronk, 1985). Whatever decision-making method the broker uses, interactions with integrators have to be as simple and profitable as possible (Liping and Hipel, 1993). Decision support systems have been a mature discipline for about twenty years, and there are many decision support tools that are useful for ranking information about components, visualizing solutions, and managing interactions with users (Bhargava et al., 1999). A system is a set of services that co-operate through a compatibility mechanism. Services are black boxes that carry out some functionality: the integrator knows a service's functionality, but does not know how that functionality is carried out. Services expose only interfaces that show how to interact with them. Module Interconnection Languages (MIL) and Interface Definition Languages (IDL) provide notations for describing modules with well-defined interfaces (Allen and Garlan, 1997). An interesting formalization of the problem is to consider services as black boxes and to define ports and links (Wang et al., 1999). Ports schematize a service's interface and the way to access its functionality; they are connection points at a service's boundary. Links are the associations among services' ports. Ports and links are implemented in the literature at different levels of abstraction. Wang et al. (1999) use them to extend EJB and JavaBeans to check compatibility. Van Der Linden and Muller (1994) also include functional features of components in ports. Allen and Garlan (1997) use the concept of connectors to link components.
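As a rough illustration of the ports-and-links idea (a sketch under assumed names, not code from any of the cited works), two services can be linked when a required port of one is provided by the other with a matching name and data type.

import java.util.*;

// Sketch: services as black boxes with ports; a link is valid when the port signatures match.
public class PortLinkCheck {

    record Port(String name, String dataType) { }

    record Service(String name, Set<Port> provided, Set<Port> required) { }

    // A link from 'consumer' to 'provider' is valid if every required port of the consumer
    // is offered by the provider with the same name and data type.
    static boolean canLink(Service consumer, Service provider) {
        return provider.provided().containsAll(consumer.required());
    }

    public static void main(String[] args) {
        Service quoting  = new Service("QuoteService",
                Set.of(new Port("provideQuote", "Quote")), Set.of());
        Service ordering = new Service("OrderService",
                Set.of(new Port("placeOrder", "Order")),
                Set.of(new Port("provideQuote", "Quote")));
        System.out.println(canLink(ordering, quoting)); // true: the required port is provided
    }
}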
3 The Proposed Selection Process

The proposed process starts with requirements definition and ends with system selection. The selection process is not a waterfall model: there is a predefined path, but the integrator can move back and forth through the steps. The process supports the following features: 1) target definition through system functionality; 2) definition of requirements; 3) a ranking mechanism; 4) selection branches; 5) compatibility checks. The process starts with the target definition. The integrator defines the target system by describing its functionality, and the broker uses this functionality to discriminate among the services in the directory. In the repository, the faceted classification describes the functionality of a service through the "functional ability" property. Functional abilities are a list of actions, i.e., pairs of (with several possible variations to this scheme) (Prieto-Diaz and Freeman, 1987). Some actions can be further refined into smaller actions. This mechanism creates a hierarchy of functional abilities at different levels of granularity. The functional ability at the highest level (the root) describes the general functionality of a service.
Functional abilities of a service form a tree-like structure. From the root to the leaves, the description loses abstraction and gains detail. For example, a functional ability tree for "account management" follows:

+ Account Management
  - Request Account Set-up
  - Maintain Account
  - Close Account

The "account management" ability can be broken down into three smaller functional abilities. The integrator describes the target's functional requirements (in addition to other requirements not considered here). The integrator should be able to express a requirement with a single functional ability, saying, for instance, "I want to <manage e-commerce service>". <manage e-commerce service> is a valid functional ability and can be used to query the repository. If a component that matches <manage e-commerce service> is found, the integrator's requirement is fulfilled. However, a match is not likely to be found at the root level. In most cases, the integrator needs to decompose a single functional ability into smaller ones. This refinement process can be repeated recursively until a match is found within the directory. The refinement of the description is performed manually by the integrator, who details the functional abilities based on a pre-defined taxonomy for the description of the domain and his own needs. As a result, the integrator creates a system that implements the root functional ability (<manage e-commerce service>) through the composition of services that implement finer-grained functional abilities. Using the previous example, the root functional ability could be decomposed as follows:

+ Manage e-commerce service
  - Set up catalogue
  - Order and Quote Management
  - Product Management
  - Account Management

If the broker is not able to find these functional abilities, the integrator can refine the decomposition even further (by breaking down "Set up catalogue" and so on):

+ Manage e-commerce service
  + Set up catalogue
    - Introduce Product
  + Order and Quote Management
    - Provide Quote
    - Manage Order Process
  …
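The recursive refinement just described can be sketched as follows; the tree representation and directory contents are hypothetical and only meant to show the idea of refining until the directory can match every remaining ability.

import java.util.*;

// Sketch: recursive refinement of functional abilities until the directory can match them.
public class AbilityRefinement {

    static class FunctionalAbility {
        final String name;
        final List<FunctionalAbility> children = new ArrayList<>();
        FunctionalAbility(String name) { this.name = name; }
        FunctionalAbility add(FunctionalAbility child) { children.add(child); return this; }
    }

    // Returns the abilities the directory must provide: an ability itself if some service
    // implements it, otherwise its finer-grained children (refined recursively).
    static List<String> resolve(FunctionalAbility ability, Set<String> directory) {
        if (directory.contains(ability.name) || ability.children.isEmpty()) {
            return List.of(ability.name);
        }
        List<String> needed = new ArrayList<>();
        for (FunctionalAbility child : ability.children) {
            needed.addAll(resolve(child, directory));
        }
        return needed;
    }

    public static void main(String[] args) {
        FunctionalAbility root = new FunctionalAbility("manage e-commerce service")
            .add(new FunctionalAbility("set up catalogue"))
            .add(new FunctionalAbility("account management"));
        Set<String> directory = Set.of("set up catalogue", "account management");
        System.out.println(resolve(root, directory)); // [set up catalogue, account management]
    }
}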
When the broker finds candidate services, it presents them to the integrator and the refinement process stops. We have classified requirements into constraints and preferences. Constraints indicate what the system must do, i.e., what the integrator needs. Preferences indicate what the system should (if possible) do, i.e., what the integrator wishes. The former contain a list of functional properties, as defined in the previous paragraph. The integrator can choose whether or not to order the constraints; an ordered list facilitates starting the selection from the services that are most important to the integrator, since the first requirements are the most important ones and influence the following ones. After the definition, the broker translates the requirements into XPath queries to the directory. The integrator refines the query at each step of interaction with the broker: he starts in a state with many preferences and few constraints, and ends in a state with many constraints satisfied by a set of services and no remaining preferences. The approach supports query reformulation to add (or remove) requirements during the entire process. Each constraint can be satisfied by a wide set of services; the ranking mechanism helps the integrator find the right ones. Simple ranking according to a single property is possible, but generally the integrator has to make a decision in the presence of multiple criteria. The ranking mechanism is possible only with comparable values; non-comparable values are ignored and the integrator evaluates them manually. The broker has to know the preferences of the integrator in order to compare services. Criteria can be conflicting (e.g., cost/quality), and the broker has to know how to evaluate conflicting properties. Some preferences are in the context; others can be requested from the integrator. The selection process uses MCDM (Multiple Criteria Decision Methods) to support comparison and selection. Such methods are based on algorithms that require different amounts of information from the integrator (e.g., weights for the different properties). Many candidate services can satisfy a single constraint; therefore, several compositions with the same functionality are possible. All the candidate systems are compared at the end of the selection based on the non-functional properties and the context defined by the integrator. At the end of the process, the integrator has different candidate systems and evaluates them through reports. These reports highlight common features (e.g., standards support), compatibility issues, and aggregated information (e.g., total cost). Service-based systems are made up of services that communicate by exchanging messages/data over HTTP. There is a problem in ensuring that groups of services are compatible: the broker can suggest a list of services that satisfy a set of requirements, but if those services are not able to work together due to incompatibilities, the selection is useless. Therefore, the broker needs to describe the communication mechanisms of the services in a way that allows compatibility issues to be considered automatically. The adoption of stacks with layered protocols and formats is a recommended approach. Each layer has attributes and roles associated with it. This concept is much
Fig. 3. Simple stack

Fig. 4. A concrete example
easier to visualize, understand, and automate. A stack is an abstraction that represents the data path (messages) through services. A stack is made up of layers (Fig. 3). Layers become more specialized from bottom to top: the upper layers are the "semantic" layers (e.g., the PDF document format), while the bottom layers are conceptually similar to transport protocols (e.g., SOAP). Fig. 4 shows a concrete example. My Document-Schema specifies the message format; the schema is semantic-aware, i.e., it is suitable for representing concepts in the semantic domain. XML provides a format for My Document-Schema. SOAP is the application protocol used to send the content. Finally, HTTP is the physical communication protocol. Services have stacks that describe all the possible ways of interacting with them. Two services must implement compatible stacks to allow communication.
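A minimal sketch of such a stack check follows (assuming a strict layer-by-layer comparison, which is a simplification; real brokers could, for example, tolerate mismatches at layers where adapters exist).

import java.util.List;

// Sketch: two services can interact only if they expose a compatible protocol/format stack.
public class StackCompatibility {

    // Layers are ordered bottom-up, e.g. [HTTP, SOAP, XML, MyDocument-Schema].
    static boolean compatible(List<String> consumerStack, List<String> providerStack) {
        return consumerStack.equals(providerStack);   // strict layer-by-layer match
    }

    public static void main(String[] args) {
        List<String> consumer = List.of("HTTP", "SOAP", "XML", "MyDocument-Schema");
        List<String> provider = List.of("HTTP", "SOAP", "XML", "MyDocument-Schema");
        List<String> legacy   = List.of("HTTP", "REST/JSON");
        System.out.println(compatible(consumer, provider)); // true
        System.out.println(compatible(consumer, legacy));   // false: an adapter would be needed
    }
}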
4 Conclusions

In this paper, we have adapted a methodology for retrieving components from a repository to the retrieval of services from a directory. Service retrieval covers a wide research area; this work introduces a process to improve the identification of services and the selection of a complete system built from them. The main features are the following:
1. Selection of interacting services for a system, as opposed to single-service selection.
2. The methodology allows a profitable interaction between the broker and the integrator: the broker helps the integrator to define and refine his queries (through requirements definition and reports).
3. It improves the reuse of software (through service selection).
4. It allows multiple selections, to consider the impact of different decisions (selection branches).
5. It checks the interaction capability among services (compatibility checks).
The work intends to support the selection of services as a human-driven process. However, some concepts can be extended and applied at run-time when most of the decisions have to be taken by an automated system.
References

1. Allen, R., Garlan, D.: A Formal Basis for Architectural Connection. ACM Transactions on Software Engineering and Methodology 6(3) (1997)
2. Bhargava, H., Sridhar, S., Herrick, C.: Beyond Spreadsheets: Tools for Building Decision Support Systems. IEEE Computer 32(3) (1999)
3. Clark, J., Clarke, C., De Panfilis, S., Granatella, G., Predonzani, P., Sillitti, A., Succi, G., Vernazza, T.: Selecting Components in Large COTS Repositories. Journal of Systems and Software 73(2) (2004)
4. Fandel, G., Spronk, J.: Multiple Criteria Decision Methods and Applications. Springer, Heidelberg (1985)
5. Furnas, G., Landauer, T., Gomez, L., Dumais, S.: The Vocabulary Problem in Human-System Communication. Communications of the ACM 30(1) (1987)
6. Henninger, S.: Using Iterative Refinement to Find Reusable Software. IEEE Software 11(5) (1994)
7. Henninger, S.: An Evolutionary Approach to Constructing Effective Software Reuse Repositories. ACM Transactions on Software Engineering and Methodology 6(2) (1997)
8. Liping, F., Hipel, K.: Interactive Decision Making. Wiley, Chichester (1993)
9. Maiden, N., Ncube, C.: Acquiring Requirements for Commercial Off-The-Shelf Package Selection. IEEE Software 15(2) (1998)
10. Meyer, B.: On Formalism in Specifications. IEEE Software 2(6) (1985)
11. Ostertag, E., Hendler, J., Prieto-Diaz, R., Braun, C.: Computing Similarity in a Reuse Library System: An AI-Based Approach. ACM Transactions on Software Engineering and Methodology 1(3) (1992)
12. Premkumar, T., Jones, M.: The Use of Description Logics in KBSE Systems. ACM Transactions on Software Engineering and Methodology 6(2) (1997)
13. Prieto-Diaz, R., Freeman, P.: Classifying Software for Reusability. IEEE Software 4(1) (1987)
14. Sawyer, P., Hutchinson, J., Walkerdine, J., Sommerville, I.: Faceted Service Specification. In: Workshop on Service-Oriented Computing: Consequences for Engineering Requirements (2005)
15. Sillitti, A., Succi, G.: Reuse: from Components to Services. In: 10th International Conference on Software Reuse (ICSR-10), Beijing, China (May 25-29, 2008)
16. Van Der Linden, F., Muller, J.: Creating Architectures with Building Blocks. IEEE Software 12(6) (1994)
17. Walkerdine, J., Hutchinson, J., Sawyer, P., Dobson, G., Onditi, V.: A Faceted Approach to Service Specification. In: 2nd International Conference on Internet and Web Applications and Services (2007)
18. Wang, G., Ungar, L., Klawitter, D.: Component Assembly for OO Distributed Systems. IEEE Computer 32(7) (1999)
A Reusable Model for Data-Centric Web Services

Iman Saleh1, Gregory Kulczycki1, and M. Brian Blake2

1 Virginia Polytechnic Institute and State University, Computer Science Department, Falls Church, VA, USA
2 University of Notre Dame, Department of Computer Science and Engineering, South Bend, IN, USA
{isaleh,gregwk}@vt.edu, [email protected]
Abstract. Service-oriented computing (SoC) promotes a paradigm where enterprise applications can be transformed into reusable, network-accessible software modules or services (i.e. Web services). In many cases, existing concrete applications can be wrapped to perform within the SoC environment by (1) converting their required input data and output provisions into XML-based messages (e.g. SOAP) and (2) specifying the newly-created services using other XML-based software specifications (e.g. WSDL). In addition, enterprise organizations also devise natural language specifications to describe the service capability. Unfortunately, consumers of these capabilities often misinterpret the data requirements for using the underlying services. In this paper, we propose a generic model for data-centric Web Services that aids formal specification of service-data interactions and provides practical and verifiable solutions to eliminate data ambiguity and promote service reusability. Keywords: Data-Centric Web Services, SOC, SOA, Formal Specification.
1 Introduction

Web Services are reusable software components that make use of standardized interfaces to enable loosely-coupled business-to-business and customer-to-business interactions over the Web. In such environments, service consumers depend heavily on the service interface specification in order to discover, invoke and synthesize services over the Web. In this paper, we consider data-centric Web Services, whose behavior is determined by their interactions with a data store. A major challenge in this domain is the interpretation of the data that must be marshaled between consumer and producer systems. Some of the consumers' feedback on the Amazon and PayPal Web Services included: "I have met a problem when I was trying to get the sales rank of digital cameras using the web service. It showed me pretty different sales ranks from what I saw through Amazon Website…" [1] and "Can someone explain the precise distinction between 'Completed' and 'Processed' [as the output of a payment service]?" [2]. These comments and many others indicate the confusion of service consumers due to ambiguous data interactions and hidden business rules. As detailed in Figure 1, when a service consumer produces the input messages that must be used to invoke a web service,
(s)he must interpret natural language specifications (annotated as A) and formulate messages as specified with WSDL [3] specifications (annotated as B). The messages are used to invoke the Web service via an enterprise gateway. While the Web Services Description Language (WSDL) is currently the de facto standard for Web services, it only specifies a service operation in terms of its syntactical inputs and outputs; it does not provide a means for specifying the underlying data model, nor does it specify how a service invocation affects the data. The lack of data specification potentially leads to erroneous use of the service by a consumer. We suggest that there is some logical data model and business rules (annotated as D) that exist between the interface-based messages and the underlying concrete software applications. With the existing service-based specifications, the definition of this model and rules is not well represented.
Fig. 1. Data Specification Challenges within the SoC environment
There are a number of interesting problems that arise with regard to the scenario described in Figure 1:
1. Is there a reusable model/template that can be used in the formal specification of service-data interactions (as annotated by D)?
2. Can such a formal specification be used to predictably enhance the usage of the underlying data-centric services?
3. Can automated systems on the provider-side capture consumer usage behavior (and mistakes) to facilitate evolution and enhancement of the advertised specifications?
In this work, we focus on (1) and (2). In addressing research question (1), we introduce a model that enables the specification of service-data interactions, and we evaluate the effectiveness of employing different levels of specification. Our model is based on formal methods and hence enables automatic reasoning about the service behavior. In Section 2, we present our model, and in Section 3 we demonstrate its use with a real-life example. We employ our example within a case study in Section 4 to demonstrate its effectiveness in disambiguating the data and interaction aspects of a service
(effectively addressing research problem (2)). We review some related work in Section 5. Section 6 concludes the paper and presents our future work.
2 Modeling a Data-Centric Web Service

We model a data source as a set of entities, where each entity is a set of records. In addition to a unique record identifier (key), a record can have zero or more attributes. We view this model as a common denominator of many popular data models that we surveyed [4][5][6], including mainly the relational and object-oriented modeling of databases and some earlier efforts for formally specifying databases [7][8].

class GenericDataModel
  attribute entity1: Set(GenericRecord1)
  attribute entity2: Set(GenericRecord2)
  ...
  attribute entityn: Set(GenericRecordn)

  operation GenericRecordi findRecordByKey(key: GenericKeyi)
    requires ( GenericKeyi is the key for GenericRecordi )
    ensures (result.key = key and result in this.entityi) or result = NIL

  operation Set(GenericRecordi) findRecordByCriteria(values1: Set(Ti1), values2: Set(Ti2), ..., valuesn: Set(Tin))
    requires ( Tij is the type of the j-th attribute of GenericRecordi )
    ensures ∀rec in result, rec.attrj in valuesj and result in this.entityi

  operation GenericDataModel createRecord(gr: GenericRecordi)
    requires this.findRecordByKey(gr.key) = NIL
    ensures result.entityi = this.entityi U gr and ∀j ≠ i, result.entityj = this.entityj

  operation GenericDataModel deleteRecord(key: GenericKeyi)
    requires this.findRecordByKey(key) ≠ NIL
    ensures result.entityi = this.entityi – this.findRecordByKey(key) and ∀j ≠ i, result.entityj = this.entityj

  operation GenericDataModel updateRecord(gr: GenericRecordi)
    requires this.findRecordByKey(gr.key) ≠ NIL
    ensures result.entityi = this.entityi – this.findRecordByKey(gr.key) U gr and ∀j ≠ i, result.entityj = this.entityj
end GenericDataModel

class GenericRecord
  attribute key: Tkey
  attribute attr1: T1
  attribute attr2: T2
  ...
  attribute attrn: Tn
end GenericRecord
Listing 1. Generic Data Model class
We adapt the CRUD (Create-Read-Update-Delete) [9] model to include functional descriptions of the basic data operations. We implement our model as a generic class (shown in Listing 1) to facilitate its understanding by programmers of Web services. The Read operation is supported in our model by two functions, findByKey and findByCriteria. findByKey takes a data model and a key value and returns the record whose key value matches the input key. findByCriteria takes a data model and a set of filtering values for each attribute type in the model; it returns a set of records such that each attribute value of a returned record is a member of the corresponding input filtering set. Our generic model class can be used as the basis for a reusable JML [10] or Spec# [11] specification class to model different data-centric services. Using the proposed model, a developer of a data-centric Web service can specify its data behavior by following these steps:
1. Abstracting the service's underlying data model as sets of records and identifying their attribute types.
2. Implementing the service data model as a class, using our generic data model as a template (Listing 1).
3. Annotating the data model class with invariants that define any data constraints or business rules.
4. Annotating the service interface with formal specifications that are defined in terms of the data model and the data functionalities defined in step 2.
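As an informal, executable counterpart to Listing 1, the following sketch uses Java generics with the requires/ensures contracts indicated as comments; it is our own illustration rather than the reusable JML or Spec# class the paper refers to.

import java.util.*;
import java.util.function.Predicate;

// Sketch: an executable counterpart to the generic CRUD data model of Listing 1.
public class GenericDataModel<K, R> {
    private final Map<K, R> entity = new HashMap<>();

    // requires: key identifies at most one record; ensures: result has that key or is null
    public R findRecordByKey(K key) { return entity.get(key); }

    // ensures: every returned record satisfies the filtering criteria
    public List<R> findRecordByCriteria(Predicate<R> criteria) {
        List<R> result = new ArrayList<>();
        for (R rec : entity.values()) if (criteria.test(rec)) result.add(rec);
        return result;
    }

    // requires: no record with this key exists yet
    public void createRecord(K key, R record) {
        if (entity.containsKey(key)) throw new IllegalStateException("duplicate key");
        entity.put(key, record);
    }

    // requires: a record with this key exists
    public void deleteRecord(K key) {
        if (entity.remove(key) == null) throw new IllegalStateException("unknown key");
    }

    // requires: a record with this key exists; ensures: the old record is replaced
    public void updateRecord(K key, R record) {
        if (!entity.containsKey(key)) throw new IllegalStateException("unknown key");
        entity.put(key, record);
    }

    public static void main(String[] args) {
        GenericDataModel<Integer, String> books = new GenericDataModel<>();
        books.createRecord(1, "Book: Software Reuse");
        books.updateRecord(1, "Book: Software Reuse, 2nd edn.");
        System.out.println(books.findRecordByCriteria(r -> r.contains("Reuse")));
    }
}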
3 Example: Amazon Associates Web Service

Amazon provides a collection of remote services for e-commerce tools that are used by more than 330,000 developers [1]. We will use our model to specify a simplified version of the ItemSearch service, which searches for items within the Amazon product catalog given a set of search filters. The service is described as follows:

Service Signature:
ItemInfo[] ItemSearch(searchIndex: CatString, keywords: String, minPrice: Float, maxPrice: Float, author: String, artist: String, title: String, availability: AvailString, merchant: String, sort: SortString)

Data Types:
CatString: Enum of {Books, CD, DVD, All}
AvailString: Enum of {Available}
SortString: Enum of {price, -price}
ItemInfo: Array of [itemId: Integer, detailPageURL: String, title: String, author: String, artist: String]
We used the service documentation available at [1] and our own testing of the service to infer the underlying data schema and constraints, and we specified the service behavior accordingly. We model the service data as a set of records of type ItemRecord. Listing 2 shows our model implemented as the ItemSearchDataModel class, which is based on the template class defined earlier. ItemSearchDataModel supports the same data operations as our template class; we have, however, omitted their definitions in Listing 2 to avoid repetition. The data constraints and business rules are defined as
class invariants. For example, the first class invariant states that a record whose category is either CD or DVD cannot have a non-nil author attribute.

class ItemSearchDataModel
  attribute itemEntity: Set(ItemRecord)
end ItemSearchDataModel

class ItemRecord
  attribute key: Integer
  attribute category: { Book, CD, DVD }
  attribute merchantName: String
  attribute author: String
  attribute artist: String
  attribute title: String
  attribute price: Float
  attribute stockLevel: Integer
  invariant (category = CD or category = DVD) ⇒ author = NIL
  invariant (category = Book) ⇒ artist = NIL
  invariant stockLevel ≥ 0
  invariant price ≥ 0
  invariant merchantName ≠ NIL
  invariant title ≠ NIL
end ItemRecord
Listing 2. The ItemSearch data model class
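To show how the class invariants of Listing 2 could be checked at runtime (as, for example, a JML-style runtime assertion checker might do), here is a small hand-written sketch; it is illustrative rather than generated from the specification, and the constructor and field names are assumptions.

// Sketch: runtime checks corresponding to the class invariants of Listing 2.
public class ItemRecord {
    enum Category { BOOK, CD, DVD }

    final Integer key;
    final Category category;
    final String merchantName, author, artist, title;
    final double price;
    final int stockLevel;

    ItemRecord(Integer key, Category category, String merchantName, String author,
               String artist, String title, double price, int stockLevel) {
        this.key = key; this.category = category; this.merchantName = merchantName;
        this.author = author; this.artist = artist; this.title = title;
        this.price = price; this.stockLevel = stockLevel;
        checkInvariants();
    }

    private void checkInvariants() {
        assert !(category == Category.CD || category == Category.DVD) || author == null
                : "CD/DVD records cannot have an author";
        assert category != Category.BOOK || artist == null : "Book records cannot have an artist";
        assert stockLevel >= 0 : "stockLevel must be non-negative";
        assert price >= 0 : "price must be non-negative";
        assert merchantName != null && title != null : "merchantName and title are mandatory";
    }

    public static void main(String[] args) {
        // Run with -ea to enable assertions; this record satisfies every invariant.
        new ItemRecord(1, Category.BOOK, "Amazon", "I. Sommerville", null,
                       "Software Engineering", 79.99, 12);
    }
}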
Finally, Listing 3 gives the service specification based on the defined data model. The service preconditions and postconditions are enclosed in requires and ensures clauses, respectively. The old prefix denotes the value of a variable before service execution, and the variable result denotes the service output. We define a specification variable isdm of type ItemSearchDataModel. Additionally, a group of specification variables (lines 4-7) is used to denote data filters. For example, the assertion at line 17:

ensures (old(keywords) ≠ NIL) and (old(searchIndex) = Books) ⇒ Book ∈ searchIndices and old(keywords) ∈ authors and old(keywords) ∈ titles
titles
denotes that when the input keywords is provided, and the searchIndex input is set to Books, the search is done for items with category equal to Book and either the title or the author is matching the keywords. For the sake of simplicity, we assume the service searches for exact match between the keywords and the title/author. The sets searchIndices, authors and keywords are used in line 11 as the input filtering sets to the findByCriteria function. Lines 13-27 represent different search scenarios (search by keywords, by title, etc.). Lines 28-36 represent further filtering criterion based on input parameters. Lines 37-39 specify that when the sort input is set to price, the resulting items are sorted ascending by the item price and when sort input is set to –price, the results are sorted in descending order. It should be noted that, while the model provides the necessary constructs for full specification of the data model and data interactions, it is up to the service developer to decide on the sophistication level of the specification. This flexibility ensures that our model is applicable under different time and cost constraints.
 1  requires minPrice ≥ 0 and maxPrice ≥ 0 and minPrice ≤ maxPrice
 2  ensures result.length ≥ 0
 3  // The following specification variables are assumed: isdm: ItemSearchDataModel
 4  //   authors, artists, titles, merchants: Set(String)
 5  //   searchIndices: Set(CatString)
 6  //   prices: Set(Float)
 7  //   stockLevels: Set(Integer)
 8  // Specifying the results in terms of the service inputs and the defined model
 9  ensures ∀i, 1 ≤ i < result.length,
10    result[i] ∈ { [rec.key, "http://www.amazon.com"+rec.key, rec.title, rec.author, rec.artist]
11      | rec ∈ isdm.findRecordByCriteria(searchIndices, merchants, authors, artists, titles,
12          prices, stockLevels)}
13  // Case 1: searching by keywords in the CD and DVD categories
14  ensures old(keywords) ≠ NIL and (old(searchIndex) = CD or old(searchIndex) = DVD) ⇒
15    {DVD, CD} ∈ searchIndices and old(keywords) ∈ artists and old(keywords) ∈ titles
16  // Case 2: searching by keywords in the Books category
17  ensures (old(keywords) ≠ NIL) and (old(searchIndex) = Books) ⇒
18    Book ∈ searchIndices and old(keywords) ∈ authors and old(keywords) ∈ titles
19  // Case 3: searching by keywords in all categories of items
20  ensures (old(keywords) ≠ NIL) and (old(searchIndex) = All) ⇒
21    {Book, DVD, CD} ∈ searchIndices and old(keywords) ∈ titles
22  // Case 4: searching by title in the Books category
23  ensures (old(title) ≠ NIL) and (old(searchIndex) = Books) ⇒
24    Book ∈ searchIndices and old(title) ∈ titles
25  // Case 5: searching by author in the Books category
26  ensures (old(author) ≠ NIL) and (old(searchIndex) = Books) ⇒
27    Book ∈ searchIndices and old(author) ∈ authors
28  // Filtering results by the min and max prices
29  ensures (old(minPrice) ≠ NIL) ⇒ ∀ Float v ∈ prices, v ≥ old(minPrice)
30  ensures (old(maxPrice) ≠ NIL) ⇒ ∀ Float v ∈ prices, v ≤ old(maxPrice)
31  // Filtering results by availability
32  ensures old(availability) = Available ⇒ Z+ ∈ stockLevels
33  ensures old(availability) = NIL ⇒ {0} ∪ Z+ ∈ stockLevels
34  // Filtering results by the merchant name; this parameter has a default value "Amazon"
35  ensures old(merchant) ≠ NIL ⇒ old(merchant) ∈ merchants
36  ensures old(merchant) = NIL ⇒ "Amazon" ∈ merchants
37  // Results are sorted based on the value of the sort input
38  ensures old(sort) = price ⇒ ∀i, 1 ≤ i < result.length, result[i].price ≤ result[i+1].price
39  ensures old(sort) = -price ⇒ ∀i, 1 ≤ i < result.length, result[i].price ≥ result[i+1].price
Listing 3. ItemSearch specification using the data model
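One way to exploit such a specification is to check it at run time against the values a service actually returns. The short Python sketch below checks the price-range and sort-order clauses (lines 29-30 and 38-39) on a list of returned items; the check_postconditions helper and the simplified record layout are our own illustrative choices, not part of the service or the model.

from typing import List, Optional


def check_postconditions(results: List[dict],
                         min_price: Optional[float],
                         max_price: Optional[float],
                         sort: Optional[str]) -> bool:
    # Return True if the returned items satisfy the price-filter and sort-order clauses.
    prices = [item["price"] for item in results]

    # Lines 29-30: every returned price respects the requested bounds.
    if min_price is not None and any(p < min_price for p in prices):
        return False
    if max_price is not None and any(p > max_price for p in prices):
        return False

    # Lines 38-39: results are ordered by price according to the sort input.
    if sort == "price" and prices != sorted(prices):
        return False
    if sort == "-price" and prices != sorted(prices, reverse=True):
        return False
    return True


# Example: ascending sort with a lower price bound of 5.0
sample = [{"price": 7.99}, {"price": 9.50}, {"price": 12.00}]
assert check_postconditions(sample, min_price=5.0, max_price=None, sort="price")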
4 Case Study: Improving Accessibility to the Deep Web

The Deep Web refers to online data that is hidden behind dynamically generated sites and hence cannot be crawled or indexed by search engines. It is estimated that 80% of
the content on the Web is dynamically generated [12]. With the increasing use of Web Services as gateways to online databases, data can be automatically queried and indexed by search engines through invoking the corresponding services. We extracted from [13] a set of goals for effectively and efficiently crawling the Deep Web. We consider Amazon's repository part of the hidden Web and demonstrate how our specification of the ItemSearch service (Listing 3) helps achieve each of the crawler goals; a small sketch of how a crawler might use the specification follows the list.

- Goal 1: Identifying services that are candidates for crawling data. Such services are typically read-only, either for any inputs or under certain input conditions. In the latter case, these input conditions should be identifiable by the Web crawler. The ItemSearch service can easily be identified as read-only under any inputs, since our specification does not contain any delete/create/update operations.
- Goal 2: Balancing the trade-off between trying fewer queries on the online database and maximizing the returned result set. This is achieved by enabling only relevant queries to be formulated by a crawler. When an input can take different values, the crawler can consult our model to automatically identify the values that will maximize the result set. For example, as shown in Listing 3, the searchIndex input can take 4 different values, but setting it to All (case 3) maximizes the result set, since the filtering set searchIndices will then contain all possible item categories {Book, CD, DVD}.
- Goal 3: A service should be invoked with inputs that potentially return distinct sets of records, to avoid crawling duplicate data. The crawler can automatically reason about the filtering criteria under different inputs. For example, a search for books by keywords (case 2) subsumes both a search by title (case 4) and a search by author (case 5). Consequently, if a search by keywords is performed, the same search using the title or the author will not return new results.
- Goal 4: Identifying presentation inputs, i.e., inputs that only affect how the output is presented, such as a parameter representing a sorting criterion. Different values for these presentation inputs retrieve the same results. Presentation inputs can be identified in our model as those inputs that do not affect the filtering sets; the sort input is an example.
- Goal 5: Avoiding invalid inputs that would generate error pages. Invalid inputs can be identified from the preconditions, as in line 1 of the specification. They can also be identified from the model invariants.
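The following Python sketch illustrates, under our own simplifying assumptions, how a crawler could exploit a machine-readable digest of the specification: it records, for each input, the filtering sets that input constrains, classifies the service as read-only when no state-changing clauses are present (Goal 1), and flags inputs that touch no filtering set as presentation inputs (Goal 4). The ServiceSpec structure is hypothetical; it is not a format defined by our model or by Amazon.

from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class ServiceSpec:
    # A toy, machine-readable digest of a service specification.
    state_changing_ops: List[str] = field(default_factory=list)   # create/update/delete clauses
    # For each input parameter, the filtering sets its value constrains.
    affected_filter_sets: Dict[str, FrozenSet[str]] = field(default_factory=dict)

    def is_read_only(self) -> bool:
        # Goal 1: no state-changing operations appear anywhere in the specification.
        return not self.state_changing_ops

    def presentation_inputs(self) -> List[str]:
        # Goal 4: inputs that constrain no filtering set only affect presentation.
        return [name for name, sets in self.affected_filter_sets.items() if not sets]


item_search_spec = ServiceSpec(
    state_changing_ops=[],
    affected_filter_sets={
        "searchIndex": frozenset({"searchIndices"}),
        "keywords": frozenset({"authors", "artists", "titles"}),
        "minPrice": frozenset({"prices"}),
        "maxPrice": frozenset({"prices"}),
        "availability": frozenset({"stockLevels"}),
        "merchant": frozenset({"merchants"}),
        "sort": frozenset(),        # affects ordering only
    },
)

print(item_search_spec.is_read_only())          # True
print(item_search_spec.presentation_inputs())   # ['sort']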
We next evaluate our model's success in decreasing the level of ambiguity about the service behavior. Our evaluation is based on the observation that the more queries the crawler has to try to retrieve the same set of data records, the more ambiguous the service behavior. Consequently, we consider the total number of possible input combinations (queries) as an indication of the level of ambiguity of the service specification. Our goal is to minimize the total number of input combinations by considering only the relevant ones. Table 1 shows the inputs of the ItemSearch service and their possible values. We assume the crawler can generate 2 values for each of the free-text inputs using the technique in [13]. We also assume that a minimum and a maximum value are selected for the minPrice and maxPrice inputs, respectively, and that each of them can also be set to Nil.

Table 1. Summary of the number of possible values for each input parameter

Input Parameter | Possible Values           | # of Values
searchIndex     | Books, CD, DVD, All       | 4
minPrice        | min, max, Nil             | 3
maxPrice        | min, max, Nil             | 3
keywords        | 2 free-text strings, Nil  | 3
author          | 2 free-text strings, Nil  | 3
artist          | 2 free-text strings, Nil  | 3
title           | 2 free-text strings, Nil  | 3
merchant        | 2 free-text strings, Nil  | 3
availability    | Available, Nil            | 2
sort            | price, -price             | 2
Table 2 summarizes the effect of the specification level on the number of queries formulated. Levels are defined in terms of the number of assertions from Listing 3 that are used to annotate the service; levels are cumulative.

Table 2. Number of possible queries for each service specification level

Specification Level   | Number of queries               | Query filtering criterion
L0: No specifications | 4 × 3⁷ × 2² = 34,992            | None (some of these queries are invalid)
L1: Lines 1-2         | 4 × 8 × 3⁵ × 2² = 31,104        | Excluding queries with minPrice > maxPrice, since they violate the precondition
L2: Lines 3-27        | 1 × 8 × 2 × 1³ × 3 × 2² = 192   | Only case 3 is considered, since it returns the maximum number of records; cases 1, 2, 4, and 5 would generate duplicates
L3: Lines 28-33       | 1 × 8 × 2 × 1³ × 3 × 1 × 2 = 96 | Excluding queries where availability is set to Available, since it implies a stricter filtering set
L4: Lines 34-36       | 1 × 8 × 2 × 1⁴ × 2 = 32         | The merchant default value is used to ensure results are returned
L5: Lines 37-39       | 1 × 8 × 2 × 1⁶ = 16             | The sort input is identified as a presentation parameter
As Table 2 shows, the service specification has a great impact not only on narrowing the search space of queries, but also on selecting queries that will not generate errors, that avoid duplicates, and that maximize the result set.
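To make the arithmetic in Tables 1 and 2 concrete, the Python sketch below enumerates the unconstrained query space from Table 1 and then prunes it with facts a crawler can read off the specification: the precondition on the price bounds (line 1), searchIndex = All as the result-maximizing category (case 3), and sort being a presentation input. The placeholder values and the simplified final pruning step (which does not reproduce every row of Table 2) are our own assumptions.

from itertools import product

# Candidate values per input parameter, as in Table 1 (None stands for Nil;
# the concrete strings and prices are placeholders for crawler-generated values).
domains = {
    "searchIndex": ["Books", "CD", "DVD", "All"],
    "minPrice": [5.0, 50.0, None],
    "maxPrice": [5.0, 50.0, None],
    "keywords": ["kw1", "kw2", None],
    "author": ["au1", "au2", None],
    "artist": ["ar1", "ar2", None],
    "title": ["ti1", "ti2", None],
    "merchant": ["me1", "me2", None],
    "availability": ["Available", None],
    "sort": ["price", "-price"],
}

queries = [dict(zip(domains, combo)) for combo in product(*domains.values())]
print(len(queries))                       # 34992 -- level L0, no specification used


def satisfies_precondition(q):
    # Listing 3, line 1: reject queries with minPrice > maxPrice.
    lo, hi = q["minPrice"], q["maxPrice"]
    return lo is None or hi is None or lo <= hi


level1 = [q for q in queries if satisfies_precondition(q)]
print(len(level1))                        # 31104 -- level L1

# Further (partial) pruning in the spirit of levels L2-L5: fix the
# result-maximizing category and drop the presentation input sort.
pruned = [q for q in level1 if q["searchIndex"] == "All" and q["sort"] == "price"]
print(len(pruned))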
5 Related Work

Current research in data-centric Web Services focuses on either schema integration or transaction planning. The work in [14] falls in the first category and proposes an architecture for designing a data service layer to provide a declarative foundation for
building SoC applications within an enterprise. The main idea is to abstract and compose information from a range of enterprise data sources and expose it as a set of data services. While our work employs a similar methodology in abstracting data sources, our goal is not to enable information integration within an enterprise, but rather to provide a data contract for the service that is exposed as part of the service specification, targeting the less-controlled Web environment. On the other hand, research in transaction planning has recognized the emergence of Web Services as an industry standard for implementing transactional business processes. The OASIS Business Transaction Protocol (BTP) [15] and the Web Services Transactions (WS-Tx) [16][17] specifications are the two major efforts in this area. These protocols define interactions between the service consumer and the service provider to ensure that they agree on the outcome of a transaction. It is assumed that compensation transactions are implemented to undo updates upon transaction failures. While these techniques are meant to ensure database integrity, they are all based on the assumption that service interactions against the database are well defined and known before planning a transaction. They also assume that service developers will follow protocol guidelines while implementing their data-centric Web Services. These assumptions are only valid when all services belong to a specific business domain or are implemented within the same enterprise with strict conventions. On the Web, however, services are typically combined from different providers that potentially employ different implementation conventions and practices. Moreover, many hidden business rules related to data manipulation may lead to incorrect transaction outcomes even if transaction management measures are in place. Our model helps formally specify the data behavior of an individual service and hence helps guarantee the service outcome when it is included in a transaction.
6 Conclusion

Formal specifications provide a means for effectively specifying Web services. We believe that formal methods may allow for more rigor than process-oriented languages (e.g., BPEL4WS, OWL-S, and WS-CDL); however, the two approaches can also be combined. In this paper, we propose a model for data-centric Web services based on formal code specifications. Our case study and evaluation show that our model has a large impact on disambiguating the service behavior in terms of its data interactions. While our model is applicable to any data-centric application, it is most useful in the case of Web services for the following reasons:
- Testing and debugging a Web service is much costlier than testing a local method call. Service providers may restrict the number of calls, or charge the consumer for each call, so trial and error may not be an option [18].
- Web services can be exposed to a very wide range of clients that use different technologies [18], so it is important for service providers to be able to unambiguously define the service behavior to simplify error and exception processing.
- With the increasing need for automation, Web services will soon be accessed directly by software agents rather than by human beings [19]. A machine-readable specification of Web Services' data interactions is an essential ingredient to fulfill this promise, since it enables machine reasoning about the service behavior.
Our future work includes implementing tools that can be used by a Web service developer to easily construct the model, annotate the service with specifications, and verify the consistency of the specification with respect to the model. We are also considering an alternative design for our reusable class template that would enable implementation of a service model by simply inheriting from a generic superclass.
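As a rough illustration of this inheritance-based direction, a generic superclass could carry the record-set storage and predicate-based lookup, so that a concrete service model only declares its record type and invariants. The Python sketch below is purely speculative; the class names and API are our own and do not correspond to an existing implementation.

from typing import Callable, Generic, List, Set, TypeVar

R = TypeVar("R")  # record type of the concrete service model


class GenericDataModel(Generic[R]):
    # Reusable superclass: stores records and offers predicate-based lookup.

    def __init__(self) -> None:
        self.records: Set[R] = set()

    def add(self, record: R) -> None:
        self.check_invariants(record)     # hook implemented by subclasses
        self.records.add(record)

    def find(self, predicate: Callable[[R], bool]) -> List[R]:
        return [r for r in self.records if predicate(r)]

    def check_invariants(self, record: R) -> None:
        # Subclasses override this with service-specific business rules.
        pass


class ItemSearchModel(GenericDataModel[tuple]):
    def check_invariants(self, record: tuple) -> None:
        category, price = record[0], record[1]
        assert category in {"Book", "CD", "DVD"} and price >= 0


# Minimal usage example
model = ItemSearchModel()
model.add(("Book", 12.99))
print(model.find(lambda r: r[1] < 20))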
References

1. Amazon Web Services, http://aws.amazon.com/
2. PayPal Developer Community, http://developer.paypal-portal.com/pdn/
3. Web Service Description Language (WSDL), http://www.w3.org/TR/wsdl20/
4. Codd, E.F.: The Relational Model for Database Management: Version 2. Addison-Wesley Longman Publishing Co., Amsterdam (1990)
5. Vossen, G.: On formal models for object-oriented databases. SIGPLAN OOPS Messenger 6, 1–19 (1995)
6. Date, C.: An Introduction to Database Systems. Addison-Wesley, Reading (2003)
7. Fisher, G.: Formal Specification Examples, http://users.csc.calpoly.edu/~gfisher/classes/308/doc/ref-man/formal-spec-examples.html
8. Souto, R., Barros, M.: On the Formal Specification and Derivation of Relational Database Applications. PhD Dissertation (1994)
9. Kilov, H.: From semantic to object-oriented data modeling. In: Proceedings of the First International Conference on Systems Integration, pp. 385–393 (1990)
10. Leavens, G.T., Cheon, Y., Clifton, C., Ruby, C., Cok, D.R.: How the design of JML accommodates both runtime assertion checking and formal verification. Science of Computer Programming 55, 185–208 (2005)
11. Barnett, M., Leino, K.R.M., Schulte, W.: The Spec# programming system: An overview. In: Barthe, G., Burdy, L., Huisman, M., Lanet, J.-L., Muntean, T. (eds.) CASSIS 2004. LNCS, vol. 3362, pp. 49–69. Springer, Heidelberg (2005)
12. Lawrence, S., Giles, C.L.: Searching the World Wide Web. Science (1998)
13. Madhavan, J., Ko, D., Kot, L., Ganapathy, V., Rasmussen, A., Halevy, A.: Google's Deep Web crawl. Proc. VLDB Endow. 1, 1241–1252 (2008)
14. Carey, M.: Data delivery in a service-oriented world: the BEA AquaLogic Data Services Platform. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 695–705. ACM, Chicago (2006)
15. OASIS Business Transaction Protocol, http://oasis-open.org/committees/business-transactions/
16. Web Services Atomic Transaction (WS-AtomicTransaction) Version 1.2, Committee Specification 01 (2008), http://docs.oasis-open.org/ws-tx/wstx-wsat-1.2-spec.pdf
17. Web Services Business Activity (WS-BusinessActivity) Version 1.2, Committee Specification 01 (2008), http://docs.oasis-open.org/ws-tx/wstx-wsba-1.2-spec-cs-01.pdf
18. Design by Contract for Web Services, http://myarch.com/design-by-contract-for-web-services
19. Bansal, A., Patel, K., Gupta, G., Raghavachari, B., Harris, E.D., Staves, J.C.: Towards Intelligent Services: A Case Study in Chemical Emergency Response. In: International Conference on Web Services (ICWS), pp. 751–758. IEEE Computer Society, Los Alamitos (2005)
Author Index
Adcock, Bruce 31
Alencar, Paulo 236
Apel, Sven 106
Aschauer, Thomas 116
Atkinson, Colin 211
Bailin, Sidney C. 51
Bastarrica, María Cecilia 191
Batory, Don 106
Bigot, Julien 21
Blake, M. Brian 288
Bronish, Derek 31
Causevic, Adnan 150
Choi, Hyunsik 137
Cossentino, Massimo 201
Cowan, Donald 236
da Silva, Bruno 95
Dauenhauer, Gerd 116
Dehlinger, Josh 160
de Lucena, Carlos J.P. 236
Dos Santos, Raimundo F. 246
Favaro, John 41
Foss, Luciana 95
Frakes, William B. 86, 246
Frazier, David 11
Gomaa, Hassan 76
Hallstrom, Jason O. 225
Harton, Heather 11, 31
Henderson, Matthew J. 1
Henderson, Peter 1
Hummel, Oliver 211
Jarzabek, Stan 126
Jha, Meena 181
Kang, Kyo C. 137
Kim, Dohyung 137
Kirschenbaum, Jason 31
Kotonya, Gerald 269
Krasteva, Iva 150
Kuhlemann, Martin 106
Kulczycki, Gregory 288
Land, Rikard 150
Lee, Hyesun 137
Lee, Jaejoon 269
Lee, Zino 137
Lüders, Frank 150
Lutz, Robyn R. 160
Marinho, Anderson 258
Mazzini, Silvia 41
Mei, Hong 65
Mohan, Raghuveer 11
Murta, Leonardo 258
Nunes, Daltro 95
Nunes, Ingrid 236
O'Brien, Liam 181
Olimpiew, Erika Mir 76
Peng, Xin 126, 170
Pérez, Christian 21
Perovich, Daniel 191
Pree, Wolfgang 116
Ribeiro, Leila 95
Robinson, Daniel 269
Rossel, Pedro O. 191
Sabatucci, Luca 201
Saleh, Iman 288
Shen, Liwei 170
Sillitti, Alberto 278
Sitaraman, Murali 11, 31
Smith, Hampton 11, 31
Soundarajan, Neelam 225
Succi, Giancarlo 278
Sundmark, Daniel 150
Susi, Angelo 201
Weide, Bruce W. 31
Werner, Cláudia 258
Xue, Yinxing 126
Yan, Hua 65
Ye, Pengfei 126
Yilmaz, Okan 86
Zhang, Wei 65
Zhao, Haiyan 65
Zhao, Wenyun 170