Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis and J. van Leeuwen
1399
Opher Etzion Sushil Jajodia Suryanarayana Sripada (Eds.)
Temporal Databases: Research and Practice
Springer
Series Editors
Gerhard Goos, Karlsruhe University, Germany
Juris Hartmanis, Cornell University, NY, USA
Jan van Leeuwen, Utrecht University, The Netherlands

Volume Editors
Opher Etzion
IBM Research Laboratory in Haifa
Matam, Haifa 31905, Israel
E-mail: opher@haifa.vnet.ibm.com

Sushil Jajodia
Center for Secure Information Systems and Department of Information and Software Systems Engineering
George Mason University
Fairfax, VA 22030-4444, USA
E-mail: [email protected]

Suryanarayana Sripada
RWTH Aachen, Informatik V
Ahornstr. 55, D-52074 Aachen, Germany
Currently at: Light Software GmbH, Aachen
E-mail: sripada@compuserve.com

Cataloging-in-Publication data applied for
Die Deutsche Bibliothek - CIP-Einheitsaufnahme
Temporal databases : research and practice / Opher Etzion ... (ed.). Berlin ; Heidelberg ; New York ; Barcelona ; Budapest ; Hong Kong ; London ; Milan ; Paris ; Santa Clara ; Singapore ; Tokyo : Springer, 1998 (Lecture notes in computer science ; 1399) ISBN 3-540-64519-5

CR Subject Classification (1991): H.2-4
ISSN 0302-9743
ISBN 3-540-64519-5 Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 1998
Printed in Germany
Typesetting: Camera-ready by author
SPIN 10637053 06/3142 - 5 4 3 2 1 0
Printed on acid-free paper
Preface
Temporal databases incorporate the concept of time to create high-level abstractions useful in database applications. This has been an active area of research for about twenty years. In the last few years the importance of the temporal database area has been recognized by the international scientific community. This recognition came in part in the form of the ARPA/NSF sponsored International Workshop on Temporal Database Infrastructure in 1993, a VLDB-affiliated temporal workshop in 1995, a special section of the IEEE Transactions on Knowledge and Data Engineering on temporal and real-time databases published in August 1995, and the incorporation of temporal constructs, proposed by the temporal database community, in the soon-to-be standardized SQL3 language.
This book arose out of the Dagstuhl seminar that we organized during June 23-27, 1997. This seminar focused on the future directions of this discipline, with respect to both research issues and the means to incorporate temporal databases into mainstream application development. The topics discussed at this seminar included:
1. Temporal data models: relational, object-oriented, deductive, and hybrid models. Where do the temporal capabilities fit in?
2. Temporal languages: TSQL2 and beyond. Update and retrieval languages for various types of temporal data models.
3. The interrelationships between temporal databases and other disciplines: spatial databases, active databases, deductive databases, real-time databases, information uncertainty, belief revision, etc.
4. Implementation issues in temporal databases. Issues that arise from experience of implementors and users and the agenda for research into these areas and transition to use in practice.
5. Strategic discussions about the future of "temporal databases" as a discipline. Evaluation of the current state of the art and "call for action" to the community.
The Dagstuhl seminar brought together researchers who have dealt with different perspectives on temporal databases: temporal data models, temporal retrieval and update languages, interrelationships between temporal databases and other database technologies (e.g., spatial databases, active databases, real-time databases), and interrelationships between temporal databases and temporal reasoning in artificial intelligence. Some of the invited participants have also been involved in the standardization activities of the temporal community. Having a diverse group that shared a focus on temporal information processing ensured critical evaluation of the activities that have occurred thus far, and enriched the discussions. As with any Dagstuhl seminar, the participants represented a selected group of prominent researchers in the subject area. We solicited from the Dagstuhl
seminar invitees submissions for this book and aimed to include high-quality original papers about the state of the art in the temporal database area. The number of submissions exceeded our expectations, and we used a peer-review process to select the high-quality papers for this book. The book consists of the following parts:
Part 1: Temporal Database Infrastructure: This part consists of five papers that discuss infrastructure topics. The relationship between object-oriented modeling and temporal databases is one of the emerging issues, because of the inherent data complexity of temporal applications. The paper An Object-Oriented Framework for Temporal Data Models by Goralwalla, Özsu, and Szafron presents an object-oriented basis for the design and implementation of different temporal data models, to capture alternative temporal models for different applications and to compare and analyze different temporal object models with respect to design dimensions. Heterogeneous system problems of semantic differences with respect to time-related data do not escape temporal database applications. These differences can materialize in point versus interval semantics, different granularities, and different data types. The paper An Architecture for Supporting Interoperability among Temporal Databases by Bettini, Wang, and Jajodia proposes a multidatabase architecture where an appropriate formalization of the intended semantics is associated with each temporal relation. This allows the construction of a temporal mediator, described in this paper. While retrieval queries in temporal databases have been thoroughly discussed, the update process deserves some attention.
The paper Extended Update Functionality by Etzion, Gal, and Segev provides an enhanced collection of update operation types that are possible in append-only temporal database applications (such as: freeze along an interval, revise an erroneous value over an interval keeping the previous value for historical queries). The paper discusses different possible semantics for simultaneous values (values that are valid during the same valid time), and discusses the concept of decision time as a temporal primitive. The execution of temporal database updates and queries can be optimized, due to the fact that an operation refers to specific time points. The paper On Transaction Management in Temporal Databases by Gal provides a framework for concurrent processing of retrieval and update operations in temporal databases. The paper presents a series of modifications and tuning facilities for traditional concepts in transaction management, especially the locking mechanism. The paper Implementation Options for Time-Series Data by Elmasri and Lee concentrates on a special topic of temporal databases: time-series management systems. This paper compares and demonstrates different implementation schemes of mapping time-series into relational and object-oriented databases.
Part 2: Temporal Query Languages: This part consists of four papers that deal with query languages and their relationships to modeling and implementation.
Nested relations have been mentioned as a representation scheme for temporal data. The paper Expressive Power of Temporal Relational Query Languages and Temporal Completeness by Tansel and Tin introduces an extension to the relational data model to handle temporal data and queries, based on a nested relational data model. This model captures tuple and attribute time-stamping. The paper discusses requirements for such a model and temporal relational completeness. One of the efforts in the last few years has been the attempt to incorporate temporal capabilities in the SQL standard. The TSQL2 language is the proposed language that has been devised by a committee consisting of many of the leading researchers in the temporal database community. The paper Transitioning Temporal Support in TSQL2 to SQL3 by Snodgrass, Böhlen, Jensen, and Steiner summarizes the proposals before the SQL3 committees to allow the addition of tables with valid-time and transaction-time support and explains how to migrate from a regular relational database into the proposed scheme. The efforts to incorporate temporal capabilities into SQL have stimulated some discussion with respect to the nature of the desired target language. The paper Valid Time and Transaction Time Proposals: Language Design Aspects by Darwen suggests language design principles, such as parsimony and conceptual integrity, and argues that current proposals deviate from these design principles. This language debate was discussed at length during the Dagstuhl seminar. Point-Based Temporal Extensions of SQL and Their Efficient Implementation by Toman is the topic of the next paper. This paper proposes another extension to the SQL language by adding a single data type to represent a linearly ordered universe of individual time-instants.
In addition, it introduces an efficient query evaluation procedure over a compact interval-based encoding of temporal relations.
Part 3: Advanced Applications of Temporal Databases: This part consists of four papers that discuss the utilization of temporal databases for security, business event managers, knowledge discovery, and querying moving objects. The paper Applicability of Temporal Data Models to Query Multilevel Security Databases: A Case Study by Gadia points out that the multiple value abstraction, required for temporal databases, is also useful for other domains, such as spatial databases and multiple beliefs, and that these are special cases of parametric databases. This concept is discussed, along with its applicability to multilevel security databases. The paper An Architecture and Construction of a Business Event Manager by Patankar and Segev introduces the concept of a business event, and discusses types of temporal events, and event histories. The paper introduces an architecture and an SQL-like language to define these events. Decision support and decision analysis systems serve as important motivation areas for temporal database applications. In the paper Discovering Unexpected Patterns in Temporal Data Using Temporal Logic by
Berger and Tuzhilin, the task of finding interesting patterns in temporal databases is discussed. The paper presents a categorization of different discovery tasks, and focuses on the task of discovering interesting patterns of events in temporal sequences. The area of spatio-temporal databases is emerging as an independent area. The paper Querying the Uncertain Position of Moving Objects by Sistla, Wolfson, Chamberlain, and Dao proposes a data model for representing moving objects with uncertain positions in database systems. It also introduces a query language based on this model.
Part 4: General Reference: This part provides general information about the state of the art in temporal databases. It contains a Temporal Database Bibliography Update by Wu, Jajodia, and Wang that provides current references on models, database designs, query languages, constraints, time granularities, implementations, access methods, real-time databases, sequence databases, data mining, concurrency, and other papers. An up-to-date temporal database glossary prepared by Jensen and Dyreson and a glossary on time granularities by Bettini, Wang, Snodgrass, Dyreson, and Evans follow.
Appendix: Summaries of Current Work: At the conclusion of the seminar, all participants were invited to submit a brief summary of their activities in the temporal database area. These summaries, presented in the Appendix, provide a glimpse into some of the developments that we can expect to see in the coming years.
March 1998
Opher Etzion Sushil Jajodia Sury Sripada
Table of Contents

Part 1: Temporal Database Infrastructure

An Object-Oriented Framework for Temporal Data Models
I. A. Goralwalla, M. T. Özsu, and D. Szafron

An Architecture for Supporting Interoperability among Temporal Databases
C. Bettini, X. S. Wang, and S. Jajodia

Extended Update Functionality in Temporal Databases
O. Etzion, A. Gal, and A. Segev

On Transaction Management in Temporal Databases
A. Gal

Implementation Options for Time-Series Data
R. Elmasri and J. Y. Lee

Part 2: Temporal Query Languages

Expressive Power of Temporal Relational Query Languages and Temporal Completeness
A. U. Tansel and E. Tin

Transitioning Temporal Support in TSQL2 to SQL3
R. T. Snodgrass, M. H. Böhlen, C. S. Jensen, and A. Steiner

Valid Time and Transaction Time Proposals: Language Design Aspects
H. Darwen

Point-Based Temporal Extensions of SQL and Their Efficient Implementation
D. Toman

Part 3: Advanced Applications of Temporal Databases

Applicability of Temporal Data Models to Query Multilevel Security Databases: A Case Study
S. K. Gadia

An Architecture and Construction of a Business Event Manager
A. K. Patankar and A. Segev

Discovering Unexpected Patterns in Temporal Data Using Temporal Logic
G. Berger and A. Tuzhilin

Querying the Uncertain Position of Moving Objects
A. P. Sistla, O. Wolfson, S. Chamberlain, and S. Dao

Part 4: General Reference

Temporal Database Bibliography Update
Y. Wu, S. Jajodia, and X. S. Wang

The Consensus Glossary of Temporal Database Concepts - February 1998 Version
C. S. Jensen, C. E. Dyreson (Eds.), M. Böhlen, J. Clifford, R. Elmasri, S. K. Gadia, F. Grandi, P. Hayes, S. Jajodia, W. Käfer, N. Kline, N. Lorentzos, Y. Mitsopoulos, A. Montanari, D. Nonen, E. Peressi, B. Pernici, J. F. Roddick, N. L. Sarda, M. R. Scalas, A. Segev, R. T. Snodgrass, M. D. Soo, A. Tansel, P. Tiberio, and G. Wiederhold

A Glossary of Time Granularity Concepts
C. Bettini, C. E. Dyreson, W. S. Evans, R. T. Snodgrass, and X. S. Wang

Appendix

Summaries of Current Work
The Dagstuhl Seminar Researchers

Index of Authors
An Object-Oriented Framework for Temporal Data Models
Iqbal A. Goralwalla, M. Tamer Özsu, and Duane Szafron
Laboratory for Database Systems Research
Department of Computing Science
University of Alberta
Edmonton, Alberta, Canada T6G 2H1
{iqbal,ozsu,duane}@cs.ualberta.ca
Abstract. Most of the database research on modeling time has concentrated on the definition of a particular temporal model and its incorporation into a (relational or object) database management system. This has resulted in quite a large number of different temporal models, each providing a specific set of temporal features. Therefore, the first step of this work is a design space for temporal models which accommodates multiple notions of time, thereby classifying design alternatives for temporal models. The design space is then represented by exploiting object-oriented features to model the different aspects of time. An object-oriented approach allows us to capture the complex semantics of time by representing it as a basic entity. Furthermore, the typing and inheritance mechanisms of object-oriented systems allow the various notions of time to be reflected in a single framework. The framework can be used to accommodate the temporal needs of different applications, and derive existing temporal models by making a series of design decisions through subclass specialization. It can also be used to derive a series of new more general temporal models that meet the needs of a growing number of emerging applications. Furthermore, it can be used to compare and analyze different temporal object models with respect to the design dimensions.
1 Introduction
O. Etzion, S. Jajodia, and S. Sripada (Eds.): Temporal Databases - Research and Practice, LNCS 1399, pp. 1–35, 1998. © Springer-Verlag Berlin Heidelberg 1998
The ability to model the temporal dimension of the real world is essential for many applications such as econometrics, banking, inventory control, medical records, real-time systems, multimedia, airline reservations, versions in CAD/CAM applications, statistical and scientific applications, etc. Database management systems (DBMSs) that support these applications have to be able to satisfy temporal requirements. To accommodate the temporal needs of different applications, there has been extensive research activity on temporal data models in the last decade [Sno86,SS88,Soo91,Kli93,TK96]. Most of this research has concentrated on the
definition of a particular temporal model and its incorporation into a (relational or object-oriented) database management system (DBMS). The early research on temporal data models concentrated on extending the relational data model to handle time in an appropriate manner. The notion of time, with its multiple facets, is difficult (if not impossible) to represent in one single relational model since it does not adequately capture data or application semantics. This is substantiated by most of the relational temporal models that only support a discrete and linear model of time. The general limitation of the relational model in supporting complex applications has led to research into next-generation data models, specifically object data models. The research on temporal models has generally followed this trend. Temporal object models can more accurately capture the semantics of complex objects and treat time as a basic component. There have been many temporal object model proposals (for example, [RS91,SC91,WD92,KS92,CITB92,BFG97]). These models differ in the functionality that they offer; however, as in relational systems, they assume a set of fixed notions of time. Wuu & Dayal [WD92] provide an abstract time type to model the most general semantics of time which can then be subtyped (by the user or database designer) to model the various notions of time required by specific applications. However, this requires significant support from the user, including specification of the temporal schema. Both (relational and object-oriented) approaches have led to the definition and design of a multitude of temporal models. Many of these assume a set of fixed notions about time, and therefore do not incorporate sufficient functionality or extensibility to meet the varying temporal requirements of today’s applications. Instead, similar functionality is re-engineered every time a temporal model is created for a new application.
Although most temporal models were designed to support the temporal needs of a particular application, or group of similar applications, if we look at the functionality offered by the temporal models at an abstract level, there are notable similarities in their temporal features:
– Each temporal model has one or more temporal primitives, namely, time instant, time interval, time span, etc. The discrete or the continuous domain is used by each temporal model as a temporal domain over the primitives.
– Some temporal models require their temporal primitives to have the same underlying granularity, while others support multiple granularities and allow temporal primitives to be specified in different granularities.
– Most temporal models support a linear model of time, while a few support a branching model. In the former, temporal primitives are totally ordered, while in the latter they have a partial order defined on them.
– All temporal models provide some means of modeling historical information about real-world entities and/or histories of entities in the database. Two of the most popular types of histories that have been employed are valid and transaction time histories [Sno87], respectively.
These commonalities suggest a need for combining the diverse features of time under a single infrastructure that is extensible and allows design reuse. In this paper, we present an object-oriented framework [JF88] that provides such a unified infrastructure. An object-oriented approach allows us to capture the complex semantics of time by representing it as a basic entity. Furthermore, the typing and inheritance mechanisms of object-oriented systems directly enable the various notions of time to be reflected in a single framework.
The objectives of this work are fourfold. The first objective is to identify the design dimensions that span the design space for temporal models. This will classify design alternatives for temporal models. The design space is then represented by exploiting object-oriented features to model the different aspects of time. The second objective is to show how the temporal framework can be tailored to accommodate real-world applications that have different temporal needs. The third objective is to show how the various existing temporal object models can be represented within this framework. The final objective is to use the framework to analyze and compare the different temporal object models based on the design dimensions. In particular, the [RS91,SC91,KS92,PM92,CITB92,BFG97] temporal object models are considered. The work of Wuu & Dayal [WD92] and Cheng & Gadia [CG93] (which follows a similar methodology as [WD92]) are not considered since they do not provide concrete notions of time in their models.
Object models supporting versioning using time usually follow a structural embedding of temporality within type definitions [KGBW90,WLH90,SRH90,Sci94]. Thus, the notion of temporal objects is lost since the model knows nothing about temporality. Moreover, most temporal version models use the Date function call which is provided by the system. For example, though the EXTRAV version model [Sci94] supports “valid” and “transaction” time, it does so by timestamping attributes using system provided dates. This is limited in scope as no semantics of the various notions of time are provided.
Since these models are not “temporal object models” in the strict sense of the term, we do not include them in this study. We can draw a parallel between our work and similar (albeit on a much larger scale) approaches used in Choices [CJR87] and cmcc [ATGL96]. Choices is a framework for operating system construction which was designed to provide a family of operating systems that could be reconfigured to meet diverse user/application requirements. cmcc is an optimizing compiler that makes use of frameworks to facilitate code reuse for different modules of a compiler. Similar to Choices and cmcc, the temporal framework presented in this paper can be regarded as an attempt to construct a family of temporal models. The framework can then be tailored to reflect a particular temporal model which best suits the needs of an application. A particular temporal model would be one of the many “instances” of the framework. The presentation of this paper is divided into five sections. Section 2 presents the temporal framework by identifying the design dimensions (key abstractions) for temporal models and the interactions between them. Section 3 illustrates how the temporal framework can be tailored to accommodate the temporal needs of different applications, and the temporal features of temporal object models. In Section 4 object-oriented techniques are used to compare and analyze different
temporal object models with respect to the design dimensions. Section 5 summarizes the work presented in this paper, discusses related work, and outlines avenues for future research.
2 The Architecture of the Temporal Framework
In order to accommodate the varying requirements that many applications have for temporal support, we first identify the design dimensions that span the design space for temporal models. Next, we identify the components or features of each design dimension. Finally, we explore the interactions between the design dimensions in order to structure the design space. These steps produce a framework which consists of abstract and concrete object types, and properties (abstractions of methods and attributes in traditional object-oriented terminology). The types are used to model the different design dimensions and their corresponding components. The properties are used to model the different operations on each component, and to represent the relationships (constraints) between the design dimensions. The framework classifies design alternatives for temporal models by providing types and properties that can be used to define the semantics of many different specific notions of time.
2.1 Design Dimensions
The design alternatives for temporal models can be classified along four design dimensions:
1. Temporal Structure: provides the underlying ontology and domains for time.
2. Temporal Representation: provides a means to represent time so that it is human readable.
3. Temporal Order: gives an ordering to time.
4. Temporal History: allows events and activities to be associated with time.
There are two parts to the description of a design dimension. First, we define a set of temporal features that the design dimension encompasses. Second, we explore relationships between the temporal features and describe the resulting design space for the design dimension. The design space consists of an architectural overview of abstract and concrete types corresponding to the temporal features, and a design overview which describes some of the key properties (operations) defined in the interface of the types. We do not describe the properties in detail since many of these are traditional temporal operations that have already appeared in the literature on temporal databases. We assume the availability of commonly used object-oriented features: atomic entities (reals, integers, strings, etc.); types for defining common features of objects; properties (which represent methods and instance variables) for specifying the semantics of operations that may be performed on objects; classes
which represent the extents of types; and collections for supporting general heterogeneous groupings of objects. In this paper, a reference prefixed by “T_” refers to a type, and “P_” to a property. A type is represented by a rounded box. An abstract type is shaded with a black triangle in its upper left corner, while a concrete type is unshaded. In Figures 5, 8, 9, and 15 the rectangular boxes are objects. Objects have an outgoing edge for each property applicable to the object which is labeled with the name of the property and which leads to an object resulting from the application of the property to the given object. A circle labeled with the symbols { } represents a container object and has outgoing edges labeled with “∈” to each member object.
Temporal Structure
The first question about a temporal model is “what is its underlying temporal structure?” More specifically, what are the temporal primitives supported in the model, what temporal domains are available over these primitives, and what is the temporal determinacy of the primitives? Indeed, the temporal structure dimension with its various constituents forms the basic building block of the design space of any temporal model since it is comprised of the basic temporal features that underlie the model. We now give an overview of the features of a temporal structure and then identify the relationships that exist between them.
Components
1. Temporal Primitives
Temporal primitives can either be anchored (absolute) or unanchored (relative) [Sno92]. For example, 31 July 1995 is an anchored temporal primitive since we know exactly where it is located on the time axis, whereas 31 days is an unanchored temporal primitive since it can stand for any block of 31 consecutive days on the time axis. There is only one unanchored primitive, called the span. A span is a duration of time with a known length, but no specific starting and ending anchor points.
There are two anchored primitives, the instant (moment, chronon) and the interval. An instant is a specific anchored moment in time, e.g., 31 July 1995. An interval is a duration of time between two specific anchor points (instants) which are the lower and upper bounds of the interval, e.g., [15 June 1995, 31 July 1995].
2. Temporal Domain
The temporal domain of a temporal structure defines a scale for the temporal primitives. A temporal domain can be continuous or discrete. Discrete domains map temporal primitives to the set of integers. That is, for any temporal primitive in a discrete time domain, there is a unique successor and predecessor. Continuous domains map temporal primitives to the set of real numbers. Between any two temporal primitives of a continuous time domain, another temporal primitive exists. Most of the research in the context of temporal databases has assumed that the temporal domain is discrete. Several arguments in favor of using a discrete temporal domain are made by Snodgrass [Sno92] including the
imprecision of clocking instruments, compatibility with natural language references, possibility of modeling events which have duration, and practicality of implementing a continuous temporal data model. However, Chomicki [Cho94] argues that the continuous (dense) temporal domain is very useful in mathematics and physics. Furthermore, continuous time provides a useful abstraction if time is thought of as discrete but with instants that are very close. In this case, the set of time instants may be very large which in turn may be difficult to implement efficiently. Chomicki further argues that query evaluation in the context of constraint databases [KKR90,Rev90] has been shown to be easier in continuous domains than in discrete domains. Continuous temporal domains have also been used to facilitate full abstract semantics in reasoning about concurrent programs [BKP86].
3. Temporal Determinacy
There are many real world cases where we have complete knowledge of the time or the duration of a particular activity. For example, the time interval allowed for students to complete their Introduction to Database Management Systems examination is known for certain. This is an example of a determinate temporal primitive. However, there are cases when the knowledge of the time or the duration of a particular activity is known only to a certain extent. For example, we do not know the exact time instant when the Earth was formed though we may speculate on an approximate time for this event. In this case, the temporal primitive is indeterminate. Indeterminate temporal information is also prevalent in various sources such as granularity, dating techniques, future planning, and unknown or imprecise event times [DS93]. Since the ultimate purpose of a temporal model is to represent real temporal information, it is desirable for such a model to be able to capture both determinate and indeterminate temporal primitives.
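The primitives and domains listed above can be pictured as a small set of types. The Python sketch below is only an illustration under stated assumptions: the class names `Span`, `Instant`, and `Interval`, the helper `has_unique_successor`, and the choice of `int` for a discrete scale and `Fraction` as a dense stand-in for a continuous scale are all hypothetical and are not the types defined by the framework. Determinacy is left out to keep the sketch short.

```python
from dataclasses import dataclass
from datetime import date
from fractions import Fraction
from typing import Union

# Hypothetical scale: int models a discrete domain, Fraction a dense one
# (used here as a stand-in for a continuous domain).
Scale = Union[int, Fraction]


@dataclass(frozen=True)
class Span:
    """Unanchored primitive: a known length with no position on the time axis."""
    length: Scale


@dataclass(frozen=True)
class Instant:
    """Anchored primitive: a specific position on the time axis."""
    position: Scale

    def shift(self, span: Span) -> "Instant":
        # Anchoring a span at an instant yields another instant.
        return Instant(self.position + span.length)


@dataclass(frozen=True)
class Interval:
    """Anchored primitive bounded by a lower and an upper instant."""
    lower: Instant
    upper: Instant

    def __post_init__(self) -> None:
        if self.lower.position > self.upper.position:
            raise ValueError("lower bound must not follow upper bound")

    def duration(self) -> Span:
        # Dropping the anchors of an interval leaves a span.
        return Span(self.upper.position - self.lower.position)


def has_unique_successor(instant: Instant) -> bool:
    # Only a discrete (integer) scale gives every instant a unique successor;
    # on a dense scale another instant lies between any two instants.
    return isinstance(instant.position, int)


# The interval [15 June 1995, 31 July 1995] on a discrete scale of days.
june_15 = Instant(date(1995, 6, 15).toordinal())
july_31 = Instant(date(1995, 7, 31).toordinal())
example = Interval(june_15, july_31)
```

In this sketch `example.duration()` is a span that could equally be anchored anywhere else on the axis with `shift`, which mirrors the anchored/unanchored distinction in the text.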
Design Space. Figure 1 shows the building block hierarchy of a temporal structure. The basic building block consists of anchored and unanchored temporal primitives. The next building block provides a domain for the primitives that consists of discrete or continuous temporal primitives. Finally, the last building block of Figure 1 adds determinacy. Thus, a temporal structure can be defined by a series of progressively enhanced temporal primitives. Figure 2 gives a detailed hierarchy of the different types of temporal primitives that exist in each of the building blocks of Figure 1. Based on the features of a temporal structure, its design space consists of 11 different kinds of temporal primitives. These are the determinacy-domain-based temporal primitives shown in Figure 2 and described below.

Fig. 1. Building a Temporal Structure (temporal primitives, + discrete/continuous domain giving domain-based temporal primitives, + determinacy/indeterminacy giving determinacy-domain-based temporal primitives)

Continuous time instants and intervals. Continuous instants are just points on the (continuous) line of all anchored time specifications. They are totally ordered by the relation “later than.” Since in theory, continuous instants have infinite precision, they cannot have a period of indeterminacy. Therefore, continuous indeterminate time instants do not exist in Figure 2. However, continuous intervals can be determinate or indeterminate. The difference between them is that a continuous determinate interval denotes that the activity associated with it occurs during the whole interval, while a continuous indeterminate interval denotes that the activity associated with it occurs sometime during the interval. Continuous intervals have lower and upper bounds which are continuous instants.

Discrete time instants and intervals. Assume that somebody has been on a train the whole day of 5 January 1997. This fact can be expressed using a determinate time instant 5 January 1997$_{det}$ (which means the whole day of). However, the fact that somebody is leaving for Paris on 5 January 1997 can be represented as an indeterminate time instant 5 January 1997$_{indet}$ (which means some time on that day). Hence, each discrete time instant is either determinate or indeterminate, corresponding to the two different interpretations. Discrete time instants are analogous to continuous time intervals. Every determinate (indeterminate) discrete time instant has a granularity ($G_i$) associated with it. This granularity determines the mapping of the given determinate (indeterminate) discrete time instant $I_{det}$ ($I_{indet}$) to the domain of continuous time instants. The mapping is defined as follows:

$$I_{det} \mapsto [I_{cont},\ I_{cont} + G_i) \qquad I_{indet} \mapsto [I_{cont} \sim I_{cont} + G_i)$$

Here $I_{cont}$ denotes the counterpart of $I_{det}$ and $I_{indet}$ in the domain of continuous time instants. This is exemplified by the mapping of the discrete determinate instant 5 January 1997$_{det}$ to the continuous determinate interval [5 January 1997$_{cont}$, 6 January 1997$_{cont}$). In this case $G_i = G_{days} = 1$ day. A formal treatment of the different types of instants and mappings is given in [GLÖS97]. Discrete time instants can be used to form discrete time intervals.
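The mapping above from a discrete instant to a continuous interval can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `ContinuousInterval` and `to_continuous` are hypothetical and do not come from the framework, and `datetime` (which has microsecond resolution) merely stands in for a truly continuous domain.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ContinuousInterval:
    """Half-open interval [lower, upper) over the (approximated) continuous domain."""
    lower: datetime
    upper: datetime
    determinate: bool  # True: holds during the whole interval; False: sometime within it


def to_continuous(i_cont: datetime, granularity: timedelta,
                  determinate: bool) -> ContinuousInterval:
    """Map a discrete instant with granularity G_i to [I_cont, I_cont + G_i)."""
    return ContinuousInterval(i_cont, i_cont + granularity, determinate)


G_DAYS = timedelta(days=1)  # granularity of one day

# "On a train the whole day of 5 January 1997" (determinate instant)
train_ride = to_continuous(datetime(1997, 1, 5), G_DAYS, determinate=True)

# "Leaving for Paris on 5 January 1997" (indeterminate instant)
departure = to_continuous(datetime(1997, 1, 5), G_DAYS, determinate=False)
```

Both instants map to the same bounds; only the interpretation flag differs, which mirrors the two readings of a discrete instant given in the text.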
Since we have determinate and indeterminate discrete instants, we also have determinate and indeterminate discrete intervals. Determinate (indeterminate) time instants can be used as boundaries of determinate (indeterminate) time intervals.

Time spans. Discrete and continuous determinate spans represent complete information about a duration of time. A discrete determinate span is a summation of distinct granularities with integer coefficients, e.g., 5 days or 2 months + 5 days. Similarly, a continuous determinate span is a summation of distinct granularities with real coefficients, e.g., 0.31 hours or 5.2 minutes + 0.15 seconds. Discrete and continuous indeterminate spans represent incomplete information about a duration of time. They have lower and upper bounds that are determinate spans. For example, 1 day ∼ 2 days is a discrete indeterminate span that can be interpreted as “a time period between one and two days.”

Fig. 2. Design Space of a Temporal Structure (anchored primitives: determinate and indeterminate discrete instants, determinate continuous instants, determinate and indeterminate discrete and continuous intervals; unanchored primitives: determinate and indeterminate discrete and continuous spans)

The mapping of the temporal structure to an object type hierarchy is given in Figure 3, which shows the types and generic properties that are used to model various kinds of determinacy-domain-based temporal primitives. Properties defined on time instants allow an instant to be compared with another instant; an instant to be subtracted from another instant to find the time duration between the two; and a time span to be added to or subtracted from an instant to return another instant. Furthermore, properties P_calendar and P_calElements are used to link time instants to calendars which serve as a representational scheme for temporal primitives (see Section 2.1). P_calendar returns the calendar which the instant belongs to and P_calElements returns a list of the calendric elements in a time instant. For example, P_calendar applied to the time instant 15 June 1995 would return
Gregorian, while the application of P_calElements to the same time instant would return (1995, June, 15).

Similarly, properties defined on time intervals include unary operations which return the lower bound, upper bound and length of the interval; ordering operations which define Allen’s interval algebra [All84]; and set-theoretic operations.

Properties defined on time spans enable comparison and arithmetic operations between spans. The P_before and P_after properties are refined for time spans to model the semantics of < and >, respectively. Additionally, properties P_coefficient and P_calGranularities are used as representational properties and provide a link between time spans and calendars (see Section 2.1). P_coefficient returns the (real) coefficient of a time span given a specific calendric granularity. For example, (5 days)·P_coefficient(day) returns 5.0. P_calGranularities returns a collection of calendric granularities in a time span. For example, the property application (1 month + 5 days)·P_calGranularities returns {day, month}.

Fig. 3. The Inheritance Hierarchy of a Temporal Structure
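The two representational span properties can be illustrated with a small Python sketch. The class `Span` and its method names are hypothetical stand-ins for P_coefficient, P_calGranularities and P_add; the sketch assumes a determinate span is stored simply as a map from granularity name to coefficient.

```python
class Span:
    """Determinate span: a sum of distinct calendric granularities."""

    def __init__(self, **terms: float):
        # Span(month=1, day=5) stands for "1 month + 5 days"
        self._terms = {g: float(c) for g, c in terms.items() if c != 0}

    def coefficient(self, granularity: str) -> float:
        """Counterpart of P_coefficient: coefficient for one granularity."""
        return self._terms.get(granularity, 0.0)

    def cal_granularities(self) -> set:
        """Counterpart of P_calGranularities: granularities used in the span."""
        return set(self._terms)

    def add(self, other: "Span") -> "Span":
        """Counterpart of P_add: add coefficients granularity by granularity."""
        merged = dict(self._terms)
        for g, c in other._terms.items():
            merged[g] = merged.get(g, 0.0) + c
        return Span(**merged)


five_days = Span(day=5)
month_and_five_days = Span(month=1, day=5)
total = five_days.add(month_and_five_days)  # 1 month + 10 days
```

Here `five_days.coefficient("day")` corresponds to (5 days)·P_coefficient(day), and `month_and_five_days.cal_granularities()` corresponds to the {day, month} example. No conversion between granularities is attempted; that is the job of a calendar's conversion functions.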
We note that (see Figure 3) the properties P_succ and P_pred are defined in all the types involving both discrete instant and span primitives. This redundancy can be eliminated by refactoring the concerned types and using multiple inheritance. More specifically, an abstract type called T_discrete can be introduced, and the properties P_succ and P_pred defined on it. All the types involving discrete primitives can then be made subtypes of T_discrete. A similar approach can be used to factor the types that define properties P_lb and P_ub. An abstract type called T_bounds can be introduced with the properties P_lb and P_ub defined on it. The T_interval type and the types involving indeterminate spans can then be made subtypes of T_bounds. Alternatively, the concept of multiple subtyping hierarchies can be used to collect semantically related types together and avoid the duplication of properties [HKOS96]. For example, the unanchored primitives hierarchy can be re-structured as shown in Figure 4.

Fig. 4. Multiple Subtyping Hierarchy for Unanchored Temporal Primitives (T_unanchPrim with P_add, P_subtract, P_coefficient, P_calGranularities; T_discSpan with P_succ, P_pred; T_contSpan; T_indetSpan with P_lb, P_ub; leaf types T_detDiscSpan, T_indetDiscSpan, T_detContSpan, T_indetContSpan)
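The refactoring with the abstract types T_discrete and T_bounds can be sketched with Python abstract base classes and multiple inheritance. This is a minimal sketch, not the framework's implementation: spans are restricted to a single granularity (days), and the successor of an indeterminate span is assumed to shift both bounds by one unit, which the text does not specify.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class TDiscrete(ABC):
    """Abstract type carrying P_succ and P_pred."""
    @abstractmethod
    def succ(self): ...
    @abstractmethod
    def pred(self): ...


class TBounds(ABC):
    """Abstract type carrying P_lb and P_ub."""
    @abstractmethod
    def lb(self): ...
    @abstractmethod
    def ub(self): ...


class TUnanchPrim:
    """Root of the unanchored primitives (spans)."""


@dataclass(frozen=True)
class TDetDiscSpan(TUnanchPrim, TDiscrete):
    days: int  # assumption: one granularity only

    def succ(self): return TDetDiscSpan(self.days + 1)
    def pred(self): return TDetDiscSpan(self.days - 1)


@dataclass(frozen=True)
class TIndetDiscSpan(TUnanchPrim, TDiscrete, TBounds):
    lower: TDetDiscSpan
    upper: TDetDiscSpan

    def lb(self): return self.lower
    def ub(self): return self.upper
    # assumption: successor/predecessor shift both bounds
    def succ(self): return TIndetDiscSpan(self.lower.succ(), self.upper.succ())
    def pred(self): return TIndetDiscSpan(self.lower.pred(), self.upper.pred())


one_to_two_days = TIndetDiscSpan(TDetDiscSpan(1), TDetDiscSpan(2))  # "1 day ~ 2 days"
```

Each property is now declared once, and a type such as the indeterminate discrete span picks up both groups of properties by listing both abstract types as supertypes.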
Temporal Representation.

Components. Temporal primitives need a representational scheme that makes them readable and usable by humans. This is achieved by means of calendars. Common calendars include the Gregorian and Lunar calendars. Educational institutions also use Academic calendars. Calendars are composed of different time units of varying granularities that enable the representation of different temporal primitives. In many applications, it is desirable to have multiple calendars that have different calendric granularities. For example, in financial trading, multiple calendars with different time units and operations need to be available to capture the semantics of financial data [CS93,CSS94]. In time series management, extensive calendar support is also required [DDS94,LEW96]. A calendar should be able to support multiple granularities since temporal information processed by a DBMS is usually available in multiple granularities. Such information is prevalent in various sources. For example:
– clinical data: Physicians usually specify temporal clinical information for patients with varying granularities [CPP95,CPP96]. For example, “the patient suffered from abdominal pain for 2 hours and 20 minutes on June 15, 1996,” “in 1990, the patient took a calcium antagonist for 3 months,” “in October 1993, the patient had a second heart seizure.”
– real-time systems: A process is usually composed of sub-processes that evolve according to times that have different granularities [CMR91]. For example, the temporal evolution of the basin in a hydroelectric plant depends on different sub-processes: the flow of water is measured daily; the opening and closing of radial gates is monitored every minute; and the electronic control has a granularity of microseconds.
– geographic information systems: Geographic information is usually specified according to a varying time scale [Flo91]. For example, vegetation fluctuates according to a seasonal cycle, while temperature varies daily.
– office information systems: Temporal information is available in different time units of the Gregorian calendar [BP85,CR88,MPB92]. For example, employee wages are usually recorded in the time unit of hours while the history of sales is categorized according to months.

Design Space. A calendar is composed of an origin, a set of calendric granularities, and a set of conversion functions. The origin marks the start of a calendar¹. Calendric granularities define the reasonable time units (e.g., minute, day, month) that can be used in conjunction with this calendar to represent temporal primitives. A calendric granularity also has a list of calendric elements. For example, in the Gregorian calendar, the calendric granularity day has the calendric elements Sunday, Monday, . . . , Saturday. Similarly, in the Academic calendar, the calendric granularity semester has the calendric elements Fall, Winter, Spring, and Summer. The conversion functions establish the conversion rules between calendric granularities of a calendar. Since all calendars have the same structure, a single type, called T_calendar, can be used to model different calendars, where instances represent different calendars. The basic properties of a calendar are P_origin, P_calGranularities, and P_functions. These allow each calendar to define its origin, calendric granularities, and the conversion functions between different calendric granularities.

Example 1. Figure 5 shows four instances of T_calendar: the Gregorian, Lunar, Academic, and Fiscal calendars. The origin of the Gregorian calendar is given as the span 1582 years from the start of time since it was proclaimed in 1582 by Pope Gregory XIII as a reform of the Julian calendar. The calendric granularities in the Gregorian calendar are the standard ones, year, month, day, etc. The origin of the Academic calendar shown in Figure 5 is assumed to be the span 1908 academicYears, having started in the year 1908, which is the establishment date of the University of Alberta. The Academic calendar has calendric granularities similar to those of the Gregorian calendar and defines a new calendric granularity of semester. The semantics of the Lunar and Fiscal calendars could similarly be defined.

¹ We note that our definition of a calendar is different from that defined in [CS93,CSS94,LEW96] where structured collections of time intervals are termed “calendars.” Our definition adheres closely to the human understanding of a calendar. However, the extensibility feature of the framework allows any other notions of calendars to be incorporated easily under the temporal representation design dimension.
Fig. 5. Temporal Representational Examples (instances of T_calendar: Gregorian, Lunar, Academic, Fiscal; Gregorian has P_origin = 1582 years and P_calGranularities = {year, month, day, ...}; Academic has P_origin = 1908 years and P_calGranularities = {academicYear, semester, academicMonth, ...})
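The structure of a calendar (origin, calendric granularities, conversion functions) can be sketched as a single Python type whose instances are the individual calendars. The class and field names below are hypothetical counterparts of T_calendar, P_origin, P_calGranularities and P_functions; the origin is kept as a plain string and only one conversion function is supplied, both as simplifying assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Calendar:
    name: str
    origin: str                              # counterpart of P_origin
    cal_granularities: Tuple[str, ...]       # counterpart of P_calGranularities
    # counterpart of P_functions: (source, target) -> conversion rule
    functions: Dict[Tuple[str, str], Callable[[float], float]] = field(
        default_factory=dict)

    def convert(self, amount: float, source: str, target: str) -> float:
        """Apply the conversion function registered for (source, target)."""
        return self.functions[(source, target)](amount)


gregorian = Calendar(
    name="Gregorian",
    origin="1582 years",
    cal_granularities=("year", "month", "day"),
    functions={("year", "month"): lambda years: years * 12},
)

academic = Calendar(
    name="Academic",
    origin="1908 academicYears",
    cal_granularities=("academicYear", "semester", "academicMonth"),
)

months = gregorian.convert(2, "year", "month")  # 2 years expressed in months
```

Because all calendars share this structure, adding a Lunar or Fiscal calendar would only require another instance, not another type.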
Temporal Order. We now have the means of designing the temporal structure and the temporal representation of a temporal model. The next step is to provide an ordering scheme for the temporal primitives. This constitutes the third building block of our design space.

Components. A temporal order can be classified as being linear or branching. In a linear order, time flows from past to future in an ordered manner. In a branching order, time is linear in the past up to a certain point, when it branches out into alternate futures. The structure of a branching order can be thought of as a tree defining a partial order of times. The trunk (stem) of the tree is a linear order and each of its branches is a branching order. The linear model is used in applications such as office information systems. The branching order is useful in applications such as computer aided design and planning or version control which allow objects to evolve over a nonlinear (branching) time dimension (e.g., multiple futures, or partially ordered design alternatives).

Design Space. The different types of temporal orders are dependent on each other. A sub-linear order is one in which the temporal primitives (time intervals) are allowed to overlap, while a linear order is one in which the temporal primitives (time intervals) are not allowed to overlap. Every linear order is also a sub-linear order. A branching order is essentially made up of sub-linear orders. The relationship between temporal orders is shown in Figure 6.
Fig. 6. Temporal Order Relationships (a linear order is-a sub-linear order; sub-linear and branching orders are temporal orders; a branching order is composed of sub-linear orders)
The hierarchy in Figure 7 gives the various types and properties which model different temporal orders².

Fig. 7. The Hierarchy of Temporal Orders (types T_temporalOrder, T_subLinearOrder, T_linearOrder, T_branchingOrder; properties P_temporalPrimitives, P_branchingOrder, P_root, P_branches, P_in)

² We do not consider periodic temporal orders in this work. These can easily be incorporated as a subtype of T_temporalOrder.
Example 2. Consider the operations that take place in a hospital on any particular day. It is usually the case that at any given time multiple operations are taking place. Let us assume an eye cataract surgery took place between 8am and 10am, a brain tumor surgery took place between 9am and 12pm, and an open heart surgery took place between 7am and 2pm on a certain day. Figure 8 shows a pictorial representation of operationsOrder, which is an object of type T_subLinearOrder. operationsOrder consists of the time intervals [08:00,10:00], [09:00,12:00], [07:00,14:00], and does not belong to any branching timeline. As seen in the figure, operationsOrder consists of intervals (representing the time periods during which the different surgeries took place) that overlap each other. Hence, operationsOrder is an example of a sub-linear order.
Fig. 8. An Example of a Sub-Linear Order (operationsOrder: P_branchingOrder = null; P_temporalPrimitives = {[08:00, 10:00], [09:00, 12:00], [07:00, 14:00]})
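The distinction between a sub-linear and a linear order comes down to whether any two intervals overlap. The following Python sketch checks this for the surgery intervals of Example 2. The function names are hypothetical, intervals are treated as closed (so two intervals that share only an endpoint count as overlapping), and the second, non-overlapping list is made up purely for contrast.

```python
from datetime import time
from itertools import combinations


def overlaps(a, b) -> bool:
    """True if closed intervals a = (lb, ub) and b = (lb, ub) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


def classify(intervals) -> str:
    """Return the most specific order kind the intervals satisfy.

    Every linear order is also sub-linear, so "sub-linear" is reported
    only when at least one pair of intervals overlaps.
    """
    pairs = combinations(intervals, 2)
    return "sub-linear" if any(overlaps(a, b) for a, b in pairs) else "linear"


# Temporal primitives of operationsOrder from Example 2
operations_order = [
    (time(8, 0), time(10, 0)),   # eye cataract surgery
    (time(9, 0), time(12, 0)),   # brain tumor surgery
    (time(7, 0), time(14, 0)),   # open heart surgery
]

# Hypothetical non-overlapping intervals, for contrast only
disjoint_order = [(time(8, 0), time(10, 0)), (time(11, 0), time(12, 0))]

kind_of_operations = classify(operations_order)
kind_of_disjoint = classify(disjoint_order)
```

The pairwise check is quadratic in the number of intervals; sorting by lower bound and comparing neighbours would be the usual choice for long histories.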
Example 3. To illustrate the use of objects of type T_linearOrder, which are total linear temporal orders, consider a patient with multiple pathologies, for example as a result of diabetes. The patient has to attend several special clinics, each on a different day. Since the patient cannot attend more than one special clinic on any day, the temporal order of the patient’s special clinic visit history is linear and totally ordered. Suppose the patient visited the ophthalmology clinic on 10 January 1995, the cardiology clinic on 12 January 1995, and the neurology clinic on 3 February 1995. Figure 9 shows a pictorial representation of specialClinicOrder, which is an object of type T_linearOrder. As seen in the figure, specialClinicOrder is totally ordered as its time intervals do not overlap.

Fig. 9. An Example of a Linear Order (specialClinicOrder: P_branchingOrder = null; P_temporalPrimitives = {10 January 1995, 12 January 1995, 3 February 1995})

Example 4. Consider an observational pharmacoeconomic analysis of the changing trends, over a period of time, in the treatment of a chronic illness such as asthma [GÖS97]. The analysis would be performed using information gathered over a time period. At a fixed point during this period new guidelines for the treatment of asthma were released. At that point the population of patients known to have asthma is divided into those whose doctors continue the old established treatment, and those whose doctors, in accordance with new recommendations, change their treatment. Thus, the patients are divided into two groups, each group undergoing a different treatment for the same illness. The costs and benefits accrued over the trial period for each treatment are calculated. Since such a study consists of several alternative treatments to an illness, a branching timeline is the natural choice for modeling the timeline of the study. The point of branching is the time when the new guidelines for the treatment of the illness are implemented. Figure 10 shows the branching timeline for such a medical trial history.

Fig. 10. An Example of a Branching Order (the medical trial branching timeline, which includes the Regular Treatment, Treatment A, and Treatment B; the branching point is the time when new guidelines are released)

The same branching timeline could as easily handle the situation where different versions of a particular treatment, say Treatment A, are implemented based on certain parameters. In this case, the “Treatment A” branch would in turn branch at a certain point into different Treatment A versions. This situation is also depicted in Figure 10.

Temporal History. So far we have considered the various features of time: its structure, the way it is represented, and how it is ordered. The final building block of the design space of temporal models makes it possible to associate time with entities to model different temporal histories.

Components. One requirement of a temporal model is an ability to represent and manage real-world entities as they evolve over time and assume different states (values). The set of these values forms the temporal history of the entity. Two basic types of temporal histories are considered in databases which incorporate time. These are valid and transaction time histories [SA85]. Valid time denotes the time when an entity is effective (models reality), while transaction time represents the time when a transaction is posted to the database. Usually valid and transaction times are the same. Other temporal histories include event time [RS91,CK94] and decision time [EGS93] histories. Event (decision) time denotes the time the event occurred in the real world. Valid, transaction, and event times have been shown to be adequate in modeling temporal histories [CK94].

Design Space. Since valid, transaction, and event time histories have different semantics, they are orthogonal. Figure 11 shows the various types that could be used to model these different histories. A temporal history consists of objects and their associated timestamps.
Fig. 11. The Types and Properties for Temporal Histories (T_history with P_history, P_temporalOrder, P_insert, P_remove, P_getObjects; subtypes T_validHistory, T_transactionHistory, T_eventHistory)
Property P_history defined in T_history returns a collection of all timestamped objects that comprise the history. A history object also knows the temporal order of its temporal primitives. The property P_temporalOrder returns the temporal order (which is an object of type T_temporalOrder) associated with a history object. The temporal order basically orders the time intervals (or time instants) in the history. Another property defined on history objects, P_insert, timestamps and inserts an object in the history. Property P_remove drops a given object from the history at a specified temporal primitive. The P_getObjects property allows the user to get the objects in the history at (during) a given temporal primitive. The properties defined on T_history are refined in the T_validHistory, T_transactionHistory, and T_eventHistory types to model the semantics of the different kinds of histories. Moreover, each history type can define additional properties, if necessary, to model its particular semantics. The clinical example described in Section 3.1 illustrates the use of the properties defined on T_history.
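The generic history properties can be sketched as a small Python class. This is only an illustration: the names `History` and `ValidHistory` are hypothetical counterparts of T_history and T_validHistory, timestamps are restricted to instants (dates), the stored objects are placeholder strings, and the refinement in the subtype is left empty because the text does not spell out its semantics.

```python
from datetime import date


class History:
    """Collection of (timestamp, object) pairs."""

    def __init__(self):
        self._entries = []

    def history(self):
        """Counterpart of P_history: all timestamped objects."""
        return list(self._entries)

    def insert(self, obj, timestamp):
        """Counterpart of P_insert: timestamp an object and add it."""
        self._entries.append((timestamp, obj))

    def remove(self, obj, timestamp):
        """Counterpart of P_remove: drop an object at a given timestamp."""
        self._entries.remove((timestamp, obj))

    def get_objects(self, timestamp):
        """Counterpart of P_getObjects: objects recorded at a timestamp."""
        return [obj for ts, obj in self._entries if ts == timestamp]


class ValidHistory(History):
    """Placeholder subtype; a real model would refine the inherited properties."""


# Hypothetical states of some entity
h = ValidHistory()
h.insert("state A", date(1995, 1, 1))
h.insert("state B", date(1995, 6, 1))
at_january = h.get_objects(date(1995, 1, 1))
h.remove("state A", date(1995, 1, 1))
remaining = h.history()
```

Keeping the timestamp next to each object, rather than inside it, lets the same application object appear in valid, transaction and event histories without modification.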
2.2 Relationships between Design Dimensions
In the previous section we described the building blocks (design dimensions) for temporal models and identified the design space of each dimension. We now look at the interactions between the design dimensions. This will enable us to put the building blocks together and structure the design space for temporal models. A temporal history is composed of entities which are ordered in time. This temporal ordering is over a collection of temporal primitives in the history, which in turn are represented in a certain manner. Hence, the four dimensions can be linked via the “has-a” relationship shown in Figure 12.
Fig. 12. Design Space for Temporal Models (a temporal history: valid, transaction, or event; has a temporal order: sub-linear, linear, or branching; which has a temporal structure built from the 11 determinacy-domain-based temporal primitives; which has a temporal representation: Gregorian, Academic, Business, or Financial calendars)
Basically, a temporal model can be envisioned as having a notion of time, which has an underlying temporal structure, a means to represent the temporal structure, and different temporal orders to order the temporal primitives within a temporal structure. This notion of time, when combined with application objects, can be used to represent various temporal histories of the objects in the temporal model.

Figure 12 gives the design space for temporal models. A temporal model can support one or more of valid, transaction, event, and user-defined histories. Each history in turn has a certain temporal order. This temporal order has properties which are defined by the type of temporal history (linear or branching). A linear history may or may not allow overlapping of anchored temporal primitives that belong to it. If it does not allow overlapping, then such a history defines a total order on the anchored temporal primitives that belong to it. Otherwise, it defines a partial order on its anchored temporal primitives. Each order can then have a temporal structure which is composed of all or a subset of the 11 different temporal primitives that are shown in Figure 2. Finally, different calendars can be defined as a means to represent the temporal primitives.

The four dimensions are modeled in an object system by the respective types shown in Figure 13. The “has-a” relationship between the dimensions is modeled using properties as shown in the figure. An object of T_temporalHistory represents a temporal history. Its temporal order is obtained using the P_temporalOrder property. A temporal order is an object of type T_temporalOrder and has a certain temporal structure which is obtained using the P_temporalPrimitives property. The temporal structure is an object of type T_temporalStructure. The property P_calendar gives the instance of T_calendar which is used to represent the temporal structure.
Fig. 13. Relationships between Design Dimensions Types (under T_temporalFramework: T_temporalHistory, via P_temporalOrder, to T_temporalOrder, via P_temporalPrimitives, to T_temporalStructure, via P_calendar, to T_calendar)
The relationships shown in Figure 13 provide a temporal framework which encompasses the design space for temporal models. The detailed type system, shown in Figure 14, is based on the design dimensions identified in Section 2 and their various features which are given in Figures 3, 7, and 11. As described in Section 2.1, refactoring of types and multiple inheritance can be used to handle identical properties that are defined over different types in the inheritance hierarchy shown in Figure 14. The framework can now be tailored for the temporal needs of different applications and temporal models. This is illustrated in Section 3.
Fig. 14. The Inheritance Hierarchy for the Temporal Framework (T_temporalFramework as the root of the hierarchies of Figures 3, 7, and 11, together with T_calendar and its properties P_origin, P_calGranularities, P_functions)
3 Tailoring the Temporal Framework
In this section, we illustrate how the temporal framework that is defined in Section 2 can be tailored to accommodate applications and temporal models which have different temporal requirements. In the first two sub-sections, we give examples of two real-world applications that have different temporal needs. In the last sub-section, we give an example of a temporal object model and show how the model can be derived from the temporal framework.

Fig. 15. A Patient’s Blood Test History (temporal history: bloodTestHistory, whose P_history holds timeStampedHematology1, timeStampedMicrobiology, and timeStampedHematology2; temporal order: bloodTestOrder, with P_branchingOrder = null; temporal structure: 15 January 1995 and 20 February 1995; temporal representation: Gregorian, with P_origin = 1582 years and granularities year, month, day)

3.1 Clinical Data Management
In this section we give a real-world example from clinical data management that illustrates the four design dimensions and the relationships between them which were discussed in Section 2. During the course of a patient’s illness, different blood tests are administered. It is usually the case that multiple blood tests of the patient are carried out on the same day. Suppose the patient was suspected of having an infection of the blood, and therefore had two different blood tests on 15 January 1995. These were the diagnostic hematology and microbiology blood tests. As a result of a very raised white cell count the patient was given a course of antibiotics while
An Object-Oriented Framework for Temporal Data Models
21
the results of the tests were awaited. A repeat hematology test was ordered on 20 February 1995. Suppose each blood test is represented by an object of the type T bloodTest. The valid history of the patient’s blood tests can then be represented in the object database as an object of type T validHistory. Let us call this object bloodTestHistory. To record the hematology and microbiology blood tests, the objects microbiology, hematology1, and hematology2 with type T bloodTest are first created and then entered into the object database using the following property applications:
    bloodTestHistory.P_insert(microbiology, 15 January 1995)
    bloodTestHistory.P_insert(hematology1, 15 January 1995)
    bloodTestHistory.P_insert(hematology2, 20 February 1995)
If subsequently there is a need to determine which blood tests the patient took in January 1995, this would be accomplished by the following property application:
    bloodTestHistory.P_getObjects([1 January 1995, 31 January 1995])
This would return a collection of timestamped objects of T_bloodTest representing all the blood tests the patient took in January 1995. These objects would be the (timestamped) hematology1 and the (timestamped) microbiology.
Figure 15 shows the different temporal features that are needed to keep track of a patient’s blood tests over the course of a particular illness. The figure also illustrates the relationships between the different design dimensions of the temporal framework. The patient has a blood test history represented by the object bloodTestHistory. The P_history property, when applied to bloodTestHistory, results in a collection object whose members are the timestamped objects timeStampedMicrobiology, timeStampedHematology1, and timeStampedHematology2. The P_insert property updates the blood test history (bloodTestHistory) by inserting an object of type T_bloodTest at a given anchored temporal primitive. Similarly, the property P_remove updates bloodTestHistory by removing an object of type T_bloodTest at a given anchored temporal primitive. The P_getObjects property returns a collection of timestamped blood test objects when given an anchored temporal primitive. Applying the property P_temporalOrder to bloodTestHistory results in the object bloodTestOrder, which represents the temporal order on the different blood tests in bloodTestHistory. bloodTestOrder has a certain temporal structure, which is obtained by applying the P_temporalPrimitives property. Finally, the primitives in the temporal structure are represented using the Gregorian calendar, Gregorian, and the calendric granularities year, month, and day.
Let us now consider the various temporal features required to represent the different blood tests taken by a patient. Anchored, discrete, and determinate temporal primitives are required to model the dates on which the patient takes different blood tests. These dates are represented using the Gregorian calendar.
Iqbal A. Goralwalla, M. Tamer Özsu, and Duane Szafron
Since the blood tests take place on specific days, the temporal primitives during which the patient took blood tests form a total order. Lastly, a valid time history is used to keep track of the different times the blood tests were carried out. To support these temporal features, the temporal framework can be reconfigured with the appropriate types and properties. These are given in Figure 16.
[Figure 16: type inheritance hierarchy with a Supertype/Subtype legend. Types shown: T_temporalFramework, T_temporalStructure, T_anchPrim, T_instant, T_detDiscInstant, T_interval, T_detDiscInterval, T_calendar, T_temporalOrder, T_linearOrder, T_history, T_validHistory. Properties shown: P_before, P_after, P_leq, P_geq, P_elapsed, P_calendar, P_succ, P_pred, P_calElements, P_addDuration, P_subDuration, P_lb, P_ub, P_length, P_overlaps, P_during, P_starts, P_finishes, P_meets, P_origin, P_calGranularities, P_functions, P_union, P_intersection, P_difference, P_temporalPrimitives, P_history, P_temporalOrder, P_insert, P_remove, P_getObjects.]
Fig. 16. The Temporal Framework Inheritance Hierarchy for the Clinical Application
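The clinical example above can be illustrated with a minimal sketch. It is not the framework's implementation: the class `ValidHistory`, the helper `Timestamped`, and the method names are hypothetical stand-ins chosen to mirror T_validHistory and the properties P_insert, P_remove, P_history, and P_getObjects, and days of the Gregorian calendar are assumed to be adequately modeled by Python's `datetime.date`.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Timestamped:
    """A blood test paired with the day (anchored primitive) on which it was taken."""
    value: str
    at: date


@dataclass
class ValidHistory:
    """Hypothetical stand-in for T_validHistory; methods mirror the P_ properties."""
    _entries: list = field(default_factory=list)

    def insert(self, value: str, at: date) -> None:       # P_insert
        self._entries.append(Timestamped(value, at))

    def remove(self, value: str, at: date) -> None:       # P_remove
        self._entries.remove(Timestamped(value, at))

    def history(self) -> list:                            # P_history
        return list(self._entries)

    def get_objects(self, start: date, end: date) -> list:  # P_getObjects
        # Assumption: the query interval is closed, [start, end].
        return [e for e in self._entries if start <= e.at <= end]


blood_test_history = ValidHistory()
blood_test_history.insert("microbiology", date(1995, 1, 15))
blood_test_history.insert("hematology1", date(1995, 1, 15))
blood_test_history.insert("hematology2", date(1995, 2, 20))

january = blood_test_history.get_objects(date(1995, 1, 1), date(1995, 1, 31))
print([e.value for e in january])  # ['microbiology', 'hematology1']
```

The query over January 1995 returns the two tests taken on 15 January, matching the result described in the text.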
3.2 Time Series Management
The management of time series is important in many application areas such as finance, banking, and economic research. One of the main features of time series management is extensive calendar support [DDS94,LEW96]. Calendars map time points to their corresponding data and provide a platform for granularity conversions and temporal queries. Therefore, the temporal requirements of a time series management system include elaborate calendric functionality (which allows the definition of multiple calendars and granularities) and variable temporal structure (which supports both anchored and unanchored temporal primitives, and the different operations on them). Figure 17 shows how the temporal requirements of a time series management system can be modeled using the types and properties of the temporal framework. We note from the figure that only the temporal structure and temporal representation design dimensions are used to represent the temporal needs of a time series. This demonstrates that it is not necessary for an application requiring temporal features to have all four design dimensions in order to be accommodated in the framework. One or more of the design dimensions specified in Section 2.1 can be used as long as the design criteria shown in Figure 12 hold.
An Object-Oriented Framework for Temporal Data Models
[Figure 17: type inheritance hierarchy with a Supertype/Subtype legend. Types shown: T_temporalFramework, T_temporalStructure, T_anchPrim, T_instant, T_detDiscInstant, T_interval, T_detDiscInterval, T_unanchPrim, T_detDiscSpan, T_calendar. Properties shown: P_before, P_after, P_leq, P_geq, P_elapsed, P_calendar, P_calElements, P_addDuration, P_subDuration, P_lb, P_ub, P_length, P_overlaps, P_during, P_starts, P_finishes, P_meets, P_union, P_intersection, P_difference, P_add, P_subtract, P_coefficient, P_calGranularities, P_origin, P_functions.]
Fig. 17. The Temporal Framework Inheritance Hierarchy for Time Series Management
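To make the interplay of anchored and unanchored primitives concrete, the following sketch builds a small weekly time series. It is an illustration under stated assumptions only: `Span`, `add_duration`, and `weekly_series` are hypothetical names loosely modeled on T_detDiscSpan, P_add, and P_addDuration, and only the fixed-length granularities day and week are handled (months and years are omitted because their length varies).

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: only fixed-length granularities are supported in this sketch.
DAYS_PER_UNIT = {"day": 1, "week": 7}


@dataclass(frozen=True)
class Span:
    """Unanchored primitive: a coefficient together with a calendric granularity."""
    coefficient: int
    granularity: str

    def to_days(self) -> int:
        return self.coefficient * DAYS_PER_UNIT[self.granularity]

    def add(self, other: "Span") -> "Span":
        # Analogue of P_add: the sum is expressed at the finer granularity (day).
        return Span(self.to_days() + other.to_days(), "day")


def add_duration(instant: date, span: Span) -> date:
    """Analogue of P_addDuration: shift an anchored instant by an unanchored span."""
    return instant + timedelta(days=span.to_days())


def weekly_series(start: date, values: list) -> dict:
    """Map consecutive weekly time points, starting at `start`, to the given values."""
    step = Span(1, "week")
    series, t = {}, start
    for v in values:
        series[t] = v
        t = add_duration(t, step)
    return series


series = weekly_series(date(1995, 1, 2), [10.0, 10.5, 11.0])
print(sorted(series))
```

Here the calendar role of mapping time points to data is played by a plain dictionary keyed by date; a real time series system would add multiple calendars and conversions between them.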
3.3 TOODM - A Temporal Object-Oriented Data Model
In this section, we identify the temporal features of Rose & Segev’s temporal object-oriented data model (TOODM) [RS91] according to the design dimensions described in Section 2.1, and show how these can be accommodated in the temporal framework. We specifically concentrate on TOODM since it uses object types and inheritance to model temporality. The temporal features of the rest of the reported temporal object models [SC91,KS92,CITB92,PM92,BFG97] are summarized and compared in Section 4. We first give an overview of the temporal features of TOODM and then show how these features can be derived using the types and properties of our temporal framework. TOODM offers functionality beyond temporality, but presenting it is outside the scope of this work.
Overview of Temporal Features. TOODM was designed by extending an object-oriented entity-relationship data model to incorporate temporal structures and constraints. The functionality of TOODM includes: specification and enforcement of temporal constraints; support for past, present, and future time; support for different type and instance histories; and allowance for retro/proactive updates. The type hierarchy of the system-defined types that TOODM uses to model temporality is given in Figure 18. The boxes with a dashed border represent types that have been introduced to model time, while the rest of the boxes represent basic types.
Design dimension   Feature       TOODM
Structure          Primitives    Anchored, Unanchored
Structure          Domain        Continuous
Structure          Determinacy   Determinate
Representation                   Gregorian Calendar
Order                            Total Linear
History                          Valid, Transaction, Event
Table 1. Temporal Design Dimension Features of TOODM
[Figure 18: type tree of the system-defined types in TOODM. Types shown: Object, Class, Collections, Ptypes, V-Class, Sequence[T], TS[T], Time, Relative, Absolute, TI, TP.]
Fig. 18. System Defined Temporal Types in TOODM
The Object type is the root of the type tree. The type V-Class is used to represent user-defined versionable classes. More specifically, if the instance variables, messages/methods, or constraints of a type are allowed to change (maintain histories), the type must be defined as a subtype of V-Class. The Ptypes type models primitive types and is used to represent objects which do not have any instance variables. Ptypes usually serve as domains for the instance variables of other objects. The Time primitive type is used to represent temporal primitives. The TP type represents time points, while the TI type represents time intervals. Time points can have different calendar granularities, namely Year, Month, Day, Week, Hour, Minute, and Second. The TS[T] type represents a time sequence, which is a collection of objects ordered on time. TS[T] is a parametric type with the type T representing a user or system defined type upon which a time sequence is being defined. For every time-varying attribute in a (versionable) class, a corresponding subclass (of TS[T]) is defined to represent the time sequence (history) of that attribute. For example, if the salary history of an employee is to be maintained, a subclass (e.g., TS[Salary]) of TS[T] has to be defined so that the salary instance variable in the employee class (which is defined as a subclass of V-Class) can refer to it to obtain the salary history of a particular employee. The history of an object of type TS[T] is represented as a pair <T, TL>, where T is the data type and TL defines the different timelines and their granularities that are associated with T. Three timelines are allowed in TOODM: valid time, record (transaction) time, and event time (the time an event occurred). Each timeline associated with an object is comprised of time points or time intervals and has an underlying granularity.
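The idea of a parametric time sequence defined as a pair of a data type and a set of timelines can be sketched as follows. This is only a loose analogue of TS[T], not TOODM's definition: the class `TimeSequence`, its methods, and the encoding of time points as tuples such as (year, month) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class TimeSequence(Generic[T]):
    """Loose analogue of TS[T]: values of type T recorded on named timelines (TL)."""
    timelines: dict  # timeline name -> granularity, e.g. {"valid": "Month"}
    _entries: list = field(default_factory=list)

    def record(self, value: T, **times) -> None:
        # Reject timestamps for timelines that were not declared in TL.
        unknown = set(times) - set(self.timelines)
        if unknown:
            raise ValueError(f"unknown timelines: {sorted(unknown)}")
        self._entries.append((value, times))

    def ordered_by(self, timeline: str) -> list:
        """Return the recorded values ordered on the chosen timeline."""
        rows = [(t[timeline], v) for v, t in self._entries if timeline in t]
        return [v for _, v in sorted(rows, key=lambda r: r[0])]


# A salary history with a valid-time and a record-time timeline.
salary = TimeSequence[int]({"valid": "Month", "record": "Day"})
salary.record(3000, valid=(1995, 1), record=(1995, 1, 10))
# Retroactive update: recorded later, but valid from an earlier month.
salary.record(3200, valid=(1994, 7), record=(1995, 2, 1))

print(salary.ordered_by("valid"))   # [3200, 3000]
print(salary.ordered_by("record"))  # [3000, 3200]
```

The two orderings differ because the second entry is a retroactive update, which is the kind of situation that motivates keeping separate timelines.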
Representing the Temporal Features of TOODM in the Temporal Framework. TOODM supports both anchored and unanchored primitives. These are modeled by the Absolute and Relative types shown in Figure 18. The anchored temporal primitives supported are time instants and time intervals. A continuous time domain is used to perceive the temporal primitives. Finally, the temporal primitives are determinate. Time points and time intervals are represented by using the Gregorian calendar with granularities Year, Month, Day, Week, Hour, Minute, and Second. Translations between granularities in operations are provided, with the default being to convert to the coarser granularity. A (presumably total) linear order of time is used to order the primitives in a temporal sequence. TOODM combines time with facts to model different temporal histories, namely, valid, transaction, and event time histories. Table 1 summarizes the temporal features (design space) of TOODM according to the design dimensions for temporal models that were described in Section 2.1.
Figure 19 shows the type system instance of our temporal framework that corresponds to the TOODM time types shown in Figure 18 and described in Table 1. The Time primitive type is represented using the T_temporalStructure type. The TP and TI types are represented using the T_instant and T_interval types, respectively. Similarly, the Relative type is represented using the T_unanchPrim type. Since TOODM supports continuous and determinate temporal primitives, the (concrete) types T_detContInstant, T_detContInterval, and T_detContSpan are used to model continuous and determinate instants, intervals, and spans, respectively. The Gregorian calendar and its different calendric granularities are modeled using the T_calendar type. Time points and time intervals are ordered using the T_linearOrder type. Time sequences represented by the TS[T] type are modeled by the history types in the temporal framework. More specifically, valid time (vt), record time (rt), and event time (et) are modeled using the T_validHistory, T_transactionHistory, and T_eventHistory types.
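The default of converting to the coarser granularity before an operation can be sketched as below. This is an assumption-laden illustration rather than TOODM's actual conversion rules: time points are encoded as tuples whose length determines their granularity, the helper names are hypothetical, and Week is left out because it does not nest cleanly inside the Year/Month/Day chain.

```python
# Assumption: a nested chain of granularities; Week is omitted in this sketch.
GRANULARITIES = ["Year", "Month", "Day", "Hour", "Minute", "Second"]


def granularity(point: tuple) -> str:
    """A point such as (1995, 2, 20) has as many components as its granularity rank."""
    return GRANULARITIES[len(point) - 1]


def coarsen(point: tuple, target: str) -> tuple:
    """Convert a time point to a coarser (or equal) granularity by truncation."""
    n = GRANULARITIES.index(target) + 1
    if n > len(point):
        raise ValueError(f"cannot refine {granularity(point)} to {target}")
    return point[:n]


def before(a: tuple, b: tuple) -> bool:
    """Compare two points after converting both to the coarser of their granularities."""
    n = min(len(a), len(b))
    return a[:n] < b[:n]


print(granularity((1995, 2, 20)))         # Day
print(coarsen((1995, 2, 20), "Month"))    # (1995, 2)
print(before((1995, 1, 15), (1995, 2)))   # True
print(before((1995, 2, 20), (1995, 2)))   # False
```

In the last call the day 20 February 1995 and the month February 1995 coincide at the coarser granularity, so neither is strictly before the other.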
[Figure 19: type inheritance hierarchy with a Supertype/Subtype legend. Types shown: T_temporalFramework, T_temporalStructure, T_anchPrim, T_instant, T_detContInstant, T_interval, T_detContInterval, T_unanchPrim, T_detContSpan, T_calendar, T_temporalOrder, T_linearOrder, T_history, T_validHistory, T_transactionHistory, T_eventHistory. Properties shown: P_before, P_after, P_leq, P_geq, P_elapsed, P_calendar, P_calElements, P_addDuration, P_subDuration, P_lb, P_ub, P_length, P_overlaps, P_during, P_starts, P_finishes, P_meets, P_union, P_intersection, P_difference, P_add, P_subtract, P_coefficient, P_calGranularities, P_origin, P_functions, P_temporalPrimitives, P_history, P_temporalOrder, P_insert, P_remove, P_getObjects.]
Fig. 19. The Temporal Framework Inheritance Hierarchy for TOODM
TOODM models valid, transaction, and event histories all together in one structure, as shown by the TS[Salary] type in the previous section. Our temporal framework, however, provides different types to model valid, transaction, and event histories to allow their respective semantics to be modeled. Moreover, it uses properties to access the various components of histories. For example, to represent the valid history of an employee’s salary, an object of type T_validHistory is first created. The P_insert property then inserts objects of type T_integer (representing salary values) and objects of type T_interval (representing time intervals) into the salary valid history object. The transaction and event time histories of the salary are similarly represented, except that in these histories the P_insert property inserts timestamps which are time instants (i.e., objects of type T_instant).
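The distinction drawn above, interval timestamps in a valid history versus instant timestamps in a transaction history, can be illustrated with a small sketch. The classes `Interval`, `ValidSalaryHistory`, and `TransactionSalaryHistory` and their lookup methods are hypothetical and only approximate T_interval, T_validHistory, and T_transactionHistory; closed intervals over calendar days are assumed.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Closed interval [lb, ub] of days (an assumption of this sketch)."""
    lb: date
    ub: date

    def contains(self, t: date) -> bool:
        return self.lb <= t <= self.ub


@dataclass
class ValidSalaryHistory:
    """Valid history: each salary is timestamped with an interval."""
    _entries: list = field(default_factory=list)

    def insert(self, salary: int, interval: Interval) -> None:
        self._entries.append((salary, interval))

    def salary_at(self, t: date) -> Optional[int]:
        for salary, interval in self._entries:
            if interval.contains(t):
                return salary
        return None


@dataclass
class TransactionSalaryHistory:
    """Transaction history: each salary is timestamped with an instant."""
    _entries: list = field(default_factory=list)

    def insert(self, salary: int, instant: date) -> None:
        self._entries.append((instant, salary))

    def salary_as_of(self, t: date) -> Optional[int]:
        # The most recently recorded value at or before t.
        known = [(i, s) for i, s in self._entries if i <= t]
        return max(known, key=lambda e: e[0])[1] if known else None


valid = ValidSalaryHistory()
valid.insert(3000, Interval(date(1995, 1, 1), date(1995, 6, 30)))
valid.insert(3200, Interval(date(1995, 7, 1), date(1995, 12, 31)))

recorded = TransactionSalaryHistory()
recorded.insert(3000, date(1995, 1, 10))
recorded.insert(3200, date(1995, 6, 20))

print(valid.salary_at(date(1995, 8, 15)))      # 3200
print(recorded.salary_as_of(date(1995, 6, 1)))  # 3000
```

Keeping the two histories in separate types lets each carry its own lookup semantics, which is the design point made in the paragraph above.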
4 Comparison of Temporal Object Models
In this section we use the temporal framework to compare and analyze the temporal object models [RS91,SC91,KS92,CITB92,PM92,BFG97] that have appeared in recent literature. The temporal features of these models are summarized in Tables 1 and 2. Our criteria for comparing the different temporal object models are the design dimensions identified in Section 2.1. The models may have other (salient) temporal differences, but our concern in this work is comparing their temporal features in terms of the framework defined in Section 2. Similar to the methodology used in Section 2, object-oriented techniques are used to classify temporal object models according to each design dimension. This gives us an indication of how temporal object models range in their provision for different temporal features of a design dimension, from the most powerful model (i.e., the one having the largest number of temporal features) to the least powerful model (i.e., the one having the smallest number of temporal features).
Model       Primitives   Domain     Determinacy   Representation       Order     History
OSAM*/T     Anchored     Discrete   Determinate   N/A                  Linear    Valid
TMAD        Anchored     Discrete   Determinate   Gregorian Calendar   Linear    Valid, Transaction
TEDM        Anchored     Discrete   Determinate   N/A                  Linear    Valid, Transaction, Event
T-3DIS      Anchored     Discrete   Determinate   Gregorian Calendar   Partial   Valid
T-Chimera   Anchored     Discrete   Determinate   N/A                  Linear    Valid
(Primitives, Domain, and Determinacy together form the Structure design dimension.)
Table 2. Design Dimension Features of different Temporal Object Models
Temporal Structure. Tables 1 and 2 show that most of the models support a very simple temporal structure, consisting of anchored primitives which are discrete and determinate. In fact, all models in Table 2 support the same temporal structure, which consists of discrete and determinate anchored temporal primitives. These primitives can be accommodated in the temporal framework by the T_anchPrim, T_instant, T_detDiscInstant, T_interval, and T_detDiscInterval types, and their respective properties. The temporal structure of TOODM is slightly enhanced with the presence of unanchored primitives. TOODM is also the only model that supports the continuous temporal domain. Figure 20 shows how the type inheritance hierarchy is used to classify temporal object models according to their temporal structures. The temporal structures of OSAM*/T, TMAD, TEDM, T-3DIS, and T-Chimera can be modeled by a single type, the one representing temporal primitives that are anchored, discrete, and determinate. This means that any of these models can be used to provide temporal support for applications that need a temporal structure comprised of anchored temporal primitives which are discrete and determinate. Similarly, the temporal structure of TOODM can be modeled by a type which represents anchored and unanchored temporal primitives that are continuous and determinate. This implies that TOODM is the only model that can support applications requiring a continuous time domain, or unanchored temporal primitives.
[Figure 20: type inheritance hierarchy with a Supertype/Subtype legend. Boxes shown: "Anchored & Determinate Temporal Primitives"; "Anchored, Determinate, & Discrete Temporal Primitives" (OSAM*/T, TMAD, TEDM, T-3DIS, T-Chimera); "Anchored & Unanchored, Determinate & Continuous Temporal Primitives" (TOODM).]
Fig. 20. Classification of Temporal Object Models according to their Temporal Structures
Temporal Representation. Temporal primitives in the OSAM*/T [SC91], TEDM [CITB92], and T-Chimera [BFG97] models are simply represented using natural numbers. The models do not provide any additional representational scheme which supports calendars and different granularities. The granularity of the temporal primitives is dependent on the application using the model. When a calendric representational scheme is provided for the temporal primitives, it is comprised of a single underlying calendar, which is usually Gregorian. This is the case in the TOODM [RS91], TMAD [KS92], and T-3DIS [PM92] models.
Temporal Order. All models shown in Tables 1 and 2, except T-3DIS, support a linear temporal order. The T-3DIS model supports a sub-linear temporal order. These temporal orders are accommodated in the temporal framework using the T_subLinearOrder and T_linearOrder types. Figure 21 shows how the models can be classified in an inheritance type hierarchy according to their temporal orders. The type modeling a partial linear order of time sits at the root of the hierarchy and represents the T-3DIS model. Since a total linear order is also a partial order, the models supporting total linear orders can be represented by a direct subtype of the root type.
[Figure 21: type inheritance hierarchy. Supertype: "Partial Linear Orders" (T-3DIS). Subtype: "Linear Orders" (TOODM, OSAM*/T, TMAD, TEDM, T-Chimera).]
Fig. 21. Classification of Temporal Object Models according to their Temporal Orders
Temporal History. Tables 1 and 2 show how the temporal object models range in their support for the different types of temporal histories. Figure 22 shows how the models can be classified according to the temporal histories they support using a type inheritance hierarchy. The root type in Figure 22 represents the models which only support valid time histories. These are the OSAM*/T, T-3DIS, and T-Chimera models. A direct subtype of the root type inherits the valid time history and provides transaction time history as well. This type represents the TMAD model. Similarly, the rest of the subtypes inherit different histories from their supertypes and add new histories to their type as shown in Figure 22. From Figure 22, we see that applications requiring only valid time histories can be supported by all models; applications requiring valid and transaction time can be supported by the TMAD, TEDM, and TOODM models; and applications requiring valid, transaction, and event time can be supported by the TEDM and TOODM models.
[Figure 22: type inheritance hierarchy. Supertype: "Valid Time History" (OSAM*/T, T-3DIS, T-Chimera). Its subtype: "Valid & Transaction Time History" (TMAD). Its subtype in turn: "Valid & Transaction & Event Time History" (TOODM, TEDM).]
Fig. 22. Classification of Temporal Object Models according to their Temporal Histories
Overall Classification. Having classified the temporal object models according to the individual design dimensions, we now treat the models as points in the design space and use the object-oriented inheritance hierarchy to compare the models on all the temporal features of the design dimensions that they support. Figure 23 gives an inheritance hierarchy in which types are used to represent the different models, and the temporal features supported by the models are used as the criterion for inheritance. The abstract type at the root of the hierarchy represents the least powerful temporal object model, which supports a temporal structure comprised of anchored primitives which are discrete and determinate, no temporal representational scheme, a partial linear order, and a valid time history. This type has two immediate subtypes. The first subtype represents the OSAM*/T and the T-Chimera models. It inherits all the features of the root type and refines its partial linear order to a total linear order. Similarly, the second subtype represents the T-3DIS model, inherits all the features of the root type, and adds a representational scheme which supports the Gregorian calendar. The type representing OSAM*/T and T-Chimera also has two subtypes. The first subtype represents the TEDM model and has all the features of its supertype with the additional features of transaction and event time histories. The second subtype (which is also a subtype of the type representing T-3DIS, from which it inherits the representational scheme) represents the TMAD model. This type has the additional feature of the transaction time history. A direct subtype of the types representing TEDM and TMAD represents the TOODM model. The type representing TOODM inherits the representational scheme from the type representing TMAD and the event time history from the type representing TEDM. It also adds unanchored primitives and the continuous time domain to its temporal structure.
[Figure 23: inheritance hierarchy ordered from fewer features (types) at the top to more features (types) at the bottom. Boxes shown:
Root type: Temporal Structure: Anchored, Discrete, & Determinate; Temporal Representation: None; Temporal Order: Partial Linear; Temporal History: Valid
OSAM*/T, T-Chimera: Temporal Structure: Anchored, Discrete, & Determinate; Temporal Representation: None; Temporal Order: Total Linear; Temporal History: Valid
T-3DIS: Temporal Structure: Anchored, Discrete, & Determinate; Temporal Representation: Gregorian; Temporal Order: Partial Linear; Temporal History: Valid
TEDM: Temporal Structure: Anchored, Discrete, & Determinate; Temporal Representation: None; Temporal Order: Total Linear; Temporal History: Valid, Transaction, Event
TMAD: Temporal Structure: Anchored, Discrete, & Determinate; Temporal Representation: Gregorian; Temporal Order: Total Linear; Temporal History: Valid, Transaction
TOODM: Temporal Structure: Anchored, Unanchored, Continuous & Determinate; Temporal Representation: Gregorian; Temporal Order: Total Linear; Temporal History: Valid, Transaction, Event]
Fig. 23. Overall Classification of Temporal Object Models
From Figure 23 it can reasonably be concluded that OSAM*/T and T-Chimera are the two least powerful temporal object models since they provide the smallest number of temporal features. The TOODM model is the most powerful since it provides the largest number of temporal features. The comparison of different temporal object models made in this section shows that there is significant similarity in the temporal features supported by the models. In fact, the temporal features supported by OSAM*/T and T-Chimera are identical. The temporal features of TEDM are identical to those of OSAM*/T and T-Chimera in the temporal structure, temporal representation, and temporal order design dimensions.
These commonalities substantiate the need for a temporal framework which combines the diverse features of time under a single infrastructure that allows design reuse. We also note that temporal object models have not really taken advantage of the richness of their underlying object model in supporting alternate features of a design dimension. They have assumed a fixed set of underlying notions of time. From a range of different temporal features, a single temporal feature is supported in most of the design dimensions. As such, not much advantage has been gained over the temporal relational models in supporting applications that have different temporal needs. For example, engineering applications like CAD would benefit from a branching time model, while time series and financial applications require multiple calendars and granularities. The temporal framework proposed in this work aims to exploit object-oriented technology in supporting a wide range of applications with diverse temporal needs.
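The classification by temporal history used in this section can be expressed as set inclusion, as in the following sketch. The history sets are taken from Tables 1 and 2; the function names and the reading of "subtype" as "supports at least the same histories" are assumptions made for illustration, and only the history dimension is covered.

```python
# Temporal histories supported by each model (from Tables 1 and 2).
HISTORIES = {
    "OSAM*/T": {"valid"},
    "TMAD": {"valid", "transaction"},
    "TEDM": {"valid", "transaction", "event"},
    "T-3DIS": {"valid"},
    "T-Chimera": {"valid"},
    "TOODM": {"valid", "transaction", "event"},
}


def models_supporting(required: set) -> list:
    """Models whose histories include every history an application requires."""
    return sorted(m for m, h in HISTORIES.items() if required <= h)


def refines(sub: str, sup: str) -> bool:
    """True if `sub` supports at least the histories of `sup` (subtype-like relation)."""
    return HISTORIES[sub] >= HISTORIES[sup]


print(models_supporting({"valid", "transaction"}))
# ['TEDM', 'TMAD', 'TOODM']
print(models_supporting({"valid", "transaction", "event"}))
# ['TEDM', 'TOODM']
print(refines("TMAD", "OSAM*/T"), refines("TMAD", "TEDM"))
# True False
```

The results agree with the reading of Figure 22 given in the text: every model supports valid time, three support valid and transaction time, and two support all three histories.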
5 Discussion and Conclusions
In this work the different design dimensions that span the design space of temporal object models are identified. Object-oriented techniques are used to design an infrastructure which supports the diverse notions of time under a single framework. We demonstrate the expressiveness of the framework by showing how it can be used to accommodate the temporal needs of different real-world applications, and also reflect different temporal object models that have been reported in the literature. A similar objective is pursued by Wuu & Dayal [WD92], who provide an abstract time type to model the most general semantics of time, which can then be subtyped (by the user or database designer) to model the various notions of time required by specific applications. The temporal framework presented here subsumes the work of Wuu & Dayal in that it provides the user or database designer with explicit types and properties to model the diverse features of time. Their approach requires significant support from the user, including specification of the temporal schema, which is a complex and non-trivial task. It is therefore imperative for temporal object models to have a temporal infrastructure from which users can choose the temporal features they need. Using the object-oriented type system to structure the design space of temporal object models and identify the dependencies within and among the design dimensions helps us simplify the presentation of the otherwise complex domain of time. The framework is extensible in that additional temporal features can be added as long as the relationships between the design dimensions are maintained. The focus in this work is on the unified provision of temporal features which can be used by temporal object models according to their temporal needs. Once these are in place, the model can then define other object-oriented features to support its application domain. The temporal framework also provides a means of comparing temporal object models according to the design dimensions identified in Section 2.1. This helps identify the strengths and weaknesses of the different models.
The diverse features of time are also identified in [Sno95]. The focus there, however, is on comparing various temporal object models and query languages based on their ability to support valid and transaction time histories. In this work we show how the generic aspects of temporal models can be captured and described using a single framework. In [PLL96] a temporal reference framework for multimedia synchronization is proposed and used to compare existing temporal specification schemes and their relationships to multimedia synchronization. The focus there, however, is on different forms of temporal specification, and not on different notions of time. The model of time used concentrates only on temporal primitives and their representation schemes.
The temporal framework has been implemented in C++. A toolkit has been developed which allows users/temporal model designers to interact with the framework at a high level and generate specific framework instances for their own applications. The next step is to build query semantics on top of the framework. This will involve addressing issues such as: how the choices of different design dimensions affect the query semantics; what kind of query constructs are needed; what properties should be provided; and how these properties are used, to name a few.
An Object-Oriented Framework for Temporal Data Models
33
References All84. ATGL96.
BFG97.
BKP86.
BP85.
CG93.
Cho94.
CITB92.
CJR87.
CK94. CMR91.
CPP95.
CPP96.
CR88.
CS93.
J. F. Allen. Towards a General Theory of Action and Time. Artificial Intelligence, 23(123):123–154, July 1984. A-R. Adl-Tabatabai, T. Gross, and G-Y. Lueh. Code Reuse in an Optimizing Compiler. In Proc. of the Int’l Conf on Object-Oriented Programming: Systems, Languages, and Applications - OOPSLA ’96, pages 51–68, October 1996. E. Bertino, E. Ferrari, and G. Guerrini. T Chimera - A Temporal ObjectOriented Data Model. Theory and Practice of Object Systems, 3(2):103– 125, 1997. H. Barringer, R. Kuiper, and A. Pnueli. A Really Abstract Concurrent Model and its Temporal Logic. In Proc. of the 13th ACM Symposium on Principles of Programming Languages, pages 173–183, 1986. F. Barbic and B. Pernici. Time Modeling in Office Information Systems. In Proc. ACM SIGMOD Int’l. Conf. on Management of Data, pages 51–62, May 1985. T.S. Cheng and S.K. Gadia. An Object-Oriented Model for Temporal Databases. In Proceedings of the International Workshop on an Infrastructure for Temporal Databases, pages N1–N19, June 1993. J. Chomicki. Temporal Query Languages: A Survey. In D. Gabbay and H. Ohlbach, editors, Proceedings of the International Conference on Temporal Logic, pages 506–534. Lecture Notes in Computer Science, Vol. 827, Springer Verlag, July 1994. W.W. Chu, I.T. Ieong, R.K. Taira, and C.M. Breant. A Temporal Evolutionary Object-Oriented Data Model and Its Query Language for Medical Image Management. In Proc. 18th Int’l Conf. on Very Large Data Bases, pages 53–64, August 1992. R.H. Campbell, G.M. Johnston, and V.F. Russo. Choices (Class Hierarchical Open Interface for Custom Embedded Systems). Operating Systems Review, 21(3):9–17, 1987. S. Chakravarthy and S-K. Kim. Resolution of Time Concepts in Temporal Databases. Information Sciences, 80(1-2):91–125, September 1994. E. Corsetti, A. Montanari, and E. Ratto. Dealing with Different Time Granularities in Formal Specifications of Real-Time Systems. The Journal of Real-Time Systems, 3(2):191–215, 1991. C. Combi, F. 
Pinciroli, and G. Pozzi. Managing Different Time Granularities of Clinical Information by an Interval-Based Temporal Data Model. Methods of Information in Medicine, 34(5):458–474, 1995. C. Combi, F. Pinciroli, and G. Pozzi. Managing Time Granularity of Narrative Clinical Information: The Temporal Data Model TIME-NESIS. In L. Chittaro, S. Goodwin, H. Hamilton, and A. Montanari, editors, Third International Workshop on Temporal Representation and Reasoning (TIME’96), pages 88–93. IEEE Computer Society Press, 1996. J. Clifford and A. Rao. A Simple, General Structure for Temporal Domains. In C. Rolland, F. Bodart, and M. Leonard, editors, Temporal Aspects in Information Systems, pages 17–30. North-Holland, 1988. R. Chandra and A. Segev. Managing Temporal Financial Data in an Extensible Database. In Proc. 19th Int’l Conf. on Very Large Data Bases, pages 302–313, August 1993.
34 CSS94.
¨ Iqbal A. Goralwalla, M. Tamer Ozsu, and Duane Szafron
R. Chandra, A. Segev, and M. Stonebraker. Implementing Calendars and Temporal Rules in Next-Generation Databases. In Proc. 10th Int’l. Conf. on Data Engineering, pages 264–273, February 1994. DDS94. W. Dreyer, A.K. Dittrich, and D. Schmidt. An Object-Oriented Data Model for a Time Series Management System. In Proc. 7th International Working Conference on Scientific and Statistical Database Management, pages 186– 195, September 1994. DS93. C.E. Dyreson and R.T. Snodgrass. Valid-time Indeterminacy. In Proc. 9th Int’l. Conf. on Data Engineering, pages 335–343, April 1993. EGS93. O. Etzion, A. Gal, and A. Segev. Temporal Active Databases. In Proceedings of the International Workshop on an Infrastructure for Temporal Databases, June 1993. Flo91. R. Flowerdew. Geographical Information Systems. John Wiley and Sons, 1991. Volume 1. ¨ ¨ GLOS97. I.A. Goralwalla, Yuri Leontiev, M.T. Ozsu, and Duane Szafron. Modeling Temporal Primitives: Back to Basics. In Proc. Sixth Int’l. Conf. on Information and Knowledge Management, pages 24–31, November 1997. ¨ ¨ GOS97. I.A. Goralwalla, M.T. Ozsu, and D. Szafron. Modeling Medical Trials in Pharmacoeconomics using a Temporal Object Model. Computers in Biology and Medicine - Special Issue on Time-Oriented Systems in Medicine, 27(5):369 – 387, 1997. HKOS96. W.H. Harrison, H. Kilov, H.L. Ossher, and I. Simmonds. From Dynamic Supertypes to Subjects: a Natural way to Specify and Develop Systems. IBM Systems Journal, 35(2):244–256, 1996. JF88. R.E. Johnson and B. Foote. Designing Reusable Classes. Journal of ObjectOriented Programming, 1(2):22–35, 1988. KGBW90. W. Kim, J.F. Garza, N. Ballou, and D. Wolek. Architecture of the ORION Next-Generation Database System. IEEE Transactions on Knowledge and Data Engineering, 2(1):109–124, March 1990. KKR90. P.C. Kanellakis, G.M. Kuper, and P.Z. Revesz. Constraint Query Languages. In Proc. of the 9th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 299–313, April 1990. Kli93. N. 
Kline. An Update of the Temporal Database Bibliography. ACM SIGMOD Record, 22(4):66–80, December 1993. KS92. W. Kafer and H. Schoning. Realizing a Temporal Complex-Object Data Model. In Proc. ACM SIGMOD Int’l. Conf. on Management of Data, pages 266–275, June 1992. LEW96. J.Y. Lee, R. Elmasri, and J. Won. Specification of Calendars and Time Series for Temporal Databases. In Proc. 15th International Conference on Conceptual Modeling (ER’96), pages 341–356, October 1996. Proceedings published as Lecture Notes in Computer Science, Volume 1157, Bernhard Thalheim (editor), Springer-Verlag, 1996. MPB92. R. Maiocchi, B. Pernici, and F. Barbic. Automatic Deduction of Temporal Information. ACM Transactions on Database Systems, 17(4):647–688, 1992. PLL96. M.J. Perez-Luque and T.D.C. Little. A Temporal Reference Framework for Multimedia Synchronization. IEEE Journal on Selected Areas in Communications, 14(1):36–51, January 1996. PM92. N. Pissinou and K. Makki. A Framework for Temporal Object Databases. In Proc. First Int’l. Conf. on Information and Knowledge Management, pages 86–97, November 1992.
Rev90. P.Z. Revesz. A Closed Form for Datalog Queries with Integer Order. In International Conference on Database Theory, pages 187–201, 1990.
RS91. E. Rose and A. Segev. TOODM - A Temporal Object-Oriented Data Model with Temporal Constraints. In Proc. 10th Int’l Conf. on the Entity Relationship Approach, pages 205–229, October 1991.
SA85. R. Snodgrass and I. Ahn. A Taxonomy of Time in Databases. In Proc. ACM SIGMOD Int’l. Conf. on Management of Data, pages 236–246, May 1985.
SC91. S.Y.W. Su and H.M. Chen. A Temporal Knowledge Representation Model OSAM*/T and its Query Language OQL/T. In Proc. 17th Int’l Conf. on Very Large Data Bases, pages 431–442, 1991.
Sci94. E. Sciore. Versioning and Configuration Management in an Object-Oriented Data Model. The VLDB Journal, 3:77–106, 1994.
Sno86. R. Snodgrass. Research Concerning Time in Databases: Project Summaries. ACM SIGMOD Record, 15(4), December 1986.
Sno87. R.T. Snodgrass. The Temporal Query Language TQuel. ACM Transactions on Database Systems, 12(2):247–298, June 1987.
Sno92. R.T. Snodgrass. Temporal Databases. In Theories and Methods of Spatio-Temporal Reasoning in Geographic Space, pages 22–64. Springer-Verlag, LNCS 639, 1992.
Sno95. R. Snodgrass. Temporal Object-Oriented Databases: A Critical Comparison. In W. Kim, editor, Modern Database Systems: The Object Model, Interoperability and Beyond, pages 386–408. Addison-Wesley/ACM Press, 1995.
Soo91. M.D. Soo. Bibliography on Temporal Databases. ACM SIGMOD Record, 20(1):14–23, 1991.
SRH90. M. Stonebraker, L.A. Rowe, and M. Hirohama. The Implementation of POSTGRES. IEEE Transactions on Knowledge and Data Engineering, 2(1):125–142, March 1990.
SS88. R. Stam and R. Snodgrass. A Bibliography on Temporal Databases. IEEE Database Engineering, 7(4):231–239, December 1988.
TK96. V.J. Tsotras and A. Kumar. Temporal Database Bibliography Update. ACM SIGMOD Record, 25(1):41–51, March 1996.
WD92. G. Wuu and U. Dayal. A Uniform Model for Temporal Object-Oriented Databases. In Proc. 8th Int’l. Conf. on Data Engineering, pages 584–593, Tempe, USA, February 1992.
WLH90. K. Wilkinson, P. Lyngbaek, and W. Hasan. The Iris Architecture and Implementation. IEEE Transactions on Knowledge and Data Engineering, 2(1):63–75, March 1990.
An Object-Oriented Framework for Temporal Data Models

Iqbal A. Goralwalla, M. Tamer Özsu, and Duane Szafron

Laboratory for Database Systems Research
Department of Computing Science
University of Alberta
Edmonton, Alberta, Canada T6G 2H1
{iqbal,ozsu,duane}@cs.ualberta.ca
Abstract. Most of the database research on modeling time has concentrated on the definition of a particular temporal model and its incorporation into a (relational or object) database management system. This has resulted in quite a large number of different temporal models, each providing a specific set of temporal features. Therefore, the first step of this work is a design space for temporal models which accommodates multiple notions of time, thereby classifying design alternatives for temporal models. The design space is then represented by exploiting object-oriented features to model the different aspects of time. An object-oriented approach allows us to capture the complex semantics of time by representing it as a basic entity. Furthermore, the typing and inheritance mechanisms of object-oriented systems allow the various notions of time to be reflected in a single framework. The framework can be used to accommodate the temporal needs of different applications, and derive existing temporal models by making a series of design decisions through subclass specialization. It can also be used to derive a series of new more general temporal models that meet the needs of a growing number of emerging applications. Furthermore, it can be used to compare and analyze different temporal object models with respect to the design dimensions.
1 Introduction
The ability to model the temporal dimension of the real world is essential for many applications such as econometrics, banking, inventory control, medical records, real-time systems, multimedia, airline reservations, versions in CAD/CAM applications, statistical and scientific applications, etc. Database management systems (DBMSs) that support these applications have to be able to satisfy temporal requirements. To accommodate the temporal needs of different applications, there has been extensive research activity on temporal data models in the last decade [Sno86,SS88,Soo91,Kli93,TK96]. Most of this research has concentrated on the definition of a particular temporal model and its incorporation into a (relational or object-oriented) database management system (DBMS).

O. Etzion, S. Jajodia, and S. Sripada (Eds.): Temporal Databases - Research and Practice, LNCS 1399, pp. 1-35, 1998. © Springer-Verlag Berlin Heidelberg 1998

The early research on temporal data models concentrated on extending the relational data model to handle time in an appropriate manner. The notion of time, with its multiple facets, is difficult (if not impossible) to represent in one single relational model since the relational model does not adequately capture data or application semantics. This is substantiated by most of the relational temporal models, which support only a discrete and linear model of time. The general limitation of the relational model in supporting complex applications has led to research into next-generation data models, specifically object data models. The research on temporal models has generally followed this trend. Temporal object models can more accurately capture the semantics of complex objects and treat time as a basic component. There have been many temporal object model proposals (for example, [RS91,SC91,WD92,KS92,CITB92,BFG97]). These models differ in the functionality that they offer; however, as in relational systems, they assume a set of fixed notions of time. Wuu & Dayal [WD92] provide an abstract time type to model the most general semantics of time, which can then be subtyped (by the user or database designer) to model the various notions of time required by specific applications. However, this requires significant support from the user, including specification of the temporal schema.

Both (relational and object-oriented) approaches have led to the definition and design of a multitude of temporal models. Many of these assume a set of fixed notions about time, and therefore do not incorporate sufficient functionality or extensibility to meet the varying temporal requirements of today's applications. Instead, similar functionality is re-engineered every time a temporal model is created for a new application.
Although most temporal models were designed to support the temporal needs of a particular application, or group of similar applications, if we look at the functionality offered by the temporal models at an abstract level, there are notable similarities in their temporal features:

- Each temporal model has one or more temporal primitives, namely, time instant, time interval, time span, etc. The discrete or the continuous domain is used by each temporal model as a temporal domain over the primitives.
- Some temporal models require their temporal primitives to have the same underlying granularity, while others support multiple granularities and allow temporal primitives to be specified in different granularities.
- Most temporal models support a linear model of time, while a few support a branching model. In the former, temporal primitives are totally ordered, while in the latter they have a partial order defined on them.
- All temporal models provide some means of modeling historical information about real-world entities and/or histories of entities in the database. Two of the most popular types of histories that have been employed are valid and transaction-time histories [Sno87], respectively.
These commonalities suggest a need for combining the diverse features of time under a single infrastructure that is extensible and allows design reuse. In this paper, we present an object-oriented framework [JF88] that provides such a unified infrastructure. An object-oriented approach allows us to capture the complex semantics of time by representing it as a basic entity. Furthermore, the typing and inheritance mechanisms of object-oriented systems directly enable the various notions of time to be reflected in a single framework.

The objectives of this work are fourfold. The first objective is to identify the design dimensions that span the design space for temporal models. This will classify design alternatives for temporal models. The design space is then represented by exploiting object-oriented features to model the different aspects of time. The second objective is to show how the temporal framework can be tailored to accommodate real-world applications that have different temporal needs. The third objective is to show how the various existing temporal object models can be represented within this framework. The final objective is to use the framework to analyze and compare the different temporal object models based on the design dimensions. In particular, the temporal object models of [RS91,SC91,KS92,PM92,CITB92,BFG97] are considered. The work of Wuu & Dayal [WD92] and that of Cheng & Gadia [CG93] (which follows a methodology similar to [WD92]) are not considered since they do not provide concrete notions of time in their models.

Object models supporting versioning using time usually follow a structural embedding of temporality within type definitions [KGBW90,WLH90,SRH90,Sci94]. Thus, the notion of temporal objects is lost since the model knows nothing about temporality. Moreover, most temporal version models use the Date function call which is provided by the system. For example, though the EXTRA-V version model [Sci94] supports "valid" and "transaction" time, it does so by timestamping attributes using system-provided dates. This is limited in scope as no semantics of the various notions of time are provided.
Since these models are not "temporal object models" in the strict sense of the term, we do not include them in this study.

We can draw a parallel between our work and similar (albeit on a much larger scale) approaches used in Choices [CJR87] and cmcc [ATGL96]. Choices is a framework for operating system construction which was designed to provide a family of operating systems that could be reconfigured to meet diverse user/application requirements. cmcc is an optimizing compiler that makes use of frameworks to facilitate code reuse for different modules of a compiler. Similar to Choices and cmcc, the temporal framework presented in this paper can be regarded as an attempt to construct a family of temporal models. The framework can then be tailored to reflect a particular temporal model which best suits the needs of an application. A particular temporal model would be one of the many "instances" of the framework.

The presentation of this paper is divided into five sections. Section 2 presents the temporal framework by identifying the design dimensions (key abstractions) for temporal models and the interactions between them. Section 3 illustrates how the temporal framework can be tailored to accommodate the temporal needs of different applications, and the temporal features of temporal object models. In Section 4 object-oriented techniques are used to compare and analyze different
temporal object models with respect to the design dimensions. Section 5 summarizes the work presented in this paper, discusses related work, and outlines avenues for future research.
2 The Architecture of the Temporal Framework
In order to accommodate the varying requirements that many applications have for temporal support, we first identify the design dimensions that span the design space for temporal models. Next, we identify the components or features of each design dimension. Finally, we explore the interactions between the design dimensions in order to structure the design space. These steps produce a framework which consists of abstract and concrete object types, and properties (abstractions of methods and attributes in traditional object-oriented terminology). The types are used to model the different design dimensions and their corresponding components. The properties are used to model the different operations on each component, and to represent the relationships (constraints) between the design dimensions. The framework classifies design alternatives for temporal models by providing types and properties that can be used to define the semantics of many different specific notions of time.
2.1 Design Dimensions
The design alternatives for temporal models can be classified along four design dimensions:

1. Temporal Structure - provides the underlying ontology and domains for time.
2. Temporal Representation - provides a means to represent time so that it is human readable.
3. Temporal Order - gives an ordering to time.
4. Temporal History - allows events and activities to be associated with time.
There are two parts to the description of a design dimension. First, we define a set of temporal features that the design dimension encompasses. Second, we explore relationships between the temporal features and describe the resulting design space for the design dimension. The design space consists of an architectural overview of abstract and concrete types corresponding to the temporal features, and a design overview which describes some of the key properties (operations) defined in the interface of the types. We do not describe the properties in detail since many of these are traditional temporal operations that have already appeared in the literature on temporal databases. We assume the availability of commonly used object-oriented features: atomic entities (reals, integers, strings, etc.); types for defining common features of objects; properties (which represent methods and instance variables) for specifying the semantics of operations that may be performed on objects; classes
which represent the extents of types; and collections for supporting general heterogeneous groupings of objects. In this paper, a reference prefixed by "T_" refers to a type, and "P_" to a property. A type is represented by a rounded box. An abstract type is shaded with a black triangle in its upper left corner, while a concrete type is unshaded. In Figures 5, 8, 9, and 15 the rectangular boxes are objects. Objects have an outgoing edge for each property applicable to the object which is labeled with the name of the property and which leads to an object resulting from the application of the property to the given object. A circle labeled with the symbols { } represents a container object and has outgoing edges labeled with "∈" to each member object.

Temporal Structure

The first question about a temporal model is "what is its underlying temporal structure?" More specifically, what are the temporal primitives supported in the model, what temporal domains are available over these primitives, and what is the temporal determinacy of the primitives? Indeed, the temporal structure dimension with its various constituents forms the basic building block of the design space of any temporal model since it is comprised of the basic temporal features that underlie the model. We now give an overview of the features of a temporal structure and then identify the relationships that exist between them.

Components

1. Temporal Primitives
Temporal primitives can either be anchored (absolute) or unanchored (relative) [Sno92]. For example, 31 July 1995 is an anchored temporal primitive since we know exactly where it is located on the time axis, whereas 31 days is an unanchored temporal primitive since it can stand for any block of 31 consecutive days on the time axis. There is only one unanchored primitive, called the span. A span is a duration of time with a known length, but no specific starting and ending anchor points.
There are two anchored primitives, the instant (moment, chronon) and the interval. An instant is a specific anchored moment in time, e.g., 31 July 1995. An interval is a duration of time between two specific anchor points (instants) which are the lower and upper bounds of the interval, e.g., [15 June 1995, 31 July 1995].

2. Temporal Domain
The temporal domain of a temporal structure defines a scale for the temporal primitives. A temporal domain can be continuous or discrete. Discrete domains map temporal primitives to the set of integers. That is, for any temporal primitive in a discrete time domain, there is a unique successor and predecessor. Continuous domains map temporal primitives to the set of real numbers. Between any two temporal primitives of a continuous time domain, another temporal primitive exists. Most of the research in the context of temporal databases has assumed that the temporal domain is discrete. Several arguments in favor of using a discrete temporal domain are made by Snodgrass [Sno92] including the
imprecision of clocking instruments, compatibility with natural language references, possibility of modeling events which have duration, and practicality of implementing a continuous temporal data model. However, Chomicki [Cho94] argues that the continuous (dense) temporal domain is very useful in mathematics and physics. Furthermore, continuous time provides a useful abstraction if time is thought of as discrete but with instants that are very close. In this case, the set of time instants may be very large, which in turn may be difficult to implement efficiently. Chomicki further argues that query evaluation in the context of constraint databases [KKR90,Rev90] has been shown to be easier in continuous domains than in discrete domains. Continuous temporal domains have also been used to facilitate full abstract semantics in reasoning about concurrent programs [BKP86].

3. Temporal Determinacy
There are many real world cases where we have complete knowledge of the time or the duration of a particular activity. For example, the time interval allowed for students to complete their Introduction to Database Management Systems examination is known for certain. This is an example of a determinate temporal primitive. However, there are cases when the time or the duration of a particular activity is known only to a certain extent. For example, we do not know the exact time instant when the Earth was formed, though we may speculate on an approximate time for this event. In this case, the temporal primitive is indeterminate. Indeterminate temporal information also arises from various sources such as granularity, dating techniques, future planning, and unknown or imprecise event times [DS93]. Since the ultimate purpose of a temporal model is to represent real temporal information, it is desirable for such a model to be able to capture both determinate and indeterminate temporal primitives.
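The three temporal primitives described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the framework's own type definitions: the class names `Span`, `Instant`, and `Interval`, the `determinate` flag, and the choice of a discrete domain with one-day chronons are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Span:
    """Unanchored primitive: a duration of known length with no fixed position."""
    days: int


@dataclass(frozen=True)
class Instant:
    """Anchored primitive: one position on a discrete time axis (day chronons)."""
    day_number: int
    determinate: bool = True  # False would mark an indeterminate instant

    @classmethod
    def from_date(cls, d: date, determinate: bool = True) -> "Instant":
        # Assumption: the proleptic Gregorian ordinal serves as the chronon number.
        return cls(d.toordinal(), determinate)

    def __add__(self, span: Span) -> "Instant":
        # instant + span yields another instant
        return Instant(self.day_number + span.days, self.determinate)

    def __sub__(self, other: "Instant") -> Span:
        # instant - instant yields the span between them
        return Span(self.day_number - other.day_number)


@dataclass(frozen=True)
class Interval:
    """Anchored primitive: the time between a lower and an upper instant."""
    lower: Instant
    upper: Instant

    def __post_init__(self) -> None:
        if self.upper.day_number < self.lower.day_number:
            raise ValueError("upper bound precedes lower bound")

    def length(self) -> Span:
        return self.upper - self.lower


start = Instant.from_date(date(1995, 6, 15))
end = Instant.from_date(date(1995, 7, 31))
interval = Interval(start, end)   # the interval [15 June 1995, 31 July 1995]
print(interval.length())          # the unanchored span separating the two bounds
```

The span returned by `length()` carries no anchor: adding it to any instant, not just `start`, yields a valid instant, which is the sense in which a span is "relative".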
Design Space

Figure 1 shows the building block hierarchy of a temporal structure. The basic building block consists of anchored and unanchored temporal primitives. The next building block provides a domain for the primitives that consists of discrete or continuous temporal primitives. Finally, the last building block of Figure 1 adds determinacy. Thus, a temporal structure can be defined by a series of progressively enhanced temporal primitives.

[Fig. 1. Building a Temporal Structure: Temporal Primitives, extended with a discrete/continuous domain, give Domain-based Temporal Primitives; these, extended with determinacy/indeterminacy, give Determinacy-Domain-based Temporal Primitives.]

Figure 2 gives a detailed hierarchy of the different types of temporal primitives that exist in each of the building blocks of Figure 1. Based on the features of a temporal structure, its design space consists of 11 different kinds of temporal primitives. These are the determinacy-domain-based temporal primitives shown in Figure 2 and described below.

Continuous time instants and intervals. Continuous instants are just points on the (continuous) line of all anchored time specifications. They are totally ordered by the relation "later than." Since in theory, continuous instants have infinite precision, they cannot have a period of indeterminacy. Therefore, continuous indeterminate time instants do not
exist in Figure 2. However, continuous intervals can be determinate or indeterminate. The difference between them is that a continuous determinate interval denotes that the activity associated with it occurs during the whole interval, while a continuous indeterminate interval denotes that the activity associated with it occurs sometime during the interval. Continuous intervals have lower and upper bounds which are continuous instants.

Discrete time instants and intervals. Assume that somebody has been on a train the whole day of 5 January 1997. This fact can be expressed using a determinate time instant 5 January 1997_det (which means the whole day). However, the fact that somebody is leaving for Paris on 5 January 1997 can be represented as an indeterminate time instant 5 January 1997_indet (which means some time on that day). Hence, each discrete time instant is either determinate or indeterminate, corresponding to the two different interpretations. Discrete time instants are analogous to continuous time intervals. Every determinate (indeterminate) discrete time instant has a granularity ($G_i$) associated with it. This granularity determines the mapping of the given determinate (indeterminate) discrete time instant $I_{det}$ ($I_{indet}$) to the domain of continuous time instants. The mapping is defined as follows:
$$I_{det} \mapsto [I_{cont},\, I_{cont} + G_i)$$
$$I_{indet} \mapsto [I_{cont} \sim I_{cont} + G_i)$$

Here $I_{cont}$ denotes the counterpart of $I_{det}$ and $I_{indet}$ in the domain of continuous time instants. This is exemplified by the mapping of the discrete determinate instant 5 January 1997_det to the continuous determinate interval [5 January 1997_cont, 6 January 1997_cont). In this case $G_i = G_{days} = 1$ day. A formal treatment of the different types of instants and mappings is given in [GLOS97]. Discrete time instants can be used to form discrete time intervals. Since we have determinate and indeterminate discrete instants, we also have determinate and indeterminate discrete intervals. Determinate (indeterminate) time instants can be used as boundaries of determinate (indeterminate) time intervals.

[Fig. 2. Design Space of a Temporal Structure: a hierarchy leading from the temporal primitives (anchored and unanchored), through the domain-based temporal primitives (discrete and continuous instants, intervals, and spans), to the determinacy-domain-based temporal primitives.]

Time spans. Discrete and continuous determinate spans represent complete information about a duration of time. A discrete determinate span
is a summation of distinct granularities with integer coefficients, e.g., 5 days or 2 months + 5 days. Similarly, a continuous determinate span is a summation of distinct granularities with real coefficients, e.g., 0.31 hours or 5.2 minutes + 0.15 seconds. Discrete and continuous indeterminate spans represent incomplete information about a duration of time. They have lower and upper bounds that are determinate spans. For example, 1 day ~ 2 days is a discrete indeterminate span that can be interpreted as "a time period between one and two days."

[Fig. 3. The Inheritance Hierarchy of a Temporal Structure: supertype/subtype diagram of the instant, interval, and span types (e.g., T_detDiscInterval, T_indetDiscInterval, T_detContSpan, T_indetContSpan) with their properties (e.g., P_before, P_overlaps, P_during, P_union, P_intersection, P_difference, P_succ, P_pred, P_lb, P_ub, P_calGranularities).]

The mapping of the temporal structure to an object type hierarchy is given in Figure 3 which shows the types and generic properties that are used to model various kinds of determinacy-domain-based temporal primitives. Properties defined on time instants allow an instant to be compared with another instant; an instant to be subtracted from another instant to find the time duration between the two; and a time span to be added to or subtracted from an instant to return another instant. Furthermore, properties P_calendar and P_calElements are used to link time instants to calendars which serve as a representational scheme for temporal primitives (see Section 2.1). P_calendar returns the calendar which the instant belongs to and P_calElements returns a list of the calendric elements in a time instant. For example P_calendar applied to the time instant 15 June 1995 would return
Gregorian, while the application of P_calElements to the same time instant would return (1995, June, 15). Similarly, properties defined on time intervals include unary operations which return the lower bound, upper bound and length of the interval; ordering operations which define Allen's interval algebra [All84]; and set-theoretic operations. Properties defined on time spans enable comparison and arithmetic operations between spans. The P_before and P_after properties are refined for time spans to model the semantics of < and >, respectively. Additionally, properties P_coefficient and P_calGranularities are used as representational properties and provide a link between time spans and calendars (see Section 2.1). P_coefficient returns the (real) coefficient of a time span given a specific calendric granularity. For example, (5 days).P_coefficient(day) returns 5.0. P_calGranularities returns a collection of calendric granularities in a time span. For example, the property application (1 month + 5 days).P_calGranularities returns {day, month}.
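Two of the ideas above lend themselves to a short Python sketch: the mapping of a discrete instant with granularity $G_i$ to the half-open continuous interval $[I_{cont}, I_{cont} + G_i)$, and a span represented as a sum of calendric granularities with coefficients. The names `discrete_to_continuous`, `CalendricSpan`, `coefficient`, and `cal_granularities` are hypothetical stand-ins for the mapping and for P_coefficient and P_calGranularities; returning 0.0 for a granularity that does not occur in a span is an assumption of this sketch, not something the text specifies.

```python
from datetime import datetime, timedelta
from typing import Dict, Set, Tuple


def discrete_to_continuous(start: datetime, granularity: timedelta) -> Tuple[datetime, datetime]:
    """Map a discrete instant to the bounds of [I_cont, I_cont + G_i).

    The upper bound is exclusive. Whether the activity fills the whole
    interval (determinate) or falls somewhere inside it (indeterminate)
    is a matter of interpretation and is not modeled here.
    """
    return start, start + granularity


class CalendricSpan:
    """A span as a summation of distinct calendric granularities with coefficients."""

    def __init__(self, **coefficients: float) -> None:
        self._coefficients: Dict[str, float] = {
            granularity: float(value) for granularity, value in coefficients.items()
        }

    def coefficient(self, granularity: str) -> float:
        # Hypothetical analogue of P_coefficient; absent granularities give 0.0.
        return self._coefficients.get(granularity, 0.0)

    def cal_granularities(self) -> Set[str]:
        # Hypothetical analogue of P_calGranularities.
        return set(self._coefficients)


# 5 January 1997 at day granularity maps to [5 January 1997, 6 January 1997).
lower, upper = discrete_to_continuous(datetime(1997, 1, 5), timedelta(days=1))

five_days = CalendricSpan(day=5)
mixed = CalendricSpan(month=1, day=5)
print(five_days.coefficient("day"))
print(sorted(mixed.cal_granularities()))
```

Keeping the coefficients keyed by granularity name, rather than collapsing the span to a single number, avoids having to fix a month length: "1 month + 5 days" stays symbolic until a calendar's conversion functions are applied.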
We note that (see Figure 3) the properties P_succ and P_pred are defined in all the types involving both discrete instant and span primitives. This redundancy can be eliminated by refactoring the concerned types and using multiple inheritance. More specifically, an abstract type called T_discrete can be introduced, and the properties P_succ and P_pred defined on it. All the types involving discrete primitives can then be made subtypes of T_discrete. A similar approach can be used to factor the types that define properties P_lb and P_ub. An abstract type called T_bounds can be introduced with the properties P_lb and P_ub defined on it. The T_interval type and the types involving indeterminate spans can then be made subtypes of T_bounds. Alternatively, the concept of multiple subtyping hierarchies can be used to collect semantically related types together and avoid the duplication of properties [HKOS96]. For example, the unanchored primitives hierarchy can be re-structured as shown in Figure 4.

[Fig. 4. Multiple Subtyping Hierarchy for Unanchored Temporal Primitives: supertype/subtype diagram in which P_succ and P_pred are factored out.]
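The refactoring with T_discrete and T_bounds can be illustrated with Python abstract base classes and multiple inheritance. This is a sketch under assumptions: the class names mirror the types named in the text but are otherwise hypothetical, spans are reduced to a whole number of days, and defining the successor of an indeterminate span by shifting both of its bounds is an illustrative choice rather than semantics taken from the paper.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class TDiscrete(ABC):
    """Abstract type carrying P_succ and P_pred for every discrete primitive."""

    @abstractmethod
    def succ(self): ...

    @abstractmethod
    def pred(self): ...


class TBounds(ABC):
    """Abstract type carrying P_lb and P_ub for every bounded primitive."""

    @abstractmethod
    def lb(self): ...

    @abstractmethod
    def ub(self): ...


@dataclass(frozen=True)
class DetDiscSpan(TDiscrete):
    """Determinate discrete span, simplified to a whole number of days."""
    days: int

    def succ(self) -> "DetDiscSpan":
        return DetDiscSpan(self.days + 1)

    def pred(self) -> "DetDiscSpan":
        return DetDiscSpan(self.days - 1)


@dataclass(frozen=True)
class IndetDiscSpan(TDiscrete, TBounds):
    """Indeterminate discrete span: inherits from both abstract types."""
    lower: DetDiscSpan
    upper: DetDiscSpan

    def lb(self) -> DetDiscSpan:
        return self.lower

    def ub(self) -> DetDiscSpan:
        return self.upper

    def succ(self) -> "IndetDiscSpan":
        # Assumption: the successor shifts both bounds by one chronon.
        return IndetDiscSpan(self.lower.succ(), self.upper.succ())

    def pred(self) -> "IndetDiscSpan":
        return IndetDiscSpan(self.lower.pred(), self.upper.pred())


one_to_two_days = IndetDiscSpan(DetDiscSpan(1), DetDiscSpan(2))  # "1 day ~ 2 days"
print(one_to_two_days.succ())
```

Each property is now declared once, on the abstract type that owns it, and a concrete type picks up exactly the interfaces it needs by listing the corresponding supertypes.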
Temporal Representation

Components. It is important to have a representational scheme in which the temporal primitives can be made human readable and usable. This is achieved by means of calendars. Common calendars include the Gregorian and Lunar calendars. Educational institutions also use Academic calendars. Calendars are comprised of different time units of varying granularities that enable the representation of different temporal primitives. In many applications, it is desirable to have multiple calendars that have different calendric granularities. For example, in financial trading, multiple calendars with different time units and operations need to be available to capture the semantics of financial data [CS93,CSS94]. In time series management, extensive calendar support is also required [DDS94,LEW96]. A calendar should be able to support multiple granularities since temporal information processed by a DBMS is usually available in multiple granularities. Such information is prevalent in various sources. For example:
- clinical data - Physicians usually specify temporal clinical information for patients with varying granularities [CPP95,CPP96]. For example, "the patient suffered from abdominal pain for 2 hours and 20 minutes on June 15, 1996," "in 1990, the patient took a calcium antagonist for 3 months," "in October 1993, the patient had a second heart seizure."
- real-time systems - A process is usually composed of sub-processes that evolve according to times that have different granularities [CMR91]. For example, the temporal evolution of the basin in a hydroelectric plant depends on different sub-processes: the flow of water is measured daily; the opening and closing of radial gates is monitored every minute; and the electronic control has a granularity of microseconds.
- geographic information systems - Geographic information is usually specified according to a varying time scale [Flo91]. For example, vegetation fluctuates according to a seasonal cycle, while temperature varies daily.
- office information systems - Temporal information is available in different time units of the Gregorian calendar [BP85,CR88,MPB92]. For example, employee wages are usually recorded in the time unit of hours while the history of sales is categorized according to months.

Design Space. A calendar is composed of an origin, a set of calendric granularities, and a set of conversion functions. The origin marks the start of a calendar¹. Calendric granularities define the reasonable time units (e.g., minute, day, month) that can be used in conjunction with this calendar to represent temporal primitives. A calendric granularity also has a list of calendric elements. For example, in the Gregorian calendar, the calendric granularity day has the calendric elements Sunday, Monday, ..., Saturday. Similarly, in the Academic calendar, the calendric granularity semester has the calendric elements Fall, Winter, Spring, and Summer.
The conversion functions establish the conversion rules between calendric granularities of a calendar. Since all calendars have the same structure, a single type, called T_calendar, can be used to model different calendars, where instances represent different calendars. The basic properties of a calendar are P_origin, P_calGranularities, and P_functions. These allow each calendar to define its origin, calendric granularities, and the conversion functions between different calendric granularities.
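A single calendar type whose instances are the individual calendars might look as follows in Python. This is a minimal sketch, assuming a hypothetical `Calendar` class whose three fields stand in for P_origin, P_calGranularities, and P_functions; the origin strings follow the paper's Example 1, while the `convert` helper and the day/hour conversion functions are illustrative additions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Calendar:
    """One type for all calendars; each instance is a different calendar."""
    origin: str                              # stand-in for P_origin
    cal_granularities: Dict[str, List[str]]  # granularity -> calendric elements
    functions: Dict[Tuple[str, str], Callable[[float], float]] = field(default_factory=dict)

    def convert(self, amount: float, source: str, target: str) -> float:
        """Apply the conversion function registered for (source, target)."""
        return self.functions[(source, target)](amount)


gregorian = Calendar(
    origin="1582 years",
    cal_granularities={
        "year": [],   # elements not enumerated in this sketch
        "month": [],
        "day": ["Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday"],
        "hour": [],
    },
    functions={
        ("day", "hour"): lambda days: days * 24,
        ("hour", "day"): lambda hours: hours / 24,
    },
)

academic = Calendar(
    origin="1908 academicYears",
    cal_granularities={
        "academicYear": [],
        "semester": ["Fall", "Winter", "Spring", "Summer"],
    },
)

print(gregorian.convert(2, "day", "hour"))
print(academic.cal_granularities["semester"])
```

Because the structure is uniform, adding a Lunar or Fiscal calendar means creating another instance with its own origin, granularities, and conversion functions rather than defining a new type.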
E x a m p l e 1. Figure 5 shows four instances of T_calendar -- the G r e g o r i a n , Lunar, A c a d e m i c , and Fiscal calendars. The origin of the Gregorian calendar is given as the span 1582 y e a r s from the start of time since it was proclaimed in
1 We note that our definition of a calendar is different from that defined in [CS93,CSS94,LEW96], where structured collections of time intervals are termed "calendars." Our definition adheres closely to the human understanding of a calendar. However, the extensibility feature of the framework allows any other notions of calendars to be incorporated easily under the temporal representation design dimension.
Iqbal A. Goralwalla, M. Tamer Ozsu, and Duane Szafron
1582 by Pope Gregory XIII as a reform of the Julian calendar. The calendric granularities in the Gregorian calendar are the standard ones: year, month, day, etc. The origin of the Academic calendar shown in Figure 5 is assumed to be the span 1908 academicYears, having started in the year 1908, which is the establishment date of the University of Alberta. The Academic calendar has calendric granularities similar to those of the Gregorian calendar and defines a new calendric granularity, semester. The semantics of the Lunar and Fiscal calendars could similarly be defined.
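The calendar structure described above (an origin, a set of calendric granularities with their elements, and conversion functions) can be sketched in Python as follows. This is only an illustrative sketch, not the framework's implementation: the class name Calendar, its field names, the convert helper, and the conversion factors are assumptions introduced here to mirror P_origin, P_calGranularities, and P_functions.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class Calendar:
    """Hypothetical stand-in for T_calendar."""
    name: str
    origin: tuple                                       # P_origin as a span: (coefficient, granularity)
    granularities: dict = field(default_factory=dict)   # P_calGranularities: name -> calendric elements
    functions: dict = field(default_factory=dict)       # P_functions: (src, dst) -> units of dst per src

    def convert(self, amount, src, dst):
        """Convert `amount` units of granularity `src` into units of `dst`."""
        if src == dst:
            return Fraction(amount)
        if (src, dst) in self.functions:
            return amount * self.functions[(src, dst)]
        if (dst, src) in self.functions:
            return amount / self.functions[(dst, src)]
        raise KeyError(f"no conversion rule between {src} and {dst}")


# Two instances, following Example 1 (the conversion factors are assumed).
gregorian = Calendar(
    name="Gregorian",
    origin=(1582, "year"),
    granularities={
        "day": ["Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday"],
        "hour": [],
        "minute": [],
    },
    functions={("day", "hour"): Fraction(24), ("hour", "minute"): Fraction(60)},
)
academic = Calendar(
    name="Academic",
    origin=(1908, "academicYear"),
    granularities={"semester": ["Fall", "Winter", "Spring", "Summer"]},
)

two_days_in_hours = gregorian.convert(2, "day", "hour")
```

Because every calendar shares this structure, a new calendar is simply another instance with its own origin, granularities, and conversion rules.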
[Figure 5: four instances of T_calendar (Gregorian, Lunar, Academic, and Fiscal), each with the properties P_origin, P_calGranularities, and P_functions. The Gregorian calendar has the origin 1582 years; the Academic calendar has the origin 1908 years and the granularities academicYear, semester, and academicMonth.]
Fig. 5. Temporal Representational Examples
Temporal Order

We now have the means of designing the temporal structure and the temporal representation of a temporal model. The next step is to provide an ordering scheme for the temporal primitives. This constitutes the third building block of our design space.

Components. A temporal order can be classified as being linear or branching. In a linear order, time flows from past to future in an ordered manner. In
An Object-Oriented Framework for Temporal Data Models
a branching order, time is linear in the past up to a certain point, when it branches out into alternate futures. The structure of a branching order can be thought of as a tree defining a partial order of times. The trunk (stem) of the tree is a linear order and each of its branches is a branching order. The linear model is used in applications such as office information systems. The branching order is useful in applications such as computer aided design and planning or version control, which allow objects to evolve over a nonlinear (branching) time dimension (e.g., multiple futures, or partially ordered design alternatives).

Design Space. The different types of temporal orders are dependent on each other. A sub-linear order is one in which the temporal primitives (time intervals) are allowed to overlap, while a linear order is one in which the temporal primitives (time intervals) are not allowed to overlap. Every linear order is also a sub-linear order. A branching order is essentially made up of sub-linear orders. The relationship between temporal orders is shown in Figure 6.
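[Figure 6: relationships between temporal orders. A linear order is-a sub-linear order, a sub-linear order is-a temporal order, and a branching order is composed-of sub-linear orders.]
Fig. 6. Temporal Order Relationships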
The hierarchy in Figure 7 gives the various types and properties which model different temporal orders (see footnote 2).

[Figure 7: supertype/subtype hierarchy of the types T_temporalOrder, T_subLinearOrder, T_linearOrder, and T_branchingOrder, with the properties P_branchingOrder, P_temporalPrimitives, P_root, P_branches, and P_in.]
Fig. 7. The Hierarchy of Temporal Orders

2 We do not consider periodic temporal orders in this work. These can easily be incorporated as a subtype of T_temporalOrder.
Example 2. Consider the operations that take place in a hospital on any particular day. It is usually the case that at any given time multiple operations are taking place. Let us assume an eye cataract surgery took place between 8am and 10am, a brain tumor surgery took place between 9am and 12pm, and an open heart surgery took place between 7am and 2pm on a certain day. Figure 8 shows a pictorial representation of operationsOrder, which is an object of type T_subLinearOrder. operationsOrder consists of the time intervals [08:00,10:00], [09:00,12:00], and [07:00,14:00], and does not belong to any branching timeline. As seen in the figure, operationsOrder consists of intervals (representing the time periods during which the different surgeries took place) that overlap each other. Hence, operationsOrder is an example of a sub-linear order.
[Figure 8: the object operationsOrder with its properties P_branchingOrder and P_temporalPrimitives; the latter yields the intervals [08:00,10:00], [09:00,12:00], and [07:00,14:00].]
Fig. 8. An Example of a Sub-Linear Order.
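The distinction between a sub-linear and a linear order comes down to whether any two intervals overlap. The following Python sketch makes that test concrete. It is an illustration under stated assumptions rather than part of the framework: the function names are hypothetical, intervals are treated as half-open pairs [start, end), and the calendar day chosen for the surgeries is arbitrary.

```python
from datetime import date, datetime, timedelta


def classify(intervals):
    """Return "linear" if no two half-open intervals [start, end) overlap,
    otherwise "sub-linear"."""
    ordered = sorted(intervals)  # sort by start time
    # After sorting, checking neighbouring pairs is enough to find any overlap.
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            return "sub-linear"
    return "linear"


def at(hour):
    """A time of day on an arbitrary (assumed) day."""
    return datetime(1995, 1, 10, hour)


def whole_day(day):
    """A one-day interval [day, day + 1)."""
    return (day, day + timedelta(days=1))


# Surgeries from the hospital example: the intervals overlap.
operations_order = [(at(8), at(10)), (at(9), at(12)), (at(7), at(14))]

# Clinic visits on distinct days: no overlap, hence a total order.
special_clinic_order = [
    whole_day(date(1995, 1, 10)),
    whole_day(date(1995, 1, 12)),
    whole_day(date(1995, 2, 3)),
]

print(classify(operations_order))
print(classify(special_clinic_order))
```

Treating intervals as half-open means that two intervals which merely meet do not count as overlapping; a model with closed intervals would use a non-strict comparison instead.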
Example 3. To illustrate the use of objects of type T_linearOrder, which are total linear temporal orders, consider a patient with multiple pathologies, for example as a result of diabetes. The patient has to attend several special clinics, each on a different day. Since the patient cannot attend more than one special clinic on any day, the temporal order of the patient's special clinic visit history is linear and totally ordered. Suppose the patient visited the ophthalmology clinic on 10 January 1995, the cardiology clinic on 12 January 1995, and the neurology clinic on 3 February 1995. Figure 9 shows a pictorial representation of specialClinicOrder, which is an object of type T_linearOrder. As seen in the figure, specialClinicOrder is totally ordered as its time intervals do not overlap.

[Figure 9: the object specialClinicOrder with its properties P_branchingOrder and P_temporalPrimitives.]
Fig. 9. An Example of a Linear Order.

Example 4. Consider an observational pharmacoeconomic analysis of the changing trends, over a period of time, in the treatment of a chronic illness such as asthma [GÖS97]. The analysis would be performed using information gathered over a time period. At a fixed point during this period new guidelines for the treatment of asthma were released. At that point the population of patients known to have asthma is divided into those whose doctors continue the old established treatment, and those whose doctors, in accordance with new recommendations, change their treatment. Thus, the patients are divided into two groups, each group undergoing a different treatment for the same illness. The costs and benefits accrued over the trial period for each treatment are calculated. Since such a study consists of several alternative treatments to an illness, a branching timeline is the natural choice for modeling the timeline of the study. The point of branching is the time when the new guidelines for the treatment of the illness are implemented. Figure 10 shows the branching timeline for such a medical trial history.
[Figure 10: a branching timeline for the medical trial history; the branch point is the time when the new guidelines are released.]
Fig. 10. An Example of a Branching Order.
The same branching timeline could as easily handle the situation where different versions of a particular treatment, say Treatment A, are implemented based on certain parameters. In this case, the "Treatment A" branch would in turn branch at a certain point into different Treatment A versions. This situation is also depicted in Figure 10.
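A branching order of this kind can be pictured as a tree whose root-to-leaf paths are the alternative linear histories. The Python sketch below illustrates the idea only; the class BranchingOrder, its fields, and the branch labels are hypothetical and are not taken from the framework's type definitions.

```python
from dataclasses import dataclass, field


@dataclass
class BranchingOrder:
    """Hypothetical tree-shaped timeline: a segment plus alternate futures."""
    label: str
    primitives: list = field(default_factory=list)  # temporal primitives on this segment
    branches: list = field(default_factory=list)    # child BranchingOrder objects

    def branch(self, label, primitives=None):
        """Open a new alternative future at the end of this segment."""
        child = BranchingOrder(label, list(primitives or []))
        self.branches.append(child)
        return child

    def timelines(self, prefix=()):
        """Yield every root-to-leaf path, i.e. each alternative linear history."""
        path = prefix + (self.label,)
        if not self.branches:
            yield path
        for child in self.branches:
            yield from child.timelines(path)


# The trial history: one trunk, branching when the new guidelines are released.
study = BranchingOrder("before new guidelines")
study.branch("old treatment")
treatment_a = study.branch("Treatment A")
# Treatment A itself branches into versions at a later point.
treatment_a.branch("Treatment A, version 1")
treatment_a.branch("Treatment A, version 2")

for path in study.timelines():
    print(" -> ".join(path))
```

Each printed path corresponds to one group of patients followed along a single, linear course of treatment.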
Temporal History

So far we have considered the various features of time: its structure, the way it is represented, and how it is ordered. The final building block of the design space of temporal models makes it possible to associate time with entities to model different temporal histories.

Components. One requirement of a temporal model is an ability to represent and manage real-world entities as they evolve over time and assume different states (values). The set of these values forms the temporal history of the entity. Two basic types of temporal histories are considered in databases which incorporate time. These are valid and transaction time histories [SA85]. Valid time denotes the time when an entity is effective (models reality), while transaction time represents the time when a transaction is posted to the database. Usually valid and transaction times are the same. Other temporal histories include event time [RS91,CK94] and decision time [EGS93] histories. Event (decision) time denotes the time at which the event occurred in the real world. Valid, transaction, and event times have been shown to be adequate in modeling temporal histories [CK94].

Design Space. Since valid, transaction, and event time histories have different semantics, they are orthogonal. Figure 11 shows the various types that could be used to model these different histories. A temporal history consists of objects and their associated timestamps.
[Figure 11: the type T_history with the properties P_history, P_temporalOrder, P_insert, P_remove, and P_getObjects, and its subtypes T_validHistory, T_transactionHistory, and T_eventHistory.]
Fig. 11. The Types and Properties for Temporal Histories
Property P_history defined in T_history returns a collection of all timestamped objects that comprise the history. A history object also knows the temporal order of its temporal primitives. The property P_temporalOrder returns the temporal order (which is an object of type T_temporalOrder) associated with a history object. The temporal order basically orders the time intervals (or time instants) in the history. Another property defined on history objects, P_insert, timestamps and inserts an object in the history. Property P_remove drops a given object from the history at a specified temporal primitive. The P_getObjects property allows the user to get the objects in the history at (during) a given temporal primitive. The properties defined on T_history are refined in the T_validHistory, T_transactionHistory, and T_eventHistory types to model the semantics of the different kinds of histories. Moreover, each history type can define additional properties, if necessary, to model its particular semantics. The clinical example described in Section 3.1 illustrates the use of the properties defined on T_history.
2.2 Relationships between Design Dimensions
In the previous section we described the building blocks (design dimensions) for temporal models and identified the design space of each dimension. We now look at the interactions between the design dimensions. This will enable us to put the building blocks together and structure the design space for temporal models. A temporal history is composed of entities which are ordered in time. This temporal ordering is over a collection of temporal primitives in the history, which in turn are represented in a certain manner. Hence, the four dimensions can be linked via the "has-a" relationship shown in Figure 12.
[Figure 12: the temporal model design space as a chain of "has" relationships.
Temporal History: Valid, Transaction, Event.
Temporal Order: sub-Linear, Linear, Branching.
Temporal Structure: Determinate Discrete Instants, Indeterminate Discrete Instants, Determinate Continuous Instants, Determinate Discrete Intervals, Indeterminate Discrete Intervals, Determinate Continuous Intervals, Indeterminate Continuous Intervals, Determinate Discrete Spans, Indeterminate Discrete Spans, Determinate Continuous Spans, Indeterminate Continuous Spans.
Temporal Representation: Gregorian, Academic, Business, Financial.]
Fig. 12. Design Space for Temporal Models
Basically, a temporal model can be envisioned as having a notion of time, which has an underlying temporal structure, a means to represent the temporal structure, and different temporal orders to order the temporal primitives within a temporal structure. This notion of time, when combined with application objects, can be used to represent various temporal histories of the objects in the temporal model. Figure 12 gives the design space for temporal models. A temporal model can support one or more of valid, transaction, event, and user-defined histories. Each history in turn has a certain temporal order. This temporal order has properties which are defined by the type of temporal history (linear or branching). A linear history may or may not allow overlapping of anchored temporal primitives that belong to it. If it does not allow overlapping, then such a history defines a total order on the anchored temporal primitives that belong to it. Otherwise, it defines a partial order on its anchored temporal primitives. Each order can then have a temporal structure which is comprised of all or a subset of the 11 different temporal primitives that are shown in Figure 2. Finally, different calendars can be defined as a means to represent the temporal primitives.

The four dimensions are modeled in an object system by the respective types shown in Figure 13. The "has-a" relationship between the dimensions is modeled using properties as shown in the figure. An object of T_temporalHistory represents a temporal history. Its temporal order is obtained using the P_temporalOrder property. A temporal order is an object of type T_temporalOrder and has a certain temporal structure which is obtained using the P_temporalPrimitives property. The temporal structure is an object of type T_temporalStructure. The property P_calendar gives the instance of T_calendar which is used to represent the temporal structure.
[Figure 13: the type T_temporalFramework with the types T_temporalHistory, T_temporalOrder, T_temporalStructure, and T_calendar, linked in that order by the properties P_temporalOrder, P_temporalPrimitives, and P_calendar.]
Fig. 13. Relationships between Design Dimensions Types
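The "has-a" chain between the four dimension types can be sketched as plain object composition. The Python classes below are a hypothetical illustration only: the attribute names stand in for the properties P_temporalOrder, P_temporalPrimitives, and P_calendar, and the calendar is reduced to a name.

```python
from dataclasses import dataclass


@dataclass
class TemporalStructure:
    primitives: list      # the temporal primitives themselves
    calendar: str         # stands in for P_calendar (an instance of T_calendar)


@dataclass
class TemporalOrder:
    kind: str                       # "linear", "sub-linear", or "branching"
    structure: TemporalStructure    # stands in for P_temporalPrimitives


@dataclass
class TemporalHistory:
    kind: str                       # "valid", "transaction", or "event"
    order: TemporalOrder            # stands in for P_temporalOrder


# Navigating from a history down to the calendar that represents its primitives.
history = TemporalHistory(
    kind="valid",
    order=TemporalOrder(
        kind="linear",
        structure=TemporalStructure(
            primitives=["10 January 1995", "12 January 1995"],
            calendar="Gregorian",
        ),
    ),
)
calendar_used = history.order.structure.calendar
```

The point of the composition is that each dimension can be swapped independently, for example replacing the linear order with a branching one without touching the calendar.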
The relationships shown in Figure 13 provide a temporal framework which encompasses the design space for temporal models. The detailed type system, shown in Figure 14, is based on the design dimensions identified in Section 2 and their various features which are given in Figures 3, 7, and 11. As described in Section 2.1, refactoring of types and multiple inheritance can be used to handle identical properties that are defined over different types in the inheritance
hierarchy shown in Figure 14. The framework can now be tailored for the temporal needs of different applications and temporal models. This is illustrated in Section 3.
[Figure 14: the detailed supertype/subtype hierarchy rooted at T_temporalFramework. Legible types include T_detDiscInstant, T_detDiscInterval, T_detContInstant, T_detContInterval, T_indetContInterval, T_temporalOrder, T_history, T_transactionHistory, and T_eventHistory. Legible properties include P_succ, P_pred, P_calElements, P_addDuration, P_subDuration, P_after, P_lb, P_ub, P_length, P_overlaps, P_during, P_starts, P_finishes, P_meets, P_union, P_intersection, P_difference, P_calGranularities, P_branchingOrder, P_temporalPrimitives, P_branches, P_history, P_temporalOrder, P_insert, P_remove, and P_getObjects.]
Fig. 14. The Inheritance Hierarchy for the Temporal Framework
3 Tailoring the Temporal Framework
In this section, we illustrate how the temporal framework that is defined in Section 2 can be tailored to accommodate applications and temporal models which have different temporal requirements. In the first two sub-sections, we give examples of two real-world applications that have different temporal needs. In the last sub-section, we give an example of a temporal object model and show how the model can be derived from the temporal framework.

[Figure 15: the temporal history object for a patient's blood tests. P_insert takes a pair (aBloodTest, aTimeStamp); P_history yields the timestamped blood tests timeStampedMicrobiology, timeStampedHematology1, and timeStampedHematology2; P_temporalOrder leads to the temporal order, whose primitives are represented in the Gregorian calendar (origin 1582 years, with P_functions).]
Fig. 15. A Patient's Blood Test History
3.1 Clinical Data Management
In this section we give a real-world example from clinical data management that illustrates the four design dimensions and the relationships between them which were discussed in Section 2. During the course of a patient's illness, different blood tests are administered. It is usually the case that multiple blood tests of the patient are carried out on the same day. Suppose the patient was suspected of having an infection of the blood, and therefore had two different blood tests on 15 January 1995. These were the diagnostic hematology and microbiology blood tests. As a result of a very raised white cell count the patient was given a course of antibiotics while
the results of the tests were awaited. A repeat hematology test was ordered on 20 February 1995. Suppose each blood test is represented by an object of the type T_bloodTest. The valid history of the patient's blood tests can then be represented in the object database as an object of type T_validHistory. Let us call this object bloodTestHistory. To record the hematology and microbiology blood tests, the objects microbiology, hematology1, and hematology2 with type T_bloodTest are first created and then entered into the object database using the following property applications:

    bloodTestHistory.P_insert(microbiology, 15 January 1995)
    bloodTestHistory.P_insert(hematology1, 15 January 1995)
    bloodTestHistory.P_insert(hematology2, 20 February 1995)

If subsequently there is a need to determine which blood tests the patient took in January 1995, this would be accomplished by the following property application:

    bloodTestHistory.P_getObjects([1 January 1995, 31 January 1995])

This would return a collection of timestamped objects of T_bloodTest representing all the blood tests the patient took in January 1995. These objects would be the (timestamped) hematology1 and the (timestamped) microbiology.

Figure 15 shows the different temporal features that are needed to keep track of a patient's blood tests over the course of a particular illness. The figure also illustrates the relationships between the different design dimensions of the temporal framework. The patient has a blood test history represented by the object bloodTestHistory. The P_history property when applied to bloodTestHistory results in a collection object whose members are the timestamped objects timeStampedMicrobiology, timeStampedHematology1, and timeStampedHematology2. The P_insert property updates the blood test history (bloodTestHistory) by inserting an object of type T_bloodTest at a given anchored temporal primitive. Similarly, the property P_remove updates the bloodTestHistory by removing an object of type T_bloodTest at a given anchored temporal primitive. The P_getObjects property returns a collection of timestamped blood test objects when given an anchored temporal primitive. Applying the property P_temporalOrder to bloodTestHistory results in the object bloodTestOrder, which represents the temporal order on different blood tests in bloodTestHistory. bloodTestOrder has a certain temporal structure which is obtained by applying the P_temporalPrimitives property. Finally, the primitives in the temporal structure are represented using the Gregorian calendar, Gregorian, and the calendric granularities year, month, and day.

Let us now consider the various temporal features required to represent the different blood tests taken by a patient. Anchored, discrete, and determinate temporal primitives are required to model the dates on which the patient takes different blood tests.
These dates are represented using the Gregorian calendar.
Since the blood tests take place on specific days, the temporal primitives during which the patient took blood tests form a total order. Lastly, a valid time history is used to keep track of the different times the blood tests were carried out. To support these temporal features, the temporal framework can be reconfigured with the appropriate types and properties. These are given in Figure 16.
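The property applications in this section can be mimicked with a small Python class. This is a minimal sketch assuming an in-memory list of (object, timestamp) pairs; the class ValidHistory and its method names are hypothetical counterparts of T_validHistory and its properties, blood tests are represented by plain strings rather than T_bloodTest objects, and timestamps are Gregorian dates at the granularity of days.

```python
from datetime import date


class ValidHistory:
    """Hypothetical counterpart of T_validHistory with day-granularity timestamps."""

    def __init__(self):
        self._entries = []  # list of (object, timestamp) pairs

    def p_insert(self, obj, when):
        """Timestamp an object and add it to the history."""
        self._entries.append((obj, when))

    def p_remove(self, obj, when):
        """Drop an object from the history at the given timestamp."""
        self._entries.remove((obj, when))

    def p_history(self):
        """Return all timestamped objects in the history."""
        return list(self._entries)

    def p_get_objects(self, start, end):
        """Return the timestamped objects whose timestamp lies in [start, end]."""
        return [(obj, t) for obj, t in self._entries if start <= t <= end]

    def p_temporal_order(self):
        """Return the distinct timestamps in ascending (total) order."""
        return sorted({t for _, t in self._entries})


blood_test_history = ValidHistory()
blood_test_history.p_insert("microbiology", date(1995, 1, 15))
blood_test_history.p_insert("hematology1", date(1995, 1, 15))
blood_test_history.p_insert("hematology2", date(1995, 2, 20))

# Which blood tests did the patient take in January 1995?
january = blood_test_history.p_get_objects(date(1995, 1, 1), date(1995, 1, 31))
```

Here the query returns the timestamped microbiology and hematology1 tests, matching the result described above, while hematology2 falls outside the interval.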
[Figure 16: the supertype/subtype hierarchy of the framework as tailored for the clinical application, rooted at T_temporalFramework. Legible types include T_temporalOrder, T_linearOrder, T_history, and T_validHistory. Legible properties include P_origin, P_calGranularities, P_functions, P_lb, P_ub, P_length, P_overlaps, P_during, P_starts, P_finishes, P_meets, P_union, P_intersection, P_difference, P_temporalPrimitives, P_history, P_temporalOrder, P_insert, P_remove, and P_getObjects.]
Fig. 16. The Temporal Framework Inheritance Hierarchy for the Clinical Application
3.2 Time Series Management
The management of time series is important in many application areas such as finance, banking, and economic research. One of the main features of time series management is extensive calendar support [DDS94,LEW96]. Calendars map time points to their corresponding data and provide a platform for granularity conversions and temporal queries. Therefore, the temporal requirements of a time series management system include elaborate calendric functionality (which allows the definition of multiple calendars and granularities) and variable temporal structure (which supports both anchored and unanchored temporal primitives, and the different operations on them). Figure 17 shows how the temporal requirements of a time series management system can be modeled using the types and properties of the temporal framework.

[Figure 17: the supertype/subtype hierarchy of the framework as tailored for time series management, rooted at T_temporalFramework. Legible types include T_temporalStructure, T_detDiscInstant, T_calendar, and T_detDiscSpan. Legible properties include P_calendar, P_before, P_lb, P_ub, P_length, P_overlaps, P_during, P_starts, P_finishes, P_meets, P_union, P_intersection, P_origin, P_calGranularities, P_functions, P_add, P_subtract, and P_coefficient.]
Fig. 17. The Temporal Framework Inheritance Hierarchy for Time Series Management

We note from the figure that only the temporal structure and temporal representation design dimensions are used to represent the temporal needs of a time series. This demonstrates that it is not necessary for an application requiring temporal features to have all four design dimensions in order to be accommodated in the framework. One or more of the design dimensions specified in Section 2.1 can be used as long as the design criteria shown in Figure 12 hold.
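The combination of unanchored primitives (spans) with granularity conversions can be illustrated by a short Python sketch. It is an assumption-laden illustration, not the framework's definition of T_detDiscSpan: the class Span, its methods, and the conversion table are hypothetical, and only granularities with fixed lengths (hour, day, week) are used so that conversions are exact.

```python
from dataclasses import dataclass
from fractions import Fraction

# Assumed conversion table: length of one unit of each granularity, in days.
DAYS_PER_UNIT = {
    "hour": Fraction(1, 24),
    "day": Fraction(1),
    "week": Fraction(7),
}


@dataclass(frozen=True)
class Span:
    """Hypothetical unanchored duration: a coefficient and a granularity."""
    coefficient: Fraction
    granularity: str

    def to(self, granularity):
        """Express this span in another granularity."""
        days = self.coefficient * DAYS_PER_UNIT[self.granularity]
        return Span(days / DAYS_PER_UNIT[granularity], granularity)

    def add(self, other):
        """Add two spans; the result keeps this span's granularity."""
        return Span(self.coefficient + other.to(self.granularity).coefficient,
                    self.granularity)

    def subtract(self, other):
        """Subtract a span; the result keeps this span's granularity."""
        return Span(self.coefficient - other.to(self.granularity).coefficient,
                    self.granularity)


two_weeks = Span(Fraction(2), "week")
three_days = Span(Fraction(3), "day")
total_in_days = two_weeks.add(three_days).to("day")
```

Granularities such as month or year do not have a fixed length in days, so a fuller treatment would need calendar-specific conversion functions rather than a constant table.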
3.3 TOODM - A Temporal Object-Oriented Data Model
In this section, we identify the temporal features of Rose & Segev's temporal object-oriented data model (TOODM) [RS91] according to the design dimensions described in Section 2.1, and show how these can be accommodated in the temporal framework. We specifically concentrate on TOODM since it uses object types and inheritance to model temporality. The temporal features of the rest of the reported temporal object models [SC91,KS92,CITB92,PM92,BFG97] are summarized and compared in Section 4. We first give an overview of the temporal features of TOODM and then show how these features can be derived using the types and properties of our temporal framework. There is no doubt that TOODM has more functionality to offer in addition to temporality, but presenting that is beyond the scope of this work.

Overview of Temporal Features

TOODM was designed by extending an object-oriented entity-relationship data model to incorporate temporal structures and constraints. The functionality of TOODM includes: specification and enforcement of temporal constraints; support for past, present, and future time; support for different type and instance histories; and allowance for retro/proactive updates. The type hierarchy of the TOODM system-defined types used to model temporality is given in Figure 18. The boxes with a dashed border represent types that have been introduced to model time, while the rest of the boxes represent basic types.

Structure: Primitives | Structure: Domain | Structure: Determinacy | Representation     | Order        | History
Anchored, Unanchored  | Continuous        | Determinate            | Gregorian Calendar | Total Linear | Valid, Transaction, Event
Table 1. Temporal Design Dimension Features of TOODM
[Figure 18: the type hierarchy of the system-defined temporal types in TOODM, including TS[T].]
Fig. 18. System Defined Temporal Types in TOODM
The Object type is the root of the type tree. The type V-Class is used to represent user-defined versionable classes. More specifically, if the instance variables, messages/methods, or constraints of a type are allowed to change (maintain histories), the type must be defined as a subtype of V-Class. The Ptypes type models primitive types and is used to represent objects which do not have any instance variables. Ptypes usually serve as domains for the instance variables of other objects. The Time primitive type is used to represent temporal primitives. The TP type represents time points, while the TI type represents time intervals. Time points can have different calendar granularities, namely Year, Month, Day, Week, Hour, Minute, and Second. The TS[T] type represents a time sequence, which is a collection of objects ordered on time. TS[T] is a parametric type with the type T representing a user or system defined type upon which a time sequence is being defined. For every time-varying attribute in a (versionable) class, a corresponding subclass (of TS[T]) is defined to represent the time sequence (history) of that attribute. For example, if the salary history of an employee is to be maintained, a subclass (e.g., TS[Salary]) of TS[T] has to be defined so that the salary instance variable in the employee class (which is defined as a subclass of V-Class) can refer to it to obtain the salary history of a particular employee. The history of an object of type TS[T] is represented as a pair <T, TL>, where T is the data type and TL defines the different timelines and their granularities that are associated with T. Three timelines are allowed in TOODM: valid time, record (transaction) time, and event time (the time an event occurred). Each timeline associated with an object is comprised of time points or time intervals and has an underlying granularity.

Representing the Temporal Features of TOODM in the Temporal Framework

TOODM supports both anchored and unanchored primitives. These are modeled by the Absolute and Relative types shown in Figure 18. The anchored temporal primitives supported are time instants and time intervals. A continuous time domain is used to perceive the temporal primitives. Finally, the temporal primitives are determinate. Time points and time intervals are represented by using the Gregorian calendar with granularities Year, Month, Day, Week, Hour, Minute, and Second. Translations between granularities in operations are provided, with the default being to convert to the coarser granularity. A (presumably total) linear order of time is used to order the primitives in a temporal sequence. TOODM combines time with facts to model different temporal histories, namely, valid, transaction, and event time histories. Table 1 summarizes the temporal features (design space) of TOODM according to the design dimensions for temporal models that were described in Section 2.1.
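The idea of a parametric time sequence TS[T], whose history pairs values of type T with one or more timelines of given granularities, can be sketched with a generic Python class. This is a loose illustration and not TOODM's actual definition: the class TimeSequence, its methods, the timeline keys vt and rt, and the salary figures are all made up for the example, and times are written as ISO-style strings so that they sort chronologically.

```python
from dataclasses import dataclass, field
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass
class TimeSequence(Generic[T]):
    """Hypothetical analogue of TS[T]: values of type T placed on named timelines."""
    granularity: dict                             # timeline name -> granularity
    entries: list = field(default_factory=list)   # (value, {timeline: time}) pairs

    def record(self, value: T, **times):
        """Store a value together with its time on one or more timelines."""
        unknown = set(times) - set(self.granularity)
        if unknown:
            raise ValueError(f"unknown timelines: {sorted(unknown)}")
        self.entries.append((value, times))

    def along(self, timeline):
        """Return (time, value) pairs ordered on the given timeline."""
        return sorted((t[timeline], v) for v, t in self.entries if timeline in t)


# A salary history with a valid-time (vt) and a record-time (rt) timeline.
salary: TimeSequence[int] = TimeSequence({"vt": "Month", "rt": "Day"})
salary.record(3200, vt="1997-07", rt="1997-07-02")
salary.record(3000, vt="1997-01", rt="1996-12-20")

valid_time_view = salary.along("vt")
```

In this sketch all timelines live in one structure, as in TOODM; the temporal framework instead separates them into distinct history types.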
Figure 19 shows the type system instance of our temporal framework that corresponds to the TOODM time types shown in Figure 18 and described in Table 1. The Time primitive type is represented using the T_temporalStructure type. The TP and TI types are represented using the T_instant and T_interval types, respectively. Similarly, the Relative type is represented using the T_unanchPrim type. Since TOODM supports continuous and determinate temporal primitives, the (concrete) types T_detContInstant, T_detContInterval, and T_detContSpan are used to model continuous and determinate instants, intervals, and spans, respectively. The Gregorian calendar and its different calendric granularities are modeled using the T_calendar type. Time points and time intervals are ordered using the T_linearOrder type. Time sequences represented by the TS[T] type are modeled by the history types in the temporal framework. More specifically, valid time (vt), record time (rt), and event time (et) are modeled using the T_validHistory, T_transactionHistory, and T_eventHistory types. TOODM models valid, transaction, and event histories all together in one structure, as shown by the TS[Salary] type in the previous section. Our temporal framework, however, provides different types to model valid, transaction, and event histories to allow their respective semantics to be modeled. Moreover, it uses properties to access the various components of histories. For example, to represent the valid history of an employee's salary, an object of type T_validHistory is first created. The P_insert property then inserts objects of type T_integer (representing salary values) and objects of type T_interval (representing time intervals) into the salary valid history object. The transaction and event time histories of the salary are similarly represented, except that in these histories the P_insert property inserts timestamps which are time instants (i.e., objects of type T_instant).

[Figure 19: the supertype/subtype hierarchy of the framework as tailored for TOODM, rooted at T_temporalFramework. Legible types include T_temporalStructure, T_detContInstant, T_detContInterval, and T_history. Legible properties include P_elapsed, P_calendar, P_calElements, P_before, P_after, P_lb, P_ub, P_length, P_overlaps, P_during, P_starts, P_finishes, P_meets, P_union, P_intersection, P_origin, P_calGranularities, P_functions, P_add, P_subtract, P_coefficient, P_temporalPrimitives, P_history, P_temporalOrder, P_insert, P_remove, and P_getObjects.]
Fig. 19. The Temporal Framework Inheritance Hierarchy for TOODM

4 Comparison of Temporal Object Models
In this section we use the temporal framework to compare and analyze the temporal object models [RS91,SC91,KS92,CITB92,PM92,BFG97] that have appeared in recent literature. The temporal features of these models are summarized in Tables 1 and 2. Our criteria in comparing different temporal object models are based on the design dimensions identified in Section 2.1. It is true that the models may have other (salient) temporal differences, but our concern in this work is comparing their temporal features in terms of the framework defined in Section 2. Similar to the methodology used in Section 2, object-oriented techniques are used to classify temporal object models according to each design dimension. This gives us an indication of how temporal object models range in their provision for different temporal features of a design dimension, from the most powerful model (i.e., the one having the greatest number of temporal features) to the least powerful model (i.e., the one having the fewest temporal features).
Model     | Structure: Primitives | Structure: Domain | Structure: Determinacy | Representation     | Order   | History
OSAM*/T   | Anchored              | Discrete          | Determinate            | N/A                | Linear  | Valid
TMAD      | Anchored              | Discrete          | Determinate            | Gregorian Calendar | Linear  | Valid, Transaction
TEDM      | Anchored              | Discrete          | Determinate            | N/A                | Linear  | Valid, Transaction, Event
T-3DIS    | Anchored              | Discrete          | Determinate            | Gregorian Calendar | Partial | Valid
T-Chimera | Anchored              | Discrete          | Determinate            | N/A                | Linear  | Valid

Table 2. Design Dimension Features of different Temporal Object Models
T e m p o r a l S t r u c t u r e . It can be noticed from Tables 1 and 2 that most of the models support a very simple temporal structure, consisting of anchored primitives which are discrete and determinate. In fact, all models in Table 2 support the s a m e temporal structure, which consists of discrete and determinate anchored temporal primitives. These primitives can be accommodated in the temporal framework by the T_anchPrim, T_insZant, T_detDiscinstant, T_interval, and T _ d e t D i s c l n t e r v a l types, and their respective properties. The temporal structure of TOODM is slightly enhanced with the presence of unanchored primitives. TOODM is also the only model that supports the continuous temporal domain. Figure 20 shows how the type inheritance hierarchy is used to classify temporal object models according to their temporal structures. The temporal structures of OSAM*/T, TMAD, TEDM, T-3DIS, and T-Chimera can be modeled by a single type - that representing temporal primitives that are anchored, discrete, and determinate. This means that any of these models can be used to provide temporal support for applications that need a temporal structure comprised of anchored temporal primitives which are discrete and determinate. Similarly, the temporal structure of TOODM can be
Iqbal A. Goralwalla, M. Tamer Özsu, and Duane Szafron
modeled by a type which represents anchored and unanchored temporal primitives that are continuous and determinate. This implies that TOODM is the only model that can support applications requiring a continuous time domain or unanchored temporal primitives.
[Figure 20: type hierarchy with two types linked by a supertype/subtype arrow. Type "Anchored, Discrete, & Determinate Temporal Primitives": OSAM*/T, TMAD, TEDM, T-3DIS, T-Chimera. Type "Anchored & Unanchored, Continuous & Determinate Temporal Primitives": TOODM.]
Fig. 20. Classification of Temporal Object Models according to their Temporal Structures

Temporal Representation. Temporal primitives in the OSAM*/T [SC91], TEDM [CITB92], and T-Chimera [BFG97] models are simply represented using natural numbers. The models do not provide any additional representational scheme which supports calendars and different granularities. The granularity of the temporal primitives is dependent on the application using the model. When a calendric representational scheme is provided for the temporal primitives, it is comprised of a single underlying calendar, which is usually Gregorian. This is the case in the TOODM [RS91], TMAD [KS92], and T-3DIS [PM92] models.

Temporal Order. All models shown in Tables 1 and 2, except T-3DIS, support a linear temporal order. The T-3DIS model supports a sub-linear temporal order. These temporal orders are accommodated in the temporal framework using the T_subLinearOrder and T_linearOrder types. Figure 21 shows how the models can be classified in an inheritance type hierarchy according to their temporal orders. The type modeling a partial linear order of time sits at the root of the hierarchy and represents the T-3DIS model. Since a total linear order is also a partial order, the models supporting total linear orders can be represented by a direct subtype of the root type.

Temporal History. Tables 1 and 2 show how the temporal object models range in their support for the different types of temporal histories. Figure 22 shows how the models can be classified according to the temporal histories they support using a type inheritance hierarchy. The root type in Figure 22 represents the models which only support valid time histories. These are the
An Object-Oriented Framework for Temporal Data Models
[Figure 21: type hierarchy. Root type "Partial Linear Orders": T-3DIS. Subtype "Linear Orders": TOODM, OSAM*/T, TMAD, TEDM, T-Chimera.]
Fig. 21. Classification of Temporal Object Models according to their Temporal Orders
OSAM*/T, T-3DIS, and T-Chimera models. A direct subtype of the root type inherits the valid time history and provides transaction time history as well. This type represents the TMAD model. Similarly, the rest of the subtypes inherit different histories from their supertypes and add new histories to their type as shown in Figure 22. From Figure 22, we see that applications requiring only valid time histories can be supported by all models; applications requiring valid and transaction time can be supported by the TMAD, TEDM, and TOODM models; and applications requiring valid, transaction, and event time can be supported by the TEDM and TOODM models.
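The selection of models by required history can be sketched in a few lines. This is only an illustration, not part of the framework: the dictionary `HISTORIES` and the function `models_supporting` are hypothetical names, and the history sets are transcribed from Table 2 and the paragraph above.

```python
# Histories supported by each model, as read from Table 2 and the text.
# The names HISTORIES and models_supporting are illustrative assumptions.
HISTORIES = {
    "OSAM*/T": {"valid"},
    "T-3DIS": {"valid"},
    "T-Chimera": {"valid"},
    "TMAD": {"valid", "transaction"},
    "TEDM": {"valid", "transaction", "event"},
    "TOODM": {"valid", "transaction", "event"},
}


def models_supporting(required):
    """Return the models whose histories include every required history."""
    required = set(required)
    return sorted(m for m, h in HISTORIES.items() if required <= h)


if __name__ == "__main__":
    for need in ({"valid"}, {"valid", "transaction"},
                 {"valid", "transaction", "event"}):
        print(sorted(need), "->", models_supporting(need))
```

A model lower in the hierarchy of Figure 22 supports every application that a model above it supports, which is exactly the subset test used here.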
[Figure 22: type hierarchy. Root type "Valid Time History": OSAM*/T, T-3DIS, T-Chimera. Subtype "Valid & Transaction Time History": TMAD. Its subtype "Valid & Transaction & Event Time History": TOODM, TEDM.]
Fig. 22. Classification of Temporal Object Models according to their Temporal Histories

Overall Classification. Having classified the temporal object models according to the individual design dimensions, we now treat the models as points in the design space and use the object-oriented inheritance hierarchy to compare the models on all the temporal features of the design dimensions that they support. Figure 23 gives an inheritance hierarchy in which types are used to represent the different models, and the temporal features supported by the models are used as the criterion for inheritance. The abstract type at the root of the hierarchy represents the least powerful temporal object model, which supports a temporal structure comprised of anchored primitives which are discrete and determinate, no temporal representational scheme, a partial linear order, and a valid time history. This type has two immediate subtypes. The first subtype represents the OSAM*/T and the T-Chimera models. It inherits all the features of the root type and refines its partial linear order to a total linear order. Similarly, the second subtype
[Figure 23: inheritance hierarchy of temporal object models, ordered from fewer features (top) to more features (bottom). Nodes:
  Root (abstract type): Temporal Structure: Anchored, Discrete, & Determinate; Temporal Representation: None; Temporal Order: Partial Linear; Temporal History: Valid
  OSAM*/T, T-Chimera: Temporal Structure: Anchored, Discrete, & Determinate; Temporal Representation: None; Temporal Order: Total Linear; Temporal History: Valid
  T-3DIS: Temporal Structure: Anchored, Discrete, & Determinate; Temporal Representation: Gregorian; Temporal Order: Partial Linear; Temporal History: Valid
  TEDM: Temporal Structure: Anchored, Discrete, & Determinate; Temporal Representation: None; Temporal Order: Total Linear; Temporal History: Valid, Transaction, Event
  TMAD: Temporal Structure: Anchored, Discrete, & Determinate; Temporal Representation: Gregorian; Temporal Order: Total Linear; Temporal History: Valid, Transaction
  TOODM: Temporal Structure: Anchored, Unanchored, Continuous & Determinate; Temporal Representation: Gregorian; Temporal Order: Total Linear; Temporal History: Valid, Transaction, Event]
Fig. 23. Overall Classification of Temporal Object Models
represents the T-3DIS model, inherits all the features of the root type, and adds a representational scheme which supports the Gregorian calendar. The type representing OSAM*/T and T-Chimera also has two subtypes. The first subtype represents the TEDM model and has all the features of its supertype with the additional features of transaction and event time histories. The second subtype (which is also a subtype of the type representing T-3DIS, from which it inherits the representational scheme) represents the TMAD model. This type has the additional feature of the transaction time history. A direct subtype of the types representing TEDM and TMAD represents the TOODM model. The type representing TOODM inherits the representational scheme from the type representing TMAD and the event time history from the type representing TEDM. It also adds unanchored primitives and the continuous time domain to its temporal structure.

From Figure 23 it can reasonably be concluded that OSAM*/T and T-Chimera are the two least powerful temporal object models, since they provide the fewest temporal features. The TOODM model is the most powerful, since it provides the most temporal features. The comparison of different temporal object models made in this section shows that there is significant similarity in the temporal features supported by the models. In fact, the temporal features supported by OSAM*/T and T-Chimera are identical. The temporal features of TEDM are identical to those of OSAM*/T and T-Chimera in the temporal structure, temporal representation, and temporal order design dimensions. These commonalities substantiate the need for a temporal framework which combines the diverse features of time under a single infrastructure that allows design reuse. We also note that temporal object models have not really taken advantage of the richness of their underlying object model in supporting alternate features of a design dimension.
They have assumed a set of fixed particular underlying notions of time. From a range of different temporal features, a single temporal feature is supported in most of the design dimensions. As such, not much advantage has been gained over the temporal relational models in supporting applications that have different temporal needs. For example, engineering applications like CAD would benefit from a branching time model, while time series and financial applications require multiple calendars and granularities. The temporal framework proposed in this work aims to exploit object-oriented technology in supporting a wide range of applications with diverse temporal needs.
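The overall classification can be reproduced mechanically if each type is treated as a set of features and a type is taken to be a subtype of another whenever its feature set strictly contains the other's. The sketch below is an illustration under that simplifying assumption; in particular, refining the partial linear order to a total linear order is modelled as adding a "total-order" feature, and all identifiers (`TYPES`, `direct_supertypes`, the feature labels) are hypothetical.

```python
# Feature sets for the types described in the text around Figure 23.
# Assumption: inheritance is modelled as strict feature-set inclusion, and
# the refinement "partial -> total linear order" as an added feature.
ROOT = frozenset({"anchored", "discrete", "determinate",
                  "partial-order", "valid"})
OSAM_CHIMERA = ROOT | {"total-order"}
T3DIS = ROOT | {"gregorian"}
TEDM = OSAM_CHIMERA | {"transaction", "event"}
TMAD = OSAM_CHIMERA | T3DIS | {"transaction"}
TOODM = TEDM | TMAD | {"unanchored", "continuous"}

TYPES = {
    "root": ROOT,
    "OSAM*/T, T-Chimera": OSAM_CHIMERA,
    "T-3DIS": T3DIS,
    "TEDM": TEDM,
    "TMAD": TMAD,
    "TOODM": TOODM,
}


def direct_supertypes(name):
    """Return the immediate supertypes of a type in the feature lattice."""
    feats = TYPES[name]
    # Every type with strictly fewer features is an ancestor.
    ancestors = [n for n, f in TYPES.items() if f < feats]
    # Keep only ancestors that are not themselves below another ancestor.
    return sorted(a for a in ancestors
                  if not any(TYPES[a] < TYPES[b] for b in ancestors))


if __name__ == "__main__":
    for name in TYPES:
        print(name, "<-", direct_supertypes(name))
```

Under these assumptions the computed edges coincide with the hierarchy described in the text: TOODM sits directly below TEDM and TMAD, and TMAD directly below both the OSAM*/T and T-Chimera type and the T-3DIS type.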
5 Discussion and Conclusions
In this work the different design dimensions that span the design space of temporal object models are identified. Object-oriented techniques are used to design an infrastructure which supports the diverse notions of time under a single framework. We demonstrate the expressiveness of the framework by showing how it can be used to accommodate the temporal needs of different real-world applications, and also to reflect different temporal object models that have been reported in the literature. A similar objective is pursued by Wuu & Dayal [WD92], who provide an abstract time type to model the most general semantics of time, which can then be subtyped (by the user or database designer) to model the various notions of time required by specific applications. The temporal framework presented here subsumes the work of Wuu & Dayal in that it provides the user or database designer with explicit types and properties to model the diverse features of time. Their approach requires significant support from the user, including specification of the temporal schema, which is a complex and non-trivial task. It is therefore imperative for temporal object models to have a temporal infrastructure from which users can choose the temporal features they need. Using the object-oriented type system to structure the design space of temporal object models and identify the dependencies within and among the design dimensions helps us simplify the presentation of the otherwise complex domain of time. The framework is extensible in that additional temporal features can be added as long as the relationships between the design dimensions are maintained. The focus in this work is on the unified provision of temporal features which can be used by temporal object models according to their temporal needs. Once these are in place, the model can then define other object-oriented features to support its application domain. The temporal framework also provides a means of comparing temporal object models according to the design dimensions identified in Section 2.1. This helps identify the strengths and weaknesses of the different models. The diverse features of time are also identified in [Sno95]. The focus there, however, is on comparing various temporal object models and query languages based on their ability to support valid and transaction time histories.
In this work we show how the generic aspects of temporal models can be captured and described using a single framework. In [PLL96] a temporal reference framework for multimedia synchronization is proposed and used to compare existing temporal specification schemes and their relationships to multimedia synchronization. The focus there, however, is on different forms of temporal specification, and not on different notions of time. The model of time used concentrates only on temporal primitives and their representation schemes. The temporal framework has been implemented in C++. A toolkit has been developed which allows users and temporal model designers to interact with the framework at a high level and generate specific framework instances for their own applications. The next step is to build query semantics on top of the framework. This will involve addressing issues such as how the choices of different design dimensions affect the query semantics, what kind of query constructs are needed, what properties should be provided, and how these properties are used.
References

[All84] J. F. Allen. Towards a General Theory of Action and Time. Artificial Intelligence, 23(2):123-154, July 1984.
[ATGL96] A-R. Adl-Tabatabai, T. Gross, and G-Y. Lueh. Code Reuse in an Optimizing Compiler. In Proc. of the Int'l Conf. on Object-Oriented Programming: Systems, Languages, and Applications - OOPSLA '96, pages 51-68, October 1996.
[BFG97] E. Bertino, E. Ferrari, and G. Guerrini. T_Chimera - A Temporal Object-Oriented Data Model. Theory and Practice of Object Systems, 3(2):103-125, 1997.
[BKP86] H. Barringer, R. Kuiper, and A. Pnueli. A Really Abstract Concurrent Model and its Temporal Logic. In Proc. of the 13th ACM Symposium on Principles of Programming Languages, pages 173-183, 1986.
[BP85] F. Barbic and B. Pernici. Time Modeling in Office Information Systems. In Proc. ACM SIGMOD Int'l. Conf. on Management of Data, pages 51-62, May 1985.
[CG93] T.S. Cheng and S.K. Gadia. An Object-Oriented Model for Temporal Databases. In Proceedings of the International Workshop on an Infrastructure for Temporal Databases, pages N1-N19, June 1993.
[Cho94] J. Chomicki. Temporal Query Languages: A Survey. In D. Gabbay and H. Ohlbach, editors, Proceedings of the International Conference on Temporal Logic, pages 506-534. Lecture Notes in Computer Science, Vol. 827, Springer-Verlag, July 1994.
[CITB92] W.W. Chu, I.T. Ieong, R.K. Taira, and C.M. Breant. A Temporal Evolutionary Object-Oriented Data Model and Its Query Language for Medical Image Management. In Proc. 18th Int'l Conf. on Very Large Data Bases, pages 53-64, August 1992.
[CJR87] R.H. Campbell, G.M. Johnston, and V.F. Russo. Choices (Class Hierarchical Open Interface for Custom Embedded Systems). Operating Systems Review, 21(3):9-17, 1987.
[CK94] S. Chakravarthy and S-K. Kim. Resolution of Time Concepts in Temporal Databases. Information Sciences, 80(1-2):91-125, September 1994.
[CMR91] E. Corsetti, A. Montanari, and E. Ratto. Dealing with Different Time Granularities in Formal Specifications of Real-Time Systems. The Journal of Real-Time Systems, 3(2):191-215, 1991.
[CPP95] C. Combi, F. Pinciroli, and G. Pozzi. Managing Different Time Granularities of Clinical Information by an Interval-Based Temporal Data Model. Methods of Information in Medicine, 34(5):458-474, 1995.
[CPP96] C. Combi, F. Pinciroli, and G. Pozzi. Managing Time Granularity of Narrative Clinical Information: The Temporal Data Model TIME-NESIS. In L. Chittaro, S. Goodwin, H. Hamilton, and A. Montanari, editors, Third International Workshop on Temporal Representation and Reasoning (TIME'96), pages 88-93. IEEE Computer Society Press, 1996.
[CR88] J. Clifford and A. Rao. A Simple, General Structure for Temporal Domains. In C. Rolland, F. Bodart, and M. Leonard, editors, Temporal Aspects in Information Systems, pages 17-30. North-Holland, 1988.
[CS93] R. Chandra and A. Segev. Managing Temporal Financial Data in an Extensible Database. In Proc. 19th Int'l Conf. on Very Large Data Bases, pages 302-313, August 1993.
[CSS94] R. Chandra, A. Segev, and M. Stonebraker. Implementing Calendars and Temporal Rules in Next-Generation Databases. In Proc. 10th Int'l. Conf. on Data Engineering, pages 264-273, February 1994.
[DDS94] W. Dreyer, A.K. Dittrich, and D. Schmidt. An Object-Oriented Data Model for a Time Series Management System. In Proc. 7th International Working Conference on Scientific and Statistical Database Management, pages 186-195, September 1994.
[DS93] C.E. Dyreson and R.T. Snodgrass. Valid-time Indeterminacy. In Proc. 9th Int'l. Conf. on Data Engineering, pages 335-343, April 1993.
[EGS93] O. Etzion, A. Gal, and A. Segev. Temporal Active Databases. In Proceedings of the International Workshop on an Infrastructure for Temporal Databases, June 1993.
[Flo91] R. Flowerdew. Geographical Information Systems. John Wiley and Sons, 1991. Volume 1.
[GLOS97] I.A. Goralwalla, Yuri Leontiev, M.T. Özsu, and Duane Szafron. Modeling Temporal Primitives: Back to Basics. In Proc. Sixth Int'l. Conf. on Information and Knowledge Management, pages 24-31, November 1997.
[GÖS97] I.A. Goralwalla, M.T. Özsu, and D. Szafron. Modeling Medical Trials in Pharmacoeconomics using a Temporal Object Model. Computers in Biology and Medicine - Special Issue on Time-Oriented Systems in Medicine, 27(5):369-387, 1997.
[HKOS96] W.H. Harrison, H. Kilov, H.L. Ossher, and I. Simmonds. From Dynamic Supertypes to Subjects: a Natural way to Specify and Develop Systems. IBM Systems Journal, 35(2):244-256, 1996.
[JF88] R.E. Johnson and B. Foote. Designing Reusable Classes. Journal of Object-Oriented Programming, 1(2):22-35, 1988.
[KGBW90] W. Kim, J.F. Garza, N. Ballou, and D. Wolek. Architecture of the ORION Next-Generation Database System. IEEE Transactions on Knowledge and Data Engineering, 2(1):109-124, March 1990.
[KKR90] P.C. Kanellakis, G.M. Kuper, and P.Z. Revesz. Constraint Query Languages. In Proc. of the 9th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 299-313, April 1990.
[Kli93] N. Kline. An Update of the Temporal Database Bibliography. ACM SIGMOD Record, 22(4):66-80, December 1993.
[KS92] W. Käfer and H. Schöning. Realizing a Temporal Complex-Object Data Model. In Proc. ACM SIGMOD Int'l. Conf. on Management of Data, pages 266-275, June 1992.
[LEW96] J.Y. Lee, R. Elmasri, and J. Won. Specification of Calendars and Time Series for Temporal Databases. In Proc. 15th International Conference on Conceptual Modeling (ER'96), pages 341-356, October 1996. Proceedings published as Lecture Notes in Computer Science, Volume 1157, Bernhard Thalheim (editor), Springer-Verlag, 1996.
[MPB92] R. Maiocchi, B. Pernici, and F. Barbic. Automatic Deduction of Temporal Information. ACM Transactions on Database Systems, 17(4):647-688, 1992.
[PLL96] M.J. Perez-Luque and T.D.C. Little. A Temporal Reference Framework for Multimedia Synchronization. IEEE Journal on Selected Areas in Communications, 14(1):36-51, January 1996.
[PM92] N. Pissinou and K. Makki. A Framework for Temporal Object Databases. In Proc. First Int'l. Conf. on Information and Knowledge Management, pages 86-97, November 1992.
[Rev90] P.Z. Revesz. A Closed Form for Datalog Queries with Integer Order. In International Conference on Database Theory, pages 187-201, 1990.
[RS91] E. Rose and A. Segev. TOODM - A Temporal Object-Oriented Data Model with Temporal Constraints. In Proc. 10th Int'l Conf. on the Entity Relationship Approach, pages 205-229, October 1991.
[SA85] R. Snodgrass and I. Ahn. A Taxonomy of Time in Databases. In Proc. ACM SIGMOD Int'l. Conf. on Management of Data, pages 236-246, May 1985.
[SC91] S.Y.W. Su and H.M. Chen. A Temporal Knowledge Representation Model OSAM*/T and its Query Language OQL/T. In Proc. 17th Int'l Conf. on Very Large Data Bases, pages 431-442, 1991.
[Sci94] E. Sciore. Versioning and Configuration Management in an Object-Oriented Data Model. The VLDB Journal, 3:77-106, 1994.
[Sno86] R. Snodgrass. Research Concerning Time in Databases: Project Summaries. ACM SIGMOD Record, 15(4), December 1986.
[Sno87] R.T. Snodgrass. The Temporal Query Language TQuel. ACM Transactions on Database Systems, 12(2):247-298, June 1987.
[Sno92] R.T. Snodgrass. Temporal Databases. In Theories and Methods of Spatio-Temporal Reasoning in Geographic Space, pages 22-64. Springer-Verlag, LNCS 639, 1992.
[Sno95] R. Snodgrass. Temporal Object-Oriented Databases: A Critical Comparison. In W. Kim, editor, Modern Database Systems: The Object Model, Interoperability and Beyond, pages 386-408. Addison-Wesley/ACM Press, 1995.
[Soo91] M.D. Soo. Bibliography on Temporal Databases. ACM SIGMOD Record, 20(1):14-23, 1991.
[SRH90] M. Stonebraker, L.A. Rowe, and M. Hirohama. The Implementation of POSTGRES. IEEE Transactions on Knowledge and Data Engineering, 2(1):125-142, March 1990.
[SS88] R. Stam and R. Snodgrass. A Bibliography on Temporal Databases. IEEE Database Engineering, 7(4):231-239, December 1988.
[TK96] V.J. Tsotras and A. Kumar. Temporal Database Bibliography Update. ACM SIGMOD Record, 25(1):41-51, March 1996.
[WD92] G. Wuu and U. Dayal. A Uniform Model for Temporal Object-Oriented Databases. In Proc. 8th Int'l. Conf. on Data Engineering, pages 584-593, Tempe, USA, February 1992.
[WLH90] K. Wilkinson, P. Lyngbaek, and W. Hasan. The Iris Architecture and Implementation. IEEE Transactions on Knowledge and Data Engineering, 2(1):63-75, March 1990.
An Architecture for Supporting Interoperability among Temporal Databases*
Claudio Bettini 1, X. Sean Wang 2, and Sushil Jajodia 2
1 Dipartimento di Scienze dell'Informazione, University of Milano, Italy.
2 Department of Information and Software Systems Engineering, George Mason University, Fairfax, VA.
Abstract. A significant property of temporal data is their richness of semantics. Although several temporal data models and query languages have been designed specifically to handle the temporal data, users must still deal with much of the implicit temporal information, which can be automatically derived from the stored data in certain situations. We propose a multidatabase architecture where an appropriate formalization of the intended semantics is associated with each temporal relation and temporal database. This allows a temporal mediator to access the databases to retrieve implicit information in terms of time granularities different from those used to store data. We also describe how the temporal mediator can provide a user interface to the multidatabase system allowing temporal queries in terms of arbitrary granularities and involving relations in different TDBMS.
1 Introduction
A significant property of temporal data is their richness of semantics. Although several temporal data models and query languages have been designed specifically to handle the temporal data [Tan93], users must still deal with much of the implicit temporal information, which can be automatically derived from the stored data in certain situations. As a running example, consider the temporal relation SALES, which is used by a computer product company to keep track of items sold and the income realized by each of its branches. Specifically, the relation SALES records for each branch (Branch) and product identifier (Product), the number of items sold (ISold) and the total income (Income) during each day (Day). We assume that the database is updated at the end of each business-day by inserting a tuple if at least one item of a particular product has been sold by that branch. The values of Day are timestamps consisting of the date (month/day/year). An instance of SALES is shown in Figure 1.

* The work was partially supported by the National Science Foundation (NSF) under the grant IRI-9633541. The work of Wang was also partially supported by the NSF grant IRI-9409769.
O. Etzion, S. Jajodia, and S. Sripada (Eds.): Temporal Databases - Research and Practice, LNCS 1399, pp. 36-55, 1998. © Springer-Verlag Berlin Heidelberg 1998
Branch       Product  ISold  Income   Day
Austin       k123     100    16,500   3/13/97
Austin       k123     50     8,250    3/14/97
Chicago      p100     12     9,000    3/14/97
Chicago      p100     100    80,000   3/17/97
Los Angeles  m87      400    180,000  4/14/97
Austin       k123     300    48,000   4/14/97
Fig. 1. An instance of the relation SALES
From the example temporal relation in Figure 1, it is clear that the total number of units of item k123 sold in a month by a particular branch can be obtained by aggregating the sales figures for item k123 over all the days that lie within that month. Temporal database systems, however, usually leave this task to the users of the databases; users must perform aggregation in their queries if they wish to obtain the monthly data. It is obviously desirable that temporal database systems be able to apply the aggregation operation automatically and present the user with the monthly sales data, without any explicit manipulation by the users. Our research on semantic assumptions [BWJa] is aimed toward building such a database system. The basic idea is to provide the system with the knowledge about the relationship between months and days, as well as with the fact that sum should be used to derive monthly sales data from the daily data. Semantic assumptions are also useful in a multidatabase system consisting of autonomous databases. In a multidatabase system, a global query is one whose answer depends on information from more than one database. To illustrate, consider, in addition to the database having SALES information, the existence of market research information systems offering access to street prices of products. Suppose that this information is stored in the temporal relation PRICE that records for each product (Product) its average street price (SPrice) during a quarter (Quarter). (Note that the granularity used to store this data could be different for different information providers. It is possible that some offer more accurate average street prices, in terms of months, weeks, or even days.) A typical instance of PRICE is shown in Figure 2. A global query that asks "which branch sold items at a price lower than the street prices" needs information from the database containing SALES as well as from the one containing the street prices.
In such a multidatabase environment, we probably cannot assume that each participating database system can be extended so that each is aware of the relationships among different time granularities. Indeed, the database system may not have an extensible temporal granularity system. Furthermore, some temporal granularities may be of interest only to some global users who draw information from different databases. In order to preserve autonomy, it may be desirable that the participating databases would not be required to change their system for the convenience of some global users with specialized needs.
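To make the day-to-month derivation concrete, the following sketch performs by hand the aggregation that the text would like the system to apply automatically. It is only an illustration: the tuple layout, the name `monthly_totals`, and the use of 'sum' for both ISold and Income are assumptions, and the rows are those of Figure 1.

```python
from collections import defaultdict
from datetime import datetime

# Rows of the SALES instance in Figure 1:
# (Branch, Product, ISold, Income, Day as month/day/year).
SALES = [
    ("Austin", "k123", 100, 16500, "3/13/97"),
    ("Austin", "k123", 50, 8250, "3/14/97"),
    ("Chicago", "p100", 12, 9000, "3/14/97"),
    ("Chicago", "p100", 100, 80000, "3/17/97"),
    ("Los Angeles", "m87", 400, 180000, "4/14/97"),
    ("Austin", "k123", 300, 48000, "4/14/97"),
]


def monthly_totals(rows):
    """Sum ISold and Income over all days falling in the same month."""
    totals = defaultdict(lambda: [0, 0])
    for branch, product, sold, income, day in rows:
        d = datetime.strptime(day, "%m/%d/%y")
        key = (branch, product, d.year, d.month)  # one tick of 'month'
        totals[key][0] += sold
        totals[key][1] += income
    return {k: tuple(v) for k, v in totals.items()}


if __name__ == "__main__":
    for key, (sold, income) in sorted(monthly_totals(SALES).items()):
        print(key, sold, income)
```

The grouping key plays the role of the month tick that covers each day; a semantic assumption would let the system choose both this key and the 'sum' method without the user writing them.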
Product  SPrice  Quarter
k123     175     1st-q-97
p100     700     1st-q-97
m87      400     1st-q-97
133      300     2nd-q-97
k123     160     2nd-q-97
p100     650     2nd-q-97
k123     200     3rd-q-97
Fig. 2. An instance of the relation PRICE
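One possible reading of the global query "which branch sold items at a price lower than the street prices" is sketched below over the two example instances. Several choices are assumptions made for illustration and are not fixed by the text: the unit price of a sale is taken to be Income divided by ISold, quarters are taken to be calendar quarters, sales with no matching street price are ignored, and all function names are hypothetical. The product identifier "133" is reproduced as it appears in the PRICE instance.

```python
from datetime import datetime

# SALES rows from Figure 1: (Branch, Product, ISold, Income, Day).
SALES = [
    ("Austin", "k123", 100, 16500, "3/13/97"),
    ("Austin", "k123", 50, 8250, "3/14/97"),
    ("Chicago", "p100", 12, 9000, "3/14/97"),
    ("Chicago", "p100", 100, 80000, "3/17/97"),
    ("Los Angeles", "m87", 400, 180000, "4/14/97"),
    ("Austin", "k123", 300, 48000, "4/14/97"),
]

# PRICE rows from Figure 2, keyed by (Product, Quarter).
PRICE = {
    ("k123", "1st-q-97"): 175, ("p100", "1st-q-97"): 700,
    ("m87", "1st-q-97"): 400, ("133", "2nd-q-97"): 300,
    ("k123", "2nd-q-97"): 160, ("p100", "2nd-q-97"): 650,
    ("k123", "3rd-q-97"): 200,
}

ORDINALS = ["1st", "2nd", "3rd", "4th"]


def quarter_of(day):
    """Map a month/day/year timestamp to its quarter label (assumed calendar quarters)."""
    d = datetime.strptime(day, "%m/%d/%y")
    return f"{ORDINALS[(d.month - 1) // 3]}-q-{d.year % 100:02d}"


def branches_below_street_price(sales, price):
    """Branches with at least one sale whose unit price is below the street price."""
    result = set()
    for branch, product, sold, income, day in sales:
        street = price.get((product, quarter_of(day)))
        if street is not None and income / sold < street:
            result.add(branch)
    return sorted(result)


if __name__ == "__main__":
    print(branches_below_street_price(SALES, PRICE))
```

The conversion from days to quarters in `quarter_of` is precisely the kind of granularity knowledge that neither local database needs to hold if a mediator supplies it.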
In order to facilitate the evaluation of global queries, we adopt the idea of a mediator [Wie92], which is defined as "a software module that exploits encoded knowledge about certain sets or subsets of data to create information for higher layer of applications". We propose a temporal mediator that can answer queries posed by users (or application programs) on the implicit data in granularities that are perhaps not understood by some participating databases. For example, the corporate headquarters of a company may want to analyze the sales data against the street prices in terms of the fiscal calendar of that company. The market research firm database may not have the knowledge of the fiscal calendar of that company. When some data, stored in one of the databases, must be presented in terms of a time granularity not understood by that database, the automatic reasoning that derives the implicit data in that granularity is done by the proposed temporal mediator. The temporal mediator requires two kinds of semantic information for data at a local database: (1) a semantic assumption, specifying the method (e.g., sum) and how it should be applied to derive implicit information, and (2) a specification of the intended temporal granularities. In the SALES example, we may specify that the method to use is 'sum' on the set of values with timestamps included in the same tick of the target granularities. The target granularities are those that are coarser than days (intuitively, these are the granularities that are partitioned by the granularity day); see [BWJa] for a precise definition.

Based on these observations, we propose an architecture of a multidatabase system that can derive implicit data in terms of multiple granularities. The central component is the temporal mediator itself, which answers queries by using multiple databases. The architecture also includes a "common" knowledge to which each participating database is assumed to have access. The common knowledge includes typical system-wide time granularities (like seconds, days, etc.), as well as "conversion methods" to derive implicit information. In order to use the temporal mediator, the multidatabase system holds, for each constituent database, its semantic assumptions, expressed using the names of conversion methods (part of the common knowledge), as well as the specification of target granularities. The mediator uses this information to process user queries.

Semantic assumptions used in our multidatabase system architecture are related to the idea of "semantic values" [SSR94]. A semantic value is a value
that has an associated environment. For example, a value 10 may be associated with "US dollar", specifying that this value is in terms of the US currency. [SSR94] discusses issues involved in using semantic values in a multidatabase system. Our semantic assumptions differ from semantic values in two respects. Firstly, our semantic assumptions are temporal concepts, while semantic values are more general. Secondly, our semantic assumptions deal with the derivation of implicit information from a set of data (e.g., monthly information from a set of daily data), while semantic values are single-valued. Various semantic assumptions in the temporal database setting were perhaps first recognized by Clifford and Warren [CW83]. The earliest systematic study, however, was performed by Segev and Shoshani [SS87]. They recognized various properties of time sequences, such as stepwise constant, continuous, discrete, and user-defined, and provided a number of functions to be used in user query languages to accommodate these properties. However, while these earlier works essentially provided a set of functions to implement specific assumptions, we formalize a general notion of semantic assumption that turns out to be a crucial element for the interoperability of temporal databases. Clifford and Isakowitz [CI94] formalized the semantics of variables that many temporal data models employ to denote various semantic assumptions. The work clarified many vague, although intuitive, notions. However, that paper did not address how user queries could be answered on databases with such variables. In [CDIJS97] a framework is proposed to specify the semantics of the particular variable now used to denote the current time in temporal databases, and several aspects are discussed regarding how queries on databases with such variables should be handled. However, neither paper [CI94, CDIJS97] addresses issues related to time granularities. The rest of this paper is organized as follows.
In Section 2, we introduce the notions of temporal types and semantic assumptions. In Section 3, the general architecture is presented, analyzing the functionalities of each of its components. In Section 4, we consider a specific case study and describe preliminary results on the implementation of some of the architecture components.
2 Multiple granularities and semantic assumptions
In this section, we briefly present two notions that we have introduced elsewhere [BWJb, BWJa]; these provide the basis for the architecture investigated in this paper. The notion of temporal type is a formalization of what is intuitively called a granularity. The notion of semantic assumptions is a formalization of temporal semantic properties of attributes that are often implicitly used in the design of temporal schemas. This notion has been defined for the relational data model; hence in this paper we consider only relational databases.
2.1 Temporal types
The definition of temporal types identifies an instance of the temporal type systems introduced in [BWJb]. We assume there is an underlying notion of absolute time, represented by the set $\mathcal{N}$ of all positive integers.

Definition (Temporal type). A temporal type is a mapping $\mu$ from the set of the positive integers (the time ticks) to $2^{\mathcal{N}}$ (the set of absolute time sets) such that for all positive integers $i$ and $j$ with $i < j$, the following two conditions are satisfied:
1. $\mu(i) \neq \emptyset$ and $\mu(j) \neq \emptyset$ imply that each number in $\mu(i)$ is less than all numbers in $\mu(j)$, and
2. $\mu(i) = \emptyset$ implies $\mu(j) = \emptyset$.

Hence, a time tick for a given temporal type $\mu$ identifies a set of instants on the time line. Property (1) states that the mapping must be monotonic. Property (2) disallows an empty set to be the value of a mapping for a certain time tick unless the empty set will be the value of the mapping for all subsequent time ticks. Typical granularities such as day, month, week, year, and business-day can be defined as temporal types. As an example, suppose the underlying time is measured in terms of seconds. Then, the granularity day (assuming it starts on the first day of 1990) is a mapping such that day(1) is the set of all the seconds that comprise the first day of 1990, day(2) maps to all the seconds of the second day of 1990, and so on. When the set of instants corresponding to a tick $\mu(i)$ is equal to or contained in the set corresponding to a tick $\nu(j)$ (i.e., $\mu(i) \subseteq \nu(j)$), we say that $\nu(j)$ covers $\mu(i)$. There is a natural "finer-than" relation among temporal types. The temporal type $\mu$ is said to be finer than the temporal type $\nu$ if each tick of $\mu$ is covered by a tick of $\nu$. Thus, for example, day is finer than week and month is finer than year. It is easily seen that the finer-than relation is a partial order [BWJb]. Finer-than is not a total order since, for example, week and month are incomparable (i.e., week is not finer than month, and month is not finer than
week). More formal properties of temporal types are investigated in BWJb. For this formal model, a specific finite representation of the temporal types must be defined (see e.g., NS92,CSS94). Using this representation, it should also be possible to implement functions to test the aforementioned finer-than relation as well as other common operations on granularities such as containment and intersection among ticks of different granularities, for example.
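As an illustration only, the definition and the finer-than test can be sketched in Python on a finite fragment of the time line. The dictionary representation, the function names, and the irregular `toy_month` type are assumptions made for this sketch; they are not the finite representations of [NS92, CSS94].

```python
from typing import Dict, FrozenSet

# Finite fragment of a temporal type: tick index -> set of absolute instants.
TemporalType = Dict[int, FrozenSet[int]]


def is_temporal_type(mu: TemporalType, max_tick: int) -> bool:
    """Check conditions (1) and (2) of the definition on ticks 1..max_tick."""
    seen_empty = False
    last_max = 0  # instants are positive integers, so 0 is below all of them
    for i in range(1, max_tick + 1):
        instants = mu.get(i, frozenset())
        if not instants:
            seen_empty = True
            continue
        if seen_empty:
            return False  # condition (2): non-empty tick after an empty one
        if min(instants) <= last_max:
            return False  # condition (1): ticks must be ordered on the time line
        last_max = max(instants)
    return True


def finer_than(mu: TemporalType, nu: TemporalType) -> bool:
    """True if every tick of mu is covered by (is a subset of) some tick of nu."""
    return all(any(t <= s for s in nu.values()) for t in mu.values())


def blocks(bounds):
    """Build ticks from consecutive (first, last) instant ranges."""
    return {i: frozenset(range(a, b + 1)) for i, (a, b) in enumerate(bounds, 1)}


# Toy calendar over 28 instants, where one instant stands for one day.
day = blocks([(d, d) for d in range(1, 29)])
week = blocks([(1, 7), (8, 14), (15, 21), (22, 28)])
toy_month = blocks([(1, 10), (11, 28)])  # hypothetical, irregular "months"
```

On this toy calendar, `finer_than(day, week)` holds, while `week` and `toy_month` are incomparable because the second week straddles the boundary between the two months.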
2.2 Point and interval-based semantic assumptions
Semantic assumptions provide a formal specification of how unknown, yet implicit, values can be deduced from data explicitly present in the database. In particular, we are interested in information unknown for a particular tick of the
An Architecture for Supporting Interoperability among Temporal Databases
time granularity used by the database, and in information unknown for ticks in different granularities. Accordingly, we divide the semantic assumptions used to derive these two types of information into point-based and interval-based semantic assumptions.

Point-based assumptions are those semantic assumptions that can be used to derive information at certain ticks of time, based on the information explicitly given at different ticks of the same temporal type. Such derivation can be done in a variety of ways. For example, (i) we may assume that the values of certain attributes persist in time unless they are explicitly changed; (ii) we may assume that a missing value is taken as the average of the last and next explicitly given values; or (iii) we may take the sum of the last three values. We adopt a general notion of point-based assumptions such that, in principle, any interpolation function to derive information from explicit values can be used [BWJa].

The persistence assumption has been widely used in practice. With $P_X(Y^{persis})$ we denote the assumption of the attributes $XY$ being persistent with respect to the attributes $X$. This means that if we have explicit values for $X$ and $Y$ at a certain tick of time, these values will persist in time until we find a tick at which we have explicit values that are the same for the attributes $X$ but different for $Y$. Note that the information derived by a persistence assumption always includes the original information (projected on $XY$) and the information implied by the assumption. Consider the PRICE temporal relation. The designer of this relation could decide that when the street price for a month is not available, the previously stored street price is provided. This is formally specified by the assumption

(P1)  $P_{Product}(SPrice^{persis})$

which says that the values of these two attributes persist in time until a different value for SPrice with respect to the same value of Product is found. For example, there is no value in the PRICE table for the street price of product 'm87' for the second quarter of 1997. However, if the designer specified the above semantic assumption, the same value as for the first quarter is implicitly associated with the product for the second quarter.

In the temporal database literature a notion similar to persistence is found when the value of a tuple is given for an interval [1, uc], where uc is a shorthand for "until changed" [WJL91, CI94], or the common notation $[k, \infty]$ is used. However, the notion of until changed is not well formulated. For example, it is not clear which attributes must actually change to be qualified as "changed". In the persistence semantic assumption shown above, it is not sufficient to say that SPrice is persistent; the Product attribute, based on which SPrice is persistent, has to be specified in order to have a clear semantics. The formal specification of persistence is also needed.

If we generalize the persistence example, a point-based semantic assumption relies on the use of certain methods (called interpolation methods) to derive implicit values from explicit ones. An interpolation method is used to derive a value of an attribute by "interpolating" values of the same attribute at different ticks of time. Examples of other interpolation methods are average (taking
Claudio Bettini, X. Sean Wang, and Sushil Jajodia
the average of the previous and next stored values), or last-k-avg (taking the average of the last k stored values). If $X, Y_1, \ldots, Y_n$ are pair-wise disjoint attribute sets, and $meth_1, \ldots, meth_n$ are interpolation methods, the expression $P_X(Y_1^{meth_1} \ldots Y_n^{meth_n})$ denotes the assumption using method $meth_i$ to derive implicit values of attributes $Y_i$ with respect to attributes $X$, for each $i = 1, \ldots, n$.

We call interval-based those assumptions that can be used to derive information for a certain tick of one temporal type from information at ticks of a different temporal type. The word interval indicates the fact that these "source" ticks must be intervals in the absolute time having a certain relationship (containment or intersection) with respect to the interval in the absolute time corresponding to the "target" tick for which the value is being derived. Referring to our PRICE relation, the designer could have associated with this relation an assumption that roughly says that, if a given product price is stored in the relation for a given quarter, the same price can be considered a good estimate for any month (or day, hour, etc.) of that quarter. In general, with $I_X(A^{\downarrow})$ we denote the assumption of the attribute $A$ being downward hereditary with respect to the attributes in $X$. This means that if we have an explicit value for the attribute $A$ with respect to certain values for $X$ at a certain tick of type $\mu$, then for each tick of any other type that is covered by it, $A$ has that same value with respect to the same values for $X$. Hence the above assumption on PRICE would be specified by

(I1)  $I_{Product}(SPrice^{\downarrow})$.
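The combined effect of (P1) and (I1) can be sketched as follows. This is a minimal Python illustration, not the formal semantics of [BWJa]: the price values, the numbering of quarters and months from 1, and the helper names are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical PRICE rows: (product, quarter tick) -> street price.
PRICE: Dict[Tuple[str, int], int] = {
    ("m87", 1): 25,
    ("m87", 3): 30,
    ("k123", 1): 100,
}


def persist(rel: Dict[Tuple[str, int], int], product: str, tick: int) -> Optional[int]:
    """(P1): value at the latest explicit tick at or before `tick`, same product."""
    earlier = [t for (p, t) in rel if p == product and t <= tick]
    return rel[(product, max(earlier))] if earlier else None


def quarter_of(month: int) -> int:
    """Quarter tick covering a month tick (both counted from 1)."""
    return (month - 1) // 3 + 1


def price_in_month(rel: Dict[Tuple[str, int], int], product: str, month: int) -> Optional[int]:
    """(I1) applied after (P1): a month inherits the value of its covering quarter."""
    return persist(rel, product, quarter_of(month))
```

With these rows, product 'm87' has no explicit price for the second quarter, so `persist` falls back to the first-quarter value, and every month of that quarter inherits it through `price_in_month`.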
Similarly, with $I_X(A^{\uparrow})$ we denote the assumption of the attribute $A$ being upward hereditary with respect to $X$. Roughly speaking, if we have the same value for the attribute $A$ with respect to the same $X$ at different ticks, that value is also the value of $A$ for the same $X$ for each tick of any other type that is the union of some of these ticks. With $I_X(A^{\updownarrow})$ we denote the assumption of the attribute $A$ being liquid with respect to $X$; i.e., it is both downward and upward hereditary.

A broad class of interval-based semantic assumptions is formed by so-called aggregate assumptions. The value for an attribute at a certain tick $\nu(i)$ of a target granularity $\nu$ can be obtained by aggregating the values present in the database relation for ticks included in $\nu(i)$. Several aggregation methods can be applied: average, sum, last, etc. Referring to our example, let us assume that the designers of the two relations specify these additional assumptions:

(I2)  $I_{Branch,Product}(ISold, Income^{sum})$
(I3)  $I_{Product}(SPrice^{avg})$.

The designers also specify that the assumptions should be applied only to target temporal types coarser than the source type they are using (day and quarter resp.). Hence, the first assumption states that the values for attributes ISold and Income in a tick of a granularity coarser than day can be obtained by summation of the values of these attributes corresponding to the same branch and product for all the days included in that tick. The second states that the street price in a tick of a granularity coarser than month can be obtained by taking the average of the street prices for the same product in all the months included in that tick.
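An aggregate assumption such as (I2) or (I3) amounts to grouping stored values by the target tick that contains their source tick and then combining each group. The following Python sketch assumes the source type is finer than the target type, so that every source tick falls in exactly one target tick; the rows and the tick-numbering helpers are invented for illustration.

```python
from collections import defaultdict


def aggregate(rows, target_tick_of, combine):
    """Group (key, source_tick, value) rows by (key, target tick) and combine.

    `target_tick_of` maps a source tick to the target tick covering it, which
    is only well defined when the source type is finer than the target type.
    """
    groups = defaultdict(list)
    for key, tick, value in rows:
        groups[(key, target_tick_of(tick))].append(value)
    return {k: combine(v) for k, v in groups.items()}


def week_of(day):
    """Week tick covering a day tick, with 7-day weeks counted from day 1."""
    return (day - 1) // 7 + 1


def year_of(quarter):
    """Year tick covering a quarter tick, counted from quarter 1."""
    return (quarter - 1) // 4 + 1


# Hypothetical daily Income values keyed by (Branch, Product).
income_rows = [
    (("Austin", "k123"), 1, 500),
    (("Austin", "k123"), 3, 700),
    (("Austin", "k123"), 9, 200),
    (("Dallas", "k123"), 2, 50),
]
weekly_income = aggregate(income_rows, week_of, sum)

# Hypothetical quarterly SPrice values keyed by Product.
price_rows = [("k123", 1, 100), ("k123", 2, 110)]
yearly_price = aggregate(price_rows, year_of, lambda v: sum(v) / len(v))
```

The same `aggregate` function covers both assumptions; only the combining function (sum or average) and the tick mapping change.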
In general, interval-based semantic assumptions can be used to answer queries that involve granularities different from those used in the database. Consider the query: "Give me the yearly income reached by the Austin branch for product k123". This query asks for information in terms of years, while the SALES relation is stored in terms of days. The assumptions provide sufficient semantic information for the system to offer the user a view of the relation in terms of year. In particular, for a certain branch, product, and year, the value of Income is taken as the sum of all the incomes stored for that branch and product in a day contained in that year. Note that otherwise, the user has to properly code the query to perform the necessary conversions, and different users could have different interpretations of the semantics of the stored data. The example shown is obviously a simple case, but more involved conversions and semantics are very likely to appear in real databases.

Similarly to point-based assumptions, an interval-based assumption relies on the use of certain "conversion" methods. In general, if $X, Y_1, \ldots, Y_n$ are pair-wise disjoint attribute sets, and $conv_1, \ldots, conv_n$ are conversion methods, $I_X(Y_1^{conv_1} \ldots Y_n^{conv_n})$ is the interval assumption that allows values of $Y_i$ to be converted according to method $conv_i$ with respect to values of attributes $X$, for each $i = 1, \ldots, n$. Formal definitions of methods, assumptions, their semantics and their formal properties can be found in [BWJa].

Note that the definition of assumptions is quite general. For example, the downward hereditary assumption (I1) can be used to derive street prices in terms of any (target) granularity that is finer than quarter, including second and millisecond. In a real application, there are usually some restrictions on the set of target granularities. These restrictions are specified along with the semantic assumptions.
2.3
Semantic environment
In order to participate in the architecture for database interoperability that we are defining, a temporal database management system (TDBMS) must provide to the external world its meta-data information, including semantic assumptions. By semantic environment of a TDBMS we mean a collection of meta-data information regarding the database. A semantic environment includes (1) the schema specification, (2) the native granularity used in each database relation to represent temporal data, (3) the set of temporal types known to the TDBMS, and (4) the semantic assumptions defined on the database relations with their associated restrictions on the target granularities. These restrictions must be formulated through a general specification, since they must not be limited to the temporal types known by the TDBMS. For example, a particular sum conversion method could be applicable to all pairs of temporal types (source, target) such that source is finer than target. As will be explained in the next section, this information is essential for the construction of the database schema for a mediator.
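One possible way to picture a semantic environment is as a small record with the four components listed above. The Python sketch below is an assumption-laden illustration: the class and field names, the SALES attribute list, and the encoding of a restriction as a predicate that receives a finer-than test are all choices made here, not part of the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

Finer = Callable[[str, str], bool]  # finer(a, b): is type a finer than type b?


@dataclass(frozen=True)
class Assumption:
    kind: str                             # "point" or "interval"
    relation: str
    wrt: Tuple[str, ...]                  # the attributes X
    methods: Tuple[Tuple[str, str], ...]  # (attribute, method) pairs
    # General restriction on (source, target). It receives the finer-than test,
    # so it also applies to types that the TDBMS itself does not know.
    allowed: Callable[[str, str, Finer], bool]


@dataclass
class SemanticEnvironment:
    schema: Dict[str, Tuple[str, ...]]    # (1) relation -> attributes
    native_granularity: Dict[str, str]    # (2) relation -> temporal type
    known_types: Set[str]                 # (3) types known to the TDBMS
    assumptions: List[Assumption] = field(default_factory=list)  # (4)


def obtainable_types(env, relation, candidates, finer):
    """Candidate types in which `relation` can be obtained from its native type."""
    source = env.native_granularity[relation]
    return {
        t for t in candidates
        if t == source or any(
            a.relation == relation and a.kind == "interval"
            and a.allowed(source, t, finer)
            for a in env.assumptions
        )
    }


# (I2) on SALES, restricted to targets coarser than the source type.
I2 = Assumption("interval", "SALES", ("Branch", "Product"),
                (("ISold", "sum"), ("Income", "sum")),
                allowed=lambda src, tgt, finer: finer(src, tgt))
env = SemanticEnvironment(
    schema={"SALES": ("Branch", "Product", "ISold", "Income")},
    native_granularity={"SALES": "day"},
    known_types={"day", "month"},
    assumptions=[I2],
)

# Finer-than facts as a mediator might hold them (illustrative subset).
FINER_PAIRS = {("second", "day"), ("day", "week"), ("day", "month"), ("day", "year")}


def mediator_finer(a, b):
    return (a, b) in FINER_PAIRS
```

Because the restriction is a predicate rather than a list of type names, `obtainable_types` can accept week and year even though this TDBMS only knows day and month.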
3 The temporal mediator architecture
In this section we illustrate a general architecture for querying a heterogeneous set of temporal databases having different semantic environments. The main components of the architecture are: (a) a temporal mediator, (b) a set of active subjects (users and/or applications), (c) a set of temporal databases each with an associated semantic environment, and (d) a common knowledge among components (a), (b), and (c). An instance of the temporal mediator architecture with a single active subject and two temporal databases is shown in Figure 3.
[Figure 3: diagram with components Temporal Mediator (a), User/Application (b), TDBMS-1 and TDBMS-2 (c), and Common Knowledge (d). Legend: I. & C. Methods = Interpolation and Conversion Methods.]
Fig. 3. An instance of the temporal mediator architecture.
The temporal mediator is the central component of the architecture. By using the mediator, an active subject can make queries referring to relations stored in different databases, and asking for information in terms of granularities possibly different from those used locally by the databases to store their data. Each active subject is seen in the architecture as a client process accessing the information contained in each participating TDBMS through the interface provided by the temporal mediator. Active subjects can be information mining processes, simple query interfaces, browsers, or more specialized applications. Each TDBMS in addition to its data, provides a semantic environment, so that the temporal mediator has access to the knowledge about which attributes and relations are locally available, and how interpolations and conversions on
their values should be performed. We assume that a network interface on each TDBMS local site allows the communication with the temporal mediator.

A common knowledge is assumed among the different components: the formal specification of interpolation and conversion methods and a naming convention for a basic set of granularities. Indeed, day, for example, should denote the same temporal type in each database,¹ and the semantic assumptions given in each database environment must refer to the same interpolation and conversion methods; the mediator itself has to know the methods specification to correctly interpret the semantic assumptions. The global set of interpolation and conversion methods should be specified in a formal language with a well-defined semantics (e.g., multi-sorted first order logic is used in [BWJa]).

The different modules within the temporal mediator operate using a common data model and query language, which we call the TM-language. This can be an abstract query language (e.g., the MQLF logic of [BWJa]) on an abstract data model (e.g., the temporal modules of [WJS93]), or a concrete one (e.g., the TSQL2 data model and query language [Sno95]). The best choice essentially depends on the presence or absence of a common interface language (like SQL for conventional databases) among the different TDBMSs and active subjects. In Section 4 we consider the case where every database provides a TSQL2 interface; hence, in that case, TSQL2 is used as the TM-language. We describe the architecture in more detail, considering each module in a temporal mediator.

3.1 The methods library
The methods library contains an implementation in the TM-language of the interpolation and conversion methods specifications. This implementation consists of what we call view templates, which are essentially parametric TM-language queries. The parameters are the attribute names on which the method must be applied and, for conversion methods, the source and target temporal types. Once the parameters are instantiated, the view template becomes a view, and hence a query providing a view of the involved attributes in the target granularity. The methods library is used by the meta-data and query processor module. Examples of view templates and their instantiation are reported in Section 4.

3.2 The temporal type system
The temporal type system represents the knowledge that the mediator has on time granularities. The set of granularities must include at least the basic types assumed as the common knowledge; the richer the temporal type system of the mediator, the better its ability to answer user/application queries. The set of types included in the mediator temporal type system can be specialized based on the particular applications that the mediator is supposed to support.

¹ If this is not the case we assume that appropriate synchronization tools are employed to simulate this common knowledge.
A temporal type system must also provide some functionalities to deal with its types. We distinguish local and global functionalities. Among the first, the system should provide a mapping from ticks of one type into ticks of another, and a set of arithmetic operators to add/subtract ticks. For example, the system may need to determine the month that contains a given business-day, the weeks properly contained in a given year, or the week obtained by adding 6 months to the first week of a given year. The specific operations that are necessary depend on the adopted TM-language.

Global functionalities refer to relationships between temporal types. They should include a function to check the finer-than relationship, as well as functions to retrieve the maximum/minimum number of ticks of a temporal type covered by one (arbitrary) tick of another type. For example, the system may need to determine that business-days are finer than weeks, that weeks are not finer than months, and that months are at least 28 days and at most 31 days long. While local functionalities are routinely used by the query processor for temporal queries, the global functionalities seem to be particularly useful for optimization purposes. For example, when a finer-than relation holds between the two types involved in a conversion, a more efficient implementation of a conversion method can be used, as shown in Section 4. The temporal type system module is only used by the meta-data and query processor.

3.3 The DB interface
The DB interface is the module of the temporal mediator that interacts with the temporal databases. It receives transformed queries from the query processor that have to be sent to specific databases. If some of the databases do not provide an interface to the temporal mediator query language, a query transformation process has to be carried out by the DB interface to obtain a query in the target database query language, and another transformation process has to be done on the answer from the database to obtain a table in the mediator data model. An example of such an interface is TimeDB [Boh95], which supports a subset of the TSQL2 query language over a conventional Oracle database. Under the assumption of a uniform query language,² the DB interface only performs the task of dispatching queries to the databases and returning the results to the query processor.

3.4 The mediator schema
The mediator schema module contains a schema specification on which user queries can be formulated. The mediator schema essentially provides a set of temporal relations, each one associated with the set of temporal types such that sufficient meta-data information is available to obtain the relation in terms of those types. Hence, an active subject can formulate a query on these relations in terms of each of the associated temporal types.

The generation of the mediator schema is in general a complex task that shares many of the complexities involved in the specification of a global schema in a multidatabase system (see e.g., [DS96]). These complexities have been investigated in the literature, and tools are usually provided to facilitate this task. We assume each relation scheme is defined by a non-temporal relational algebra expression. Here we consider the aspects concerning the granularities. We associate with the mediator relation a set of temporal types; a type is included in this set if each TDBMS relation (or view) appearing in the relational expression can be obtained by the mediator in terms of that type using the semantic assumptions provided by the TDBMS. For example, consider a relation $R$ defined as $R_1 \bowtie R_2$, where $R_1$ is in TDB1 and $R_2$ in TDB2. We include type $\mu$ in its set of associated types if $R_1$ and $R_2$ can be obtained in terms of $\mu$ using the semantic assumptions and their type restrictions in TDB1 and TDB2, respectively. Note that $\mu$ could be a temporal type unknown to the databases; the temporal mediator can determine, based on the conversion methods in the assumptions (provided by the TDBMS), on the specification of types allowed for those methods (provided by the TDBMS), and on the specification of $\mu$ (provided by the mediator temporal type system), whether those relations can be obtained in terms of $\mu$.

Semantic assumptions also help to solve some of the naming problems typical of multidatabases. For example, if the same attribute name appears in relations of two TDBMSs, the semantic assumptions involving that attribute should be the same in the two TDBMSs; otherwise the attributes have different semantics and should be given different names in the mediator schema.

² The assumption that a set of heterogeneous databases provide the same query language is reasonable for conventional relational databases using SQL, and we believe that an SQL extension, or a similar standard, will be adopted for temporal relational databases.

3.5 The meta-data and query processor
The meta-data and query processor is the most important module of the temporal mediator. We schematically describe the several steps performed by this module in the processing of a query issued by an active subject component:
1. retrieve the necessary view templates from the methods library;
2. instantiate the templates according to the target temporal type and produce a view corresponding to each mediator relation appearing in the query;
3. substitute each relation name with its corresponding view, obtaining a transformed query;
4. decompose the query and send each TDBMS-query to the DB interface module;
5. process the TDBMS-query results to obtain the global query result.

We now consider each step in more detail. In Step 1, for each mediator relation R whose name appears in the user query, the query processor has to identify the semantic assumptions on the TDBMS relations appearing in the specification of R. We suppose that this meta-data information has been previously obtained and stored as part of the schema. Based on the semantic assumptions, as well as on the target temporal types required in the query, the appropriate view templates are retrieved from the methods library. A conversion method, such as avg or sum, provided by a semantic assumption is usually independent of the source and target temporal types. Similarly, the corresponding view template implementation in the methods library is usually parametric with respect to the source and target types. However, for optimization purposes different view templates can implement the same method depending on the relationship between the source and target temporal types involved in the conversion. For example, we have experienced that conversions into target types that are either finer³ or coarser than the source type have more efficient implementations. (A concrete example is shown in Figures 5 and 6.) Hence, the most appropriate view template is selected by the query processor according to the specific source and target types.

In Step 2, the templates, identified as explained above, are instantiated according to the temporal types required in the query. If multiple (point and interval) assumptions are present, the templates are composed accordingly (see [BWJa] for details). This results in a view in terms of the target temporal type for each TDBMS relation appearing in the definition of a mediator relation involved in the query. Applying the relational operators in the definition, a view in terms of the target temporal type for each involved mediator relation is obtained.

In Step 3, the query processor substitutes each relation name in the user query with the corresponding view. The resulting query has all the necessary semantic information on conversions and interpolation embedded in it.
The query contains TM-language constructs for basic temporal type operations, such as tick containment or intersection predicates.

In Steps 4 and 5, the query processor has to adopt a strategy to evaluate the global query. In principle, known strategies for multidatabase query evaluation can be adopted (see e.g., [DS96]). Informally, sub-queries will be sent to the appropriate databases and part of the evaluation (operations on the results) will be done by the query processor. However, the query processor has to check, using the semantic environment of each TDBMS, that the TDBMS knows the temporal types involved in the corresponding sub-query and, hence, can perform the necessary basic temporal type operations. If this is not the case, these operations must be performed by the query processor in the mediator, after retrieving the data from the TDBMS. Several optimization strategies can be applied here.

In the above discussion we assumed that the active subjects submit their query in the same query language used within the mediator.⁴ If this is not the case, the necessary transformation has to be carried out by the query processor.
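The check described for Steps 4 and 5 can be pictured with a very small Python sketch. It only decides where the basic temporal type operations of each sub-query are evaluated; the function name, the "tdbms"/"mediator" labels, and the example type sets are hypothetical, and real decomposition strategies would be far more elaborate.

```python
from typing import Dict, Iterable, List, Set, Tuple


def place_temporal_operations(
    subqueries: Iterable[Tuple[str, Set[str]]],
    known_types: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """For each (tdbms, types_used) pair, decide who evaluates the temporal
    type operations: the TDBMS if it knows every type used, else the mediator."""
    plan = []
    for tdbms, types_used in subqueries:
        if set(types_used) <= known_types[tdbms]:
            plan.append((tdbms, "tdbms"))
        else:
            plan.append((tdbms, "mediator"))
    return plan


# Hypothetical semantic environments: the set of types each TDBMS knows.
known = {"TDBMS1": {"day", "month"}, "TDBMS2": {"quarter"}}
plan = place_temporal_operations(
    [("TDBMS1", {"day", "month"}), ("TDBMS2", {"quarter", "month"})],
    known,
)
```

Here the second sub-query involves month, which the second TDBMS does not know, so its temporal operations would be carried out in the mediator after the data is retrieved.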
³ According to the formally defined finer-than relation.
⁴ As pointed out above, the assumption that most temporal databases will provide an interface through a common query language seems to be reasonable in the long term.
4 Towards a system prototype
In Section 3, we described the functionality of each component in the temporal mediator architecture independently from a specific TM-language. The description is sufficiently general to allow TDBMSs with different data models and query languages. In this section, we present a particular case study in which a simple extension of TSQL2 is used as the TM-language, and each participating TDBMS provides a TSQL2 interface to the mediator. As mentioned earlier, TimeDB can be used to support a TSQL2-like query language over conventional relational databases.

As we propose in [BWJa], the extension of TSQL2 should allow a syntactic construct "TABLE IN granularity" that can be used in the FROM clause to specify a relation in terms of a particular granularity. For example, the construct PRICE IN month is allowed wherever a relation name is allowed in TSQL2 in the FROM clause. We also assume that the use of the keyword INSTANT, allowed for event tables, is extended to every kind of table. That is, it de-coalesces a period into its constituent ticks. For example, if a tuple is timestamped with [Jan96, Feb96], the application of INSTANT on the table that contains the tuple would return a table containing two copies of that tuple, one timestamped with Jan96 and the other with Feb96. Finally, we assume a slightly different semantics for CAST on periods: casting of period [Jan96, Feb96] into days should return [1/1/96, 2/29/96] and not [1/1/96, 2/1/96] as with current TSQL2 casting, i.e., the last (and not the first) tick of the set corresponding to the right endpoint is taken.

We now illustrate, through an example, preliminary results on how certain mediator modules can be implemented using this language. We consider a temporal mediator interfacing with two TDBMSs. The first is a company TDBMS containing, among other relations, the SALES relation introduced in Section 1. The second is an information provider TDBMS containing, among other relations, the PRICE relation also introduced in Section 1. Figures 1 and 2 can be seen as TSQL2 tables where each timestamp is a single valued period. The only semantic assumption specified in the first TDBMS on SALES is (I2), illustrated in Subsection 2.2. The application of the assumption has also been restricted by that TDBMS's schema designer to temporal types coarser than day. The semantic assumptions specified in the second TDBMS on PRICE are (P1), (I1), and (I3), with target types for (I1) restricted to those finer than quarter and target types for (I3) restricted to those having at least one tick covering a tick of quarter.

Hence, the mediator schema provides to the active subjects, among other relations, the relation SALES associated with a list of all temporal types (known to the mediator) coarser than day. Assuming a reasonably rich mediator temporal type system, these would certainly include week, business-week, month, etc. The schema also provides the relation PRICE associated with a set of temporal types (known to the mediator). Since month satisfies the restrictions for (I1), it is part of this set.
Suppose that the following query is formulated by an active subject: "Which branch sold product k123 for less than the street price considering monthly sales of 'k123' and corresponding street prices?" Since the "TABLE IN granularity" construct is available, the query is easily expressed in TSQL2 as shown in Figure 4.

    SELECT e1.Branch
    FROM SALES IN month AS e1, PRICE IN month AS e2
    WHERE e1.Product = 'k123' AND e2.Product = 'k123'
      AND e1.Income < e2.SPrice * e1.ISold
      AND VALID(e1) = VALID(e2);

Fig. 4. A user query

The type month is among the types supported by the mediator for SALES because of the semantic assumption $I_{Branch,Product}(ISold, Income^{sum})$ given in TDBMS1, and it is among the types supported by the mediator for PRICE because of the semantic assumption $I_{Product}(SPrice^{\downarrow})$. To process the query, the mediator has to use these assumptions to derive a view of both relations in month. The first step consists in retrieving the view templates corresponding to the conversion methods $\downarrow$ and sum from the methods library. Note that the view templates need to be applied to a particular relation. As observed in Section 3, it is possible that more efficient implementations of the same conversion method exist in the library depending on the relationship between the source and target type. In Figures 5 and 6 we show two template implementations for each conversion method needed by our example query. Since month is coarser than day and finer than quarter, the simpler templates can be used.

For finer target types:

    SELECT e.V, e.W
    VALID CAST(VALID(e) AS ν)
    FROM M(V, W) AS e;

For arbitrary target types:

    SELECT e2.V, e2.W
    FROM (SELECT *
          VALID CAST(VALID(e1) AS ν)
          FROM M(V, W) AS e1) (INSTANT) AS e2
    WHERE EXISTS (SELECT *
                  FROM M(V, W) (INSTANT) AS e3
                  WHERE CAST(VALID(e2) AS μ) = VALID(e3)
                    AND e2.V = e3.V AND e2.W = e3.W);

Fig. 5. Two template views for the $\downarrow$ conversion method
The template for target types that are finer than the source type simply retrieves the projection of the table M on attributes in V and W, coalescing tuples with the same value for V and W, and then casts the timestamp of each tuple to the target granularity. Note that M, V, W, and $\nu$ are parameters that will be instantiated according to the specific table, attributes in the assumption, and target granularity. Considering as an example the PRICE table, if tuple (k123, 100) has timestamp [1st-quarter-97, 2nd-quarter-97], after the parameters instantiation, the casting as used above would return the same tuple timestamped [Jan97, Jun97], i.e., from the first month of the first quarter to the last month of the second.

The same method is much more complex if it has to be applied to arbitrary target types. Indeed, in this case, only some of the ticks of the source type could be covered by ticks of the target. Thus, the method implementation must guarantee that the values for attributes W are "inherited" only by tuples timestamped with ticks covered by ticks of the target. If there is no particular knowledge about the relationship between source and target type, the comparison has to be done "tick by tick". This explains the need for the (INSTANT) construct in the template.
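The modified CAST semantics on periods and the de-coalescing effect of INSTANT can be mimicked outside TSQL2 with a few lines of Python. This is only a sketch of the behaviour described in the text: ticks are numbered from 1 within a single year, periods are (first, last) pairs, and the function names are invented for the illustration.

```python
import calendar
from datetime import date


def cast_quarters_to_months(period):
    """Cast a (first_quarter, last_quarter) period to months: first month of the
    left endpoint, LAST month of the right endpoint (the assumed semantics)."""
    first_q, last_q = period
    return (3 * (first_q - 1) + 1, 3 * last_q)


def cast_months_to_days(period):
    """Cast ((year, month), (year, month)) to days, again taking the last day
    of the right endpoint rather than its first day."""
    (y1, m1), (y2, m2) = period
    last_day = calendar.monthrange(y2, m2)[1]  # number of days in that month
    return (date(y1, m1, 1), date(y2, m2, last_day))


def instant(period):
    """De-coalesce an integer (first, last) period into single-tick periods."""
    first, last = period
    return [(t, t) for t in range(first, last + 1)]
```

For instance, `cast_quarters_to_months((1, 2))` gives months 1 to 6, matching the [Jan97, Jun97] result in the text, and `cast_months_to_days(((1996, 1), (1996, 2)))` ends on 29 February 1996.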
For coarser target types:

    SELECT e2.V, sum(e2.W)
    VALID(e2)
    FROM (SELECT *
          VALID CAST(VALID(e1) AS ν)
          FROM M AS e1) AS e2
    GROUP BY e2.V, VALID(e2) USING 1;

For arbitrary target types:

    SELECT e2.V, sum(e2.W)
    VALID(e2)
    FROM (SELECT *
          VALID CAST(VALID(e1) AS ν)
          FROM M AS e1) (INSTANT) AS e2,
         M (INSTANT) AS e3
    WHERE CAST(VALID(e2) AS μ) CONTAINS VALID(e3)
      AND e2.V = e3.V AND e2.W = e3.W
    GROUP BY e2.V, VALID(e2) USING 1;

Fig. 6. Two template views for the sum conversion method
The s u m template for target types that are coarser than the source type first (in the external FROMclause) translates the timestamp of each tuple in Min terms of the target type (without coalescing). Then, it applies the sum aggregate, summing all values in W being in the same tick (specified by USING 1) of the target type, and having the same value for V (GROUP BY e 2 . V , VALID(e2) USING i). When the target type is coarser, we are always guaranteed that given a tick in the target type any tick of the source type is either fully covered by it or has no intersection with it. If this is not the case, ticks not fully contained must be identified and excluded from participating in the sum. This is basically what the more involved sum method implementation does for arbitrary types. Note that,
in our example, the application of this method to types not coarser than day has been explicitly ruled out by the meta-data associated with the SALES relation. Since templates are parametric with respect to the relation name, attribute names, and source and target temporal types, they must be instantiated according to the semantic assumption in which they appear and to the required target granularity. For example, the template in Figure 6 is instantiated with M = SALES, V = {Branch, Product}, W = {ISold, Income}, day as source type, and ν = month. In this case, the resulting view is indeed a view of the SALES relation in terms of month. When point-based assumptions are also present, and/or more interval-based assumptions are present, the process is more involved, since point-based assumptions have to be applied first and the resulting views have to be appropriately combined. This step has been implemented in PIC97 by a C program working on TSQL view templates. Figure 7 shows the two instantiated views in our example.
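The instantiation step can be pictured as plain text substitution into the template. The sketch below is written in Python purely for illustration (the implementation mentioned above is a C program whose details are not given here); the placeholder names and the function `instantiate_sum_view` are hypothetical.

```python
from string import Template

# Text of the "coarser target type" sum template; the $-placeholders stand in
# for the parameters M, V, W and the target granularity (names are assumptions).
SUM_COARSER = Template(
    "SELECT $v_cols, $w_sums VALID(e2) "
    "FROM (SELECT * VALID CAST(VALID(e1) AS $target) FROM $m AS e1) AS e2 "
    "GROUP BY $v_cols, VALID(e2) USING 1;"
)

def instantiate_sum_view(m, v, w, target):
    """Fill the template for relation m, grouping attributes v, summed attributes w."""
    v_cols = ", ".join(f"e2.{a}" for a in v)
    w_sums = ", ".join(f"sum(e2.{a})" for a in w)
    return SUM_COARSER.substitute(m=m, v_cols=v_cols, w_sums=w_sums, target=target)

view = instantiate_sum_view("SALES", ["Branch", "Product"], ["ISold", "Income"], "month")
print(view)
```

With the parameter values from the text, the generated string has the same shape as the SALES view in Figure 7.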
SALES-VIEW-IN-MONTH:
    SELECT e2.Branch, e2.Product, sum(e2.ISold), sum(e2.Income) VALID(e2)
    FROM (SELECT * VALID CAST(VALID(e1) AS month) FROM SALES AS e1) AS e2
    GROUP BY e2.Branch, e2.Product, VALID(e2) USING 1;

PRICE-VIEW-IN-MONTH:
    SELECT e.Product, e.SPrice VALID CAST(VALID(e) AS month)
    FROM PRICE(Product, SPrice) AS e;

Fig. 7. The views derived from SALES and PRICE.
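To make the effect of the SALES view concrete, the following Python sketch mimics what the sum conversion computes when day is the source type and month the target type. It is not TSQL2 and not the authors' code; the row layout and the sample figures are invented for illustration.

```python
from collections import defaultdict
from datetime import date

def sum_to_month(rows):
    """rows: iterable of (branch, product, day, sold, income), day a datetime.date.

    Returns {(branch, product, (year, month)): (total_sold, total_income)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for branch, product, day, sold, income in rows:
        key = (branch, product, (day.year, day.month))  # cast the day tick to its month tick
        totals[key][0] += sold
        totals[key][1] += income
    return {k: tuple(v) for k, v in totals.items()}

# Invented sample data, not taken from the paper.
sales = [
    ("B1", "k123", date(1997, 1, 3), 2, 200),
    ("B1", "k123", date(1997, 1, 20), 1, 100),
    ("B1", "k123", date(1997, 2, 5), 4, 400),
]
monthly = sum_to_month(sales)
print(monthly)
```

Because every day lies in exactly one month, no day tick has to be excluded, which is the guarantee the coarser-type template relies on.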
In our simple example, the query processor would now substitute the expressions "SALES IN month" and "PRICE IN month" in the FROM clause of the query with the corresponding derived views. At this point, the query is a standard TSQL2 query, except that relations in multiple databases are involved. Several optimization strategies for query decomposition, common to multidatabases, can be applied. In the specific case of our query, an optimal decomposition is probably that shown in Figure 8, where SALES-VIEW-IN-MONTH and PRICE-VIEW-IN-MONTH are the views instantiated from the sum and I conversion methods, respectively, assuming that each TDBMS knows month.
Send to TDBMS1:
    SELECT e1.Branch, e1.ISold, e1.Income
    FROM SALES-VIEW-IN-MONTH AS e1
    WHERE e1.Product = 'k123';

Send to TDBMS2:
    SELECT e2.SPrice
    FROM PRICE-VIEW-IN-MONTH AS e2
    WHERE e2.Product = 'k123';

Fig. 8. A decomposition of the transformed query
Once the query answers from each TDBMS are received, the query processor can easily obtain the answer to the user query. In our example, it simply selects the Branch attribute from the table returned by TDBMS1, checking the condition that the corresponding Income value in each selected tuple is less than the SPrice*ISold value for the same valid time. No general strategy is likely to give the optimal decomposition for all queries. In the worst case, the query processor retrieves from the TDBMSs the complete view that is needed, applying locally all of the query conditions. As briefly mentioned in Section 3, it is also possible that a TDBMS does not have knowledge about a certain temporal type used by the mediator. In this case the query addressed to this TDBMS cannot contain, for example, CASTing operations involving this type. The mediator has access to sufficient meta-data information to recognize this situation, and has to adopt an adequate strategy. In the worst case, it simply retrieves the data and performs locally all the operations requiring casting.
A simplified version of the mediator has been implemented at the University of Milan, Italy (PIC97), and a World Wide Web demo site can be found at http://www.isse.gmu.edu/~csis/tdb. The currently implemented system allows queries to be issued using the "TABLE IN granularity" construct as an extension to the ATSQL language, which is essentially a subset of TSQL2. The mediator is currently interfaced with a single TDBMS implemented using TimeDB (Boh95), a temporal relational database system supporting the ATSQL query language. Since TimeDB currently does not support time granularities, all casting operations are simulated in PIC97 using ad-hoc tables. The mediator temporal type system includes standard granularities as well as non-standard ones like business-day and business-week.
The formal specification of some common methods, as well as their TSQL2 implementation, has been given in BWJa. The TSQL2 methods library has been enriched in PIC97.
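The final combination step performed by the query processor in the running example (select the branches whose Income is less than SPrice*ISold for the same valid time) can be sketched as a local join on the valid-time tick. This is a hedged illustration only: the tuple layouts, the function name, and the sample values are assumptions, not the mediator's actual code.

```python
def branches_below_list_price(sales_rows, price_rows):
    """sales_rows: (branch, sold, income, month) tuples, as returned by TDBMS1.
    price_rows: (price, month) tuples, as returned by TDBMS2.
    Returns (branch, month) pairs with income < price * sold in the same month."""
    price_at = {month: price for price, month in price_rows}
    result = []
    for branch, sold, income, month in sales_rows:
        price = price_at.get(month)
        if price is not None and income < price * sold:
            result.append((branch, month))
    return result

# Invented answers for product 'k123' in January 1997.
sales_rows = [("B1", 3, 280, (1997, 1)), ("B2", 2, 200, (1997, 1))]
price_rows = [(100, (1997, 1))]
print(branches_below_list_price(sales_rows, price_rows))  # [('B1', (1997, 1))]
```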
5 Conclusion
In this paper, we presented a general architecture for temporal database interoperability. We focused in particular on time granularity issues, proposing that the notion of temporal semantic assumptions should be the formal tool to express the semantics intended by each TDBMS designer. The temporal mediator component of our architecture plays the crucial role of collecting and using the semantic information from the different TDBMSs, to provide a uniform interface to users and applications. The feasibility of the proposed approach is supported by the implementation of some basic functionalities of the architecture for a case study where TSQL2 is used as the TM-language. However, several issues regarding the mediator schema design and the decomposition and evaluation strategies in presence of multiple databases deserve a deeper investigation before a real system prototype for the whole architecture can be implemented.
Interesting extensions to this work include the study of "information quality" issues. From the point of view of time granularities, for example, when different TDBMSs provide the same information, but using different native granularities, the mediator has to evaluate which TDBMS to use to provide the most accurate answer to the user query. Another interesting extension is to enrich the notion of semantic environment by adding semantic values, as proposed in SSR94, to formalize the semantics of single attributes in the TDBMS schema.
References

BWJa     C. Bettini, X. Wang, and S. Jajodia. Temporal Semantic Assumptions and Their Use in Databases. IEEE Transactions on Knowledge and Data Engineering. To appear.
BWJb     C. Bettini, X. Wang, and S. Jajodia. A General Framework for Time Granularity and its Application to Temporal Reasoning. Annals of Mathematics and Artificial Intelligence, to appear. A preliminary version of this paper appeared in Proc. of TIME-96, IEEE Computer Society Press.
Boh95    M. H. Böhlen. Temporal Database System Implementations. SIGMOD Record, 24(4), ACM, December 1995.
CW83     J. Clifford and D.S. Warren. Formal semantics for time in databases. ACM Transactions on Database Systems, 8(2):214-254, June 1983.
CI94     J. Clifford and T. Isakowitz. On the semantics of (bi)temporal variable databases. In M. Jarke, J. Bubenko, and K. Jeffery, editors, Proceedings of 4th International Conference on Extending Database Technology, pages 215-230, March 1994.
CDIJS97  J. Clifford, C.E. Dyreson, T. Isakowitz, C.S. Jensen, and R. Snodgrass. On the Semantics of "Now" in Databases. ACM Transactions on Database Systems, 1997. To appear.
CSS94    R. Chandra, A. Segev, and M. Stonebraker. Implementing calendars and temporal rules in next generation databases. In Proc. of ICDE, 1994, pp. 264-273.
DS96     W. Du and M. Shan. Query Processing in Pegasus. In Object-oriented multidatabase systems, O.A. Bukhres and A.K. Elmagarmid, Eds., Prentice Hall, 1996.
NS92     M. Niezette and J. Stevenne. An efficient symbolic representation of periodic time. In First International Conference on Information and Knowledge Management, Baltimore, MD, November 1992.
PIC97    N. Piccioni. Using semantic assumptions to answer temporal queries. Master Thesis. DSI - University of Milan, 1997. (In Italian)
SSR94    E. Sciore, M. Siegel, and A.S. Rosenthal. Using semantic values to facilitate interoperability among heterogeneous information systems. ACM Transactions on Database Systems, 19(2):254-290, June 1994.
SS87     A. Segev and A. Shoshani. Logical modeling of temporal data. In U. Dayal and I. Traiger, editors, Proceedings of the ACM SIGMOD Annual Conference on Management of Data, pages 454-466, San Francisco, CA, May 1987.
Sno95    R. T. Snodgrass, editor. The TSQL2 Temporal Query Language. Kluwer Academic Publishers, 1995.
Tan93    A.U. Tansel, J. Clifford, S. Gadia, S. Jajodia, A. Segev, and R. Snodgrass. Temporal Databases: Theory, Design, and Implementation. Benjamin/Cummings, 1993.
Wie92    G. Wiederhold. Mediators in the architecture of future information systems. IEEE Computer, March 1992, pages 38-49.
WJL91    G. Wiederhold, S. Jajodia, and W. Litwin. Dealing with granularity of time in temporal databases. In Proc. 3rd Nordic Conf. on Advanced Information Systems Engineering, Trondheim, Norway, May 1991.
WJS93    X. Wang, S. Jajodia, and V.S. Subrahmanian. Temporal modules: An approach toward federated temporal data bases. In Proc. of ACM SIGMOD International Conference on the Management of Data, Washington, D.C., 1993.
Extended Update Functionality in Temporal Databases

Opher Etzion 1, Avigdor Gal 2, and Arie Segev 3

1 IBM - Haifa Research Lab, Matam 31905, Haifa, Israel
2 Rutgers University, Department of MSIS
3 Haas School of Business, University of California and Information & Computing Sciences Division, Lawrence Berkeley Laboratory, Berkeley, CA 94720, USA
Abstract. This paper presents an extended update functionality in temporal databases. In temporal databases, the information is associated with several time dimensions that designate the validity of the information in the application domain as well as the database domain. The complexity of information, coupled with the fact that historical data is being kept in the database, requires the use of an update model that provides the user with high-level abstractions. In this paper we provide an enhanced schema language and an enhanced collection of update operation types that help the system designer and the user to cope with the added complexities of such a model. One of the major issues dealt with in this paper is the situation of simultaneous values of a single data item; this situation occurs when multiple values, valid at the same time, were assigned to a data item at different times over the database history. Unlike the fixed semantics in conventional and existing temporal database models, we provide a flexible mechanism to handle simultaneous values which also distinguishes between regular modifications and error corrections. The extended update functionality is part of an update model that is currently being implemented in a prototype for a simulation project in a hospital's training center. Issues related to the implementation of this functionality in various data models are discussed. In particular, a mapping of the basic primitive operation types to TSQL2, and suggestions for its augmentation, are provided.
Keywords: Temporal databases, Database updates, Simultaneous values, Decision Time, TSQL2
O. Etzion, S. Jajodia, and S. Sripada (Eds.): Temporal Databases - Research and Practice, LNCS 1399, pp. 56-95, 1998. © Springer-Verlag Berlin Heidelberg 1998
1 Introduction and Motivation
Temporal databases enable the accumulation of information over time, and provide the ability to store different values of the same data item¹ with different time characteristics, thus enabling queries as if they were issued from past observation times (e.g., what was the patient's known situation at the time that a treatment was prescribed?). The capability to update past or future values of the database requires handling the following three issues:

Simultaneous Values: a situation of simultaneous values of a single data item in a temporal database occurs when multiple values that are valid at the same (real-world) time were assigned to the data item at different times over the database history. Simultaneous values is a temporal notion that may exist implicitly in non-temporal databases, with a fixed and implicit semantics to handle such a case. Temporal databases allow the semantics associated with simultaneous values to be refined, which requires a supporting update model. The concept of SVS (Simultaneous Values Semantics) defined in this paper supports the different possible answers to the question: which of the simultaneous values should be returned as a response to a retrieve operation? A single abstraction that captures the possible answers to that question, coupled with linguistic features to define and modify them, is discussed in this paper.

Modification Control: In certain cases, it is required to restrict the ability to update the past and future states of the database. This can be done both in a static way (for all the instances of some data item in the database) or in a dynamic way (both at the object level and at the property level). Linguistic abstractions for modification control are also discussed.

Revision Control: In temporal databases, any information that has ever been stored in the database is kept and not deleted when newer information is available for the same data item (deletions are likely to eventually occur, but they are considered storage management operations). However, erroneous values that have been corrected later are also kept. A concept of revision, distinct from the concept of modification, should be established in order to retrieve the appropriate values relative to an observation time that is later than the revision time.

In this paper we devise a model that helps the system designer and the users of such systems to cope with these complexities by providing an enhanced schema language and an enhanced set of update operation types.

This section serves as an introduction and motivation, and consists of the following sub-sections. Section 1.1 provides background and basic definitions in the context of temporal databases, Section 1.2 presents a motivating example, and Section 1.3 outlines the rest of the paper.

¹ We use the term data item to denote a basic, not necessarily atomic, unit stored in the database, regardless of the specific data model. Examples: field, attribute, column.
1.1 Background: Temporal Databases
Time plays an important role in various application areas, including decision support, decision analysis, computer integrated manufacturing, computer aided design, and office information systems, to name a few. Due to its complexity, the functionality required by many of those applications is only partially supported by current database technology, resulting in the use of ad-hoc and expensive solutions for each application. The temporal database research area draws ever-growing attention in the research community, summarized in a series of detailed bibliographies (Kli93, Soo91, and TK96) and a survey (OS95). From a historical perspective, the incorporation of time in databases began in the '70s (FD71; Bra78; A+79). During the '80s, research concentrated on extending the relational model to include the time concept; a survey of algebras is given in MS91. The modeling of temporal databases has been addressed in many papers, including KL83, CC87, Gad88, SS88, and CK94. The reader is referred to the book TCG+93 for some basic readings on the subject. In June 1993, the temporal database community organized the "ARPA/NSF International Workshop on an Infrastructure for Temporal Databases", which was held in Arlington, Texas. The results of the workshop were documented in an infrastructure recommendations report (Pis94) and a consensus glossary of temporal database concepts (J+94). Since then, substantial efforts have been invested in improving and unifying the many existing temporal query languages (e.g., HSQL (Sar93) and TQUEL (Sno87)) through a unified temporal relational query language called TSQL2 (S+94). The infrastructure work forms a basis for temporal technology development, such that additional functionality can be incorporated either through mapping to or augmentation of such infrastructure. Another workshop was held in Zurich in September 1995 for further discussion of these issues (SJS95).
Following these works, we adopt a discrete model of time (CT85), which is isomorphic to the natural numbers. In this paper we use the following terms:

Definition 1.1 Chronon (J+94) is a nondecomposable unit of time, whose granularity is application dependent.

In our case study we use the composition date;hh:mm (date, hour, and minute) to designate a chronon; date is specified as MMM DD YYYY, where MMM designates a month's abbreviation (e.g., Feb), DD designates the day in the month, and YYYY designates the year.

Definition 1.2 Time interval, designated as $[t_s, t_e)$, is the set of all chronons $t$ such that $t_s \le t < t_e$.

The temporal infrastructure advocated a bi-temporal database model, in which each data item is associated with two temporal dimensions, valid time and transaction time.

Definition 1.3 Valid Time ($t_v$) designates the collection of chronons at which the data item is considered to be valid in the modeled reality.
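As a small illustration of the definitions above, chronons can be modelled as integers and a time interval as a half-open range over them. The class below is a minimal sketch under that assumption; the name `Interval` and its methods are hypothetical and not part of the model presented in this paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """Half-open interval [ts, te) over integer chronons."""
    ts: int
    te: int

    def __post_init__(self):
        if self.ts >= self.te:
            raise ValueError("interval must contain at least one chronon")

    def __contains__(self, t):
        # t belongs to the interval iff ts <= t < te
        return self.ts <= t < self.te

    def chronons(self):
        """Enumerate the chronons of the interval in increasing order."""
        return range(self.ts, self.te)

iv = Interval(3, 6)
print(list(iv.chronons()))  # [3, 4, 5]
```

The upper bound is excluded, so two intervals such as [3, 6) and [6, 9) meet without sharing a chronon.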
Definition 1.4 Transaction Time ($t_x$) designates the time when a data item is stored in the database. The transaction time is a single chronon designating the commit time of the transaction that inserted the data item's value into the database.

The valid time is expressed using a temporal element (Gad88), which is a finite union of mutually disjoint time intervals. The transaction time is a chronon.

1.2 A Motivating Example
We start our discussion by introducing a motivating example that demonstrates the required functionality. The case study is a decision analysis system designed to support the analysis of the decisions of physicians while treating patients in an emergency room. A patient is admitted to the emergency room. A medical record is generated and assigned to a physician. A physician determines a diagnosis that includes possible alternative disorders, based on the signs and the symptoms of the patient, and on laboratory tests. In many cases, although a patient suffers from a single active disorder, the physicians must consider several possible disorders during the process of treating such a patient. The reasons for that are incomplete information, the limitations of modern medical knowledge, and the lack of resources for more thorough analysis. The treatment administered to the patient should be consistent with all the possible disorders that were identified, and should not damage the patient by causing further physical disturbances. The treatment must deal with at least the one or several most probable disorders. Figure 1 shows an object-based partial schema of the case study; the underlined properties designate identifiers (foreign keys).
class = Medical-Record
properties =
    Record-Number
    Patient
    Symptoms
    Signs
    Laboratory-Tests:
        Laboratory-Feature
        Test-Results
    Diagnosis:
        Diagnosis-Id
        Disorders
    Treatments
    Assigned-Physician

class = Patient
properties =
    Patient-Name
    Social-Security-Number
    Records

class = Assigned-Physician
properties =
    Physician-Id
    Patients-Treated

Fig. 1. A partial schema of a medical database
To simplify the presentation, the properties' types are omitted. The class Medical-Record represents the knowledge about a single admission of a patient to the emergency room. It consists of Symptoms reported by the patient, Signs
that were found in a preliminary examination by the physician, and Laboratory-Tests. A Diagnosis is composed of a set of alternative Disorders. The Assigned Physician is responsible for the diagnosis and treatment. Medical records are frequently re-assigned to other physicians before the patient leaves the emergency room. The class Patient represents general details about patients. The class Assigned Physician represents the history of records assigned to a physician. The schema is presented in an object-based notation; Laboratory-Tests and Diagnosis are compound properties composed of other properties.

The main goal of this application is to check whether decisions made by physicians are consistent with the knowledge that was available at the time that the decision was made. This decision analysis is important for training purposes, medical research, and investigation of medical malpractice. The required functionality is demonstrated in the following investigation case:

A patient named Dan Cohen arrives at the emergency room at 10:00 pm, and is assigned to Dr. Livingston. Based on the preliminary examination, the physician made diagnosis Da at 10:05 pm, which consisted of three possible disorders. Based on this diagnosis, the physician prescribed treatment to the patient and ordered some laboratory tests. Based on the lab results, Dr. Livingston made the diagnosis Db at 11:15 pm, which consisted of three possible disorders, distinct from those diagnosed in Da. Based on this diagnosis, a different treatment started at 11:17 pm. The diagnoses are manually reported to the data entry center, whose responsibility is to enter each report into the database. In the data entry center, reports are accumulated and are reported to the database in batches. Due to a shift change around 11:00 pm, there was a delay in reporting some diagnoses, including Da, which was committed in the database only at 11:35 pm. The diagnosis Db was committed without outstanding delay at 11:30 pm.
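The mismatch between the order of events and the order of reports in this case can be made concrete with a few lines of Python. This is only an illustrative sketch: the two diagnoses are labelled Da and Db below, the calendar date is arbitrary (the narrative gives only clock times), and the helper `diagnosis_in_effect` is a hypothetical name.

```python
from datetime import datetime

D = (1998, 1, 1)  # arbitrary date; only the clock times come from the narrative

# (diagnosis, time it was made, time it was committed to the database)
reports = [
    ("Da", datetime(*D, 22, 5), datetime(*D, 23, 35)),
    ("Db", datetime(*D, 23, 15), datetime(*D, 23, 30)),
]

# Order in which the database learned about the diagnoses.
by_commit = [d for d, _, _ in sorted(reports, key=lambda r: r[2])]
# Order in which the diagnoses were actually made.
by_reality = [d for d, _, _ in sorted(reports, key=lambda r: r[1])]

def diagnosis_in_effect(at):
    """Latest diagnosis made at or before `at`, judged by when it was made."""
    made = [r for r in reports if r[1] <= at]
    return max(made, key=lambda r: r[1])[0] if made else None

print(by_commit)    # ['Db', 'Da']
print(by_reality)   # ['Da', 'Db']
print(diagnosis_in_effect(datetime(*D, 23, 17)))  # Db
```

Sorting by commit time reverses the real sequence, which is why the investigation cannot rely on the order of database updates alone.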
In order to investigate the physician's actions, the following functionality is required:

1. To answer a query such as: what was the valid diagnosis when a treatment was administered?
2. The ability to determine the sequence of events that occurred in the modeled reality, even if it is distinct from the sequence of reports of these events to the database. For example, we should be able to trace the fact that the Da disorder was diagnosed before the Db disorder, and that a certain treatment was given when Da was diagnosed and Db had not yet been diagnosed.

1.3 The Paper's Structure
Figure 2 shows an EER (Extended Entity Relationship) diagram of the components described in this paper. The three requirements specified in this section are defined in Section 2. These requirements should be satisfied both by static definitions, at the schema level, and by dynamic definitions that are materialized in run-time update operation types. Section 3 describes the information model primitives, representation, and linguistic aspects. Section 4 discusses in detail the retrieval and update semantics, including the extended set of update operation types. Section 5 discusses implementation issues. Section 6 concludes the paper and discusses the model vs. its requirements.

[Figure: EER diagram whose entities include Requirements (Simultaneous Values, Modify Control, Revision Control), Update Operation Types, and Extended Schema Language.]

Fig. 2. The model's components
2 The Required Extended Update Functionalities

This section discusses the three required extended update functionalities. Section 2.1 discusses the concept of simultaneous values, Section 2.2 discusses the modification control issue, and Section 2.3 discusses the revision issue.

2.1 Simultaneous Values
In the motivating example, there are several properties that require different interpretations of handling simultaneous values. We start our discussion with the definition of this term.

Definition 2.1 Simultaneous values of δ at t: A data item δ is said to have simultaneous values at a chronon t, on the valid time dimension, if n update operations were issued over the database history assigning to δ the values $\{v_1, \ldots, v_n\}$ $(n > 1)$, such that $\forall i: t \in t_v(v_i)$.

Note that the n updates are a subset of the total number of updates to that data item, and n must be at least two for simultaneous values to exist. In Section 2.1.1 we define the term SVS, in Section 2.1.2 we introduce single value interpretations of SVS, and in Section 2.1.3 we introduce multiple value interpretations of SVS. This discussion entails a need to consider an additional time dimension called Decision Time. This issue is discussed in Section 2.1.4.

Simultaneous Value Semantics

Definition 2.2 Applicable value(s) of a data item δ at t: The value or values that are selected in response to a retrieval request about the value of δ at the valid time chronon t.
Definition 2.3 SVS: an abstraction that denotes a decision procedure at retrieval time to determine which of the simultaneous values are applicable.

The possible values of SVS are determined according to the result of two separate decisions:
• Should the applicable value be a single value for a given chronon t, or are multiple values allowed to be applicable?
• What determines which value is applicable?
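The definition of simultaneous values can be checked mechanically once each update carries its valid time. The sketch below assumes integer chronons and represents a valid time as a list of half-open (ts, te) pairs; the function names and the sample updates are illustrative assumptions, not part of the model.

```python
def simultaneous_values(updates, t):
    """updates: list of (value, valid_intervals), where valid_intervals is a
    list of half-open (ts, te) chronon pairs. Returns the values valid at t."""
    return [value for value, intervals in updates
            if any(ts <= t < te for ts, te in intervals)]

def has_simultaneous_values(updates, t):
    # At least two updates must be valid at t (n > 1 in the definition).
    return len(simultaneous_values(updates, t)) > 1

# Invented update history for one data item.
updates = [("v1", [(0, 10)]), ("v2", [(5, 15)]), ("v3", [(20, 30)])]
print(simultaneous_values(updates, 7))       # ['v1', 'v2']
print(has_simultaneous_values(updates, 12))  # False
```

An SVS then decides which of the returned values are applicable; the function above only collects the candidates.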
Single Applicable Values

If there are multiple values for a chronon t on the valid time dimension, a criterion has to be devised for choosing a single applicable value. We discuss three possible strategies: the first value, the last value, and a single user defined value. The approach that selects a value on the basis of chronological order is rooted in the semantics of the insert and modify operation types in conventional databases.

Definition 2.4 First value semantics: The first known value of the data-element δ at the chronon t is the only applicable value.

The term first known value may be implemented using the earliest value on the transaction time ($t_x$) dimension; however, this interpretation designates the order in which the values have been reported to the database. This order may be different from the real order of events. Applications for which this distinction is significant should employ another time dimension to maintain the correct order. This issue is further discussed in Section 2.1.4. The insert operation type in conventional databases is a case of first value semantics. In conventional databases, the first insert operation for a given primary key value is stored in the database, while later insert operations of the same instance are considered integrity constraint violations and are thus rejected. At the attribute level, first value semantics exists in unchangeable attributes (N+87).
[Figure: transaction time (Feb 1 92 8:00am, 9:00am, and 9:10am) plotted against valid time (Aug 91 to Jan 92); the legend distinguishes the 1st, 2nd, and 3rd valid times.]

Fig. 3. A fixed first value semantics update protocol
In temporal databases, a kind of first value semantics update protocol as proposed in Sno87 is illustrated in Figure 3. The two bottom lines represent
update operations, and the upper one represents a response to a retrieve operation. Note that the values denoted by + and * have not been overridden by the later update, due to the first value semantics.

Definition 2.5 Last value semantics: The last known value of the data-element δ at the chronon t is the only applicable value.
The last value semantics is compatible with the modify operation type in conventional databases: the last modified value overrides all the previous ones. Some temporal models (e.g., SK86, WJL91) employ last value semantics, in which the value with the latest transaction time is considered to be the applicable value.

Definition 2.6 Single user defined value semantics: The selection criterion among the values is provided by the user using a query language. The selection is constrained to any selection criterion that selects a single value (which can be an aggregate value).
Examples: the minimal value, the median value, the average value, a value that is selected according to other parameters, such as source and level of confidence, if such information is stored in the database GES94, and an explicit selection by the user. A special case is to look at the different values as possible interpretations, in which only one is applicable. This is compatible with the possible worlds semantics that has been thoroughly discussed in the knowledge representation area F+94, and mentioned also in the database context AKG91. Retrieval queries, under this interpretation, can include modal operators.
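The three single-value strategies can be sketched as small selection functions over the simultaneous values. In this hedged illustration each candidate is a (value, order_key) pair, where the order key stands for whichever time dimension the application uses to order values; the function names and the sample readings are assumptions.

```python
from statistics import median

def first_value(candidates):
    """candidates: list of (value, order_key); the earliest value is applicable."""
    return min(candidates, key=lambda c: c[1])[0]

def last_value(candidates):
    """The latest value overrides all previous ones."""
    return max(candidates, key=lambda c: c[1])[0]

def single_user_defined(candidates, choose):
    """`choose` maps the list of values to exactly one (possibly aggregate) value."""
    return choose([v for v, _ in candidates])

# Invented simultaneous values of one data item, ordered by keys 1..3.
candidates = [(120, 1), (135, 2), (128, 3)]
print(first_value(candidates))                  # 120
print(last_value(candidates))                   # 128
print(single_user_defined(candidates, median))  # 128
```

Aggregates such as the minimum or the average fit the same shape, since each reduces the list of values to one result.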
Multiple Valued Approach

In the multiple valued approach, there is no restriction on the number of applicable values. Any subset of the set of simultaneous values of the data item δ at chronon t may be selected. There are two types of SVS applicable to this case: the all values semantics, and the multi-valued user defined semantics.

Definition 2.7 The all values semantics: all the simultaneous values of the data-element δ at chronon t are applicable.
Under the all values semantics, all the values (except for the revised ones) are selected. This approach was referred to in semantic data models as a multi-valued attribute, in which a data item's value consists of several values (HK87). For example, a data item that designates the languages that a person speaks can have a set of grouped values. The interpretation is All, designating that the person speaks all the languages in the designated set.

Definition 2.8 Multiple user defined value semantics: The selection criterion among the values is provided by the user using a query language.
Examples of selection criteria are: all values > 5, all values whose transaction time is earlier than t0, and the differences between the original values and their average. A similar notion was defined in databases that support uncertain multiple values (ZP93). In our case study, the physician's diagnosis consists of a set of disorders, any subset of which may reflect the patient's situation. In this case, the values are employed as part of the retrieval requests through some selection criteria or aggregation of values.

Current temporal database models employ a single interpretation of simultaneous values, either at the database or at the schema level. Some of the models (e.g., Sno87, NA89) enforce a single value approach at the database level. Other models (e.g., Tan86) enable the support of different semantics at the schema level by making a distinction between single valued attributes and multi-valued attributes; however, the semantics of the multi-valued attributes is not explicit. Many other models (such as Ari86, SK86, ABN87, WJL91, and CK94) also enforce a single value for each chronon. As a result, a mechanism to handle non-unique interpretations is not available. In CK94, a mechanism for storing multiple past views is provided, but a predefined preference relation for choosing a single value of a property for each chronon is enforced.

Our case study demonstrates the need for the whole spectrum of simultaneous values semantics.
• The Physician-Id follows the first value semantics.
• The Symptoms property has an all values semantics; all the reported symptoms are considered to be applicable.
• The Diagnosis has a last value semantics. The last diagnosis is the applicable one. However, as shown in our example, the transaction time is not necessarily a good measure in determining the correct order of diagnoses.
• The Disorder property within the context of a single Diagnosis has a multi-valued user defined semantics; any subset of the set of disorders may be applicable.
• The Patients-Treated property for a physician has a single user defined semantics with respect to the question: which patient is being treated by Dr. Livingston at chronon t? If we assume that a physician can handle a single case in any given chronon, then it is known that at chronon t he treated one of the patients whose assignment to him is valid at t.

These examples show cases of SVS that can be determined during the schema design phase. However, in some cases the semantics should be determined only at run-time. For example, Dr. Flinstone is not allowed, from a certain date on, to be responsible for more than one patient; thus, for this instance, the assignment semantics should be modified to first value semantics. As a requirement, the model should provide both static and dynamic SVS definition. The static definitions are implemented at the schema level (with a possible schema evolution), and the dynamic level is implemented as updates at the instance level.
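The combination of schema-level (static) and instance-level (dynamic) SVS definitions can be pictured as a default table plus an optional override. The sketch below is an assumption-laden illustration in Python, not the schema language of this paper: the dictionary, the mode names, and the function `applicable` are hypothetical, and the candidate values are assumed to be already sorted in the relevant chronological order.

```python
# Static, schema-level choice of semantics per property, following the case study.
SVS_BY_PROPERTY = {
    "Physician-Id": "first",
    "Symptoms": "all",
    "Diagnosis": "last",
    "Disorders": "multi-user-defined",
    "Patients-Treated": "single-user-defined",
}

def applicable(prop, ordered_values, selector=None, override=None):
    """Return the applicable values of `prop` as a list.

    `override` models a dynamic, instance-level change of semantics;
    `selector` is the user-supplied criterion for the user defined modes.
    """
    mode = override or SVS_BY_PROPERTY[prop]
    if mode == "first":
        return [ordered_values[0]]
    if mode == "last":
        return [ordered_values[-1]]
    if mode == "all":
        return list(ordered_values)
    if mode == "multi-user-defined":
        return [v for v in ordered_values if selector(v)]   # selector: value -> bool
    if mode == "single-user-defined":
        return [selector(ordered_values)]                   # selector: values -> one value
    raise ValueError(f"unknown SVS mode: {mode}")

print(applicable("Diagnosis", ["Da", "Db"]))                          # ['Db']
print(applicable("Patients-Treated", ["pA", "pB"], override="first")) # ['pA']
```

The second call mirrors the run-time change described for a single physician instance, where the assignment semantics is switched to first value without touching the schema-level default.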
Opher Etzion, Avigdor Gal, and Arie Segev
Decision time. Some of the SVS options are based on the order of events, which leads us to another important aspect of the required functionality: the issue of decision time. The transaction time (tx) in some temporal database models has two major roles (in addition to the traditional role of backup and recovery):
• It is used to determine the order of events, which is necessary to support first value or last value semantics;
• It is used to answer temporal retrieval queries, such as: What was the answer to the query q, if issued from the observation point of a past chronon t?
To answer such a temporal query, the database is required to know the values committed before t; the transaction time is a means to record this knowledge. The second role concerns events in the database domain only (the commit time), thus the transaction time can be used for achieving this role without additional assumptions. However, the first role may refer to events in the application domain and not in the database domain. Using the transaction time for this role relies on an implicit assumption that the transaction time reflects the correct order of events in the application domain; under this assumption, the transaction time is sufficient to achieve the first role. In contrast, there are applications in which the order of events is important and the order of updates to the database does not necessarily reflect the order of events in reality; in this case, we need a time type that belongs to the application domain and not to the database domain. In our case study example, diagnosis Da occurred before diagnosis Db, but due to the batch process of reporting, diagnosis Db was committed in the database before diagnosis Da. In general, the commit order of transactions is non-deterministic under the standard two-phase locking protocol; consequently, the transaction time may not reflect the order of occurrences in the modeled reality. In our context, the decision analysis context, we use the term decision time for this time type.

Definition 2.9 Decision Time (td) is the chronon at which a data item's value was decided in the application's domain of discourse [EGS92]. This chronon denotes the time at which an event occurred, or the time at which a decision was made (even if the value is complex, a decision about each modification is made in a single chronon). From the database's point of view, td reflects the chronon at which an event in the modeled reality entails a decision to initiate a database update transaction.

The following example shows the three different types of times. Dr. Flinstone is hired as a physician in our hospital; the hiring decision was made on July 20 1996, and was recorded in the database on July 24 1996. The hiring period is for a year starting August 1 1996. In this case td = July 20 1996, tx = July 24 1996, and tv = [August 1 1996, July 31 1997). We assume that the decision time dimension is the one according to which the first and last value semantics are determined; this means that the value having the earliest (or latest) decision time is the applicable one.
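The distinction between the three time types can be illustrated with a minimal Python sketch. The class name `TimedValue`, the helper functions, and the clock times given to the two diagnoses are assumptions introduced for illustration only; the valid time is simplified to a single half-open interval.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimedValue:
    """A value carrying the three time types (illustrative structure only)."""
    value: str
    tx: datetime  # transaction time: chronon at which the update committed
    td: datetime  # decision time: chronon at which the value was decided
    tv: tuple     # valid time, simplified to one half-open interval [start, end)


# The hiring example from the text.
hiring = TimedValue(
    value="Dr. Flinstone hired",
    tx=datetime(1996, 7, 24),
    td=datetime(1996, 7, 20),
    tv=(datetime(1996, 8, 1), datetime(1997, 7, 31)),
)

# Two diagnoses with hypothetical clock times: Da was decided first,
# but batch reporting committed it after Db.
FOREVER = datetime.max
d_a = TimedValue("Da", tx=datetime(1993, 12, 12, 23, 35),
                 td=datetime(1993, 12, 12, 22, 5),
                 tv=(datetime(1993, 12, 12, 22, 5), FOREVER))
d_b = TimedValue("Db", tx=datetime(1993, 12, 12, 23, 30),
                 td=datetime(1993, 12, 12, 23, 15),
                 tv=(datetime(1993, 12, 12, 23, 15), FOREVER))


def first_value(candidates):
    """First value semantics: the earliest decision time wins."""
    return min(candidates, key=lambda se: se.td)


def last_value(candidates):
    """Last value semantics: the latest decision time wins."""
    return max(candidates, key=lambda se: se.td)


latest_by_decision = last_value([d_a, d_b]).value                      # "Db"
latest_by_commit = max([d_a, d_b], key=lambda se: se.tx).value         # "Da"
```

Ordering by transaction time would select Da as the last diagnosis, although Db was decided later; ordering by decision time avoids this.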
Extended Update Functionality in Temporal Databases
The decision time concept was introduced in [EGS92], and is also mentioned in [OS95]. A similar concept has been referred to as event time [CK94]. It is argued in [OS95] that it is still an open question whether the functionality achieved by using decision time as a third time type is justified with respect to its overhead. In this paper we assume that the system designer and user recognize the decision time as a primitive concept; the discussion about implementation as a separate concept vs. implementation on top of existing concepts is deferred to Section 5.
2.2
Modification Control
In temporal databases, values that are valid in the past or the future may be updated. While this ability provides flexibility, it is sometimes necessary to restrict it and to disallow the modification of data items during part or all of their validity time. For example, actions that have already been performed, such as laboratory test values that have been reported, cannot be altered. Such a data item is unchangeable over the entire valid time dimension. In other cases, a data item can be changeable at some valid times and unchangeable at others. The modification control can be issued either statically or dynamically, and either at the object level or at the property level.

Definition 2.10 A static modification control is a modification control that applies to all instances of the class (or property) of the same type for any chronon on the valid time dimension.

Definition 2.11 A dynamic modification control is a modification control that overrides the static modification control for a certain object (or a data item) during some chronons on the valid time dimension.

In a similar way to the SVS case, the static modification control is implemented by a modified schema definition (with a possibility of evolving schema) and the dynamic one is implemented at the instance level.
2.3
Revision Control
The revision requirement results from the ability to ask queries from different viewpoints. An example of such a query is: what are the symptoms of the patient John Gait, as known on Dec 12, 1995; 10:00 pm? Such a query is vital for decision analysis and auditing purposes. In a regular database, the last value overrides the previous one, thus it is not important whether the value was replaced because some change had occurred, or because it was erroneous. However, in temporal databases this distinction is important. Consider the following example: the symptoms Sa, Sb, Sc for a certain patient were reported to the database at the chronon t1. At t2 > t1, it was noticed that the symptom Sb had been reported by mistake, and that it should have been reported as Sd. The requirement is that a query issued from the observation point of any
chronon t, such that t1 < t < t2, about this patient's symptoms should return <Sa, Sb, Sc>, while the same query issued from the observation point of t ≥ t2 should return <Sa, Sd, Sc>. This is consistent with the knowledge that can be obtained from the database at each observation point. In our example, the value was replaced with another value for its entire validity time, but in the general case the revision control should allow either revision by another value, or just logical deletion of the revised value. The revision may apply to the entire validity time of the revised value, or to any part of it. The revision control is implemented at the instance level, dynamically.
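The symptom example can be mimicked with a small append-only sketch in Python. The names `Report`, `Void`, and `symptoms_as_of`, the identifiers r1 to r4, and the use of integers as chronons are assumptions made for illustration; valid time is ignored here to keep the focus on the observation point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    se_id: str   # hypothetical identifier of the stored item
    value: str   # reported symptom
    tx: int      # chronon at which the report was committed


@dataclass(frozen=True)
class Void:
    target: str  # identifier of the report being revised
    tx: int      # chronon at which the revision was committed


T1, T2 = 10, 20  # illustrative chronons with T1 < T2

# Nothing is overwritten: the revision at T2 appends a void and a new report.
reports = [
    Report("r1", "Sa", T1),
    Report("r2", "Sb", T1),
    Report("r3", "Sc", T1),
    Report("r4", "Sd", T2),
]
voids = [Void("r2", T2)]


def symptoms_as_of(reports, voids, t):
    """Symptoms known from the observation point of chronon t."""
    voided = {v.target for v in voids if v.tx <= t}
    return {r.value for r in reports if r.tx <= t and r.se_id not in voided}


before_revision = symptoms_as_of(reports, voids, 15)  # {"Sa", "Sb", "Sc"}
after_revision = symptoms_as_of(reports, voids, 20)   # {"Sa", "Sd", "Sc"}
```

Because the erroneous report is kept and only marked as voided, both answers remain obtainable, each from its own observation point.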
3
The Modeling Primitives
In this section we present the primitives of the temporal database model that is intended to satisfy the requirements posed in the previous sections. These primitives are used by the system designer when constructing the application. This issue is further elaborated in Section 4. Section 3.1 presents the information modeling primitives. Section 3.2 discusses the enhanced schema language support for the static SVS and modification control definitions. Section 3.3 introduces the set of update operation types, which are the major implementation vehicle for the dynamic SVS and modification control definitions. The semantics of these components is discussed in Section 4.
3.1
Information Modeling Primitives
This section presents the information modeling primitives that are used in this paper. This data model can be implemented on top of various lower-level data models, such as relational or object-based. Information about an object is maintained as a set of variables (instances of the class's properties). Each variable contains information about the history of values as well as the different components of the variable's status (SVS, modification control, revision control). Each component is represented using a set of state-elements; the state-element is the most basic object in the database. We assume that the database is an append-only database: new information is added while existing information is left intact. The append-only approach is necessary to support operations that require past database states. For example, a medical examiner investigating a malpractice complaint issues the query: "What were the known laboratory test results of a given patient at 10:30pm on December 12, 1993?"
This information is crucial in deciding whether the attending physician provided a reasonable treatment given the available information at that time. Since the information may have been incomplete or even erroneous at the time, the treatment decision may seem wrong from a later observation point. Unlike
some other temporal models [ABN87] that employ a non-strict form of append-only, we employ append-only in the strictest fashion. Consequently, the data can be stored on WORM (write once read many) devices, in which no changes can be made to a state-element after the transaction that created it has committed.

A state-element is a tuple of the form (footnote 2):

(se-id, oid, value, tx, td, tv)

• tx, td, tv designate the time types (as defined, tx and td are chronons and tv is a temporal element).
• The value of a state-element designates a value assigned to the variable (e.g., Dr. Livingston).
• A state-element includes a uniquely created system-wide identifier se-id.
• oid designates the object-identity of the object the state-element is associated with.

Footnote 2: Additional attributes of information about source, validity, accessibility, etc., can be added. These extensions are discussed in [GES94].

A state-element example is:

Treatment = (se-id=s9, oid=884555, value=antibiotic, tx=Dec 12 1993; 10:30pm, td=Dec 12 1993; 10:10pm, tv=[Dec 12 1993; 10:12pm, Dec 19 1993; 8:00pm))

A bucket β is a set of state-elements having a well-defined semantics. In our model there are four types of buckets, as defined below. A variable δ is a set of four buckets:

(δ.data, δ.variable-SVS, δ.modify-control, δ.void-SE)

The data bucket contains the state-elements whose values represent the history of the data associated with the variable δ. The rest of the buckets are control buckets. The variable-SVS bucket contains state-elements whose values designate dynamic modifications of the SVS of the variable δ. Each value consists of a pair (SVS, query-id). The query-id designates a query to be activated for user defined SVS. The modify-control bucket is a collection of state-elements whose values (changeable or frozen) designate the history of modifications to the variable's modify control status. The void-SE bucket is a collection of state-elements whose values are state-elements that are voided during the tv of the void state-element.

An object α is represented as a set of variables:

(α.object-id, α.class-ref, α.object-status, α.p1, ..., α.pn).

The data bucket of the object-id variable consists of a single unique state-element whose value designates the object identity. Its modify-control bucket consists of a single state-element with the value frozen. The class-ref is a variable that
classifies an object to be an instance of a specific class. The SVS of this variable can be adjusted to the specific application's assumptions. If an object can be classified to multiple classes, then the SVS of class-ref is set to AND; if an object's classification is fixed, then the SVS is set to first value. This is an example of using the SVS concept to support data model independence. The object-status variable's values are stored in state-elements, with last value SVS, based on decision time. The possible values of this variable's data are: active, suspended, disabled. See Section 4 for the exact definition.

An object's state is the set of all its variables' states, i.e., the entire collection of state-elements associated with this object. In the general case, the user may not be familiar with the object-identity, and instead identifies the object using an object identifier (primary key), which is a subset of the object's state. For example, the underlined properties (Record-Number and Patient-Name) in Figure 1 are the object-identifiers.

The level of granularity of temporal support was discussed in various papers (e.g., [SA86]). The common claim is that attribute level support (which is equivalent to our interpretation of a state-element) reduces the space complexity relative to object level support, because with object level support any change in any attribute requires duplicating the entire object. Moreover, if the level of granularity required by the application is that of an attribute, then object level support increases the time complexity of obtaining information about the evolution of a single attribute. In any event, the concepts discussed in this paper are model independent; the concept of state-element can also be implemented on top of a model whose temporal granularity is at the object level, by creating an object to represent each state-element.
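The structures of this section (state-element, the four buckets of a variable, and an object as a set of variables) can be sketched as Python dataclasses. The class and field names are assumptions chosen to mirror the text, and the oid value 1 is a placeholder; this is an illustration of the layout, not the authors' implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Tuple

# A temporal element is represented here as a tuple of half-open intervals.
Interval = Tuple[datetime, datetime]


@dataclass(frozen=True)  # frozen: a state-element never changes once created
class StateElement:
    se_id: str
    oid: int
    value: Any
    tx: datetime
    td: datetime
    tv: Tuple[Interval, ...]


@dataclass
class Variable:
    """A variable is a set of four buckets; each bucket only grows."""
    data: List[StateElement] = field(default_factory=list)
    variable_svs: List[StateElement] = field(default_factory=list)
    modify_control: List[StateElement] = field(default_factory=list)
    void_se: List[StateElement] = field(default_factory=list)


@dataclass
class TemporalObject:
    """An object: three system variables plus one variable per property."""
    object_id: Variable
    class_ref: Variable
    object_status: Variable
    properties: Dict[str, Variable] = field(default_factory=dict)


# The Treatment state-element s9 from the text (oid is a placeholder).
s9 = StateElement(
    se_id="s9",
    oid=1,
    value="antibiotic",
    tx=datetime(1993, 12, 12, 22, 30),
    td=datetime(1993, 12, 12, 22, 10),
    tv=((datetime(1993, 12, 12, 22, 12), datetime(1993, 12, 19, 20, 0)),),
)

patient = TemporalObject(Variable(), Variable(), Variable())
patient.properties["Treatment"] = Variable()
patient.properties["Treatment"].data.append(s9)
```

Marking `StateElement` as frozen reflects the strict append-only assumption: updates add state-elements to buckets and never alter existing ones.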
3.2
The Enhanced Schema Language
The schema language is the system designer's tool to express static decisions about the data representation and the semantics of updates and retrieval requests. The schema definition consists of classes and properties; each property may have characteristics that are common in existing schema languages (e.g., type, default, set of legal values, reference to other objects), and additional characteristics, expressed by keywords, that are required to support the static definitions of the extended requirements (SVS and modification control).

The SVS keywords are: first, last, and, single, multi. The single and multi keywords designate the user defined SVS modes. An additional keyword query = qid is allowed with the single and multi SVS options, to designate the id of a query that is activated (footnote 3) whenever a query is issued that requires the value of any variable that belongs to this property; qid is a query id. Example: if a property p has a single SVS mode associated with it, and the query associated with it is average value, then any time a query attempts to retrieve an instance of p, the average of the values of all the state-elements valid at the specified valid time is returned. If none of the SVS keywords is specified then the default is last. If a single or multi SVS has been specified, and no query has been indicated, then the user is prompted at run-time for a selection query [GES94].

Footnote 3: Queries are represented as objects in the database.

The modification control employs two keywords: frozen and changeable. The default is changeable.

In Figure 4, we revisit the schema presented in Figure 1 with the additional keywords. Since changeable is the default, it is omitted. Note that a nested structure can have a different SVS in the different levels; Diagnosis obeys the last value SVS, while its component Disorder has a multi SVS. Consequently, there can be only a single valid Diagnosis at each single chronon; nevertheless, within this Diagnosis multiple disorders may be simultaneously valid.

class = Medical-Record
properties =
    Record-Number: last
    Patient: first; frozen
    Symptoms: all
    Signs: all
    Laboratory-Tests: all
    Laboratory-Feature: first; frozen
    Test-Results: all; frozen
    Diagnosis: last
    Diagnosis-Id: first; frozen
    Disorders: multi
    Treatments: last
    Assigned-Physician: last
class = Patient
properties =
    Patient-Name: last
    Social-Security-Number: last
    Records: all
class = Assigned-Physician
properties =
    Physician-Id: first; frozen
    Patients-Treated: single

Fig. 4. The revised partial schema of a medical database

In this example the SVS of every property was explicitly defined. To ease the system designer's task, we suggest the following defaults, which are compatible with update assumptions in conventional databases:
1. When the property is an object-id, the default is first; frozen (this is an unchangeable default).
2. When the property is an object-status (see Section 3.3), the SVS is last value; changeable (this is an unchangeable default).
3. If the data type of the property is a set, a bag or a sequence, then the default is all; changeable. In this case insert means add a new element, while modify means change existing element(s).
4. If the data type of the property is an atomic data type, then the SVS is last; changeable.

The extended schema language supports static definitions of the required options. These definitions affect all instances of the properties defined in the schema, unless a dynamic definition overrides them. The schema level is not entirely static, in the sense that a schema may evolve with time, although we assume that schema changes are not frequent. If a schema evolves, the valid schema is used. For a comprehensive discussion of the schema evolution issue the reader is referred to [GE98].
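The four default rules can be written down as a small lookup function. This is a sketch only: the function name `default_semantics` and the string encodings of property names and data types are assumptions made for the example.

```python
COLLECTION_TYPES = {"set", "bag", "sequence"}


def default_semantics(property_name, data_type):
    """Return the default (SVS, modification control) pair for a property.

    property_name and data_type are plain strings in this sketch; a real
    schema processor would work on parsed schema objects instead.
    """
    if property_name == "object-id":          # rule 1 (unchangeable default)
        return ("first", "frozen")
    if property_name == "object-status":      # rule 2 (unchangeable default)
        return ("last", "changeable")
    if data_type in COLLECTION_TYPES:         # rule 3: set, bag or sequence
        return ("all", "changeable")
    return ("last", "changeable")             # rule 4: atomic data types


symptoms_default = default_semantics("Symptoms", "set")
name_default = default_semantics("Patient-Name", "string")
```

Rules 1 and 2 are checked first because the text describes them as unchangeable defaults that do not depend on the data type.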
3.3
The Update Operation Types
Update operation types are the linguistic primitives of a database update language. We express the required dynamic functionality by augmenting this set of primitives, hence providing the user with uniform linguistic commands for the entire update process, which includes update of data, modification control at the object and variable levels, revision control, and SVS definitions. Earlier works in the temporal database area were confined to the update operation types of insert, modify and delete, while assigning to these operations a slightly different meaning than in conventional databases. For example, in several works (e.g., [EW90]) the difference between updates in non-temporal databases and in temporal databases is that modifications of an attribute's value in the latter case retain the old value in addition to adding the new value. Others (e.g., HRDM [CC87], [McK88], [GE98]) expanded the modify operation to include meta-data, thus allowing schema versioning as well as data evolution. Our extended set includes the insert, modify, suspend, resume, disable, freeze, unfreeze, revise, and set-SVS operations, as explained next.

Insert: This operation creates a new object in the database. Along with the object insertion, the user may assign initial data values to some or all of the object's variables. For example, a new patient is registered at the emergency room. The database creates a new instance of the class Patient and initializes the values Patient-Name=Dan Cohen and Social-Security-Number=12345678.

Modify: This operation adds new information about an existing object. For example, on Dec 12, 1993, 11:10pm, the results of a laboratory test of Dan Cohen caused a modification to the Diagnosis variable. Unlike in non-temporal databases, the modify operation does not remove previous values. The modify operation can be applied to valid time chronons that are different from now, to an interval, or even to the entire database valid time line.
Suspend: This operation establishes a reversible constraint that prevents any modification to the object in the given valid time, except for the object status, which is still changeable (footnote 4). For example, we can use the suspend operation to prevent the assignment of a treatment until the completion of appropriate tests. The suspend operation is a modify-control operation that sets an object to be unchangeable. For example, when a physician is off-duty it is not possible to assign any record to him.

Footnote 4: The object status is required to remain changeable in order to reverse the suspend constraint.

Resume: This operation makes a suspended object changeable again. As in the insert operation, the resume operation may be used to set the values of some of the object's variables. The resume operation is necessary to eliminate an unchangeable constraint of an object.

Disable: An operation that establishes an irreversible constraint that makes the object logically deleted as of the beginning of the tv specified in the disable operation, and consequently prevents any modification to the specified object. For example, when a physician retires (assuming that a retired physician cannot practice again), the object representing this physician is disabled; however, we may still want to investigate his past actions, thus the history of records assigned to him is kept. The disable operation type has two major differences from the suspend operation type:
• disable is irreversible (footnote 5);
• disable has ontological implications, because it means that an object is logically deleted, i.e., ceases to belong to the application's domain of discourse, while suspend is only a constraint that prevents updates.
We use the term disable rather than delete since the history of the disabled object is preserved and there are no physical deletions.

Footnote 5: A Database Administrator (DBA) can use low-level update primitives to "rescue" an object that was mistakenly disabled.

Freeze: This operation establishes a reversible constraint that prevents the modification of a variable (except in the case of revising erroneous values, as explained below) (footnote 6). For example, the laboratory results are measured values that should not be altered, thus the laboratory results variable is updated with a freeze constraint. The freeze operation is vital to the support of the unchangeable value at the variable level.

Footnote 6: The freeze and unfreeze operations at the variable level are similar to suspend and resume at the object level. The different names are intended to avoid semantic overloading.

Unfreeze: Any frozen data may be unfrozen. An unfreeze operation applied to a variable designates the removal of the freezing constraint. Any modification to that variable is allowed from that time on. The unfreeze operation is required for the retraction of the unchangeable value constraint at the variable level.

Revise: This operation "corrects" an erroneous value of a variable at a certain collection of chronons. It tags values that currently exist in the database as false ones and adds a new correct value instead. The revise operation allows the replacement of a frozen value, marking the previous value as an erroneous one. The revise operation type is the means to implement the revision control requirement. The separation of the revise operation from the modify operation makes a semantic distinction between a change in the real world and a correction of a wrong value that was reported to the database. The user can instruct the database to include or exclude the revised values in retrieval operations.

Set-SVS: This operation dynamically sets an SVS at the variable level.

Data may only be changed in a temporal database by adding new objects or by adding new state-elements to the variables of an existing object. The semantics of the update model is reflected in the allowable new state-elements. A new state-element is allowed to be inserted if it obeys some general syntactic rules, such as having a legal value in its valid time, and other rules that are contingent on the status of the object and the variable, the update operation type, and the SVS for this variable. Section 4 discusses the exact semantics of each update operation.
4
The Semantics of the Model's Components
In this section, the formal update semantics of the various components of the model is presented. The validity semantics is presented in Section 4.1, and the retrieval semantics is presented in Section 4.2. The update operation types are composed from a set of low-level primitives, presented in Section 4.3. Section 4.4 describes the semantics of the update operation types, followed by a discussion in Section 4.5. We shall use Figure 5 to demonstrate each of the functions and operations presented in this section. The figure presents a set of state-elements, labeled according to the se-id, of an object that is an instance of the Patient class. The se-ids are written as snn. Each state-element is preceded by the name of the bucket it belongs to.
4.1
Validity Semantics
An object is considered to be active at chronons in which it is neither disabled nor suspended on the valid time axis. The state transition diagram of the object-status is presented in Figure 6. An arrow's label represents the name of the update operation that changes the object's status. Note that the disabled state is a terminal state, unlike suspended and active. The variable's states are applicable only within the context of the active object status. An object is valid when it is not disabled. When an object is disabled, all its variables are considered to be invalid, except for the Object-Status, which continues to be valid because it provides information about the validity of an object. In the example, the object is invalid in [Aug 25 1994; 8:00am, ∞), which is the valid time of (s22). A disable operation sets an actual upper bound for the valid time (tv) of all the state-elements associated with the disabled object to be the starting point of the disabled status valid time interval. Thus, the chronon Aug 25 1994; 8:00am marks the upper bound for the actual valid time of
all the state-elements associated with this object. Note that the recorded tv of the state-elements cannot be modified; however, the upper bound is reflected in the semantics of the update and retrieval operations. An object cannot be referenced by other objects at a valid time chronon in which it is disabled. The collection of chronons in which an object α is active or valid is denoted by AR(α) or VR(α), designating the activity range and the validity range, respectively.
Object-Id.data (s1) 884555, tx=Dec 12 1993; 10:02pm, td=Dec 12 1993; 10:00pm, tv=[Dec 12 1993; 10:00pm, ∞)
Class-ref.data (s2) Patient, tx=Dec 12 1993; 10:02pm, td=Dec 12 1993; 10:00pm, tv=[Dec 12 1993; 10:00pm, ∞)
Object-Status.data (s3) Active, tx=Dec 12 1993; 10:02pm, td=Dec 12 1993; 10:00pm, tv=[Dec 12 1993; 10:00pm, ∞)
Patient-Name.data (s4) Dan Cohen, tx=Dec 12 1993; 10:02pm, td=Dec 12 1993; 10:00pm, tv=[Dec 12 1993; 10:00pm, ∞)
Social-Security-Number.data (s5) 12345678, tx=Dec 12 1993; 10:02pm, td=Dec 12 1993; 10:00pm, tv=[Dec 12 1993; 10:00pm, ∞)
Social-Security-Number.modify-control (s6) frozen, tx=Dec 12 1993; 10:02pm, td=Dec 12 1993; 10:00pm, tv=[Dec 12 1993; 10:00pm, ∞)
Record-Number.data (s7) 12345678-1, tx=Dec 12 1993; 10:02pm, td=Dec 12 1993; 10:00pm, tv=[Dec 12 1993; 10:00pm, ∞)
Record-Number.modify-control (s8) frozen, tx=Dec 12 1993; 10:02pm, td=Dec 12 1993; 10:00pm, tv=[Dec 12 1993; 10:00pm, ∞)
Treatment.data (s9) antibiotic, tx=Dec 12 1993; 10:30pm, td=Dec 12 1993; 10:10pm, tv=[Dec 12 1993; 10:12pm, Dec 19 1993; 8:00pm)
Disorder.data (s10) partial treatment, tx=Dec 12 1993; 11:30pm, td=Dec 12 1993; 11:15pm, tv=[Dec 12 1993; 11:15pm, ∞)
Disorder.data (s11) brain abscess, tx=Dec 12 1993; 11:30pm, td=Dec 12 1993; 11:15pm, tv=[Dec 12 1993; 11:15pm, ∞)
Disorder.data (s12) viral Meningitis, tx=Dec 12 1993; 11:30pm, td=Dec 12 1993; 11:15pm, tv=[Dec 12 1993; 11:15pm, ∞)
Social-Security-Number.void-SE (s13) s5, tx=Dec 12 1993; 11:33pm, td=Dec 12 1993; 11:30pm, tv=[Dec 12 1993; 11:30pm, ∞)
Social-Security-Number.data (s14) 02345678, tx=Dec 12 1993; 11:33pm, td=Dec 12 1993; 11:30pm, tv=[Dec 12 1993; 11:30pm, ∞)
Disorder.data (s15) bacterial Meningitis, tx=Dec 12 1993; 11:35pm, td=Dec 12 1993; 10:05pm, tv=[Dec 12 1993; 10:05pm, ∞)
Disorder.data (s16) viral Meningitis, tx=Dec 12 1993; 11:35pm, td=Dec 12 1993; 10:05pm, tv=[Dec 12 1993; 10:05pm, ∞)
Disorder.data (s17) spontaneous subarachnoid Hemorrhage, tx=Dec 12 1993; 11:35pm, td=Dec 12 1993; 10:05pm, tv=[Dec 12 1993; 10:05pm, ∞)
Treatment.data (s18) acyclovir, tx=Dec 12 1993; 11:35pm, td=Dec 12 1993; 11:17pm, tv=[Dec 12 1993; 11:19pm, Dec 22 1993; 8:00am)
Record-Number.modify-control (s19) changeable, tx=Dec 13 1993; 10:02pm, td=Dec 13 1993; 10:00pm, tv=[Dec 13 1993; 10:00pm, ∞)
Object-Status.data (s20) Suspended, tx=Dec 19 1993; 8:02am, td=Dec 19 1993; 8:00am, tv=[Dec 19 1993; 8:00am, ∞)
Object-Status.data (s21) Active, tx=Aug 24 1994; 12:02am, td=Aug 24 1994; 12:00am, tv=[Aug 24 1994; 12:00am, ∞)
Object-Status.data (s22) Disabled, tx=Aug 25 1994; 8:02am, td=Aug 25 1994; 8:00am, tv=[Aug 25 1994; 8:00am, ∞)
Object-Status.modify-control (s23) freeze, tx=Aug 25 1994; 8:1?am, td=Aug 25 1994; 8:1?am, tv=[Dec 19 1993; 8:00am, ∞)
Fig. 5. An example set of state-elements
A variable has a valid value only when its associated object is valid. The CSE function (Candidate State-Elements) returns the state-elements of a given variable which are valid at chronon t, i.e. the state-elements whose valid-time contains the chronon t. All these state-elements are candidates to be applicable, depending upon the SVS semantics.
Definition 4.1 CSE(var, t) is a function that returns the set of state-elements of the data bucket of the variable var that are possibly valid at a chronon t. A state-element se belongs to this set if it satisfies the following conditions:
1. t ∈ VR(se.oid) /* the object is valid at t */;
2. t ∈ tv(se) /* se is valid at t */;
3. ¬∃se' | se.se-id = se'.value ∧ t ∈ tv(se') ∧ se' ∈ var.void-SE ∧ tx(se') > tx(se) /* se is not voided at t */.
For example, CSE(α.Object-Status, Aug 24 1994; 12:00am) = {s3, s20, s21}, where α is the object whose state-elements are presented in Figure 5.
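A direct reading of Definition 4.1 can be sketched in Python and checked against the Object-Status example. The names `StateElement`, `in_tv`, and `cse` are assumptions, each tv is simplified to a single half-open interval, the validity range is passed in as a predicate instead of being derived from the object status, and the tx minutes are assumptions where the figure is hard to read.

```python
from dataclasses import dataclass
from datetime import datetime

INFINITY = datetime.max


@dataclass(frozen=True)
class StateElement:
    se_id: str
    value: str
    tx: datetime
    tv: tuple  # one half-open interval (start, end)


def in_tv(se, t):
    start, end = se.tv
    return start <= t < end


def cse(data, void_se, is_object_valid, t):
    """Candidate state-elements of a variable at chronon t (Definition 4.1)."""
    if not is_object_valid(t):                    # condition 1: t in VR(oid)
        return set()
    candidates = set()
    for se in data:
        if not in_tv(se, t):                      # condition 2: t in tv(se)
            continue
        voided = any(
            v.value == se.se_id and in_tv(v, t) and v.tx > se.tx
            for v in void_se
        )
        if not voided:                            # condition 3: not voided at t
            candidates.add(se.se_id)
    return candidates


# Object-Status data bucket, following Figure 5.
status_data = [
    StateElement("s3", "Active", datetime(1993, 12, 12, 22, 2),
                 (datetime(1993, 12, 12, 22, 0), INFINITY)),
    StateElement("s20", "Suspended", datetime(1993, 12, 19, 8, 2),
                 (datetime(1993, 12, 19, 8, 0), INFINITY)),
    StateElement("s21", "Active", datetime(1994, 8, 24, 0, 2),
                 (datetime(1994, 8, 24, 0, 0), INFINITY)),
    StateElement("s22", "Disabled", datetime(1994, 8, 25, 8, 2),
                 (datetime(1994, 8, 25, 8, 0), INFINITY)),
]


def object_valid(t):
    # The object is valid until it becomes disabled on Aug 25 1994; 8:00am.
    return t < datetime(1994, 8, 25, 8, 0)


result = cse(status_data, [], object_valid, datetime(1994, 8, 24, 0, 0))
```

With these inputs `result` is the set {s3, s20, s21}, in agreement with the example in the text; after the disabling chronon the function returns the empty set.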
[Diagram: states Active, Suspended, and Disabled; arrow labels Insert, Suspend, and Resume]
Fig. 6. The state transition diagram of the object-status
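The object-status life cycle can be sketched as a small transition table in Python. The table is an assumption based on the prose: insert creates an active object, suspend and resume move between active and suspended, and disable leads to the terminal disabled state. Whether disable is also permitted directly from the suspended state is not stated in the text, so this sketch allows it from active only.

```python
# Allowed transitions: (current status, operation) -> next status.
ALLOWED = {
    ("active", "suspend"): "suspended",
    ("suspended", "resume"): "active",
    ("active", "disable"): "disabled",   # disabled is terminal
}


def next_status(status, operation):
    """Return the object status after applying an update operation."""
    if status is None and operation == "insert":
        return "active"                  # insert creates an active object
    try:
        return ALLOWED[(status, operation)]
    except KeyError:
        raise ValueError(f"{operation!r} is not allowed in status {status!r}") from None


def replay(operations):
    """Apply a sequence of operations to an object that does not exist yet."""
    status = None
    for operation in operations:
        status = next_status(status, operation)
    return status


# The status history of the object in Figure 5: s3, s20, s21, s22.
final_status = replay(["insert", "suspend", "resume", "disable"])
```

Any operation attempted on a disabled object raises an error in this sketch, which reflects the statement that the disabled state is terminal.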
The applicable state-elements among those included in the CSE set are determined according to the SVS semantics. For example, in the all SVS, the whole set is considered to be applicable. In the last value SVS, the applicable state-element is the state-element whose td is the latest among the CSE set. td may be used when the variable belongs to the application domain. We denote the state-element chosen by the last value SVS as ASE (Applicable State Element). We assume that each decision is made at a unique chronon, thus ASE is an atom. For example, ASE(α.Object-Status, Aug 24 1994; 12:00am) = {s21}, where α is the object whose state-elements are presented in Figure 5.

The frozen range of a variable is the range in which the variable is frozen. It is defined by the function FR(var), which returns the collection of valid time chronons in which the applicable state-element is frozen, i.e., cannot be altered. This function returns the union of the tv of all state-elements in var.modify-control whose value is "frozen".
4.2
Retrieval Semantics
The retrieval semantics is determined according to the variable's SVS, the validity semantics, and additional information that may be obtained from the user. The basic retrieval request is: find the value of a variable var at chronon t. By satisfying this retrieval request, many complex queries can be answered. The basic retrieval request has the following interpretation:
1. If the SVS is first value then the state-element with the earliest decision time among those returned by the CSE function is selected.
2. If the SVS is last value then the ASE function returns the value.
3. If the SVS is all then the set of all values in the CSE set is returned.
4. If the SVS is user defined then, if a query is referred to at the schema or the variable level, the result of this query is returned; otherwise the user is prompted for a selection query (in this case the SVS is deferred to run-time interpretation). An example of such a query is td < t0, which selects only the set of state-elements decided prior to t0.

This semantics can be implemented on top of various query languages, such as TOOSQL [RS91], that also support retrieval from various observation points (an answer to the query as-of t0), which restricts the selection of values to those whose tx < t0. The following examples illustrate the retrieval semantics (all of the following queries were issued on December 13, 1993).
1. Query: What is the disorder of Dan Cohen? Answer: The possible Disorders of Dan Cohen are partial treatment, brain abscess, and viral Meningitis. The answer is based on state-elements (s10)-(s12). Since Diagnosis has a last value SVS, the diagnosis with the highest decision time (td) is selected by the ASE function. The Disorders within a Diagnosis have a user defined SVS, thus the answer is interpreted as possible disorders.
2. Query: What was the known Social-Security-Number of Dan Cohen at 10:30pm on December 12, 1993? Answer: The known Social-Security-Number of Dan Cohen on December 12, 1993 at 10:30pm is 12345678. An intelligent query language can point out that the value was erroneous, and was revised to 02345678 on December 12, 1993, at 11:33pm.

4.3
Low-level Update Primitives
This section presents the low-level primitives the system uses to update the database. These primitives are the building blocks of the update operation types and are not accessible to the user. However, the DBA may use these primitives in handling exceptional situations. The primitives are defined at three different levels: state-element primitives, variable primitives and object primitives.

Throughout this section, we use the symbols $\oplus$ and $\otimes$. The symbol $\oplus$ denotes an application of an update operation to a database. The symbol $\otimes$ is a separator between two successive operations; in case of an abort as part of one of the operations, subsequent operations are not performed. We also use two constants: now designates the chronon at which an operation is being performed, and $\infty$ designates an unlimited upper bound. For example, a state-element having a valid-time interval of [now, $\infty$) is considered to be valid starting from the time it was inserted, and valid at any later chronon, unless voided or overridden by another value.

State-element Level Primitives

We introduce the basic primitive of the model: Create-se. Prior to its introduction, we introduce three system functions that are used by it.

legal-temporal($t_v$, $t_d$) is a boolean function that returns "true" if the predetermined temporal constraints are satisfied. These temporal constraints are:
1. $t_v$ is a legal temporal element (not empty, contains non-intersecting intervals);
2. $t_d$ is a legal chronon (according to the application's granularity);
3. $t_d \le$ now (now is the current chronon, read from the system's clock).

legal-type(val, p) is a boolean function that returns "true" only if val is in the domain of the property p.

associate(se, $\alpha$.p.$\beta$) is a function that associates the state-element with the bucket $\beta$ in a variable of the property p of the object $\alpha$.
$\alpha$ denotes the object as identified by its identifier (primary key); this is translated to the OID using a translation function.

Create-se: creates a new state-element.
Syntax: create-se (oid, p, $\beta$, val, $\tau_d$, $\tau_v$).
Semantics:
$$
\begin{aligned}
DB \oplus \text{create-se}(oid, p, \beta, val, \tau_d, \tau_v) \equiv{} & (\neg\,\text{legal-temporal}(\tau_v, \tau_d) \lor \neg\,\text{legal-type}(val, p)) \rightarrow \text{abort} \;\otimes \\
& DB' := DB \cup \{se\} \mid se = (\text{se-id}, oid, val, \tau_x, \tau_d, \tau_v) \land \text{se-id} = \text{generate-se-id}() \;\otimes \\
& \text{associate}(se, \alpha.p.\beta) \mid \alpha.\text{Object-id} = oid
\end{aligned}
$$
This primitive adds a single state-element se to the bucket $\beta$ of a variable $\alpha$.p (the instance of the property p in the object $\alpha$), after checking that certain integrity constraints are satisfied. It consists of two phases: adding the state-element to the database (each state-element is a separate entity with a unique identity in the database), and associating it with a variable and a bucket. $DB'$ is the new database state; se-id and $\tau_x$ are generated by the system. se-id is generated according to the object-identifiers' generation conventions [CK86]; $\tau_x$ (transaction time) is determined at commit time.

For example, the operation create-se (oid=864545, p=Patient-Name, $\beta$=data, val=Dan Cohen, $\tau_d$=Dec 12 1993; 10:00pm, $\tau_v$=[Dec 12 1993; 10:00pm, $\infty$)), applied in a transaction that committed on Dec 12 1993; 10:02pm, resulted in the state-element (s4) in Figure 5.

Variable Level Update Primitives

This section presents the semantics of the variable level primitives. To provide upward compatibility for non-temporal databases, and to provide a shortcut for the standard cases and ease of use, omission of the time values is allowed, and thus a default should be provided. We define $\tau_d'$ to be:
$$
\tau_d' := \begin{cases} \text{now} & \text{if } \tau_d = \text{nil} \\ \tau_d & \text{otherwise} \end{cases}
$$
That is, $\tau_d'$ is assigned a default value of now (the current chronon read from the system's clock of the transaction start time), only if no value has been provided for $\tau_d$. This default can be adjusted by the DBA at the application initiation, to be either the start time of the transaction, or to be left as a null value and be interpreted at retrieval time according to a user-requested interpretation (e.g., $t_x$, whose value could not be used before commit time).

Set-var assigns a new value to a variable's data.
Syntax: set-var (oid, p, val, $\tau_d$, $\tau_v$)
Semantics:
$$
\begin{aligned}
\text{set-var}(oid, p, val, \tau_d, \tau_v) \equiv{} & \text{create-se}(oid, p, \text{data}, val, \tau_d', \tau_v') \mid \alpha.\text{Object-id} = oid \;\land \\
& \tau_v' := \begin{cases} [\text{now}, \infty) \cap AR(\alpha) - FR(\alpha.p) & \text{if } \tau_v = \text{nil} \\ \tau_v \cap AR(\alpha) - FR(\alpha.p) & \text{otherwise} \end{cases}
\end{aligned}
$$
The default value for $\tau_v$ in this primitive is [now, $\infty$). This default has been used by other researchers (e.g., [BZ82]), assuming that the value was not valid from $-\infty$. This default is a natural extension of the update logic in conventional databases, where a new value replaces an older one as of the time it is inserted into the database. The functions FR and AR have been defined in Section 4.1. AR returns the set of chronons in which a given object is active, and FR returns the chronons in which a given variable is frozen. The actual valid time ($\tau_v'$) is derived by intersecting $\tau_v$ with the times in which the variable can be modified, $AR(\alpha) - FR(\alpha.p)$ (the modifiable range).

The modification of the valid time provided by the user stems from considering a temporal database as a set of many conventional databases, each of which is valid in a single chronon. Consequently, an update that affects a valid time interval in a temporal database is, in fact, a set of several independent updates, where each update can either succeed or fail in a given valid time chronon. A similar approach, in different contexts, was taken in other works as well (e.g., [Sno87]).

For example, the operation set-var (oid=864545, p=Social-Security-Number, val=02345678, $\tau_d$=nil, $\tau_v$=nil), applied to the database on Dec 12, 1993; 11:30pm, results in the creation of state-element (s14) in Figure 5.
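The create-se and set-var primitives can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described in this paper: chronons are modeled as integers, temporal elements as sets of chronons, $\infty$ is approximated by a finite HORIZON, the transaction time is taken to be the chronon of the call rather than the commit time, and the association step is reduced to storing the property and bucket on the state-element. The names StateElement, Abort, create_se and set_var are hypothetical.

```python
from dataclasses import dataclass
from itertools import count

HORIZON = 100            # assumption: finite stand-in for the unbounded upper limit
_se_ids = count(1)       # hypothetical se-id generator


class Abort(Exception):
    """Signals that the enclosing transaction must abort."""


@dataclass(frozen=True)
class StateElement:
    se_id: int
    oid: int
    prop: str
    bucket: str
    val: object
    t_x: int             # transaction time (simplified: the chronon of the call)
    t_d: int             # decision time
    t_v: frozenset       # valid time, as a set of chronons


def create_se(db, oid, prop, bucket, val, t_d, t_v, now, in_domain=lambda v: True):
    """Add one state-element after the legal-temporal and legal-type checks."""
    if not t_v or t_d > now:          # legal-temporal: t_v non-empty, t_d <= now
        raise Abort("illegal temporal values")
    if not in_domain(val):            # legal-type
        raise Abort("value outside the property's domain")
    se = StateElement(next(_se_ids), oid, prop, bucket, val, now, t_d, frozenset(t_v))
    db.append(se)
    return se


def set_var(db, oid, prop, val, now, active, frozen, t_d=None, t_v=None):
    """Apply the defaults, restrict t_v to the modifiable range, then create-se."""
    t_d = now if t_d is None else t_d
    requested = set(range(now, HORIZON)) if t_v is None else set(t_v)
    actual = (requested & active) - frozen      # t_v intersected with AR, minus FR
    return create_se(db, oid, prop, "data", val, t_d, actual, now)
```

In this sketch, a call without time values at chronon 10, with the object active from chronon 5 and the variable frozen at chronons 12 and 13, yields a state-element valid from chronon 10 onward except for the two frozen chronons. If the whole requested range is frozen, the resulting empty valid time makes create_se abort.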
Freeze-var freezes a variable.
Syntax: freeze-var (oid, p, $\tau_d$, $\tau_v$)
Semantics:
$$
\begin{aligned}
\text{freeze-var}(oid, p, \tau_d, \tau_v) \equiv{} & \text{create-se}(oid, p, \text{modify-control}, \text{"frozen"}, \tau_d', \tau_v') \mid \alpha.\text{Object-id} = oid \;\land \\
& \tau_v' := \begin{cases} [\text{now}, \infty) \cap AR(\alpha) & \text{if } \tau_v = \text{nil} \\ \tau_v \cap AR(\alpha) & \text{otherwise} \end{cases}
\end{aligned}
$$
The default value for $\tau_v$ in this primitive is [now, $\infty$). The actual valid time ($\tau_v'$) is derived by intersecting $\tau_v$ with the activity range of the object. For example, the operation freeze-var (oid=864545, p=Social-Security-Number, $\tau_d$=nil, $\tau_v$=nil), applied to the database on Dec 12, 1993; 10:00pm, results in the creation of state-element (s6) in Figure 5.

Unfreeze-var unfreezes a given variable.
Syntax: unfreeze-var (oid, p, $\tau_d$, $\tau_v$)
Semantics:
$$
\begin{aligned}
\text{unfreeze-var}(oid, p, \tau_d, \tau_v) \equiv{} & \text{create-se}(oid, p, \text{modify-control}, \text{"changeable"}, \tau_d', \tau_v') \mid \alpha.\text{Object-id} = oid \;\land \\
& \tau_v' := \begin{cases} [\text{now}, \infty) \cap AR(\alpha) & \text{if } \tau_v = \text{nil} \\ \tau_v \cap AR(\alpha) & \text{otherwise} \end{cases}
\end{aligned}
$$
$\tau_v'$ is not calculated with respect to the frozen range of the variable. Thus, an unfreeze-var operation can override an earlier freeze decision. For example, the operation unfreeze-var (oid=864545, p=Record-Number, $\tau_d$=nil, $\tau_v$=nil), applied to the database on Dec 13 1993; 10:02pm, resulted in the generation of state-element (s19) in Figure 5.
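The interplay between freeze-var, unfreeze-var and the frozen range can be sketched as follows. This is an illustration only, with several assumptions of ours: chronons are integers, $\infty$ is approximated by a finite HORIZON, the modify-control bucket is a plain list of (value, decision time, valid time) entries, and FR is computed per chronon from the most recently decided entry, which is one possible reading that lets an unfreeze override an earlier freeze. All function names are hypothetical.

```python
HORIZON = 100   # assumption: finite stand-in for the unbounded upper limit


def _add_control(control, value, now, active, t_d, t_v):
    t_d = now if t_d is None else t_d
    requested = set(range(now, HORIZON)) if t_v is None else set(t_v)
    # Only the activity range restricts t_v; the frozen range is deliberately ignored.
    control.append((value, t_d, frozenset(requested & active)))


def freeze_var(control, now, active, t_d=None, t_v=None):
    _add_control(control, "frozen", now, active, t_d, t_v)


def unfreeze_var(control, now, active, t_d=None, t_v=None):
    _add_control(control, "changeable", now, active, t_d, t_v)


def frozen_range(control):
    """Chronons whose most recently decided modify-control entry is 'frozen'."""
    latest = {}                                  # chronon -> (t_d, value)
    for value, t_d, t_v in control:
        for chronon in t_v:
            if chronon not in latest or t_d > latest[chronon][0]:
                latest[chronon] = (t_d, value)
    return {c for c, (_, value) in latest.items() if value == "frozen"}
```

With this reading, freezing a variable from chronon 10 onward and later unfreezing chronons 30 to 39 leaves the variable frozen in 10 to 29 and from 40 onward.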
Object Level Update Primitives

Create-obj: creates a new object that is an instance of a given class.
Syntax: oid := create-obj (class)
Semantics:
$$
\begin{aligned}
oid := \text{create-obj}(c) \equiv{} & oid := \text{generate-obj-id}() \;\otimes\; \text{create-se-oid}(oid) \;\otimes\; \text{create-se-class-ref}(c, oid) \\
\text{create-se-oid}(oid) \equiv{} & \text{create-se}(oid, p = \text{"object-id"}, \text{data}, oid, \tau_d, \tau_v) \mid \tau_d = \text{now} \land \tau_v = [\text{now}, \infty) \\
\text{create-se-class-ref}(c, oid) \equiv{} & \text{create-se}(oid, p = \text{"class-ref"}, \text{data}, c, \tau_d, \tau_v) \mid \tau_d = \text{now} \land \tau_v = [\text{now}, \infty)
\end{aligned}
$$
The create-obj primitive creates two new state-elements. The first state-element designates an object identity; the object identity is generated by the database and is returned as a result of applying the generate-obj-id built-in function. The object identity is a frozen state-element; the frozen status is protected by a meta-data integrity constraint that prevents the change of its status. The second state-element is a reference to the class c that is given as an argument, using the object-id that was created earlier. The values of the time types of both state-elements are generated by the system and represent the system's defaults. They do not represent the object's valid-time activespan, i.e., the time during which the object exists in the modeled reality. The activespan of an object is explicitly controlled by the user, and is associated with the Object-Status variable.

For example, the operation create-obj (Patient), applied to the database on Dec 12 1993; 10:00pm, resulted in the generation of state-elements (s1) and (s2) as presented in Figure 5, and returned the value 864545.

Set-obj-status changes the object status in a given valid-time temporal element. Possible values of the object status are Active, Suspended and Disabled.
Object-Status is a special variable that cannot be handled by regular variable operations; thus it has its own set of operations, which includes set-obj-status to set the value, and freeze-obj-status and unfreeze-obj-status to freeze and unfreeze this status, respectively.
Syntax: set-obj-status (oid, sval, $\tau_d$, $\tau_v$)
Semantics:
$$
\begin{aligned}
\text{set-obj-status}(oid, sval, \tau_d, \tau_v) \equiv{} & \text{create-se}(oid, \text{"object-status"}, \text{data}, sval, \tau_d', \tau_v') \mid \alpha.\text{Object-id} = oid \;\land \\
& sval \in \{\text{"active"}, \text{"suspended"}, \text{"disabled"}\} \;\land \\
& \tau_v' := \begin{cases} [\text{now}, \infty) - FR(\alpha.\text{Object-Status}) & \text{if } \tau_v = \text{nil} \\ \tau_v - FR(\alpha.\text{Object-Status}) & \text{otherwise} \end{cases}
\end{aligned}
$$
$\tau_v'$ has a default value of the temporal element [now, $\infty$). $\tau_v'$ and $\tau_d'$ determine the object's valid-time activespan. For example, the operation set-obj-status (oid=864545, sval="Active", $\tau_d$=nil, $\tau_v$=nil), applied to the database on Dec 12 1993; 10:00pm, results in the generation of state-element (s3) in Figure 5.

Freeze-obj-status freezes the object status in a given interval.
Syntax: freeze-obj-status (oid, $\tau_d$, $\tau_v$)
Semantics:
$$
\text{freeze-obj-status}(oid, \tau_d, \tau_v) \equiv \text{freeze-var}(oid, \text{"Object-Status"}, \tau_d', \tau_v')
$$
$\tau_v'$ has a default value of the temporal element [now, $\infty$). $\tau_v'$ and $\tau_d'$ are used to determine the object's valid-time activespan. For example, the operation freeze-obj-status (oid=864545, $\tau_d$=nil, $\tau_v$=[Dec 12 1993; 10:00pm, $\infty$)), applied to the database on Aug 25 1994; 8:15am, resulted in the generation of state-element (s23) in Figure 5. This operation freezes the object status retroactively during its entire activespan.

Unfreeze-obj-status unfreezes the variable Object-Status.
Syntax: unfreeze-obj-status (oid, $\tau_d$, $\tau_v$)
Semantics:
$$
\text{unfreeze-obj-status}(oid, \tau_d, \tau_v) \equiv \text{unfreeze-var}(oid, \text{"Object-Status"}, \tau_d, \tau_v)
$$

Disable-obj changes the object status to Disabled in the interval $[t_s, \infty)$, where $t_s$ is the start time associated with the valid time, given as a parameter by the user. Only the start time of the interval is used, since this status is final in the sense that the object can never be revived again. Consequently, the end chronon is set to $\infty$.
Syntax: disable-obj (oid, $\tau_d$, $\tau_v$)
Semantics:
$$
\begin{aligned}
\text{disable-obj}(oid, \tau_d, \tau_v) \equiv{} & \alpha.\text{Object-id} = oid \;\land\; \tau_v' := \begin{cases} [\text{now}, \infty) - FR(\alpha.\text{Object-Status}) & \text{if } \tau_v = \text{nil} \\ [t_s, \infty) - FR(\alpha.\text{Object-Status}) & \text{otherwise} \end{cases} \;\otimes \\
& \tau_v' \neq [t_s, \infty) \rightarrow \text{abort} \;\otimes \\
& \text{set-obj-status}(oid, \text{"disabled"}, \tau_d, \tau_v')
\end{aligned}
$$
$\tau_v'$ receives a default value of [now, $\infty$). The disable-obj operation assumes that the object is disabled as of a certain chronon to infinity. If the object status is frozen at some chronon during the interval of the disable-obj operation, then the object-status cannot be changed in this chronon. Thus, the disable-obj operation cannot be completed and the transaction should be either aborted or treated as an exception. For example, the result of the operation disable-obj (oid=864545, $\tau_d$=nil, $\tau_v$=nil), applied to the database on Aug 25 1994; 8:05am, is the same as the freeze-obj operation, as given above.
In the general case, the disable-obj operation is not reversible. However, in exceptional cases, an authorized DBA can use the unfreeze-obj-status to reverse the disable-obj operation and "rescue" the object.
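The difference between set-obj-status, which silently skips frozen chronons, and disable-obj, which aborts if any chronon of its interval is frozen, can be sketched as follows. This is a simplified illustration: chronons are integers, $\infty$ is approximated by a finite HORIZON, the status history is a plain list, and the frozen range of Object-Status is passed in as a set. The function and class names are hypothetical.

```python
HORIZON = 100                                  # assumption: finite stand-in for infinity
STATUSES = {"active", "suspended", "disabled"}


class Abort(Exception):
    """Signals that the enclosing transaction must abort."""


def set_obj_status(history, sval, now, frozen, t_d=None, t_v=None):
    """Record a status for the requested valid time minus the frozen range."""
    if sval not in STATUSES:
        raise Abort(f"unknown status {sval!r}")
    t_d = now if t_d is None else t_d
    requested = set(range(now, HORIZON)) if t_v is None else set(t_v)
    actual = frozenset(requested - frozen)
    history.append((sval, t_d, actual))
    return actual


def disable_obj(history, now, frozen, t_d=None, t_v=None):
    """Disable from the start of t_v (or now) onward; abort on any frozen chronon."""
    start = min(t_v) if t_v else now           # only the start time of t_v is used
    wanted = set(range(start, HORIZON))
    if wanted - frozen != wanted:              # some chronon in the interval is frozen
        raise Abort("object status is frozen inside the disable interval")
    return set_obj_status(history, "disabled", now, frozen, t_d, wanted)
```

A frozen chronon before the start of the disable interval does not prevent the operation; a frozen chronon inside it does, and nothing is recorded in that case.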
4.4
Update Operation Types
The update operation types that have been discussed in Section 2 are defined using the primitives of Section 4.3. These update operation types are the only ones that are accessible to users.
Insert:
Syntax: insert (c, $\tau_{d\ell}$, $\tau_{v\ell}$, {$\nu_1, \ldots, \nu_n$}) | $\nu_i$ = ($p_i$, $val_i$, $\tau_{di}$, $\tau_{vi}$).
Semantics:
$$
\begin{aligned}
\text{insert}(c, \tau_{d\ell}, \tau_{v\ell}, \{\nu_1, \ldots, \nu_n\}) \mid \nu_i = (p_i, val_i, \tau_{di}, \tau_{vi}) \equiv{} & \text{exists-identifier}(c, \{\nu_1, \ldots, \nu_n\}) \rightarrow \text{abort} \;\otimes \\
& oid := \text{create-obj}(c) \;\otimes \\
& \text{set-obj-status}(oid, \text{"active"}, \tau_{d\ell}, \tau_{v\ell}) \;\otimes \\
& \text{set-var}(oid, p_1, val_1, \tau_{d1}, \tau_{v1}) \;\otimes \cdots \otimes\; \text{set-var}(oid, p_n, val_n, \tau_{dn}, \tau_{vn})
\end{aligned}
$$
exists-identifier is a function that takes as arguments a class id and the set of input variables, determines the object identifier (primary key) according to the class definition, and checks whether there exists an instance of the class c with the given identifier. If this function returns true then the transaction should abort. oid is set to be the new object's id, using the create-obj operation. The insert operation creates the object, using create-obj, sets its status to active, using set-obj-status, and then updates its variables, using set-var. $\tau_{d\ell}$ and $\tau_{v\ell}$ are the decision and valid times of the object's valid-time activespan, i.e., the temporal element in which the object is active. The generated oid is returned to the user.

Example: A new patient is inserted into the database. The following operation provides the patient's name.

insert (c=Patient, $\tau_{d\ell}$=Dec 12 1993; 10:00pm, $\tau_{v\ell}$=[Dec 12 1993; 10:00pm, $\infty$), {$\nu_1$=($p_1$=Patient-Name, $val_1$=Dan Cohen, $\tau_{d1}$=nil, $\tau_{v1}$=[Dec 12 1993; 10:00pm, $\infty$))})

(s1)-(s4) of Figure 5 are the state-elements added to the database as a result of this operation.

Modify:
Syntax: modify (c, obj, $\tau_{d\ell}$, $\tau_{v\ell}$, {$\nu_1, \ldots, \nu_n$}) | $\nu_i$ = ($p_i$, $val_i$, $\tau_{di}$, $\tau_{vi}$).
Semantics:
$$
\begin{aligned}
\text{modify}(c, obj, \tau_{d\ell}, \tau_{v\ell}, \{\nu_1, \ldots, \nu_n\}) \mid \nu_i = (p_i, val_i, \tau_{di}, \tau_{vi}) \equiv{} & oid := \text{identify-obj}(c, obj) \;\otimes \\
& (oid = \text{nil}) \rightarrow \text{abort} \;\otimes \\
& (\tau_{v\ell} \neq \text{nil}) \rightarrow \text{set-obj-status}(oid, \text{"active"}, \tau_{d\ell}, \tau_{v\ell}) \;\otimes \\
& \text{set-var}(oid, p_1, val_1, \tau_{d1}, \tau_{v1}) \;\otimes \cdots \otimes\; \text{set-var}(oid, p_n, val_n, \tau_{dn}, \tau_{vn})
\end{aligned}
$$
The modify operation retrieves the object identity, based on an identifier given by the user, using identify-obj. If the user assigns a value to $\tau_{v\ell}$, then it resets the object's valid-time activespan. Finally, it updates its variables, using set-var. c denotes the class-id of the object. identify-obj is a function that converts object-identifiers (primary keys) to object-identities (surrogates). If the sought object does not exist in the database, then the modify operation cannot be completed and the transaction should be either aborted or treated as an exception. If there is more than one qualifying object with the same object-identifier, then the user is prompted to decide which object is the required one.

Example: The operation

modify (c=Medical-Record, obj=12345678-1, $\tau_{v\ell}$=nil, $\tau_{d\ell}$=nil, {$\nu_1$=($p_1$=Disorder, $val_1$=partial treatment$^7$, $\tau_{d1}$=Dec 12 1993; 11:15pm, $\tau_{v1}$=[Dec 12 1993; 11:15pm, $\infty$))})

changes one of the disorder's alternatives in the Diagnosis. It generates the state-element (s10) in Figure 5.

$^7$ The medical term partial treatment refers to cases in which a treatment has not been completed, for example: a patient has failed to take the entire quantity of antibiotics assigned to him.

Suspend:
Syntax: suspend (c, obj, $\tau_{d\ell}$, $\tau_{v\ell}$).
Semantics:
$$
\begin{aligned}
\text{suspend}(c, obj, \tau_{d\ell}, \tau_{v\ell}) \equiv{} & oid := \text{identify-obj}(c, obj) \;\otimes \\
& (oid = \text{nil}) \rightarrow \text{abort} \;\otimes \\
& \text{set-obj-status}(oid, \text{"suspended"}, \tau_{d\ell}, \tau_{v\ell})
\end{aligned}
$$
The suspend operation generates a new state-element of the variable Object-Status with the value "suspended," using set-obj-status. The operation uses the object identity that is given by the identify-obj function. For example, the following operation suspends the patient Dan Cohen as an active patient in the emergency room. As a result, state-element (s20) of Figure 5 is added to the database.

suspend (c=Patient, obj=Dan Cohen, $\tau_{d\ell}$=Dec 19 1993; 8:00am, $\tau_{v\ell}$=[Dec 19 1993; 8:00am, $\infty$)).

Resume:
Syntax: resume (c, obj, $\tau_{d\ell}$, $\tau_{v\ell}$, {$\nu_1, \ldots, \nu_n$}) | $\nu_i$ = ($p_i$, $val_i$, $\tau_{di}$, $\tau_{vi}$).
Semantics:
$$
\begin{aligned}
\text{resume}(c, obj, \tau_{d\ell}, \tau_{v\ell}, \{\nu_1, \ldots, \nu_n\}) \mid \nu_i = (p_i, val_i, \tau_{di}, \tau_{vi}) \equiv{} & oid := \text{identify-obj}(c, obj) \;\otimes \\
& (oid = \text{nil}) \rightarrow \text{abort} \;\otimes \\
& \text{set-obj-status}(oid, \text{"active"}, \tau_{d\ell}, \tau_{v\ell}) \;\otimes \\
& \text{set-var}(oid, p_1, val_1, \tau_{d1}, \tau_{v1}) \;\otimes \cdots \otimes\; \text{set-var}(oid, p_n, val_n, \tau_{dn}, \tau_{vn})
\end{aligned}
$$
For example, the following operation resumes the patient Dan Cohen as an active patient when he is admitted again to the emergency room.

resume (c=Patient, obj=Dan Cohen, $\tau_{d\ell}$=Aug 24 1994; 12:00am, $\tau_{v\ell}$=[Aug 24 1994; 12:00am, $\infty$))

As a result, state-element (s21) of Figure 5 is added to the database.

Disable:
Syntax: disable (c, obj, $\tau_d$, $\tau_v$)
Semantics:
$$
\begin{aligned}
\text{disable}(c, obj, \tau_d, \tau_v) \equiv{} & oid := \text{identify-obj}(c, obj) \;\otimes \\
& (oid = \text{nil}) \rightarrow \text{abort} \;\otimes \\
& \text{disable-obj}(oid, \tau_d, \tau_v)
\end{aligned}
$$
In non-temporal databases, when an object is deleted, its information is removed from the database. In temporal databases, historical information is kept and the user can retrieve the contents of each object that was disabled, during its activity range. Moreover, modifications to the state of the object at times before it was disabled are allowed. For example, it is possible to retroactively update a medical record in the period it was open, during the time in which the record is already closed. The semantics of the disable operation is compatible with the "real world semantics," since it is possible that new information is discovered after an object is no longer in the active domain.

Freeze:
Syntax: freeze (c, obj, $\tau_{d\ell}$, $\tau_{v\ell}$, {$\nu_1, \ldots, \nu_n$}) | $\nu_i$ = ($p_i$, $\tau_{di}$, $\tau_{vi}$).
Semantics:
$$
\begin{aligned}
\text{freeze}(c, obj, \tau_{d\ell}, \tau_{v\ell}, \{\nu_1, \ldots, \nu_n\}) \mid \nu_i = (p_i, \tau_{di}, \tau_{vi}) \equiv{} & oid := \text{identify-obj}(c, obj) \;\otimes \\
& (oid = \text{nil}) \rightarrow \text{abort} \;\otimes \\
& (\tau_{v\ell} \neq \text{nil}) \rightarrow \text{freeze-obj-status}(oid, \tau_{d\ell}, \tau_{v\ell}) \;\otimes \\
& \text{freeze-var}(oid, p_1, \tau_{d1}, \tau_{v1}) \;\otimes \cdots \otimes\; \text{freeze-var}(oid, p_n, \tau_{dn}, \tau_{vn})
\end{aligned}
$$
A freeze operation can be applied to a single chronon, to an interval or to the entire variable history. This operation can be applied to non-temporal databases as well, such that a freeze operation always refers to the current state. For example, the following operation freezes the Social-Security-Number of Dan Cohen.

freeze (c=Patient, obj=Dan Cohen, $\tau_{d\ell}$=nil, $\tau_{v\ell}$=nil, {$\nu_1$=($p_1$=Social-Security-Number, $\tau_{d1}$=Dec 12 1993; 10:00pm, $\tau_{v1}$=[Dec 12 1993; 10:00pm, $\infty$))})

As a result, state-element (s6) of Figure 5 is added to the database.

Unfreeze:
Syntax: unfreeze (c, obj, $\tau_{d\ell}$, $\tau_{v\ell}$, {$\nu_1, \ldots, \nu_n$}) | $\nu_i$ = ($p_i$, $\tau_{di}$, $\tau_{vi}$).
Semantics:
$$
\begin{aligned}
\text{unfreeze}(c, obj, \tau_{d\ell}, \tau_{v\ell}, \{\nu_1, \ldots, \nu_n\}) \mid \nu_i = (p_i, \tau_{di}, \tau_{vi}) \equiv{} & oid := \text{identify-obj}(c, obj) \;\otimes \\
& (oid = \text{nil}) \rightarrow \text{abort} \;\otimes \\
& (\tau_{v\ell} \neq \text{nil}) \rightarrow \text{unfreeze-obj-status}(oid, \tau_{d\ell}, \tau_{v\ell}) \;\otimes \\
& \text{unfreeze-var}(oid, p_1, \tau_{d1}, \tau_{v1}) \;\otimes \cdots \otimes\; \text{unfreeze-var}(oid, p_n, \tau_{dn}, \tau_{vn})
\end{aligned}
$$
An unfreeze operation eliminates the "freeze" constraint (if it exists) for the specified valid time. For example, the following operation unfreezes the Record-Number variable.

unfreeze (c=Medical-Record, obj=12345678-1, $\tau_{d\ell}$=nil, $\tau_{v\ell}$=nil, {$\nu_1$=($p_1$=Record-Number, $\tau_{d1}$=Dec 13 1993; 10:00pm, $\tau_{v1}$=[Dec 13 1993; 10:00pm, $\infty$))})

As a result, state-element (s19) of Figure 5 is added to the database.
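The way the operation types above chain low-level primitives, with an abort cutting off all subsequent steps, can be sketched as follows for insert and modify. This is an illustration under simplifying assumptions: the Database class below only records which primitives were called, the primary key is passed explicitly instead of being derived from the input variables by exists-identifier, and an abort is modeled as a Python exception without any rollback. All names are hypothetical.

```python
from itertools import count


class Abort(Exception):
    """Signals that the enclosing transaction must abort."""


class Database:
    """Records primitive calls instead of storing state-elements (illustration only)."""

    def __init__(self):
        self.log = []             # primitive calls, in the order they were applied
        self.keys = {}            # (class, primary key) -> oid
        self._oids = count(1)

    def create_obj(self, cls):
        oid = next(self._oids)
        self.log.append(("create-obj", cls, oid))
        return oid

    def set_obj_status(self, oid, sval, t_d, t_v):
        self.log.append(("set-obj-status", oid, sval, t_d, t_v))

    def set_var(self, oid, prop, val, t_d, t_v):
        self.log.append(("set-var", oid, prop, val, t_d, t_v))


def insert(db, cls, key, t_d, t_v, variables):
    if (cls, key) in db.keys:                    # exists-identifier -> abort
        raise Abort("identifier already exists")
    oid = db.create_obj(cls)
    db.keys[(cls, key)] = oid
    db.set_obj_status(oid, "active", t_d, t_v)
    for prop, val, var_t_d, var_t_v in variables:
        db.set_var(oid, prop, val, var_t_d, var_t_v)
    return oid


def modify(db, cls, key, t_d, t_v, variables):
    oid = db.keys.get((cls, key))                # identify-obj
    if oid is None:
        raise Abort("no such object")
    if t_v is not None:                          # reset the activespan only on request
        db.set_obj_status(oid, "active", t_d, t_v)
    for prop, val, var_t_d, var_t_v in variables:
        db.set_var(oid, prop, val, var_t_d, var_t_v)
    return oid
```

Because each step is an ordinary statement, raising Abort in an early step prevents the later primitives from being applied, which mirrors the role of the separator between successive operations.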
Revise:
Syntax: revise (c, obj, $\tau_{d\ell}$, $\tau_{v\ell}$, {$\gamma_1, \ldots, \gamma_n$}) | $\gamma_i$ = ($\nu_i$, $sq_i$), $\nu_i$ = ($p_i$, $val_i$, $\tau_{di}$, $\tau_{vi}$)
Semantics:
$$
\begin{aligned}
\text{revise}(c, obj, \tau_{d\ell}, \tau_{v\ell}, \{\gamma_1, \ldots, \gamma_n\}) \mid{} & \gamma_i = (\nu_i, sq_i),\; \nu_i = (p_i, val_i, \tau_{di}, \tau_{vi}) \equiv \\
& oid := \text{identify-obj}(c, obj) \;\otimes \\
& (oid = \text{nil}) \rightarrow \text{abort} \;\otimes \\
& (\tau_{v\ell} \neq \text{nil}) \rightarrow \text{set-obj-status}(oid, \text{"active"}, \tau_{d\ell}, \tau_{v\ell}) \;\otimes \\
& val_i \neq \text{nil} \rightarrow \text{modify}(c, obj, \tau_{d\ell}, \tau_{v\ell}, \{\nu_1, \ldots, \nu_n\}) \;\otimes \\
& \forall se_i \in sq_1 \cup \cdots \cup sq_n : \text{create-se}(oid, p_i, \text{void-SE}, se_i, \tau_{di}, \tau_{vi})
\end{aligned}
$$
The revise operation replaces existing values with new ones, voiding the old values. Each revised value may cause the revision of multiple state-elements, selected by a selection query $sq_i$. A revise operation can affect more than one state-element in the following cases:
1. The valid time of the correction covers the valid time of several existing state-elements.
2. A change from a multi-valued semantics to a unique value semantics requires voiding several state-elements.

The revise operation has two parts. The first part adds state-elements with new values if there is at least one value that is not nil. If this part is not activated, then the state-elements are voided without replacing them with new values. The second part uses a selection query $sq_i$ for each revised value to locate the state-elements that should be voided, and voids these state-elements, or any part of their validity time that is specified by the $\tau_v$ variable.

For example, the operation revise (c=Patient, obj=Dan Cohen, {$\gamma_1$=($p_1$=Social-Security-Number, $val_1$=02345678, $\tau_{d1}$=Dec 12 1993; 11:30pm, $\tau_{v1}$=[Dec 12 1993; 11:30pm, $\infty$), $sq_1$=select the state-element with value=12345678)}), applied in a transaction that committed at Dec 12 1993; 11:33pm, resulted in the creation of state-elements (s13) and (s14) in Figure 5.

The revise operation allows the replacement of a frozen value, marking it as an erroneous one. The revise operation is necessary, along with the modify operation, in order to make a semantic distinction between a change in the real world and a correction of a wrong value that was reported to the database. The default retrieve operations exclude revised values (this default can be overridden).
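The two parts of revise can be sketched as follows. This is a simplified illustration, not the implementation described in this paper: state-elements are plain dictionaries, a selection query is a Python predicate, voiding always covers a whole state-element rather than part of its validity time, and the selection is evaluated before the corrected value is added so that the replacement cannot select itself. The names add_se, revise and current_values are hypothetical.

```python
from itertools import count

_se_ids = count(1)   # hypothetical se-id generator


def add_se(db, oid, prop, bucket, val, t_d, t_v):
    """Stand-in for create-se: append a state-element without any checks."""
    se = {"se_id": next(_se_ids), "oid": oid, "prop": prop,
          "bucket": bucket, "val": val, "t_d": t_d, "t_v": t_v}
    db.append(se)
    return se


def revise(db, oid, prop, new_val, t_d, t_v, select):
    """Add the corrected value (if any) and void the selected state-elements."""
    victims = [se for se in db
               if se["oid"] == oid and se["prop"] == prop
               and se["bucket"] == "data" and select(se)]
    if new_val is not None:                      # first part: the new value
        add_se(db, oid, prop, "data", new_val, t_d, t_v)
    for se in victims:                           # second part: void-SE entries
        add_se(db, oid, prop, "void-SE", se["se_id"], t_d, t_v)


def current_values(db, oid, prop):
    """Default retrieval in this sketch: data values that have not been voided."""
    voided = {se["val"] for se in db if se["bucket"] == "void-SE"}
    return [se["val"] for se in db
            if se["oid"] == oid and se["prop"] == prop
            and se["bucket"] == "data" and se["se_id"] not in voided]
```

Note that nothing is deleted: the erroneous value stays in the database and is only excluded by the default retrieval, so a query language could still report that a value was revised.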
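An additional use of the revise operation is to void state-elements without replacing them. In this case, $\nu_i$ = nil and only the second part of the revise operation is applied.

Set-SVS:
Syntax: set-SVS (c, obj, $\tau_{d\ell}$, $\tau_{v\ell}$, {$\nu_1, \ldots, \nu_n$}) | $\nu_i$ = ($p_i$, $svs_i$, $qid_i$, $\tau_{di}$, $\tau_{vi}$)
Semantics:
$$
\begin{aligned}
\text{set-SVS}(c, obj, \tau_{d\ell}, \tau_{v\ell}, \{\nu_1, \ldots, \nu_n\}) \mid{} & \nu_i = (p_i, svs_i, qid_i, \tau_{di}, \tau_{vi}) \equiv \\
& oid := \text{identify-obj}(c, obj) \;\otimes \\
& (oid = \text{nil}) \rightarrow \text{abort} \;\otimes \\
& (\tau_{v\ell} \neq \text{nil}) \rightarrow \text{set-obj-status}(oid, \text{"active"}, \tau_{d\ell}, \tau_{v\ell}) \;\otimes \\
& \text{create-se}(oid, p_1, \text{variable-SVS}, (svs_1, qid_1), \tau_{d1}, \tau_{v1}) \;\otimes \cdots \otimes \\
& \text{create-se}(oid, p_n, \text{variable-SVS}, (svs_n, qid_n), \tau_{dn}, \tau_{vn})
\end{aligned}
$$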
The set-SVS command sets the SVS interpretation of one or more variables that belong to the same object. The interpretation consists of two parts: the SVS keyword (first, last, all, single, multi) and a query id. A query id is meaningful only when the single or multi keywords are used; otherwise it is ignored.

4.5
Discussion
The update operation types are used as a uniform linguistic abstraction that supports any type of database update, for the data and control parts. The Insert operation type creates a new object; it can also update the data bucket of the variables in the created object. The Modify operation updates the data bucket of the variables in an existing object. Their semantics are an extended version of the semantics of these operation types in regular databases. The extended semantics follow the temporal database's structure. These operations are implemented using the set-var operation. The Suspend, Resume, and Disable operations affect the object-status. The Resume operation can also be used to update the data bucket of the variables that belong to the object it resumes. The Freeze and Unfreeze operations update the modify-control buckets of variables that belong to a given object; they use the freeze-var and unfreeze-var operations. The Revise operation updates the data bucket of the revised variable, and marks the revised state-elements in the Void-SE bucket. Note that the Revise semantics does not use set-var, but instead uses state-element operations directly. This is done to bypass the frozen constraint, if it exists, because it is possible to revise any state-element, even if its variable is frozen. The Set-SVS operation sets an SVS interpretation that overrides the static interpretation at the variable level.

This set of update operation types is a minimal set, but it is not necessarily the set that is appropriate for each application. It is possible to eliminate certain operations (e.g., to disallow the Revise operation in applications that do not support revisions) or to construct new operations using the low-level primitives. For example, combining Modify and Freeze in a single operation would make it possible to update values and freeze them using a single linguistic primitive.
A formal definition of a new update operation type can be based on the predefined low-level primitives and should consider the following issues:
1. Is the update operation type applied with respect to the frozen range of the variable, the changeable range of the variable, or both?
2. What are the appropriate defaults for $t_v$ and $t_d$?
3. What are the constraints whose violation leads to a transaction failure?
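As an illustration of how such a new operation type might be assembled, the following sketch combines Modify and Freeze for a single variable. It is a hypothetical design, not one defined in this paper, and it answers the three questions above in one particular way: the update applies only to the changeable range, the defaults are those of set-var ($t_d$ = now, $t_v$ = [now, $\infty$)), and the transaction fails if the object is unknown or if nothing remains to update outside the frozen range. Chronons are integers, $\infty$ is approximated by HORIZON, the activity range is ignored, and all names are assumptions of ours.

```python
HORIZON = 100   # assumption: finite stand-in for the unbounded upper limit


class Abort(Exception):
    """Signals that the enclosing transaction must abort."""


class Variable:
    def __init__(self):
        self.data = []        # (val, t_d, t_v) entries of the data bucket
        self.frozen = set()   # frozen range, kept directly as a set of chronons


def set_var(var, val, now, t_d=None, t_v=None):
    t_d = now if t_d is None else t_d
    requested = set(range(now, HORIZON)) if t_v is None else set(t_v)
    actual = requested - var.frozen              # changeable part of the request
    if not actual:
        raise Abort("nothing left to update outside the frozen range")
    var.data.append((val, t_d, frozenset(actual)))
    return actual


def freeze_var(var, t_v):
    var.frozen |= set(t_v)


def modify_and_freeze(objects, key, prop, val, now, t_d=None, t_v=None):
    """Update a variable, then freeze exactly the chronons that were written."""
    obj = objects.get(key)
    if obj is None:
        raise Abort("unknown object")
    written = set_var(obj[prop], val, now, t_d, t_v)
    freeze_var(obj[prop], written)
    return written
```

Freezing only the chronons that were actually written is a design choice of this sketch; an application could equally freeze the whole requested range.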
5
Implementation Issues
Several implementation issues are discussed in this section. Section 5.1 discusses alternatives for implementing the additional functionalities. Section 5.2 discusses the implementation of decision time as a primitive. Section 5.3 discusses the implementation in a temporal relational model. Section 5.4 discusses the mapping of the proposed model to TSQL2, and proposes some changes to TSQL2 in order to facilitate the support for the extended functionality.
5.1
The Implementation Alternatives
The functionality discussed in this paper does not exist in TSQL2 or in any other temporal language at the primitive level. The implementation alternatives are:
• using the proposed primitives as system design tools, using the existing database primitives at the database implementation level;
• developing a wrapper based on the temporal infrastructure, whose primitives are compatible with the primitives presented in this paper;
• devising a separate implementation model bypassing the temporal database infrastructure, for the use of applications that require the extended functionalities.

The first alternative cannot satisfy this study's objectives; the use of the existing primitives would make writing programs that satisfy these extended functionalities tedious, hard to verify, and ad hoc. The third alternative of devising a new implementation model is consistent with our objectives and can result in optimized performance. The construction of a standard model that combines the desired object-oriented and temporal features is a major task for future research and development in the temporal database community in general, and we intend to base our further implementation on such a model. Our current prototype implementation is based on a relational database, using a subset of TSQL2. In general, we propose to implement our model as a wrapper on top of a TSQL2 implementation.
5.2
The Implementation of the Decision Time Primitive
The following discussion is relevant for applications that require the decision time functionality. We argued that the decision time in some applications is indispensable in determining the correct order of real-world events, and in making decision analysis inferences. The implementation choices are whether to implement decision time as an additional time dimension, or to try to achieve this functionality in another way. The decision time has two major impacts on the model's representation and semantics:
• It adds an additional chronon to each state-element;
• The function ASE that selects the valid value according to the last value SVS may employ the decision time and not the transaction time to determine the last value (the same may apply for the first value SVS).
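The second impact can be illustrated with a small sketch in which the last value selection orders the candidates either by decision time or by transaction time. The sketch rests on assumptions of ours: chronons are integers, the CSE set is taken to be simply the state-elements whose valid time contains the requested chronon, and the names StateElement, cse and ase are hypothetical.

```python
from collections import namedtuple

# t_x: transaction time, t_d: decision time, t_v: valid time as a set of chronons
StateElement = namedtuple("StateElement", "se_id val t_x t_d t_v")


def cse(elements, t):
    """Candidate state-elements: here, those valid at chronon t (a simplification)."""
    return [se for se in elements if t in se.t_v]


def ase(elements, t, use_decision_time=True):
    """Last value selection, ordered by t_d or, if t_d is not supported, by t_x."""
    candidates = cse(elements, t)
    if not candidates:
        return None
    order = (lambda se: se.t_d) if use_decision_time else (lambda se: se.t_x)
    return max(candidates, key=order)
```

When a decision made earlier is committed later than a more recent decision, the two orderings select different state-elements, which is the situation in which decision time determines the correct order of real-world events.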
It is possible to emulate the decision time functionality, without using an explicit time type, by adding objects that designate decisions,$^8$ and using the beginning of the $t_v$ interval of their variables to denote decision time. Such a solution is proposed in [OS95], and it complies with the desire not to add more primitives. However, we argue that this requirement is general, and important enough to have a direct representation. Using decision-time objects is too cumbersome, even at the logical level.
• From the space complexity point of view, adding the decision time to the state-element level requires substantially less space than the creation of a redundant object;
• From the time complexity point of view, having the decision time available at the state-element level is less expensive than joining the state-element and the decision-related object;
• From the development and maintenance point of view, it is clearer to the user since the decision-related object is not a concrete object in the application's domain.

This analysis leads to the conclusion that the support of decision time as a model primitive is cost effective in cases in which this functionality is required. If the decision time functionality is not required, we may eliminate the space overhead by supporting an initialization parameter that eliminates the decision time. In this case, the decision time support is an optional feature selected at the initialization time of the application's schema. If decision time is not selected, then no space is reserved for $t_d$ at the state-element's level and the transaction time ($t_x$) replaces the decision time ($t_d$) in the interpretation of the ASE retrieval function. An existing application can be converted to include decision time, such that the value of $t_x$ will be used for any missing value of $t_d$.

5.3
Implementation of the Model
The update functionality presented in this paper is "data model independent" in the sense that it can be implemented on various data models. Although a natural implementation is in an object-oriented model, a standard object-oriented temporal data model does not exist. We therefore restrict this discussion to the relational model and TSQL2. The structure defined in this paper can be trivially mapped into the nested relational model [RKS88], which has been suggested in [Tan86] as a basis for temporal database implementation. Mapping the data structure into a "flat" relational model requires the use of normalization rules. The implementation in the temporal model is not unique. A possible implementation can use universal relations as discussed in [N+87]. Another possible implementation uses the ENF (Extension Normal Form), which is an extension of the TNF (Time Normal Form) [NA89], as follows.

Each relation designates a set of synchronous attributes, which are attributes that have common state-element's additional information ($t_d$, $t_v$, etc.) at any chronon. We extend the definition of TNF to include all the additional information in a state-element, rather than just the $t_v$. A relation that represents an atomic combination of property and bucket consists of values and the state-element's additional information (without the revised-se component). A state-element of a set property is represented by several tuples with the same se-id. Each tuple is identified by both the se-id and its value.

The implementation using the ENF blurs the original schema structure. Thus, the relationship among a class and its properties is represented using an additional relation for each bucket. Another relation stores the classification of objects into classes. Each relation represents a single combination of property and bucket, with the state-element's additional information. Note that in this particular example, all attributes are asynchronous. Object-id-data, Class-ref-data, and Object-status-data are system variables. Treatment-Data is a user-defined property.

The creation of a state-element involves the addition of new tuple(s) to the appropriate property-bucket relation. This representation is restricted since the $t_v$ can be an interval but not a temporal element. To eliminate this restriction, a separate relation for the $t_v$ element should be created, identified by the state-element-id and the interval values. Redundant timestamping exists in $t_x$ when multiple state-elements are updated in the same transaction, in $t_v$ when multiple state-elements have the same valid time, and in $t_d$ when multiple state-elements have the same decision times. Furthermore, there can be an overlap among all of them, e.g., $t_x = t_d = t_s(t_v)$.

$^8$ Recall that 'decision' is a generic reference to the real-world event that led to the database transaction. In many applications it represents an actual decision.
A possible space optimization feature is the enumeration of chronons and its use instead of the full representation; this, however, requires a conversion table from the enumeration to time-stamps, increasing the retrieval time.
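The trade-off of this optimization can be illustrated with a minimal sketch. The class name ChrononTable and its methods are hypothetical and are not part of the model; the sketch only assumes that time-stamps are comparable values and that the conversion table is kept sorted.

```python
from bisect import bisect_left
from datetime import datetime


class ChrononTable:
    """Hypothetical conversion table between time-stamps and chronon numbers."""

    def __init__(self, timestamps):
        # Sorted, de-duplicated time-stamps; the list index is the chronon number.
        self._stamps = sorted(set(timestamps))

    def encode(self, stamp):
        # Binary search for the enumeration number of a known time-stamp.
        i = bisect_left(self._stamps, stamp)
        if i == len(self._stamps) or self._stamps[i] != stamp:
            raise KeyError(stamp)
        return i

    def decode(self, number):
        # The extra lookup that is paid at retrieval time.
        return self._stamps[number]
```

Tuples would then store the small integer returned by encode instead of the full time-stamp, and every retrieval that has to display a time value pays one decode lookup.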
5.4 Supporting The Extended Update Functionality With TSQL2
In this section we present a mapping of the update functionality to TSQL2, and the additional clauses required to augment TSQL2.
Mapping to TSQL2
TSQL2 supports a bitemporal database environment, and uses temporal clauses as an extension of SQL-92. It is sufficient to map the create-se operation. The rest of the operations are translated to create-se as shown in Section 3. For the translation we assume that we have an underlying ENF bitemporal relational database with TSQL2 embedded within a host language that controls integrity constraints and aborts the transaction when necessary.
create-se(oid, p, $\beta$, val, $\tau_d$, $\tau_v$, s) $\equiv$
    INSERT INTO p-$\beta$
    VALUES (NEW, val, oid, $\tau_d$, s)
    VALID TIMESTAMP $\tau_v$
oid is the object id. val is the set of all values that share the same state-element's additional information (e.g., tv, td, etc.). s can be either "changeable" or "frozen." Since tv is part of the temporal infrastructure schema, it is updated using the TSQL2 feature VALID TIMESTAMP and not as part of the VALUES clause. The retrieval selections CSE and ASE can be easily expressed by TSQL2 queries as well.
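The effect of the mapping on an ENF database can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: EnfStore and Row are hypothetical names, relations are plain in-memory lists, chronons are integers, the valid time is a single half-open interval, and the transaction time (which the system would assign at commit) is left out.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    se_id: int      # state-element id (the NEW value in the mapping)
    value: object   # one value of the state-element
    oid: str        # object id
    td: int         # decision time, as a chronon number
    status: str     # "changeable" or "frozen"
    tv: tuple       # valid-time interval (start, end), end exclusive


class EnfStore:
    """Toy stand-in for an ENF database: one relation per property-bucket."""

    def __init__(self):
        self.relations = {}
        self._ids = itertools.count(1)

    def create_se(self, oid, prop, bucket, values, td, tv, status="changeable"):
        if status not in ("changeable", "frozen"):
            raise ValueError(status)
        se_id = next(self._ids)
        relation = self.relations.setdefault(f"{prop}-{bucket}", [])
        # A set-valued state-element becomes several tuples sharing one se-id.
        relation.extend(Row(se_id, v, oid, td, status, tv) for v in values)
        return se_id
```

For instance, store.create_se("patient-1", "Treatment", "Data", ["antibiotics", "fluids"], td=5, tv=(10, 20)) would add two tuples with the same se-id to a relation named Treatment-Data; the object id and the values are made up for the example.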
A Proposal to Augment TSQL2
In order to support the functionality described in this paper in a convenient way, the following features should be supported at the primitive level. This can be done as a shell on top of TSQL2 [S+94], or as a direct extension to the TSQL2 language.
1. A mechanism for handling simultaneous values is required. This mechanism should include new functions and defaults to support the retrieval of simultaneous values. These functions consist of the CSE and ASE functions.
2. A third time type (decision time) that reflects the correct order of occurrences in the modeled reality is needed.
3. A mechanism for freezing an object's values and enforcing freezing constraints should be added.
4. A correction operation, semantically distinct from modification, should be introduced.
5. Clauses that represent the functionality of the update operation types would make the update language more powerful.
We suggest including the following new clauses in the extension of TSQL2.
SUSPEND p: This clause would have a similar effect to the suspend primitive presented in Section 4. The disable primitive can use the semantics of the DELETE clause of TSQL2. The use of delete as an alias for disable is necessary to guarantee the compatibility of TSQL2 with SQL-92. It should be noted that TSQL2 permits changes to existing tuples even after the transaction commits. This can make it impossible to restore all past states of the database. For example, a DELETE operation in a bitemporal database changes the tx according to the parameter given in this clause. Since deletions can be proactive and retroactive as well as current, the time of issuing the DELETE operation is not known after the modification. Consequently, queries with a viewpoint earlier than the time of change cannot be answered.
RESUME p: This clause would have a similar effect to the resume operation type that was presented in Section 4. The clause: RESUME p VALID TIMESTAMP $\tau_v$ would affect an existing tuple, and change its validity interval.
FREEZE: This clause would have a similar effect to the freezing operation type, as presented in Section 4. For example, FREEZE VARIABLES (var1, ..., varn) OF p VALID TIMESTAMP $\tau_v$. The FREEZE clause would freeze a set of variables in a given valid time interval, but it can be effective only when it is combined with a mechanism that enforces the frozen range of a variable.
UNFREEZE: This clause would have a similar effect to the unfreeze operation type that was presented in Section 4. For example, the clause: UNFREEZE VARIABLES (var1, ..., varn) OF p VALID TIMESTAMP $\tau_v$ would unfreeze a set of variables in a given valid time interval.
REVISE ... WITH: This clause would have a similar effect to the revise operation type that was presented in Section 4. For example, the clause: REVISE se WITH VALUE (val) would revise the state-element se with the value val.
SET-SVS: This clause would have a similar effect to the set-svs operation type that was presented in Section 4. For example, the clause: SET-SVS var WITH VALUE (val) USING QUERY (qid) would set the SVS value and associated query of var.
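The enforcement mechanism that the FREEZE clause depends on is not specified here. One possible shape, given purely as a sketch, is a registry of frozen valid-time ranges that is consulted before every update. The name FreezeRegistry, its methods, the use of integer chronons, and the half-open intervals are all assumptions made for the illustration.

```python
class FreezeRegistry:
    """Hypothetical enforcement layer for FREEZE / UNFREEZE over [start, end)."""

    def __init__(self):
        self._frozen = {}   # variable name -> list of frozen (start, end) intervals

    def freeze(self, variables, start, end):
        for var in variables:
            self._frozen.setdefault(var, []).append((start, end))

    def unfreeze(self, variables, start, end):
        # Cut [start, end) out of every frozen interval of each variable.
        for var in variables:
            kept = []
            for s, e in self._frozen.get(var, []):
                if s < start:
                    kept.append((s, min(e, start)))
                if e > end:
                    kept.append((max(s, end), e))
            self._frozen[var] = kept

    def check_update(self, var, start, end):
        # Reject an update whose valid time overlaps a frozen range.
        for s, e in self._frozen.get(var, []):
            if s < end and start < e:
                raise PermissionError(f"{var} is frozen in [{s}, {e})")
```

A host-language shell would call check_update before translating a modification into create-se, and abort the transaction when the check fails.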
6 Conclusion
This work extends temporal database functionality to accommodate complex applications. These extensions are a step towards bridging the gap between temporal database capabilities and the needs of real-life applications. The results presented in this paper yield a model that supports flexible interpretation of simultaneous values semantics as an integral part of a temporal database. This functionality facilitates the database modeling and manipulation of real-world concepts. The main contribution of this paper is the construction of a model that supports extended update features at both the schema level and the update operation level. The features include simultaneous values semantics, modify control, and revision control, all of which are required because of the simultaneous values capability of temporal databases. The case study has exemplified the need for such a model in a decision analysis system; however, these functionalities can be used for other types of systems. For example, they can be used to tailor a data model's capabilities to an application's needs, by adjusting the meta-data property class-ref to allow single or multiple classification of an object, and fixed or variable classification of objects. The model presented in this paper includes a third time type, called decision time, that maintains the correct order of events in the modeled reality. This time type is essential for many types of applications, and is optionally supported as a model primitive. The system designer can choose whether this feature is included at the application's initialization time.
The proposed update functionality is data model independent, and thus it can be designed as a shell on top of existing data models. A mapping of the update primitives to TSQL2 was described, as well as a list of extensions to TSQL2 required for a more complete temporal database functionality. A prototype of this system is currently being developed. This prototype is to be used in a simulation project in a hospital's training center. Further research will deal with data modeling implementation on top of an object oriented model, the impact of simultaneous values on schema versioning, and investigation of applying research that has been done in the artificial intelligence area about possible world semantics and belief revision to extend this model.
Acknowledgments
The case study was established with the help of Gilad Rosenberg M.D. We thank the reviewers for many helpful comments.
References
A+79. V. De Antonellis et al. Extending the entity-relationship approach to take into account historical aspects of systems. In Proceedings of the International Conference on the E-R Approach to Systems Analysis and Design. North Holland, 1979.
ABN87. T. Abbod, K. Brown, and H. Noble. Providing time-related constraints for conventional database systems. In Proceedings of the 13th International Conference on VLDB, pages 167-175, Brighton, 1987.
AKG91. S. Abiteboul, P. Kanellakis, and G. Grahne. On the representation and querying of sets of possible worlds. Theoretical Computer Science, 78, 1991.
Ari86. G. Ariav. A temporally oriented data model. ACM Transactions on Database Systems, 11(4):499-527, Dec 1986.
Bra78. J. Bradely. Operations in databases. In Proceedings of the Fourth International Conference on VLDB, W. Berlin, 1978.
BZ82. J. Ben-Zvi. The Time Relational Model. PhD thesis, Computer Science Department, UCLA, 1982.
CC87. J. Clifford and A. Crocker. The historical relational data model (hrdm) and algebra based on lifespans. In Proceedings of the International Conference on Data Engineering, pages 528-537, Feb 1987.
CK86. G.P. Copeland and S. Khoshafian. Object identity. In Proceedings of Object Oriented Programming Systems, Languages and Applications. ACM, 1986.
CK94. S. Chakravarthy and S.-K. Kim. Resolution of time concepts in temporal databases. Information Sciences, 80(1-2):43-89, Sept. 1994.
CT85. J. Clifford and A. U. Tansel. On an algebra for historical relational databases: two views. In Proceedings of the ACM SIGMOD, pages 247-265, May 1985.
EGS92. O. Etzion, A. Gal, and A. Segev. Temporal support in active databases. In Proceedings of the Workshop on Information Technologies & Systems (WITS), pages 245-254, Dec 1992.
EW90. R. Elmasri and G. Wuu. A temporal model and query language for ER database. In Proceedings of the International Conference on Data Engineering, pages 76-83, Feb 1990.
F+94. R. Fagin et al. Reasoning About Knowledge. MIT Press, Cambridge, MA, 1994.
FD71. N. Findler and D. Chen. On the problems of time retrieval, temporal relations, causality and coexistence. In Proceedings of the International Conference on Artificial Intelligence. Imperial College, Sep 1971.
Gad88. S.K. Gadia. The role of temporal elements in temporal databases. Data Engineering Bulletin, 7:197-203, 1988.
GE98. A. Gal and O. Etzion. A multi-agent update process in a database with temporal dependencies and schema versioning. IEEE Transaction on Knowledge and Data Engineering, 10(1), February 1998.
GES94. A. Gal, O. Etzion, and A. Segev. Representation of highly-complex knowledge in a database. Journal of Intelligent Information Systems, 3(2):185-203, Mar 1994.
HK87. R. Hull and R. King. Semantic database modeling: Survey, application and research issues. ACM Computing Surveys, 19(3):201-260, Sep 1987.
J+94. C.S. Jensen et al. A consensus glossary of temporal database concepts. ACM SIGMOD Record, 23(1):52-63, 1994.
KL83. M.R. Klopprogge and P.C. Lockmann. Modeling information preserving databases; consequences of the concept of time. In Proceedings of the International Conference of VLDB, Florence, Italy, 1983.
Kli93. N. Kline. An update of the temporal database bibliography. ACM SIGMOD Record, 22(4):66-80, December 1993.
McK88. E. McKenzie. An Algebraic Language for Query and Update of Temporal Databases. PhD thesis, Computer Science Department, University of North Carolina in Chapel Hill, Sep 1988.
MS91. E. McKenzie and R. Snodgrass. An evaluation of relational algebras incorporating the time dimension in databases. ACM Computer Surveys, 23(4):501-543, Dec 1991.
N+87. B.A. Nixon et al. Design of a compiler for a semantic data model. Technical Report CSRI-44, Computer Systems Research Institute, University of Toronto, May 1987.
NA89. S.B. Navathe and R. Ahmed. A temporal relational model and a query language. Information Sciences, 49:147-175, 1989.
OS95. G. Ozsoyoglu and R. Snodgrass. Temporal and real-time databases: A survey. IEEE Transaction on Knowledge and Data Engineering, 1995.
Pis94. N. Pissinou. Towards an infrastructure for temporal databases--A workshop report. ACM SIGMOD Record, 23(1):35, 1994.
RKS88. M.A. Roth, H.F. Korth, and A. Silberschatz. Extended algebra and calculus for nested relational databases. ACM Transactions on Database Systems, 13(4):390-417, Dec 1988.
RS91. E. Rose and A. Segev. Toodm-a temporal, object-oriented data model with temporal constraints. In Proceedings of the International Conference on the Entity-Relationship Approach, pages 205-229, San Mateo, California, 1991.
S+94. R. Snodgrass et al. TSQL2 language specification. ACM SIGMOD Record, 23(1):65-86, Mar 1994.
SA86. R. Snodgrass and I. Ahn. Temporal databases. IEEE Computer, 19:35-42, Sep 1986.
Sar93. N.L. Sarda. HSQL: Historical query language. In Temporal Databases, chapter 5, pages 110-140. The Benjamin/Cummings Publishing Company, Inc., Redwood City, CA., 1993.
SJS95. A. Segev, C.J. Jensen, and R. Snodgrass. Report on the 1995 international workshop on temporal databases. ACM SIGMOD Record, 24(4):46-52, Dec 1995.
SK86. A. Shoshani and K. Kawagoe. Temporal data management. In Proceedings of the International Conference of VLDB, pages 79-88, Aug 1986.
Sno87. R. Snodgrass. The temporal query language TQUEL. ACM Transactions on Database Systems, 12(2):247-298, June 1987.
Soo91. M.D. Soo. Bibliography on temporal databases. ACM SIGMOD Record, 20(1):14-24, 1991.
SS88. A. Segev and A. Shoshani. The representation of a temporal data model in the relational environment. Technical Report LBL-25461, Lawrence Berkeley Laboratories, Aug 1988. Invited Paper to the 4th International Conference on Statistical and Scientific Database Management.
Tan86. A.U. Tansel. Adding time dimension to relational model and extending relational algebra. Information Systems, 11(4):343-355, 1986.
TCG+93. A.U. Tansel, J. Clifford, S. Gadia, S. Jajodia, A. Segev, and R. Snodgrass. Temporal Databases. The Benjamin/Cummings Publishing Company, Inc., Redwood City, CA., 1993.
TK96. V.J. Tsotras and A. Kumar. Temporal database bibliography. ACM SIGMOD Record, 25(1):41-51, March 1996.
WJL91. G. Wiederhold, S. Jajodia, and W. Litwin. Dealing with granularity of time in temporal databases. In R. Anderson et al., editors, Lecture Notes in Computer Science 498, pages 124-140. Springer-Verlag, 1991.
ZP93. E. Zimanyi and A. Pirotte. Imperfect knowledge in databases. In P. Smets and A. Motro, editors, Proceedings of the Workshop on Uncertainty Management in Information Systems: From Needs to Solutions, pages 136-186, Santa Catalina, CA., Apr 1993.
On Transaction Management in Temporal Databases Avigdor Gal* Department of Computer Science University of Toronto
Abstract. A transaction model provides a framework for concurrent processing of retrieval and update operations in a database. Considerable research effort has focused on various techniques and protocols to ensure the ACID properties of transactions in conventional databases. However, the adaptation of these techniques and protocols to temporal databases is not trivial. In particular, a refined locking mechanism based on temporal characteristics can provide better concurrency among transactions in temporal databases than a conventional locking mechanism. Accordingly, this paper presents a set of modifications and fine-tuning of traditional concepts in transaction management, to enable better performance of temporal databases. We also suggest a scheme for implementing a transaction protocol for temporal databases on top of a relational database. The contribution of the paper is in identifying the unique properties of transaction management in temporal databases and the use of these properties to provide a refined locking mechanism to enhance the concurrency of such databases. In particular, we show that the classic 2PL mechanism cannot ensure serializability in temporal databases. Instead, we suggest an alternative method to ensure serializability and reduce redundant abort operations, which is based on a temporal serializability graph.
Keywords: temporal databases, transaction management
1 Introduction
A transaction model provides a framework for concurrent processing of retrieval and update operations in a database. A conventional transaction model ensures the following properties (ACID):
Atomicity: Either all the operations of a transaction are properly reflected in the database or none are.
Consistency: Execution of a transaction in isolation preserves the consistency of the database.
* The work was conducted while the author was at the University of Toronto. He is currently at the MSIS Department, Rutgers University, 94 Rockafeller Road, Piscataway, NJ 08854-8054
O. Etzion, S. Jajodia, and S. Sripada (Eds.): Temporal Databases - Research and Practice, LNCS 1399, pp. 96-114, 1998. © Springer-Verlag Berlin Heidelberg 1998
Isolation: Each transaction assumes that it is executed alone. Any intermediate transaction results are not available to other concurrently executed transactions.
Durability: The values changed by the transaction persist after the transaction has successfully completed.
Considerable research has been dedicated to various techniques and protocols to ensure the ACID properties of transactions in conventional databases, e.g. the locking mechanism and the 2PL (Two Phase Locking) protocol, using serializability as a correctness criterion. However, adapting these techniques to temporal databases [27], i.e. databases that enable the accumulation of information over time and provide the capability to store different values of the same data element with different time characteristics, is not trivial. When adapting conventional techniques to accommodate the needs of temporal databases, a refined locking mechanism based on temporal characteristics should be designed, to provide better concurrency among transactions in temporal databases. Also, conventional protocols cannot efficiently support transactions in temporal databases. For example, as suggested in [21] and demonstrated in this paper, the classic 2PL mechanism cannot ensure serializability in temporal databases. Therefore, the use of either a strict 2PL or serial transaction processing is required, when using conventional methods, to prevent non-serializable transaction management in temporal databases. This paper presents a set of modifications and fine-tuning of traditional concepts in transaction management, which are required for better performance of temporal databases. To exemplify these modifications, we provide a scheme for implementing a temporal transaction protocol on top of a relational database model. The approach of using add-on temporal facilities with an existing conventional database model is nowadays considered the most suitable approach to provide temporal capabilities in databases [28].
The contribution of the paper lies in identifying the unique properties of transaction management in temporal databases and the use of these properties to provide a refined locking mechanism to enhance transactions' concurrent execution in such databases. In particular, we provide an alternative method to 2PL, based on a temporal serializability graph, to ensure concurrency while reducing the number of redundant abort operations. The issue of transaction modelling for temporal databases was suggested as one of the challenges for further research at the NSF International Workshop on an Infrastructure for Temporal Databases [4] and was first introduced in [21] and [30]. While the former relates to a transaction time temporal database only, the latter uses a simplified temporal data model and therefore results in a much simpler transaction model. In particular, the temporal database in [30] does not support transaction time and is not append-only. Some consideration of the issue of using commit time as a transaction time was given in [8], [19], and [28]. While several previous studies have discussed the refinement of transaction models (e.g. SAGAS [14] and ACTA [6]), none of them relates specifically to the unique properties of temporal databases. Nonetheless, it is worth noting
that an extended model like SAGAS can serve as an underlying model for implementing better transaction models for temporal databases by using temporal independence and the refined locking mechanism presented in this paper. Most transaction models deal with time by using histories and time stamps as useful tools for ensuring serializability, and some research has been done on querying transaction logs to obtain temporal-oriented information [3]. Yet, these time considerations provide a different dimension from the one we handle in this paper, i.e. providing temporal databases with a coherent transaction model. Time stamping mechanisms for ensuring serializability were discussed in the framework of conventional databases [2], and some research was even dedicated to multiversion systems [16]. While this area of research bears similarity to the research presented in this paper, several major differences exist. First, time stamping does not provide temporal capabilities on top of a conventional database. Second, a transaction in some temporal database types (e.g. bi-temporal databases) is time stamped at commit time, rather than at the beginning of its execution. Therefore, as we show in this paper, the assumptions that hold for a time stamping mechanism are not valid for transactions in bi-temporal databases.
The rest of the paper is organized as follows. Section 2 provides a data model and an execution model of a temporal database that is utilized throughout the paper. A transaction model for temporal databases is introduced in Section 3, followed by a scheme for implementing a temporal transaction protocol on top of a relational database model (Section 4). Section 5 concludes the paper.
2 A data model for temporal databases
This section introduces the basic concepts of a data model for temporal databases. The terminology is based on [10], and it uses a semantic data model, which is more adequate for representing sets of sets, a common requirement in temporal databases. The generic model can be easily translated into a relational as well as an object-based data model (see [10] for details). An object is defined as an instance of a class or a tuple in a relation, and a property is defined as an attribute in the object-based model and a column in the relational model. The term class defines either a class in the object-based model, or a relation in the relational model.
Let $DBS = \{C_1, C_2, \ldots, C_m\}$ be a database schema that consists of $m$ classes. A class $C_i$ has $n_i$ properties $P^i_1, P^i_2, \ldots, P^i_{n_i}$, each with a domain $Dom(P^i_j)$, where a domain is a set of values. An instance of a property $P^i_j$ is an element of the set $Dom(P^i_j)$, represented as $\alpha.P^i_j$, where $\alpha$ is an object identifier instance of the appropriate class, a class name, or a variable. A class domain of a class $C_i$ ($CDOM(C_i)$) is a subset of the Cartesian product $Dom(P^i_1) \times Dom(P^i_2) \times \cdots \times Dom(P^i_{n_i})$. An object state $os$ of an instance $o$ of a class $C_i$ at time $t$ is an element $(p_1, p_2, \ldots, p_{n_i}) \in CDOM(C_i)$. An application state at $t$ is a set $\{os(o) \mid o$ is an instance of $C_i$ $(1 \le i \le m)$ at $t\}$.
Following previous works in the temporal database area, we adopt a discrete model of time [7], isomorphic to the natural numbers. Hence, a temporal domain is a domain $T \subseteq \mathbb{N}$. The discrete model defines a chronon [17] to be a nondecomposable unit of time ($t \in T$), whose granularity is application dependent. A time interval is designated as $[t_s, t_e)$, the set of all chronons $t$ such that $t_s \le t < t_e$. A temporal element [9] is a finite union of disjoint time intervals.
The temporal infrastructure document [26] advocates a bi-temporal database model, in which each data element is associated with two temporal dimensions, called valid time and transaction time. A valid time ($v$) is a temporal element that designates the collection of chronons at which the data element is considered to be true in the modeled reality. A transaction time ($x$) is a chronon that designates the time at which the transaction that inserted the data element's value into the database was committed. Therefore, in a bi-temporal database, a domain $Dom(P^i_j)$ of an attribute $P^i_j$ is the Cartesian product of three domains, one of which is the value domain of the property, while the other two are temporal domains.
Information about an object is maintained as a set of variables (instances of the class's properties), where each variable contains information about the history of the values of the property. Each variable is represented using a set of state-elements, where a state-element $se$ is an element of a domain that consists of a value ($se.value$) and temporal characteristics ($se.v$ and $se.x$ in the bi-temporal case). The following definition provides some properties of sets of state-elements:
Definition 1. Let $SE_1$ and $SE_2$ be two sets of state-elements of a variable $\alpha.P$:
- $SE_1$ and $SE_2$ are identical iff $\forall 1 \le i, j \le 2\ (\forall se \in SE_i\ \exists se' \in SE_j \mid se.value = se'.value \wedge se.v = se'.v \wedge se.x = se'.x)$.
- $SE_1$ and $SE_2$ are similar iff $\forall 1 \le i, j \le 2\ (\forall \{se_1, se'_1\} \subseteq SE_i\ \exists \{se_2, se'_2\} \subseteq SE_j \mid$
  1. $se_1.value = se_2.value \wedge se_1.v = se_2.v\ \wedge$
  2. $se'_1.value = se'_2.value \wedge se'_1.v = se'_2.v\ \wedge$
  3. $se_1.x \circ se'_1.x \Rightarrow se_2.x \circ se'_2.x \quad (\circ \in \{<, =, >\}))$
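The two notions of Definition 1 can be checked mechanically. The following brute-force sketch is only an illustration: the class StateElement and the functions identical and similar are hypothetical names, valid times are modelled as sets of integer chronons, and a pair is allowed to consist of the same state-element twice.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class StateElement:
    value: object
    v: frozenset   # valid time: a set of chronons (a temporal element)
    x: int         # transaction time: the commit chronon


def identical(se1, se2):
    # Same values, valid times and transaction times on both sides.
    return set(se1) == set(se2)


def _order(a, b):
    # -1, 0 or 1 for a < b, a = b, a > b.
    return (a > b) - (a < b)


def _covered(a, b):
    # Every pair in a has a counterpart pair in b with equal (value, v)
    # components and the same relative order of transaction times.
    for p, q in product(a, repeat=2):
        if not any(
            (p.value, p.v) == (r.value, r.v)
            and (q.value, q.v) == (s.value, s.v)
            and _order(p.x, q.x) == _order(r.x, s.x)
            for r, s in product(b, repeat=2)
        ):
            return False
    return True


def similar(se1, se2):
    # The definition quantifies over both directions (i, j in {1, 2}).
    return _covered(se1, se2) and _covered(se2, se1)
```

Two sets that hold the same values and valid times but were committed at chronons 5, 7 and 6, 9 respectively would be similar without being identical, while swapping the commit order breaks similarity.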
Based on Definition 1, the similarity of two sets of state-elements identifies two sets that consist of the same information and were committed in the same order, yet in different chronons.¹ Various database models accumulate state-elements in different ways. Some models (e.g. TALE [13]) follow the append-only approach, according to which new information is added while existing information is left intact. Other models (e.g. [1]) follow the alternative approach, according to which a new state-element of a variable in a valid time $\tau$ replaces any other state-element in $\tau$. Many hybrids exist between these two extremes. If the data model supports the append-only approach, previously inserted state-elements can be accessed using an observation time abstraction (see below). Temporal relationships for retrieval and update purposes are specified through the use of participants, variables with valid time binding, of the form $(\alpha.P, \tau)$.
¹ In [18], value equivalence is suggested, where all temporal characteristics are stripped off. This is, however, a different notion from similarity.
A variable followed by a $\tau$ defines the state-elements that are retrieved by the operation, or the bounded effect of the operation on the generation of new state-elements. An append-only temporal database is updated by adding state-elements to variables, therefore generating a new application state each time a state-element is added. Each state-element is associated with two temporal values: one specifies its valid time and the other is the transaction time, set upon a successful termination of a transaction, where a transaction is defined using the classical definition (e.g. [29]) and refined in the following section. To ensure durability, the values that are changed by the transaction persist after a commit command is issued. It is worth noting that all the state-elements that were generated by a single transaction share the same transaction time.
Information retrieval from a database is done by retrieving state-elements that persist in the database. The following parameters define the set of state-elements which are considered for retrieval:
1. The required variable(s), i.e. a specific object and a list of properties.
2. A temporal element that specifies the required valid time.
3. A chronon that specifies an observation time of the query.
An observation time defines a previous state of the database, rather than the current one, to be the retrieved state. Selecting an observation time $t_o < now$ results in selecting only state-elements that are known at $t_o$, i.e. have persisted in the database no later than $t_o$ ($\{se \mid se.x \le t_o\}$). It is worth noting that the observation time is restricted to be less than the chronon in which a query was issued. A related type of query retrieves previously inserted values of a variable $\alpha.P$ in $\tau$. Here, instead of specifying an observation time, a version number is utilized. For example: "retrieve the value of $\alpha.P$ in $\tau$ that was inserted $i$ versions ago" ($i > 0$). We term these queries version queries.
Let $Q$ be a query for a variable $\alpha.P$ in $\tau$ as of $t_o$. $Q$ is formalized as $Q = \sigma_{X = \alpha.P \,\wedge\, v \cap \tau \neq \emptyset \,\wedge\, x \le t_o}(Q')$, where $X$ is the set of attributes that represents the object identifier, and $Q'$ is the query part that is associated with the nontemporal aspects of $Q$. $Q$ returns the state-elements $\{se_1, \ldots, se_n\}$ of $\alpha.P$ such that $\forall 1 \le i \le n,\ se_i.v \cap \tau \neq \emptyset \wedge se_i.x \le t_o$. It is worth noting that some queries may require the use of the full set of state-elements of a variable. For example, the query "find the chronons in which the price of a share x increases within a day after an increase in petrol prices" requires the use of the full set of state-elements of the share x and the petrol prices. It is also noteworthy that in order to enable an observation time specification, a participant can be extended to be a triplet $(\alpha.P, \tau, t_o)$, where the last component represents the observation time. It is possible to define a preference criterion to provide a partial order among state-elements. A preference criterion that chooses the value(s) valid in an interval $\tau$ for which a variable $\alpha.P$ has overlapping values can be based on several preference relations. For example, let $se_i$ and $se_j$ be two state-elements
of a variable $\alpha.P$ that are candidates for retrieval. $se_j$ is preferred to $se_i$ iff $se_j.x > se_i.x$. This preference criterion (denoted "last value semantics") is the common one in temporal databases and we shall use it as the default criterion. Therefore, a variable for which no observation time is specified (i.e. there is no value specified for $t_o$) is assumed to require, for each chronon, the state-element that has a higher $x$ value than all the other candidate state-elements.
3 A transaction model for temporal databases
In this section we provide a transaction model for temporal databases. Section 3.1 provides the modifications of the basic concepts of transaction modelling. Based on these modifications we provide a temporal transaction model in Section 3.2.
3.1 Modification of basic concepts of transaction modelling
A transaction in a temporal database, just like a transaction in a conventional database, is a set of database operations that the database views as a single unit of work. However, all database operations in a temporal database are associated with a temporal element that defines their temporal effect on the database (see the definition of a participant in Section 2). We limit our discussion to database operations only, although a transaction may consist of external routines,² as we are mainly interested in the transaction's effect on the database. In this section we provide the required modifications to transaction modelling in temporal databases.
² For example, in DB2 a transaction is defined as "a set of interactions between an application and the database." [5]
Atomicity and recovery
This section discusses atomicity and recovery. We present temporal independence as a new form of atomicity, and discuss various recovery mechanisms, including transaction aborts, cascading aborts, aggressive and conservative protocols, and a redo mechanism. A transaction in a conventional database is atomic, i.e. its database operations either occur in their entirety or do not occur at all, and if they occur, nothing else apparently went on during the time of their occurrence [29]. There are many possible temporal extensions to the atomicity property. The two extremes result in two types of atomic behaviour of a transaction, as follows:
Global atomicity: The atomicity as perceived in conventional databases.
Temporal independence: The temporal database is conceptually viewed as a set of independent database snapshots, each of which relates to a different chronon. Hence, a transaction in a temporal database is viewed as a collection of transactions applied to different snapshots, and therefore a transaction can commit in one chronon and abort in another. The effect of temporal independence is materialized in a preprocessing phase, during which a transaction submitted by the user is partitioned into a set of transactions, each relating to a single schema version, that are executed according to a set of algorithms as proposed in [11].
The main motivation for introducing temporal independence is to provide the user with an adequate mechanism to support database operations in a temporal database with schema versioning. A temporal database accommodates schema versioning if it supports modifications to the database schema, as well as database operations that should be interpreted in the context of the appropriate schema version, which is not necessarily the current one [18]. The persistence of all schema versions guarantees correct interpretation of historical data, since each update operation $o$ is considered with respect to the schema(ta) that is (are) correct in the valid time as given in the participants of $o$. Therefore, by using global atomicity, if $o$ cannot be performed with respect to any of the involved schemata, the transaction aborts. Consequently, redundant aborts may occur due to the user's ignorance with respect to the metadata modifications. By using temporal independence, on the other hand, a transaction in a temporal database is treated as a syntactic substitution for representing several snapshot transactions, not bounded by global atomicity rules. Temporal independence, therefore, supports the maximal possible changes to snapshots, while maintaining the database consistency.

To ensure atomicity, the DBMS should use a recovery scheme. Since each state-element is stamped with a transaction time at commit time, the most natural policy to adopt is the No-steal policy, according to which no state-elements are written to the database at least until the commitment of the modifying transaction. Therefore, we can assume that all the state-elements that are generated by a transaction persist only at commit time, after a time stamp was chosen. Hence, whenever a transaction $T$ aborts, all the state-elements that were generated by $T$ are not added to the database (no-undo policy). It is possible that, for various reasons (e.g. shortage of main memory), some of the state-elements are written to the database (using the Steal policy), to be replaced at a later time by adding the transaction time. In this case, these state-elements should be erased to ensure a correct recovery process. Since an append-only database simulates the shadowing strategy, by keeping all of its previous states, erasing these state-elements restores the previous database state, and ensures the database consistency.³

³ A less powerful argument, regarding media failures only, was presented in [20].

The problem of cascading rollbacks exists in temporal databases (and can be prevented by using a strict protocol), yet its scope can be narrowed by refining the notion of conflicting operations. This refinement also serves to enable a better concurrency of transactions, as discussed in the sequel. In temporal databases, the occurrence of a deadlock situation is less likely than in conventional databases, since temporal databases store and use more information about each property, and therefore there is a reduced probability of having two concurrent transactions trying to lock the same item (see Section 3.1 for the locking mechanism in temporal databases). Based on results in conventional databases, the use of an aggressive protocol is preferable to the use of a conservative one in temporal databases.

The redo mechanism of a conventional database is not adequate for temporal databases. In conventional databases we can scan through the log and update each value of a committed transaction (e.g. [22]). In temporal databases, however, this simple mechanism might generate two similar sets of state-elements in case a system failure occurs while updating the database with the updates of a committed transaction. Such a duplication is likely to affect the database's retrieval results in situations where the number of state-elements is used for view purposes (e.g. averaging the values of a variable at a given chronon). To overcome this problem, we suggest registering the transaction time on the log when a transaction is committed. In the recovery process, a new state-element of a variable $\alpha.P$ will be generated based on the information on the log only if there is no identical state-element in the database for $\alpha.P$. Using this scheme, the state-elements will be recovered as a whole (including the original transaction time) rather than generating a similar set of state-elements.

Temporal locks. A common mechanism to ensure serializability of transactions is the locking mechanism. In this section we discuss a refinement of the conventional locking mechanism, to enable a more flexible transaction management, using the unique properties of temporal databases.

Definition 2. - A temporal read lock: A transaction $T$ in a temporal database holds a temporal read lock from time $t_l$ until time $t_u$ on a variable $\alpha.P$ in $\tau$ (denoted as $trlock(\alpha.P, \tau)$) iff no transaction can update $\alpha.P$ in $\tau$ in the time interval $[t_l, t_u)$.

Definition 3. - A temporal write lock: A transaction $T$ in a temporal database holds a temporal write lock from time $t_l$ until time $t_u$ on a variable $\alpha.P$ in $\tau$ (denoted as $twlock(\alpha.P, \tau)$) iff $T$ is the only transaction that can update $\alpha.P$ in $\tau$ in the time interval $[t_l, t_u)$.
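The redo scheme suggested earlier in this section (register the transaction time on the log at commit, and regenerate a state-element only if no identical one is already stored) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the `StateElement` record, its field names, and the use of an in-memory set as the "database" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateElement:
    """Hypothetical record layout for one state-element of a variable."""
    variable: str    # e.g. "alpha.P"
    value: int
    valid_from: int  # valid time interval [valid_from, valid_to)
    valid_to: int
    tx_time: int     # transaction time, registered on the log at commit


def redo(log, database):
    """Replay committed log records, skipping elements already stored."""
    for element in log:
        # Equality includes tx_time, so a replayed element is recognised
        # as identical to the one written before the failure.
        if element not in database:
            database.add(element)
    return database


log = [
    StateElement("alpha.P", 10, 1, 5, tx_time=100),
    StateElement("alpha.P", 12, 3, 8, tx_time=101),
]
# Simulate a failure after only the first element reached the database.
database = {log[0]}
redo(log, database)
redo(log, database)  # a second replay adds nothing
print(len(database))
```

Because the logged transaction time is part of the comparison, replaying the log any number of times leaves exactly one copy of each state-element, which is the property the averaging example above depends on.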
It is worth noting that there are two different time dimensions in the above definitions. $\tau$ is a temporal element that relates to the valid time of a state-element, while $[t_l, t_u)$ defines the time in the real world when other transactions are prohibited from reading/writing $\alpha.P$ in $\tau$. A transaction $T$ can request a $trlock(\alpha_k.P_l, \tau_q)$ or a $twlock(\alpha_k.P_l, \tau_q)$. Both types of locks are released with an $unlock(\alpha_k.P_l, \tau_q)$ request. As in conventional databases, we assume that each time a twlock is applied to a variable $\alpha.P$ in $\tau$, a unique function associated with that lock produces a new state-element for $\alpha.P$ in $\tau$. That function depends on all the variables which were locked using trlock prior to the unlocking of $\alpha.P$ in $\tau$. Also, we assume that a trlock applied to a variable $\alpha.P$ in $\tau$ does not modify $\alpha.P$ in $\tau$. We do not assume, however, that a write lock of a variable implies that it is read.

As a final note, we draw attention to the fact that while in conventional databases a write lock of an element $A$ prevents further read locks to $A$ before the write lock is released, there are situations where a write lock does not prevent a read lock. These situations involve the usage of previous application states using an observation time. Since previous application states cannot be modified in an append-only database, any retrieval operation that involves an application state that precedes the starting of the transaction can be performed at any time during the transaction processing, without a need for a lock, even if the variable is write-locked at that time [21].
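The temporal read and write locks defined above can be illustrated with a small lock table in which two locks conflict only when they concern the same variable, their valid times overlap, and at least one of them is a write lock. This is a sketch under simplifying assumptions: a temporal element is reduced to a single half-open interval of integer chronons, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lock:
    tx: str
    variable: str
    start: int  # valid-time interval [start, end)
    end: int
    mode: str   # "read" or "write"


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals overlap iff each starts before the other ends."""
    return a_start < b_end and b_start < a_end


class TemporalLockTable:
    def __init__(self):
        self.locks = []

    def _grant(self, tx, variable, start, end, mode):
        for held in self.locks:
            if held.tx == tx or held.variable != variable:
                continue
            if not overlaps(held.start, held.end, start, end):
                continue  # disjoint valid times never conflict
            if held.mode == "write" or mode == "write":
                return False  # conflict: the caller has to wait
        self.locks.append(Lock(tx, variable, start, end, mode))
        return True

    def trlock(self, tx, variable, start, end):
        return self._grant(tx, variable, start, end, "read")

    def twlock(self, tx, variable, start, end):
        return self._grant(tx, variable, start, end, "write")

    def unlock(self, tx, variable, start, end):
        self.locks = [
            held for held in self.locks
            if (held.tx, held.variable, held.start, held.end)
            != (tx, variable, start, end)
        ]


def needs_read_lock(observation_time, tx_start_time):
    """Reads of application states older than the transaction need no lock."""
    return observation_time is None or observation_time >= tx_start_time


table = TemporalLockTable()
print(table.twlock("T1", "alpha.P", 0, 10))   # granted
print(table.trlock("T2", "alpha.P", 5, 15))   # overlaps the write lock
print(table.trlock("T2", "alpha.P", 10, 20))  # disjoint valid time
```

The design choice to key conflicts on the pair (variable, valid time) rather than on the variable alone is what allows two transactions to work on the same variable concurrently when their valid times are disjoint.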
Conflicting operations. The common model in conventional databases defines conflicts among read and write operations of the same item, and uses locking as a mechanism to prevent a non-serializable schedule as a result of such conflicts. Read locks are considered to be shared, i.e. a read lock on an item $A$ prevents any other transaction from writing a new value to $A$, yet any number of transactions can hold a read lock on $A$. A write lock, however, is considered to be exclusive in the sense that while a transaction holds a write lock on an item $A$, no other transaction can read from or write to $A$. As discussed in this section, a refinement of the notion of a conflict is required when discussing temporal databases.

In temporal databases, conflicts may occur among two read or write operations only if they relate to the same variable $\alpha.P$ with an overlapping valid time $\tau$. As discussed in Section 3.2, and following similar mechanisms in conventional databases, there can be no RR conflict in temporal databases, yet there exists a WR conflict. However, unlike conventional databases, a WW conflict cannot always be solved by identifying useless transactions⁴ in append-only temporal databases. For example, if a transaction uses an observation time to retrieve state-elements from the database, "useless transactions" in a conventional database become "useful transactions" as their values might serve in a future read operation. Since transactions are time stamped on commit time ($x$), their effect on further retrievals in the database depends on the order of the transactions' commit commands. For example, if two transactions $T_1$ and $T_2$ attempt to write concurrently to a variable $\alpha.P$ values $val_1$ and $val_2$ with valid times $\tau_1$ and $\tau_2$, respectively, such that $\tau_1 \cap \tau_2 = \tau \neq \emptyset$, then both values persist, yet only one value is retrieved using the last value semantics. If $T_1$ commits before $T_2$, $val_2$ will be the retrieved value of $\alpha.P$ in $\tau$, and vice versa. It is also possible, under such circumstances, to generate a history that would not be serializable. For example, let $T_1$ and $T_2$ be two transactions, and consider the following history:

       T1                          T2
(1)    write $(\alpha.P, \tau)$
(2)                                read $(\alpha.P, \tau)$
(3)                                write $(\alpha.P, \tau)$
(4)                                commit
(5)    commit

⁴ A useless transaction is a transaction whose effect on the database is lost due to later values written to the database [25].
Since $T_2$ is committed before $T_1$, a serialized execution should be $T_2 \rightarrow T_1$, and therefore $T_2$ cannot use the value of $\alpha.P$ in $\tau$ as written by $T_1$. In Section 3.2 we shall show that, due to such scenarios, a 2PL protocol cannot guarantee serializability in a temporal database.

3.2 A temporal transaction model
Having defined the required refinements of conventional terminology to support the temporal dimension, this section presents a temporal transaction model using schedules and a temporal serializability test. We use the convention that a serializable schedule of executed operations ensures the consistency and isolation properties, and show that while a 2PL cannot guarantee serializability in bi-temporal databases, a strict 2PL guarantees serializability. We also provide a new protocol, the commit/abort/wait protocol, to minimize the number of aborted transactions. In what follows, a transaction is either a transaction as submitted by a user (if using global atomicity) or a transaction as produced by a pre-processing step (if using temporal independence, as defined in Section 3.1). A schedule $S = (a_1, \ldots, a_n)$ for a set of transactions $T_1, \ldots, T_m$ is an ordered set of operations of $T_1, \ldots, T_m$ such that $a_i = T_j : (tr/tw)lock(\alpha.P, \tau)$ or $a_i = T_j : unlock(\alpha.P, \tau)$. The following definition defines equivalence of schedules, using the available sets of state-elements.
Definition 4. - Equivalence of schedules: Two schedules $S_1$ and $S_2$ are equivalent if:
1. For each variable $\alpha.P$, $S_1$ and $S_2$ produce similar sets of state-elements.
2. Each temporal read lock of a variable $\alpha.P$ in $\tau$ applied by a given transaction occurs in $S_1$ and $S_2$ at times when $\alpha.P$ has similar sets of state-elements in $\tau$.
A weaker definition of equivalence of schedules utilizes the last value semantics as a comparison mechanism, rather than set similarity. This weaker definition converges to the equivalence definition of schedules in conventional databases. As explained in Section 3.1, the granularity of locks in temporal databases involves a temporal element as well as a variable. Therefore, some modifications are required to a precedence graph in order to identify whether a given set of transactions is serializable or not.
Definition 5. - A temporal serializability graph: Let $S = (a_1, \ldots, a_n)$ be a schedule for a set of transactions $T_1, \ldots, T_m$. A temporal serializability graph $G(V, E)$ is a polygraph such that:
- $V = \{T_1, \ldots, T_m\}$
- $E$ is generated as follows:
  1. WR conflict: an edge $((T', T''), \tau)$ is generated if:
     Write lock: $\exists a_i = T' : twlock(\alpha.P, \tau') \wedge$
     Read lock: $\exists a_j = T'' : trlock(\alpha.P, \tau'') \wedge$
     Write lock precedes read lock: $i < j \wedge$
     Valid time overlap: $\tau' \cap \tau'' = \tau \neq \emptyset \wedge$
     No intermediate conflicting lock: $\forall i < k < j, (a_k \neq T^* : twlock(\alpha.P, \tau^*) \vee (a_k = T^* : twlock(\alpha.P, \tau^*) \wedge \tau \cap \tau^* = \emptyset))$
  2. WW/RW conflict: an edge pair $(((T^*, T'), \tau), ((T'', T^*), \tau))$ is generated if:
     Existing edge: $\exists ((T', T''), \tau''') \in E \wedge$
     Conflicting item: $\exists \alpha.P \, (\exists a_i = T' : twlock(\alpha.P, \tau') \wedge \exists a_j = T'' : trlock(\alpha.P, \tau'') \wedge i < j \wedge \tau' \cap \tau'' = \tau''' \neq \emptyset) \wedge$
     Another write lock: $\exists a_k = T^* : twlock(\alpha.P, \tau^*) \wedge \tau''' \cap \tau^* = \tau \neq \emptyset$.

According to Definition 5, an edge (or a pair of edges) of the temporal serializability graph connects two transactions only if the destination of the edge can only be performed after the source of the edge. This can occur in the following two situations:
1. A transaction $T''$ reads a value that was written by a transaction $T'$ with an intersecting valid time. Therefore, in a serial schedule $T'$ commits before $T''$.
2. A transaction $T^*$ writes a value to a variable $\alpha.P$ in a valid time that intersects with a valid time of $\alpha.P$ that is part of a WR conflict between two transactions $T'$ and $T''$. In this case, $T^*$ can commit either before $T'$ or after $T''$.

Definition 5 takes into account the temporal effect, and therefore there should be an overlapping of the locked temporal elements to generate a dependency. It is worth noting that since retrievals of past application states (using observation times) are not involved in any conflict, they do not require a read lock and therefore do not affect the transactions' priority. However, the order of writing state-elements of the same variable with an overlapping valid time generates a WW conflict. This conflict prevents an erroneous interpretation of version queries.
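The WR rule of Definition 5 can be sketched as a scan over a schedule. The sketch below is illustrative only and rests on assumptions not found in the text: a schedule is a list of `(transaction, operation, variable, valid_time)` tuples, a valid time is a `frozenset` of integer chronons, and the intermediate-lock test is read as "no later write lock on the same variable overlaps the shared valid time". The WW/RW edge pairs are not generated here.

```python
def wr_edges(schedule):
    """Return the WR edges ((writer, reader), tau) of a schedule."""
    edges = set()
    for j, (reader, op_j, var_j, tau_read) in enumerate(schedule):
        if op_j != "trlock":
            continue
        for i in range(j):
            writer, op_i, var_i, tau_write = schedule[i]
            if op_i != "twlock" or var_i != var_j or writer == reader:
                continue
            tau = tau_write & tau_read  # valid time overlap
            if not tau:
                continue
            # An intermediate write lock on the same variable that overlaps
            # tau means the reader does not see this writer's value there.
            intervening = any(
                op_k == "twlock" and var_k == var_j and (tau_k & tau)
                for (_, op_k, var_k, tau_k) in schedule[i + 1:j]
            )
            if not intervening:
                edges.add(((writer, reader), tau))
    return edges


tau = frozenset({1, 2, 3})
schedule = [
    ("T1", "twlock", "alpha.P", tau),
    ("T1", "unlock", "alpha.P", tau),
    ("T2", "trlock", "alpha.P", tau),
    ("T2", "twlock", "alpha.P", tau),
    ("T2", "unlock", "alpha.P", tau),
]
print(wr_edges(schedule))
```

For this schedule the only edge produced runs from `T1` to `T2`, reflecting that `T2` reads the value `T1` wrote in an overlapping valid time. If the two locks had disjoint valid times, no edge would be generated.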
Definition 6. - A temporal cycle: Let $G(V, E)$ be a temporal serializability graph and let $G'$ be a graph that is derived from $G$ by choosing a single edge of each pair. A temporal cycle in $G'$ is a sequence $(((T_1, T_2), \tau_1), ((T_2, T_3), \tau_2), \ldots, ((T_n, T_1), \tau_n))$ such that $\bigcap_{i=1}^{n} \tau_i \neq \emptyset$.
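A temporal cycle differs from an ordinary cycle in that the valid times labelling its edges must share at least one chronon. The following sketch searches a derived graph $G'$ for such a cycle. It assumes edges are given as `((source, destination), valid_time)` pairs with `frozenset` valid times; the function name is hypothetical, and the plain depth-first search is chosen for clarity, not efficiency.

```python
def has_temporal_cycle(edges):
    """True if some cycle's edge labels have a non-empty intersection."""
    out = {}
    for (src, dst), tau in edges:
        out.setdefault(src, []).append((dst, frozenset(tau)))

    def dfs(start, node, common, visited):
        for nxt, tau in out.get(node, []):
            # Running intersection of the labels along the current path.
            shared = tau if common is None else common & tau
            if not shared:
                continue  # labels no longer intersect: not a temporal cycle
            if nxt == start:
                return True
            if nxt not in visited and dfs(start, nxt, shared, visited | {nxt}):
                return True
        return False

    return any(dfs(start, start, None, {start}) for start in out)


overlapping = [
    (("T1", "T2"), frozenset({1, 2})),
    (("T2", "T1"), frozenset({2, 3})),
]
disjoint = [
    (("T1", "T2"), frozenset({1})),
    (("T2", "T1"), frozenset({3})),
]
print(has_temporal_cycle(overlapping), has_temporal_cycle(disjoint))
```

The second graph contains an ordinary cycle, but its labels do not intersect, so it is not a temporal cycle in the sense of Definition 6.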
Theorem 1. Let $T_1, T_2, \ldots, T_m$ be $m$ transactions with transaction times $x_1, x_2, \ldots, x_m$, respectively. A schedule $S$ for $T_1, T_2, \ldots, T_m$ is serializable iff there is a derivative of the temporal serializability graph $G'(V, E')$, built using $S$, such that:
1. For no two transactions $T_i$ and $T_j$ such that $x_i < x_j$, $((T_j, T_i), \tau) \in E'$.
2. $G'(V, E')$ has no temporal cycles.
Sketch of proof:⁵ ($\Rightarrow$) Assume that $S$ is a serializable schedule, yet for any derivative of the temporal serializability graph $G'(V, E')$, built using $S$, there exist two transactions $T_i$ and $T_j$ such that $x_i < x_j$ and $((T_j, T_i), \tau) \in E'$. Let $((T_j, T_i), \tau)$ be an edge of a derivative of a temporal serializability graph:
1. $((T_j, T_i), \tau)$ was generated due to a WR conflict. $\Rightarrow$ $T_i$ reads a value that was written by $T_j$. $\Rightarrow$ $T_j$ should commit before $T_i$ in any serial schedule equivalent to $S$. $\Rightarrow$ $x_j < x_i$, in contradiction to the assumption. (1)
2. $((T_j, T_i), \tau)$ was generated due to a WW/RW conflict. $\Rightarrow$
   (a) $T_j$ writes a value before $T_i$ and there is some transaction $T$ that reads the value written by $T_i$. $\Rightarrow$ $T_j$ should commit before $T_i$ in any serial schedule equivalent to $S$. $\Rightarrow$ $x_j < x_i$, in contradiction to the assumption. (2)
   (b) $T_i$ writes a value after $T_j$ reads a value written by some transaction $T$. $\Rightarrow$ $T_j$ should commit before $T_i$ in any serial schedule equivalent to $S$. $\Rightarrow$ $x_j < x_i$, in contradiction to the assumption. (3)
(1), (2), (3) $\Rightarrow$ no two transactions $T_i$ and $T_j$ exist such that $x_i < x_j$ and $((T_j, T_i), \tau) \in E'$. The proof of the second part is similar to the classic proof regarding cycles in a serializability graph (see [29] for an example).

($\Leftarrow$) Assume conditions 1 and 2 hold, and assume (without loss of generality) that $x_1 < x_2 < \ldots < x_m$. Let $R = T_1 \rightarrow T_2 \rightarrow \ldots \rightarrow T_m$ be a serial schedule. Using induction, we can show that $T_i$ reads similar sets of state-elements for each variable it locks, both in the given schedule $S$ and in the serial schedule $R$. The reason is that if transaction $T_i$ reads a value of an item $(\alpha.P, \tau)$, then in both schedules the same transactions $T_{j_1}, T_{j_2}, \ldots, T_{j_k}$ ($1 \le j_1, j_2, \ldots, j_k < i$) were the last to write $\alpha.P$ in some temporal element $\tau^*$ such that $\tau^* \cap \tau \neq \emptyset$, or $T_i$ is the first to read $(\alpha.P, \tau)$. Otherwise, a temporal cycle would be generated (contradicting condition 2). Using the induction assumption we can show that the last transaction to write a variable $\alpha.P$ in a chronon $t$ is the same in schedules $S$ and $R$, and therefore similar sets of state-elements are generated for each variable. $\Box$

⁵ In this paper we present partial proofs. We present the part of the proof that is unique to temporal databases, and leave out the parts whose proof is similar to the proofs of classic theorems in transaction theory.

A temporal variation of 2PL, termed temporal 2PL, requires that in any transaction, all (read and write) temporal locks precede all temporal unlocks. A strict temporal 2PL requires all temporal locks to be released after a transaction commits. As mentioned in [21], temporal 2PL cannot guarantee serializability. We use the following example to demonstrate this claim. Let $T_1$ and $T_2$ be two transactions and consider the following schedule $S$:
       T1                          T2
(1)    twlock $(\alpha.P, \tau)$
(2)    unlock $(\alpha.P, \tau)$
(3)                                trlock $(\alpha.P, \tau)$
(4)                                twlock $(\alpha.P, \tau)$
(5)                                unlock $(\alpha.P, \tau)$

Obviously, $S$ obeys the temporal 2PL. Thus, for a serial schedule $S'$ to be equivalent to $S$, $T_1$ should precede $T_2$. However, if $T_1$ and $T_2$ commit on $x_1$ and $x_2$, respectively, and $x_1 > x_2$, then in order for a serial schedule $S'$ to be equivalent to $S$, $T_2$ should precede $T_1$. Therefore, $S$ is not necessarily serializable. It should be noted that the equivalent schedule in a conventional database (where $(\alpha.P, \tau)$ is replaced by $\alpha.P$) is serializable, whether $T_1$ commits before $T_2$ or vice versa. Hence, the temporal 2PL is not sufficiently strict to enforce a specific order of commit commands. However, as the following theorem shows, a strict 2PL can enforce a specific order of commit commands and therefore can guarantee serializability.

Theorem 2. Let $T_1, T_2, \ldots, T_m$ be $m$ transactions with transaction times $x_1, x_2, \ldots, x_m$, respectively, and let $S$ be a schedule for $T_1, T_2, \ldots, T_m$. If $S$ obeys strict temporal 2PL, then $S$ is serializable.

Sketch of proof: Let $S$ be a schedule that obeys strict temporal 2PL and assume that $S$ is not serializable. Using Theorem 1, for any derivative of the temporal serializability graph $G'(V, E')$ built using $S$, the following two scenarios are possible:
1. $G'(V, E')$ has a temporal cycle. A contradiction is reached in a similar fashion to classic proofs (see [29] for an example).
2. $G'(V, E')$ has no temporal cycles, yet there exist two transactions $T'$ and $T''$ such that $x' < x''$ and $((T'', T'), \tau) \in E'$. $\Rightarrow$ due to the protocol strictness, $T''$ should release all of its locks before $T'$ can acquire a lock for some participant $(\alpha.P, \tau')$, where $\tau' \cap \tau \neq \emptyset$. Let $t$ be the time $T''$ released all of its locks. $\Rightarrow$ due to the protocol strictness, $x'' < t$. (1) Since $T'$ is not completed by the time $T''$ released all of its locks (it should still acquire at least one more lock), $t < x'$. (2) (1), (2) $\Rightarrow$ $x'' < x'$, a contradiction.
$\Rightarrow$ If $S$ obeys strict temporal 2PL, then $S$ is serializable. $\Box$

While strict 2PL ensures serializability, it is not necessarily the best protocol as it reduces concurrent activities. Thus, we present a protocol (commit/abort/wait) in Table 1 to increase concurrency while avoiding redundant aborts. Algorithm 1 provides the relevant activities of transactions during their life cycle. In addition to retrieving and updating the database, transactions lock and unlock variables and update the temporal serializability graph. A transaction that concluded its activities might be forced to wait before committing, due to other transactions that precede it in the temporal serializability graph and did not commit yet. It is worth noting that any transaction would either commit or abort eventually, since the temporal 2PL prevents temporal cycles (although it cannot ensure by itself the order of the committing transactions). Also, a transaction that reaches the end transaction⁶ phase would eventually commit, as nothing can prevent it from doing so (all activities were successful and there are no temporal cycles).

⁶ The end transaction phase bears similarity to the corresponding term in distributed database systems (e.g. [15]). We refrain from using this term to avoid confusion.
The commit/abort/wait protocol:

On start transaction do:
    generate a new node $T_i$ in the temporal serializability graph
    execute operations, using temporal 2PL for locking and unlocking,
        and update the temporal serializability graph according to its definition

On end transaction do:
    release remaining locks obtained by $T_i$
    if exists $(T, T_i) \in E$ then:
        wait
    else:
        commit

On commit do:
    remove $T_i$ and all edges $(T, T')$ s.t. $T = T_i$ or $T' = T_i$
    end wait
    commit transaction

On abort do:
    release remaining locks obtained by $T_i$
    remove $T_i$ and all edges $(T, T')$ s.t. $T = T_i$ or $T' = T_i$
    end wait
    abort transaction

On end wait do:
    if exists $(T, T_i) \in E$ then:
        wait
    else:
        commit

Table 1. Annotated listing of Algorithm 1: commit/abort/wait protocol
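The event handlers of Table 1 can be sketched as a small scheduler object. This is an illustrative reading of the listing, not the author's implementation: locking is omitted, edges are plain `(before, after)` pairs without valid-time labels and are added by hand, and the class and method names are hypothetical.

```python
class CommitAbortWait:
    """Minimal sketch of the commit/abort/wait event handlers."""

    def __init__(self):
        self.nodes = set()
        self.edges = set()   # (T, T'): T has to commit before T'
        self.waiting = []
        self.committed = []  # transactions in commit order

    def start_transaction(self, tx):
        self.nodes.add(tx)

    def add_edge(self, before, after):
        self.edges.add((before, after))

    def end_transaction(self, tx):
        # Remaining locks would be released here.
        if any(after == tx for (_, after) in self.edges):
            self.waiting.append(tx)  # a predecessor has not committed yet
        else:
            self._commit(tx)

    def abort(self, tx):
        self._remove(tx)
        if tx in self.waiting:
            self.waiting.remove(tx)
        self._end_wait()

    def _commit(self, tx):
        self._remove(tx)
        self.committed.append(tx)
        self._end_wait()

    def _remove(self, tx):
        self.nodes.discard(tx)
        self.edges = {(a, b) for (a, b) in self.edges if tx not in (a, b)}

    def _end_wait(self):
        # Re-check every waiting transaction after a commit or an abort.
        for tx in list(self.waiting):
            still_waiting = tx in self.waiting
            blocked = any(after == tx for (_, after) in self.edges)
            if still_waiting and not blocked:
                self.waiting.remove(tx)
                self._commit(tx)


protocol = CommitAbortWait()
protocol.start_transaction("T1")
protocol.start_transaction("T2")
protocol.add_edge("T1", "T2")    # T2 read a value written by T1
protocol.end_transaction("T2")   # T2 finishes first but has to wait
protocol.end_transaction("T1")   # T1 commits, which releases T2
print(protocol.committed)
```

In this run `T2` reaches its end first, yet the commit order follows the edge, so the transaction times agree with the order required by the temporal serializability graph.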
4 Implementing a temporal transaction model
Having shown the temporal transaction model, in this section we provide an implementation scheme for it, based on the relational data model. We define the notion of a shadow relation and utilize it in an algorithm for a strict conservative temporal 2PL. Various methods were suggested to map a temporal data structure into a relational model, using normalization rules. A possible implementation can use universal relations as discussed in [24]. Another possible implementation uses the ENF (Extension Normal Form) [12], which is an extension of the TNF (Time Normal Form) [23], as follows. Each relation designates a set of synchronous attributes, which are attributes that have common state-element temporal information (i.e. $x$ and $v$) at any chronon. Therefore, each relation is augmented with an attribute that represents $x$ and two attributes ($v_s$ and $v_e$) for the boundaries of a $v$ interval. We can assume that if $R$ is a relation and $X \subset R$ is the object identifier, then $X \cup \{x, v_s, v_e\}$ serves as a key for $R$. Using ENF, the update of the temporal database is tuple-based, without redundancies. It is worth noting that the representation is restricted, since $v$ can be an interval but not a temporal element.⁷ To eliminate this restriction, a separate relation for the $v$ element should be created, identified by a unique state-element identifier and the interval values.

The use of a conventional locking mechanism for a temporal database based on a relational database is impossible. For example, let $R$ be a relation in ENF (where the set $\{a, x, v_s, v_e\}$ serves as a key):

a     b     x     v_s   v_e
a1    b1    t1    t2    t4
a1    b2    t2    t2    t3
a1    b3    t3    t3    t4

Let $T_1$ be a transaction that requires the locking of the latest value(s) of a variable $b$ of $R$ in $[t_2, t_4)$. Based on a conventional locking mechanism, the first tuple is locked (being the only one with $v_s = t_2$ and $v_e = t_4$), while the other two tuples can be accessed. However, using the temporal semantics and assuming that $t_1$

⁷ In [30], this problem is handled by using temporary relations, where an interval is replaced by a set of chronons. This solution is far more expensive than the one proposed in this paper.
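To make the example relation concrete, the following sketch computes which tuples of $R$ hold the latest value of $b$ at each chronon of $[t_2, t_4)$. Everything beyond the relation itself is an assumption of the sketch, not a statement from the text: the time points are taken to be ordered $t_1 < t_2 < t_3 < t_4$ and encoded as the integers 1 to 4, and "latest" is read as the largest transaction time $x$ among the tuples valid at a chronon.

```python
# The example relation R in ENF, with t1..t4 encoded as 1..4 (assumption).
R = [
    {"a": "a1", "b": "b1", "x": 1, "vs": 2, "ve": 4},
    {"a": "a1", "b": "b2", "x": 2, "vs": 2, "ve": 3},
    {"a": "a1", "b": "b3", "x": 3, "vs": 3, "ve": 4},
]


def key_match(relation, vs, ve):
    """Conventional view: tuples whose stored interval equals [vs, ve)."""
    return [t for t in relation if t["vs"] == vs and t["ve"] == ve]


def latest_tuples(relation, start, end):
    """Temporal view: newest tuple (largest x) valid at each chronon."""
    result = []
    for chronon in range(start, end):
        valid = [t for t in relation if t["vs"] <= chronon < t["ve"]]
        if valid:
            newest = max(valid, key=lambda t: t["x"])
            if newest not in result:
                result.append(newest)
    return result


print([t["b"] for t in key_match(R, 2, 4)])
print([t["b"] for t in latest_tuples(R, 2, 4)])
```

Under these assumptions the two views select different tuples, which is the kind of mismatch that motivates locking on valid-time intervals in a shadow relation instead of locking stored tuples.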
A strict conservative temporal 2PL:

For each participant $(\alpha.P_j, v')$ and a lock $l$ do:
    case $LQ(\alpha.P_j, v')$ of
        $\emptyset$:
            insert $(R_i, \alpha.P_j, v', l)$
        {read}:
            if $l$ = read then:
                insert $(R_i, \alpha.P_j, v', read)$
            else:
                Wait
        otherwise:
            Wait
    end /* Case */
end /* For */
Process
For each participant $(\alpha.P_j, v')$ and a lock $l$ do:
    delete $(R_i, \alpha.P_j, v', l)$
end /* For */

Table 2. Annotated listing of Algorithm 2: strict conservative temporal 2PL
Definition 7 provides the data structure for the transaction locking mechanism, and uses the notation $Dom_{R'_i}(P_j)$ and $Dom_{R_i}(P_j)$ to represent the domain of $P_j$ in relations $R'_i$ and $R_i$, respectively. A lock can be associated with either a specific valid time or the full valid time axis (identified by the all keyword). Each relation is "shadowed" to provide the information regarding locked items, where a shadow relation of a relation $R$ consists of all the locked intervals of any instance of $R$. $X_i \cup \{v_s, v_e\}$ consists of sufficient information to identify an object, and lock identifies a lock as either a "read" or a "write." Each transaction should (read/write) lock a variable in a specified valid time before using it. To identify whether a variable $\alpha.P_j$ can be locked in $v' = \{v'_s, v'_e\}$, the following query LQ results in the current lock that is held on $(\alpha.P_j, v' = \{v'_s, v'_e\})$: $LQ(\alpha.P_j, v' = \{v'_s, v'_e\}) =$

Table 2 presents an annotated listing of Algorithm 2. The algorithm provides the locking/unlocking mechanism of a strict conservative temporal 2PL. According to the algorithm, a read lock can be assigned to $\alpha.P_j$ at $v' = \{v'_s, v'_e\}$ ($v'$ can be $\{all, all\}$) only if $LQ(\alpha.P_j, v' = \{v'_s, v'_e\})$ is either empty or results in a "read" response. A write lock can be assigned to $\alpha.P_j$ at $v' = \{v'_s, v'_e\}$ only if $LQ(\alpha.P_j, v' = \{v'_s, v'_e\})$ is empty.
The "Wait" statement puts the transaction on a waiting queue for a release of locks. Being a conservative protocol, it prevents deadlocks and can also prevent livelocks under certain conditions. The size of a shadow relation depends on the number of concurrently running transactions. LQ runs in $O(\lg n)$ in the worst case (where $n$ is the number of locks that currently exist), and therefore adds little overhead to the transaction's performance.
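The shadow relation and the lock query LQ used by Algorithm 2 can be sketched as follows. The sketch makes several assumptions that are not in the text: valid times are single half-open integer intervals, the all keyword is not modelled, a transaction's lock requests are granted all together or not at all, and LQ is a linear scan for brevity (the $O(\lg n)$ bound quoted above would need an index that is not shown). All names are hypothetical.

```python
class ShadowRelation:
    """Rows record locked valid-time intervals: (object, property, vs, ve, lock)."""

    def __init__(self):
        self.rows = []

    def lq(self, obj, prop, vs, ve):
        """Kinds of lock currently held on an overlapping valid time."""
        return {
            lock for (o, p, s, e, lock) in self.rows
            if (o, p) == (obj, prop) and s < ve and vs < e
        }

    def insert(self, obj, prop, vs, ve, lock):
        self.rows.append((obj, prop, vs, ve, lock))

    def delete(self, obj, prop, vs, ve, lock):
        self.rows.remove((obj, prop, vs, ve, lock))


def try_lock_all(shadow, requests):
    """Grant every requested lock, or none; False stands for 'Wait'."""
    for obj, prop, vs, ve, lock in requests:
        held = shadow.lq(obj, prop, vs, ve)
        compatible = not held or (held == {"read"} and lock == "read")
        if not compatible:
            return False
    for request in requests:
        shadow.insert(*request)
    return True


def release_all(shadow, requests):
    """Release the locks after processing, as in the last loop of Table 2."""
    for request in requests:
        shadow.delete(*request)


shadow = ShadowRelation()
writer = [("a1", "b", 2, 4, "write")]
print(try_lock_all(shadow, writer))                     # granted
print(try_lock_all(shadow, [("a1", "b", 3, 5, "read")]))  # has to wait
release_all(shadow, writer)
print(try_lock_all(shadow, [("a1", "b", 3, 5, "read")]))  # granted now
```

Checking all requests before inserting any of them is one simple way to get the conservative behaviour, since a transaction never holds some locks while waiting for others.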
5 Conclusion
This paper provides a transaction model for temporal databases and presents a new protocol that increases concurrency and reduces redundant abort operations. We also provide a scheme for implementing a temporal transaction protocol on top of a relational database model. The contribution of the paper lies in identifying the unique properties of transaction management in temporal databases and in using these properties to provide a refined locking mechanism that enhances the concurrency of such databases. The implementation of the protocols, as provided in this paper, is currently underway at the University of Toronto. Further research would discuss extending the approach suggested in Section 4 by adding temporal support to conventional transaction models, rather than replacing conventional transaction models. Existing protocols should be compared and evaluated to identify the impact of a temporal extension on the performance of transaction management. This combination would enable the use of temporally oriented data along with conventional data within a single database. Such research should provide a robust mechanism to support temporally oriented data in existing databases. Another possible extension of this paper, from which the temporal research area might benefit, consists of using a multilevel transaction model [31] to model temporal transactions.
Acknowledgments. I would like to thank Opher Etzion, Arie Segev and Dov Dori for their collaboration in designing the temporal data model. I would also like to thank the anonymous reviewers and the participants of the Dagstuhl seminar for their remarks and contribution.
References

1. G. Ariav. A temporally oriented data model. ACM Transactions on Database Systems, 11(4):499-527, Dec. 1986.
2. A. Bernstein and N. Goodman. Timestamped-based algorithms for concurrency control in distributed database systems. In Proceedings of the International Conference on VLDB, pages 285-300, 1980.
3. G. Bhargava and S. K. Gadia. Relational database systems with zero information loss. IEEE Transactions on Knowledge and Data Engineering, 5(1):76-87, Feb. 1993.
4. J. Blakeley. Challenges for research on temporal databases. In Proceedings of the International Workshop on an Infrastructure for Temporal Database, June 1993.
5. D. Chamberlin. Using The New DB2, IBM's Object-Relational Database System. Morgan Kaufmann Publishers, Inc., San Francisco, California, 1996.
6. P. Chrysanthis and K. Ramamritham. ACTA: The saga continues. In A. Elmagarmid, editor, Database Transaction Models for Advanced Applications, chapter 10, pages 349-397. ACM Press. Morgan Kaufmann Publishers, Inc., 1992.
7. J. Clifford and A. U. Tansel. On an algebra for historical relational databases: two views. In Proceedings of the ACM SIGMOD, pages 247-265, May 1985.
8. P. Dadam, V. Y. Lum, and H.-D. Werner. Integration of time versions into a relational database system. In Proceedings of the International Conference on VLDB, pages 509-522, Singapore, 1984.
9. S. Gadia. The role of temporal elements in temporal databases. Data Engineering Bulletin, 7:197-203, 1988.
10. A. Gal. TALE - A Temporal Active Language and Execution Model. PhD thesis, Technion - Israel Institute of Technology, Technion City, Haifa, Israel, May 1995. Available through the author's WWW home page, http://www.cs.toronto.edu/~avigal.
11. A. Gal and O. Etzion. Parallel execution model for updating temporal databases. International Journal of Computer Systems Science and Engineering, 12(5):317-327, Sept. 1997.
12. A. Gal, O. Etzion, and A. Segev. Extended update functionality in temporal databases. Technical Report ISE-TR-94-1, Technion - Israel Institute of Technology, Sept. 1994.
13. A. Gal, O. Etzion, and A. Segev. TALE - a temporal active language and execution model. In P. Constantopoulos, J. Mylopoulos, and Y. Vassiliou, editors, Advanced Information Systems Engineering, pages 60-81. Springer, May 1996.
14. H. Garcia-Molina and K. Salem. Sagas. In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 249-259, May 1987.
15. D. Georgakopoulos, M. Rusinkiewicz, and A. Sheth. Using tickets to enforce the serializability of multidatabase transactions. IEEE Transactions on Knowledge and Data Engineering, 6(1), 1994.
16. T. Hadzilacos and C. Papadimitriou. Some algorithmic aspects of multiversion concurrency control. In Proc. Fourth ACM Sym. on Principles of Database Systems, pages 96-104, 1985.
17. C. Jensen, J. Clifford, S. Gadia, A. Segev, and R. Snodgrass. A glossary of temporal database concepts. ACM SIGMOD Record, 21(3):35-43, 1992.
18. C. Jensen et al. A consensus glossary of temporal database concepts. ACM SIGMOD Record, 23(1):52-63, 1994.
19. A. Kumar and M. Stonebraker. Performance evaluation of an operating system transaction manager. In Proceedings of the International Conference on VLDB, pages 473-481, Brighton, England, 1987.
20. D. Lomet. Grow and post index trees: Role, techniques and future potential. In Proceedings of the Second Symposium on Large Spatial Databases, Zurich, Switzerland, 1991.
21. D. Lomet and B. Salzberg. Transaction-time databases. In Temporal Databases: Theory, Design, and Implementation, pages 388-417. Benjamin/Cummings, 1993.
22. M. Stonebraker. Concurrency control and consistency of multiple copies of data in distributed ingres. IEEE Transaction on Software Engineering, 3(3):188-194, May 1979.
23. S. Navathe and R. Ahmed. A temporal relational model and a query language. Information Sciences, 49:147-175, 1989.
24. B. Nixon et al. Design of a compiler for a semantic data model. Technical Report CSRI-44, Computer Systems Research Institute, University of Toronto, May 1987.
25. C. Papadimitriou, P. Bernstein, and J. R. Jr. Computational problems related to database concurrency control. In Proceedings of the Conference on Theoretical Computer Science, 1977.
26. N. Pissinou, R. Snodgrass, R. Elmasri, I. Mumick, M. Ozsu, B. Pernici, A. Segev, and B. Theodoulidis. Towards an infrastructure for temporal databases - A workshop report. ACM SIGMOD Record, 23(1):35, 1994.
27. R. Snodgrass and I. Ahn. Temporal databases. IEEE Computer, 19:35-42, Sep 1986.
28. K. Torp, C. Jensen, and M. Bohlen. Layered temporal DBMS: concepts and techniques. In Proceedings of the Fifth International Conference On Database Systems For Advanced Applications (DASFAA '97), Melbourne, Australia, Apr. 1997.
29. J. Ullman. Principles of Database and Knowledge-Base Systems, volume 1. Computer Science Press, Rockville, Maryland, 1 edition, 1988.
30. C. Vassilakis, N. Lorentzos, and P. Georgiadis. Transaction support in temporal DBMS. In J. Clifford and A. Tuzhilin, editors, Recent Advances in Temporal Databases, pages 255-271. Springer, 1995.
31. G. Weikum. Principles and realization strategies of multilevel transaction management. ACM Transactions on Database Systems (TODS), 16(1):132-180, 1991.
Implementation Options for Time-Series Data

Ramez Elmasri and Jae Young Lee

Department of Computer Science and Engineering
The University of Texas at Arlington
Arlington, Texas 76019-0015
{elmasri, jlee}@cse.uta.edu
Abstract. A time series management system is a special type of temporal database that has a wide variety of applications. The most distinctive feature of a time series is that changes in its data values are tightly associated with a specific pattern of time, called a calendar. Although there has been some research on time series management, very few studies have reported on how to map time series into implementation models. In this paper, we discuss different approaches to mapping time series into relational and object-oriented data models. The mapping schemes are illustrated using a simple example.
1 Introduction
Objects in temporal databases can be classified into the following three types according to their temporal characteristics:

- Time-invariant objects: These objects are constrained not to change their values in the information system being modeled (examples: the social security number of a person, a system-generated surrogate).
- Time-varying objects: These objects may change their values with an arbitrary frequency (examples: the salary or rank of an employee).
- Time-series objects: These objects change their values, and the change is tightly associated with a particular pattern of time, called a calendar (examples: a daily stock price, scientific data sampled regularly).

Most temporal database management systems have concentrated on describing temporal data based on the versioning of objects, tuples, or attributes [SNO87, NA89, GY88, WD92, EW90]. The concept of time series, which is often needed in temporal applications [SDDM95], does not fit well within these models. Only recently have time series management systems attracted the attention of researchers. Some of the research results are reported in [CS93, DDS94a, DDS94b, DDS95, SDDM95, SS87], where specialized time series management systems that mainly deal with time-series objects are discussed. In [LEW97], the Integrated Temporal Data Model (ITDM), which incorporates all three types of objects, was proposed.

One of the important issues regarding time-series management systems is how to map a conceptual time-series model to database implementation models, such as relational or object-oriented models. In this paper, we discuss different implementation options that map ITDM to relational and object-oriented databases.

The paper is organized as follows. Section 2 briefly reviews the basic concepts of time series and calendars. ITDM is briefly reviewed in Section 3. Sections 4 and 5 discuss the mapping of ITDM to relational and object-oriented data models, respectively. Finally, Section 6 concludes the paper.

O. Etzion, S. Jajodia, and S. Sripada (Eds.): Temporal Databases: Research and Practice, LNCS 1399, pp. 115-128, 1998. © Springer-Verlag Berlin Heidelberg 1998

2 Time Series
In this section, we briefly discuss the basic concepts of time series and calendars.

2.1 Basic Concept
A time series is an abstraction mechanism for managing collections of data that possess observed values at regular periods, or intervals. The collection and analysis of financial data and scientific data are examples of applications that can be modeled with time series. Typical properties of such applications are:

1. Such applications usually involve large amounts of data.
2. The change of data values is tightly associated with a predefined, specific time pattern, called a calendar.
3. Manipulation of data involves numerically intensive statistical analysis.
4. More emphasis is put on aggregation operations on a collection of data than on individual data items.

A time series is usually represented as a sequence of events. An event is an ordered pair consisting of a temporal value and a data value. The data value can be single-valued or multivalued. The format of a typical multivalued time series is:

{(t1, <data_value1,1, data_value1,2, ...>), (t2, <data_value2,1, data_value2,2, ...>), ...}.

A single-valued time series has the format:

{(t1, data_value1), (t2, data_value2), ...}.

Here, data_valuei,j (or data_valuei) is the value of a data item from the corresponding domain of possible values. Each time series is associated with a calendar [CSS94, DDS94a, DS93, KO95, SS92]. A calendar provides the domain for the temporal values of the corresponding time series. In general, it specifies:

- The granularity of temporal values.
- The pattern and period of a sequence of temporal values.
- Start and end times.

Thus, the sequence of temporal values (t1, t2, ...) of a time series is determined by the corresponding calendar. A time series that models the closing values of a stock price, assuming a calendar that specifies the sequence of five working days, is shown in Figure 1.
Temporal Value   Data Value
3/6/97           120
3/7/97           125
3/10/97          127
3/11/97          134
3/12/97          139

Fig. 1. An example time series of a stock price.
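The event-sequence representation can be illustrated with a minimal Python sketch. It stores the time series of Fig. 1 as an ordered list of (temporal value, data value) pairs. The helper names value_at and average are hypothetical and are not part of the paper; they only hint at the point lookups and aggregation operations mentioned above.

```python
from datetime import date

# A single-valued time series: an ordered list of (temporal value, data value) events.
# The values are those of Fig. 1.
closing_price = [
    (date(1997, 3, 6), 120),
    (date(1997, 3, 7), 125),
    (date(1997, 3, 10), 127),
    (date(1997, 3, 11), 134),
    (date(1997, 3, 12), 139),
]


def value_at(series, t):
    """Return the data value recorded at temporal value t, or None if t has no event."""
    for when, value in series:
        if when == t:
            return value
    return None


def average(series, start, end):
    """Average the data values of all events whose temporal value lies in [start, end]."""
    values = [value for when, value in series if start <= when <= end]
    return sum(values) / len(values) if values else None
```

A multivalued series would use a tuple of data values in place of the single integer in each event.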
2.2 Calendars for Time Series
One of the important components of a time series management system is a calendar. Calendars in time series management systems are different from regular calendar systems, which define and provide temporal values in accordance with physical time. A calendar for time series defines a particular time pattern according to which data values are sampled and recorded. For example, the calendar for the closing stock price in Section 2.1 defines the sequence of days during which a stock market is open; in other words, it specifies the sequence of working days only, excluding weekends and holidays. A calendar determines the domain of temporal values of the time series associated with it. It also provides a basis for an interface for query languages that are closer to human perception. For example, we can have a query that involves a phrase like "the fourth Thursday of November every year". This phrase is difficult to express using a query language that is based on a regular calendar system, because weeks overlap with months and the lengths of months (and years) are not identical. With appropriate calendars defined (refer to Example 2.1 later in this section), it is possible to include such a phrase in a query, assuming a properly designed parser is available.

A calendar is considered as a totally ordered set (sequence) of time units with additional semantics, where a time unit refers to a time interval expressed with a certain granularity, such as Second, Minute, Month, etc. We represent a calendar with a tuple <granularity, pattern, period, start time, end time>, where
- Granularity is the default time unit used in a calendar.
- Pattern is a subsequence of time units expressed as a temporal element. If a calendar is periodic, the pattern with respect to one period is specified. Otherwise, the whole sequence of a calendar is specified. If there is more than one possible pattern, they are separated by "|", which means or.
- Period is the length of a time interval at which pattern occurs repeatedly. It is expressed in terms of the number of time units of a granularity in that period, or by a particular granularity. Alternatively, it may be specified with the name of a periodic calendar. In this case the period of the calendar being specified is identical to the period of the other calendar. The period of an aperiodic calendar is specified by ∞.
- Start time is the time unit from which a calendar starts.
- End time is the time unit at which a calendar ends.
Then, a calendar of 8 work hours per day can be specified as follows: Calendar WorkHours
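As an illustration only, the calendar tuple <granularity, pattern, period, start time, end time> might be encoded as follows in Python. Everything here is an assumption made for the sketch rather than the paper's notation: the field types, the method name time_units, the use of integer offsets for the pattern, and in particular the concrete work hours (09:00 to 11:00 and 13:00 to 17:00), which are merely chosen to agree with the sample tick times used later in the portfolio example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterator, Optional, Tuple


@dataclass(frozen=True)
class Calendar:
    granularity: timedelta        # default time unit of the calendar
    pattern: Tuple[int, ...]      # offsets (in time units) selected within one period
    period: int                   # length of one period, in time units
    start_time: datetime          # time unit from which the calendar starts
    end_time: Optional[datetime]  # None stands for an open-ended calendar

    def time_units(self, until: datetime) -> Iterator[datetime]:
        """Yield the temporal values of the calendar up to `until` (inclusive)."""
        stop = until if self.end_time is None else min(until, self.end_time)
        period_start = self.start_time
        while period_start <= stop:
            for offset in self.pattern:
                t = period_start + offset * self.granularity
                if t <= stop:
                    yield t
            period_start += self.period * self.granularity


# Hypothetical encoding of a calendar of 8 work hours per day (assumed hours).
work_hours = Calendar(
    granularity=timedelta(hours=1),
    pattern=(9, 10, 11, 13, 14, 15, 16, 17),
    period=24,
    start_time=datetime(1997, 3, 6),
    end_time=None,
)
```

Calling work_hours.time_units(datetime(1997, 3, 6, 23)) enumerates the eight sampling times of that day. An aperiodic calendar would need a different encoding, since this sketch always repeats the pattern.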
There are three types of calendars:

1. Calendars modeling physical time space: Calendars Days, Hours, Minutes, Seconds, etc. belong to this category. These are system-defined calendars.
2. Calendars defined in accordance with a particular calendar system being used: Some example calendar systems are the Gregorian, Islamic, Jewish, and Oriental Lunar calendar systems. These are also system defined. Calendars Weeks, Months, and Years are examples of Gregorian calendars.
3. Calendars defined according to particular applications: The calendar of 8 work hours per day shown above is an example. These are user-defined calendars.

The specifications of 7 system-defined calendars (4 for physical time space and 3 for Gregorian) are shown in the Appendix. Another way of specifying calendars is to use calendar operations. This is more flexible and powerful than the previous approach, and most user-defined calendars can be defined using this approach. We first define basic calendar operations and then show how they can be used to derive various user-defined calendars. The informal definitions of four basic calendar operations are given below. Interested readers are referred to [LEW97] for formal definitions.
- select_gr(C, i, j, ref): From each period of a calendar C, select j time units starting from the i-th time unit. If ref is begin, it counts from the first time unit in a period, and it counts from the last time unit if ref is end. The result is represented in terms of the granularity gr.
- intersect(Ci, Cj): Intersection of two calendars Ci and Cj.
- union(Ci, Cj): Union of two calendars Ci and Cj.
- exclude(Ci, Cj): Exclude the whole sequence of a calendar Cj from that of a calendar Ci.

The following are examples of user-defined calendars derived using these operations. Note that the union operation is associative and commutative and, for notational convenience, we allow more than two arguments for the operation.

Example 2.1:

- Calendar of all Mondays: Mondays = select(Weeks, 1, 1, begin)
- Calendar of all Januarys: Januarys = select_Day(Years, 1, 1, begin)
- Calendar of all Mondays in Januarys: Mondays-January = intersect(Mondays, Januarys)
- Thanksgiving (the fourth Thursday of November every year): Thanksgiving = select(intersect(Novembers, Thursdays), 4, 1, begin) (* assume that calendars Thursdays and Novembers are defined *)
- Christmas: Christmas = select(Decembers, 25, 1, begin) (* assume a calendar Decembers is defined *)
- Holidays: Holidays = union(MemorialDay, July4, LaborDay, Thanksgiving, Christmas) (* assume calendars MemorialDay, LaborDay, and July4 are defined *)
- All weekends: Weekends = select_Day(Weeks, 6, 2, begin) ≡ select_Day(Weeks, 1, 2, end)
- 5 work-days a week: WorkDaysAWeek = exclude(Weeks, Weekends)
- 5 work-days a week excluding all holidays: BusinessWeek = exclude(WorkDaysAWeek, Holidays)
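The calendar operations can be approximated over a finite window of days, as in the following Python sketch. This is a simplification under stated assumptions and not the formal definition of [LEW97]: a periodic calendar is represented as a list of periods (each a list of dates), weeks are assumed to start on Monday, the granularity subscript of select is ignored, and the helper names days and weeks are hypothetical.

```python
from datetime import date, timedelta


def days(start, end):
    """All days in [start, end], in order."""
    return [start + timedelta(days=n) for n in range((end - start).days + 1)]


def weeks(start, end):
    """Periods of a Weeks calendar: complete Monday-to-Sunday groups of days."""
    groups = {}
    for d in days(start, end):
        iso = d.isocalendar()
        groups.setdefault((iso[0], iso[1]), []).append(d)
    return [g for g in groups.values() if len(g) == 7]


def select(periods, i, j, ref="begin"):
    """From each period take j time units starting at the i-th one (1-based),
    counting from the first unit (ref='begin') or from the last (ref='end')."""
    out = []
    for p in periods:
        units = p if ref == "begin" else p[::-1]
        out.extend(units[i - 1:i - 1 + j])
    return sorted(out)


def intersect(a, b):
    return sorted(set(a) & set(b))


def union(*calendars):
    return sorted(set().union(*calendars))


def exclude(a, b):
    return sorted(set(a) - set(b))


# Two complete weeks: Monday 3/3/97 through Sunday 3/16/97.
w = weeks(date(1997, 3, 3), date(1997, 3, 16))
all_days = [d for period in w for d in period]
mondays = select(w, 1, 1)                # first day of every week
weekends = select(w, 6, 2)               # two days starting from the sixth
work_days = exclude(all_days, weekends)  # 5 work-days a week
```

One limitation of this flattening is that the results of intersect, union, and exclude lose their period structure, so a nested expression such as the Thanksgiving example would first need the result regrouped into yearly periods.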
2.3 Comparison with Other Calendars
The calendar system in [SS92] is the most general among those reported in the literature. It provides a basic temporal domain for a database management system. The main concept underlying this proposal is that a single interpretation of time is insufficient for all users and applications, and that it is necessary to develop a general solution to support multiple interpretations of time. It allows multiple calendars (Gregorian, Lunar, Islamic, or other application-specific calendars) to exist within the database management system. [CSS94] is closer to our proposal in that it defines calendars as an abstract data type. Here, calendars are used to define lists of time intervals, to specify natural-language time-based expressions, to specify temporal conditions in database queries and rules, and to specify user-defined semantics for data manipulations. Calendars are modeled as structured nested lists of intervals, and calendar operators are defined based on interval relationships. Based on these, a calendar script language is also defined to generate and manipulate new calendars.

Our proposal is different in that it focuses on the periodic nature of the temporal domain of time-series data. It defines a framework where various user-defined, time-series-specific calendars can be specified. Here, a generic calendar specification is defined first. Then, different calendars are specified as instances of the generic specification. Calendar operations, with which various user-defined calendars can be derived, are also defined. One distinguishing feature of our calendar system is that it explicitly deals with the irregularity of real-world calendar systems. By introducing the concept of abstract calendars and granularities, it is possible to formally specify set-theoretic operations on calendars that have irregular periods [LEW97]. In other calendar proposals, the irregularity is hidden as an implicit assumption, and it is not stated explicitly how the irregularity is resolved. Note that our calendars can be built on the calendric system(s) of [SS92] since they are orthogonal to each other.

3 Integration of Time Series and Version-Based Data Models
As discussed earlier, there are three different types of objects in temporal databases: time-invariant, time-varying, and time-series objects. Traditional version-based temporal databases mostly manage time-invariant and time-varying objects. On the other hand, specialized time series management systems [CS93, DDS95] mainly deal with time-series objects. Most large, real-world applications, however, have all three types of objects. So, it is necessary to have a temporal data model that integrates all types of objects. One such model is the Integrated Temporal Data Model (ITDM) [LEW97], which is based on the TEER (Temporal Enhanced Entity-Relationship) model [EW90]. In ITDM, a time-series object is modeled as a time-series attribute and treated in a similar way to other attributes. Associated with each time-series object is a calendar, which provides a temporal domain that determines the valid temporal values and update frequency of the corresponding time-series attribute. Like other attributes in the TEER model, the value of a time-series attribute TS of an entity e, TS(e), is a temporal assignment, that is, a partial function TS(e) : CALENDAR(TS) → dom(TS), where CALENDAR(TS) is the temporal domain of TS determined by the calendar associated with TS.

Figure 2 shows an example database schema for a simple portfolio consisting of common stocks. The graphical notations used in the figure are the same as those in typical EER diagrams except for the representation of attributes. The following conventions are used to distinguish different types of attributes:

1. A time-invariant attribute is denoted by an oval.
2. A time-varying attribute is represented by an oval with a double-lined rectangle in it.
3. An oval with a single-lined rectangle in it represents a time-series attribute.
4. Each time-series attribute is associated with a particular calendar, which is connected to the time-series attribute by an arrow.
5. Component attributes of a composite time-series attribute normally do not have separate calendars connected to them. The assumption is that the calendar of the composite attribute applies to all of its component attributes.
6. However, the data values of a component time-series attribute may themselves be a time series (e.g., Ticks in Figure 2), and hence a separate calendar can be associated with the component time-series attribute.

The schema is simple yet shows all types of attributes. The attribute Issuer is a time-invariant attribute, while the attribute Shares is a time-varying attribute. The attributes Dividend, Price, and Ticks are time-series attributes. The attribute Price is a complex attribute composed of three component attributes: high, low, and ticks, denoting the daily high price, the daily low price, and a list of hourly prices per day, respectively. Note that Price has a nested time-series attribute Ticks. Associated with each time-series attribute is a calendar, which is connected to the attribute by an arrow. It is assumed that the dividend is paid once every three months, the stock price is sampled once every working day, and the stock tick is sampled once an hour during working hours.

Fig. 2. An example database schema of a portfolio model

Assuming the calendars BusinessWeek, WorkHours, and Quarters are properly defined, a possible set of attribute values of an entity of type Stock during the week of 3/6/97 - 3/12/97 is (note that March 8 and March 9 are weekends):
SURROGATE(e) = {[1/1/96, now] → surrogate_id} /* system generated */
Issuer(e) = {[1/1/96, now] → IBM}
Shares(e) = {[1/1/96, 3/5/96] → 1000,
             [3/6/96, 9/9/96] → 1500,
             [9/10/96, 1/21/97] → 1200,
             [1/22/97, now] → 1300}
Dividend(e) = {3/31/96 → 200,
               6/30/96 → 275,
               9/30/96 → 250,
               12/31/96 → 280}
Price(e) = {3/6/97 → <130, 85, {9:00 → 90, ..., 17:00 → 120}>,
            3/7/97 → <125, 81, {9:00 → 81, ..., 17:00 → 125}>,
            3/10/97 → <132, 125, {9:00 → 130, ..., 17:00 → 127}>,
            3/11/97 → <134, 122, {9:00 → 128, ..., 17:00 → 134}>,
            3/12/97 → <139, 130, {9:00 → 130, ..., 17:00 → 139}>}
For more details about ITDM (such as the incorporation of calendars in the data model, update/retrieval operations, and the query language), interested readers are referred to [LEW97, LEE98].
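The difference between a time-varying attribute and a time-series attribute in the example entity can be sketched in Python as follows. The data are the Shares and Dividend values listed above. The names NOW, shares_at, and dividend_at are hypothetical, and using date.max as a stand-in for the special value now is an assumption of the sketch.

```python
from datetime import date

NOW = date.max  # assumed stand-in for the special temporal value `now`

# Time-varying attribute: valid-time interval -> value.
shares = [
    ((date(1996, 1, 1), date(1996, 3, 5)), 1000),
    ((date(1996, 3, 6), date(1996, 9, 9)), 1500),
    ((date(1996, 9, 10), date(1997, 1, 21)), 1200),
    ((date(1997, 1, 22), NOW), 1300),
]

# Time-series attribute: a partial function from calendar points to values.
dividend = {
    date(1996, 3, 31): 200,
    date(1996, 6, 30): 275,
    date(1996, 9, 30): 250,
    date(1996, 12, 31): 280,
}


def shares_at(t):
    """Value of Shares valid at time t, found by interval containment."""
    for (start, end), value in shares:
        if start <= t <= end:
            return value
    return None


def dividend_at(t):
    """Value of Dividend at t; undefined (None) for points outside the calendar."""
    return dividend.get(t)
```

A lookup on Shares succeeds for any day inside a valid-time interval, whereas a lookup on Dividend is defined only at the points of its calendar.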
4 Mapping to Relational Databases
In the following discussion, the entity type of which a time series is an attribute is referred to as the TS-entity of the time-series attribute. In general, there are two main ways of mapping ITDM to relational databases. The first way is to store the actual time series in relations. In this approach, a relation is created for each time-series attribute. The detailed mapping procedure is:

1. Create a relation R for a TS-entity E. Only time-invariant attributes of E are included as attributes of R.
2. Create a relation for each simple time-series attribute. The attributes of the new relation are:
   - The primary key of the TS-entity, which is a foreign key.
   - The temporal value of the time series.
   - The data value of the time series.
3. If a time-series attribute is composite, one relation is created for each component attribute. The attributes of the new relations are the same as above.
4. The mapping of time-varying attributes is the same as that of time-series attributes. A new relation is created for each time-varying attribute, and the attributes of the relation are:
   - The primary key of the TS-entity, which is a foreign key.
   - The valid time of the attribute.
   - The data value of the attribute.

The relation schemas for the example ITDM schema of Figure 2 are shown in Figure 3.

STOCK
Issuer
IBM
TI

DIVIDEND
Issuer   Time       Value
IBM      3/31/96    200
IBM      6/30/96    275
IBM      9/30/96    250
IBM      12/31/96   280

SHARES
Issuer   Valid Time          Value
IBM      1/1/96 - 3/5/96     1000
IBM      3/6/96 - 9/9/96     1500
IBM      9/10/96 - 1/27/97   1200

HIGH
Issuer   Time      HighValue
IBM      3/6/97    130
IBM      3/7/97    125
IBM      3/10/97   132
IBM      3/11/97   134

LOW
Issuer   Time      LowValue
IBM      3/6/97    85
IBM      3/7/97    81
IBM      3/10/97   125
IBM      3/11/97   122

TICKS
Issuer   Time           Tick
IBM      3/6/97 09:00   90
IBM      3/6/97 10:00   105
IBM      3/6/97 11:00   123
IBM      3/6/97 13:00   130

Fig. 3. Mapping to relations

One drawback of this approach is the repetition of a foreign key in each time-series relation, which wastes storage space. Another possible disadvantage of this approach is that, since there is one relation for each time series, there may be a large number of relations in the resulting relational database. However, this approach is a viable one considering recent advances in both hardware and software technology.

The other approach solves these problems by storing time series in separate files instead of directly mapping them into relations. The mapping procedure is:

1. Create a relation R for a TS-entity E.
2. Each simple time-series attribute of E is converted to a pair of attributes in the new relation R. The two attributes are:
   - The name of a file that stores the data values of the time series.
   - The name of the calendar which is associated with the time series.
3. For a composite time-series attribute, a pair of attributes is created for each component attribute in the same way as above.
4. Time-varying attributes are dealt with in the same way as in the previous approach.
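The first approach, with one relation per time-series attribute, can be tried out with Python's built-in sqlite3 module. This is a minimal sketch under assumptions: the table and column names are lowercase variants chosen for the example, temporal values are stored as ISO-format text, and only the STOCK, DIVIDEND, and HIGH relations are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock (
    issuer TEXT PRIMARY KEY              -- time-invariant attributes only
);
CREATE TABLE dividend (
    issuer TEXT REFERENCES stock(issuer),  -- key of the TS-entity, repeated per row
    time   TEXT,                           -- temporal value (ISO date, an assumption)
    value  INTEGER,                        -- data value
    PRIMARY KEY (issuer, time)
);
CREATE TABLE high (
    issuer     TEXT REFERENCES stock(issuer),
    time       TEXT,
    high_value INTEGER,
    PRIMARY KEY (issuer, time)
);
""")

conn.execute("INSERT INTO stock VALUES ('IBM')")
conn.executemany(
    "INSERT INTO dividend VALUES ('IBM', ?, ?)",
    [("1996-03-31", 200), ("1996-06-30", 275),
     ("1996-09-30", 250), ("1996-12-31", 280)],
)
conn.executemany(
    "INSERT INTO high VALUES ('IBM', ?, ?)",
    [("1997-03-06", 130), ("1997-03-07", 125),
     ("1997-03-10", 132), ("1997-03-11", 134)],
)

# An aggregation query runs directly against the time-series relation.
total_dividend = conn.execute(
    "SELECT SUM(value) FROM dividend "
    "WHERE issuer = 'IBM' AND time BETWEEN '1996-01-01' AND '1996-12-31'"
).fetchone()[0]
max_high = conn.execute(
    "SELECT MAX(high_value) FROM high WHERE issuer = 'IBM'"
).fetchone()[0]
```

The repeated issuer column in every row is the storage overhead discussed above; in exchange, ordinary SQL can query the series without special access structures.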
The resulting relation schemas, along with the file organization, derived from the same example ITDM are shown in Figure 4.

Relation Schemas

STOCK
Issuer   Dividend   DCal       High       HCal           Low       LCal           Ticks       TCal
IBM      IBM_Div    Quarters   IBM_High   BusinessWeek   IBM_Low   BusinessWeek   IBM_Ticks   WorkHours
TI       TI_Div     Quarters   TI_High    BusinessWeek   TI_Low    BusinessWeek   TI_Ticks    WorkHours

SHARES
Issuer   Valid Time          Value
IBM      1/1/96 - 3/5/96     1000
IBM      3/6/96 - 9/9/96     1500
IBM      9/10/96 - 1/27/97   1200

File Structure

IBM_Div     data values for 3/31/96, 6/30/96, 9/30/96, 12/31/96
IBM_High    data values for 3/6/97, 3/7/97, 3/10/97, 3/11/97
IBM_Low     data values for 3/6/97, 3/7/97, 3/10/97, 3/11/97
IBM_Ticks   data values for 3/6/97 09:00, 10:00, 11:00, 13:00

(The data values in each file are those of the corresponding relation in Figure 3.)

Fig. 4. Mapping to files

As we can see, there is no wasted storage space in this scheme. Note that the files do not store temporal values either, further saving storage space. This is because temporal values can be derived from the corresponding calendar specifications, which are stored in the system catalog as part of the meta-data. The trade-off is that ordinary queries cannot directly access time series, since files are not first-class objects in relational databases. So, it is necessary to have special features in a query language along with special access structures. Note that the algorithm described in this section is orthogonal to calendar specifications and, thus, can be applied to other calendar specifications. However, implementation details may differ depending on the particular calendar specification and data model employed.
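The file-based approach can be sketched in Python as follows: the file holds data values only, and the temporal values are regenerated from the calendar when the series is read. The function names, the one-value-per-line text format, and the simplified business_days generator (Monday to Friday, holidays ignored) are all assumptions of this sketch and only stand in for the BusinessWeek calendar.

```python
import os
import tempfile
from datetime import date, timedelta


def business_days(start):
    """Hypothetical stand-in for the BusinessWeek calendar: Monday to Friday,
    with holidays ignored for brevity."""
    d = start
    while True:
        if d.weekday() < 5:
            yield d
        d += timedelta(days=1)


def write_series(path, values):
    """Store only the data values, one per line, in calendar order."""
    with open(path, "w") as f:
        f.write("\n".join(str(v) for v in values))


def read_series(path, calendar):
    """Pair the stored values with the temporal values derived from the calendar."""
    with open(path) as f:
        values = [int(line) for line in f if line.strip()]
    return list(zip(calendar, values))


# The STOCK tuple would hold the pair (file name, calendar name) for each attribute,
# e.g. ("IBM_High", "BusinessWeek").
path = os.path.join(tempfile.mkdtemp(), "IBM_High")
write_series(path, [130, 125, 132, 134, 139])
high = read_series(path, business_days(date(1997, 3, 6)))
```

Because position in the file determines the temporal value, a missing observation would have to be stored as an explicit placeholder; this sketch does not handle that case.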
5 Mapping to Object-Oriented Databases
Mapping of time series into object-oriented databases is straightforward. The main idea is to create a class definition for each time-series attribute. Then, a TS-entity is mapped to a class definition that keeps all time-series attributes, which are references to time-series classes. The mapping scheme is summarized below:

1. A class Calendar is defined in accordance with the calendar specification discussed in Section 2.2. The attributes of the calendar class include granularity, pattern, period, start time, and end time.
2. A class hierarchy of time series is defined. In this hierarchy, various specific time series are specified as subclasses as needed. Inside each class definition, application-specific operations and update/retrieval operations are defined as methods. Some example operations are:
   (a) Functions computing the sum, average, minimum, or maximum of the data values for a given time period.
   (b) Operations that return the data value(s) for a given time point (interval).
   (c) Operations that modify the data values.
   (d) Functions that return a property of the associated calendar (such as granularity, period, etc.).
3. Each time-series class definition includes a calendar variable of the Calendar class type (this establishes the association between a time series object and a calendar object). In addition, it has an attribute value, which is a sequence of data values with each data value corresponding to a sampling time determined by the associated calendar variable.
4. A TS-entity E in the integrated model is mapped to a class definition. All time-series attributes in E remain in the class definition as attributes. The value of each of these attributes is a reference to the corresponding time-series class created above.
5. If a time-series attribute is a complex attribute and one of its component attributes is also a time-series attribute, it is dealt with in the same way as described above (nested time series).
(* class definition of Calendar *)
class Calendar
  type tuple(granularity: granularity_type,
             pattern: pattern_type,
             period: period_type,
             start_time: time_unit,
             end_time: time_unit);
  method ...
end

(* class definitions of time series Ticks, Price, and Dividend *)
class Ticks
  type tuple(WorkHours: Calendar,
             value: list(tick: integer));
  method ...
end

class Price
  type tuple(BusinessWeek: Calendar,
             value: list(DailyValue: tuple(high: integer,
                                           low: integer,
                                           ticks: Ticks)));
  method ...
end

class Dividend
  type tuple(Quarters: Calendar,
             value: list(dividend: integer));
  method ...
end

(* class definition of the entity type Stock *)
class Stock
  type tuple(issuer: Company, (* assume class Company exists *)
             shares: list(share: tuple(vt: ValidTime,
                                       num_stocks: integer)),
             dividend: Dividend,
             price: Price);
  method ...
end

Fig. 5. OO class definitions of the portfolio example
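For readers more familiar with a general-purpose language, the shape of the class definitions in Fig. 5 can be approximated in Python. This is a loose sketch and not a translation of the O2 schema: the Calendar class is reduced to a name plus an explicit list of temporal values (an assumption that replaces the tuple of granularity, pattern, period, start time, and end time), the method names at and maximum are hypothetical, and only the Dividend attribute of Stock is modeled.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Calendar:
    name: str
    points: list  # simplified: explicit temporal values instead of a full specification


@dataclass
class TimeSeries:
    calendar: Calendar                         # association to a calendar object
    value: list = field(default_factory=list)  # i-th value belongs to calendar.points[i]

    def at(self, t):
        """Return the data value for the time point t."""
        return self.value[self.calendar.points.index(t)]

    def maximum(self, start, end):
        """Maximum data value over the time period [start, end]."""
        return max(v for t, v in zip(self.calendar.points, self.value)
                   if start <= t <= end)


class Dividend(TimeSeries):
    """A specific time series defined as a subclass of the generic one."""


@dataclass
class Stock:
    issuer: str
    dividend: Dividend  # reference to a time-series object


quarters = Calendar("Quarters", [date(1996, 3, 31), date(1996, 6, 30),
                                 date(1996, 9, 30), date(1996, 12, 31)])
ibm = Stock("IBM", Dividend(quarters, [200, 275, 250, 280]))
```

Aggregation and retrieval live on the time-series class as methods, so a call such as ibm.dividend.maximum(date(1996, 1, 1), date(1996, 12, 31)) needs no special query-language support.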
A mapping from the example ITDM of Figure 2 to an OO data model is shown in Figure 5, which uses O2 notation [LRV88]. Here, the data types granularity_type, pattern_type, period_type, and time_unit must be defined separately. Note that the time-varying attribute Shares is mapped to an attribute whose value is a list of tuples, each of which consists of a valid time (vt) and a data value (num_stocks). This corresponds to traditional attribute versioning with valid time. As can be seen here, mapping to an object-oriented database does not cause any problem. With an object-oriented data model, it is possible to have complex data (class) structures through reference and collection attributes. Other features of the OO data model, such as methods and inheritance, make it more appropriate for time series management.

6 Conclusion
In this paper, we discussed several different approaches to mapping time series into implementation models, and the mapping procedures were illustrated with a simple example. Mapping to an object-oriented data model is simple and straightforward. Mapping to relational databases incurs some problems due to the simple and flat structure of the basic relational model. However, considering that a large number of operational database management systems are based on the relational model, it is worth investigating. Furthermore, recent advances in technology (cheap, fast storage media with large capacity are available) are expected to ease the difficulties involved in the relational implementation of time series.

References

[CS93] R. Chandra and A. Segev, "Managing Temporal Financial Data in an Extensible Database," Proc. 19th Int'l Conf. on VLDB, pp. 302-313, 1993.
[CSS94] R. Chandra, A. Segev, and M. Stonebraker, "Implementing Calendars and Temporal Rules in Next Generation Databases," Proc. 3rd Int'l Conf. on Data Engineering, pp. 264-273, 1994.
[DDS94a] W. Dreyer, A.K. Dittrich, and D. Schmidt, "An Object-Oriented Data Model for a Time Series Management System," Proc. 7th Int'l Working Conf. on Scientific and Statistical Database Management, pp. 186-195, 1994.
[DDS94b] W. Dreyer, A.K. Dittrich, and M. Stonebraker, "Research Perspectives for Time Series Management Systems," ACM SIGMOD Record, Vol. 23, No. 1, pp. 10-15, 1994.
[DDS95] W. Dreyer, A.K. Dittrich, and D. Schmidt, "Using the CALANDA Time Series Management System," Proc. ACM SIGMOD Int'l Conf., 1995.
[DS93] C.E. Dyreson and R.T. Snodgrass, "Timestamp Semantics and Representation," Information Systems, Vol. 18, No. 3, pp. 143-166, 1993.
[EW90] R. Elmasri and G. Wuu, "A Temporal Model and Query Language for ER Databases," Proceedings of the 6th International Conference on Data Engineering, February 1990.
[GY88] S. Gadia and C. Yeung, "A Generalized Model for a Temporal Relational Database," Proceedings of ACM SIGMOD Conference, 1988.
[KO95] A. Kurt and M. Ozsoyoglu, "Modeling Periodic Time and Calendars," Proc. Int'l Conf. on Application of Databases, pp. 221-234, 1995.
[LEE98] J.Y. Lee, "Database Modeling and Implementation Techniques for Time-Series Data," Ph.D. Dissertation in preparation, Computer Science and Engineering Dept., University of Texas at Arlington, 1998.
[LEW97] J.Y. Lee, R. Elmasri, and J. Won, "An Integrated Temporal Data Model Incorporating Time Series Concept," Data & Knowledge Engineering, Vol. 24, No. 3, pp. 257-276, January 1998, North-Holland.
[LRV88] C. Lecluse, P. Richard, and F. Velez, "O2, an Object-Oriented Data Model," Proc. ACM SIGMOD Conference, 1988.
[NA89] S. Navathe and R. Ahmed, "A Temporal Data Model and Query Language," Information Sciences, 1989.
[SDDM95] D. Schmidt, A.K. Dittrich, W. Dreyer, and R. Marti, "Time Series, a Neglected Issue in Temporal Database Research?," Proc. Int'l Workshop on Temporal Database, pp. 214-232, 1995.
[SNO87] R. Snodgrass, "The Temporal Query Language TQUEL," ACM TODS, Vol. 12, No. 2, June 1987.
[SS87] A. Segev and A. Shoshani, "Logical Modeling of Temporal Data," Proceedings of ACM SIGMOD Conference, pp. 454-466, 1987.
[SS92] M.D. Soo and R.T. Snodgrass, "Multiple Calendar Support for Conventional Database Management," Technical Report TR 92-07, Dept. of Computer Science, University of Arizona, USA, 1992.
[WD92] G. Wuu and U. Dayal, "A Uniform Model for Temporal Object-Oriented Databases," Proceedings of the 8th IEEE Data Engineering Conference, February 1992.
Appendix: Calendar Specifications for Physical Time Space and Gregorian

1. Calendars for Physical Time Space
Calendar Seconds
Calendar Minutes end time: ∞ >
Calendar Hours
Calendar Days end time: ∞ >

2. Calendars for Gregorian
Calendar Weeks
Calendar Months end time: ∞ >
Calendar Years
Expressive Power of Temporal Relational Query Languages and Temporal Completeness

Abdullah Uz Tansel 1,2 and Erkan Tin 2

1 Baruch College, CUNY, 17 Lexington Avenue, Box E0435, New York, NY 10010, U.S.A.
2 Department of Computer Engineering and Information Science, Bilkent University, Bilkent, Ankara 06533, Turkey
Abstract. In this paper, we consider the representation of temporal data based on tuple and attribute timestamping. We identify the requirements of temporal data and elaborate on their implications. We introduce a temporal relational data model where N1NF relations with 1-level of nesting are used. This model uses attribute timestamping. For this model, a nested relational tuple calculus (TRC) and an equivalent temporal relational algebra (TRA) are defined. We follow a comparative approach towards completeness of temporal query languages. In this direction, we use TRC as a metric and identify common temporal operations.¹
¹ Material in Sections 2-4, except the proof of Proposition 3, and Section 5 appeared in references [17] and [20], respectively.
O. Etzion, S. Jajodia, and S. Sripada (Eds.): Temporal Databases - Research and Practice, LNCS 1399, pp. 129-149, 1998. © Springer-Verlag Berlin Heidelberg 1998
Abdullah Uz Tansel and Erkan Tin

1
Introduction

A Temporal Database (TDB) is defined as a database maintaining object histories, i.e., past, present, and possibly future data. A comprehensive treatment of various approaches for handling temporal data can be found in [18]. Maintaining temporal data within traditional relational databases is not straightforward. There are issues peculiar to temporal data, such as comparing database states at two different time points, capturing the periods of concurrent events and accessing times beyond these periods, handling multi-valued attributes, and representing and restructuring temporal data. These issues indicate the diversity of the field, which is manifested by the more than a dozen temporal relational data models and query languages proposed to date [11].

Two studies explore the expressive power of temporal data models and their query languages [2, 6]. Gadia defines a temporal relational algebra and proposes to use it as a yardstick in evaluating the expressive power of temporal query languages [6]. Gadia and Yeung also define a tuple calculus for this model [8] and use it to compare intervals and temporal elements (defined below) in modeling temporal data. Clifford, Croker, and Tuzhilin (CCT) define a historical language, $L_h$, and use it as a metric to evaluate four query languages: their own algebra [3], Gadia's calculus [7], TQuel [14], and Lorentzos and Johnson's algebra [10]. It seems to us that these languages are not a representative sample of current temporal query languages. CCT provide a transformation from HRDM to $L_h$, excluding some operations [3], but not to the other languages. They also do not consider languages such as Bhargava and Gadia's algebra [11], Gadia and Yeung's tuple calculus [8], and Tansel's algebra [17]; the last of these, in particular, allows non-homogeneous relations. The CCT study helps us to understand the issues in temporal relational completeness. However, its scope is limited since it considers only some query languages, its conclusions are not sufficiently strong, and $L_h$ is not powerful enough to be a metric for temporal query languages since it is not closed.

In this article, we use a temporal relational data model based on N1NF relations with one level of nesting, i.e., set-valued attributes, since the modeling capabilities of these relations are sufficient for the purpose of this study. We also use a tuple relational calculus which includes set membership formulas and set constructors. This model meets the requirements of temporal data discussed in Section 2.1. Its calculus can express all the queries expressible by the relational database languages since it is a superset of the traditional tuple calculus. Therefore, we use this language as a yardstick in evaluating the expressive power of temporal query languages.

Representation of temporal data and its requirements are considered in Section 2. In Section 3, the temporal relational data model is introduced. The formal definitions of a temporal relational calculus and a temporal relational algebra are then given in Section 4 and Section 5, respectively.
In Section 6, the temporal operations common to the existing algebraic languages are considered, and we show how these operations can be expressed in the temporal relational calculus and the temporal relational algebra. Section 7 justifies the use of the temporal relational calculus as a metric in evaluating the expressive power of temporal relational query languages. As an example, the expressive power of a temporal-logic-based language and of Bhargava and Gadia's algebra is evaluated using TRC. Finally, we conclude in Section 8.
2
Representing Temporal Data
Atoms take their values from some fixed universe U. This universe is the set of all atomic values such as reals, integers, character strings, and the value null. Some values in U represent time, and T denotes the set of these values. We assume for simplicity that time values range over the natural numbers 0, 1, ..., now. At any particular moment, the special constant now represents the current time. In the context of time, a subset of T is called a temporal set. A temporal set which contains consecutive time points $\{t_i, t_{i+1}, \ldots, t_{i+n}\}$ is represented either as a closed interval $[t_i, t_{i+n}]$ or as a half-open interval $[t_i, t_{i+n+1})$. A temporal set can also be represented by the intervals corresponding to its subsets having consecutive time points.
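The correspondence between a temporal set and its half-open intervals can be sketched in a few lines of Python. This is only an illustration: the function names `to_intervals` and `to_temporal_set` are hypothetical, and time points are assumed to be plain integers.

```python
def to_intervals(temporal_set):
    """Return the maximal half-open intervals [start, end) covering a set of time points."""
    intervals = []
    for t in sorted(temporal_set):
        if intervals and intervals[-1][1] == t:
            # t directly follows the current run of consecutive points: extend it.
            intervals[-1] = (intervals[-1][0], t + 1)
        else:
            # t starts a new run of consecutive points.
            intervals.append((t, t + 1))
    return intervals


def to_temporal_set(intervals):
    """Expand half-open intervals back into the set of time points they contain."""
    return {t for start, end in intervals for t in range(start, end)}


points = {1, 2, 3, 4, 10, 11, 12, 13, 14, 15}
print(to_intervals(points))  # [(1, 5), (10, 16)]
```

Both directions lose no information, which is the sense in which the two notations are interchangeable.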
Expressive Power of Temporal Relational Query Languages
Time-varying data is commonly represented by timestamping values. The timestamps can be time points [3], intervals [10, 12-15], temporal sets, and temporal elements, which are unions of disjoint intervals [7]. A temporal set is an explicit representation of a temporal element. Furthermore, these timestamps can be added to tuples or to attributes, which leads to two different approaches for handling temporal data in the relational data model. Since time points, intervals, temporal elements, and temporal sets are needed in temporal query languages, we have to clarify their relationships (see the propositions in Section 4). Figure 1 gives an example where time points, intervals, temporal elements, and temporal sets are used as attribute timestamps. The example can also be illustrated for tuple timestamping, i.e., these timestamps can be attached to tuples.

a) Time points.
SALARY
1, 20K
10, 20K
16, 30K

b) Intervals.
SALARY
[1,5), 20K
[10,16), 20K
[16,now], 30K

c) Temporal element.
SALARY
{[1,5) ∪ [10,16)}, 20K
{[16,now]}, 30K

d) Temporal set.
SALARY
{1,2,3,4,10,11,...,15}, 20K
{16,17,...,now}, 30K

Fig. 1. Different timestamps.

Note that time points cannot capture the full extent of the temporal reality if the history is not continuous. This is because the starting point of a value is not sufficient to indicate the whole period over which the value is valid; the end of this period is indicated only by the starting time of the next value. This can be circumvented by introducing special null values [3]. Also, note that temporal element and temporal set are the same construct. There is only a notational difference between them.
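The limitation of time-point timestamps can be made concrete with a small Python sketch. It assumes, for illustration only, that `now` is 20 and that a null value is stored as `None`; the function name `periods_from_time_points` is hypothetical.

```python
NOW = 20  # assumed value of the constant `now`, chosen only for this example


def periods_from_time_points(history, now=NOW):
    """Infer [start, end) periods when only starting points are stored.

    Each value is taken to hold until the starting point of the next value.
    """
    ordered = sorted(history, key=lambda pair: pair[0])
    periods = []
    for i, (start, value) in enumerate(ordered):
        end = ordered[i + 1][0] if i + 1 < len(ordered) else now + 1
        periods.append(((start, end), value))
    return periods


# Figure 1(a): the gap [5, 10) in the salary history cannot be recovered.
salary_points = [(1, "20K"), (10, "20K"), (16, "30K")]
inferred = periods_from_time_points(salary_points)

# With an explicit null marker at time 5 the gap becomes representable.
with_null = [(1, "20K"), (5, None), (10, "20K"), (16, "30K")]
repaired = [p for p in periods_from_time_points(with_null) if p[1] is not None]
```

Here `inferred` wrongly stretches the first 20K period to [1, 10), while `repaired` recovers the periods [1, 5) and [10, 16) of Figure 1(b).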
2.1
Requirements for Temporal Data Models
We believe that the following are fundamental requirements for temporal databases. Let $D_t$ denote the database state at time t (t may also be an interval or a temporal set).
1. The data model should be capable of modeling and querying the database at any instant of time, i.e., $D_t$. The data model should at least provide the modeling and querying power of the 1NF relational data model. Note that when t is now, $D_t$ corresponds to a traditional database.
2. The data model should be capable of modeling and querying the database at two different time points, i.e., $D_t$ and $D_{t'}$, where $t \neq t'$. This should be the case for intervals and temporal sets as well.
3. The data model should allow different periods of existence in attributes within a tuple, i.e., non-homogeneous (heterogeneous) tuples.
4. The data model should allow multi-valued attributes at any time point, i.e., in $D_t$.
5. A temporal query language should have the capability to return the same type of objects it operates on.
6. A temporal query language should have the capability to regroup the data according to a different criterion.
7. The model should be capable of expressing set-theoretic operations, as well as set comparison tests, on the timestamps, be they time points, intervals, or temporal sets (elements).

Requirements 3, 5, 6, and 7 need more elaboration, whereas the rest do not require any further justification. Note that it is possible to add more requirements or relax the above ones depending on user needs.

Homogeneity [7] requires that the attributes of a tuple be defined over the same period of time. This assumption simplifies the model. However, it also limits the data model and its query language. Let r be a temporal relation and $\tau$ and $\tau'$ be two of its tuples. Let $T_\tau$ and $T_{\tau'}$ be the temporal sets over which $\tau$ and $\tau'$ are defined, respectively. The Cartesian product of these two tuples can only be defined over $T_\tau \cap T_{\tau'}$. The parts of $T_\tau$ and $T_{\tau'}$ outside of their intersection, i.e., $T_\tau - T_{\tau'}$ and $T_{\tau'} - T_\tau$, are not accessible. One can set the semantics of a query language to allow a virtual Cartesian product of tuples with different times. Though this allows interpretation of one single expression, it is not possible to carry the intermediate results from one expression to another. Furthermore, conversion from one language to another (i.e., algebra/calculus transformations) would not be possible.
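A small Python sketch may help to see what is lost under homogeneity. The two validity periods below are made-up values, and the function name `homogeneous_product_span` is hypothetical.

```python
def homogeneous_product_span(t_tau, t_tau_prime):
    """Split two validity periods into the shared part and the parts lost
    when a Cartesian product is defined only over their intersection."""
    shared = t_tau & t_tau_prime
    lost_from_tau = t_tau - t_tau_prime
    lost_from_tau_prime = t_tau_prime - t_tau
    return shared, lost_from_tau, lost_from_tau_prime


# Hypothetical tuples: tau is defined over [10, 20), tau' over [15, 30).
T_tau = set(range(10, 20))
T_tau_prime = set(range(15, 30))

shared, lost_a, lost_b = homogeneous_product_span(T_tau, T_tau_prime)
# shared covers [15, 20); lost_a covers [10, 15); lost_b covers [20, 30)
```

Only `shared` is visible to a homogeneous product; the data valid during `lost_a` and `lost_b` cannot be reached from the combined tuple.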
By definition, temporal data models based on tuple timestamping are homogeneous.

A temporal query language should return the same type of relations it operates on. The EMP relation in Figure 2 is a unique representation of the employee data where each tuple contains all the data for an employee. Relations in a unique representation were first introduced independently by Tansel [4] and Gadia [6, 7]. There are other representations of the same data which can be obtained by taking subsets of temporal sets and creating different tuples. In other words, one tuple in the unique representation is broken into several tuples whose timestamps are subsets of the timestamp of the original tuple. Gadia calls these weak relations, which have the same attribute values at a time point [7]. Ideally, a temporal query language should retrieve unique relations. In case it retrieves weak relations, it should have the capability to transform them into equivalent unique relations. The issue of weak relations arises even if the operand relations are in unique representation. Consider the scheme of the EMP relation in Figure 2. Let $r_1$ be the relation containing Tom's data in the interval [10,16) and $r_2$ be the relation having Tom's data in the interval [16,20). $r_1 \cup r_2$ either contains two tuples, or the two can be combined into one single tuple for the time period [10,20). The former would be a weak relation whereas the latter would be a unique relation. This situation also arises in tuple timestamping, aside from the fact that related data is broken into several tuples.

The capability to restructure temporal relations should be provided in a temporal query language [1]. Roughly, a temporal relation groups an object's data into a tuple. In Figure 2, the employee data is grouped with respect to E#. The employee data of Figure 2 can also be grouped with respect to department values. This will facilitate answering queries with respect to department values. A sample query might be "give the E#, name, and salary of the employees for each department" or "does the validity period of any department include [10,now) or any temporal set?"

For requirement 7, any data model using temporal sets (elements) as timestamps naturally supports set-theoretic operations and set comparison tests on timestamps. However, the case of time points and intervals is not straightforward. Any data model using them should be able to simulate these operations.

3
The Temporal Relational Data Model
Definition 1. A temporal atom is an ordered pair <t, v> where t is a temporal set (or an interval) and v is an atomic value. <t, v> asserts that v is valid for the time period t, which is not empty. If a is a temporal atom, then a.T and a.v denote its temporal set and value components, respectively.

A temporal atom represents a historical value of an attribute. A set-of-temporal atoms models the history of an attribute of an object.

Definition 2. $R(A_1, A_2, \ldots, A_n)$ is a temporal relation scheme where n is its degree and $A_1, \ldots, A_n$ are its attributes. R is an N1NF relation with a nesting depth of at most 1. That is, an attribute can have a single value, i.e., an atom or a temporal atom, or a set, i.e., a set-of-atoms or a set-of-temporal atoms.

We use the terms relation and temporal relation interchangeably. We also use the terms atom, temporal atom, set-of-atoms, and set-of-temporal atoms for both the scheme and its instance. The meaning will be clear from the context. In [17], we give the formal definition of a generalized relational data model which allows arbitrary levels of nesting. In this study, we restrict the nesting depth to one since it is capable of simulating all the other proposed temporal relational data models. It is also straightforward to generalize the conclusions of this study to arbitrarily nested temporal relations.

Definition 3. An instance r of relation scheme R is a set of n-tuples. Each tuple component is an atom, a temporal atom, a set-of-atoms, or a set-of-temporal atoms.

Definition 4. A relational database schema D is {R, S, ...} where R, S, ... are relation schemes.
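Definitions 1 and 2 can be mirrored by a small Python data structure. This is a sketch under stated assumptions rather than part of the model: the class name `TemporalAtom`, the helper `interval`, and the classifier `kind` are all hypothetical, and temporal sets are assumed to be sets of integers.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union

Atom = Union[int, float, str, None]


@dataclass(frozen=True)
class TemporalAtom:
    """An ordered pair <T, v>: value v is valid over the non-empty temporal set T."""
    T: FrozenSet[int]
    v: Atom

    def __post_init__(self):
        if not self.T:
            raise ValueError("the temporal set of a temporal atom must not be empty")


def interval(start, end):
    """Temporal set holding the time points of the half-open interval [start, end)."""
    return frozenset(range(start, end))


def kind(component):
    """Classify a tuple component into one of the four kinds allowed at nesting depth 1."""
    if isinstance(component, TemporalAtom):
        return "temporal atom"
    if isinstance(component, (set, frozenset)):
        if any(isinstance(m, (set, frozenset)) for m in component):
            raise ValueError("nesting depth exceeds one")
        if component and all(isinstance(m, TemporalAtom) for m in component):
            return "set-of-temporal atoms"
        return "set-of-atoms"
    return "atom"


sales = TemporalAtom(interval(10, 12), "Sales")
```

Because the dataclass is frozen and its fields are hashable, temporal atoms can be collected into ordinary Python sets, which is how a set-of-temporal atoms is represented here.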
Tuples in a temporal relation can be heterogeneous, i.e., each attribute may have different time references. In Figure 2, we show a heterogeneous employee relation, called EMP, over the scheme E# (employee number), ENAME (employee name), DEPARTMENT, and SALARY. E# and ENAME are atomic attributes, and DEPARTMENT and SALARY are attributes with temporal atoms. As can be seen, there are no department values for Tom in the periods [12,14) and [18,20). Perhaps he was not assigned to any department during this time. We can also convert E# and ENAME into temporal atoms by assigning the validity period of tuples as their timestamps. Note that temporal sets are represented as intervals for notational convenience.
E#   ENAME  DEPARTMENT           SALARY
121  Tom    <[10,12), Sales>     <[10,15), 20K>
            <[14,18), Mktg>      <[15,17), 25K>
                                 <[17,20), 30K>
133  Ann    <[25,30), Sales>     <[25,30), 35K>
147  John   <[18,now], Toys>     <[18,now], 42K>
Fig. 2. EMP relation.
4
Temporal Relational Calculus (TRC)
In this section, we define the Temporal Relational Calculus (TRC) language [17] for the temporal relational data model given in the previous section. We give the symbols and the well-formed formulas of the language, followed by their interpretations.

4.1
Symbols

- Predicate names: There are a finite number of predicate names, P, Q, R, S, ..., one for each relation instance in the database.
- Variables: There are a countable number of tuple variables, s, t, u, v, ... A variable has the same scheme as the relation scheme it is associated with. Variables may be indexed. If s is a variable, then s[i] is an indexed variable where i is between 1 and its degree. s[i] can be an atom, a temporal atom, a set-of-atoms, or a set-of-temporal atoms. If s[i] is a temporal atom, s[i].v and s[i].T are also variables denoting the value and the temporal set parts of this temporal atom, respectively. We use t as a special variable for time points and add a subscript whenever more time variables are needed. This is done for the sake of clarity; otherwise there is no need for such a distinction.
- Constants: There are a countable number of constant symbols, a, b, c, ... Each constant has a scheme: an atom, a temporal atom, a set-of-atoms, or a set-of-temporal atoms.
4.2
Well-formed formulas
1. $P(s)$, where P is a predicate name and s is a variable.
2. $s[i]\ op\ r[j]$ and $s[i]\ op\ c$, where op is one of $=, \neq, <, \leq, >, \geq$ and $s[i]$, $r[j]$, and c are atoms. The position of the operands can be changed to form a new formula.
3. $s[i].v\ op\ r[j].v$, $s[i].v\ op\ r[k]$, and $s[i].v\ op\ c$, where op is one of $=, \neq, <, \leq, >, \geq$, $s[i]$ and $r[j]$ are temporal atoms, and $r[k]$ and c are atoms. The position of the operands can be changed to form a new formula.
4. $s[i] = r[j]$, $r[j] = s[i]$, $s[i] = c$, and $c = s[i]$, where $s[i]$, $r[j]$, and c have the same scheme, i.e., set-of-atoms, temporal atom, or set-of-temporal atoms. Here '=' is an identity test and hence '$\neq$' may also be used. $s[i].T = r[j].T$ is allowed if $s[i]$ and $r[j]$ are temporal atoms.
5. Formulas involving a membership test:
   - $s[i] \in r[j]$, where $s[i]$ is an indexed variable which is an atom and $r[j]$ is an indexed variable which is a set-of-atoms. If $s[i]$ is a temporal atom, then the indexed variable $r[j]$ is a set-of-temporal atoms. In this formula, either of the indexed variables can be replaced by an appropriate constant.
   - $s[i].v \in r[j]$, where $s[i]$ is a temporal atom and $r[j]$ is a set-of-atoms. Either operand can be replaced with an appropriate constant.
   - $s[i] \in r[j].T$, where $s[i]$ is an indexed variable which is an atom and $r[j].T$ is the temporal set of an indexed variable $r[j]$ which is a temporal atom. Either of the indexed variables can be replaced by an appropriate constant. Furthermore, $s[i].v$ can also be specified if $s[i]$ is a temporal atom.
6. If $\psi$ and $\phi$ are formulas, so are $\psi \land \phi$, $\psi \lor \phi$, and $\neg\psi$.
7. If $\psi$ is a formula with the free variable s, then $\exists s\,\psi$ and $\forall s\,\psi$ are formulas, and s no longer occurs freely in them.
8. $r[i] = \{s \mid \psi(s, u, v, \ldots)\}$ is a formula, where $\psi$ is a formula with free variables s, u, v, ... and r does not occur freely in $\psi$. The indexed variable $r[i]$ has a scheme which is a set of atoms or temporal atoms. In the resulting formula, the variables u, v, ... are free and s is bound. This formula is called a set constructor, and it may not be used in $\psi$.
4.3
Interpretation of calculus objects
The domain of interpretation for a calculus object $\alpha$ is defined relative to the set U, and it is denoted by $Dom_\alpha(U)$. Atoms take their values from U. The domain of interpretation for temporal atoms is $U^{ta} = P(T) \times U$, where $P(T)$ is the powerset of T and $\times$ is the Cartesian product operator. The interpretation of TRC objects is relative to U and the derived set $U^{ta}$. An interpretation of a constant c, denoted as $I_c$, is a member of $Dom_c(U)$. If c is an atom or a set-of-atoms, then $Dom_c(U)$ is U or $P(U)$, respectively. If c is a temporal atom or a set-of-temporal atoms, then $Dom_c(U)$ is $U^{ta}$ or $P(U^{ta})$, respectively. An interpretation of a predicate name P, denoted as $I_P$, is a relation instance and $I_P \in Dom_P(U)$. A variable s is interpreted as a tuple instance and $I_s \in Dom_s(U)$, where $Dom_s(U) = L_1 \times \ldots \times L_n$ and n is the degree of s. For each i, $L_i$ is U, $P(U)$, $U^{ta}$, or $P(U^{ta})$ if $s[i]$ is an atom, set-of-atoms, temporal atom, or set-of-temporal atoms, respectively. $I_s(i)$ denotes the i-th component of the tuple which is the interpretation of variable s.

Formulas are interpreted as true or false by assigning interpretations to their constants, predicate symbols, and free variables. The following are the rules for the interpretation of formulas in TRC.

- $P(s)$ is true if $I_s \in I_P$.
- $s[i]\ op\ r[j]$ is true if $I_s(i)\ op\ I_r(j)$. $s[i]\ op\ c$ is true if $I_s(i)\ op\ I_c$.
- $s[i].v\ op\ r[j].v$ is true if $I_s(i).v\ op\ I_r(j).v$. $s[i].v\ op\ r[k]$ is true if $I_s(i).v\ op\ I_r(k)$. $s[i].v\ op\ c$ is true if $I_s(i).v\ op\ I_c$.
- $s[i] = r[j]$ is true if $I_s(i) = I_r(j)$. $s[i] = c$ is true if $I_s(i) = I_c$.
- $s[i] \in r[j]$ is true if $I_s(i) \in I_r(j)$. $s[i].v \in r[j]$ is true if $I_s(i).v \in I_r(j)$. $s[i] \in r[j].T$ is true if $I_s(i) \in I_r(j).T$.
- $\psi \land \phi$ is true if both $\psi$ and $\phi$ are true. $\psi \lor \phi$ is true if either $\psi$ or $\phi$ is true. $\neg\psi$ is true if $\psi$ is false.
- $\exists s\,\psi$ is true if there is at least one assignment to s which makes $\psi$ true, i.e., $\psi$ is true for at least one value of $I_s$. $\forall s\,\psi$ is true if $\psi$ is true for any assignment to s.
- $r[i] = \{s \mid \psi(s, u, v, \ldots)\}$ is satisfied (made true) by the interpretations $I_r, I_u, I_v, \ldots$ of its free variables if the following condition is met: $I_r(i)$ equals the set of assignments $I_s$ satisfying $\psi$ for the interpretations $I_u, I_v, \ldots$ If there are no such tuples $I_s$, and $I_r(i)$ is empty, then this formula evaluates to false. In other words, the set constructor formula does not create an empty set.
A TRC expression is $\{s^{(k)} \mid \psi(s)\}$ where s is a free variable with arity k and $\psi$ is a well-formed formula. An interpretation of this expression is the set of instances of s which satisfy the formula $\psi$, i.e., an element of $Dom_s(U)$. Safety rules for TRC are defined in [20].

We now give examples using the EMP relation of Figure 2 to illustrate TRC. Answers to the example queries are shown in Figure 3. For the reader's convenience, we use attribute names instead of position indexes in the following TRC expressions.

Query-1. What are the name and salary of those employees in the sales department at time 16?
$\{x^{(2)} \mid (\exists r)\,(EMP(r) \land (\exists u)(\exists z)\,(u \in r[DEPARTMENT] \land z \in r[SALARY] \land (\exists t)\,(t \in u.T \land t = 16 \land t \in z.T \land u.v = Sales \land x[1] = r[ENAME] \land x[2] = z.v)))\}$

Query-2. What are the histories of those employees who have only worked in the sales department?
Query-1:
ENAME  SALARY
Tom    25K

Query-2:
E#   ENAME  DEPARTMENT          SALARY
133  Ann    <[25,30), Sales>    <[25,30), 35K>

Query-3:
∅

Fig. 3. The results of example queries.
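As a cross-check of the Query-2 row in Figure 3, the condition "has worked in the sales department and in no other department" can be evaluated procedurally. The following Python sketch is not the calculus itself: it assumes `now` is 40, stores intervals as (start, end) pairs, and uses the hypothetical function name `worked_only_in`.

```python
NOW = 40  # assumed value of `now`, chosen only for this example

# EMP of Figure 2 as (E#, ENAME, DEPARTMENT, SALARY) tuples; each history is a
# set of (interval, value) pairs with half-open intervals (start, end).
EMP = [
    (121, "Tom",
     frozenset({((10, 12), "Sales"), ((14, 18), "Mktg")}),
     frozenset({((10, 15), "20K"), ((15, 17), "25K"), ((17, 20), "30K")})),
    (133, "Ann",
     frozenset({((25, 30), "Sales")}),
     frozenset({((25, 30), "35K")})),
    (147, "John",
     frozenset({((18, NOW + 1), "Toys")}),
     frozenset({((18, NOW + 1), "42K")})),
]


def worked_only_in(relation, department):
    """Tuples with at least one temporal atom for `department` and none for any other."""
    result = []
    for r in relation:
        history = r[2]
        has_department = any(v == department for _, v in history)
        has_other = any(v != department for _, v in history)
        if has_department and not has_other:
            result.append(r)  # the whole tuple, i.e. the full history, is returned
    return result


answer = worked_only_in(EMP, "Sales")
```

The existential and the negated existential in the loop correspond to the two quantified subformulas over DEPARTMENT in the calculus expression for Query-2; `answer` contains Ann's tuple only.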
$\{x^{(4)} \mid (\exists r)\,(EMP(r) \land (\exists u)\,(u \in r[DEPARTMENT] \land u.v = Sales \land \neg(\exists z)\,(z \in r[DEPARTMENT] \land z.v \neq Sales) \land x[1] = r[E\#] \land x[2] = r[ENAME] \land x[3] = r[DEPARTMENT] \land x[4] = r[SALARY]))\}$

Query-3. What are the name and salary histories of the employees whose current salary is the same as the salary of another employee when he was working for the sales department?
$\{x^{(2)} \mid (\exists r)(\exists s)\,(EMP(r) \land EMP(s) \land (\exists u)\,(u \in r[SALARY] \land now \in r[SALARY].T \land (\exists z)(\exists y)\,(z \in s[SALARY] \land y \in s[DEPARTMENT] \land y.v = Sales \land (\exists t)\,(t \in y.T \land t \in z.T \land u.v = z.v) \land x[1] = s[ENAME] \land x[2] = s[SALARY])))\}$

Below we give several propositions which are essential in establishing the expressive power of temporal relational query languages. Proposition 3 is especially significant since it establishes the equivalence between intervals and temporal sets as timestamps. It also shows the power of TRC in transforming temporal sets into intervals, thus allowing comparison of models based on interval and temporal set timestamps.

Proposition 1. TRC can form the union, intersection, and difference of temporal sets (hence temporal atoms).
Proof. Straightforward from the definition of TRC.
Proposition 2. TRC can simulate formulas involving set comparison, i.e., set inclusion.
Proof. Straightforward from the definition of TRC.

Proposition 3. TRC can convert temporal atoms with temporal sets to temporal atoms with intervals.
Proof. Let R(A) be a relation scheme where attribute A is a temporal atom. We also assume the existence of a single-column relation over scheme T denoting all time points in the domain T = {0, 1, ..., now}.

Step-1. Take the Cartesian product of the relation T with itself to obtain a relation (TI) having tuples giving all possible intervals over the temporal domain T.
$\lambda_{TI}(x) \equiv T(s) \land T(u) \land x[1] = s[1] \land x[2] = u[1] \land s[1] < u[1]$

Step-2. For each time interval in TI, get the set of time points in this interval:
- Firstly, take the Cartesian product of TI with the relation T.
  $\lambda_{TM}(x) \equiv \lambda_{TI}(s) \land T(u) \land x[1] = s[1] \land x[2] = s[2] \land x[3] = u[1]$
- Then, select those tuples in the relation TM which have, as their last attribute value, the points in the interval they represent. Call this relation TS.
  $\lambda_{TS}(x) \equiv (\exists s)\,(\lambda_{TM}(s) \land s[1] \leq s[3] \land s[3] < s[2] \land x[1] = s[1] \land x[2] = s[2] \land x[3] = s[3])$
- Finally, for each interval, group the time points in column 3 of TS into a set (i.e., a nesting operation). Call this relation TP.
  $\lambda_{TP}(x) \equiv (\exists s)\,(\lambda_{TS}(s) \land x[1] = s[1] \land x[2] = s[2] \land x[3] = \{z \mid (\exists u)\,(\lambda_{TS}(u) \land u[1] = s[1] \land u[2] = s[2] \land z[1] = u[3])\})$

Step-3. In order to find the time intervals of the tuples in the relation R, we apply the following series of operations.
- Firstly, form a side-by-side copy of R.
  $\lambda_{RI}(x) \equiv R(s) \land x[1] = s[1] \land x[2] = s[1]$
- Secondly, take the Cartesian product of the relations RI and TP to get a relation (RM) in which for each tuple there exists an interval.
  $\lambda_{RM}(x) \equiv \lambda_{RI}(s) \land \lambda_{TP}(u) \land x[1] = s[1] \land x[2] = s[2] \land x[3] = u[1] \land x[4] = u[2] \land x[5] = u[3]$
- Lastly, replace the time of the second column of RM by the intersection of the temporal set of its second column and the interval in its fifth column.
  $\lambda_{P}(x) \equiv (\exists s)\,(\lambda_{RM}(s) \land (\exists t)\,(t \in s[2].T \land t \in s[5].T \land x[1] = s[1] \land x[2].v = s[2].v \land t \in x[2].T \land x[3] = s[3] \land x[4] = s[4] \land x[5] = s[5] \land \neg(\exists t')\,(t' \neq t \land t' \in x[2].T)))$

Step-4. The resulting relation P in Step-3 contains tuples whose second attributes have temporal atoms with intervals which are the components of the temporal set in the first column. Some of these intervals are generated by larger intervals of column 5, which need to be eliminated. This is done by first taking a Cartesian product of the relations P and TP, then selecting only those tuples having the temporal set of their temporal atoms equal to the set of time points representing its time interval, and finally projecting on columns 2 and 8.
$\lambda_{PM1}(x) \equiv (\exists s)(\exists u)\,(\lambda_{P}(s) \land \lambda_{TP}(u) \land x[1] = s[1] \land \ldots \land x[5] = s[5] \land x[6] = u[1] \land x[7] = u[2] \land x[8] = u[3])$
$\lambda_{P1}(x) \equiv (\exists s)\,(\lambda_{PM1}(s) \land s[2].T = s[8] \land x[1] = s[1] \land x[2] = s[2] \land x[3] = s[3] \land x[4] = s[4] \land x[5] = s[5])$
For each tuple in R with temporal set T, we have generated its intervals $T_1, \ldots, T_n$ as well as their subsets (P1).

Step-5. Retain each time interval $T_i$ we have generated and eliminate its subsets.
- Firstly, form the Cartesian product of P1 with itself.
  $\lambda_{PS1}(x) \equiv (\exists s)(\exists u)\,(\lambda_{P1}(s) \land \lambda_{P1}(u) \land x[1] = s[1] \land \ldots \land x[5] = s[5] \land x[6] = u[1] \land \ldots \land x[10] = u[5])$
- Then, select only those tuples such that the temporal sets of the temporal atoms in the first copy of P1 are proper subsets of the temporal sets of the temporal atoms in the second copy of P1, and project on the second attribute of the resulting relation.
  $\lambda_{Q}(x) \equiv (\exists s)\,(\lambda_{PS1}(s) \land (\forall t)\,(\neg(t \in s[2].T) \lor t \in s[7].T) \land s[1] = s[6] \land x[1] = s[1] \land x[2] = s[2])$
- Thirdly, take a Cartesian product of the relations Q and P1.
  $\lambda_{PM2}(x) \equiv (\exists s)(\exists u)\,(\lambda_{Q}(s) \land \lambda_{P1}(u) \land x[1] = u[1] \land \ldots \land x[5] = u[5] \land x[6] = s[1] \land x[7] = s[2])$
- Finally, select those tuples of PM2 such that the temporal atoms obtained from the relation Q agree with the temporal atoms obtained from P1, and project over it. The resulting relation (P2) contains the undesired tuples in P1.
  $\lambda_{P2}(x) \equiv (\exists s)\,(\lambda_{PM2}(s) \land s[1] = s[6] \land s[2] = s[7] \land x[1] = s[1] \land \ldots \land x[5] = s[5])$

Step-6. Eliminating from P1 the tuples with subsets of the set of time points in an interval gives the desired relation.
$\lambda(x) \equiv \lambda_{P1}(x) \land \neg\lambda_{P2}(x)$

Proposition 4. TRC can convert temporal atoms with temporal sets to temporal atoms with time points.
Proof. A conversion procedure similar to the one in the proof of Proposition 3 can be devised.

Proposition 5. TRC can convert temporal atoms with intervals into equivalent temporal atoms with temporal sets.
Proof. Combine the intervals of temporal atoms whose values are the same into a temporal set. Break each interval into its time points, then nest them.

Proposition 6. TRC can generate the power set of a relation.
Proof. The proof can be found in [9].
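The strategy behind the proof of Proposition 3 (generate every interval, keep those lying inside the temporal set, then discard the non-maximal ones) and the nesting argument of Proposition 5 can be imitated in Python. This is a loose sketch, not a translation of the calculus expressions: the function names are hypothetical, and candidate end points run up to now + 1 so that a half-open interval can include `now`, which is an assumption of this example.

```python
def intervals_of(temporal_set, now):
    """Maximal half-open intervals [a, b) whose time points all lie in temporal_set."""
    points = range(0, now + 2)
    # Steps 1-2: every candidate interval [a, b) with a < b over the time domain.
    candidates = [(a, b) for a in points for b in points if a < b]
    # Steps 3-4: keep the intervals whose time points all belong to the temporal set.
    contained = [(a, b) for a, b in candidates if set(range(a, b)) <= temporal_set]

    def properly_inside(x, y):
        return x != y and y[0] <= x[0] and x[1] <= y[1]

    # Steps 5-6: drop every interval that is a proper part of another kept interval.
    return sorted(x for x in contained
                  if not any(properly_inside(x, y) for y in contained))


def to_temporal_sets(interval_atoms):
    """Proposition 5: merge the intervals of atoms sharing a value into one temporal set."""
    merged = {}
    for (a, b), value in interval_atoms:
        merged.setdefault(value, set()).update(range(a, b))
    return {(frozenset(points), value) for value, points in merged.items()}


salary_20k = {1, 2, 3, 4, 10, 11, 12, 13, 14, 15}
print(intervals_of(salary_20k, now=20))  # [(1, 5), (10, 16)]
```

The generate-and-filter style is deliberately wasteful; it is used here because it follows the declarative steps of the proof, whereas a practical implementation would scan the sorted time points once.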
5
Temporal Relational Algebra (TRA)
In this section, we briefly define the operations of the Temporal Relational Algebra (TRA) for our temporal relational data model. The details of these operations are given in [16]. Set operations ($\cap$, $\cup$, $-$), Projection ($\pi$), and Cartesian Product ($\times$) are defined exactly the same way as they are in the relational algebra. For instance, $R \cup S$ is defined as $\{x \mid R(x) \lor S(x)\}$. For notational convenience, in the following, we use column indexes instead of attribute names. Nest and unnest operations provide the capability to restructure nested temporal relations. They also allow access to the values deep in the structure of a nested temporal relation. Temporal atom formation and decomposition operations are needed because we model temporal values by temporal atoms. They allow formation of new temporal atoms as well as the conversion of a nested temporal relation into an equivalent 1NF relation. Now, let E be a temporal relational algebra expression. EV(E) represents the evaluation of E. Similarly, A.T and A.v represent the names for the temporal set and value components of the attribute A.
a. Selection ($\sigma$)
$EV(\sigma_F(E)) = \{s \mid s \in EV(E) \land F \text{ is true}\}$
F is a formula of the form i op j where op is one of $\{=, \neq, >, \geq, <, \leq\}$. The operands i and j are either attributes of E or constants. They are atoms, or value parts of temporal atoms. The symbols $=$ and $\neq$ can be used to test the equality/inequality of atoms, temporal atoms, and sets only. The logical connectives $\land$, $\lor$, and $\neg$ can be expressed as a combination of the selection and set operations. Set membership ($\in$) can be derived from the basic set of formulas [17].
b. Unnesting ($\mu$)
$E = \mu_k(R)$. Let R be a relation scheme whose degree is n and let attribute k be a set of atoms or temporal atoms:
$EV(E) = \{s \mid \exists r\, \exists y\, (r \in EV(R) \land y \in r[k] \land s[i] = r[i] \text{ for } 1 \leq i \leq n,\ i \neq k \land s[k] \in r[k])\}$
For each atom (temporal atom) in attribute k of a tuple in R, a new tuple is generated in EV(E).
c. Nesting ($\nu$)
$E = \nu_k(R)$. Let R be a temporal relation whose degree is n and k be one of its attributes, an atom or a temporal atom:
$EV(E) = \{s \mid \exists r\, (r \in EV(R) \land s[j] = r[j] \text{ for } 1 \leq j \leq n,\ j \neq k \land s[k] = \{z \mid \exists u\, (u \in EV(R) \land u[p] = r[p] \text{ for } 1 \leq p \leq n,\ p \neq k \land z[1] = u[k])\})\}$
Tuples in R are partitioned so that each partition has the same values for the attributes other than k. For each partition, a tuple is generated in EV(E) by grouping the tuple components in attribute k. The nested component of the resulting tuple can never be empty. Note that nesting is not applied to set-valued attributes. Set-valued tuple components can be combined by first unnesting and then nesting a set-valued attribute.
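Unnesting and nesting can be sketched over relations represented as Python sets of tuples. The sketch uses 0-based column positions, whereas the definitions above count from 1, and the function names `unnest` and `nest` are hypothetical.

```python
def unnest(relation, k):
    """mu_k: emit one tuple per member of the set-valued component at position k."""
    return {r[:k] + (member,) + r[k + 1:] for r in relation for member in r[k]}


def nest(relation, k):
    """nu_k: group the tuples that agree on every component except position k."""
    groups = {}
    for r in relation:
        rest = r[:k] + r[k + 1:]
        groups.setdefault(rest, set()).add(r[k])
    # A group is only created when a tuple exists, so the nested set is never empty.
    return {rest[:k] + (frozenset(members),) + rest[k:]
            for rest, members in groups.items()}


by_employee = {
    (121, frozenset({"Sales", "Mktg"})),
    (133, frozenset({"Sales"})),
}
flat = unnest(by_employee, 1)
# flat == {(121, "Sales"), (121, "Mktg"), (133, "Sales")}
by_department = nest(flat, 0)
# by_department == {(frozenset({121, 133}), "Sales"), (frozenset({121}), "Mktg")}
```

Unnesting on one column and nesting on another regroups the same data by a different criterion, which is the restructuring capability asked for in requirement 6.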
d. Temporal Atom Decomposition ($\delta$)
$E = \delta_k(R)$. Let R be a temporal relation whose degree is n and k be an attribute of temporal atoms. This operation splits the k-th attribute of R into its temporal set and value parts, and places them as the last two columns of the result. EV(E) is defined as:
$EV(E) = \{s \mid \exists r\, (r \in EV(R) \land s[i] = r[i] \text{ for } 1 \leq i \leq k-1 \land s[i] = r[i+1] \text{ for } k \leq i \leq n-1 \land s[n] = r[k].T \land s[n+1] = r[k].v)\}$
e. Temporal Atom Formation ($\tau$)
$E = \tau_{k,p}(R)$. Let R be a temporal relation whose degree is n, and r[k] and r[p] be a temporal set and an atom, respectively. This operation combines the k-th and p-th attributes of R into a new column in E whose values are temporal atoms, thus reducing the degree of R by one. The new attribute is made into the last column of E. The evaluation of E is defined as:
$EV(E) = \{s \mid \exists r\, (r \in EV(R) \land s[i] = r[j] \text{ for } 1 \leq i \leq n-2,\ 1 \leq j \leq n,\ j \neq k,\ j \neq p \land s[n-1].T = r[k] \land s[n-1].v = r[p] \land s[n-1].T \neq \emptyset)\}$

These operations are the elementary operations of TRA. As in the case of traditional relational algebra, other operations (such as intersection, join, division, etc.) can be derived from these basic operations. The derivations are also similar, and we do not include them here. The slice operation (defined below) synchronizes the times of two different attributes.
f. Slice
$E = slice_{\theta,k,p}(R)$. Let R be a temporal relation whose degree is n, r[k] and r[p] be temporal atoms, and $\theta$ be one of the symbols for the set-theoretic operations $\cap$, $\cup$, and $-$:
$EV(E) = \{s \mid \exists r\, (r \in EV(R) \land s[i] = r[i] \text{ for } 1 \leq i \leq n,\ i \neq k \land s[k].T = x \land s[k].v = r[k].v \land x = r[k].T\ \theta\ r[p].T \land x \neq \emptyset)\}$
The slice operation replaces the temporal set part of the k-th attribute in R by a new temporal set which is obtained by forming the union, intersection, or difference of the temporal sets of the k-th and p-th columns.
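Decomposition, formation, and slice can be sketched in Python in the same style. The sketch assumes that a temporal atom is a pair of a frozenset and a value, uses 0-based column positions, and passes the set operation as a function from the standard `operator` module; the function names `decompose`, `form`, and `slice_` are hypothetical.

```python
import operator


def iv(start, end):
    """Time points of the half-open interval [start, end)."""
    return frozenset(range(start, end))


def decompose(relation, k):
    """Drop temporal-atom column k and append its temporal set and value as last columns."""
    return {r[:k] + r[k + 1:] + (r[k][0], r[k][1]) for r in relation}


def form(relation, k, p):
    """Combine temporal-set column k and atom column p into a temporal atom, placed last."""
    result = set()
    for r in relation:
        if r[k]:  # a temporal atom may not carry an empty temporal set
            rest = tuple(c for i, c in enumerate(r) if i not in (k, p))
            result.add(rest + ((r[k], r[p]),))
    return result


def slice_(relation, theta, k, p):
    """Replace the temporal set of column k by (column k) theta (column p)."""
    result = set()
    for r in relation:
        x = theta(r[k][0], r[p][0])
        if x:  # tuples whose new temporal set would be empty are dropped
            result.add(r[:k] + ((x, r[k][1]),) + r[k + 1:])
    return result


row = (121, (iv(14, 18), "Mktg"), (iv(15, 17), "25K"))
synchronized = slice_({row}, operator.and_, 1, 2)
# the department atom is cut down to [15, 17), the period it shares with the 25K salary
```

Passing `operator.or_` or `operator.sub` instead of `operator.and_` gives the union and difference variants of the operation.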
g. Restricted Power Set Operation
We define $P_k(E)$, k > 0, as the set of those subsets, with k or fewer elements, of the nested temporal relation instance created by the temporal relational algebra expression E [9]. This is called the k-restricted power set operation and is used for generating the domain of interpretation which is needed in translating a TRC expression into an equivalent TRA expression.

Lemma 1. $P_k(E)$ is expressible in the temporal relational algebra.
Proof. $P_k(E) = \pi_{kn+1}\,\nu_Y\,\sigma_F(E^{k+1}) \cup \emptyset$, where E has degree n, $Y = \{nk+1, \ldots, nk+n\}$, and $E^{k+1}$ is the (k+1)-fold Cartesian product of E, whose degree is nk+n and which contains k+1 copies of EV(E). F is the disjunction of the formulas which equate the (k+1)-st copy with each of the first k copies. In other words, $F = F_1 \lor \ldots \lor F_k$ where $F_i$ states that the (k+1)-st copy is equal to the i-th copy for $1 \leq i \leq k$. $F_i$ is the conjunction of the formulas $F_{i1} \land \ldots \land F_{in}$ where $F_{ij}$ is $((i-1)n+j) = (kn+j)$, $1 \leq j \leq n$. That is, $F_{ij}$ ensures that the j-th attribute in the i-th copy is equal to the j-th attribute in the (k+1)-st copy. Then, the nest operation groups the subsets with k or fewer elements into the last column, i.e., kn+1. Projection discards the first k copies and gives the result.

Relational algebra cannot generate the power set of a temporal relation independent of its instance [9]. $P_k(E)$ cannot create the full power set of E because, depending on the cardinality of E, $P_k(E)$ constructs a different TRA expression. Moreover, the cardinality of E is not known in advance. On the other hand, temporal relational algebra operations are defined to work for any instances of their operands. The following algorithm is based on the fact that $P_k(E)$ calculates the same set, i.e., the power set of E, for any k larger than the cardinality of E.

Algorithm (the closure operator):
  Let E be a TRA expression
  k = 0
  loop
    k = k + 1
    Compute P_k(E)
    Compute P_{k+1}(E)
  until P_k(E) = P_{k+1}(E)

Lemma 2. Algorithm (the closure operator) calculates the power set of E.
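The closure operator can be tried out on a small instance with the following Python sketch. It is only an illustration: $P_k(E)$ is computed directly with `itertools.combinations` rather than through the algebraic construction of Lemma 1, and the function names are hypothetical.

```python
from itertools import combinations


def restricted_power_set(E, k):
    """P_k(E): all subsets of E with k or fewer elements, the empty set included."""
    items = list(E)
    return {frozenset(c)
            for size in range(0, k + 1)
            for c in combinations(items, size)}


def closure_power_set(E):
    """Increase k until P_k(E) and P_{k+1}(E) coincide; return that set and the final k."""
    k = 0
    while True:
        k += 1
        current = restricted_power_set(E, k)
        following = restricted_power_set(E, k + 1)
        if current == following:
            return current, k


subsets, k = closure_power_set({"Tom", "Ann", "John"})
print(len(subsets), k)  # 8 3
```

The loop never inspects the cardinality of E; it stops only because raising k no longer produces new subsets, which is the point of Lemma 2.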
Proof. The algorithm calculates $P_1(E)$, $P_2(E)$, ... Eventually, k reaches the cardinality of E, say m. $P_m(E)$ and $P_{m+1}(E)$ are the same because the subsets of E with m+1 or fewer elements are the same as the subsets of E with m or fewer elements. Thus, the algorithm generates the power set of E and halts.

Theorem 1. The temporal relational algebra (with the closure operator) and the temporal relational calculus are equivalent in expressive power.
Proof. The proof can be found in [17].

6
Common Temporal Operations
Various algebraic languages have been defined for manipulating temporal data [11], and they share a number of common temporal operations. These operations appear in different forms and flavors: the temporal (keyed) set-theoretic operations (set union, set intersection, and set difference), selection and/or projection on the temporal dimension, restructuring, etc. In this section, we show how these operations can be expressed in the temporal relational calculus and the temporal relational algebra.
a. Temporal(Keyed)-Union ($P_1 \cup_t P_2$). If $P_1$ and $P_2$ are two predicate names over the same relation scheme R with the same key, then $P_1 \cup_t P_2$ is the union of $P_1$ and $P_2$ over R with the same key [1, 3, 13, 14]. Note that the value of the key does not change over time. The union operation is valid when none of the nonkey attributes of two tuples in $P_1$ and $P_2$ with the same key have different values at the same instant of time. Tuples of $P_1$ and $P_2$ agreeing on the key attribute(s) are collapsed into one single tuple. We consider the relation R(A,B) where A is the key. The temporal union operation can be expressed in TRC as follows:

$$\{x \mid (\exists y)(\exists z)\,(P_1(y) \wedge P_2(z) \wedge ((y[A] \neq z[A] \wedge ((x[A]=y[A] \wedge x[B]=y[B]) \vee (x[A]=z[A] \wedge x[B]=z[B]))) \vee (y[A]=z[A] \wedge x[A]=y[A] \wedge x[B]=\{p \mid (\exists q)(\exists r)\,(q \in y[B] \wedge r \in z[B] \wedge (\exists t)\,(t \in q.T \wedge t \in r.T \wedge q.v \neq r.v) \wedge ((q.v=r.v \wedge p.v=q.v \wedge (\exists t_1)\,(t_1 \in q.T \wedge t_1 \in r.T \wedge t_1 \in p.T \wedge \neg(\exists t_2)\,(t_2 \neq t_1 \wedge t_2 \in p.T))) \vee (q.v \neq r.v \wedge (p=q \vee p=r))))\})))\}.$$

This expression can easily be generalized to any relation scheme R with more attributes. Temporal-union can also be expressed in TRA:

$$P_1 \cup_t P_2 = \nu_A(\tau_{A.T,A.v}((\mu_{A.T}(\delta_A(\mu_A(P_1)))) \cup (\mu_{A.T}(\delta_A(\mu_A(P_2)))))).$$

b. Temporal(Keyed)-Intersection ($P_1 \cap_t P_2$) [1, 3, 13, 14]. The intersection operation on two predicates $P_1$ and $P_2$ defined over the same relation scheme R results in a relation containing tuples in $P_1$ such that if there are two tuples in $P_1$ and $P_2$ agreeing on the key attributes, then those instants where these two tuples agree on all attributes are placed into the result. Let R(A,B) be the relation scheme with attribute A as its key. The temporal intersection operation can be expressed in TRC as follows:

$$\{x \mid (\exists y)\,(P_1(y) \wedge (\exists z)\,(P_2(z) \wedge y[A]=z[A] \wedge x[A]=y[A] \wedge x[B]=\{p \mid (\exists q)\,(q \in y[B] \wedge ((\exists r)\,(r \in z[B] \wedge q.v=r.v \wedge p.v=q.v \wedge (\exists t)\,(t \in p.T \wedge t \in q.T \wedge t \in r.T \wedge \neg(\exists t_1)\,(t \neq t_1 \wedge t_1 \in p.T)))))\}))\}.$$

Temporal-intersection can be expressed in the temporal relational algebra:

$$P_1 \cap_t P_2 = \nu_A(\tau_{A.T,A.v}((\mu_{A.T}(\delta_A(\mu_A(P_1)))) \cap (\mu_{A.T}(\delta_A(\mu_A(P_2)))))).$$
c. Temporal(Keyed)-Difference ($P_1 -_t P_2$). If $P_1$ and $P_2$ are two predicates on the relation scheme R, $P_1 -_t P_2$ results in a relation over scheme R with the same key [1, 3, 13, 14]. The resulting relation contains tuples in $P_1$ such that if there are two tuples in $P_1$ and $P_2$ agreeing on the key attributes, then those instants where these two tuples agree on all attributes are removed from the domain of the tuple in $P_1$. If a tuple of $P_1$ does not agree with any tuple of $P_2$ on the key attributes, this tuple is placed into the result. Consider the relation scheme R(A,B) where attribute A is its key. The temporal difference operation can be expressed in TRC as follows:

$$\{x \mid (\exists y)\,(P_1(y) \wedge ((\neg(\exists z)\,(P_2(z) \wedge y[A]=z[A]) \wedge x[A]=y[A] \wedge x[B]=y[B]) \vee (\exists z)\,(P_2(z) \wedge y[A]=z[A] \wedge x[A]=y[A] \wedge x[B]=\{p \mid (\exists q)\,(q \in y[B] \wedge ((\exists r)\,(r \in z[B] \wedge q.v=r.v \wedge p.v=q.v \wedge (\exists t)\,(t \in p.T \wedge t \in q.T \wedge \neg(t \in r.T) \wedge \neg(\exists t_1)\,(t \neq t_1 \wedge t_1 \in p.T))) \vee \neg(\exists r)\,(r \in z[B] \wedge q.v=r.v \wedge p=q)))\})))\}.$$

Temporal-difference can be expressed in TRA:

$$P_1 -_t P_2 = \nu_A(\tau_{A.T,A.v}((\mu_{A.T}(\delta_A(\mu_A(P_1)))) - (\mu_{A.T}(\delta_A(\mu_A(P_2)))))).$$

d. Temporal Projection and Selection ($\sigma_{\phi,\psi}(P)$). Given a predicate P and two TRC formulas $\phi$ and $\psi$ on time variable t, this operation selects those tuples of P satisfying the expression $\phi$ and restricts their temporal domain to $\psi$ [1]. This operation appears in some form in all the temporal languages proposed. The operation is generalized in [1]. In order to express this operation in TRC, we first determine the tuples satisfying $\phi$ and then restrict the temporal domain of these tuples to $\psi$. Each attribute of P is restricted by $\psi$ individually; the restricted attributes are afterwards combined by forming a side-by-side copy of P to regain the original structure. In the following, for two tuple variables x and y we use x = y as a shorthand for $x[A_1]=y[A_1] \wedge \ldots \wedge x[A_n]=y[A_n]$.

- $\lambda_1(x) \equiv P(x) \wedge (\exists y)\,(P(y) \wedge x = y \wedge \phi)$, where $\phi$ is a formula on tuple variable y,
- $\lambda_2(x) \equiv (\exists y)\,(\lambda_1(y) \wedge x[A_1]=y[A_1] \wedge \ldots \wedge x[A_n]=y[A_n] \wedge x[A_{n+1}]=y[A_1] \wedge \ldots \wedge x[A_{n+n}]=y[A_n])$,
- $\lambda_{A_1}(x) \equiv (\exists y)\,(\lambda_2(y) \wedge x[A_1]=y[A_1] \wedge \ldots \wedge x[A_n]=y[A_n] \wedge (\exists z)\,(z \in y[A_{n+1}] \wedge x[A_{n+1}].v=z.v \wedge (\exists u)\,(x[A_{n+1}].T=u \wedge u=\{t \mid \psi \wedge t \in x[A_{n+1}].T \wedge t \in z.T\})))$,
- $\lambda'_{A_1}(x) \equiv (\exists y)\,(\lambda_{A_1}(y) \wedge x[A_1]=y[A_1] \wedge \ldots \wedge x[A_n]=y[A_n] \wedge x[A_{n+1}]=\{z[A_{n+1}] \mid \lambda_{A_1}(z) \wedge z[A_1]=y[A_1] \wedge \ldots \wedge z[A_n]=y[A_n]\})$.

The last two steps should be repeated for the remaining attributes $A_i$, i.e., $\lambda_{A_i}$ and $\lambda'_{A_i}$ for 2
$$\{x^{(n+m)} \mid P_1(y) \wedge P_2(z) \wedge x[1]=y[A_1] \wedge \ldots \wedge x[n]=y[A_n] \wedge x[n+1]=z[B_1] \wedge \ldots \wedge x[n+m]=z[B_m]\}.$$
This is a generalized join. It is possible to define a temporal (intersection) join where a resulting tuple is defined over the common period of time of the constituent tuples. The TRC expression is similar to that of the temporal-intersection operation.

f. Restructuring [1]. Given a predicate P defined over the relation scheme R and an attribute A of R, the keying (restructuring) operation results in a relation which is weakly equal to P and has the key A. This operation can be done in TRC and TRA. For simplicity, we illustrate the procedure for the relation R(A,B) where attribute A is the key and B is a candidate key. We restructure R into R'(A,B), where the data of R is regrouped with respect to the attribute B. In the sequel, we give a sketch of the procedure to obtain R', indicating the relevant TRA operations in parentheses. Firstly, break the set of temporal atoms in attribute B, that is, create a tuple for each temporal atom in B (unnest attribute B). Secondly, trim the time of attribute A by the time of attribute B (by an intersection slice operation). The result is R1.
$$R_1 = \S_{\cap,A,B}\,\mu_A\,\mu_B(R)$$

Thirdly, in column B combine the timestamps of the temporal atoms whose values are the same, giving R2. (This can be done by temporal atom decomposition, unnest, nest, and temporal atom formation operations.)

$$R_2 = \pi_{1,3}(\sigma_{2.v=3.v}(\mu_B(R) \times \tau_{1,2}\,\nu_1\,\mu_1\,\delta_B\,\pi_B(R_1)))$$

Fourthly, for each B value combine the temporal sets of temporal atoms whose value is the same in attribute A. Then, for each B value, group the temporal atoms in attribute A. This is the desired result, R'(A,B), where B is the key.

$$R' = \nu_1\,\tau_{1,2}\,\nu_1\,\mu_1\,\delta_A(R_2)$$
Similarly, TRC operations for restructuring R into R' can be defined. We do not include them here for the sake of brevity.
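The prose definitions of the keyed set operations (items a to c above) can be illustrated outside TRC and TRA with a minimal sketch. The representation is an assumption made for illustration only: a relation over R(A,B) is a dictionary from key value to a dictionary from B value to the set of time points at which that value holds, and a tuple whose temporal domain becomes empty is dropped from the result. The function names are hypothetical.

```python
def t_union(p1, p2):
    """Keyed union: tuples agreeing on the key are collapsed into one tuple."""
    result = {}
    for rel in (p1, p2):
        for key, atoms in rel.items():
            merged = result.setdefault(key, {})
            for value, times in atoms.items():
                merged[value] = merged.get(value, set()) | set(times)
    # The union is valid only if no key has two different values at one instant.
    for key, atoms in result.items():
        seen = set()
        for times in atoms.values():
            if seen & times:
                raise ValueError(f"conflicting values for key {key!r}")
            seen |= times
    return result


def t_intersection(p1, p2):
    """Keep the instants at which tuples with the same key agree on B."""
    result = {}
    for key in p1.keys() & p2.keys():
        atoms = {}
        for value in p1[key].keys() & p2[key].keys():
            common = set(p1[key][value]) & set(p2[key][value])
            if common:
                atoms[value] = common
        if atoms:
            result[key] = atoms
    return result


def t_difference(p1, p2):
    """Remove from p1 the instants at which a same-key tuple of p2 agrees on B."""
    result = {}
    for key, atoms in p1.items():
        other = p2.get(key, {})
        kept = {}
        for value, times in atoms.items():
            remaining = set(times) - set(other.get(value, ()))
            if remaining:
                kept[value] = remaining
        if kept:
            result[key] = kept
    return result


# Illustrative data: key = employee id, B = department, times = integer instants.
p1 = {"e1": {"sales": {1, 2, 3, 4}}, "e2": {"toys": {1, 2}}}
p2 = {"e1": {"sales": {3, 4, 5}}}
print(t_union(p1, p2))
print(t_intersection(p1, p2))
print(t_difference(p1, p2))
```

The point-set representation sidesteps temporal atom formation and nesting, which the TRA expressions handle explicitly with $\tau$ and $\nu$.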
7 Temporal Relational Completeness

In traditional database theory, relational calculus (RC) is used as the standard in evaluating the expressive power of query languages [5]. A language which has the same expressive power as RC is called relationally complete. Similarly, TRC can be used as a yardstick in evaluating the expressive power of temporal query languages. There are several reasons for this. First, TRC is a superset of relational calculus. Additionally, it has set membership test and set constructor formulas. If only 1NF relations are used, these formulas are not needed and thus TRC reduces to the relational calculus. Therefore, TRC subsumes the expressive power of relational calculus. Second, the data model of TRC is very powerful and it meets all of the criteria listed in Section 2.1. It can handle both heterogeneous and homogeneous tuples since the latter is a subclass of the former. We believe that this model provides full representation of reality. Third, TRC can generate both relations in unique representation and weak relations, and allows conversion between these two types of relations. Fourth, TRA, extended with the closure operator, has the same expressive power as TRC. In the following subsections, we give two examples in which the expressive power of a temporal logic based language and of Bhargava and Gadia's algebra are investigated.
7.1 Temporal Logic

A temporal logic based language can be derived from first-order logic by including the temporal operators necessity ($\Box$), possibility ($\Diamond$), next ($\circ$), and their past versions. In this logic, time is not explicitly referenced. Reference to time is embedded in the temporal operators, and there are no temporal constants or temporal variables. Each relation scheme is considered to be a predicate, and its relation instance is provided at each time point, which can be generated by TRC from a temporal relation. A predicate, say $P(A_1, A_2, \ldots, A_n)$ with degree n, can then be represented by P(x) in TRC, where x is a tuple variable. It is also possible to refer to the argument values of a predicate. This can be achieved in TRC by writing a formula, for each argument $A_i$ of P, of the form $u_i.v \wedge u_i \in r[A_i]$ if $r[A_i]$ is a set of temporal atoms, such that $u_i$ is existentially quantified in the whole formula. In the following, definitions of the three temporal operators and the necessary transformations in terms of TRC formulae are given. Given a temporal logic formula $\phi$, we let $\phi'$ be the equivalent formula in TRC.

(i) Necessity ($\Box$). $\Box\phi$ is true now and is always true in the future. That is, it is true at time t if $\phi$ holds at all times $t' \ge t$. Then, $(\Box\phi)' \equiv (\forall t')\,(\neg(t' \ge t) \vee \phi') \wedge t \in T$.

(ii) Possibility ($\Diamond$). $\Diamond\phi$ is true at time t if $\phi$ holds at some time $t' > t$. We then have: $(\Diamond\phi)' \equiv (\exists t')\,(t' > t \wedge \phi') \wedge t \in T$.

(iii) Next ($\circ$). $\circ\phi$ is true at time t if $\phi$ holds at time $t+1$. An equivalent formula for this operation can be written in TRC as follows: $(\circ\phi)' \equiv (\exists t')\,(t' > t \wedge \phi' \wedge \neg(\exists t'')\,(t' > t'' \wedge t'' > t)) \wedge t \in T$.

Past versions of these operators, $\blacksquare$ (necessity), $\blacklozenge$ (possibility), and $\bullet$ (previous), can similarly be defined [19].

Proposition 7. TRC is at least as expressive as temporal logic.
Proof. A direct consequence of the preceding conversions. A temporal logic based language does not satisfy requirements 3, 4, and 6, and hence it is less expressive than TRC.

7.2 Bhargava and Gadia's Algebra
Bhargava and Gadia define a temporal algebra [1] and classify algebraic expressions into temporal expressions, Boolean expressions, and relational expressions. We will define expressions in each category and then show that they are expressible by TRC formulae.

(i) Temporal expressions. Basic temporal expressions in this algebra are of the form $[\![A]\!]$, $[\![r]\!]$, $[\![A \theta B]\!]$, and $[\![A \theta c]\!]$. The notation $[\![\ldots]\!]$ represents the temporal domain of the object specified. More complex expressions can be formed by using the union ($\cup$), intersection ($\cap$), and difference ($-$) operations. Note that incorporating the converted subformula into the larger TRC expression may sometimes require quantification over the free variables. Let x be a tuple variable.

- Temporal elements are also temporal expressions. The TRC equivalent of a constant temporal element $\nu$, $[l_1,u_1) \cup \ldots \cup [l_n,u_n)$, is: $((t \ge l_1 \wedge t < u_1) \vee \ldots \vee (t \ge l_n \wedge t < u_n)) \wedge t \in T$,
- $[\![A]\!] \equiv (\exists u)\,(t \in u.T \wedge u \in x[A])$,
- $[\![A \theta B]\!] \equiv (\exists u)(\exists z)\,(t \in u.T \wedge t \in z.T \wedge u.v \;\theta\; z.v \wedge u \in x[A] \wedge z \in x[B])$,
- $[\![A \theta c]\!] \equiv (\exists u)\,(t \in u.T \wedge u.v \;\theta\; c \wedge u \in x[A])$,
- $[\![r]\!] \equiv (\exists x)(\exists u)\,(P(x) \wedge t \in u.T \wedge u \in x[A])$, where A is an attribute of R and P is a TRC predicate name representing r.
Let $\mu_1$ and $\mu_2$ be temporal expressions in Bhargava and Gadia's algebra and let $\lambda_1(t_1)$ and $\lambda_2(t_2)$ be the equivalent TRC expressions for $\mu_1$ and $\mu_2$, respectively. Then,

- $\mu_1 \cup \mu_2 \equiv (\exists t_1)(\exists t_2)\,(\lambda_1(t_1) \wedge \lambda_2(t_2) \wedge (t=t_1 \vee t=t_2))$,
- $\mu_1 \cap \mu_2 \equiv (\exists t_1)(\exists t_2)\,(\lambda_1(t_1) \wedge \lambda_2(t_2) \wedge t=t_1 \wedge t=t_2)$,
- $\mu_1 - \mu_2 \equiv (\exists t_1)(\exists t_2)\,(\lambda_1(t_1) \wedge \lambda_2(t_2) \wedge t=t_1 \wedge t \neq t_2)$.

(ii) Boolean expressions. TRUE, FALSE, $A \theta B$, and $\mu \subseteq \nu$, where $\mu$ and $\nu$ are temporal expressions, are the basic Boolean expressions in their algebra. More complex Boolean expressions can be obtained by using the logical connectives $\wedge$ and $\vee$, and the negation $\neg$. TRUE and FALSE can be easily represented in TRC. The Boolean expressions of the form $A \theta B$ are expressible by TRC as shown in the $\theta$-comparison above. Logical connectives of Bhargava and Gadia's algebra directly correspond to the logical connectives of TRC. As for the expression $\mu \subseteq \nu$, it is expressible in TRC (Proposition 4.2).

(iii) Relational expressions. The restructuring operator, union, difference, projection, Cartesian product, and renaming constitute the relational expressions of this algebra. We have shown in Section 6 how these operations can be expressed in TRC. It is straightforward to do renaming in TRC.

Proposition 8. TRC is at least as expressive as Bhargava and Gadia's algebra.

Proof. Direct result of the previous conversions. Their algebra meets requirements 1, 2, 5, 6, and 7. It retrieves relations in unique representation from one relation. It may return weak relations when tuple components from several relations appear in the target specification. The result can be converted to unique representation by the restructuring operation. Thus, TRC is more expressive than Bhargava and Gadia's algebra.
8 Conclusions

In this paper, we have introduced an extension to the relational data model to handle temporal data. This extension is based on nested relations with one level of nesting and attribute timestamping. Algebra and calculus languages having the same expressive power are also defined. These languages are capable of expressing other temporal relational query languages, and therefore they can be used in evaluating the expressive power of temporal query languages. As an example, we have evaluated the expressive power of two languages: a temporal logic based language and Bhargava and Gadia's algebra. The algebra and calculus languages can also be used in evaluating the expressive power of temporal extensions to SQL.

The model we propose and its query languages satisfy the requirements listed in Section 2.1. It is obvious that our model meets the first four requirements. For requirement 5, if weak relations are formed, TRC and TRA can convert them into unique representation. In TRA, this can be done by a series of unnest/nest operations. Details can be found in [17]. Section 6 shows how restructuring can be done in TRA. Finally, it is obvious that set operations and comparisons are allowed in our model. Naturally, temporal relations in our model are more complicated than their temporal 1NF counterparts. However, they are also more powerful in modeling and querying temporal data.
Acknowledgment

The research of Abdullah Uz Tansel has been supported in part by PSC-CUNY Award No. 665307.
References

1. Bhargava, G., Gadia, S.K.: Relational database systems with zero information loss. IEEE Transactions on Knowledge and Data Engineering 5 (1993) 76-87
2. Clifford, J., Croker, A., Tuzhilin, A.: On completeness of historical data models. ACM Transactions on Database Systems 19 (1994) 64-116
3. Clifford, J., Croker, A.: The historical relational data model (HRDM) and algebra based on lifespans. Proceedings of the 3rd International Conference on Data Engineering, Los Angeles, California (1987) 528-537
4. Clifford, J., Tansel, A.U.: On an algebra for historical relational databases: Two views. Proceedings of ACM SIGMOD International Conference on Management of Data (1985) 247-265
5. Codd, E.F.: Relational completeness of relational data base sublanguages. In: Rustin, R., Data Base Systems, Prentice-Hall (1972)
6. Gadia, S.K.: Toward completeness of temporal databases. Technical Report, Electrical Engineering and Computer Science Department, Texas Technical University, Lubbock (1986)
7. Gadia, S.K.: A homogeneous relational model and query languages for temporal databases. ACM Transactions on Database Systems 13 (1988) 418-448
8. Gadia, S.K., Yeung, C-S.: Inadequacy of interval timestamps in temporal databases. Information Sciences 54 (1991) 1-22
9. Garnett, L., Tansel, A.U.: Equivalence of the relational algebra and calculus languages for nested relations. Mathematics and Computers with Applications 23 (1991) 3-25
10. Lorentzos, N.A., Johnson, R.G.: Extending relational algebra to manipulate temporal data. Information Systems 13 (1988) 289-296
11. McKenzie, E., Snodgrass, R.: An evaluation of relational algebras incorporating the time dimension in databases. ACM Computing Surveys 23 (1991) 501-543
12. Navathe, S.B., Ahmed, R.: TSQL-A language interface for history databases. Proceedings of the Conference on Temporal Aspects in Information Systems (1987) 113-128
13. Sarda, N.L.: Extensions to SQL for historical databases. IEEE Transactions on Knowledge and Data Engineering 2 (1990) 220-230
14. Snodgrass, R.: The temporal query language TQuel. ACM Transactions on Database Systems 12 (1987) 247-298
15. Tansel, A.U.: Adding time dimension to relational model and extending relational algebra. Information Systems 11 (1986) 343-355
16. Tansel, A.U.: A Generalized Framework for Modeling Temporal Data. In: Tansel, A.U., et al., Temporal Databases: Theory, Design and Implementation, Benjamin/Cummings (1993) 183-201
17. Tansel, A.U.: Temporal relational data model. IEEE Transactions on Knowledge and Data Engineering 9 (1997) 464-479
18. Tansel, A.U., et al.: Temporal Databases: Theory, Design and Implementation. Benjamin/Cummings (1993)
19. Tansel, A.U., Tin, E.: Expressive power of temporal relational query languages. Technical Report, Bernard M. Baruch College, Department of Computer Information Systems, City University of New York, New York (1993)
20. Tansel, A.U., Tin, E.: The expressive power of temporal relational query languages. IEEE Transactions on Knowledge and Data Engineering 9 (1997) 120-134
Transitioning Temporal Support in TSQL2 to SQL3

Richard T. Snodgrass 1, Michael H. Böhlen 2, Christian S. Jensen 2, and Andreas Steiner 3

1 Department of Computer Science, University of Arizona, Tucson, AZ 85721, USA, rts@cs.arizona.edu
2 Department of Mathematics and Computer Science, Aalborg University, Fredrik Bajers Vej 7E, DK-9220 Aalborg Ø, Denmark, {boehlen,csj}@
3 Institut für Informationssysteme, ETH Zentrum, CH-8092 Zürich, Switzerland, steiner@inf.ethz.ch

Abstract. This document summarizes the proposals before the SQL3 committees to allow the addition of tables with valid-time and transaction-time support into SQL/Temporal, and explains how to use these facilities to migrate smoothly from a conventional relational system to one encompassing temporal support. Initially, important requirements to a temporal system that may facilitate such a transition are motivated and discussed. The proposal then describes the language additions necessary to add valid-time support to SQL3 while fulfilling these requirements. The constructs of the language are divided into four levels, with each level adding increased temporal functionality to its predecessor. A prototype system implementing these constructs on top of a conventional DBMS is publicly available.
O. Etzion, S. Jajodia, and S. Sripada (Eds.): Temporal Databases - Research and Practice, LNCS 1399, pp. 150-194, 1998. © Springer-Verlag Berlin Heidelberg 1998

1 Introduction

We introduce constructs that have been submitted to the ISO SQL3 committee as change proposals to SQL/Temporal [8] to add valid-time and transaction-time support to SQL3 [14, 15]. These constructs are variants of those first defined in TSQL2 [13].

While temporal database research has a long history (cf. [17]), momentum for a language designed with input from a substantial part of the community first arose at a 1993 temporal infrastructure workshop [9]. The TSQL2 committee was subsequently formed in July, 1993 in response to a general invitation sent to the community. This committee consisted of Richard T. Snodgrass, Ilsoo Ahn, Gad Ariav, Don Batory, James Clifford, Curtis E. Dyreson, Christian S. Jensen, Ramez Elmasri, Fabio Grandi, Wolfgang Käfer, Nick Kline, Krishna Kulkarni, Ting Y. Cliff Leung, Nikos Lorentzos, John F. Roddick, Arie Segev, Michael D. Soo, and Suryanarayana M. Sripada. The committee produced a preliminary language specification the following January, which appeared in the ACM SIGMOD Record [10]. Based on responses to that specification, changes were made to the language, and the final language specification and 28 commentaries were made available via anonymous FTP in early October, 1994. A book describing the language and examining in detail the design decisions [13] was released at the VLDB International Workshop on Temporal Databases in September, 1995.

Richard Snodgrass started working with the ANSI and ISO SQL3 committees in late 1994. The first step was to propose a new part to SQL3, termed SQL/Temporal [12]. This was formally approved in July, 1995. Jim Melton agreed to edit this new part. Discussions then commenced on adding valid-time support to SQL/Temporal. While the ANSI committee was supportive of the overall approach, there were several concerns voiced about the TSQL2 design. The major objections were as follows.

1. Rows in TSQL2 are timestamped with temporal elements [4, 7], which are sets of periods, each of which extends from a starting instant to an ending instant. Temporal elements are not bounded in size, which means that all timestamped rows will also be unbounded in size.
2. Duplicates are not supported: TSQL2 disallows value-equivalent rows, and temporal element timestamps, being sets, also do not permit duplicates. The analogy is with the relational algebra, which is also based on sets, and hence does not accommodate duplicates.
3. A table with temporal support is returned with a conventional SELECT statement. To get a table without temporal support, the SNAPSHOT keyword is required. The committee felt that a conventional query should return a table without temporal support.
4. There was no formal semantics for TSQL2.
5. There existed no implementation of the proposed constructs.
6. The keywords VALID and TRANSACTION were judged to be too generic.

After many discussions with the committee and with others, the following solutions were agreed upon.
This process took well over a year to complete. These modifications are reasonable, as the TSQL2 design and the change proposals had differing objectives.

1. Rows would be timestamped with periods rather than temporal elements. This enabled timestamps to be bounded in size.
2. Value-equivalent rows would be permitted, so that duplicates could be accommodated.
3. SNAPSHOT was discarded. A conventional query returns a table with no temporal support (this was later generalized to the highly desirable property of temporal upward compatibility [1]). The VALID clause was moved to before the SELECT and later generalized to support sequenced queries (which were developed as part of the ATSQL design [3]).
4. A formal semantics for the language was developed [3].
5. Michael Böhlen and Andreas Steiner produced a public domain prototype implementation. Andreas has continued to evolve this prototype to be consistent with the change proposals.
6. The keywords were changed to VALIDTIME and TRANSACTIONTIME.

Many other smaller changes were made to the language proposals and to the wording of the change proposals to address concerns of the committee members. The full story, including the change proposals themselves, can be found at FTP.cs.arizona.edu/tsql/tsql2/sql3. The change proposals have been unanimously approved by the ANSI SQL3 committee (ANSI X3H2) and are under consideration by the ISO SQL3 committee (ISO/IEC JTC 1/SC 21/WG 3 DBL).

In this paper, we first outline a four-level approach for the integration of time. The language extensions are fairly minimal. Each level is described via a quick tour consisting of a set of examples. These examples have been tested in a prototype which is publicly available [16]. We examine valid-time support first, then consider transaction-time and bitemporal support.

2 The Problem
Most databases store time-varying information. For such databases, SQL is often the language of choice for developing applications that utilize the information in these databases. However, users also realize that SQL does not provide adequate support for temporal applications. To illustrate this, the reader is invited to attempt to formulate the following straightforward, realistic statements in SQL3. An intermediate SQL programmer can express all of them in SQL for a non-time-varying database in perhaps five minutes. However, even SQL experts find these same queries challenging to do in several hours when time-varying data is taken into account.

An employee table has five columns: name, eno, street, city, and birthdate. The related salary table has two columns: eno and amount (as a monthly salary). We then store historical information in both tables by adding a column, When, of data type PERIOD.

- Column salary.eno is a foreign key for employee.eno. This means that at each point in time, the integer value in the salary.eno column also occurs in the eno column of employee at the same time. This cannot be expressed via SQL's foreign key constraint, which does not take time into account. The reader is invited to attempt to formulate this constraint instead as an assertion.
- Consider the query "List those employees who have no salary." This can easily be expressed in SQL, using EXCEPT or NOT EXISTS, on the original table. Things are just a little harder with the When column; a WHERE predicate is required to extract the current employees and current salaries. Now formulate the query "List those employees who have no salary, and indicate when." EXCEPT and NOT EXISTS will not work, because they do not consider time. This simple temporal query is challenging even to SQL experts.
- Consider the query "Give the number of employees making over $5,000 in each city." Again, this is a simple query in SQL on the original table. We invite the reader to formulate the query "Give the history of the number of employees making over $5,000 in each city" on the table with the When column. This query is extremely difficult without temporal support in the language. One approach is to expand each row in both tables into all the days that it was valid, then count up the employees for each day. However, we would like a solution that does not force such an expansion and that uses the periods directly, as that approach is likely to be more efficient than a "point-based" expansion would be.
- Now formulate the modification "Give Therese a salary of $6,000 for 1994." This modification is difficult in SQL because only a portion of many validity periods needs to be changed, with the information outside of 1994 retained.
Most users know only too well that while SQL is an extremely powerful language for writing queries on the current state, the language provides much less help when writing temporal queries, modifications, and constraints.
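To make the difficulty concrete, the query "List those employees who have no salary, and indicate when" from the list above can be sketched procedurally. This is an illustration outside SQL, under assumptions that are not in the text: rows are plain tuples, the employee table is reduced to (name, eno, When), periods are half-open pairs of integers, and the helper names are hypothetical.

```python
def subtract_periods(period, cuts):
    """Return the parts of the half-open `period` not covered by any period in `cuts`."""
    start, end = period
    remaining = []
    for cut_start, cut_end in sorted(cuts):
        if cut_end <= start or cut_start >= end:
            continue  # this cut does not overlap what is left of the period
        if cut_start > start:
            remaining.append((start, cut_start))
        start = max(start, cut_end)
        if start >= end:
            break
    if start < end:
        remaining.append((start, end))
    return remaining


def employees_without_salary(employee, salary):
    """For each employee row, report the sub-periods during which no salary row is valid."""
    result = []
    for name, eno, when in employee:
        paid = [s_when for s_eno, _amount, s_when in salary if s_eno == eno]
        for gap in subtract_periods(when, paid):
            result.append((name, gap))
    return result


# Illustrative data: (name, eno, When) and (eno, amount, When), with integer day numbers.
employee = [("Therese", 1, (0, 100)), ("Franziska", 2, (10, 50))]
salary = [(1, 5000, (0, 40)), (1, 6000, (60, 100)), (2, 4000, (10, 50))]
print(employees_without_salary(employee, salary))
```

The result carries a period with every row, which is exactly the timestamp computation that a plain EXCEPT or NOT EXISTS formulation cannot express.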
3 Outline of the Solution

The problem with formulating these SQL statements is due in large part to the extreme difficulty of specifying in SQL the correct values of the timestamp column(s) of the result. The solution is to allow the DBMS to compute these values, moving the complexity from the application code into the DBMS. With the language extensions proposed here, the above queries can all be easily written by an intermediate SQL programmer in a few minutes. We provide these SQL statements here; the language constructs will be explained and exemplified in detail in the remainder of the paper.

Both tables with valid-time support and temporal referential integrity can be specified using the VALIDTIME reserved word.

    CREATE TABLE employee(ename VARCHAR(12), eno INTEGER VALIDTIME PRIMARY KEY,
                          street VARCHAR(22), city VARCHAR(10), birthday DATE)
        AS VALIDTIME PERIOD(DATE)

    CREATE TABLE salary(eno INTEGER VALIDTIME PRIMARY KEY
                            VALIDTIME REFERENCES employee,
                        amount INTEGER)
        AS VALIDTIME PERIOD(DATE)

Here we indicate that the table has valid-time support through "AS VALIDTIME PERIOD(DATE)" and that the integrity constraints (primary key, referential integrity) are to hold for each point in time (day) through "VALIDTIME PRIMARY KEY" and "VALIDTIME REFERENCES."
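The meaning of a referential integrity constraint that holds "for each point in time (day)" can be spelled out with a point-based check. The sketch below is an illustration only and says nothing about how a DBMS would enforce VALIDTIME REFERENCES. It assumes simplified rows of the form (eno, period) with half-open integer periods, and the function name is hypothetical.

```python
def referential_violations(employee, salary):
    """Return (eno, day) pairs where a salary row is valid but no employee row with that eno is."""
    violations = []
    for eno, (start, end) in salary:
        for day in range(start, end):
            covered = any(
                e_eno == eno and e_start <= day < e_end
                for e_eno, (e_start, e_end) in employee
            )
            if not covered:
                violations.append((eno, day))
    return violations


# Illustrative data: employee 2 exists on days 5..7, but has a salary from day 4.
employee = [(1, (0, 10)), (2, (5, 8))]
salary = [(1, (0, 10)), (2, (4, 8))]
print(referential_violations(employee, salary))
```

A conventional foreign key would accept this data, since eno 2 occurs in both tables; the violation only appears once each day is checked separately.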
For the query "List those employees who have no salary," we are interested only in the current employees. We use temporal upward compatibility to extract this information from the historical information stored in the employee table.

    SELECT ename
    FROM employee
    WHERE eno NOT IN (SELECT eno FROM salary)

This results in a conventional table, with one column. We use sequenced valid semantics in the query "List those employees who had no salary, and when."

    VALIDTIME SELECT ename
    FROM employee
    WHERE eno NOT IN (SELECT eno FROM salary)

The added "VALIDTIME" reserved word specifies that the query is to be evaluated at each point in time. At some times, an employee may not have a salary, whereas at other times, the employee may have a salary. A one-column table results, but now with valid-time support (i.e., the periods of time when each employee did not have a salary are included).

The query "Give the number of highly paid employees in each city" is easy, given temporal upward compatibility.

    SELECT city, COUNT(*)
    FROM employee, salary
    WHERE employee.eno = salary.eno AND amount > 5000
    GROUP BY city

Again, we just get the current count for each city, i.e., the number of highly paid employees now in each city. To extract "the history of the number of highly-paid employees in each city," only a simple change is required.

    VALIDTIME SELECT city, COUNT(*)
    FROM employee, salary
    WHERE employee.eno = salary.eno AND amount > 5000
    GROUP BY city

For each city, a time-varying count will be returned.

Modifications work in similar ways. The modification "Give Therese a salary of $6,000 for 1994" can be expressed by following VALIDTIME with a period expression.

    VALIDTIME PERIOD '1994-01-01 - 1994-12-31'
    UPDATE salary
    SET amount = 6000
    WHERE eno IN (SELECT eno FROM employee WHERE ename = 'Therese')
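What the DBMS has to do for such a period-restricted update can be sketched as period splitting. The code below is an assumption-laden illustration, not the semantics defined by the change proposals: salary rows are tuples (eno, amount, period), periods are half-open pairs of `datetime.date` values (so 1994 is written as 1994-01-01 up to, but excluding, 1995-01-01), the result is not coalesced, and `sequenced_update` is a hypothetical name.

```python
from datetime import date


def sequenced_update(rows, eno, new_amount, period):
    """Set the amount to new_amount for `eno` inside `period`, keeping values outside it."""
    lo, hi = period
    updated = []
    for row_eno, amount, (start, end) in rows:
        if row_eno != eno or end <= lo or start >= hi:
            updated.append((row_eno, amount, (start, end)))  # untouched row
            continue
        if start < lo:
            updated.append((row_eno, amount, (start, lo)))  # part before the period
        updated.append((row_eno, new_amount, (max(start, lo), min(end, hi))))
        if end > hi:
            updated.append((row_eno, amount, (hi, end)))  # part after the period
    return updated


# Illustrative data: employee 7 stands in for Therese.
salary = [
    (7, 5000, (date(1993, 6, 1), date(1994, 7, 1))),
    (7, 5500, (date(1994, 7, 1), date(1996, 1, 1))),
    (8, 4000, (date(1994, 1, 1), date(1995, 1, 1))),
]
year_1994 = (date(1994, 1, 1), date(1995, 1, 1))
for row in sequenced_update(salary, 7, 6000, year_1994):
    print(row)
```

Two stored rows overlap 1994 only partially, so each is split, and the portions outside 1994 keep their old amounts. This is the bookkeeping that the VALIDTIME prefix moves from the application into the DBMS.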
Here again, we exploit our knowledge of SQL to first write the update ignoring time, then change it in minor ways to take account of time. These statements are reminiscent of the kinds of SQL statements that application programmers are called to write all the time. The potential for increased productivity is dramatic. Statements that previously took hours to write, or were simply too difficult to express, can take only minutes to write with the extensions discussed here. We now return to the important question of migrating legacy databases. In the next section, we formulate several requirements of SQL/Temporal to allow graceful migration of applications from conventional to temporal databases. 4
Migration
The potential users of temporal database technology are enterprises with applications¹ that need to manage potentially large amounts of time-varying information. These include financial applications such as portfolio management, accounting, and banking; record-keeping applications, including personnel, medical records, and inventory; and travel applications such as airline, train, and hotel reservations and schedule management. It is most realistic to assume that these enterprises are already managing time-varying data and that the temporal applications are already in place and working. Indeed, the uninterrupted functioning of applications is likely to be of vital importance. For example, companies usually have applications that manage the personnel records of their employees. These applications manage large quantities of time-varying data, and they may benefit substantially from built-in temporal support in the DBMS [11]. Temporal queries that are shorter and more easily formulated are among the potential benefits. This leads to improved productivity, correctness, and maintainability. This section explores the problems that may occur when migrating database applications from an existing to a new DBMS, and it formulates a number of requirements [1] on the new DBMS that must be satisfied in order to avoid different potential problems when migrating. We proceed by identifying four successively more general levels of queries and modifications.
4.1 Upward Compatibility
Perhaps the most important aspect of ensuring a smooth transition is to guarantee that all application code will work with the new system, without modification and with exactly the same functionality as with the existing system. To explore the relationship between nontemporal and temporal data and queries, we employ a series of figures that demonstrate increasing query and update functionality. In Fig. 1, a conventional table is denoted with a rectangle.
¹ We use "database application" non-restrictively, for denoting any software system that uses a DBMS as a standard component.
Richard T. Snodgrass et al.
The current state of this table is the rectangle in the upper-right corner. Whenever a modification is made to this table, the previous state is discarded; hence, at any time only the current state is available. The discarded prior states are denoted with dashed rectangles; the right-pointing arrows denote the modification that took the table from one state to the next state.
[Figure 1 diagram: successive states of a table along a time axis, with query q applied to the current state.]
Fig. 1. Level 1 evaluates an SQL3 query over a table without temporal support and returns a table also without temporal support
When a query q is applied to the current state of a table, a resulting table is computed, shown as the rectangle in the bottom right corner. While this figure only concerns queries over single tables, the extension to queries over multiple tables is clear. Upward compatibility states that (1) all instances of tables in SQL3 are instances of tables in SQL/Temporal, (2) all SQL3 modifications to tables in SQL3 result in the same tables when the modifications are evaluated according to SQL/Temporal semantics, and (3) all SQL3 queries result in the same tables when the queries are evaluated according to SQL/Temporal. By requiring that SQL/Temporal is a strict superset (i.e., only adding language constructs), it is relatively easy to ensure that SQL/Temporal is upward compatible with SQL3. Throughout, we provide examples of the various levels. In Sec. 5, we show these examples expressed in SQL/Temporal. EXAMPLE 1: A company wishes to computerize its personnel records, so it creates two tables, an employee table and a monthly salary table. Every employee must have a salary. These tables are populated. A view identifies those
Transitioning Temporal Support in TSQL2 to SQL3
employees with a monthly salary greater than $3500. Then employee Therese is given a 10% raise. Since the salary table has no temporal support, Therese's previous salary is lost. These schema changes and queries can be easily expressed in SQL3.
4.2 Temporal Upward Compatibility
If an existing or new application needs support for the temporal dimension of the data in one or more tables, the table can be defined with or altered to add valid-time support (e.g., by using the CREATE TABLE ... AS VALID or ALTER ... ADD VALID statements). The distinction of a table having valid-time support is orthogonal to the many other distinctions already present in SQL/Foundation, including "base table" versus "derived table," "created table" versus "declared table," "global table" versus "local table," "grouped table" versus ungrouped table, ordered table versus table with implementation-dependent order, "subtable" versus "supertable," and "temporary table" versus "permanent table." These distinctions can be combined, subject to stated rules. For example, a table can be simultaneously a temporary table, a table of degree 1, an inherently updatable table, a viewed table and a table with valid-time support. In most of the SQL3 specification, it does not matter what distinctions apply to the table in question. In those few places where it does matter, the syntax and general rules specify the distinction. It is undesirable to be forced to change the application code that accesses the table without temporal support that is replaced by a table with valid-time support. We formulate a requirement that states that the existing applications on tables without temporal support will continue to work with no changes in functionality when the tables they access are altered to add valid-time support. Specifically, temporal upward compatibility requires that each query will return the same result on an associated snapshot database as on the temporal counterpart of the database. Further, this property is not affected by modifications to those tables with valid-time support. Temporal upward compatibility is illustrated in Fig. 2. When valid-time support is added to a table, the history is preserved, and modifications over time are retained.
In this figure, the state to the far left was the current state when the table was made temporal. All subsequent modifications, denoted by the arrows, result in states that are retained, and thus are solid rectangles. Temporal upward compatibility ensures that the states will have identical contents to those states resulting from modifications of the table without valid-time support. The query q is an SQL3 query. Due to temporal upward compatibility the semantics of this query must not change if it is applied to a table with valid-time support. Hence, the query only applies to the current state, and a table without temporal support results. EXAMPLE 2: We make both the employee and salary tables temporal. This means that all information currently in the tables is valid from today on. We
[Figure 2 diagram: retained states of a table with valid-time support along a time axis, with SQL3 query q applied to the current state.]
Fig. 2. Level 2 evaluates an SQL3 query over a table with valid-time support and returns a table without temporal support
add an employee. This modification to the two tables, consisting of two SQL3 INSERT statements, respects temporal upward compatibility. That means it is valid from now on. Queries and views on these tables with newly-added valid-time support work exactly as before. The SQL3 query to list where high-salaried employees live returns the current information. Constraints and assertions also work exactly as before, applying to the current state and checked on database modification.
It is instructive to consider temporal upward compatibility in more detail. When designing information systems, two general approaches have been advocated. In the first approach, the system design is based on the function of the enterprise that the system is intended for (the "Yourdon" approach [19]); in the second, the design is based on the structure of the reality that the system is about (the "Jackson" approach [5]). It has been argued that the latter approach is superior because structure may remain stable when the function changes while the opposite is generally not possible. Thus, a more stable system design, needing less maintenance, is achieved when adopting the second design principle. This suggests that the data needs of an enterprise are relatively stable and only change when the actual business of the enterprise changes. Enterprises currently use non-temporal database systems for database management, but that does not mean that enterprises manage only non-temporal data. Indeed, temporal databases are currently being managed in a wide range of applications, including, e.g., academic, accounting, budgeting, financial, insurance, inventory, legal, medical, payroll, planning, reservation, and scientific
applications. Temporal data may be accommodated by non-temporal database systems in several ways. For example, a pair of explicit time attributes may encode a valid-time interval associated with a row. Temporal database systems offer increased user-friendliness and productivity, as well as better performance, when managing time-varying data, since they are optimized for such data. The typical situation, when replacing a non-temporal system with a temporal system, is one where the enterprise is not changing its business, but wants the extra support offered by the temporal system for managing its temporal data. Thus, it is atypical for an enterprise to suddenly desire to record temporal information where it previously recorded only snapshot information. Such a change would be motivated by a change in the business. The typical situation is rather more complicated. The non-temporal database system is likely to already manage temporal data, which is encoded using tables without temporal support, in an ad hoc manner. When adopting the new system, upward compatibility guarantees that it is not necessary to change the database schema or application programs. However, without changes, the benefits of the added valid-time support are also limited. Only when new tables are defined or existing applications are modified can the new temporal support be exploited. The enterprise then gradually benefits from the temporal support available in the system. Nevertheless, the concept of temporal upward compatibility is still relevant, for several reasons. First, it provides an appealing intuitive notion of a table with valid-time support: the semantics of queries and modifications are retained from tables without temporal support; the only difference is that intermediate states are also retained. Second, in those cases where the original table contained no historical information, temporal upward compatibility affords a natural means of migrating to temporal support.
In such cases, not a single line of the application need be changed when the table is altered to be temporal. Third, conventional tables that do contain temporal information and for which temporal support has been added can still be queried and modified by conventional SQL3 statements in a consistent manner.
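The idea that a table with valid-time support behaves like a conventional table whose intermediate states are retained can be sketched in a few lines of Python. This is an illustrative model only, not the prototype's implementation: the row layout, the function names (legacy_update, tuc_update, current_state) and the sample values are assumptions made for this sketch, and periods are taken to be closed-open.

```python
from datetime import date

FOREVER = date(9999, 12, 31)  # stand-in for the largest date ("end of time")

def legacy_update(snapshot_table, key, new_amount):
    """Conventional update: the previous value is overwritten and lost."""
    snapshot_table[key] = new_amount

def tuc_update(table, key, new_amount, now):
    """The same update on a valid-time table (hypothetical model): the part
    of the row before `now` is kept, the part from `now` on is changed."""
    updated = []
    for row in table:
        if row["eno"] == key and row["start"] <= now < row["end"]:
            if row["start"] < now:  # retain the state before `now`
                updated.append({**row, "end": now})
            updated.append({**row, "amount": new_amount, "start": now})
        else:
            updated.append(row)
    return updated

def current_state(table, now):
    """What a legacy query sees: the rows valid at `now`, without periods."""
    return {r["eno"]: r["amount"] for r in table
            if r["start"] <= now < r["end"]}

# Hypothetical data: one employee who is given a raise on a later day.
plain = {5873: 3300}
temporal = [{"eno": 5873, "amount": 3300,
             "start": date(1995, 1, 1), "end": FOREVER}]

raise_day = date(1995, 3, 1)
legacy_update(plain, 5873, 3630)
temporal = tuc_update(temporal, 5873, 3630, raise_day)

# Both tables answer a legacy query identically ...
same_now = current_state(temporal, raise_day) == plain
# ... but only the temporal table still knows the earlier salary.
earlier = current_state(temporal, date(1995, 2, 1))
print(same_now, earlier)
```

The legacy code path never sees the period columns, which is the point of the requirement: the application is unchanged, and only the storage retains more.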
4.3 Sequenced Valid Extensions
The requirements covered so far have been aimed at protecting investments in legacy code and at ensuring uninterrupted operation of existing applications when achieving substantially increased temporal support. Upward compatibility guarantees that (non-historical) legacy application code will continue to work without change when migrating, and temporal upward compatibility in addition allows legacy code to coexist with new temporal applications following the migration. The requirement in this section aims at protecting the investments in programmer training and at ensuring continued efficient, cost-effective application development upon migration. This is achieved by exploiting the fact that programmers are likely to be comfortable with SQL.
Sequenced valid semantics states that SQL/Temporal must offer, for each query in SQL3, a temporal query that "naturally" generalizes this query, in a specific technical sense. In addition, we require that the SQL/Temporal query be syntactically similar to the SQL3 query that it generalizes. With this requirement satisfied, SQL3-like SQL/Temporal queries on tables with temporal support have semantics that are easily ("naturally") understood in terms of the semantics of the SQL3 queries on tables without temporal support. The familiarity of the similar syntax and the corresponding, naturally extended semantics makes it possible for programmers to immediately and easily write a wide range of temporal queries, with little need for expensive training. Fig. 3 illustrates this property. We have already seen that an SQL3 query q on a table with valid-time support applies the standard SQL3 semantics on the current state of that table, resulting in a table without temporal support. This figure illustrates a new query, q', which is an SQL/Temporal query. Query q' is applied to the table with valid-time support (the sequence of states across the top of the figure), and results in a table also with valid-time support, which is the sequence of states across the bottom.
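[Figure 3 diagram: SQL/Temporal query q' applies SQL3 query q to each state of the input table, yielding the corresponding state of the result table.]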
Fig. 3. Level 3 evaluates an SQL/Temporal query over a table with valid-time support and returns a table with similar support
We would like the semantics of q' to be easily understood by the SQL3 programmer. Satisfying sequenced semantics along with the syntactical similarity requirement makes this possible. Specifically, the meaning of q' is precisely that of applying SQL3 query q on each state of the input table (which must have temporal support), producing a state of the output table for each such application. And when q' also closely resembles q syntactically, temporal queries are
easily formulated and understood. To generate query q', one needs only prepend the reserved word VALIDTIME to query q.
EXAMPLE 3: We ask for the history of the monthly salaries paid to employees. Asking that question for the current state (i.e., what is the salary for each employee) is easy in SQL3; let us call this query q. To ask for the history, we simply prepend the keyword VALIDTIME to q to generate the SQL/Temporal query. Sequenced semantics allows us to do this for all SQL3 queries. So let us try a harder one: list the history of those employees for which no one makes a higher salary and lives in a different city. Again the problem reduces to expressing the SQL3 query for the current state. We then prepend VALIDTIME to get the history. Sequenced semantics also works for views, integrity constraints and assertions [3].
These concepts also apply to sequenced modifications, illustrated in Fig. 4. A valid-time modification destructively modifies states as illustrated by the curved arrows. As with queries, the modification is applied on a state-by-state basis. Hence, the semantics of the SQL/Temporal modification is a natural extension of the SQL modification statement that it generalizes.
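The state-by-state reading of a sequenced query can be stated compactly. The notation is an assumption introduced here for illustration and is not taken from the proposal: r is a table with valid-time support, \tau_t(r) denotes the state of r at time t, q is an SQL3 query, and q' is q with VALIDTIME prepended.

```latex
\forall t:\quad \tau_t\bigl(q'(r)\bigr) \;=\; q\bigl(\tau_t(r)\bigr)
```

Read left to right: taking the state at time t of the sequenced result gives the same table as evaluating the snapshot query q on the state of the input at time t.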
[Figure 4 diagram: SQL/Temporal modification u' applies modification u to each state of the table.]
Fig. 4. Level 3 also evaluates an SQL/Temporal modification on a table with valid-time support
EXAMPLE 4: It turns out that a particular employee never worked for the company. That employee is deleted from the database. Note that if we use an SQL3 DELETE statement, temporal upward compatibility requires deleting the information only from the current (and future) states. By prepending the reserved word VALIDTIME to the DELETE statement, we can remove that employee from every state of the table.
Many people misspell the town Tucson as "Tuscon," perhaps because the name derives from an American Indian word in a language no longer spoken. To modify the current state to correct this spelling requires a simple SQL UPDATE statement; let's call this statement u. To correct the spelling in all states, both past and possibly future, we simply prepend the reserved word VALIDTIME to u.
4.4 Non-Sequenced Queries and Modifications
In a sequenced query, the information in a particular state of the resulting table with valid-time support is derived solely from information in the state at that same time of the source table(s). However, there are many reasonable queries that require other states to be examined. Such queries are illustrated in Fig. 5, in which each state of the resulting table requires information from possibly all states of the source table.
[Figure 5 diagram: each state of the result table may draw on all states of the source table.]
Fig. 5. Level 4 evaluates a non-sequenced SQL/Temporal query over a table with valid-time support and returns a table with similar support
In this figure, two tables with valid-time support are shown, one consisting of the states across the top of the figure, and the other, the result of the query, consisting of the states across the bottom of the figure. A single query q performs the possibly complex computation, with the information usage illustrated by the downward pointing arrows. Whenever the computation of a single state of the result table may utilize information from a state at a different time, that query is non-sequenced. Such queries are more complex than sequenced queries, and they require new constructs in the query language. EXAMPLE 5: The query "Who was given salary raises?" requires locating two consecutive times, in which the salary of the latter time was greater than the salary of the former time, for the same employee. Hence, it is a non-sequenced query. The concept of non-sequenced queries naturally generalizes to modifications.
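Example 5 can be made concrete with a small Python sketch. The data and the function name raises are hypothetical, and rows are modeled as (eno, amount, start, end) with closed-open periods; the sketch only illustrates why the query cannot be answered one state at a time, since the comparison pairs a row with the row that follows it in time.

```python
from datetime import date

FOREVER = date(9999, 12, 31)  # stand-in for the largest date

# Hypothetical salary history; each row is (eno, amount, start, end)
# with a closed-open valid-time period [start, end).
salary = [
    (5873, 3300, date(1995, 1, 1), date(1995, 3, 1)),
    (5873, 3630, date(1995, 3, 1), FOREVER),
    (6542, 3200, date(1995, 1, 1), FOREVER),
]

def raises(table):
    """Pair each row with the row of the same employee that starts exactly
    where it ends, and keep the pairs in which the amount went up."""
    found = []
    for eno1, amt1, _, end1 in table:
        for eno2, amt2, start2, _ in table:
            if eno1 == eno2 and end1 == start2 and amt2 > amt1:
                found.append((eno1, start2, amt1, amt2))
    return sorted(found)

print(raises(salary))
```

The join condition end1 == start2 relates two different times, which is exactly what a sequenced query is not allowed to do.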
Non-sequenced modifications destructively change states, with information retrieved from possibly all states of the original table. In Fig. 6, each state of the table with valid-time support is possibly modified, using information from possibly all states of the table before the modification. Non-sequenced modifications include future modifications.
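[Figure 6 diagram: each state of the modified table may draw on all states of the table before the modification.]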
Fig. 6. Level 4 also evaluates a non-sequenced SQL/Temporal modification on a table with valid-time support
EXAMPLE 6: We wish to give employees a 5% raise if they have never had a raise before. This is not a temporally upward compatible modification, because the modification of the current state uses information in the past. For the same reason, it is not a sequenced update. So we must use a slightly more involved SQL/Temporal UPDATE statement. In fact, only the predicate "if they never had a raise" need be nonsequenced; the rest of the update can be temporally upward compatible. Views and cursors can also be nonsequenced.
EXAMPLE 7: We wish to define a snapshot view of the salary table in which the row's timestamp period appears as an explicit column. We can also define a valid-time view on this snapshot view that uses the explicit period column as an implicit timestamp.
It is important to note that nonsequenced queries are very different from sequenced queries. In the latter, the query language is providing a temporal semantics; in the former, the query language interprets the timestamp as simply another column. For the user, this means that in nonsequenced queries (modifications, assertions, etc.) the period timestamps must be manipulated explicitly. The operations, such as join and relational difference, are performed with respect to the periods themselves, rather than on the individual states of the tables with temporal support. Reserved words are used to syntactically differentiate temporally upward compatible queries, sequenced queries, and non-sequenced queries, each of which applies a distinct semantics.
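What "manipulating the period timestamps explicitly" means for a join can be sketched as follows. The helper overlap, the row layout and the sample data are assumptions made for this illustration; the sketch treats the period as an ordinary column, in the spirit of the snapshot view of Example 7.

```python
from datetime import date

FOREVER = date(9999, 12, 31)  # stand-in for the largest date

def overlap(p, q):
    """Intersection of two closed-open periods, or None if they are disjoint."""
    start, end = max(p[0], q[0]), min(p[1], q[1])
    return (start, end) if start < end else None

# Hypothetical rows with the timestamp exposed as an ordinary column:
# (eno, city, period) and (eno, amount, period).
employee = [(1, "Zurich", (date(1995, 1, 1), date(1995, 6, 1))),
            (1, "Tucson", (date(1995, 6, 1), FOREVER))]
salary = [(1, 3000, (date(1995, 1, 1), date(1995, 4, 1))),
          (1, 3300, (date(1995, 4, 1), FOREVER))]

# Nothing about time is implicit: the join condition on the periods and the
# period of each result row are both computed by hand.
joined = []
for eno, city, emp_period in employee:
    for sal_eno, amount, sal_period in salary:
        common = overlap(emp_period, sal_period)
        if eno == sal_eno and common is not None:
            joined.append((city, amount, common))

for row in joined:
    print(row)
```

In a sequenced query the same result would follow from prepending VALIDTIME to an ordinary join; here the user carries that responsibility.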
4.5 Summary
In this section, we have formulated three important requirements that SQL/Temporal should satisfy to ensure a smooth transition of legacy application code. We review each in turn. Upward compatibility and temporal upward compatibility guarantee that legacy application code needs no modification when migrating and that new temporal applications may coexist with existing applications. They are thus aimed at protecting investments in legacy application code. The requirement that there be a sequenced temporal extension of all existing statements ensures that the extended query language is easy to use for programmers familiar with the existing query language. The requirement thus helps protect investment in programmer training. It also turns out that this property makes the semantics of tables with valid-time support straightforward to specify and enables a wide range of implementation alternatives [14]. These requirements induce four levels of temporal functionality, to be defined in SQL/Temporal.
Level 1: This lowest level captures the minimum functionality necessary for the query language to satisfy upward compatibility with SQL3. Thus, there is support for legacy SQL3 statements, but there are no tables with valid-time support and no temporal queries. Put differently, the functionality at this level is identical to that of SQL3.
Level 2: This level adds to the previous level solely by allowing for the presence of tables with valid-time support. The temporal upward compatibility requirement is applicable to this subset of SQL/Temporal. This level adds no new syntax for queries or modifications; only queries and modifications with SQL3 syntax are possible.
Level 3: The functionality of Level 2 is enhanced with the possibility of giving sequenced temporal functionality to queries, views, constraints, assertions, and modifications on tables with valid-time support. This level of functionality is expected to provide adequate support for many applications. Starting at this level, temporal queries exist, so SQL/Temporal must be a sequenced-consistent extension of SQL3.
Level 4: Finally, the full temporal functionality normally associated with a temporal language is added, specifically, non-sequenced temporal queries, assertions, constraints, views, and modifications. These additions include temporal queries and modifications that have no syntactic counterpart in SQL3.
5 Tables with Valid-Time Support in SQL3
This section informally introduces the new constructs of SQL/Temporal. These constructs are an improved and extended version of those in the consensus temporal query language TSQL2 [13]. The improvements concern guaranteeing the properties listed in Sec. 4, to support easy migration of legacy SQL3 application code [2]. The extensions concern views, assertions, and constraints (specifically
temporal upward compatible and sequenced and non-sequenced extensions) that were not considered in the original TSQL2 design. The presentation is divided into four levels, where each successive level adds temporal functionality. The levels correspond to those discussed informally in the previous section. Throughout, the functionality is exemplified with input to and corresponding output from a prototype system [16]. The reader may find it instructive to execute the sample statements on the prototype.
5.1 Level 1: Upward Compatibility
Level 1 ensures upward compatibility (see Fig. 1), i.e., it guarantees that legacy SQL3 statements evaluated over databases without temporal support return the result dictated by SQL3.
SQL3 Extensions. Obviously there are no syntactic extensions to SQL3 at this level.
A Quick Tour. The following statements are executed on January 1, 1995. A company creates two tables, an employee table and a monthly salary table. Every employee must have a salary. These schema changes can be easily expressed in SQL3.
  CREATE TABLE employee(ename VARCHAR(12), eno INTEGER PRIMARY KEY,
    street VARCHAR(22), city VARCHAR(10), birthday DATE)
  CREATE TABLE salary(eno INTEGER PRIMARY KEY REFERENCES employee,
    amount INTEGER)
  CREATE ASSERTION emp_has_sal CHECK
    (NOT EXISTS ( SELECT * FROM employee AS e
      WHERE NOT EXISTS ( SELECT * FROM salary AS s
        WHERE e.eno = s.eno)))
These tables are populated.
  INSERT INTO employee VALUES ('Therese', 5873, 'Bahnhofstrasse 121', 'Zurich', DATE '1961-03-21')
  INSERT INTO employee VALUES ('Franziska', 6542, 'Rennweg 683', 'Zurich', DATE '1963-07-04')
  INSERT INTO salary VALUES (6542, 3200)
  INSERT INTO salary VALUES (5873, 3300)
A view identifies those employees with a monthly salary greater than $3500.
  CREATE VIEW high_salary AS SELECT * FROM salary WHERE amount > 3500
Employee Therese is given a 10% raise. Since the salary table has no temporal support, Therese's previous salary is lost.
  UPDATE salary s SET amount = 1.1 * amount
  WHERE s.eno = (SELECT e.eno FROM employee e WHERE e.ename = 'Therese')
  COMMIT
5.2 Level 2: Temporal Upward Compatibility
Level 2 ensures temporal upward compatibility as depicted in Fig. 2. Temporal upward compatibility is straightforward for queries. They are evaluated over the current state of a database with valid-time support.
SQL3 Extensions. The create table statement is extended to define tables with valid-time support. Specifically, this statement can be followed by the clause "AS VALIDTIME", e.g., "AS VALIDTIME PERIOD(DATE)." This specifies that the table has valid-time support, with states indexed by particular days. The alter table statement is extended to permit valid-time support to be added to a table without such support or dropped from a table with valid-time support. A table with valid-time support is conceptually a sequence of states indexed with valid-time granules at the specified granularity. This is the view of a table with valid-time support adopted in temporal upward compatibility and sequenced semantics. At a more specific logical level, a table with valid-time support is also a collection of rows associated with valid-time periods. Indeed, our definition of the semantics of the addition to SQL/Temporal being proposed satisfies temporal upward compatibility and sequenced semantics.
Quick Tour: Part 2. The following statements are executed on February 1, 1995.
  ALTER TABLE salary ADD VALIDTIME PERIOD(DATE)
  ALTER TABLE employee ADD VALIDTIME PERIOD(DATE)
The following statements are typed in the next day (February 2, 1995).
  INSERT INTO employee VALUES ('Lilian', 3463, '46 Speedway', 'Tuscon', DATE '1970-03-09')
  INSERT INTO salary VALUES (3463, 3400)
  COMMIT
The employee table contains the following rows. (In these examples, we use closed-open ("[...)") for periods.)
  ename      eno   street              city    birthday    Valid
  Therese    5873  Bahnhofstrasse 121  Zurich  1961-03-21  [1995-02-01 - 9999-12-31)
  Franziska  6542  Rennweg 683         Zurich  1963-07-04  [1995-02-01 - 9999-12-31)
  Lilian     3463  46 Speedway         Tuscon  1970-03-09  [1995-02-02 - 9999-12-31)
Note that the valid time extends to the end of time, which in SQL3 is the largest date. The salary table contains the following rows.
  eno   amount  Valid
  6542  3200    [1995-02-01 - 9999-12-31)
  5873  3630    [1995-02-01 - 9999-12-31)
  3463  3400    [1995-02-02 - 9999-12-31)
We continue, still on February 2. Tables, views, and queries act like before, because temporal upward compatibility is satisfied. To find out where the high-salaried employees live, use the following.
  SELECT ename, city
  FROM high_salary AS s, employee AS e
  WHERE s.eno = e.eno
Evaluated over the current state, this returns the employee Therese, in Zürich. Assertions and referential integrity act like before, applying to the current state. The following transaction will abort due to (1) a violation of the PRIMARY KEY constraint, (2) a violation of the emp_has_sal assertion and (3) a referential integrity violation, respectively.
  INSERT INTO employee VALUES ('Eric', 3463, '701 Broadway', 'Tucson', DATE '1988-01-06')
  INSERT INTO employee VALUES ('Melanie', 1234, '701 Broadway', 'Tucson', DATE '1991-03-08')
  INSERT INTO salary VALUES (9999, 4900)
  COMMIT
5.3 Level 3: Sequenced Language Constructs
Level 3 adds syntactically similar, sequenced counterparts of existing queries, modifications, views, constraints, and assertions (see Fig. 3). Sequenced SQL/Temporal queries produce tables with valid-time support. The state of a result table at each time is computed from the state of the underlying table(s) at the
same time, via the semantics of the contained SQL3 query. In this way, users are able to express temporal queries in a natural fashion, exploiting their knowledge of SQL3. Temporal views, assertions and constraints can likewise be naturally expressed.
SQL3 Extensions. Temporal queries, modifications, views, assertions, and constraints are signaled by the reserved word VALIDTIME. This reserved word can appear in a number of locations.
Derived table in a from clause: In the from clause, one can prepend VALIDTIME to a <query expression>.
View definition: Temporal views can be specified, with sequenced semantics.
Assertion definition: A sequenced assertion applies to each of the states of the underlying table(s). This is in contrast to a snapshot assertion, which is only evaluated on the current state. In both cases, the assertion is checked before a transaction is committed.
Table and column constraints: When specified with VALIDTIME, such constraints must apply to each state of the table with valid-time support.
Cursor expression: Cursors can range over tables with valid-time support.
Single-row select: Such a select can return a row with an associated valid time.
Modification statements: When specified with VALIDTIME, the modification applies to each state comprising the table with valid-time support.
In all cases, the VALIDTIME reserved word indicates that sequenced semantics is to be employed. An optional period expression after VALIDTIME specifies that the valid-time period of each row of the result is intersected with the value of the expression. This allows one to restrict the result of a select statement, cursor expression, or view definition to a specified period, and to restrict the time for which assertion definitions, table constraints and column constraints are checked.
Quick Tour: Part 3. We evaluate the following statements on March 1, 1995.
Prepending VALIDTIME to any SELECT statement evaluates that query on all states, in a sequenced fashion. The first query provides the history of the monthly salaries paid to employees. This query is constructed by first writing the snapshot query, then prepending VALIDTIME.
  VALIDTIME SELECT ename, amount
  FROM salary AS s, employee AS e
  WHERE s.eno = e.eno
This evaluates to the following.
  ename      amount  Valid
  Franziska  3200    [1995-02-01 - 9999-12-31)
  Therese    3630    [1995-02-01 - 9999-12-31)
  Lilian     3400    [1995-02-02 - 9999-12-31)
List those for which no one makes a higher salary in a different city, over all time.
  VALIDTIME SELECT ename
  FROM employee AS e1, salary AS s1
  WHERE e1.eno = s1.eno
    AND NOT EXISTS (SELECT ename
      FROM employee AS e2, salary AS s2
      WHERE e2.eno = s2.eno
        AND s2.amount > s1.amount
        AND e1.city <> e2.city)
This gives the following result.
  ename      Valid
  Therese    [1995-02-01 - 9999-12-31)
  Franziska  [1995-02-01 - 1995-02-02)
Therese is listed because the only person in a different city, Lilian, makes a lower salary. Franziska is listed because for that one day, there was no one in a different city (Lilian did not join the company until February 2).
The reserved word VALIDTIME specifies that the semantics of the query to which it is prepended is a sequenced semantics. Conceptually the query is evaluated independently on every state of the underlying tables (cf. Fig. 3). This ensures that the user's intuition about SQL carries over to sequenced queries and modifications. A formal semantics for sequenced queries has been developed [14, 3]. While Fig. 3 provides the meaning of sequenced queries in terms of states, the formal semantics is expressed in terms of manipulations on the period timestamps of the underlying tables with valid-time support.
We then create a temporal view, similar to the non-temporal view defined earlier. In fact, the only difference is the use of the reserved word VALIDTIME.
  CREATE VIEW high_salary_history AS
  VALIDTIME SELECT * FROM salary WHERE amount > 3500
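The result of the "no one makes a higher salary in a different city" query can be reproduced with a small Python model of sequenced evaluation. This is a sketch under stated assumptions, not the prototype's algorithm: the functions state, q and sequenced are hypothetical names, the street and birthday columns are omitted, and instead of visiting every day the sketch evaluates the snapshot query once per interval between consecutive period endpoints and then merges adjacent intervals.

```python
from datetime import date

FOREVER = date(9999, 12, 31)  # stand-in for the largest date
D1, D2 = date(1995, 2, 1), date(1995, 2, 2)

# Rows from the quick tour, as (values..., start, end) with [start, end).
employee = [("Therese", 5873, "Zurich", D1, FOREVER),
            ("Franziska", 6542, "Zurich", D1, FOREVER),
            ("Lilian", 3463, "Tuscon", D2, FOREVER)]
salary = [(6542, 3200, D1, FOREVER),
          (5873, 3630, D1, FOREVER),
          (3463, 3400, D2, FOREVER)]

def state(table, at):
    """Snapshot of a valid-time table at instant `at` (periods stripped)."""
    return [row[:-2] for row in table if row[-2] <= at < row[-1]]

def q(emp, sal):
    """The snapshot query: employees for whom nobody in a different city
    earns more."""
    joined = [(name, city, amt) for name, eno, city in emp
              for sal_eno, amt in sal if eno == sal_eno]
    return {n1 for n1, c1, a1 in joined
            if not any(a2 > a1 and c2 != c1 for _, c2, a2 in joined)}

def sequenced(query, *tables):
    """Evaluate `query` on every state and merge equal adjacent results."""
    points = sorted({t for tab in tables for row in tab for t in row[-2:]})
    result = {}  # value -> list of (start, end) periods
    for start, end in zip(points, points[1:]):
        for value in query(*(state(tab, start) for tab in tables)):
            periods = result.setdefault(value, [])
            if periods and periods[-1][1] == start:
                periods[-1] = (periods[-1][0], end)  # extend previous period
            else:
                periods.append((start, end))
    return result

history = sequenced(q, employee, salary)
print(history)
```

Between two consecutive endpoints no row starts or ends, so the state is constant there and one evaluation per interval suffices.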
Finally, we define a temporal column constraint.
  ALTER TABLE salary ADD VALIDTIME CHECK (amount > 1000 AND amount < 12000)
  COMMIT
Rather than being checked on the current state only, this constraint is checked on each state of the salary table. This is useful to restrict retroactive changes [6], i.e., changes to past states, and predictive changes, i.e., changes to future states. This constraint is satisfied for all states in the table. Sequenced modifications are similarly handled. To remove employee number 5873 for all states of the database, we use the following statement.
  VALIDTIME DELETE FROM employee WHERE eno = 5873
  VALIDTIME DELETE FROM salary WHERE eno = 5873
  COMMIT
To correct the common misspelling of Tucson, we use the following statement.
  VALIDTIME UPDATE employee
  SET city = 'Tucson'
  WHERE city = 'Tuscon'
  COMMIT
This updates all incorrect values, at all times, including the past and future. Lilian's city is thus corrected.
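The difference between this sequenced update and the same UPDATE issued without VALIDTIME can be sketched in Python. The function update_city and its since parameter are hypothetical, introduced only to contrast the two behaviors on rows modeled as (name, city, start, end) with closed-open periods.

```python
from datetime import date

FOREVER = date(9999, 12, 31)  # stand-in for the largest date

def update_city(table, old, new, since=None):
    """Replace city `old` by `new`.

    since=None -> sequenced: every state is corrected (VALIDTIME UPDATE).
    since=date -> current and future states only, as a legacy UPDATE issued
                  on that date would do under temporal upward compatibility.
    """
    out = []
    for name, city, start, end in table:
        if city != old:
            out.append((name, city, start, end))
        elif since is None or since <= start:
            out.append((name, new, start, end))      # whole period changes
        elif since < end:
            out.append((name, city, start, since))   # past stays as it was
            out.append((name, new, since, end))      # corrected from `since`
        else:
            out.append((name, city, start, end))     # period already over
    return out

employee = [("Lilian", "Tuscon", date(1995, 2, 2), FOREVER)]
today = date(1995, 3, 1)

sequenced = update_city(employee, "Tuscon", "Tucson")
legacy = update_city(employee, "Tuscon", "Tucson", since=today)
print(sequenced)
print(legacy)
```

With the sequenced form the misspelling disappears from every state; with the legacy form the states between February 2 and March 1 would still carry 'Tuscon'.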
5.4 Level 4: Non-Sequenced Language Constructs
Level 4 accounts for non-sequenced queries (see Fig. 5) and non-sequenced modifications (see Fig. 6). Many useful queries and modifications are in this category. However, their semantics is necessarily more complicated than that of sequenced queries, because non-sequenced queries cannot exploit that useful property. Instead, they must support the formulation of special-purpose user-defined temporal relationships between implicit timestamps, datetime values expressed in the query, and stored datetime columns in the database. Nonsequenced SQL/Temporal queries can produce tables with or without valid-time support, depending on whether the valid-time period of the resulting rows is provided in the query. The state of a result table, if a table is without valid-time support, or the state of a result table at each time, if a table has valid-time support, is computed from potentially all of the states of the underlying table(s), at any time. The semantics are quite simple. A nonsequenced evaluation treats a table with valid-time support as a table without temporal support, but with an additional column containing the timestamp. We again emphasize that this semantics is quite different from temporally upward compatible semantics (where the query is evaluated only on the current state) and from sequenced semantics (where the query is effectively evaluated on each state independently).
SQL3 Extensions. Nonsequenced valid queries are signaled by the new reserved word NONSEQUENCED preceding the reserved word VALIDTIME. This applies analogously to nonsequenced modifications, views, assertions, and constraints. This reserved word can appear in a number of locations.
Derived table in a from clause: In the from clause, one can prepend NONSEQUENCED VALIDTIME to a <query expression>. This results in a table without temporal support, and is the means of removing the valid-time support of a table.
Transitioning Temporal Support in TSQL2 to SQL3
View definition. Nonsequenced views can be specified.
Assertion definition. A nonsequenced assertion applies to the underlying table(s), considered as snapshot tables with an additional explicit timestamp column. This is in contrast to a snapshot assertion, which is only evaluated on the current state. In both cases, the assertion is checked before a transaction is committed.
Table and column constraints. When specified with NONSEQUENCED VALIDTIME, such constraints apply to the table with the valid timestamp treated as an explicit column.
Cursor expression. Cursors can range over the result of a nonsequenced select.
Single-row select. A nonsequenced single-row select will return a row without temporal support, even when evaluated over tables with valid-time support.
Modification statements. When specified with NONSEQUENCED VALIDTIME, the modification applies to the table considered as a snapshot table.
In all cases, the NONSEQUENCED reserved word indicates that nonsequenced semantics is to be employed. The syntax is extended to the following.
    { { NONSEQUENCED } VALIDTIME { <period expression> } }
An optional period expression after NONSEQUENCED VALIDTIME specifies the valid-time period of each row of the result, and thus gives the resulting table valid-time support. This enables a table without temporal support to be converted into a table with valid-time support within a query or other statement. For modification statements, the period expression after VALIDTIME specifies the temporal scope of the modification: the times at which the modification is to be applied.
The value expression "VALIDTIME(<correlation or table name>)" is available; it evaluates to the valid-time period of the row associated with the correlation or table name. This is required because valid-time periods of tables with valid-time support are not explicit columns (the alternative violates temporal upward compatibility). The following quick tour provides examples of these constructs.
Quick Tour: Part 4
This quick tour starts with the database as it was when we last left it, in the previous quick tour. The employee table has the following contents.
    ename      eno   street       city    birthday    Valid
    Franziska  6542  Rennweg 683  Zurich  1963-07-04  [1995-02-01 - 9999-12-31)
    Lilian     3463  46 Speedway  Tucson  1970-03-09  [1995-02-02 - 9999-12-31)
Richard T. Snodgrass et al.
The salary table has the following contents.
    eno   amount  Valid
    6542  3200    [1995-02-01 - 9999-12-31)
    3463  3400    [1995-02-02 - 9999-12-31)
A period expression after VALIDTIME specifies the temporal scope of the result. List those who were employed sometime during the first six months.
    VALIDTIME PERIOD '[1995-01-01 - 1995-07-01)'
    SELECT ename
    FROM employee
This returns the following table.
    ename      Valid
    Franziska  [1995-02-01 - 1995-07-01)
    Lilian     [1995-02-02 - 1995-07-01)
On April 1, 1995, we give Lilian a 5% raise, starting immediately. This is a temporally upward compatible modification, and so is already expressible in SQL.
    UPDATE salary
    SET amount = 1.05 * amount
    WHERE eno = (SELECT S.eno
                 FROM salary AS S, employee AS E
                 WHERE ename = 'Lilian' AND E.eno = S.eno)
    COMMIT
This results in the following salary table.
    eno   amount  Valid
    6542  3200    [1995-02-01 - 9999-12-31)
    3463  3400    [1995-02-02 - 1995-04-01)
    3463  3570    [1995-04-01 - 9999-12-31)
To determine who was given salary raises, we must simultaneously consider two consecutive states of the salary table, before and after the raise. This requires a nonsequenced query.
    NONSEQUENCED VALIDTIME
    SELECT ename
    FROM employee AS E, salary AS S1, salary AS S2
    WHERE E.eno = S1.eno AND E.eno = S2.eno
      AND S1.amount < S2.amount
      AND VALIDTIME(S1) MEETS VALIDTIME(S2)
MEETS ensures that the valid-time period associated with S1 is immediately followed by the valid-time period associated with S2. Since the valid-time period of a row is not in an explicit column (as this would violate temporal upward compatibility), VALIDTIME() is used to extract the associated valid-time period. The result is a table without temporal support, because NONSEQUENCED VALIDTIME is not followed by a period expression.
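The nonsequenced evaluation described here can be mimicked outside the DBMS by treating each valid-time period as an ordinary column. The following Python fragment is only a sketch under that assumption: the tuple layout and the helper names `meets` and `raises` are illustrative and not part of the proposal. It also returns the period of S2 alongside each name, which is what a period expression after NONSEQUENCED VALIDTIME would attach to the result.

```python
from datetime import date

FOREVER = date(9999, 12, 31)

# Rows carry their closed-open valid-time period as an explicit pair,
# mirroring the tables of the quick tour after Lilian's raise.
employee = [
    ("Franziska", 6542, (date(1995, 2, 1), FOREVER)),
    ("Lilian", 3463, (date(1995, 2, 2), FOREVER)),
]
salary = [
    (6542, 3200, (date(1995, 2, 1), FOREVER)),
    (3463, 3400, (date(1995, 2, 2), date(1995, 4, 1))),
    (3463, 3570, (date(1995, 4, 1), FOREVER)),
]


def meets(p1, p2):
    """p1 MEETS p2: p1 ends exactly where p2 begins."""
    return p1[1] == p2[0]


def raises(employee_rows, salary_rows):
    """Pair each employee with the period of the higher of two consecutive salaries."""
    result = []
    for ename, eno, _ in employee_rows:
        for eno1, amount1, valid1 in salary_rows:
            for eno2, amount2, valid2 in salary_rows:
                if (eno == eno1 == eno2 and amount1 < amount2
                        and meets(valid1, valid2)):
                    result.append((ename, valid2))
    return result


raise_rows = raises(employee, salary)
```

Dropping the second component of each pair corresponds to the table without temporal support; keeping it corresponds to the table with valid-time support.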
If we instead wish to get back a table with valid-time support, i.e., "Who was given salary raises, and when did they receive the higher salary?", we place a period expression after VALIDTIME to specify when each resulting row is valid. Our first try is the following, in which the period expression extracts the valid timestamp of S2.
    NONSEQUENCED VALIDTIME VALIDTIME(S2)
    SELECT ename
    FROM employee AS E, salary AS S1, salary AS S2
    WHERE E.eno = S1.eno AND E.eno = S2.eno
      AND S1.amount < S2.amount
      AND VALIDTIME(S1) MEETS VALIDTIME(S2)
Because an expression is associated with NONSEQUENCED VALIDTIME, the result will be a table with valid-time support, with a valid timestamp of the value of the timestamp of S2. However, this is not quite correct, because the period expression following VALIDTIME can only mention the columns of the following select statement, and the timestamp of S2 is not available. So we put the value in the select list and use an enclosing (sequenced) select statement to get rid of this extra column.
    VALIDTIME SELECT ename
    FROM (NONSEQUENCED VALIDTIME S2valid
          SELECT ename, VALIDTIME(S2) AS S2valid
          FROM employee AS E, salary AS S1, salary AS S2
          WHERE E.eno = S1.eno AND E.eno = S2.eno
            AND S1.amount < S2.amount
            AND VALIDTIME(S1) MEETS VALIDTIME(S2)
         ) AS S
The inner query evaluates to two columns, ename and S2valid. The NONSEQUENCED VALIDTIME includes a period expression, specifying that a table with valid-time support is desired. The valid timestamp of each row is the same as the value of the S2valid column. The outer query just projects out the ename column, retaining the valid timestamp. This query has the following result.
    ename   Valid
    Lilian  [1995-04-01 - 9999-12-31)
If we had desired the time when the person had received the lower salary, we would simply specify VALIDTIME(S1) instead. This query is admittedly more complex to specify than the sequenced queries given in the previous section. In nonsequenced queries the user is doing all the work of manipulating the timestamps, whereas in sequenced queries, the DBMS handles the timestamps automatically, freeing the user from this concern. The reason that nonsequenced queries are included is that some (very useful) queries cannot be expressed using the sequenced semantics, the query just given being one example.
Following VALIDTIME with a period expression in a modification (whether sequenced or not) specifies the temporal scope of the modification. Two applications of this are retroactive and future changes. Assume it is now May 1, 1995. Franziska, employee 6542, will be taking a leave of absence the last half of the year.
    VALIDTIME PERIOD '[1995-07-01 - 1996-01-01)'
    DELETE FROM salary
    WHERE eno = 6542

    VALIDTIME PERIOD '[1995-07-01 - 1996-01-01)'
    DELETE FROM employee
    WHERE eno = 6542

    COMMIT
The salary table now has the following contents.
    eno   amount  Valid
    6542  3200    [1995-02-01 - 1995-07-01)
    6542  3200    [1996-01-01 - 9999-12-31)
    3463  3400    [1995-02-02 - 1995-04-01)
    3463  3570    [1995-04-01 - 9999-12-31)
The employee table has the following contents.
    ename      eno   street       city    birthday    Valid
    Franziska  6542  Rennweg 683  Zurich  1963-07-04  [1995-02-01 - 1995-07-01)
    Franziska  6542  Rennweg 683  Zurich  1963-07-04  [1996-01-01 - 9999-12-31)
    Lilian     3463  46 Speedway  Tucson  1970-03-09  [1995-02-02 - 9999-12-31)
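How a deletion with a temporal scope turns one stored period into two can be sketched in a few lines of Python. This is an illustration only, assuming rows are kept as (values, period) pairs with closed-open periods; the function name `delete_during` is hypothetical and says nothing about how a DBMS would implement the statement.

```python
from datetime import date

FOREVER = date(9999, 12, 31)


def delete_during(rows, scope, predicate):
    """Remove the matching rows, but only for the instants inside scope [start, end)."""
    scope_start, scope_end = scope
    kept = []
    for values, (start, end) in rows:
        untouched = (not predicate(values)
                     or end <= scope_start or scope_end <= start)
        if untouched:
            kept.append((values, (start, end)))
            continue
        if start < scope_start:      # part of the row before the scope survives
            kept.append((values, (start, scope_start)))
        if scope_end < end:          # part of the row after the scope survives
            kept.append((values, (scope_end, end)))
    return kept


# The salary table before Franziska's leave of absence is recorded.
salary = [
    ((6542, 3200), (date(1995, 2, 1), FOREVER)),
    ((3463, 3400), (date(1995, 2, 2), date(1995, 4, 1))),
    ((3463, 3570), (date(1995, 4, 1), FOREVER)),
]
leave = (date(1995, 7, 1), date(1996, 1, 1))
salary_after = delete_during(salary, leave, lambda values: values[0] == 6542)
```

The single row for employee 6542 comes back as two rows with a gap between them, matching the salary table shown above.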
Note that these deletions split single periods into two, with a lapse between them. Many modifications are greatly simplified in this way. Also note that previously specified sequenced valid referential integrity and other constraints and assertions must apply to each state. Hence, if the first DELETE was performed, but not the second, the COMMIT will abort because the emp.has_sal constraint is violated for certain states, such as the one on August 1, 1995. The period expression following VALIDTIME is also allowed for assertions and constraints. Assume that no employee may make less than 3000 during 1996.
    CREATE ASSERTION salary_check
    VALIDTIME PERIOD '[1996-01-01 - 1997-01-01)'
    CHECK (NOT EXISTS (SELECT * FROM salary WHERE amount < 3000))
This is a sequenced assertion, and thus applies separately to each state in 1996. Nonsequenced assertions and constraints apply to all states at once. To assert that there is only one employee with a particular name, we use the following constraint within the employee table definition.
    CONSTRAINT unique_name UNIQUE (ename)
This is interpreted with temporal upward compatible semantics, and so applies only to the current state. If all we do is temporal upward compatible modifications, this will be sufficient. However, if we perform future updates, violations may be missed. To always check all states, a sequenced constraint is used.
    CONSTRAINT unique_name_per_time VALIDTIME UNIQUE (ename)
This will ensure that at any time, each ename value is unique. To ensure that each ename is unique, across all states simultaneously, a nonsequenced constraint is required.
    CONSTRAINT unique_name_over_all_time NONSEQUENCED VALIDTIME UNIQUE (ename)
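The three readings of UNIQUE (ename) differ only in which rows are compared with each other. The Python sketch below makes that explicit for the employee table of this quick tour; the function names and the tuple layout are illustrative assumptions, and the current date is taken to be May 1, 1995, as in the tour.

```python
from datetime import date

FOREVER = date(9999, 12, 31)

# (ename, valid_start, valid_end) with closed-open valid-time periods.
employee = [
    ("Franziska", date(1995, 2, 1), date(1995, 7, 1)),
    ("Franziska", date(1996, 1, 1), FOREVER),
    ("Lilian", date(1995, 2, 2), FOREVER),
]


def unique_now(rows, now):
    """Temporal upward compatible reading: check only the current state."""
    names = [name for name, start, end in rows if start <= now < end]
    return len(names) == len(set(names))


def unique_per_time(rows):
    """Sequenced reading: no two rows with the same name may overlap in time."""
    for i, (name1, start1, end1) in enumerate(rows):
        for name2, start2, end2 in rows[i + 1:]:
            if name1 == name2 and start1 < end2 and start2 < end1:
                return False
    return True


def unique_over_all_time(rows):
    """Nonsequenced reading: the timestamp is just another column, so names may not repeat."""
    names = [name for name, _, _ in rows]
    return len(names) == len(set(names))


checks = (
    unique_now(employee, date(1995, 5, 1)),
    unique_per_time(employee),
    unique_over_all_time(employee),
)
```

With these rows the first two checks hold and the third fails, because Franziska appears in two rows whose periods do not overlap.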
The above employee table satisfies the first two constraints, but not the third (the nonsequenced one), because there are two rows with an ename of Franziska.
As with VALIDTIME, NONSEQUENCED VALIDTIME can appear in a from clause. To give employees a 5% raise if they never had a raise before, we first write a temporal upward compatible modification (i.e., without VALIDTIME) to give the raise.
    UPDATE salary AS S
    SET amount = 1.05 * amount
We can augment this statement to use a non-sequenced query in the from clause to look for raises in the past.
    UPDATE salary AS S
    SET amount = 1.05 * amount
    WHERE NOT EXISTS (SELECT *
                      FROM (NONSEQUENCED VALIDTIME
                            SELECT *
                            FROM salary AS S1, salary AS S2
                            WHERE S1.amount < S2.amount
                              AND VALIDTIME(S1) MEETS VALIDTIME(S2)
                              AND S1.eno = S.eno) AS S3
                     ) AND S.eno = S3.eno
    COMMIT
The NOT EXISTS was added. Assume that the update was entered on June 1, 1995. The following salary table results.
    eno   amount  Valid
    6542  3200    [1995-02-01 - 1995-06-01)
    6542  3360    [1995-06-01 - 1995-07-01)
    6542  3360    [1996-01-01 - 9999-12-31)
    3463  3400    [1995-02-02 - 1995-04-01)
    3463  3570    [1995-04-01 - 9999-12-31)
Since the update is evaluated with temporal upward compatible semantics, it changes the salary for valid times after June 1. Finally, we wish to define a snapshot view of the salary table in which the row's timestamp appears as an explicit column, here when.
    CREATE VIEW snapshot_salary (eno, amount, when) AS
    NONSEQUENCED VALIDTIME SELECT S.*, VALIDTIME(S)
    FROM salary AS S
Coming around full circle, we can define a valid-time view on snapshot_salary that uses the explicit column when as an implicit timestamp.
    CREATE VIEW temporal_salary (eno, amount) AS
    VALIDTIME SELECT eno, amount
    FROM (NONSEQUENCED VALIDTIME when
          SELECT * FROM snapshot_salary AS S) AS S2
This conversion can also be applied within queries and cursors.
6
Transaction-Time Support
Transaction time identifies when data was asserted in the database. If transaction time is supported, the states of the database at all previous points of time are retained and modifications are append-only. Unlike valid time, transaction time cannot be entirely simulated with tables with explicit timestamp columns. The reason is that tables with transaction-time support are append-only: they grow monotonically. Specifically, while the query functionality can be simulated on tables with no temporal support, in the same way that valid-time query functionality can be translated into queries on tables with no temporal support, there is no way to restrict the user to modifications that ensure the table is append-only. While one can revoke permission to use DELETE, it is still possible for the user to corrupt the transaction timestamp via database updates and insertions. This means that the user can never be sure that what the table says was stored at some time in the past was actually in the table at that time. The only way to ensure the consistency of the data is to have the DBMS maintain the transaction timestamps automatically. Many applications need to keep track of the past states of the database, often for audit traceability requirements. Changes are not allowed on the past states;
that would prevent secure auditing. Instead, compensating transactions are used to correct errors. When an error is encountered, often the analyst will look at the state of the database at a previous point in time to determine where and how the error occurred. However, neither SQL-92 nor the current SQL3 draft supports such modifications or queries well. The following example will illustrate the problems.
- Assume that we wish to keep track of the changes and deletions of the employee table. If standard SQL was used, this table would have six columns: ename, eno, street, city, birthdate, and When (a PERIOD indicating when the row was valid). To know when rows are inserted and (logically) deleted, we add two more columns, InsertTime and DeleteTime, both of the data type TIMESTAMP. Of course, adding these two columns breaks the referential integrity constraint between salary.eno and employee.eno. The reader is invited to write this referential integrity constraint to take into account the three time columns.
- We ask "How many highly paid employees have been in each city?" This query is quite complex to formulate in SQL.
- It turns out that one of the cities shows an unreasonable number of highly-paid current employees (more than 25). When was the error introduced? Is this inconsistency in the database widespread? How long has the database been incorrect? The query "When did we think that there were many highly-paid employees in Tucson?" provides an initial answer, but is also very difficult to express in SQL.
These queries are very challenging, even for SQL experts, when time is involved. Modifications are even more of a problem. A logical deletion must be implemented as an update and an insertion, because we do not want to change the previously stored information. However, there is no way of preventing an application from inadvertently corrupting past states (by incorrectly altering the values of the InsertTime or DeleteTime columns), or a white-collar criminal from intentionally "changing history" to cover up his tracks. The solution is to have the DBMS maintain transaction time automatically, so that the integrity of the previous states of the database is preserved. The query language can also help out, by making it easy to write queries and modifications. With the small syntactic additions proposed here, transaction time can be easily added.
    ALTER TABLE employee ADD TRANSACTIONTIME
Because the DBMS is maintaining transaction time for us, for this table, we do not have to worry about the integrity of the previous states. The DBMS simply would not let us modify past states. The previously specified sequenced valid referential integrity still applies, always on the current state of the database. No rephrasing of this integrity constraint is necessary.
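What it means for the system, rather than the application, to own the transaction timestamps can be sketched as a small Python class. This is a toy model under stated assumptions, not the proposal's implementation: the class name, its methods, and the use of dictionaries for rows are all hypothetical, valid time is left out, and the caller passes the clock value only so that the example is reproducible. The point is that no operation overwrites or removes a stored version; a logical deletion merely closes its transaction-time period.

```python
from datetime import date

UNTIL_CHANGED = date(9999, 12, 31)


class TransactionTimeTable:
    """Append-only table whose transaction timestamps are set by the table itself."""

    def __init__(self):
        self._versions = []  # each entry: [values, tt_start, tt_stop]

    def insert(self, values, now):
        self._versions.append([dict(values), now, UNTIL_CHANGED])

    def delete(self, predicate, now):
        # Logical deletion: close the period of the current versions.
        for version in self._versions:
            if version[2] == UNTIL_CHANGED and predicate(version[0]):
                version[2] = now

    def update(self, predicate, change, now):
        # An update is a logical deletion followed by an insertion.
        current = [v for v in self._versions
                   if v[2] == UNTIL_CHANGED and predicate(v[0])]
        for version in current:
            version[2] = now
            self._versions.append([change(version[0]), now, UNTIL_CHANGED])

    def as_of(self, when):
        """The state of the table as it was stored at transaction time `when`."""
        return [dict(values) for values, start, stop in self._versions
                if start <= when < stop]


table = TransactionTimeTable()
table.insert({"ename": "Franziska", "street": "Rennweg 683"}, date(1995, 7, 1))
table.update(lambda row: row["ename"] == "Franziska",
             lambda row: {**row, "street": "Niederdorfstrasse 2"},
             date(1995, 8, 1))
before = table.as_of(date(1995, 7, 15))
after = table.as_of(date(1995, 9, 1))
```

Because past versions are never rewritten, `as_of` can always reconstruct what the table said at an earlier time.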
The query "How many highly paid employees have been in each city?" asks for the history in valid time of the current transaction-time state. Hence, it is particularly easy to specify, by exploiting transaction-time upward compatibility.
    VALIDTIME SELECT city, COUNT(*)
    FROM employee, salary
    WHERE employee.eno = salary.eno AND amount > 5000
    GROUP BY city
To find where the error was made, we write the query "When did we think that there are many highly-paid employees in Tucson?" This uses the current time in valid time ("are"), but looks at past states of the database ("when did we think"). This requires a sequenced transaction query, with valid-time upward compatibility.
    TRANSACTIONTIME SELECT COUNT(*)
    FROM employee, salary
    WHERE employee.eno = salary.eno AND amount > 5000
      AND city = 'Tucson'
    GROUP BY city
    HAVING COUNT(*) > 25
By having the DBMS maintain transaction time, applications that need to retain past states of tables for auditing purposes can have these past states maintained automatically, correctly, and securely. As well, the proposed language extensions enable queries to be written in minutes instead of hours.
The concepts of temporal upward compatibility (TUC), sequenced (SEQ), and nonsequenced (NONSEQ) semantics apply orthogonally to valid time and transaction time. The semantics is dictated by three simple rules.
- The absence of VALIDTIME (respectively, TRANSACTIONTIME) indicates valid-time (resp., transaction-time) upward compatibility. The result does not include valid-time (resp., transaction-time) support.
- VALIDTIME (respectively, TRANSACTIONTIME) indicates sequenced valid (resp., transaction) semantics. An optional period expression temporally scopes the result. The result includes valid-time (resp., transaction-time) support.
- NONSEQUENCED denotes nonsequenced valid (resp., transaction) semantics. An optional period expression after NONSEQUENCED VALIDTIME provides a valid-time timestamp, yielding valid-time support in the result.
EXAMPLE 8: Starting with the simple query "Which Tucson employees are paid highly?" we can state queries that are different combinations of TUC, SEQ, and NONSEQ in valid and transaction time. In the following, we indicate valid time, then transaction time. Hence, "TUC/SEQ" means valid-time upward compatible and sequenced transaction-time semantics.
TUC/TUC: Which Tucson employees are currently paid highly? A table with no temporal support results.
SEQ/TUC: Which Tucson employees are or were paid highly (as best known)? Note that the employee had to be in Tucson at the same time they were highly paid. A table with valid-time support results.
TUC/SEQ: Who did we think are the highly-paid Tucson employees? A table with transaction-time support results.
NONSEQ/TUC: Which highly-paid employees lived at some time in Tucson, as best known? A table with no temporal support results.
TUC/NONSEQ: When was it recorded that a Tucson employee is currently paid highly? A table with no temporal support results.
SEQ/SEQ: When did we think that some Tucson employee was paid highly, at the same time? A table with both valid-time and transaction-time support results.
SEQ/NONSEQ: When did we correct the information to record that some Tucson employee was paid highly? A table with valid-time support results. For each transaction time, we get a row with valid-time support, indicating when the employee is now considered to be in Tucson and be highly paid.
NONSEQ/SEQ: Who was recorded, perhaps erroneously, to have resided in Tucson at some time and was paid highly, perhaps at some other time? Here we get a table with transaction-time support, indicating when the perhaps erroneous data was in the table.
NONSEQ/NONSEQ: When did we correct the information, to record that some Tucson employee was paid highly, perhaps at some other time? Here a table with no temporal support results.
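The kind of table produced by each of the nine combinations follows mechanically from the three rules stated before this example. As a compact restatement, here is a small Python function; the function name, the mode strings, and the `valid_period_given` flag (standing for a period expression after NONSEQUENCED VALIDTIME) are illustrative choices, not syntax from the proposal.

```python
def result_support(valid_mode, transaction_mode, valid_period_given=False):
    """Return the temporal support of a query result.

    Each mode is one of "TUC", "SEQ", or "NONSEQ". `valid_period_given`
    says whether a period expression follows NONSEQUENCED VALIDTIME; it is
    ignored for the other valid-time modes.
    """
    modes = ("TUC", "SEQ", "NONSEQ")
    if valid_mode not in modes or transaction_mode not in modes:
        raise ValueError("mode must be TUC, SEQ, or NONSEQ")
    support = set()
    # Sequenced semantics always yields support in that dimension.
    if valid_mode == "SEQ":
        support.add("valid-time")
    # A nonsequenced valid query yields it only if a timestamp is supplied.
    if valid_mode == "NONSEQ" and valid_period_given:
        support.add("valid-time")
    if transaction_mode == "SEQ":
        support.add("transaction-time")
    return support


example_8 = {
    (valid, transaction): result_support(valid, transaction)
    for valid in ("TUC", "SEQ", "NONSEQ")
    for transaction in ("TUC", "SEQ", "NONSEQ")
}
```

Evaluating it for all nine pairs reproduces the result types listed in the example, for instance valid-time support only for SEQ/NONSEQ and no temporal support for NONSEQ/NONSEQ.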
TUC in valid time translates in English to "at now;" SEQ translates to "at the same time;" and NONSEQ translates to "at any time." TUC in transaction time translates to "as best known;" SEQ translates to "when did we think ... at the same time;" and NONSEQ translates to "when was it recorded that." This example illustrates that all combinations are meaningful and useful.
While this example emphasized the orthogonality of valid and transaction time, in that TUC, SEQ, and NONSEQ can be applied equally to both, there are still some differences between the two types of time. First, valid time can have a precision specified by the user at table creation time. The transaction timestamps have an implementation-dependent range and precision. Second, valid time extends into the future, whereas transaction time always ends at now. Third, unlike a period expression following NONSEQUENCED VALIDTIME, a period expression is not permitted after NONSEQUENCED TRANSACTIONTIME, because it is not possible to compute a transaction timestamp. Such a timestamp may only be inferred via a sequenced transaction query. Finally, during modifications the DBMS provides the transaction time of facts, in contrast with
the valid time, which is provided by the user. This derives from the different semantics of transaction time and valid time. Specifically, when a fact is (logically) deleted from a table with transaction-time support, its transaction stop time is set automatically by the DBMS to the current time. When a fact is inserted into the table, its transaction start time is set by the DBMS, again to the current time. An update is treated, concerning the transaction timestamps, as a deletion followed by an insertion. The transaction times that a set of modification transactions give to the modified rows must be consistent with the serialization order of those transactions.
The following examples will emphasize the parallel between valid-time and transaction-time support. Specifically, temporal upward compatibility guarantees that conventional, nontemporal queries, updates, etc. work as before, with the same semantics. Since the history of the database is recorded in tables with both valid-time and transaction-time support, we can find out when corrections were made, using a nonsequenced transaction query. Modifications take effect at the current transaction time. However, we can still specify the scope of the change in valid time, both before and after now (retroactive and postactive changes, respectively). Finally, arbitrarily complex queries in transaction time can be expressed with nonsequenced transaction queries. As always, the concepts also apply to views, cursors, constraints, and assertions.
Quick Tour: Part 5
This quick tour starts with the database as it was when we last left it, at the end of the previous quick tour. The employee table has the following contents. Recall that closed-open periods are used here for the valid-time and transaction-time periods.
    ename      eno   street       city    birthday    Valid
    Franziska  6542  Rennweg 683  Zurich  1963-07-04  [1995-02-01 - 1995-07-01)
    Franziska  6542  Rennweg 683  Zurich  1963-07-04  [1996-01-01 - 9999-12-31)
    Lilian     3463  46 Speedway  Tucson  1970-03-09  [1995-02-02 - 9999-12-31)
The salary table has the following contents.
    eno   amount  Valid
    6542  3200    [1995-02-01 - 1995-06-01)
    6542  3360    [1995-06-01 - 1995-07-01)
    6542  3360    [1996-01-01 - 9999-12-31)
    3463  3400    [1995-02-02 - 1995-04-01)
    3463  3570    [1995-04-01 - 9999-12-31)
We can alter the employee table to be a table with both valid-time and transaction-time support, by adding transaction-time support. Assume that the current date is July 1, 1995.
    ALTER TABLE employee ADD TRANSACTIONTIME
    COMMIT
Since employee was a table with valid-time support, this statement converts it to the following table with both valid-time and transaction-time support. Recall that an ending bound of the transaction-time period equal to the end of time simply indicates that the row still logically resides in the table, i.e., has not been logically deleted.
    ename      eno   street       city    birthday    Valid                      Transaction
    Franziska  6542  Rennweg 683  Zurich  1963-07-04  [1995-02-01 - 1995-07-01)  [1995-07-01 - 9999-12-31)
    Franziska  6542  Rennweg 683  Zurich  1963-07-04  [1996-01-01 - 9999-12-31)  [1995-07-01 - 9999-12-31)
    Lilian     3463  46 Speedway  Tucson  1970-03-09  [1995-02-02 - 9999-12-31)  [1995-07-01 - 9999-12-31)
We retain the salary table as a table with valid-time support. Temporal upward compatibility guarantees that conventional, nontemporal queries, updates, integrity constraints, etc. work as before, with the same semantics. We can list those for which (currently, as best known) no one makes a higher salary in a different city.
    SELECT ename
    FROM employee AS e1, salary AS s1
    WHERE e1.eno = s1.eno
      AND NOT EXISTS (SELECT ename
                      FROM employee AS e2, salary AS s2
                      WHERE e2.eno = s2.eno
                        AND s2.amount > s1.amount
                        AND e1.city <> e2.city)
This takes a timeslice in both valid time and transaction time at now, and returns the result: Lilian. We can also ask, for all time, when this is true, by simply prepending "VALIDTIME."
    VALIDTIME SELECT ename
    FROM employee AS e1, salary AS s1
    WHERE e1.eno = s1.eno
      AND NOT EXISTS (SELECT ename
                      FROM employee AS e2, salary AS s2
                      WHERE e2.eno = s2.eno
                        AND s2.amount > s1.amount
                        AND e1.city <> e2.city)
This returns a table with valid-time support, evaluated with sequenced valid semantics, after the current transaction timeslice has been taken.
    ename      Valid
    Franziska  [1995-02-01 - 1995-02-02)
    Lilian     [1995-02-02 - 1995-04-01)
    Lilian     [1995-04-01 - 9999-12-31)
There are two rows for Lilian, because two rows of salary participated in computing the result. Interestingly, Franziska satisfied the where condition for exactly one day, before Lilian was hired. Temporally upward compatible modifications also work as before. Assume it is now August 1, 1995. Franziska just moved.
    UPDATE employee
    SET street = 'Niederdorfstrasse 2'
    WHERE ename = 'Franziska'
    COMMIT
This update yields the following employee table. Note that although Franziska is at the new address starting on August 1, 1995, since she will not be an employee for the next five months, her new address is recorded from January 1, 1996 onward.
    ename      eno   street               city    birthday    Valid                      Transaction
    Franziska  6542  Rennweg 683          Zurich  1963-07-04  [1995-02-01 - 1995-07-01)  [1995-07-01 - 9999-12-31)
    Franziska  6542  Rennweg 683          Zurich  1963-07-04  [1996-01-01 - 9999-12-31)  [1995-07-01 - 1995-08-01)
    Franziska  6542  Niederdorfstrasse 2  Zurich  1963-07-04  [1996-01-01 - 9999-12-31)  [1995-08-01 - 9999-12-31)
    Lilian     3463  46 Speedway          Tucson  1970-03-09  [1995-02-02 - 9999-12-31)  [1995-07-01 - 9999-12-31)
Since the history of the database is recorded in tables with both valid-time and transaction-time support, we can find out when corrections were made, using a nonsequenced transaction query. Assume it is now September 1, 1995. The query "When was the street corrected, and what were the old and new values?" combines nonsequenced transaction semantics with sequenced valid semantics.
    NONSEQUENCED TRANSACTIONTIME AND VALIDTIME
    SELECT e1.ename, e1.street AS old_street, e2.street AS new_street,
           BEGIN(TRANSACTIONTIME(e2)) AS trans_time
    FROM employee AS e1, employee AS e2
    WHERE e1.eno = e2.eno
      AND TRANSACTIONTIME(e1) MEETS TRANSACTIONTIME(e2)
This yields the following table with valid-time support. The trans_time column specifies when the change was made; the implicit timestamp indicates the valid-time period of the fact that was changed.
    ename      old_street   new_street           trans_time  Valid
    Franziska  Rennweg 683  Niederdorfstrasse 2  1995-08-01  [1996-01-01 - 9999-12-31)
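One way to read this combination of semantics is that transaction time is handled explicitly by the query (the MEETS test) while valid time is handled implicitly: a joined row is valid exactly when both participating rows are. The Python sketch below follows that reading for the employee table as it stands after Franziska's move. It is an illustration under that assumption; the tuple layout and the names `intersect` and `corrections` are hypothetical.

```python
from datetime import date

FOREVER = date(9999, 12, 31)

# (ename, eno, street, valid period, transaction period), closed-open periods.
employee = [
    ("Franziska", 6542, "Rennweg 683",
     (date(1995, 2, 1), date(1995, 7, 1)), (date(1995, 7, 1), FOREVER)),
    ("Franziska", 6542, "Rennweg 683",
     (date(1996, 1, 1), FOREVER), (date(1995, 7, 1), date(1995, 8, 1))),
    ("Franziska", 6542, "Niederdorfstrasse 2",
     (date(1996, 1, 1), FOREVER), (date(1995, 8, 1), FOREVER)),
    ("Lilian", 3463, "46 Speedway",
     (date(1995, 2, 2), FOREVER), (date(1995, 7, 1), FOREVER)),
]


def intersect(p, q):
    """Common part of two closed-open periods, or None if they do not overlap."""
    start, end = max(p[0], q[0]), min(p[1], q[1])
    return (start, end) if start < end else None


def corrections(rows):
    """Old and new street, time of the change, and the valid time it concerns."""
    result = []
    for ename1, eno1, street1, valid1, trans1 in rows:
        for _, eno2, street2, valid2, trans2 in rows:
            valid = intersect(valid1, valid2)   # sequenced in valid time
            meets = trans1[1] == trans2[0]      # nonsequenced in transaction time
            if eno1 == eno2 and meets and valid is not None:
                result.append((ename1, street1, street2, trans2[0], valid))
    return result


found = corrections(employee)
```

Only one pair of rows qualifies, which gives the single result row shown above.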
To extract all the information from the employee table, we can use a sequenced valid/sequenced transaction query.
    VALIDTIME AND TRANSACTIONTIME
    SELECT *
    FROM employee
Modifications take effect at the current transaction time. However, we can still specify the scope of the change in valid time, both before and after now (retroactive and postactive changes, respectively). Assume it is now October 1, 1995. Lilian moved last June 1.
    VALIDTIME PERIOD '[1995-06-01 - 9999-12-31)'
    UPDATE employee
    SET street = '124 Alberca'
    WHERE ename = 'Lilian'
    COMMIT
This update yields the following employee table.
    ename      eno   street               city    birthday    Valid                      Transaction
    Franziska  6542  Rennweg 683          Zurich  1963-07-04  [1995-02-01 - 1995-07-01)  [1995-07-01 - 9999-12-31)
    Franziska  6542  Rennweg 683          Zurich  1963-07-04  [1996-01-01 - 9999-12-31)  [1995-07-01 - 1995-08-01)
    Franziska  6542  Niederdorfstrasse 2  Zurich  1963-07-04  [1996-01-01 - 9999-12-31)  [1995-08-01 - 9999-12-31)
    Lilian     3463  46 Speedway          Tucson  1970-03-09  [1995-02-02 - 9999-12-31)  [1995-07-01 - 1995-10-01)
    Lilian     3463  46 Speedway          Tucson  1970-03-09  [1995-02-02 - 1995-06-01)  [1995-10-01 - 9999-12-31)
    Lilian     3463  124 Alberca          Tucson  1970-03-09  [1995-06-01 - 9999-12-31)  [1995-10-01 - 9999-12-31)
Finally, arbitrarily complex queries in transaction time can be expressed with nonsequenced transaction queries. The query, "When was an employee's address for 1995 corrected?", involves nonsequenced transaction semantics and sequenced valid semantics, with a temporal scope of 1995. Assume that it is November 1, 1995.
    NONSEQUENCED TRANSACTIONTIME AND
    VALIDTIME PERIOD '[1995-01-01 - 1996-01-01)'
    SELECT e1.ename, e1.street AS old_street, e2.street AS new_street,
           BEGIN(TRANSACTIONTIME(e2)) AS trans_time
    FROM employee AS e1, employee AS e2
    WHERE e1.eno = e2.eno
      AND TRANSACTIONTIME(e1) MEETS TRANSACTIONTIME(e2)
      AND e1.street <> e2.street
This evaluates to the following result, which has an explicit column denoting the date the change was made, and an implicit valid time indicating the time in reality in question.
    ename   old_street   new_street   trans_time  Valid
    Lilian  46 Speedway  124 Alberca  1995-10-01  [1995-06-01 - 1996-01-01)
Note that the period from February through May is not included in the valid time, as the street did not change for that period.
As always, the concepts also apply to views, cursors, constraints, and assertions. In Sec. 5.3 we gave an example of a sequenced constraint (VALIDTIME CHECK (amount > 1000 AND amount < 12000)) on the salary table. This constraint must hold independently on every (valid-time) state of the table. In Sec. 5.4 we gave a series of valid-time constraints on the ename column of the employee table. Those alternatives apply orthogonally to the transaction time. As an example, the assertion, "An entry in the security table can never be updated. It can only be deleted, and a new entry, with another key value, inserted.", can be expressed with a nonsequenced transaction semantics, stating in effect that the key value is unique over all transaction time.
    CREATE TABLE security (
        keyvalue NUMERIC(8) NONSEQUENCED TRANSACTIONTIME UNIQUE,
)
7
Comparison with the UK Proposal
We end by comparing the above constructs, termed the US proposal, with the UK proposal [18], which has been incorporated into Part 7, SQL/Temporal [8], by applying them to the simple case study introduced in Secs. 3 and 6. This comparison will revisit and exemplify many of the salient points made earlier. These examples illustrate that SQL/Temporal could be extended in a minimal fashion along the lines discussed in this paper to provide much better support for temporal applications.
1. An employee table has five columns, ename, eno, street, city, and birthdate. The related salary table has two columns, eno and amount. Column salary.eno is a foreign key referencing the column employee.eno.
SQL without time:
    CREATE TABLE employee (ename VARCHAR(12), eno INTEGER PRIMARY KEY,
        street VARCHAR(22), city VARCHAR(10), birthday DATE)
    CREATE TABLE salary (eno INTEGER PRIMARY KEY REFERENCES employee,
        amount INTEGER)
US proposal with time (discussed in this paper):
    CREATE TABLE employee (ename VARCHAR(12), eno INTEGER VALIDTIME PRIMARY KEY,
        street VARCHAR(22), city VARCHAR(10), birthday DATE)
        AS VALIDTIME PERIOD(DATE)
    CREATE TABLE salary (eno INTEGER VALIDTIME PRIMARY KEY
        VALIDTIME REFERENCES employee, amount INTEGER)
        AS VALIDTIME PERIOD(DATE)
"AS VALIDTIME PERIOD(DATE)" specifies that an unnamed column, maintained by the DBMS, will contain the row's timestamp. "VALIDTIME" specifies that the integrity constraints (primary key, referential integrity) are to apply at each instant (in this case, each day).
UK proposal with time:
    CREATE TABLE employee (ename VARCHAR(12), eno INTEGER,
        street VARCHAR(22), city VARCHAR(10), birthday DATE,
        When PERIOD(DATE))
    CREATE TABLE salary (eno INTEGER, amount INTEGER, When PERIOD(DATE))
The UK proposal does not have support for referential integrity for such tables, nor for primary key constraints (adding When to the primary key does not work). Additional syntax is needed. Currently the only way to do this is with complex ASSERTIONs, left as an exercise for the reader. 2. "List the history of those employees who have or had no salary."
SQL without time:
    SELECT ename FROM employee WHERE eno NOT IN (SELECT eno FROM salary)

US proposal:
    VALIDTIME SELECT ename FROM employee WHERE eno NOT IN (SELECT eno FROM salary)
To get the history of any query using the US proposal, simply prepend VALIDTIME. The change proposal and public-domain prototype demonstrate that the semantics may be implemented via a period-based algebra. The large body of performance-related research in temporal databases is applicable to implementing this semantics.
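To make the idea of a period-based evaluation concrete, the following is a minimal Python sketch of the sequenced NOT IN for this query. It is an illustration only, not the algebra of the change proposal or of the prototype: the function names, the tuple layouts, and the sample rows are all hypothetical, and periods are assumed to be half-open pairs (start, end) of dates.

```python
from datetime import date

def subtract(period, covers):
    """Return the parts of the half-open `period` not covered by any period in `covers`."""
    start, end = period
    pieces = []
    for c_start, c_end in sorted(covers):
        if c_end <= start or c_start >= end:
            continue  # this cover does not touch what remains of the period
        if c_start > start:
            pieces.append((start, c_start))  # uncovered gap before this cover
        start = max(start, c_end)
        if start >= end:
            break
    if start < end:
        pieces.append((start, end))  # uncovered tail
    return pieces

def history_without_salary(employee, salary):
    """employee rows: (ename, eno, period); salary rows: (eno, period). Hypothetical layout."""
    result = []
    for ename, eno, period in employee:
        covers = [p for s_eno, p in salary if s_eno == eno]
        for piece in subtract(period, covers):
            result.append((ename, piece))
    return result

# Hypothetical sample data: one employee, paid only for part of her employment.
employee = [("Therese", 1, (date(1994, 1, 1), date(1996, 1, 1)))]
salary = [(1, (date(1994, 6, 1), date(1995, 1, 1)))]
rows = history_without_salary(employee, salary)
```

The work done here depends on the number of periods involved, not on how many days each period spans, and duplicate employee rows would each produce their own result rows.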
UK proposal:
    WITH E1 AS (SELECT eno, ename, EXPAND(When) AS EW FROM employee)
    WITH S1 AS (SELECT eno, EXPAND(When) AS EW FROM salary)
    SELECT ename, PERIOD [When, When] AS When
    FROM E1, TABLE(E1.EW) AS E2(When)
    WHERE eno NOT IN (SELECT S1.eno FROM S1, TABLE(S1.EW) AS S2(When) WHERE S2.When = E2.When)
    NORMALIZE ON When
The semantics of EXPAND is to duplicate each row of the argument table for each granule (day) in the When period. Once this table has been expanded, perform the NOT IN individually, for each day (examining only those salary rows valid on the day in question), then NORMALIZE the When column back to a period (collecting contiguous days into a single period). If each row is valid on average for one year, then the result of the equijoin of E1 and E2 will have 360 times the number of rows of employee, with a dramatic decrease in performance. Changing the granularity to second generates additional tuples on the order of a factor $10^5$, which could seriously affect performance. The approach of using EXPANDING does not work here, because the aggregate should be evaluated between the EXPAND and the NORMALIZE.

The UK committee has provided a construct, EXCEPT EXPANDING, which can also be used to express this particular special case. The user can take the original SQL query, above, and map it into the relational algebra, with NOT IN being mapped to relation difference.

    $\pi_{ename}(employee \bowtie (\pi_{eno}(employee) - \pi_{eno}(salary)))$

Then the user can map this back into SQL.

    SELECT ename FROM employee WHERE eno IN (SELECT eno FROM employee EXCEPT SELECT eno FROM salary)

As a third step, the user can map this into a temporal query using EXPAND and NORMALIZE.

    WITH E1 AS (SELECT ename, eno, EXPAND(When) AS EW FROM employee),
         E2 AS (SELECT eno, EXPAND(When) AS EW FROM (SELECT eno FROM employee EXCEPT EXPANDING(When) SELECT eno FROM salary) AS E3)
    SELECT ename, PERIOD [E3.When, E3.When] AS When
    FROM E1, TABLE(E1.EW) AS E3(When)
    WHERE E1.eno IN (SELECT eno FROM E2, TABLE(E2.EW) AS E4(When) WHERE E1.eno = E2.eno AND E3.When = E4.When)
    NORMALIZE ON When
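The expand-then-normalize evaluation described above can be sketched in a few lines of Python. This is a hedged illustration of the semantics as stated in the text (one copy per day, then contiguous days collected into periods), not the UK proposal's definition; the function names are hypothetical and periods are assumed to be closed pairs of dates at day granularity.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def expand(start, end):
    """Return every day in the closed period [start, end], one element per granule."""
    days = []
    day = start
    while day <= end:
        days.append(day)
        day += ONE_DAY
    return days

def normalize(days):
    """Collect days into maximal closed periods of contiguous days."""
    periods = []
    for day in sorted(set(days)):  # working on a set: repeated days collapse
        if periods and periods[-1][1] + ONE_DAY == day:
            periods[-1] = (periods[-1][0], day)  # extend the current period
        else:
            periods.append((day, day))  # start a new period
    return periods

year = expand(date(1995, 1, 1), date(1995, 12, 31))
days = (expand(date(1995, 1, 1), date(1995, 1, 3))
        + expand(date(1995, 1, 4), date(1995, 1, 5))
        + expand(date(1995, 1, 10), date(1995, 1, 10)))
periods = normalize(days)
```

A row valid for all of 1995 expands into 365 day-stamped copies, which is where the growth in intermediate results comes from. Because this sketch of normalization works on a set of days, repeating the same days in the input does not change its output.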
This trick of using EXCEPT can also be applied with the US proposal, but omitting the complex third step and the EXPANDs and NORMALIZEs entirely.

    VALIDTIME SELECT ename FROM employee WHERE eno IN (SELECT eno FROM employee EXCEPT SELECT eno FROM salary)

All of the UK alternatives have the problem (not shared by the US alternatives) that if the left-hand table has duplicates, then NORMALIZE will automatically remove them, yielding an incorrect result (the original SQL query did not specify DISTINCT). It is an exercise to the reader to show how this English query can be correctly expressed using an explicit When column. It is possible to do so, but it is exceedingly difficult. There have been essentially no results published on how to optimize queries with expansion or normalize operations. Also, no general procedure has been provided for converting an arbitrary, non-temporal query into its temporal analogue using the UK constructs. Finally, while EXCEPT EXPANDING has been provided, no other expanding variants have been defined for other relational operators. In contrast, in the US proposal a sequenced variant of any query can be specified by prepending the VALIDTIME reserved word.

3. "Give the history of the number of highly-paid employees in each city."
SQL without time:
    SELECT city, COUNT(*) FROM employee, salary WHERE employee.eno = salary.eno AND amount > 5000 GROUP BY city

US proposal:
    VALIDTIME SELECT city, COUNT(*) FROM employee, salary WHERE employee.eno = salary.eno AND amount > 5000 GROUP BY city

The VALIDTIME specifies that we are interested in the time-varying count. The syntax is declarative. The semantics is specified on a row-by-row basis; changing the granularity from day to second will not impact its performance.
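One way to see why a period-based evaluation of the time-varying count need not depend on the granularity is the following Python sketch, which sweeps over period endpoints for a single group (one city). It is an assumption-laden illustration, not the evaluation strategy of the change proposal: the function name is hypothetical, and periods are half-open pairs (start, end) of integers standing for granule numbers.

```python
def count_history(periods):
    """Return [(start, end, count)] for the maximal half-open intervals with a constant, nonzero count."""
    deltas = {}
    for start, end in periods:
        deltas[start] = deltas.get(start, 0) + 1  # a row becomes valid
        deltas[end] = deltas.get(end, 0) - 1      # a row stops being valid
    result = []
    count = 0
    prev = None
    for point in sorted(deltas):
        if deltas[point] == 0:
            continue  # one row ends exactly where another begins: count unchanged
        if prev is not None and count > 0:
            result.append((prev, point, count))
        count += deltas[point]
        prev = point
    return result

# Hypothetical periods for the joined rows of one city, in days.
in_days = count_history([(0, 10), (5, 20)])

# The same periods expressed in seconds: the endpoints grow, the number of steps does not.
in_seconds = count_history([(0 * 86400, 10 * 86400), (5 * 86400, 20 * 86400)])
```

The loop visits each distinct endpoint once, so both calls do the same amount of work.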
UK proposal:
    WITH E1 AS (SELECT eno, city, EXPAND(When) AS EW FROM employee),
         S1 AS (SELECT eno, EXPAND(When) AS EW FROM salary WHERE amount > 5000)
    SELECT city, COUNT(*), PERIOD [E2.When, E2.When] AS When
    FROM E1, TABLE(E1.EW) AS E2(When), S1, TABLE(S1.EW) AS S2(When)
    WHERE E2.When = S2.When AND E1.eno = S1.eno
    GROUP BY city, When
    NORMALIZE ON When
The syntax is procedural: first expand, then execute the select, then normalize. The EXPAND operator generates a SET of DAYs, which is then used to duplicate the rows of employee, one for each day each row is valid (the join in the from clause). The GROUP BY ensures that the COUNT is performed separately for each day. The NORMALIZE converts the many rows, one for each day, into periods.

4. "Give Therese a salary of $6,000 for 1994."

SQL without time:
    UPDATE salary SET amount = 6000 WHERE eno IN (SELECT eno FROM employee WHERE ename = 'Therese')

US proposal:
    VALIDTIME PERIOD '1994-01-01 - 1994-12-31'
    UPDATE salary SET amount = 6000 WHERE eno IN (SELECT eno FROM employee WHERE ename = 'Therese')
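The effect of such an update over a period of applicability can be pictured as splitting each affected row into at most three pieces. The Python sketch below is a hypothetical illustration under stated assumptions, not the semantics given in the change proposal: rows are tuples (eno, amount, start, end) with half-open periods, so the year 1994 is written as [1994-01-01, 1995-01-01), and the employee number is assumed to have been looked up already.

```python
from datetime import date

def sequenced_update(rows, eno, new_amount, p_start, p_end):
    """Set `amount` for `eno` during [p_start, p_end), keeping the old value outside it."""
    out = []
    for r_eno, amount, start, end in rows:
        if r_eno != eno or end <= p_start or start >= p_end:
            out.append((r_eno, amount, start, end))  # untouched row
            continue
        if start < p_start:
            out.append((r_eno, amount, start, p_start))  # part before the period
        out.append((r_eno, new_amount, max(start, p_start), min(end, p_end)))
        if end > p_end:
            out.append((r_eno, amount, p_end, end))  # part after the period
    return out

# Hypothetical salary row spanning mid-1993 to mid-1995.
salary = [(1, 5000, date(1993, 6, 1), date(1995, 6, 1))]
updated = sequenced_update(salary, 1, 6000, date(1994, 1, 1), date(1995, 1, 1))
```

Rows that do not overlap the period are left alone, and no row is created for days on which the employee had no salary row to begin with.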
UK proposal: The UK proposal has no support for this operation. Instead, each row must be examined to determine the overlap with 1994, and adjusted with an UPDATE and two INSERT statements. This is left as an exercise for the reader.

5. To know when rows are inserted and (logically) deleted, we add transaction-time support.
US proposal: ALTER TABLE employee ADD TRANSACTIONTIME ALTER TABLE salary ADD TRANSACTIONTIME
Since transaction time is automatically managed by the DBMS, system integrity is ensured. Due to temporal upward compatibility, the integrity constraints work as before, as do updates, such as the one above.
UK proposal: ALTER TABLE employee ADD COLUMN InsertTime TIMESTAMP(3) DEFAULT CURRENT_TIMESTAMP
ALTER TABLE employee ADD COLUMN DeleteTime TIMESTAMP(3) DEFAULT NULL
ALTER TABLE salary ADD COLUMN InsertTime TIMESTAMP(3) DEFAULT CURRENT_TIMESTAMP
ALTER TABLE salary ADD COLUMN DeleteTime TIMESTAMP(3) DEFAULT NULL
There is no support for transaction time in the UK proposal. There is no way to ensure that the application correctly manages the information in these two columns. System integrity can easily be compromised. Adding these two columns also breaks the primary key and referential integrity constraints. Such constraints must be reformulated as complex assertions that take the three time columns into account. Updates are more complicated when these additional columns are present. 6. "How many highly-paid employees are in each city?"
SQL without time: SELECT city, COUNT(*) FROM employee, salary WHERE employee.eno = salary.eno AND amount > 5000 GROUP BY city
US proposal: SELECT city, COUNT(*) FROM employee, salary WHERE employee.eno = salary.eno AND amount > 5000 GROUP BY city
This still works, because the default is to take the currently valid data that has not been deleted or updated (temporally upward compatible in both valid and transaction time).
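What this default amounts to can be sketched as a filter over rows that carry both kinds of timestamp. The Python below is only an illustration: the dictionary keys `valid` and `deleted_at` are hypothetical stand-ins for timestamps that the US proposal keeps implicit, and valid periods are assumed to be half-open pairs of dates.

```python
from datetime import date

def current_rows(rows, today):
    """Keep rows that are still stored (not logically deleted) and valid on `today`."""
    return [
        r for r in rows
        if r["deleted_at"] is None and r["valid"][0] <= today < r["valid"][1]
    ]

# Hypothetical employee rows.
rows = [
    {"eno": 1, "city": "Tucson", "valid": (date(1994, 1, 1), date(1999, 1, 1)), "deleted_at": None},
    {"eno": 2, "city": "Tucson", "valid": (date(1994, 1, 1), date(1995, 1, 1)), "deleted_at": None},
    {"eno": 3, "city": "Tucson", "valid": (date(1994, 1, 1), date(1999, 1, 1)), "deleted_at": date(1996, 5, 1)},
]
snapshot = current_rows(rows, date(1997, 6, 1))
```

An unmodified snapshot query then runs against this filtered view of each table.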
UK proposal:
    WITH E1 AS (SELECT eno, city FROM employee WHERE DeleteTime IS NULL AND CURRENT_DATE OVERLAPS When),
         S1 AS (SELECT eno FROM salary WHERE DeleteTime IS NULL AND CURRENT_DATE OVERLAPS When AND amount > 5000)
    SELECT city, COUNT(*) FROM E1, S1 WHERE E1.eno = S1.eno GROUP BY city
Since temporal upward compatibility is not satisfied by the UK proposal, the user must explicitly select the current information.
To get the history of the number of highly-paid employees in each city, some changes are required.
US proposal: VALIDTIME SELECT city, COUNT(*) FROM employee, salary WHERE employee.eno = salary.eno AND amount > 5000 GROUP BY city
We retain temporal upward compatibility in transaction time (i.e., the data that has not been deleted or updated), but specify sequenced valid semantics to get the history, via VALIDTIME.
UK proposal:
    WITH E1 AS (SELECT eno, city, EXPAND(When) AS EW FROM employee WHERE DeleteTime IS NULL),
         S1 AS (SELECT eno, EXPAND(When) AS EW FROM salary WHERE DeleteTime IS NULL AND amount > 5000)
    SELECT city, COUNT(*), PERIOD [E2.When, E2.When] AS When
    FROM E1, TABLE(E1.EW) AS E2(When), S1, TABLE(S1.EW) AS S2(When)
    WHERE E1.eno = S1.eno AND E2.When = S2.When
    GROUP BY city, When
    NORMALIZE ON When

The user must explicitly select the currently stored information in transaction time ("WHERE DeleteTime IS NULL") and must EXPAND and NORMALIZE to compute the aggregate.

7. "When did we think that there were many (> 25) highly-paid employees in Tucson?"

US proposal:
    TRANSACTIONTIME SELECT COUNT(*) FROM employee, salary WHERE employee.eno = salary.eno AND amount > 5000 AND city = 'Tucson' GROUP BY city HAVING COUNT(*) > 25

TRANSACTIONTIME specifies that we wish to look over past states of the table. VALIDTIME is not specified, as we want to know only about current employees. The execution is on a row-by-row basis, and is independent of both the valid time and transaction time granularities.
UK proposal:
    WITH E1 AS (SELECT eno, EXPAND(WhenP) AS EW FROM (SELECT eno, PERIOD(InsertTime, DeleteTime) AS WhenP FROM employee WHERE CURRENT_TIMESTAMP OVERLAPS When AND city = 'Tucson') AS ET),
         S1 AS (SELECT eno, EXPAND(WhenP) AS EW FROM (SELECT eno, PERIOD(InsertTime, DeleteTime) AS WhenP FROM salary WHERE CURRENT_DATE OVERLAPS When AND amount > 5000) AS ET)
    SELECT COUNT(*), PERIOD [E2.When, E2.When] AS When
    FROM E1, TABLE(E1.EW) AS E2(When), S1, TABLE(S1.EW) AS S2(When)
    WHERE E1.eno = S1.eno AND E2.When = S2.When
    GROUP BY When
    HAVING COUNT(*) > 25
    NORMALIZE ON When

The transaction time granularity is generally no coarser than a millisecond. Compared with the US proposal, this query will expand into $3 \cdot 10^{10}$ times the number of rows in the employee table. The salary table will be similarly exploded, then a join on the two tables taken. It is not clear how to optimize this query, as the result could change at any millisecond: the aggregate must be computed for each millisecond. It is doubtful that the UK query can even be computed with currently known query optimization/evaluation technology.
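The expansion factors quoted in this section follow from simple arithmetic on the number of granules in one year. The short Python check below assumes, as the earlier discussion does, that each row is valid for about one year, and takes a year to be 365 days; the variable names are illustrative only.

```python
DAYS_PER_YEAR = 365
SECONDS_PER_DAY = 24 * 60 * 60

# Number of copies EXPAND would make of a row that is valid for one year.
granules_per_year = {
    "day": DAYS_PER_YEAR,
    "second": DAYS_PER_YEAR * SECONDS_PER_DAY,
    "millisecond": DAYS_PER_YEAR * SECONDS_PER_DAY * 1000,
}

# Growth when moving from day to second granularity.
day_to_second_factor = granules_per_year["second"] // granules_per_year["day"]
```

Moving from days to seconds multiplies the expanded table by 86,400, which is on the order of $10^5$, and a one-year period at millisecond granularity expands into 31,536,000,000 copies, roughly $3 \cdot 10^{10}$.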
8 Summary
In this paper, we first outlined several desirable features of SQL/Temporal relative to SQL3: upward compatibility, temporal upward compatibility, and sequenced semantics. A series of four levels of increasing functionality was elaborated. The specific syntactic additions were outlined and examples given to illustrate these constructs. The extensions involve (a) the use of the VALIDTIME and TRANSACTIONTIME reserved words, to indicate valid-time, resp. transaction-time, support (in the case of schema specification statements) and sequenced semantics (in the case of queries, modifications, views, cursors, assertions and constraints), (b) the use of the NONSEQUENCED reserved word for nonsequenced semantics, and (c) the use of a period expression to temporally scope sequenced and nonsequenced queries, modifications, views, cursors, constraints, and assertions.

In the change proposals now before the SQL3 committees [14, 15], we provide a formal semantics, in terms of the formal semantics of SQL3, that satisfies the sequenced semantics correspondence between temporal queries and snapshot queries, and also provide the semantics for nonsequenced queries. In those change proposals we also list alternative implementation approaches which vary in the degree of implementation difficulty and the achievable performance. The implementation alternatives all compute the result by manipulating periods, and thus their performance is independent of the granularity of the underlying tables.

We also introduced tables with transaction-time support, sequenced transaction semantics, nonsequenced transaction semantics, scoping on transaction time via an optional period expression, and modification semantics. The specific syntactic additions were outlined and examples given to illustrate these constructs.

We end by listing some of the advantages of the approach espoused here.

- Upward compatibility is assured, permitting existing constructs to operate exactly as before.
- Only three new reserved words, NONSEQUENCED, VALIDTIME, and TRANSACTIONTIME, are required.
- Satisfaction of temporal upward compatibility ensures that existing applications do not break when tables without temporal support have such support added.
- The availability of sequenced semantics ensures that temporal queries, modifications, views, assertions, and constraints are easy to formalize, write and implement.
- Nonsequenced semantics permits tables with temporal support to be converted to tables without such support, with explicit timestamp columns, and for temporal support to be added to tables, even within a query.
- A simple period expression permits the temporal scope to be specified.
- The transaction-time extensions are compatible with, and orthogonal to, those for valid time.
- A public-domain prototype [16] demonstrates the practical viability of the proposed constructs. The quick tour was validated on this prototype.
We note that none of these benefits accrue from the UK proposal.

Acknowledgments

The inspiration for the constructs described here and proposed for incorporation into SQL/Temporal is the TSQL2 language. The participation of the TSQL2 Language Design Committee, which included Ilsoo Ahn, Gad Ariav, Don S. Batory, James Clifford, Curtis E. Dyreson, Ramez Elmasri, Fabio Grandi, Wolfgang Käfer, Nick Kline, Krishna Kulkarni, T.Y. Cliff Leung, Nikos Lorentzos, John F. Roddick, Arie Segev, Michael D. Soo and Suryanarayana M. Sripada, was critical. David Toman provided helpful comments on a previous draft. We also appreciate the extensive feedback from the ANSI and ISO SQL3 committees, which helped shape the specifics of this proposal.
This research was supported in part by the National Science Foundation through grants ISI-9202244 and IRI-9632569, by grants from IBM, the AT&T Foundation, and DuPont, by the Danish Technical and Natural Science Research Councils through grants 9700780 and 9400911, respectively, and by the CHOROCHRONOS project, funded by the European Commission DG XII Science, Research and Development, as a Networks Activity of the Training and Mobility of Researchers Programme, contract no. FMRX-CT96-0056.
References

1. Bair, J., M. Böhlen, C.S. Jensen, and R.T. Snodgrass, "Notions of Upward Compatibility of Temporal Query Languages," Business Informatics (in German, Wirtschaftsinformatik) 39(1):25-34, February 1997.
2. Böhlen, M. H., C. S. Jensen and R. T. Snodgrass. "Evaluating the Completeness of TSQL2," in Proceedings of the VLDB International Workshop on Temporal Databases. Ed. J. Clifford and A. Tuzhilin. Springer Verlag, September 1995, pp. 153-172.
3. Böhlen, M. H. and C. S. Jensen. Seamless Integration of Time into SQL. Technical Report R-962049, Aalborg University, Department of Computer Science, Denmark, December 1996.
4. Gadia, S. K. "A Homogeneous Relational Model and Query Languages for Temporal Databases." ACM Transactions on Database Systems 13(4):418-448, December 1988.
5. Jackson, M. A. System Development. Prentice-Hall International Series in Computer Science. Prentice-Hall International, Inc., 1983.
6. Jensen, C. S. and R. Snodgrass, "Temporal Specialization and Generalization." IEEE Transactions on Knowledge and Data Engineering 6(6):954-974, December 1994.
7. Jensen, C. S., J. Clifford, R. Elmasri, S. K. Gadia, P. Hayes and S. Jajodia (eds). "A Glossary of Temporal Database Concepts." ACM SIGMOD Record 23(1):52-64, March 1994.
8. Melton, J. (ed.) SQL/Temporal. July, 1997. (ISO/IEC JTC 1/SC 21/WG 3 DBLLGW-013.)
9. Pissinou, N., R. T. Snodgrass, R. Elmasri, I. S. Mumick, M. T. Özsu, B. Pernici, A. Segev, and B. Theodoulidis, "Towards an Infrastructure for Temporal Databases: Report of an Invitational ARPA/NSF Workshop," SIGMOD Record 23(1):35-51, March, 1994.
10. Snodgrass, R.T., I. Ahn, G. Ariav, D.S. Batory, J. Clifford, C.E. Dyreson, R. Elmasri, F. Grandi, C.S. Jensen, W. Käfer, N. Kline, K. Kulkarni, T.Y.C. Leung, N. Lorentzos, J.F. Roddick, A. Segev, M.D. Soo, and S.M. Sripada. "TSQL2 Language Specification," ACM SIGMOD Record 23(1):65-86, March, 1994.
11. Snodgrass, R. T. and H. Kucera. Rationale for Temporal Support in SQL3. 1994. (ISO/IEC JTC1/SC21/WG3 DBL SOU-177, SQL/MM SOU-02.)
12. Snodgrass, R. T., K. Kulkarni, H. Kucera and N. Mattos. Proposal for a new SQL Part--Temporal. 1994. (ISO/IEC JTC1/SC21/WG3 DBL RIO-75, X3H2-94-481.)
13. Snodgrass, R. T. (editor), Ilsoo Ahn, Gad Ariav, Don Batory, James Clifford, Curtis E. Dyreson, Ramez Elmasri, Fabio Grandi, Christian S. Jensen, Wolfgang Käfer, Nick Kline, Krishna Kulkarni, T. Y. Cliff Leung, Nikos Lorentzos, John F. Roddick, Arie Segev, Michael D. Soo and Suryanarayana M. Sripada. The Temporal Query Language TSQL2. Kluwer Academic Pub., 1995.
14. Snodgrass, R. T., M. H. Böhlen, C. S. Jensen and A. Steiner. Adding Valid Time to SQL/Temporal, change proposal, ANSI X3H2-96-501r2, ISO/IEC JTC 1/SC 21/WG 3 DBL-MAD-146r2, November 1996, 77 pages. At URL: (version current November 21, 1996).
15. Snodgrass, R. T., M. H. Böhlen, C. S. Jensen and A. Steiner. Adding Transaction Time to SQL/Temporal, change proposal, ANSI X3H2-96-502r2, ISO/IEC JTC1/SC21/WG3 DBL MAD-147r2, November 1996, 47 pages. At URL: (version current November 21, 1996).
16. Steiner, A. and M. H. Böhlen. The TimeDB Temporal Database Prototype, Version 1.07, November 1996. At URL: or at URL: timecenter/TimeDB.tar.gz> (version current March 26, 1997).
17. Tsotras, V. J. and A. Kumar. "Temporal Database Bibliography Update," ACM SIGMOD Record 25(1):41-51, March, 1996.
18. UK SQL Committee, Expanded Table Operations. 1996. (ISO/IEC JTC1/SC21/WG3 DBL MCI-67)
19. Yourdon, E. Managing the System Life Cycle. Yourdon Press, 1982.
Valid Time and Transaction Time Proposals: Language Design Aspects Hugh Darwen IBM United Kingdom Limited PO Box 31 Warwick CV34 5JL England [email protected]
Abstract. Several proposals (such as [5, 6]) have been presented to ISO for consideration as temporal extensions to SQL. These proposals are based on approaches to valid time and transaction time, wherein the regular syntax for invocation of SQL operators on tables is interpreted in a special manner, based on the existence of "hidden" timestamps. Concerns about such an approach, with reference to these specific proposals, have been expressed ([8]). Those concerns have been responded to ([4]). We further clarify our concerns and align them with stated language design principles that we believe to be generally agreed upon. We respond to [4] in the context of these principles. Although this discussion is taking place in the specific context of proposals for international standardization, we believe its import is of wider concern than that, meriting the attention of the temporal database community in general. Furthermore, [5] and [6] appear to be the most precise specifications to date to be based on the approach in question; for that reason, we invite people who are not interested in standardization to examine the wider issues that might be emerging from discussions based on those proposals.

1 Introduction

We start by proposing a list of nine principles of good language design. Then we comment on the approach taken in [5] and [6] in the light of these principles. Our comments take the form of a list of deviations from these principles, observed in [5] and [6].

2 Language Design Principles

The principles that follow are offered without reference to any definitive text. The author believes that they are so well established that this attempt to express them in his own words should not cause any surprise or offence. In particular, the principles listed here were written without reference to [1], except for the use of the term parsimony, of whose prior use in [1] the present author was aware.
O. Etzion, S. Jajodia, S. Sripada (Eds.): Temporal Databases - Research and Practice, LNCS 1399, pp. 195-210, 1998. © Springer-Verlag Berlin Heidelberg 1998
1. Precise Specification First and foremost, every construct should be precisely specified, so that users can
accurately predict the effect that will be obtained from any particular use of that construct.
2. Encouragement of Good Practice The distinction between what is good practice and what is bad practice is sometimes disputable. A good language, therefore, should not seek to implement restrictions that would contravene Generality (see Principle Number 3) by effectively legislating against practice that might be perceived to be bad. However, the choice of non-primitive operators to be included in language l favours those which embrace and promote l's own concepts against those which might be perceived as violating l's Conceptual Integrity (Principle Number 9). 3. Generality To avoid undue proliferation of languages, a language seeks generality, to be applicable in a wide variety of situations. Generality is often achieved by being based on well known concepts that have been shown to offer completeness in some useful sense. Non-modal two-valued logic, arithmetic and the Turing machine are among such concepts.
4. Semantic Consistency The meaning of an expression is independent of the context in which that expression appears. A consequence of adherence to semantic consistency is that if some common expression is required to be used in more than one place, it can be written just once, assigned to some name, and subsequently referenced by use of that name.
5. Syntactic Consistency A counterpart, perhaps, of semantic consistency is syntactic consistency, whereby one meaning is always expressed the same way. For example, with reference in particular to SQL's facilities for associating names with expressions, it is not consistent to require name AS expression in one context and expression AS name in another; nor is it syntactically consistent to successfully terminate an inner transaction with RELEASE SAVEPOINT while the outermost transaction must be successfully terminated by COMMIT.
6. Orthogonality Where concepts are perceived and generally agreed to fit together cleanly without "interfering" with each other, the language is designed to honour that and does not include any constructs that contravene it. For example, as noted in connection with Principle Number 7, many languages embrace simultaneously the concepts of
type, value, variable and operator. Where they do so orthogonally, the following agreeable statements will hold true. A variable of any type can be defined. Any value in a type can be assigned to a variable of that type. An invocation of an operator that results in a value of type t is permitted anywhere where a literal of type t is permitted, including in particular as an argument to some operator invocation. Operators on variables, such as assignment, are available for all variables, regardless of their type.

7. Parsimony Carefully chosen agreed concepts should be small in number. They should also be clearly distinct from one another. For example, the four concepts of type, value, variable and operator might be thought to provide a sufficient basis in many languages. Some languages dispense with variable (functional programming languages). Some, it might be argued, dispense with type (e.g., Rexx, Smalltalk).
8. Syntactic Substitution A language definition should start with a few judiciously chosen primitive operators, embracing the few chosen concepts. Subsequent development is, where possible, by defining new operators in terms of expressions using previously defined operators. Most importantly, syntactic substitution does not refer to an imprecise principle such as might be expressed as "A is something like, possibly very like B", where A is some proposed new syntax and B is some expression using previously defined operators. If A is close in meaning to B but cannot be specified by true syntactic substitution, then we have a situation that is disagreeable and probably unacceptable, in stark contrast to true syntactic substitution, which can be very agreeable and acceptable indeed.
9. Conceptual Integrity We would like to think of Conceptual Integrity as the sum of all of the foregoing, in the sense that adherence to our first eight principles will ensure adherence to this, the ninth. A few distinct concepts having been carefully chosen and agreed upon, they must be rigidly adhered to, without any deviation whatsoever, in the entire design of the language. No compromise is acceptable. For example a database language, having chosen the concepts of the theory known as The Relational Model of Data, adheres rigidly to the concept that all information is represented by attribute values in tuples in relations. In particular, it is not acceptable to sacrifice conceptual integrity in pursuit of simple syntax.

For the record, [1] uses the term yardstick rather than principle, and lists the following "desirable properties:
Orthogonality: keep unrelated features unrelated.
Generality: use an operation for many purposes.
Parsimony: delete unneeded operations.
Completeness: can the language describe all objects of interest?
Similarity: make the language as suggestive as possible.
Extensibility: make sure the language can grow.
Openness: let the user 'escape' to use related tools."

We have interpreted Generality somewhat differently, so that it overlaps with [1]'s Generality and Completeness. We have not included Similarity, but there is overlap between this and our Syntactic Consistency. Extensibility is akin, we believe, to our Syntactic Substitution.
3 Deviations Observed in the Proposed Approach

We proceed to discuss various aspects of [5] in the light of the foregoing Language Design Principles. Recall that [4] is the paper in which the authors of [5] (and [6]) respond to concerns about their proposals that are expressed in [8].

1. Precise Specification Section 4.1 of [4] addresses [8]'s concern that the result of a query expression beginning with VALIDTIME is not precisely specified in [5]. For example, if it can be calculated that employee E1 worked on project J1 from June 1st to June 5th, an implementation is permitted to return any number of identical rows expressing that very thing; it is also permitted, for example, to return a row expressing that E1 worked on J1 from June 1st to June 3rd and another row expressing that E1 worked on J1 from June 2nd to June 5th, each of those any number of times, et cetera. It is not disputed, in [4], that the result is not precisely specified, but it is claimed that it is not desirable to specify it precisely. We stand by our position that, on the contrary, precise specification is not only desirable but required. Further, we find that the arguments given in this section of [4] are either based on incorrect assumptions or not germane, as we now argue.

First, [4] appears to be interpreting [8]'s concern as insistence that every result be normalized. By "normalized" we mean what is also often referred to as "coalesced". A set of periods SP is said to be normalized if and only if for all pairs of distinct elements P1 and P2 in SP, P1 neither overlaps nor meets P2. If two rows are identical in all column values except for some period-valued column, it might be possible to "coalesce" them into a single row if the two periods happen to overlap or meet. A table in which all such pairs of rows have been eliminated by iterative coalescing is said to be normalized. Although requiring normalization would be one way of making a precise specification, [8] does not explicitly request that particular specification.
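The definition of a normalized set of periods can be checked mechanically. The Python sketch below is an illustration under stated assumptions rather than anything taken from the proposals: periods are closed pairs of dates at day granularity, the year 1997 is chosen arbitrarily for the June example, and all function names are hypothetical.

```python
from datetime import date, timedelta
from itertools import combinations

ONE_DAY = timedelta(days=1)

def overlaps(p, q):
    """True if the closed periods p and q share at least one day."""
    return p[0] <= q[1] and q[0] <= p[1]

def meets(p, q):
    """True if one period ends on the day before the other begins."""
    return p[1] + ONE_DAY == q[0] or q[1] + ONE_DAY == p[0]

def is_normalized(periods):
    """True if no two distinct periods overlap or meet."""
    distinct = set(periods)
    return not any(overlaps(p, q) or meets(p, q) for p, q in combinations(distinct, 2))

def days_covered(periods):
    """The set of days denoted by a collection of closed periods."""
    return {p[0] + timedelta(days=i) for p in periods for i in range((p[1] - p[0]).days + 1)}

# Two of the permitted answers for "E1 worked on J1 from June 1st to June 5th".
one_row = [(date(1997, 6, 1), date(1997, 6, 5))]
two_rows = [(date(1997, 6, 1), date(1997, 6, 3)), (date(1997, 6, 2), date(1997, 6, 5))]
```

Both answers denote the same five days, but only the first is normalized, which is the sense in which the result is left imprecisely specified.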
Second, [4] states "There is no canonical way to represent a sequence of states, each of which may contain duplicates, with a set of period-stamped rows." (We assume that "multiset" was really intended, here.) We claim that this is not the case. One such canonical form is presented algorithmically in [2], in a section headed NORMALIZE ALL.

Third, whether some canonical form is available is in any case not the point at hand. The concern expressed in [8] is only that the specification is not precise, not that it fails to use some canonical form.

Fourth, [4] claims that if normalization is required it can be explicitly requested. This is not the case. For the result to be normalized on its hidden valid time support, it will first have to be converted to a table without valid time support, so that the timestamps of the valid time support now appear in a regular column, on which normalization can be effected. The result of the normalization is a table without valid time support; if this is now converted to a table with valid time support, the normalization we have taken the trouble to obtain might well be lost.

2. Encouragement of Good Practice
Under [5]'s principle of Temporal Upward Compatibility, one is encouraged to modify existing "snapshot" tables by ALTER TABLE ... ADD VALIDTIME, a result of which is that such tables will subsequently retain deleted rows as a timestamped historical record. Commonly accepted good practice, on the other hand, would suggest certain decompositions. First, attributes that reasonably belong together under Boyce-Codd Normal Form (BCNF) in the snapshot table do not always reasonably belong together in the same table with valid time support added. For example, SALARY and ADDRESS, both being functionally dependent on EMPNO in the snapshot table EMP, reasonably occur together in that table. However, those two dependent attributes are likely to vary independently of each other over time, strongly suggesting that further decomposition is now necessary. Second, normally recommended practice would suggest keeping historical data and current data in separate tables, because the predicate for the historical data is not the same as the predicate for the current data. The predicate for historical data is of the general form "P during the period t1 to t2", while that for current data is "P since t1, up until now" (in each case P stands for some predicate appropriate to a snapshot table). Note that typically the "during" predicate further implies "and not at the instant immediately preceding t1, nor at the instant immediately following t2". The "since" predicate similarly implies "and not at the instant immediately preceding t1", but does not imply "and not at the instant immediately following now". If we fail to separate the historical data from the current data, the predicate for the combination is an awkward disjunction in which t2 and now have
200
H. Darwen somehow to merged into a tingle placeholder, with inevitable loss of possibly important semantics. In 3 such problems do not arise unless users choose to make them for themselves. Consider, for example, the table representing the relationship of employees working on assigned projects. Its temporal version might be Has__Worked On Since(Employee, Project, Since), where the Since column shows the date on which the employee started to work on the project. The historical record of assignments might then be Worked On During(Employee, Project, During), where During is a period-valued column indicating the start and end dates of an assignment. The view that includes both the historical data and the current data can be obtained, if needed, via the following expression: SELECT * FROM Worked On During UNION SELECT S.*, PERIOD(Since,CURRENT_DATE) AS During FROM Has_Worked On Since AS S In contrast, 5 actually makes it quite difficult for users to follow the recommendation illustrated by the foregoing "employees and projects" example. In section 3.6 of 4 it is mentioned, correctly, that 8 "glosses over the problems of ... moving rows from one table to the other ...". We conjecture that 5's proposed extensions to UPDATE and DELETE could be replaced by similar mechanisms involving the several tables of an appropriate decomposition. In any case, we need to be convinced that SQL3 triggers are insufficient for the purpose at hand here, as these can be used to make implicit changes to one table consequential on explicit changes to another. We have one fmal remark to make under the principle of encouragement of good practice. We make it gently, we hope. It is in response to the point made in section 3.8 of 4, following an illustration of how to record current state in a table that also includes history: "If the user instead wished to use a HasWorked On Since table, that is perfectly fine. Valid-time support is entirely optional ...". 
In its general form, this point is frequently advanced by software developers in counterargument to concerns such as those expressed in [8]. This is the counterargument that says that users who don't like the offered solution are welcome not to use it if they prefer some more long-winded solution. Such an argument appeals to what is known in some circles as the Groucho Principle, an analogy with Groucho Marx's joke: "Doctor, doctor, please help me. Every time I hit myself on the head with this hammer, it hurts." "Then don't do that!" (or some such words). We do not deny that the Groucho Principle might occasionally be an appropriate crutch. Usually, though, our response, as here, is:
What about the people who have to teach it? They can't just ignore it. What about the people who have to make it, who have to maintain it from release to release, who have to write the documentation? They can't just ignore it. Why should those who go in for "good" practice be penalized, because of all the effort that their vendors are putting in to support the others?
In other words, we think it would be better all round for the patient not to be in possession of a hammer at all.
Generality
Several observers, including the authors of [8], have remarked that proposals such as those in [5] appear to offer a method of reasoning over intervals that is unnaturally constrained to intervals in time, to exactly one such temporal element per table, and to tables that contain at least one column in addition to the temporal element. The remark is made only in passing here, as it is not disputed that the problem is addressed by the inclusion of explicit operators in SQL/Temporal ([3]), such as NORMALIZE, and support for period types with non-temporal element types, such as PERIOD(INTEGER).
Semantic Consistency
(To be fair, we should mention that the point made in this section has been disputed by one of the authors of [5], though the reason given was not clear to us.)
Consider the example referred to in [4]:
    SELECT Name FROM Employee
    WHERE Name NOT IN ( SELECT Manager FROM Employee )
If Employee is a table with valid time support, under [5], we can also write
    VALIDTIME SELECT Name FROM Employee
    WHERE Name NOT IN ( SELECT Manager FROM Employee )
Now consider the case where, under the principle of semantic consistency, we replace the invocation of the built-in operator NOT IN by an invocation of the user-defined function NOT_A_MGR, defined thus (in syntax defined in SQL/PSM, ISO/IEC 9075-4:1996 and in the current draft of SQL3):
    CREATE FUNCTION NOT_A_MGR ( Name VARCHAR(30) )
    RETURNS BOOLEAN
    RETURN Name NOT IN ( SELECT Manager FROM Employee );
The snapshot version of the "employees who are not managers" query can now be written thus:
    SELECT Name FROM Employee
    WHERE NOT_A_MGR ( Name )
However, if Employee is a table with valid time support, we cannot use
    VALIDTIME SELECT Name FROM Employee
    WHERE NOT_A_MGR ( Name )
as a replacement for
    VALIDTIME SELECT Name FROM Employee
    WHERE Name NOT IN ( SELECT Manager FROM Employee )
because the invocation of the function NOT_A_MGR will always be evaluated against the current state of Employee, whereas the invocation of the built-in operator NOT IN is evaluated under sequenced semantics. The reason why every invocation of NOT_A_MGR is evaluated against the current state is that the query included in the RETURN statement that is the body of that function is an outermost query that does not begin with the word VALIDTIME.
The breach of semantic consistency we have observed here would, we contend, certainly astonish and probably dismay anyone using an implementation of [5].
Even the use of SQL views appears to contravene semantic consistency under [5]'s proposals. Continuing with the current example, one might create a view whose result is a one-column table of names of employees who are managers:
    CREATE VIEW Manager AS
    SELECT Manager FROM Employee
(Please excuse the use of Manager as both a table name and a column name, arising from a desire to be consistent with two things at the same time.) This
allows the following snapshot query to replace our original snapshot query to discover the names of employees who are not managers:
    SELECT Name FROM Employee
    WHERE Name NOT IN Manager
However, we cannot by the same token replace the VALIDTIME version of this with
    VALIDTIME SELECT Name FROM Employee
    WHERE Name NOT IN ( TABLE Manager )
because the view Manager still returns the names of employees who are currently managers. In fact, this example is a syntax error under [5], precisely because the view Manager is not a table with valid time support. If one wanted to use views in the same way, in the VALIDTIME query as well as in the snapshot query, one would have to create a second view, such as:
    CREATE VIEW ManagerVT AS
    VALIDTIME SELECT Manager FROM Employee
and write
    VALIDTIME SELECT Name FROM Employee
    WHERE Name NOT IN ( TABLE ManagerVT )
Similarly unfortunate consequences can be observed in connection with [6]'s proposal to add TRANSACTIONTIME syntax, just like VALIDTIME.
The present author feels that these observations might well give rise to reconsideration of the "Possible Way Forward" suggested in Section 6.0 of [8], as that suggestion would appear to suffer from the same problems. The suggestion in question was to store timestamps in a regular column, sacrificing Temporal Upward Compatibility in its strict interpretation. Then, to express valid-time queries, instead of just writing the word VALIDTIME followed by the desired query, one would write VALIDTIME(CN), where CN specifies the column name to be used for the timestamps in every table, input, intermediate and output, involved in the query.
Syntactic Consistency
The new syntax proposed in [5] and [6] does not suffer from inconsistency within itself, but we do observe an important inconsistency when it is considered in conjunction with the language (SQL) that it is proposing to extend. In SQL, the
syntax for accessing data in a table is the column reference, taking advantage of the table's column names. In [5], for example, new syntax is proposed for the same purpose, namely, VALIDTIME(CN), where CN is a correlation name. Because temporal information is not stored in a regular column there is no column name for it, so VALIDTIME(CN) has to be used instead of the CN.VALIDTIME that would be the regular SQL way of accessing the data if it were stored in a column named VALIDTIME.
It might be counter-argued that the phenomenon we observe here is not really a breach of syntactic consistency, because it would obviously be inappropriate to use the syntax of column reference to access data that is not in a column. In that case we might have to agree, but we would just observe that the new syntax, for something very similar to referencing a column, is a consequence of breaches of certain other Principles (specifically, Number 7, Parsimony and Number 9, Conceptual Integrity).
Orthogonality
Section 3 of [8] shows that the concept of rows and tables with valid time support is not orthogonal to existing SQL concepts. Some of the points made in that section are disputed in [4], but these disputations contain incorrect statements and are thus invalid. For example, it is claimed that the two assignments in the example shown in section 3.3.1 of [8] assign the same value to the variable T. The two expressions assigned are ( SELECT C, I FROM T ) and ( VALIDTIME SELECT C, I FROM VT ). The second of these is an expression that would be legal syntax if [5]'s proposals were adopted into the language. Because it returns a table with valid time support, it cannot possibly return the same value as the first, an expression that is legal in SQL:1992 and in several (probably all) predecessors of that language. All tables in SQL:1992 are tables without valid time support.
Actually, whether the two expressions return the same value or not was not our point in [8], so we will not discuss this matter any further here. Our point was that they are patently expressions of different types. Thus, if the variable T is assigned the first expression, a subsequent VALIDTIME SELECT * FROM TABLE ( T ) should be a syntax error (because [5] proposes a syntax rule to require tables accessed by a VALIDTIME query to be tables with valid time support). This syntax error would be in keeping with SQL's existing concept of strong typing accompanied by static type checking. But it cannot be a syntax error if it is not known until run-time whether the table assigned to T is one with valid time support. (There is nothing in [5] to address this problem, either at compile-time or at run-time.) Similar remarks apply to sections 3.2, 3.3.2 and 3.3.3 of [4].
A further contravention of orthogonality can be observed if one considers some hypothetical additional operator that might be proposed as an extension to SQL's syntax. For the sake of illustration, we consider the clause REPLICATE BY n AS column-name, where n is a non-negative integer. We suppose that the effect of such a clause, appended to an SQL query specification QS, is, for each row r in the result of QS, to generate n rows constructed from r by addition of a column of the specified name, with each integer value in the range 1 to n appearing in that column in exactly one of the generated rows.
The following example shows how this operator could be specified by substitution. Suppose table T contains an integer column N. Consider SELECT * FROM T REPLICATE BY N AS I. This is the same as
    SELECT *
    FROM T,
      ( WITH TABLE_NUM AS
          ( VALUES(1) AS TN(I)
            UNION
            SELECT TN.I+1 AS I
            FROM TABLE_NUM AS TN
            WHERE TN.I < T.N )
        SELECT * FROM TABLE_NUM ) AS TN
The second table in the outer FROM clause here is a recursive query, correlated with the first table, T, delivering a table of degree 1 whose column is of type INTEGER. The value T.N determines the number of rows in TABLE_NUM, whose I values range from 1 to T.N. Thus, each row in T is joined in turn, loosely speaking, with the first T.N ordinal numbers. Rows in T in which T.N is zero or negative are not represented in the result of the join.
However, the given substitution would not work in the case of, for example, VALIDTIME SELECT * FROM T REPLICATE BY N AS I, where T is now a table with valid time support, because the second of the two tables in the FROM clause, in the substitution, is not a table with valid time support as required in such cases under the proposals of [5]. Some additional specification would have to be made to cater for the case where T is a valid-time table, where the given specification of TABLE_NUM is replaced by an expression yielding a valid-time version of TABLE_NUM.
Further additional specifications would have to be made for any other such expression-modifying constructs (e.g. TRANSACTIONTIME) that might be proposed. Consider also that, regardless of how such an extension as REPLICATE BY might be specified, an existing implementation would surely have to address as a special case the possibility of REPLICATE BY being evaluated in a VALIDTIME context.
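The intended snapshot effect of the hypothetical REPLICATE BY clause can be sketched outside SQL. The following is a minimal illustration only, assuming that a table is modeled as a list of Python dictionaries; the function name replicate_by and its parameters are invented for this sketch and belong to no SQL proposal. It models only the snapshot case and makes no attempt to address the VALIDTIME case discussed above.

```python
from typing import Any, Dict, Iterable, Iterator

Row = Dict[str, Any]


def replicate_by(rows: Iterable[Row], n_column: str, new_column: str) -> Iterator[Row]:
    """Hypothetical model of `<query> REPLICATE BY n_column AS new_column`.

    For each input row r, emit r[n_column] copies of r, each extended with
    new_column holding one of the integers 1 .. r[n_column].
    """
    for row in rows:
        # range(1, n + 1) is empty when n <= 0, so rows whose count is zero
        # or negative do not appear in the result, as in the substitution.
        for i in range(1, row[n_column] + 1):
            yield {**row, new_column: i}


# Illustrative snapshot table T with an integer column N.
T = [{"C": "a", "N": 2}, {"C": "b", "N": 0}, {"C": "c", "N": 3}]
result = list(replicate_by(T, "N", "I"))
for r in result:
    print(r)
```

Here the row with N = 0 contributes nothing, and the other two rows contribute two and three rows respectively, mirroring the join with the first T.N ordinal numbers.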
7. Parsimony
Several concepts that would be brand new to SQL are proposed in [5] and [6]:
- Rows and tables with valid time support ([5]).
- Rows and tables with transaction time support ([6]).
- Operators on expressions (i.e., ones that make the semantics of an expression vary according to context).
- Continuing from the previous point, the attempted application of a temporal logic in place of truth-valued logic when an SQL expression is thus operated on.
- Occurrence of a variable at the intersection of a row and a column (to accommodate "the moving point, now"). This is not actually proposed in [5] or [6] but certain support documentation and discussions have indicated that the authors of [5] and [6] would like to add this concept at some future time.
The concept of rows and tables with valid time and/or transaction time support is so close to the existing concept of rows and tables without such support as to be clearly redundant and disagreeable under the principle of parsimony. Various consequences of this new concept are catalogued in [8], Section 3. That catalogue of consequences is unaffected by [4] because [4] does not make any substantive changes to what is proposed in [5] and [6]; nor does [4] show any of the consequences described in [8] to be invalid conclusions or based on incorrect assumptions.
8. Syntactic Substitution
It is claimed in [4] that the specifications in [5] and [6] are in fact just "syntactic sugar" for certain operations that can be expressed using SQL operators already defined. Now, it can be seen at a glance that [4] is not here referring to syntactic substitution, even though we would claim that syntactic substitution is precisely what is usually meant by those members of the language design community (especially those who participate in ISO/IEC JTC1/SC21/WG3 Database Languages) who sometimes use the term "syntactic sugar". One reason why this can be seen at a glance is that the very first example of "syntactic sugar" in [4] is the claim that ALTER TABLE Employee ADD VALIDTIME PERIOD(DAY) is syntactic sugar for ALTER TABLE Employee ADD COLUMN VALIDTIME PERIOD(DAY). It is clear that the first is not a substitute for the second, because the first does not yield a table such that SELECT * FROM Employee has a column called VALIDTIME, while the second
does. A casual reading of [5] will quickly reveal all sorts of other differences in the effects of these two statements. The other "syntactic sugar" claims in [4] are subject to the same remarks. If "syntactic sugar" was being used in the sense of "something like", then the observations are irrelevant, to whatever extent anybody might agree with "something like" in each particular case. If it is being used in the agreeable sense of syntactic substitution, the claims are patently incorrect.
For an example of syntactic substitution, consider the use of the SQL/Temporal EXPAND operator in a query of the general form
    SELECT R.C, XP.D
    FROM R, TABLE ( EXPAND (R.P) ) AS XP(D)
In SQL3 without SQL/Temporal's EXPAND function and without SQL/Foundation's TABLE syntax for "unnesting" nested tables, this could be expressed as
    SELECT DISTINCT R.C, XP.D
    FROM R,
      ( WITH X AS
          ( SELECT BEGIN(R.P) AS D, PERIOD (BEGIN(R.P), LAST(R.P)) AS P
            FROM VALUES(R.P) AS R(P)
            WHERE DURATION(R.P) > 0
            UNION
            SELECT NEXT(D) AS D, PERIOD (BEGIN(P), LAST(P)) AS P
            FROM X
            WHERE DURATION(P) > 0 )
        SELECT D FROM X ) AS XP
9. Conceptual Integrity
The proposed new concept, rows and tables with valid time and/or transaction time support, is so close to the existing concept of rows and tables without such support that its differences from the existing concept would be perceived by many as a contravention of conceptual integrity. Requirements such as Temporal Upward Compatibility (disputed in [8]) are not strong enough to warrant such a deviation. Temporal Upward Compatibility refers to the property of applications whereby they continue to work exactly as before after the addition of temporal information to the database; in particular, this includes the retention of historical records in a database that previously did not retain historical records. We dispute
the requirement, or at least the manner in which it is being pursued, by observing that SQL database administrators already have ways and means of protecting applications from addition of columns to existing tables. Further, if historical records are placed in new tables, then existing applications are obviously unaffected.
It is claimed in [4] that the concept of special kinds of table is not at all new to SQL, and various categories mentioned in the current SQL3 specification are cited:
- "base table" versus "derived table", which is nothing more than a distinction between table variables and table values;
- "created table" versus "declared table", "global table" versus "local table", "subtable" versus "supertable", "temporary table" versus "permanent table", which are all distinctions between different kinds of table variable, not different kinds of table value;
- "grouped table" versus ungrouped table, where a grouped table is something that conceptually exists only during evaluation of a query;
- ordered table versus table with an implementation-dependent order, which is nothing to do with tables at all, but to do with the order in which a cursor ranges over the rows of a query result.
In fact, there are no special kinds of table value in SQL. On the contrary, all and only the SQL operators on tables (SELECT, FROM, WHERE et cetera) are available on every table, and this concept is crucial. Furthermore, for all that there is a plethora of different kinds of table variable, any particular table value can equally well occupy any variable.
We shall suggest, further, that [5] and [6] fail even to maintain integrity to their own declared new concepts. For example, it is claimed in Section 6.3 of [5] that a query beginning with the word VALIDTIME is evaluated according to "sequenced" semantics (applying a temporal logic). As we show in point 5, below, this is not always the case.
Finally, simplicity of syntax has been advanced by the proponents of [5] and [6] as the motivation for the chosen approach, apparently in justification for the various consequent contraventions of accepted language design principles. We have found no evidence in any of the background documentation (such as [7]) that consideration had been given to the concerns expressed in [8], and in this present paper, when this approach was chosen.
4 Conclusion
We have presented a set of proposed principles of language design. We have claimed that these principles are generally accepted as good ones in language design communities, such as, for example, the community involved in international standardization of SQL. We have observed extensive deviation from these principles in proposals, based on TSQL2, to provide temporal extensions to the ISO SQL
standard. We have stated that these observations are our justification for our continuing opposition to those proposals. We invite others to consider these observations when considering their own positions on the proposals in question.
Acknowledgments
The author thanks Mike Sykes and Chris Date for their critical reviews and suggestions.
References
1. Bentley, J., Little Languages, in regular feature Programming Pearls, Communications of the ACM, Vol. 29, No. 8, August 1986.
2. Darwen, H., Planned UK Contributions to SQL/Temporal, December 17, 1996. (ISO/IEC JTC1/SC21/WG3 DBL MAD-221)
3. Melton, J. (ed), (ISO Working Draft) Temporal (SQL/Temporal), July, 1996. (ISO/IEC JTC1/SC21/WG3 DBL MCI-012)
4. Snodgrass, R.T., Response to MAD-220, January 1997. (ISO/IEC JTC1/SC21/WG3 DBL MAD-245)
5. Snodgrass, R.T., M.H. Böhlen, C.S. Jensen and A. Steiner, Adding Valid Time to SQL/Temporal, 1996. (ISO/IEC JTC1/SC21/WG3 DBL MAD-146r2)
6. Snodgrass, R.T., M.H. Böhlen, C.S. Jensen and A. Steiner, Adding Transaction Time to SQL/Temporal, 1996. (ISO/IEC JTC1/SC21/WG3 DBL MAD-147r2)
7. The TSQL2 Language Design Committee, The TSQL2 Language Specification, September 1994.
8. UK Response, On Proposals for Valid-Time and Transaction-Time Support, December 18, 1996. (ISO/IEC JTC1/SC21/WG3 DBL MAD-220)
References 2, 4, 5, 6 and 8 are to papers that have been presented to ISO, for discussion or to propose text for inclusion in Part 7 of SQL3, known as SQL/Temporal. They can be obtained via ftp from:
ftp://jerry.ece.umassd.edu/isowg3/dbl/MADdocs/madnnn.ps
ftp://jerry.ece.umassd.edu/isowg3/dbl/BASEdocs/sql4hold/sql4-archivedtemporal.ps
All of the ISO papers referenced here were tabled for presentation at the January 1997 meeting in Madrid, Spain, hence "MADdocs" and the "MAD-" prefix on paper numbers. As some kind of mnemonic, the reader may observe that the numerical order of these papers reflects the obvious sequence in which they were written: 146 and 147 made proposals for additional material in the SQL/Temporal base document ([3]); 220 responded to those proposals; 222 is not really part of the present discussion, but did outline ways in which one national body was interested in further developing the SQL/Temporal standard; 245 responded to 220.
Point-Based Temporal Extensions of SQL and Their Efficient Implementation
David Toman*
Department of Computer Science, University of Toronto
Toronto, Ontario M5S 1A4, Canada
david@cs.toronto.edu
Abstract. This chapter introduces a new approach to temporal extensions of SQL. In contrast with most of the current proposals we use single time points as references to time, while still achieving efficient query evaluation. The proposed language, SQL/TP, naturally extends the syntax of SQL/92: it adds a single new data type that represents a linearly ordered universe of time instants. The semantics of the new language extends the standard SQL in the expected way: the new data type behaves identically to the existing types. We also eliminate or fix many of the problems connected with defining a precise semantics in interval-based languages. In addition we provide an efficient query evaluation procedure based on a compilation technique that translates SQL/TP queries to SQL/92. Therefore existing off-the-shelf database systems can be used as back-ends for managing temporal data.
1 Why yet another temporal extension of SQL?
After more than a decade of research in the area of temporal databases there is still no universal consensus on how temporal features should be added to the standard relational model. Instead, there are dozens of mutually incompatible models and proposals. The more practical of these are frequently based on (often ad-hoc) extensions of existing relational languages, e.g., TQUEL [17] and various temporal extensions of SQL: TSQL2 [18], ATSQL2 [5], and SQL/Temporal [19], the current proposal of temporal extension of SQL3 to the ISO and ANSI standardization committees.
The goal of this chapter is twofold: First, we point out severe problems common to the majority of current proposals, namely representation dependent handling of temporal values and limited temporal dimensionality of the underlying temporal models. We argue that these problems are inherently tied to the use of interval-valued temporal attributes and cause major problems when a precise semantics of the temporal query languages is to be defined. Second, we propose an alternative solution that avoids these problems. Our proposal is based on a clean separation of the syntax and semantics (what is stored in the database and how do we query it) from the underlying compact representation of the temporal information (how is it stored in the database). While our technique can be applied to a wide range of relational query languages, we have chosen standard SQL with its duplicate semantics and aggregation operators as the starting point of our proposal. This choice demonstrates that we are indeed interested in real-life query languages rather than in toy examples.
The rest of the chapter is organized as follows: the remainder of this section exposes problems inherent to the current proposals. Section 2 formally introduces the temporal data model: abstract and concrete (interval-based) temporal databases, following the terminology introduced in [9]. Section 3 defines the syntax and semantics of SQL/TP and gives examples of temporal queries. We also include a brief discussion of compatibility issues and migration of SQL queries to SQL/TP (Section 3.5). Section 4 provides foundations for the proposed compilation technique. The chapter is concluded with several open questions and directions of future research. Appendix A summarizes the BNF of the core SQL/TP language and Appendix B briefly compares SQL/TP with the SQL/Temporal proposal to the ISO/ANSI SQL3 committee.
* The research was supported by a NATO/NSERC PDF fellowship, 1996-98. Extended abstract of this paper appeared in Proc. DOOD'97, Montreux, Switzerland, 1997.
O. Etzion, S. Jajodia, and S. Sripada (Eds.): Temporal Databases - Research and Practice, LNCS 1399, pp. 211-237, 1998. © Springer-Verlag Berlin Heidelberg 1998
1.1 Current Proposals
Most of the current proposals recognize that storing temporal data as ordinary tuples leads directly to enormous space requirements: a tuple has to be repeated for every time instant at which the represented fact holds in the modeled reality. Instead, tuples are associated with compact encodings of a set of time instants (often called period of validity). The sets of time instants are commonly represented by (finite sets of) intervals [17, 19], bitemporal elements [4, 14], or other fixed-dimensional products of intervals (hyper-rectangles).
The syntax of the chosen encoding then provides a domain of values for temporal attributes, e.g., pairs of interval endpoints. Indeed, ATSQL [5] and SQL/Temporal [19] use the BEGIN and END keywords to extract the endpoints of intervals, the PERIOD keyword to construct new interval timestamps, and Allen's interval algebra [2] operators to compare the timestamps. However, this is only possible if the user knows that the timestamps are encoded using intervals. Moreover, this approach leads to a tension between the syntax of the query languages and their intended semantics: the data model and the semantics of the languages are point-based¹ [4, 9], while temporal attributes refer to the actual encoding for sets of time instants (e.g., interval endpoints). This conflict leads to several unpleasant surprises when precise semantics needs to be defined. Most importantly, it is easy to show examples of queries whose answers depend on the choice of the particular encoding rather than on the underlying meaning of the data; cf. Example 1 below. Moreover, it is extremely hard to avoid this behavior in an elegant way. In many cases uniqueness of answers can only be guaranteed by operational means, e.g., by prescribing a particular evaluation order. This in turn leads to a very complicated and cumbersome semantics (if one exists at all).
¹ The truth is associated with individual time instants rather than with sets of instants (intervals).
The problems become even more apparent and critical when a temporal extension for query languages with duplicate semantics has to be defined. Consider the following situation:
Example 1. Let D be a temporal relation (or an answer to a temporal query) that represents the region in the figures below.
[Figure: three panels, labeled (1), (2) and (3)]
It is important to understand that all the three figures represent the same relation. However, it is also clear that we can distinguish (2) and (3) using a first-order query in, e.g., SQL/Temporal. We call such queries representation dependent. These queries cannot be given meaningful semantics in the point-based temporal models. Moreover, even very simple queries, e.g., counting the number of regions along the axes, give different results depending on the particular representation.
The common temporal query languages, hoping to avoid representation dependencies (among other reasons), try to mimic a point-based semantics using set operations on the encoded timestamps and a normalization procedure on temporal relations. In a single-dimensional case, the representation dependency problem can indeed be successfully evaded using coalescing [6]. TSQL2's informal semantics (including many examples of queries in TSQL2) is implicitly based on this assumption [3, 18]. In the rest of this section we argue that the situation in Example 1 naturally arises during temporal query evaluation and cannot be avoided in general.
First, we show that a single temporal dimension is not sufficient to formulate general temporal queries. Consider the query "are there two distinct time instants when a given relation contains exactly the same tuples?" [1] and [21] have independently shown that this query cannot be formulated in first-order temporal logic. A direct corollary of this result is that this query cannot be expressed in any single-dimensional temporal relational algebra². Moreover, [21] shows that to express all first-order queries the number of temporal dimensions cannot be bounded by any constant. Therefore, an unbounded number of temporal dimensions cannot be avoided during query evaluation even if the final result is a single-dimensional temporal relation or boolean.
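The coalescing step mentioned above for the single-dimensional case can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any cited system: time instants are taken to be integers, intervals are closed pairs (start, end), and the name coalesce is chosen here for convenience. Under these assumptions, two different interval encodings of the same set of instants coalesce to the same normal form.

```python
from typing import List, Tuple

# Assumption of this sketch: discrete integer time, closed intervals [start, end].
Interval = Tuple[int, int]


def coalesce(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals into maximal ones."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # The interval overlaps or directly follows the previous one:
            # extend the previous interval instead of storing a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Two encodings of the same set of instants {1..6, 10..12}.
encoding_a = [(1, 3), (4, 6), (10, 12)]
encoding_b = [(1, 2), (2, 6), (10, 10), (11, 12)]

print(len(encoding_a), len(encoding_b))  # the raw encodings differ in size
print(coalesce(encoding_a))
print(coalesce(encoding_b))
```

Counting stored intervals before coalescing gives 3 and 4, whereas both coalesced forms are identical; with two or more temporal dimensions no comparable unique form is available.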
The need for an unbounded number of temporal dimensions, combined with the use of explicit interval-valued temporal attributes, leads directly to situations similar to Example 1. There are other reasons for including multiple temporal dimensions in a temporal database system, e.g., the need for representing both valid and transaction time [14]. However, we would like to emphasize that the need for an unbounded number of temporal dimensions originates from the inherent properties of first-order queries alone, even if the temporal database and the results of the queries are single-dimensional.
Second, there is no unique normal form based on coalescing in the case the number of temporal attributes (the temporal dimension) is greater than one. Now it is easy to see why the coalescing-based approaches fail to guarantee representation independence: To guarantee fixed size of tuples in temporal relations, region (1) in Example 1 has to be represented by a finite union of rectangles, e.g., using the representation (2) or (3) above. While both (2) and (3) are coalesced, they can still be distinguished by a first-order query with interval-valued temporal attributes. In addition, in many cases the user has no control over the representation of intermediate results since the coalescing is performed implicitly by the system. With two or more temporal dimensions in queries coalescing leads to serious problems: the user has no knowledge if region (1) in Example 1 is actually represented as (2) or (3), but the results of queries often depend on this information.
² A relational algebra over the universe of single-dimensional temporal relations.
1.2 The Point-based Proposal: SQL/TP
The above problems are inherent to all temporal query languages with temporal attributes ranging over intervals. Therefore our proposal follows a different path to avoid all of the above problems: we let temporal attributes in our language range over the domain of single time instants. Our approach is based on several recent results in the area of temporal and constraint query languages [1, 15, 20, 21]. In addition we define a meaningful approach to duplicate semantics and aggregation, independent of the chosen encoding (emending [10, 11]).
In addition to simple and elegant syntax and semantics we propose an efficient query evaluation procedure for SQL/TP over compactly encoded temporal databases. While we mostly concentrate on efficient evaluation of temporal queries over an interval-based encoding of time, the design of SQL/TP allows the use of additional encodings for sets of time instants, e.g., the linear repeating points [23] for periodic events, without the need for new syntax and semantics. There are several other features of the proposal:
- SQL/TP statements can be compiled to standard SQL/92³ [12]; the translated queries can be evaluated using an off-the-shelf database system. This way we can build a SQL/TP front-end to an existing RDBMS and provide temporal capabilities without modifying the underlying database system itself.
- SQL/TP can express all representation independent SQL/Temporal queries. Moreover, SQL/TP is first-order complete (in the sense of [8]). The results in [1, 21] show that this is not the case for any of the temporal query languages based on a fixed-dimensional temporal relational algebra, e.g., [7]; this issue is not clear for TSQL2-derived languages [5, 18, 19] due to the presence of explicit coercion operators that convert encoded temporal attributes to data attributes.
- The proposal can easily be extended to support migration requirements [19] for upward temporal compatibility with SQL. While SQL/TP itself does not literally follow these requirements, the compatibility can be easily achieved using a very simple syntactic manipulation of the source queries and adding tags to distinguish the particular compatibility modes, cf. Section 3.5.
Before we start the technical part of the chapter, we would like to reiterate (to avoid any misunderstanding) that we are interested in intervals to store sets of time instants. This is the main difference between our approach and approaches taken by various interval logics [2], where intervals represent points in a two-dimensional (half-)space. However, due to the natural multidimensional character of SQL/TP, we can represent the true intervals using pairs of temporal attributes.
³ Other relational languages can be used as well, provided they have sufficient expressive power.
1.3 Summary of Contributions
The three main technical contributions of our proposal are: (1) the definition of a representation-independent temporal extension of SQL: we decouple the syntax and semantics of the language from the underlying data representation, and we support both the set- and the duplicate-based semantics of SQL (including aggregation); (2) a query compilation technique for this extension that allows SQL/TP queries to be efficiently evaluated using a standard RDBMS; and (3) the definition of a novel normalization technique that facilitates evaluation of temporal queries over an interval-based encoding of timestamps. We would also like to note that a naive direct translation of time instants to singleton intervals [16] fails as an efficient query evaluation technique: it causes an exponential blowup in complexity.

2 The Data Model for Temporal Databases
We start with the definition of the underlying data model: the domain of time is viewed as a discrete (footnote 4), countably infinite, linearly ordered set without endpoints (e.g., the integers). The individual elements of the set represent the actual time instants while the linear order represents the progression of time. The actual granularity of time is not important in our proposal (footnote 5). Besides the data type for time instants we also use all the other data types defined in standard SQL: strings, integers, floats, etc. As usual, these data types do not have an a priori assigned interpretation. We refer to those data types collectively as the uninterpreted constants.
Footnote 4: A dense linearly ordered time can be used with only a minor adjustment.
Footnote 5: For our purposes any fixed granularity will do.
The relationships between time instants and uninterpreted constants are captured in a finite set of temporal relations stored in the database. Following the terminology of [9] we distinguish the abstract temporal databases from the concrete temporal databases:
Definition 2 (Abstract Temporal Database) The signature of a predicate symbol R is the tuple $(a_1 : t_1, \ldots, a_k : t_k)$ where $a_i$ are distinct attribute names, $t_i$ the corresponding attribute types, and k the arity of R. Attributes of type time are the temporal attributes; the remaining attributes are the data attributes. A database schema is a finite set of relational symbols $R_1, \ldots, R_k$ paired with their signatures. An abstract temporal database is a set of tables defined by a database schema.
We do not restrict the cardinality of abstract temporal tables: in general we allow infinite tables. However, in order to define meaningful operations on the tables, we require that the number of occurrences (duplicates) is finite for every distinct tuple.
Example 3 In the rest of the chapter we use an abstract temporal database with the schema {indep(Name, Year)} as a running example. The particular instance of the indep relation we use in our examples captures independence of countries in Central Europe:

    indep
    Name            Year
    Czech Kingdom   1198
    ...             ...
    Czech Kingdom   1620
    Czechoslovakia  1918
    ...             ...
    Czechoslovakia  1938
    Czechoslovakia  1945
    ...             ...
    Czechoslovakia  1992
    ...             ...
    Czech Republic  1995
    ...             ...
    Slovakia        1940
    ...             ...
    Slovakia        1944
    Slovakia        1993
    ...             ...
    Poland          ...
    Poland          1794
    Poland          1918
    ...             ...

We do not impose any restrictions on the number of temporal attributes in relations (unlike, e.g., TSQL2 [18]). Indeed, in general we may want to record relationships between different time instants as well. While we may want to restrict the users from creating such multi-dimensional tables, we need this feature in the later sections to translate SQL/TP queries to SQL/92. The abstract temporal databases provide a natural data model for representing and querying temporal data. However, it would be impractical (and often impossible) to store the temporal databases as plain bags of their tuples: a particular tuple is often related to a large and possibly infinite set of time instants. Rather than storing all these tuples one by one, we use a compact encoding of sets of time instants. The choice of a particular encoding, in our case the interval-based encoding, defines the class of concrete temporal databases:
Definition 4 (Concrete Temporal Database) Let R be a relational symbol with signature E. A concrete signature corresponding to E is defined as a tuple of attributes that contains (1) a for every data attribute $a \in E$ and (2) tmin and tmax for every temporal attribute $t \in E$. The attributes tmin and tmax denote endpoints of intervals. We denote the concrete signature of R by $\bar{E}$. A concrete temporal database schema corresponding to a given abstract database schema is a set of relation symbols and their concrete signatures derived from the signatures in the abstract database schema (footnote 6). A concrete temporal database is a set of finite relations defined by a concrete database schema.
To capture the relationship between an abstract and a concrete temporal database we define a semantic mapping that maps a concrete temporal database to its meaning, an abstract temporal database. The meaning of a single concrete tuple $x = (t_{min}, t_{max}, a_1, \ldots, a_k)$ is the set of tuples $\|x\| = \{(t, a_1, \ldots, a_k) : t_{min} \le t \le t_{max}\}$; analogously for tuples with multiple temporal attributes. The meaning $\|R\|$ of a concrete relation R is the duplicate-preserving union of $\|x\|$ for all concrete $x \in R$. We say that R encodes $\|R\|$. We extend $\|\cdot\|$ to concrete temporal databases in the standard fashion. The encodes function also defines a subset of the abstract temporal databases that can be encoded using concrete temporal databases. We call this subset the finitary temporal databases. Note that the encoding is not unique and thus two or more distinct concrete temporal databases can encode the same abstract temporal database (cf. Example 1). We call such pairs of concrete temporal databases ($\|\cdot\|$-)equivalent.
Example 5 The database instance from Example 3 is infinite. However, it is finitary and can be encoded using the following concrete temporal database:

    indep
    Name            Yearmin  Yearmax
    Czech Kingdom   1198     1620
    Czechoslovakia  1918     1938
    Czechoslovakia  1945     1992
    Czech Republic  1993     infinity
    Slovakia        1940     1944
    Slovakia        1993     infinity
    Poland          1025     1794
    Poland          1918     1938
    Poland          1945     infinity

All queries in the rest of the chapter are evaluated over this database while preserving answers with respect to the original relation in Example 3.
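The semantic mapping from Definition 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `meaning` is hypothetical, intervals are taken to be closed, infinity is modeled with `math.inf`, and the (infinite) abstract relation is only expanded inside a caller-supplied window `[lo, hi]`.

```python
from collections import Counter
from math import inf

# Concrete relation from Example 5 as (name, yearmin, yearmax) triples.
INDEP = [
    ("Czech Kingdom", 1198, 1620),
    ("Czechoslovakia", 1918, 1938),
    ("Czechoslovakia", 1945, 1992),
    ("Czech Republic", 1993, inf),
    ("Slovakia", 1940, 1944),
    ("Slovakia", 1993, inf),
    ("Poland", 1025, 1794),
    ("Poland", 1918, 1938),
    ("Poland", 1945, inf),
]

def meaning(concrete, lo, hi):
    """Bag of abstract tuples (name, t) with lo <= t <= hi (lo, hi integers).

    Hypothetical helper: the window is needed only because the full
    abstract relation may be infinite.
    """
    bag = Counter()
    for name, tmin, tmax in concrete:
        start = max(tmin, lo)   # clip the interval to the window
        end = min(tmax, hi)
        for t in range(start, end + 1):
            bag[(name, t)] += 1  # duplicate-preserving union
    return bag

# Two different encodings of the same set of time instants are equivalent.
whole = [("Poland", 1025, 1794)]
split = [("Poland", 1025, 1500), ("Poland", 1501, 1794)]
print(meaning(whole, 1000, 2000) == meaning(split, 1000, 2000))
```

Comparing the two bags is a direct check of the equivalence of encodings described above, restricted to the chosen window.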

3 The Language SQL/TP
In this section we define the syntax and semantics of SQL/TP. This includes the data definition, data query, and data manipulation parts of the language. In all three cases we show that SQL/TP is a natural syntactic extension of SQL over the abstract temporal databases. Moreover, the proposed semantics of SQL/TP is essentially identical to the semantics of SQL (safely) extended to abstract temporal relations.
Footnote 6: We use the same names for both the abstract and concrete relations. The actual meaning of the symbol is always clear from the context.

3.1 Data Definition Language
We start with the Data Definition Language, which is essentially identical to standard SQL/92:

    create table <id> ( <signat> )
    create view <id> ( <exp> )

where <id> is a table identifier and <signat> is a signature of the new table. For views the signature is derived from the signature of the expression <exp> (cf. Section 3.2). The only difference is that the temporal attributes are declared using a new data type time that supports modifiers that determine how the sets of time instants are stored in a concrete temporal table:
using points: The time instants are stored as atomic values similarly to all other data types. This choice is suitable for representing single atomic events that happen at a specific time, e.g., when a particular tuple was inserted in the database.
using bounded | unbounded intervals: Contiguous sets of time instants associated with a particular data tuple are encoded using intervals. This encoding is suitable for representing durations of time. The bounded and unbounded keywords specify whether -∞ and ∞ may be used as endpoints of intervals. This choice affects which aggregate operations are allowed for that particular attribute; cf. Section 3.2.
It is important to understand that these modifiers affect only the way the table is stored, not the semantics of the queries (similarly to specifying, e.g., a sort order or a key for the table). In the future this list may grow to accommodate different ways of encoding sets of time instants. The modifiers are the only place in SQL/TP where the syntax reflects the chosen encoding. The default modifier unbounded time is assumed for all temporal attributes unless explicitly stated otherwise. The following table defines how the modifiers interact with standard relational operators ("p" is a shorthand for points, "b" for bounded intervals, and "u" for unbounded intervals):

    op   p op p  p op b  p op u  b op p  b op b  b op u  u op p  u op b  u op u
    ∩      p       p       p       p       b       b       p       b       u
    −      p       p       p       b       b       b       u       u       u
    ∪      p       b       u       b       b       u       u       u       u

The interaction of temporal attributes in joins and selections is captured by their behavior in the ∩ (intersection) operation; projection does not affect the remainder of the tuples.
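One way to read the modifier table above is as a small lookup rule. The sketch below is an interpretation, not part of the proposal: the function name `result_modifier` and the operator labels are hypothetical, and the ordering p < b < u is an observation about the table rather than something the text states.

```python
# Modifier codes: "p" = points, "b" = bounded intervals, "u" = unbounded intervals.
# Assumed ordering (read off the table, not stated in the text): p < b < u.
RANK = {"p": 0, "b": 1, "u": 2}

def result_modifier(op, left, right):
    """Modifier of the result of `left op right` for one temporal attribute."""
    if op == "intersect":
        # The more restrictive storage mode wins.
        return min(left, right, key=RANK.__getitem__)
    if op == "except":
        # Set difference keeps the modifier of its left operand.
        return left
    if op == "union":
        # The less restrictive storage mode wins.
        return max(left, right, key=RANK.__getitem__)
    raise ValueError(f"unknown operator: {op}")

# Reproduce the three rows of the table, columns in the order pp, pb, ..., uu.
for op in ("intersect", "except", "union"):
    row = [result_modifier(op, l, r) for l in "pbu" for r in "pbu"]
    print(op, " ".join(row))
```

The loop at the end prints one row per operator so the output can be compared column by column with the table.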
Example 6 The table indep in Example 3 is created as follows:

    create table indep (name char(20),
                        year time using unbounded intervals)

In the rest of the chapter we discuss only the interval-based encoding; encoding time instants using points does not introduce any problems over the traditional data types. In addition we assume the time instants can be represented by integers (using a fixed granularity) and we allow integer-like operations on the new data type to reduce the amount of superfluous syntax.

3.2 The Query Language
For the sake of simplicity we discuss only a syntactic subset of full SQL/TP. This choice does not affect the generality of our proposal: it is an easy exercise to show that the proposed fragment forms a (first-order) complete query language [8]. Moreover, all representation independent SQL/Temporal queries, including queries with aggregation and universal subqueries, can be equivalently formulated in this fragment.
Syntax. The syntactic subset of SQL/TP uses two basic syntactic constructs:
Select block. Similarly to standard SQL, the select block is the main building block of the query language. It has the usual form

    select <slist>
    from <flist>
    where <cond>
    group by <glist>

where
- <slist> is a list of attribute identifiers, constants, and (aggregate) expressions with the possibility of renaming the output column using <sexp> as <id>. Columns defined by expressions or aggregation have to be given a name in this way;
- <flist> is a sequence of relation identifiers or subqueries, again with the usual possibility of assigning correlation names;
- <cond> is a selection condition built from atomic conditions using boolean connectives. The atomic conditions depend on the data types of the involved attributes: in the case of temporal attributes we allow conditions of the form t1 op t2 + C where op ∈ {<, ≤, =, ≥, >} and C is a constant denoting the length of a time period; and
- <glist> is a list of attribute identifiers that specifies how the result of the select block is grouped. The usual SQL rules that govern the grouping operations apply here as well.
We extend the definition of signature to SQL/TP expressions: the signature of an expression is the tuple of attribute names in the result paired with the corresponding data types (including modifiers for the temporal types).
Set Operations. Besides nesting queries in the from clause of the select block we can combine the individual select blocks using set operations as follows:

    ( <exp> ) <setop> ( <exp> )

where <setop> is one of union (set union with duplicate elimination), union all (additive union), except (set difference with duplicate elimination), except all (monus), intersect (set intersection with duplicate elimination), and intersect all (duplicate-preserving intersection). We require the signatures of both expressions to match (footnote 7). The resulting signature is the common signature of the expressions involved in the operation.
The proposed syntax omits two common SQL constructs: subqueries nested in the where clause and the having clause. Both these constructs can be expressed in the presented fragment using nesting in the from clause of the select block and therefore can be considered to be only syntactic sugar. To achieve signature compatibility for temporal attributes we allow the use of a special constant pseudo-relation true(t : time), which is true for all elements of the temporal domain. This relation allows us to pad the attribute lists involved in the set operations (cf. Section 3.3) and to formulate queries that involve, e.g., complementation over the temporal domain.
Semantics. SQL/TP is essentially SQL/92 extended with an additional data type time. The main feature of this proposal is that we can use the standard SQL-like semantics over the class of the abstract temporal databases. This way we completely avoid all problems connected with representation dependencies while maintaining compatibility with SQL. Moreover, changes in the encoding (the physical representation) do not affect the syntax and semantics of queries. However, we have to be careful when extending relational operations to infinite tables: we have to ensure that we never produce tables with infinitely many duplicates of a single tuple. It is easy to see that all the relational operations, with the exception of duplicate-preserving projection, meet this requirement. The duplicate-preserving projection, however, can produce such tables, e.g.:

    $\{(\text{"Poland"}, 1945, \infty)\} \;\xrightarrow{\|\cdot\|}\; \{(\text{"Poland"}, n) : n \ge 1945\} \;\xrightarrow{\pi}\; \{(\text{"Poland"}), \ldots, (\text{"Poland"}), \ldots\}$

The result of the projection contains infinitely many duplicates of the tuple ("Poland"). This cannot be allowed as other relational operators, e.g., the bag difference, are not well defined over such tables. Therefore we restrict the use of duplicate-preserving projection to attributes of bounded types, i.e., bounded intervals, time points, and data types.
Footnote 7: SQL only requires the types to match. However, we require both the names of the attributes and their types to match. This is not a restriction as the renaming can be conveniently done within the select clauses.
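The effect of duplicate-preserving projection on a single concrete tuple can be made concrete with a small calculation. The following is a minimal sketch, assuming closed intervals and `math.inf` as a stand-in for ∞; the function name `copies_after_projection` is hypothetical.

```python
from math import inf

def copies_after_projection(tmin, tmax):
    """Copies of the data part that a duplicate-preserving projection of one
    concrete tuple (tmin, tmax, data...) would have to emit: one per time
    instant in the closed interval [tmin, tmax]."""
    return tmax - tmin + 1

# A bounded interval yields finitely many copies.
print(copies_after_projection(1945, 1992))

# An unbounded interval yields infinitely many, which no bag can hold.
print(copies_after_projection(1945, inf))

# Even in the bounded case the count can be exponential in the size of the
# encoding: an endpoint that fits in n bits describes 2**n time instants.
n = 64
print(copies_after_projection(0, 2**n - 1) == 2**n, (2**n - 1).bit_length() == n)
```

The last line hints at why even the bounded case is costly in practice: the number of copies grows with the length of the interval, not with the size of its representation.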
Closure over Interval-based Concrete Databases. While the above restriction guarantees a well-defined semantics, it is too weak to guarantee closure of SQL/TP queries over the chosen class of concrete temporal databases. The main source of problems is the order dependencies among temporal attributes. Consider the following example:
Example 7 It is easy to find SQL/TP expressions that do not preserve closure over the class of finitary abstract temporal databases. Consider the expression Q:

    select r1.name as name, r1.year as t1, r2.year as t2
    from indep r1, indep r2
    where r1.name = r2.name and r1.year < r2.year

The attributes t1 and t2 are correlated by the inequality t1 < t2 in the result of the query:

    {("Poland", 1945, 1946), ("Poland", 1945, 1947), ..., ("Poland", 1945, 1950), ...
     ("Poland", 1946, 1947), ..., ("Poland", 1946, 1950), ...
     ("Poland", 1949, 1950), ...}

Obviously the triangle-like result cannot be described by a product of intervals.
To avoid this problem we use the notion of attribute independence. Rather than a semantic definition of attribute independence [11] we use a syntactic inference system to guarantee attribute independence in a SQL/TP expression:
Definition 8 (Attribute Independence) Let $t_1$ and $t_2$ be two temporal attributes in the signature of a SQL/TP expression exp. We say that $t_1$ and $t_2$ are independent in exp if
1. exp is a base relation,
2. exp is a select block, $t_1$ and $t_2$ are names of $t_1'$ and $t_2'$ assigned in the select clause, $t_1'$ and $t_2'$ are independent in all expressions in the from list, and an order relationship between $t_1'$ and $t_2'$ is not implied by the where clause, or
3. exp is (e1) setop (e2) and $t_1$ and $t_2$ are independent in both e1 and e2.
In addition all the data attributes (and point temporal attributes) are mutually independent. The inference system is sufficient to infer independence of attributes. We could also analyze the compositions of the selection conditions on temporal attributes and check for tautologies. We have chosen not to pursue this direction in the current version of the proposal for the sake of simplicity. However, the theory of linear order is decidable and thus such an extension is feasible; note that these tests are performed at compile time and thus do not affect the data complexity of the queries. For similar reasons we restrict the use of aggregate operations: we require the aggregated attribute to be independent of the group by attributes [10]. Moreover, the aggregation has to obey the restrictions in Figure 1.
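The triangle-like result of Example 7 can be checked mechanically on a small window of years. The sketch below is only an illustration: `is_product` is a hypothetical helper that tests whether a finite set of (t1, t2) pairs equals the product of its two projections, which is what a single concrete tuple with two interval-valued attributes would have to encode.

```python
from itertools import product

def is_product(pairs):
    """True if the set of (t1, t2) pairs equals the product of its projections."""
    pairs = set(pairs)
    firsts = {t1 for t1, _ in pairs}
    seconds = {t2 for _, t2 in pairs}
    return pairs == set(product(firsts, seconds))

years = range(1945, 1951)  # a small window of Poland's last interval

# Result of Q restricted to the window: t1 and t2 are tied by t1 < t2.
triangle = [(t1, t2) for t1 in years for t2 in years if t1 < t2]

# The same join without the order condition: t1 and t2 are independent.
square = [(t1, t2) for t1 in years for t2 in years]

print(len(triangle), is_product(triangle))
print(len(square), is_product(square))
```

On a finite window any set is trivially a finite union of single-point rectangles, so the check only shows why one rectangle does not suffice; the closure problem in the text arises because the number of rectangles needed grows with the window.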
We also restrict the use of duplicate-preserving projection on all temporal attributes encoded by intervals. We have already seen that duplicate-preserving projection is not possible for unbounded data types. On the other hand, for bounded data types we could implement the duplicate-preserving projection by creating the appropriate number of copies of the remainder of a tuple. However, such an operation would make the query evaluation very inefficient and almost certainly unusable in practice. Consider the following example:
Example 9 Let $R(x, t) = \{(a, 0, 2^n - 1)\}$ be a concrete temporal relation where n is a large integer. Clearly the size of R (in bits) is $|a| + n$. However, the size of $\pi_x(R)$ is $2^n \cdot |a|$ as the result of duplicate-preserving projection has to contain $2^n$ tuples (a).
Allowing such projections would cause an exponential blowup in the (space) complexity of query evaluation. Note that the duplicate-preserving projection is used in SQL for two main reasons: (1) to avoid duplicate elimination or (2) to facilitate correct aggregates. The first use does not apply to SQL/TP: we deal with redundant duplicate elimination in the optimization phase of our compilation procedure. The aggregates are handled using a rewriting technique that allows us to avoid the duplicate-preserving projections. We can evaluate a vast majority of representation-independent aggregate queries even under the above restriction: note that all other relational operations preserve duplicates (cf. Section 3.3). Therefore we exclude the duplicate-preserving projections of all temporal attributes encoded by intervals in order to maintain the polynomial complexity bound. We define the SQL/TP queries to be the subset of SQL/TP expressions obeying the following rules:
Definition 10 Let Q be a SQL/TP expression that obeys the following rules:
1. temporal attributes encoded by intervals cannot be projected out in a select all clause,
2. all attributes in the (top-level) signature of the expression are pairwise independent,
3. the attributes in the group by clause are independent of the remaining attributes, and
4. the attributes used in aggregation operators follow the rules in Figure 1.
Then we say that Q is a SQL/TP query over the class of concrete interval-based temporal databases.
It is easy to verify that all SQL/TP queries preserve closure over the class of finitary temporal databases:
Theorem 11 Let D be a finitary database and Q a SQL/TP query. Then Q(D) is finitary.
The requirement of attribute independence seems like a rather severe restriction. However, the independence is required only for the temporal attributes present in the signature of the top-level query, not for all temporal attributes that appear in the query. Therefore all the representation-independent TSQL2 queries, and all first-order queries with a single temporal attribute in their signature in general, can be expressed as SQL/TP queries.

    Type of the            Type of the aggregated attribute
    group by attributes    data                    bounded intervals    unbounded intervals
    data                   min, max, sum, count    min, max, count      min, max
    bounded intervals      min, max, sum, count    min, max, count      min, max
    unbounded intervals    min, max                min, max             min, max

When the group by clause contains multiple attributes we take the intersection of the allowed aggregate operations. We have excluded the sum aggregate on the temporal attributes: while it is definable in our framework, it makes little sense from the semantics point of view.
Fig. 1. Allowed aggregates.

Theorem 12 The first-order fragment of SQL/TP is expressively equivalent to
range restricted two-sorted first-order logic (temporal relational calculus).
We can also express queries shown not to be expressible in TRA [7], e.g., the query "is there a pair of distinct time instants when exactly the same countries were independent?" [1, 21]. This is not possible in any temporal query language that assumes a fixed number of temporal dimensions in its data model.

3.3 Examples of Queries
In this section we provide illustrative examples of SQL/TP queries. The examples are chosen to highlight the ease of formulating queries in SQL/TP. In addition some of the examples, e.g., example 3, cannot be easily (and correctly) formulated in TSQL2 derivatives.

1. The first example is a simple PSJ query "List all countries that were independent while Czech Kingdom was independent".

    select r1.name
    from indep r1, indep r2
    where r2.name = 'Czech Kingdom' and r1.year = r2.year

Note also that the result is a standard non-temporal relation. Over the database from Example 3 we get:

    name
    Czech Kingdom
    Poland

2. Formulating more complicated queries in SQL/TP, e.g., the query "List all years when no country was independent", is also very natural:

    (select t as year from true)
    except
    (select year from indep)

Note the use of the true pseudo-relation to achieve signature compatibility. The result of the query is

    yearmin    yearmax
    -infinity  1024
    1795       1917
    1939       1939

While the output, a concrete table containing all the periods when no country was independent, has two columns, it is essential to understand that it is only a compact representation of an abstract table with a single column Year.

3. In addition to first-order queries, the aggregate operations in SQL/TP also naturally interact with the rest of the language, e.g., in the query "List all countries that became independent before Slovakia":

    select name
    from indep, ( select min(year) as y0 from indep where name = 'Slovakia' )
    where year < y0

The result is:

    name
    Czech Kingdom
    Czechoslovakia
    Poland

4. SQL/TP also supports a natural way of aggregating over the temporal attributes: "For every country (that has been independent during the 20th century) list the number of years of independence within the 20th century":

    select name, count(year) as years
    from indep
    where 1900 <= year < 2000
    group by name

The aggregation is made possible by the where clause: it restricts the otherwise unbounded attribute year. The result is:

    name            years
    Czechoslovakia  67
    Czech Republic  7
    Poland          75
    Slovakia        11

Note that in query languages with interval-valued temporal attributes we would have to use a special syntactic construction to measure the size of the intervals explicitly.

5. Moreover, SQL/TP supports grouping by temporal attributes: "For every year list the number of independent countries (if any)":

    select year, count(name) as numofc
    from indep
    group by year

The result is:

    yearmin  yearmax   numofc
    1025     1197      1
    1198     1620      2
    1621     1794      1
    1918     1938      2
    1940     1944      1
    1945     1992      2
    1993     infinity  3

This query is quite hard to ask in temporal query languages that use coalescing implicitly: the input table is coalesced, and re-coalescing after the name column is eliminated loses the duplicate information we want to compute.

6. Another query that is hard to express in the current proposals is "List all pairs of years when exactly the same countries were independent":

    select y1, y2
    from ( ( select r1.t as y1, r2.t as y2 from true r1, true r2 )
           except
           ( select y1, y2
             from ( ( ( select name, year as y1, t as y2 from indep, true )
                      except
                      ( select name, t as y1, year as y2 from indep, true ) )
                    union
                    ( ( select name, t as y1, year as y2 from indep, true )
                      except
                      ( select name, year as y1, t as y2 from indep, true ) )
             ) ) )

The answer to this query is (the output shows only part of the answer):

    y1min      y1max  y2min      y2max
    -infinity  1024   -infinity  1024
    -infinity  1024   1795       1917
    -infinity  1024   1938       1938
    1918       1938   1918       1938
    1918       1938   1945       1992
    ...        ...    ...        ...

Again the output relation is just a representation of a binary relation with two columns y1 and y2. The results in [1, 21] show that the two-dimensionality is inherent to this query (this remains true even when we consider the existential closure of this query).
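The result of query 5 above can be reproduced directly from the interval endpoints of the concrete table in Example 5, without enumerating individual years. The following sketch is a plain endpoint sweep written for illustration; it is not the compilation procedure of this chapter, the name `countries_per_year` is hypothetical, intervals are assumed closed, and `math.inf` stands in for infinity.

```python
from math import inf

# Concrete relation from Example 5 as (name, yearmin, yearmax) triples.
INDEP = [
    ("Czech Kingdom", 1198, 1620),
    ("Czechoslovakia", 1918, 1938),
    ("Czechoslovakia", 1945, 1992),
    ("Czech Republic", 1993, inf),
    ("Slovakia", 1940, 1944),
    ("Slovakia", 1993, inf),
    ("Poland", 1025, 1794),
    ("Poland", 1918, 1938),
    ("Poland", 1945, inf),
]

def countries_per_year(concrete):
    """Return (yearmin, yearmax, count) triples for 'group by year'.

    The count can only change where some interval starts (tmin) or just
    after one ends (tmax + 1), so those points cut time into segments on
    which the count is constant.
    """
    cuts = {tmin for _, tmin, _ in concrete}
    cuts |= {tmax + 1 for _, _, tmax in concrete if tmax != inf}
    cuts = sorted(cuts)
    result = []
    for i, start in enumerate(cuts):
        end = cuts[i + 1] - 1 if i + 1 < len(cuts) else inf
        # A tuple contributes to the segment iff its interval covers it.
        n = sum(1 for _, tmin, tmax in concrete if tmin <= start and end <= tmax)
        if n > 0:  # years with no independent country are not listed
            result.append((start, end, n))
    return result

for row in countries_per_year(INDEP):
    print(row)
```

The work done depends only on the number of concrete tuples, not on the number of years they cover, which is the property an evaluation over the interval encoding needs. Adjacent segments with equal counts are not merged in this sketch.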

3.4 Database Updates
Besides considering the query language, in a truly practical approach we also need to address updates of temporal relations. We propose two constructs:

    insert all into R ( <exp> )
    delete all from R ( <exp> )

The updates preserve semantics with respect to the abstract temporal databases while manipulating only the concrete representation, in the same way queries do. The delete construction is more powerful than the SQL/92 version (as it handles duplication correctly).

3.5 Compatibility with SQL/92
To allow easy migration from SQL/92 to SQL/TP we follow [19] and introduce two compatibility modes that allow standard SQL queries to be evaluated over a temporal database:
Temporal Upward Compatibility (TUC). The first level of compatibility treats standard SQL queries as queries operating with respect to the current time instant. This goal can be easily achieved by a simple query transformation that replaces every reference to a base table R by the expression

    $\pi_V(\sigma_{t = N}(R))$

where V is the list of R's data attributes, t is R's temporal attribute, and N is a constant representing the current time instant (e.g., a CURRENT DATE in DB2 [13]). This transformation removes all temporal attributes and thus the original SQL query is evaluated on the current snapshot of the database.
Sequenced Queries. The second level evaluates the standard SQL queries over the temporal database with respect to every time instant, returning a temporal relation as a result (footnote 8). The application of a sequenced query is defined by a transformation

    { QP × {t} }.

The resulting query performs exactly the required operation.
Footnote 8: The query evaluation algorithm defined in Section 4 executes such queries efficiently with respect to the interval encoding, rather than by separate evaluation for every time instant.
We do not specify explicit syntax for the compatibility modes: all queries that come from legacy applications use one of the compatibility modes (usually the TUC mode) by default, and thus we assume that they are preprocessed using the above definitions before they are submitted to the SQL/TP query processor. For all practical purposes, such an arrangement is sufficient to provide the upward compatibility for SQL. However, it is easy to see that we could add tags to queries that distinguish the compatibility modes on the level of syntax (similarly to [19]). The tagging mechanism would guarantee a syntactic temporal upward compatibility, should it become necessary. In all the cases the SQL/TP queries generated by the compatibility modes satisfy assumptions of Theorem 20 and thus can be efficiently translated to SQL/92 by our algorithm.
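The TUC mode can be illustrated on the concrete table of Example 5 using SQLite from Python. This is a sketch under stated assumptions, not the preprocessing step of the proposal: the sentinel `POS_INF`, the helpers `make_db` and `current_snapshot`, and the column types are all hypothetical, and the condition `yearmin <= N and N <= yearmax` is the assumed concrete counterpart of selecting the time instant N under closed intervals.

```python
import sqlite3

# Hypothetical sentinel standing in for +infinity (largest 64-bit integer).
POS_INF = 9223372036854775807

ROWS = [
    ("Czech Kingdom", 1198, 1620),
    ("Czechoslovakia", 1918, 1938),
    ("Czechoslovakia", 1945, 1992),
    ("Czech Republic", 1993, POS_INF),
    ("Slovakia", 1940, 1944),
    ("Slovakia", 1993, POS_INF),
    ("Poland", 1025, 1794),
    ("Poland", 1918, 1938),
    ("Poland", 1945, POS_INF),
]

def make_db():
    """In-memory database holding the concrete indep table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table indep (name text, yearmin integer, yearmax integer)")
    conn.executemany("insert into indep values (?, ?, ?)", ROWS)
    return conn

def current_snapshot(conn, now):
    """Data attributes of the tuples valid at time instant `now`.

    This is what a legacy query 'select name from indep' would see in TUC
    mode: the temporal attribute is selected at `now` and projected away.
    """
    cur = conn.execute(
        "select name from indep where yearmin <= ? and ? <= yearmax order by name",
        (now, now),
    )
    return [name for (name,) in cur]

conn = make_db()
print(current_snapshot(conn, 1998))
print(current_snapshot(conn, 1939))
```

Because the snapshot is computed with two comparisons on the interval endpoints, the legacy query never sees the temporal columns, which matches the intent of evaluating it on the current snapshot.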

4 Evaluation of SQL/TP Statements
Starting with this section we focus on the second and third results of the chapter: the compilation technique that translates point-based SQL/TP queries to equivalent SQL queries over interval-based concrete temporal databases. The subtle point here is that the resulting queries are efficient. This is not completely trivial as the semantics of the original queries is defined with respect to abstract temporal databases. A naive query evaluation procedure [16] would indeed refer to all points in the active domain of the corresponding abstract temporal database, causing an immediate exponential blowup in the data complexity of the query evaluation. While most approaches to query evaluation in temporal databases take the path of adding specialized temporal operations to a standard relational system, we take an alternative approach: we define a translation procedure that allows us to compile SQL/TP statements to standard SQL/92 statements. This way:
- we guarantee that the query evaluation uses only the active domain of the concrete temporal database (the result is an ordinary SQL/92 query),
- we can implement a SQL/TP interface as a preprocessor on top of an ordinary relational system, and
- we isolate a single intrinsic operation needed for temporal query evaluation over the interval-based encoding.
While the result of the translation can be defined using a SQL/92 query, we may decide to implement parts of it natively, e.g., the normalization operation (cf. Definition 15), to improve the efficiency of query evaluation. The translation also utilizes the quantifier elimination procedure for linear order [22] to replace references to individual time instants in the queries with references to interval endpoints. In the rest of this section we give a sketch of the SQL/TP to SQL/92 translation. It is based on an extension of the results in [20] to duplicate semantics and uses a novel normalization technique.

4.1 Data Definition Language
The translation of the data definition language statements is fairly simple: we merely convert abstract signatures to their concrete counterparts. The SQL/TP statement in Example 6 is translated to:

    create table indep (name char(20), yearmin Time, yearmax Time)

where Time is a user-defined type (UDT) for an integer-like time. The data type Time is equivalent to INTEGER (footnote 9) extended with two special elements, -∞ and ∞. We define the successor and order for this new type by lifting the operations from the INTEGER type.

4.2 The SQL/TP Queries
The crux of our approach lies in the translation of SQL/TP queries. The natural correctness criterion is the preservation of query semantics. This requirement is captured by the following diagram:

    Abstract TDB      <--||.||--  Concrete TDB       <------>  Physical DB
         |                             |                            |
         | Q (SQL/TP)                  | Q' = compile(Q) (SQL/92)   | eval(Q')
         v                             v                            v
    Abstract Relation <--||.||--  Concrete Relation  <------>  Physical Relation

We show that the proposed translation algorithm guarantees commutativity for the left part of the diagram. The commutativity of the right part is backed up by the reliability of the underlying relational system. The rest of this section gives a proof of the following theorem:
Theorem 13 Let D be a concrete temporal database and Q a SQL/TP query. Then $Q(\|D\|) = \|\mathrm{compile}(Q)(D)\|$.
Note that compile(Q) is executed at query compilation time, before the actual execution over the temporal database begins. Thus it does not affect the data complexity of the query evaluation algorithm. Before we present the steps of the translation itself, we introduce three auxiliary definitions. The definition of SQL/TP queries requires the temporal attributes in the signature of a query to be independent. However, it does not prevent us from writing queries whose subqueries do not share this property. To deal with such attribute dependencies during the translation we introduce the notion of a conditional query:
Definition 14 (Conditional Query) Let Q be a SQL/92 query and $\varphi$ a quantifier-free formula in the language of linear order such that t is a free variable in $\varphi$ if and only if tmin and tmax are temporal attributes in the signature of Q. We call $Q\{\varphi\}$ a conditional query.
Footnote 9: Often we can take advantage of a built-in data type provided by the RDBMS, e.g., data type DATE in DB/2.
Attribute   max(a)             min(a)             count(a)                                  sum(a)
data        max(a)             min(a)             sum($\mathrm{CNT}^{G}_{\varphi}(x)$)      sum($a \cdot \mathrm{CNT}^{G}_{\varphi}(x)$)
temporal    max($a_{max}$)     min($a_{min}$)     sum($\mathrm{CNT}^{G}_{\varphi}(x)$)      N/A

Fig. 2. Translation of Aggregate operations.
While the translation algorithm uses the conditional queries to translate subqueries of a SQL/TP query, the attribute independence of the top-level attributes guarantees that no such dependencies remain in the result of the translation. The second challenge lies in the definition of relational operators that preserve semantics over the interval-based encoding. For this purpose we introduce a novel normalization technique. The idea behind the technique is quite simple:

Definition 15 Let $\{Q_1, \ldots, Q_k\}$ be a set of SQL/92 queries with compatible signatures, X a subset of their data attributes, and t a temporal attribute. $Q_1, \ldots, Q_k$ are t-compatible on X if for all concrete temporal databases D and all $0 < i \le j \le k$, whenever two concrete tuples $a \in Q_i(D)$ and $b \in Q_j(D)$ satisfy $\pi_X(a) = \pi_X(b)$, the sets $\pi_{\{t\}}(\|a\|)$ and $\pi_{\{t\}}(\|b\|)$ are identical or disjoint. $Q_1, \ldots, Q_k$ are time-compatible on X if $Q_1, \ldots, Q_k$ are t-compatible on X for all temporal attributes t in the common signature.

The definition of a time-compatible set of queries essentially says that if the data portion of a tuple is related to an interval in $Q_i(D)$ and to another interval in $Q_j(D)$, then it is always the case that these two intervals coincide or are disjoint. This way we can guarantee that the intervals behave like points with respect to set/bag operations (cf. Figures 3 and 4). This definition is non-trivial even for singleton sets of queries as the answers to queries are bags of tuples. It is also easy to see that we can define a normalization operation that transforms a set of arbitrary queries to a t-compatible set of ($\|\cdot\|$-)equivalent queries. Moreover, this operation can be defined using a first-order query.¹⁰

Lemma 16 Let $\{Q_1, \ldots, Q_k\}$ be a set of SQL/92 queries with compatible signatures, X a subset of their data attributes, and t a temporal attribute. Then there are first-order queries $N^{t}_{X}[Q_i; Q_1, \ldots, Q_k]$ such that
1. $\|Q_i(D)\| = \|N^{t}_{X}[Q_i; Q_1, \ldots, Q_k](D)\|$ for all concrete databases D.
2. $\{N^{t}_{X}[Q_i; Q_1, \ldots, Q_k] : 0 < i \le k\}$ are t-compatible on X.

To define a time-compatible set of queries the above lemma is used for all temporal attributes in the common signature. It is easy to see that the normalization operation can be performed in O(n log n) where n is the combined size of the results of $Q_i(D)$. The last obstacle is the translation of aggregate operations. To translate the aggregation operators correctly we need to know how many tuples are encoded by every single concrete tuple in the relation we aggregate over. We define a function $\mathrm{CNT}^{G}_{\varphi}$ for this purpose: it tells us how many duplicates would be in the $\|\cdot\|$-image of the result of projecting a concrete tuple on G after applying the selection condition $\sigma_{\varphi}$. More formally:

Definition 17 Let E be an abstract signature, $G \subset E$, $\varphi$ a quantifier-free formula in the language of linear order over temporal variables in $E - G$, and x a concrete tuple in the signature E. Then $\mathrm{CNT}^{G}_{\varphi}(x) = \mathrm{card}(\sigma_{\varphi}\|\pi_{E-G}(x)\|)$.

Note that $\mathrm{CNT}^{G}_{\varphi}$ maps concrete tuples to natural numbers. However, if we used a dense model of time then CNT would be a measure on the sets of time instants and could return non-integral counts, e.g., 1.5 years. For details on aggregation and measures see [10].

Lemma 18 Given fixed E, $G \subset E$, and $\varphi$, the function $\mathrm{CNT}^{G}_{\varphi}$ can be defined using an integer expression over the value of x.

The CNT function operates on single tuples and thus contributes only a constant to the overall data complexity of queries. Now we are ready to proceed with the translation itself: every SQL/TP expression Q is translated to a set of conditional queries $Q_1\{\varphi_1\}, \ldots, Q_n\{\varphi_n\}$ while maintaining the invariant $Q(\|D\|) = \sigma_{\varphi_1}\|Q_1(D)\| \cup \ldots \cup \sigma_{\varphi_n}\|Q_n(D)\|$. The translation itself is defined inductively on the structure of the SQL/TP query.

¹⁰ Similarly to coalescing; a native implementation of the normalization can often be made more efficient.
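To make the role of the counting function concrete, the following Python sketch handles the simplest case only: one temporal attribute, encoded as a closed integer interval [tmin, tmax], and a selection condition restricted to optional bounds lo <= t <= hi. The closed-interval reading, the shape of the condition, and the names cnt and count_points are assumptions of this sketch, not definitions from the chapter.

```python
def cnt(tmin, tmax, lo=None, hi=None):
    """Number of integer instants t with tmin <= t <= tmax and lo <= t <= hi.

    This is an integer expression over the endpoints of a single concrete
    tuple; no other tuple of the relation is inspected.
    """
    if lo is not None:
        tmin = max(tmin, lo)
    if hi is not None:
        tmax = min(tmax, hi)
    return max(0, tmax - tmin + 1)

def count_points(rows):
    """count over the temporal attribute, computed as a sum of cnt values.

    rows holds concrete tuples (data, tmin, tmax) with finite endpoints.
    """
    return sum(cnt(tmin, tmax) for _data, tmin, tmax in rows)
```

For instance, a concrete tuple covering the instants 1990 to 1994 contributes cnt(1990, 1994), i.e. five abstract tuples, to a count.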
Translation of the Select Block. Consider the SQL/TP statement:

select all X from $E_1, \ldots, E_n$ where $\varphi$ group by G
where X is the set of attributes in the answer, the $E_i$ are subqueries or base table references, $\varphi$ is the selection condition, and G is the set of grouping attributes. In addition, let A be the set of all aggregate expressions in X and E be the set of all attributes in $E_1, \ldots, E_n$. We assume that we have already translated the subqueries to $Q_1\{\varphi_1\} \in \mathit{compile}(E_1), \ldots, Q_n\{\varphi_n\} \in \mathit{compile}(E_n)$.¹¹ We compose the partial results to get a set of conditional queries equivalent to the original select block as follows (we proceed by translating every clause of the original select block one by one):
from $E_1, \ldots, E_n$: For every $Q_1\{\varphi_1\}, \ldots, Q_n\{\varphi_n\}$ the from clause gives us a SQL/92 query

select E from $Q_1, \ldots, Q_n$ $\{\psi\}$

where $\psi = \varphi_1 \wedge \ldots \wedge \varphi_n$.

where $\varphi$: To apply the selection condition $\varphi$ we need to determine the relationships between the endpoints of intervals, the corresponding point-valued attributes, and the selection formula. We use the quantifier elimination procedure for linear order to achieve this goal. Let $Q(E)\{\psi\}$ be the result of the previous step. We define

$\psi_1 := \mathrm{QE}(\exists T\,(\psi \wedge \varphi \wedge \bigwedge_{t \in T} (t_{min} < t < t_{max})))$
$\psi_2 := \mathrm{QE}(\exists \bar{T}\,(\psi \wedge \varphi \wedge \bigwedge_{t \in T} (t_{min} < t < t_{max})))$

where T is the set of all temporal attributes in E (encoded by intervals), $\bar{T}$ is the set of all attributes except those in T and constants, and QE is the quantifier elimination procedure for linear order. Now we define

select E from Q where $\psi_1$ $\{\psi_2\}$

to be the result of applying the original where clause on $Q\{\psi\}$.

group by G: To apply grouping we first normalize the result of the previous step with respect to the attributes in G. This allows us to use the standard SQL/92 grouping construction (cf. Figure 3). Let $Q(E)\{\psi\}$ be the result from the previous step. As the attributes in G and $E - G$ are independent we can split $\psi$ into $\psi_G$ involving only attributes in G and $\psi_{E-G}$ involving only attributes in $E - G$. We generate

select G, A from $N_G[Q; Q]$ group by G $\{\psi_G\}$

Note that the aggregates in A have to be transformed using the rules in Figure 2 and Lemma 18 applied on E, G, and $\psi_{E-G}$.

select X: The translation of the final projection depends on the use of duplicate-preserving vs. duplicate-eliminating projection. Let $Q(E)\{\psi\}$ be the result from the previous step. For select queries we get

select distinct X from $N_X[Q; Q]$ $\{\psi\}$

and for select all queries we get

select X from Q $\{\psi\}$

Note that only queries that use aggregation or duplicate elimination use the N operation.

¹¹ For the base tables we use a trivial condition true in place of $\varphi_i$.

[Figure: t-x graphs of the table S, of its normalization $N_{\{x\}}[S; S]$, and of the SQL/92 projection of the normalized table on x, $t_{min}$, $t_{max}$, drawn for the data values "a" and "b".]
S has data attribute x and a temporal attribute t. The boxes in the figure represent the t-x graphs of the involved tables. A similar technique is used for aggregation: it is easy to see that we could easily count duplicates on the normalized relation (cf. Section 3.3, query 6).
Fig. 3. Projection with Duplicate Elimination
For each of these steps we can easily verify that the transformation invariant is preserved. Moreover, in the actual implementation the subqueries generated by the above four steps are merged into as few nested blocks as possible, e.g., the first two steps can always be merged into a single select block.
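The normalization used by the group by and select distinct steps can be sketched in Python as follows. This is only an illustration under stated assumptions: relations are lists of concrete tuples (x, tmin, tmax) with one data value x and one temporal attribute encoded as a closed integer interval, and the function name normalize is hypothetical. The sketch favors readability and is not tuned to the O(n log n) bound mentioned in the text.

```python
from collections import defaultdict

def normalize(target, *relations):
    """Split the intervals of `target` at the endpoints occurring in `relations`.

    Afterwards, for equal data values x, every interval of the result is
    either identical to or disjoint from each interval obtained by
    normalizing the other relations in the same way.
    """
    starts = defaultdict(set)          # x -> instants at which a piece must begin
    for rel in relations:
        for x, tmin, tmax in rel:
            starts[x].add(tmin)
            starts[x].add(tmax + 1)    # first instant after the interval
    out = []
    for x, tmin, tmax in target:
        lo = tmin
        for c in sorted(c for c in starts[x] if tmin < c <= tmax):
            out.append((x, lo, c - 1))
            lo = c
        out.append((x, lo, tmax))
    return out
```

With S = [("a", 1, 5), ("a", 3, 8), ("b", 2, 4)], the call normalize(S, S) splits the two overlapping "a" intervals into pieces 1..2, 3..5 (twice), and 6..8, so an ordinary duplicate elimination over the pieces no longer loses or double-counts time instants.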
[Figure: t-x graphs of the tables S1 and S2 for the data values "a" and "b", of their normalizations $N_{\{x\}}[S_1; S_1, S_2]$ and $N_{\{x\}}[S_2; S_1, S_2]$, and of the result S1 except S2 obtained by applying the SQL/92 except operation to the normalized tables.]
Fig. 4. Set Difference using Normalization.
Translation of the Set Operations. The translation of the set operations follows a path similar to the translation of the duplicate elimination: we need to find conditions under which the set operations on the encoding are equivalent to set operations on the abstract relations. Clearly a direct use of SQL/92 set operations does not preserve the semantics of SQL/TP queries.

Lemma 19 Let $Q_1$, $Q_2$ be time-compatible SQL/92 queries with common signature. Then $\|Q_1\| \;\mathit{op}\; \|Q_2\| = \|Q_1 \;\mathit{op}\; Q_2\|$ where op is one of union all, except all, or intersect all.

The above lemma extends to conditional queries: Let $Q_i\{\varphi_i\} \in \mathit{compile}(Q)$ for $i \in I$, $R_j\{\psi_j\} \in \mathit{compile}(R)$ for $j \in J$, and X the set of data attributes in the common signature. Then

Q union R $\mapsto$ $\{Q_i\{\varphi_i \wedge \neg\bigvee_{j \in J} \psi_j\},\; R_j\{\psi_j \wedge \neg\bigvee_{i \in I} \varphi_i\},\; Q'_i \text{ union } R'_j\{\varphi_i \wedge \psi_j\}\}$
Q intersect R $\mapsto$ $\{Q'_i \text{ intersect } R'_j\{\varphi_i \wedge \psi_j\}\}$

where $Q'_i = N_X[Q_i; Q_i, R_j]$ and $R'_j = N_X[R_j; Q_i, R_j]$. We can omit the normalization for the union operation. The duplicate-preserving operations are transformed analogously. The result of the translation can unfortunately be exponential in the depth of nesting of the original query. Note that this does not affect the data complexity of the query evaluation as the translation is performed at compile time. Moreover, for a large class of queries we can show:

Theorem 20 Let Q be a query composed of attribute independent subqueries with size at most k for a fixed constant $k \in \mathbb{N}$. Then $|\mathit{compile}(Q)| \in O(|Q|)$.

Thus small subqueries can violate the attribute independence requirement while still maintaining the bound on the size of the translated query. Using this result we can show that, e.g., the first-order temporal logic queries can be efficiently translated through SQL/TP: all temporal operators are translated into small fixed-size subqueries and views with exactly one temporal attribute [3]. A similar result holds for all TRA [7] based languages.
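The effect described by Lemma 19 and Figure 4 can be checked on a small example. The Python sketch below is an illustration only: it assumes concrete tuples (x, tmin, tmax) with closed integer intervals, and the helper names points, split, and except_all are hypothetical. It compares a bag difference computed on the normalized encoding with the bag difference of the point-based meanings.

```python
from collections import Counter

def points(rel):
    """Point-based meaning of an interval-encoded relation, as a bag of (x, t)."""
    return Counter((x, t) for x, tmin, tmax in rel for t in range(tmin, tmax + 1))

def split(target, others):
    """Cut every interval of `target` at the endpoints found in `others` (same x only)."""
    out = []
    for x, tmin, tmax in target:
        cuts = {c for y, a, b in others if y == x
                for c in (a, b + 1) if tmin < c <= tmax}
        lo = tmin
        for c in sorted(cuts):
            out.append((x, lo, c - 1))
            lo = c
        out.append((x, lo, tmax))
    return out

def except_all(r1, r2):
    """Bag difference applied to the encoding after both sides are normalized."""
    both = r1 + r2
    left, right = Counter(split(r1, both)), Counter(split(r2, both))
    return list((left - right).elements())
```

For S1 = [("a", 1, 10)] and S2 = [("a", 4, 6)], a difference taken directly on the encoding would return S1 unchanged, since the two concrete tuples differ; after normalization the result consists of the pieces 1..3 and 7..10, which is the point-based answer.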
View Definitions in SQL/TP. A side effect of this observation is a natural definition of views in SQL/TP. Again, we use the familiar SQL/92 syntax:

create view <name> ( <query> )

where <name> is a view name and <query> an SQL/TP query. According to Theorem 20, queries that can be expressed using views can be translated with only a linear increase in size.
4.3
Optimization of the Translation
It is easy to see that the normalization operation N is idempotent in a similar way as projection:

$Q_i(D) = N^{t}_{X}[Q_i; Q_1, \ldots, Q_k](D)$

for any queries $Q_1, \ldots, Q_k$ that are t-compatible on X. Note that in this case we are comparing the concrete representations rather than the images under $\|\cdot\|$. The optimizations are based on the following two lemmas:

Lemma 21 Let S be a set of SQL/92 queries t-compatible on X. Then all queries in the closure of S under $\pi_V$, $\sigma_{\varphi}$, and op are also t-compatible on X where $X \cup \{t\} \subseteq V$, op is a set operation, and $\varphi$ is a selection condition.

Lemma 22 If $Q_1, \ldots, Q_k$ are t-compatible on X then they are t-compatible on Y for all $Y \supseteq X$.

Thus these rules can be used to push and eliminate superfluous normalization steps. In addition, during the optimization phase we flatten nested simple select blocks created during the translation (using trivial attribute renaming), merge their where conditions (and check for satisfiability), and perform a "general" clean-up of the resulting query.
4.4
Updates
A naive implementation of the update operations is straightforward. However, an efficient execution model of database updates over the concrete databases is a topic for further research (cf. Section 5). A major step towards efficient updates in place would be the development of an in-place normalization technique for temporal relations. This way we could solve the update problem by normalizing (cf. Section 4) the updated relation with respect to the update query and then processing the actual update the same way SQL does. However, the normalization operation is costly; the hope is that the cost would amortize over time as the normalization is idempotent (cf. Section 4.3).
5
Conclusion
We have shown that a high-level point-based approach to temporal extension of SQL has many advantages over the common approaches that use interval-based attributes: simple syntax and semantics, meaningful aggregation, and possibilities of advanced query optimization. All this is achieved while maintaining efficient query evaluation over temporal databases based on interval encoding of timestamps. We have also shown that all representation independent TSQL2 queries are expressible in SQL/TP (follows from [20]).

Future Work. This proposal is only the first step towards the implementation of SQL/TP. There are still many open questions:

- Can we use more complex temporal domains? In our proposal we used a discrete linear order with a limited way of counting. Is it possible to use richer temporal domains while maintaining the properties of the proposed language? What are the tradeoffs?
- We have chosen an interval-based encoding of sets of time instants in the concrete data model. While this encoding can compactly describe periods of time, it fails, e.g., for periodic events. Is it possible to extend the encoding scheme and this way to enlarge the class of finitary temporal databases? Note that we have to be careful here: we still need to perform complementation and projection in a data-independent way during the translation of queries. Choosing too rich an encoding can prevent us from achieving this goal.
- What optimization techniques can be used in conjunction with our query translation procedure? We have provided only a high-level description of the compilation technique that allows SQL/TP queries to be evaluated over an interval-based temporal database. We also sketched several optimization techniques specific to this translation. However, we need to analyze how the resulting queries interact with the existing database optimizer and the underlying representation. This leads to questions of proper index selection, the use of spatial operations (spatial joins), etc.
- How do we perform updates efficiently? The area of updates presents a completely new set of problems, the main problem being the in-place updates of the encoded temporal relations. This problem goes hand in hand with defining various normal forms [6, 20] of temporal relations and enforcing them over updates.¹²
- Can the standard indices built into relational systems aid the query evaluation based on the proposed compilation technique? What are the tradeoffs compared to specialized indices?

There are few answers to these questions even for the established temporal query languages like TSQL2. Our technique allows us to reuse most of the efforts aimed towards boosting performance of temporal DBMS, e.g., the development of efficient temporal and spatial joins, and sophisticated access methods.

¹² In this chapter we did not assume any normal form for the temporal relations.
References
1. Abiteboul, S., Herr, L., Van den Bussche, J. Temporal versus First-Order Logic to Query Temporal Databases. Proc. ACM PODS 1996, 49-57, 1996.
2. Allen, J. F. Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11):832-843, 1983.
3. Böhlen, M. H., Chomicki, J., Snodgrass, R. T., Toman, D. Querying TSQL2 Databases with Temporal Logic. In Proc. EDBT'96, LNCS 1057, 325-341, 1996.
4. Böhlen, M. H., Jensen, C. S. Seamless Integration of Time into SQL. University of Aalborg, h t t p : / / ~ , cs. auc. dk/ boehlen/Softwaxe/T• gz, 1996.
5. Böhlen, M. H., Jensen, C. S., Snodgrass, R. T. Evaluating and Enhancing the Completeness of TSQL2. Technical Report TR 95-5, Computer Science Department, University of Arizona, June 1995.
6. Böhlen, M. H., Snodgrass, R. T., Soo, M. D. Coalescing in Temporal Databases. Proc. 22nd Int. Conf. on Very Large Databases, 180-191, 1996.
7. Clifford, J., Croker, A., Tuzhilin, A. On Completeness of Historical Relational Query Languages. ACM Transactions on Database Systems, Vol. 19, No. 1, 64-116, 1994.
8. Codd, E. F. Relational completeness of database sublanguages. In Rustin, R. (ed.) Courant Computer Science Symposium 6: Data Base Systems, 65-98, Prentice-Hall, 1972.
9. Chomicki, J. Temporal Query Languages: a Survey. Proc. International Conference on Temporal Logic, July 1994, Germany, Springer-Verlag (LNAI 827), 506-534.
10. Chomicki, J., Kuper, G. M. Measuring Infinite Relations. Proc. ACM PODS 1995, 78-85, 1995.
11. Chomicki, J., Goldin, D. Q., Kuper, G. M. Variable Independence and Aggregation Closure. Proc. ACM PODS 1996, 40-48, 1996.
12. Date, C. J., Darwen, H. A Guide to the SQL Standard (3rd ed.), Addison-Wesley, 1993.
13. IBM Database 2, SQL Reference for common servers. IBM Corp., 1995.
14. Jensen, C. S., Snodgrass, R. T., Soo, M. J. Unification of Temporal Data Models. Proc. 9th Int. Conf. on Data Engineering, 262-271, 1993.
15. Kanellakis, P. C., Kuper, G. M., Revesz, P. Z. Constraint Query Languages. Journal of Computer and System Sciences 51(1):26-52, 1995.
16. Ladkin, P. The Logic of Time Representation. PhD Dissertation, University of California, Berkeley, 1987.
17. Snodgrass, R. T. The Temporal Query Language TQuel. ACM Transactions on Database Systems, 12(2):247-298, June 1987.
18. Snodgrass, R. T. (editor). The TSQL2 Temporal Query Language. Kluwer Academic Publishers, 674+xxiv pages, 1995.
19. Snodgrass, R. T., Böhlen, M. H., Jensen, C. S., Steiner, A. Adding Valid Time to SQL/Temporal, ISO/IEC JTC1/SC21/WG3 DBL MAD-146r2 21/11/96, (change proposal).
20. Toman, D. Point-based vs. Interval-based Temporal Query Languages. Proc. ACM PODS 1996, 58-67, 1996.
21. Toman, D., Niwinski, D. First-Order Temporal Queries Inexpressible in Temporal Logic. Proc. EDBT'96, Apers, Bouzeghoub (eds.), LNCS 1057, 307-324, 1996.
22. Williams, H. P. Fourier-Motzkin Elimination Extension to Integer Programming Problems. In Journal of Combinatorial Theory (A) 21, 118-123, 1976.
23. Kabanza, F., Stevenne, J.-M., Wolper, P. Handling Infinite Temporal Data. JCSS 51(1): 3-17, 1995.
A
Syntax of SQL/TP Core Language
We summarize the syntax of SQL/TP by the following BNF grammar: <exp>
::=
I
i
::= create table ( <signat> ) I create view ( )
::= select all <s_list> from where group by I ( ) union all ( ) l ( ) except all ( ) I ( ) intersect all ( )
::= insert all delete all
<signat>
::= ,
<s_list>
: := <expr> as
::= {
: := . . . and ...
: := .
,
::= time using
<enc>
<expr>
::= .
<enc>
::= points
::= min I max I count I sum
into ( ) from ( ) <signat>
,<s_list> I ( ) } , +
I integer I real i char(N)
I ( .
I bounded intervals
i ...
) I ...
I unbounded intervals
where is a valid relation identifier, a valid attribute identifier, and a valid comparison type-compatible with the involved attributes.
B
Comparison with SQL/Temporal
Now we are ready to compare SQL/TP with the most current proposal of temporal extensions to the SQL-3 ISO/ANSI committees, SQL/Temporal [19]. The comparison is slightly complicated as SQL/Temporal comes in three separate flavors (with distinct semantics). These variants follow the migration requirements stated informally in Section 3.5 (for a full description see [19]).

TUC. The temporal upward compatibility semantics is based on mere restriction of all temporal relations to the current state. This way the query evaluation is essentially performed on a standard relational database. From this point of view, this mode doesn't involve any manipulation of the temporal attributes (and we include it here only for completeness).

Sequenced semantics. This is, in fact, the only mode in which the temporal dimension is truly managed by the database system. Unfortunately, the class of queries to which this semantics applies is very restricted: only plain SQL queries applied point-wise to the whole database are allowed. Moreover, this part of the semantics is based on a valid-time relational algebra (VTA), and the results in [1, 21] show that no single-dimensional temporal relational algebra can express all first-order queries (e.g., the existential closure of query 3 in Section 3.3). Thus even extending VTA with additional temporal operators, e.g., since and until, cannot achieve first-order completeness (contrary to the popular belief, e.g., [7]).

Nonsequenced semantics. In order to formulate general temporal queries (as none of the above semantics allows that), SQL/Temporal introduces the nonsequenced semantics. This semantics is indeed first-order complete. However, the completeness comes with a high price tag: the semantics is essentially based on a first-order logic over an interval-based temporal domain [20], rather than the individual time instants, and thus is plagued by all the problems discussed in Section 1.
The separate semantic layers may also cause quite a bit of confusion to the users: many operations, e.g., the set difference or aggregation, behave completely differently in the sequenced semantics and in the nonsequenced semantics. The sequenced semantics is defined with respect to the individual time instants (similarly to SQL/TP), while the nonsequenced semantics is defined with respect to the encoding of sets of time instants (intervals), and the user is asked to manipulate the representation of timestamps on her own (this can be clearly seen by formulating queries 2 and 3 from Section 3.3 in SQL/Temporal). From our point of view the second (sequenced) semantics is the most promising part of SQL/Temporal: the temporal information is handled correctly with respect to the underlying point-based semantics. We consider SQL/TP a fully expressive temporal extension of the sequenced semantics, rather than of the nonsequenced semantics. Section 3.5 supports this claim by showing how the sequenced semantics can be naturally embedded into SQL/TP. However, the encoding-dependent part of SQL/Temporal is solely connected with the nonsequenced semantics. We propose to replace this part of the language with SQL/TP: this change would lead to a natural extension of the sequenced semantics to all temporal queries.
Applicability of Temporal Data Models to Query Multilevel Security Databases: A Case Study

Shashi K. Gadia
Computer Science Department
Iowa State University
Ames, IA 50011
[email protected]
Abstract. In a multilevel security database there are multiple beliefs about a given real world object. The ability of a database model to accommodate multiple beliefs is termed polyinstantiation in the multilevel security literature. In this paper we remark that in an abstract sense polyinstantiation is a priori present in all models for temporal and spatial databases. In particular we investigate the applicability of the parametric model for temporal data to query multilevel security data and, as a case study, compare it to a model for multilevel security given by Winslett, Smith, and Qian. Index terms. Databases, relational databases, multilevel security, belief data, polyinstantiation, temporal databases, spatial databases, dimensional databases.
1 Introduction

Several models for temporal data have been proposed, for which [Ta+93] is an excellent reference. For the parametric model for temporal data that originated in [Ga88], [GY88] provided the concept of a key, and [BG90] provided an SQL-like language. A summary of the parametric model given in [GN93] has also appeared in [Ta+93]. A brief summary of the parametric model will also be given in this paper. Models for multilevel security have appeared in [BL75, CS95, DLS87, DLS88, GQ95, HOT91, JS90, JS91, LDS90, SW92, WSQ94]. In multilevel security there is a hierarchy of users or user levels, in which every user level has its own version of information. A user can see all information belonging to users at and below his/her level. On the other hand, the information belonging to a higher user level, or even the existence of such information or such user levels, is held confidential from the lower user levels. A model for a multilevel security database must be devoid of a sort of communication, called a covert channel, which can lead to a compromise of the user confidentiality. A simplistic use of the classical first normal form database model, where every value is atomic, is vulnerable to covert channels. This is shown in the following example.
Example 1. Imagine a classical emp relation as in Figure 1(a) with Name as its key. Postulate two user levels, upper and lower, and assume that John is known to be a fictitious person at both the user levels. Every day, a user at the upper level can leak one bit, 0 or 1, of some secret message to a user at the lower level as follows:

- On the day that the user at the upper level wants to leak the bit "0", he/she does nothing. On the day the user at the upper level wants to leak the bit "1", he/she inserts a fictitious record for John in the morning and deletes the record in the evening.
- Every afternoon, the user at the lower level tries to insert a (fictitious) record for John. On some days the insertion will go through, and on other days the system will reject the insertion as a violation of the key. If the insertion is confirmed by the system, the user at the lower level assumes that the bit "0" has been sent to him/her by the user at the upper level, and the lower-level user deletes the record just inserted. If the system rejects the insertion, the user at the lower level assumes that the bit "1" has been sent by the user at the upper level.

O. Etzion, S. Jajodia, and S. Sripada (Eds.): Temporal Databases - Research and Practice, LNCS 1399, pp. 238-256, 1998. © Springer-Verlag Berlin Heidelberg 1998
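The daily protocol of Example 1 can be replayed with a short Python simulation. It is a sketch under stated assumptions: the relation is modelled as a dictionary keyed by Name, the function name run_channel and the flag key_includes_level are hypothetical, and a rejected insertion is decoded as the bit 1, which is the reading consistent with the upper-level user inserting John's record in order to signal a 1.

```python
def run_channel(bits, key_includes_level=False):
    """Replay the protocol day by day; return the bits decoded at the lower level."""
    emp = {}  # key -> record; stands in for the emp relation

    def key(level):
        # With key_includes_level the key is (Name, user level) instead of Name.
        return ("John", level) if key_includes_level else "John"

    def insert(level):
        if key(level) in emp:
            return False              # rejected as a key violation
        emp[key(level)] = {"Name": "John", "inserted_by": level}
        return True

    decoded = []
    for bit in bits:
        if bit == 1:
            insert("upper")           # morning: the upper level signals a 1
        if insert("lower"):           # afternoon: the lower level probes
            decoded.append(0)
            del emp[key("lower")]     # the lower level removes its probe record
        else:
            decoded.append(1)
        emp.pop(key("upper"), None)   # evening: the upper level cleans up
    return decoded
```

With Name alone as the key, run_channel(bits) returns exactly the bits that were sent, so the channel works. When the key also contains the user level, the lower-level insertion is never rejected and the decoded sequence carries no information about the message.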
Name   Salary   Dept
John   50K      Toys
Tom    60K      Shoes
(a) A classical relation with Name as its key

Name   User   Salary   Dept
John   u1     50K      Toys
John   u2     80K      Toys
Tom    u2     60K      Shoes
Tom    u3     60K      Shoes
(b) A multilevel security relation with Name User as its key

Name   User   Salary   Dept
John   t1     50K      Toys
John   t2     80K      Toys
Tom    t2     60K      Shoes
Tom    t3     60K      Shoes
(c) A temporal relation with similar mathematical content as (b)

Figure 1. Polyinstantiation in multilevel security and temporal databases

Typically, in multilevel security literature the covert channel is avoided by adding a "User-level" column to a relation so that the key is not just the Name attribute, but rather the "Name and User-level" attributes put together. (See Figure 1(b).) With this arrangement, the system does not give an error message about duplication of a record. For this solution, the term polyinstantiation has been coined: polyinstantiation means the ability of a system to accommodate multiple beliefs about a real world object in the database. It turns out that there are two levels of polyinstantiation.
- U-polyinstantiation.¹ Under u-polyinstantiation it is assumed that a real world object has the same key under all beliefs, although the nonkey values may vary. For example, Name of an employee would be the same in all beliefs, but varying beliefs about salary and department may exist. Because an object value may be different at two instants of time or at two points in space, u-polyinstantiation is in a mathematical sense a priori present in any model of temporal or spatial databases. (See Figure 1(c).)
- Key-polyinstantiation. Key-polyinstantiation allows key as well as nonkey attribute values to vary across beliefs. Key-polyinstantiation subsumes u-polyinstantiation. The concept of a polykey, where an object may have several key values, was introduced in [BG89, GB89, BG90]² for temporal beliefs, and a brief discussion can be found in [GN93]. A discussion of key-polyinstantiation is beyond the scope of this paper. The key-polyinstantiation in multilevel security has been covered in [CG95, CG96].³

U-polyinstantiation seems to be the only form of polyinstantiation in multilevel security literature, where it is typically supported by having multiple tuples for a real-world object (see Figure 1(b)). As stated above, in a mathematical sense u-polyinstantiation is a priori present in any model of temporal or spatial databases. In some non-1NF models, u-polyinstantiation is captured at the tuple level. In addition, in the parametric model u-polyinstantiation is used as a keying mechanism for tuples: there is a one-to-one correspondence between objects in the real world and the u-polyinstantiated tuples. A tuple, or the corresponding real world object, is identified by its unikey,⁴ the unique key value. Because the parametric model captures u-polyinstantiation at the tuple level rather than at the relation level, it seems to provide a cleaner framework for multilevel security databases with u-polyinstantiation. In this paper we will consider [WSQ94] as a case study, and for this purpose we term the model presented there the WSQ model. This paper will focus on a comparison between the parametric and the WSQ frameworks for modeling and querying of multilevel security data.

The rest of this paper is organized as follows. Section 2 gives a brief introduction to the parametric model for temporal data. A user hierarchy for the parametric model is presented in Section 3. Section 4 shows how to adapt the parametric model and the user hierarchy to multilevel security. Section 5 introduces the WSQ model for multilevel security. Section 6 examines some characteristics of the two models for multilevel security. Section 7 exhaustively covers all queries in [WSQ94] and shows that they can be expressed more naturally in the parametric model. The conclusions are presented in Section 8.

¹ The prefix "u-" in "u-polyinstantiation" may be seen as an abbreviation of "uni"; the term "unipolyinstantiation" would sound odd, therefore we have coined the term "u-polyinstantiation".
² To the best of our knowledge, the concept of key-polyinstantiation was first introduced in [GB89], where no special term was used for it.
³ Belief data is covered extensively in our works available in a series of six Technical Reports of which [Ga97] serves as an index.
⁴ The term unikey is used in this paper for brevity and also to make a clear distinction from polykeys.
Note that there are an unusually large number of footnotes. Footnotes are necessary to keep the main text as easy to read as possible. However, let it be emphasized that the footnotes are an important and integral part of this paper.
2 The Parametric Model for Temporal Databases

The parametric model consists of a data type for time called temporal elements, attribute values, associative navigation ($A \theta B$), tuples, and relations. Our relations require a key to be designated with them. Finally, an algebra for the model will be introduced. The style of presentation is influenced by the need to make a clear comparison to the WSQ model. Let's assume that the universe of time T consists of instants $\{t_1, t_2, \ldots, t_n\}$. A temporal element is defined to be a finite subset of T. Note that no order properties are assumed for the set T.¹

A temporal value of an attribute A is defined to be a function from a temporal element into the domain of A. A temporal value is also called an attribute value or simply a value. An example of a temporal value of the attribute COLOR is ({t1} red, {t2} blue). $[\![A]\!]$ denotes the domain of a temporal value A. Thus $[\![(\{t_1\}\ \mathrm{red}, \{t_2\}\ \mathrm{blue})]\!] = \{t_1, t_2\}$. $A{\upharpoonright}\mu$ denotes the restriction of A to the temporal element $\mu$. Our counterpart of the construct $A \theta B$ for the relational model is $[\![A \theta B]\!]$, which is defined to be {t: A and B are defined at t, and $A(t)\,\theta\,B(t)$ is TRUE}, the set of instants where A is in $\theta$ relationship to B. $[\![A \theta B]\!]$ is a temporal element. For example, $[\![(\{t_1, t_3\}\ \mathrm{red}, \{t_2\}\ \mathrm{blue}) = (\{t_1, t_2\}\ \mathrm{blue})]\!] = \{t_2\}$. We also allow the construct $A \theta b$, where b is a constant, which is evaluated by identifying b with the value $(\{t_1, t_2, \ldots, t_n\}\ b)$.

A homogeneous tuple $\tau$ over a scheme R is a function from R such that for every attribute A in R, $\tau(A)$ is a temporal value of A and all the temporal values in the tuple have the same domain. Informally, we say that a tuple is a concatenation of temporal values whose temporal domains are the same. The assumption that all temporal values in a tuple have the same domain makes our tuples homogeneous. Suppose a tuple $\tau$ is given. Then the temporal domain of $\tau$ is the temporal domain of any of its attributes and is denoted by $[\![\tau]\!]$. A tuple is said to be void if its domain is empty. If $\mu$ is a temporal element, $\tau{\upharpoonright}\mu$ is obtained by restricting each value in $\tau$ to the temporal element $\mu$.
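The constructs of this section can be mirrored in a few lines of Python. This is a sketch, not part of the parametric model's definition: a temporal value is represented as a dictionary from instants to attribute values, and the function names domain, restrict, theta, and constant are hypothetical.

```python
import operator

def domain(value):
    """Temporal domain of a temporal value given as {instant: attribute value}."""
    return set(value)

def restrict(value, element):
    """Restriction of a temporal value to a temporal element (a set of instants)."""
    return {t: v for t, v in value.items() if t in element}

def theta(a, b, op):
    """Instants at which both values are defined and op(a(t), b(t)) is true."""
    return {t for t in a.keys() & b.keys() if op(a[t], b[t])}

def constant(b, universe):
    """A constant b identified with the value that is b at every instant."""
    return {t: b for t in universe}

# The worked example from the text: ({t1,t3} red, {t2} blue) = ({t1,t2} blue)
A = {"t1": "red", "t3": "red", "t2": "blue"}
B = {"t1": "blue", "t2": "blue"}
where_equal = theta(A, B, operator.eq)   # the temporal element {t2}
```

Comparison with a constant then reduces to the same function, e.g. theta(A, constant("red", {"t1", "t2", "t3"}), operator.eq).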
2.1 Relations

Not every set of tuples over a scheme $R$ is considered to be a relation. A relation $r$ over a scheme $R$, with $K \subseteq R$ as the key of $r$, is a finite set of non-void tuples such that no key attribute value in a tuple changes with time, and no two tuples match in all their key attributes. Sometimes, the key attributes will be underlined for emphasis. Figure 2 shows a database with a relation emp(Name Salary Dept) with Name as its key. The relation is a counterpart of the temporal relation of Figure 1 (c) in the parametric model.

Now suppose $r$ is a relation. The domain of $r$, denoted $\llbracket r \rrbracket$, is defined as the union of the domains of all tuples in $r$, i.e. $\llbracket r \rrbracket = \bigcup_{\tau \in r} \llbracket \tau \rrbracket$. Clearly, the domain of a relation is a temporal element. The restriction of $r$ to a temporal element $\mu$, denoted $r{\downarrow}\mu$, is defined in a natural manner. The snapshot of $r$ at an instant $t$, denoted $r(t)$, is defined to be $r{\downarrow}\{t\}$. $\llbracket \mathrm{emp} \rrbracket$, the domain of the emp relation of Figure 2, is $\{t_1, t_2, t_3\}$. The snapshot of the emp relation at instant $t_2$ is shown in Figure 3. The timestamp is not shown in this figure. Because of the homogeneity assumption, the snapshot of a temporal relation is isomorphic to a classical relation without nulls. In the parametric model, a database can be viewed as a parametrization of classical relations. Note that neither [WSQ94] nor this paper considers nulls.¹

¹ In general the universe of time and temporal elements can be more complex. For a clearer comparison with the WSQ model this simple definition suffices. The main property that characterizes temporal elements is their closure under union, intersection, and complementation.
Name            Salary          Dept
{t1,t2} John    {t1} 50K        {t1,t2} Toys
                {t2} 80K
{t2,t3} Tom     {t2,t3} 60K     {t2,t3} Shoes

Figure 2. emp relation of Figure 1 (c) in the parametric model

Name    Salary    Dept
John    80K       Toys
Tom     60K       Shoes

Figure 3. Snapshot of the emp relation at t = t2
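The relationship between Figure 2 and Figure 3 can be sketched in a few lines. This is a minimal illustration, assuming each attribute value is stored as a dict from instants to values; the names `EMP`, `tuple_domain`, `restrict_relation`, and `snapshot` are hypothetical and not part of the model's definition.

```python
# emp relation of Figure 2; each attribute value is a dict: instant -> value.
EMP = [
    {"Name": {"t1": "John", "t2": "John"},
     "Salary": {"t1": "50K", "t2": "80K"},
     "Dept": {"t1": "Toys", "t2": "Toys"}},
    {"Name": {"t2": "Tom", "t3": "Tom"},
     "Salary": {"t2": "60K", "t3": "60K"},
     "Dept": {"t2": "Shoes", "t3": "Shoes"}},
]

def tuple_domain(tup):
    """Homogeneity: all attribute values share one domain, so take any."""
    return frozenset(next(iter(tup.values())))

def restrict_relation(rel, element):
    """Restrict every tuple to `element`, dropping tuples that become void."""
    out = []
    for tup in rel:
        cut = {a: {t: v for t, v in val.items() if t in element}
               for a, val in tup.items()}
        if tuple_domain(cut):
            out.append(cut)
    return out

def snapshot(rel, t):
    """Classical relation (a set of rows) seen at instant t."""
    return {tuple(val[t] for val in tup.values())
            for tup in restrict_relation(rel, {t})}

print(sorted(snapshot(EMP, "t2")))
# [('John', '80K', 'Toys'), ('Tom', '60K', 'Shoes')]
```

Because the tuples are homogeneous, every attribute is defined at the chosen instant, so the snapshot never needs a null.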
Now let's present an algebra for the homogeneous relations. Our algebra includes three types of expressions: domain expressions, which evaluate to temporal elements; boolean expressions, which evaluate to boolean values (TRUE or FALSE); and relational expressions, which evaluate to relations. These three types of expressions are mutually recursive.
2.2 Domain Expressions

Domain expressions are the syntactic counterparts of temporal elements. They are formed using temporal elements (e.g., $[11,20] \cup [31,40]$), $\llbracket A \rrbracket$, $\llbracket A \theta B \rrbracket$, $\llbracket A \theta b \rrbracket$, $\llbracket e \rrbracket$, $\cup$, $\cap$, complementation (unary $-$), and (binary) $-$, where $A$ and $B$ are attributes, $b$ is a constant and $e$ is a relational expression. If $\mu$ is a domain expression and $\tau$ is a tuple, then $\mu(\tau)$, resulting from the substitution of $\tau$ in $\mu$, is a temporal element, and such substitution can be defined in a natural way. Following is an example of tuple substitution.

¹ [WSQ94] states: "Note that our formal treatment does not allow null values, just as ordinary relational algebra omits consideration of nulls. Null values may be included in a formal treatment by formalizing them in one of the many standard manners ..." The same remarks apply to the parametric model where the homogeneity assumption yields the counterpart of classical relations without nulls.
Example 2. Consider the domain expression $\llbracket \mathrm{Salary} = \mathrm{80K} \rrbracket$. For a given tuple this expression retrieves the time domain where salary is 80K. Suppose $\tau$ is John's tuple in Figure 2. Then $\llbracket \mathrm{Salary} = \mathrm{80K} \rrbracket(\tau)$ evaluates to $\{t_2\}$. As another example, consider the domain expression $\llbracket \mathrm{Salary} = \mathrm{80K} \rrbracket \cap -(\llbracket \mathrm{Dept} = \mathrm{Toys} \rrbracket \cup \llbracket \mathrm{Dept} = \mathrm{Shoes} \rrbracket)$. For a given employee, this expression retrieves the time domain consisting of instants where salary is 80K and the department is other than Toys or Shoes. For John's tuple, it evaluates to the empty set $\emptyset$. □

2.3 Boolean Expressions

Boolean expressions are syntactic counterparts of the boolean values TRUE and FALSE. They are formed using $\mu \subseteq \nu$, where $\mu$ and $\nu$ are domain expressions. More complex expressions are formed using $\wedge$, $\vee$, and $\neg$. Note that expressions of the form $\mu = \nu$, $\mu \neq \nu$, etc., can be derived using the above constructs. If $t$ is an instant of time, $\{t\} \subseteq \nu$ can be written as $t \in \nu$.

2.4 Parametric Syntactic Forms $\llbracket A \theta B \rrbracket$ and $A \theta B$

We have already introduced the syntactic form $\llbracket A \theta B \rrbracket$ for the parametric model. In the parametric model the syntactic form $A \theta B$, without the use of $\llbracket\ \rrbracket$, is given a different meaning: $A \theta B$ is defined to be an abbreviation for the boolean expression $\neg(\llbracket A \theta B \rrbracket \subseteq \emptyset)$, which simply says that there is at least one instant of time where $A$ is in $\theta$ relationship with $B$. Note that whereas $\llbracket A \theta B \rrbracket$ is a domain expression evaluating to a temporal element, $A \theta B$ is a boolean expression evaluating to TRUE or FALSE. Some important remarks about the parametric syntactic forms $\llbracket A \theta B \rrbracket$ and $A \theta B$ are now in order.

• The counterpart of the classical syntactic form $A \theta B$ in the parametric model is the parametric syntactic form $\llbracket A \theta B \rrbracket$ and not the syntactic form $A \theta B$.
• One of the uses of the parametric syntactic form $A \theta B$ is to identify objects. For example, "Name = John" is TRUE only for the first tuple in Figure 3.
• In a snapshot at an instant $t$, the distinction between the parametric syntactic forms $\llbracket A \theta B \rrbracket$ and $A \theta B$ essentially disappears. This is formalized in [BG93]. Therefore, the syntax in the parametric model is a consistent extension of that in the classical model.
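Tuple substitution in domain and boolean expressions can be sketched as function application. The sketch below assumes a three-instant universe and models each domain expression as a function from a tuple to a set of instants; the combinator names `eq`, `union`, `intersect`, `complement`, and `exists` are hypothetical.

```python
UNIVERSE = frozenset({"t1", "t2", "t3"})  # assumed universe of time

# John's tuple from Figure 2.
JOHN = {"Name": {"t1": "John", "t2": "John"},
        "Salary": {"t1": "50K", "t2": "80K"},
        "Dept": {"t1": "Toys", "t2": "Toys"}}

def eq(attr, b):
    """Domain expression [[attr = b]]: maps a tuple to a temporal element."""
    return lambda tup: frozenset(t for t, v in tup[attr].items() if v == b)

def union(mu, nu):
    return lambda tup: mu(tup) | nu(tup)

def intersect(mu, nu):
    return lambda tup: mu(tup) & nu(tup)

def complement(mu):
    return lambda tup: UNIVERSE - mu(tup)

def exists(mu):
    """Boolean form: not ([[A theta B]] is a subset of the empty set)."""
    return lambda tup: not mu(tup) <= frozenset()

salary_80k = eq("Salary", "80K")
elsewhere = intersect(salary_80k,
                      complement(union(eq("Dept", "Toys"), eq("Dept", "Shoes"))))

print(sorted(salary_80k(JOHN)))           # ['t2']
print(sorted(elsewhere(JOHN)))            # []
print(exists(eq("Name", "John"))(JOHN))   # True
```

The first two results match Example 2; the last line shows the boolean form being used to identify an object.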
2.5 Relational Expressions

Before introducing relational operators, the concept of weak equality among relations must be defined. Suppose $r$ and $s$ are relations over the same scheme. Then $r$ and $s$ are said to be weakly equal if $r$ and $s$ have the same snapshots, i.e., $r(t) = s(t)$ for all instants $t$. It is easy to show that if two weakly equal relations have the same key, then the relations are equal. In other words, to specify a relation uniquely it is enough to specify its snapshots and its key. In the following we assume that the key of $r$ is $K$, and the natural join is denoted as $\bowtie$.
Operator            Definition of Snapshot                                        Designation of Key
Stored relation r   $r(t)$                                                        Same as key of r
Union               $(e_1 \cup e_2)(t) = e_1(t) \cup e_2(t)$                      Same as key of r and s
Difference          $(e_1 - e_2)(t) = e_1(t) - e_2(t)$                            Same as key of r and s
Natural join        $(e_1 \bowtie e_2)(t) = e_1(t) \bowtie e_2(t)$                Union of keys of r and s
Projection          $(\Pi_X(e))(t) = \Pi_X(e(t))$                                 If $K \subseteq X$ then $K$, else $X$
1-3-selection       $(\sigma(e, , \phi))(t) = \sigma(e(t), , \phi)$               Same as key of r, explained below
The definitions of union, difference, natural join ($\bowtie$), projection and 1-3-selection¹ given above are completely precise.² As an example consider the definition of union, $(e_1 \cup e_2)(t) = e_1(t) \cup e_2(t)$, which shows how snapshots can be computed: a snapshot $(e_1 \cup e_2)(t)$ of the union is defined as $e_1(t) \cup e_2(t)$. The latter is well defined as it is essentially a union of two classical relations. Thus, we have completely specified the snapshots as well as the key of $e_1 \cup e_2$. Therefore, $e_1 \cup e_2$ is well defined.

The 1-3-selection $\sigma(e, , \phi)$ needs more explanation. A 1-3-selection is a special case of the selection of the form $\sigma(r, f, \phi)$, to be discussed below. In a 1-3-selection the second argument is left blank, and it is the operator in temporal databases that is the direct counterpart of the classical selection. In the 1-3-selection $\sigma(e, , \phi)$ the parameter $\phi$ is a domain expression. An example of the 1-3-selection is $\sigma(\mathrm{emp}, , \llbracket \mathrm{Dept} = \mathrm{Toys} \rrbracket \cup \llbracket \mathrm{Dept} = \mathrm{Shoes} \rrbracket)$. In temporal databases, it is a counterpart of the classical selection $\sigma(\mathrm{emp}, \mathrm{Dept} = \mathrm{Toys} \vee \mathrm{Dept} = \mathrm{Shoes})$.

The general form of a selection is $\sigma(r, f, \phi)$. It evaluates to $\{\tau{\downarrow}\phi(\tau) : \tau \in r,\ f(\tau) \text{ is TRUE, and } \tau{\downarrow}\phi(\tau) \text{ is not empty}\}$. If $f$ evaluates to TRUE for a tuple, $\phi$ allows us to select only a relevant part of it, which is specified by $\phi$. The key of $\sigma(r, f, \phi)$ is the same as the key of $r$.³
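Example 3. The query "give information about employees while they were in Toys or Shoes if they are currently employed" can be expressed as follows:

$\sigma(\mathrm{emp},\ \mathrm{NOW} \in \llbracket \mathrm{Name} \rrbracket,\ \llbracket \mathrm{Dept} = \mathrm{Toys} \rrbracket \cup \llbracket \mathrm{Dept} = \mathrm{Shoes} \rrbracket)$ □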
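The general selection of Example 3 can be sketched as follows. This is an illustration only: it assumes that NOW is the instant t3 (the paper does not fix a value), it uses the emp data of Figure 2, and the helper names `select`, `restrict_tuple`, and `dept_in` are hypothetical.

```python
NOW = "t3"  # assumption for illustration: the current instant is t3

EMP = [
    {"Name": {"t1": "John", "t2": "John"},
     "Salary": {"t1": "50K", "t2": "80K"},
     "Dept": {"t1": "Toys", "t2": "Toys"}},
    {"Name": {"t2": "Tom", "t3": "Tom"},
     "Salary": {"t2": "60K", "t3": "60K"},
     "Dept": {"t2": "Shoes", "t3": "Shoes"}},
]

def tuple_domain(tup):
    return frozenset(next(iter(tup.values())))

def restrict_tuple(tup, element):
    return {a: {t: v for t, v in val.items() if t in element}
            for a, val in tup.items()}

def select(rel, f, phi):
    """Keep each tuple for which f holds, restricted to phi(tuple), if non-void."""
    out = []
    for tup in rel:
        if f(tup):
            cut = restrict_tuple(tup, phi(tup))
            if tuple_domain(cut):
                out.append(cut)
    return out

def dept_in(*names):
    """Union of the domain expressions [[Dept = n]] for each n in names."""
    return lambda tup: frozenset(t for t, v in tup["Dept"].items() if v in names)

currently_employed = lambda tup: NOW in tuple_domain(tup)
result = select(EMP, currently_employed, dept_in("Toys", "Shoes"))
print([sorted(t["Name"].items()) for t in result])
# [[('t2', 'Tom'), ('t3', 'Tom')]]
```

Under the assumed NOW, John's tuple fails the boolean condition and is dropped as a whole, while Tom's tuple is kept and restricted to the instants at which he was in Shoes.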
The parametric model also includes an operator that allows a user to change the key of a relation [Ga88]. This operator has a very interesting interaction with the selection operator [GN93]. In this paper we will also use an SQL-like select statement for our model. It turns out that for a comparison with the WSQ model, only a simple form of the select statement, where the from list consists of a single relation, will be needed. In other words, the select statement to be used in this paper is of the form given below:

    select X
    from r
    restricted to $\phi$
    where f

A precise semantics of this form of select statement can be given easily in the parametric model in terms of the selection and projection operators: $\Pi_X\, \sigma(r, f, \phi)$. As in the definition of the selection operator, the "restricted to" clause limits the retrieval of a tuple $\tau$ to the temporal element computed by $\phi(\tau)$. Several examples of the select statement will follow later in the paper.

¹ The use of the term 1-3-selection is confined to this paper to make its relationship with [WSQ94] clearer. It is not a new operator in the parametric model.
² Note that the snapshot semantics of the relational operators given here is a theoretical one. A more pragmatic semantics of the relational operators can be given directly without invoking the snapshots. This point is dealt with in more detail in [Ga86].
³ The definition of the full form of selection cannot be given in terms of snapshots [Ga88].
3 Concept of User Hierarchy in the Parametric Model

In the previous section, we implicitly assumed that there is only one user for the parametric model. Such a user has access to the whole history, i.e., values in the database during the entire time $\{t_1, t_2, \ldots, t_n\}$. To facilitate a clear comparison with the WSQ model, we must introduce the concept of a user hierarchy in the parametric model. For the parametric model let's now hypothesize multiple users. Corresponding to every user $u$, we formally associate a temporal element in $\{t_1, t_2, \ldots, t_n\}$, called the domain of $u$, denoted as Dom($u$). When a user $u$ submits a query to a database, the system automatically restricts the database to Dom($u$) before processing the query. Clearly, the set theoretic containment among user domains creates a partial order among users. Formally, we say that a user $u_1$ is below $u_2$ in the user hierarchy if and only if $\mathrm{Dom}(u_1) \subseteq \mathrm{Dom}(u_2)$. A user hierarchy is shown in Figure 4.

          {t1,t2,t3}
           /      \
     {t1,t2}      {t2,t3}
           \      /
            {t2}

Figure 4. A user hierarchy for the parametric model

The concept of user hierarchy was introduced in [GB89]. There a useful and elaborate hierarchy has been given in a bitemporal model. The following covers an interesting example of users of the temporal model presented in the previous section.

Example 4. Imagine the temporal universe {1,2,...,NOW}, also denoted as the interval [0,NOW], where NOW is the current instant of time. Now imagine that we have a database used by a governmental agency, which declassifies information after 10 units of time. Consider the following community of users: system, public, analyzer, and classical, with user domains [0,NOW], [0,NOW-10], [NOW-4,NOW], and {NOW}, respectively. The system user can see the whole information, the public can only see information at least 10 years old, the analyzer has the last 5 years worth of information, and the classical user only sees the current information (as would be the case in a classical database). □
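The user community of Example 4 can be sketched directly with sets. The value chosen for NOW, the sample salary history, and the function names `interval`, `is_below`, and `visible` are assumptions made only for this illustration.

```python
NOW = 20  # hypothetical current instant; any value of at least 10 would do

def interval(lo, hi):
    """Temporal element for the closed interval [lo, hi] of integer instants."""
    return frozenset(range(lo, hi + 1))

DOM = {
    "system": interval(0, NOW),
    "public": interval(0, NOW - 10),
    "analyzer": interval(NOW - 4, NOW),
    "classical": frozenset({NOW}),
}

def is_below(u1, u2):
    """u1 is below u2 in the hierarchy iff Dom(u1) is a subset of Dom(u2)."""
    return DOM[u1] <= DOM[u2]

def visible(value, user):
    """What `user` sees of a temporal value (dict: instant -> value)."""
    return {t: v for t, v in value.items() if t in DOM[user]}

# A made-up salary history, defined from instant 5 up to NOW.
salary = {t: ("50K" if t < 15 else "80K") for t in range(5, NOW + 1)}

print(is_below("classical", "analyzer"), is_below("public", "analyzer"))  # True False
print(visible(salary, "classical"))  # {20: '80K'}
```

The hierarchy is only a partial order: public and analyzer are both below system, but neither is below the other.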
4 Parametric Model for Multilevel Security

The parametric model for temporal data discussed in the previous sections can easily be adapted to multilevel security. The terms instant and temporal element are changed to user level and user element, respectively. Corresponding to the universe of time $\{t_1, t_2, \ldots, t_n\}$ in the temporal case is the universe of user levels $\{u_1, u_2, \ldots, u_n\}$ in multilevel security. The relation of Figure 1 (a) in the parametric model for multilevel security will be as shown in Figure 5.

Name            Salary          Dept
{u1,u2} John    {u1} 50K        {u1} Toys
                {u2} 80K        {u2} Toys
{u2,u3} Tom     {u2,u3} 60K     {u2,u3} Shoes

Figure 5. A relation corresponding to Figure (b) for multilevel security
Note that in the parametric model for temporal data we did not impose any order properties on the instants. Clearly, the parametric model and its query language do not depend upon the order properties. In other words, if an order is imposed on the instants, it does not change the underlying model. The parametric model is generic, that is, it mainly depends upon the set theoretic primitive $\subseteq$ on parametric elements (temporal elements and user elements).
4.1 The User Hierarchy in Multilevel Security

The primitive $\subseteq$ on parametric elements leads to the user hierarchy introduced in the previous section. The user hierarchy gives different users access to different portions of the database. In the parametric model a user $u_1$ is below $u_2$ in the user hierarchy if and only if $\mathrm{Dom}(u_1) \subseteq \mathrm{Dom}(u_2)$, where $\mathrm{Dom}(u_1)$ and $\mathrm{Dom}(u_2)$ are the domains assigned by the system to the users $u_1$ and $u_2$. In multilevel security one encounters a special (less general) case of the user hierarchy. The difference is that in multilevel security, the domains are more rigidly determined by the system. A partial order $\le$ among the user levels is postulated and $\mathrm{Dom}(u)$ is defined as $\{u' : u' \le u\}$. The following property holds in the user hierarchy:
Proposition 1. If $u_1$ and $u_2$ are user levels, then $u_1 \le \ldots$
user domains. In such a case, the user $u$ is enrolled at an existing user level and no new user level is created. The alternative would be to choose $\mathrm{Dom}(u)$ as a union of some of the existing user domains. This is simply a way of saying that the new user is enrolled at a level that is immediately above the users whose domains have been unioned. One more condition should be added to complete the requirements for the case of multilevel security: $\mathrm{Dom}(u)$ must contain the level assigned to $u$, allowing a user to access his/her own data.

In summary, whereas in a temporal database one has the freedom to enroll any number of users, assigning them arbitrary domains, the corresponding assignment of user domains in multilevel security is more constrained. The fundamental requirement is that a user in multilevel security should be able to see his/her own updates made to the database. In this sense, multilevel security is a special case of the temporal case, and not the other way around.

An interesting feature of the parametric approach is that $\mathrm{Dom}(u)$ can be integrated in the algebra as a primitive for domain expressions, added to the set of existing primitives $\llbracket A \rrbracket$, $\llbracket A \theta B \rrbracket$, $\llbracket A \theta b \rrbracket$, and $\llbracket e \rrbracket$. More complicated domain expressions can be formed using $\cup$, $\cap$, and $-$. This integrates the concept of a user tightly and seamlessly into the model, making the concept of user an object of complex queries. From now onward the terms user and user level will be used interchangeably, and no confusion should arise. A few additional primitives useful for querying the parametric model for multilevel security will be added.

• me. When a user $u$ poses a query, the system interprets "me" as $u$.
• Below($u'$). When a user $u$ poses a query, Below($u'$) is interpreted as $\mathrm{Dom}(u') - \{u'\}$.
• Above($u'$). When a user $u$ poses a query, and $u'$ is visible to $u$, then Above($u'$) is interpreted as $\mathrm{Dom}(u') - \{u'\}$.

In order to present a simple but intuitive example, assume that the relational algebra contains a relational expression of the form "r", where r is a relation in the stored database.
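How user domains follow from a partial order on levels can be sketched as below. The three-level chain, the table `LOWER` of immediate predecessors, and the function names are assumptions for illustration; only Dom and Below are sketched.

```python
# Hypothetical three-level chain u1 <= u2 <= u3, given by immediate predecessors.
LOWER = {"u1": [], "u2": ["u1"], "u3": ["u2"]}

def dom(u):
    """Dom(u) = {u' : u' <= u}: u together with everything reachable below it."""
    seen = {u}
    for v in LOWER[u]:
        seen |= dom(v)
    return frozenset(seen)

def below(u):
    """Below(u) = Dom(u) - {u}."""
    return dom(u) - {u}

print(sorted(dom("u3")))         # ['u1', 'u2', 'u3']
print(sorted(below("u2")))       # ['u1']
print(dom("u1") <= dom("u2"))    # True
```

Since Dom(u) always contains u itself, every user can see his/her own data, and the order on levels coincides with containment of the corresponding domains.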
Example 5. To adapt the running example to multilevel security, we assume the set of users $\{u_1, u_2, u_3\}$ such that $u_1 < u_2$ and $u_2 < u_3$. Suppose the user $u_2$ wants to see the current state of the emp relation. To do this he/she executes the query "emp". The query retrieves the result shown in Figure 6. □

Name         Salary      Dept
{u2} John    {u2} 80K    {u2} Toys
{u2} Tom     {u2} 60K    {u2} Shoes

Figure 6. The result of the query "emp" posed by user u2
5 The WSQ Model for Multilevel Security¹

In the WSQ model, a universe $\{u_1, u_2, \ldots, u_n\}$ of users together with $\le$ is postulated. For a given database scheme, each user in the hierarchy has his/her own level's instance of the database. In addition, each user also owns two relations: "self" and "anyone", each a single column relation over an attribute called "Label". The self relation for user $u$ consists of the single tuple

emp                          self      anyone
Name    Salary    Dept       Label     Label
Tom     60K       Shoes      u3        u3
                                       u2
                                       u1

(a) The database fragment at level u3

emp                          self      anyone
Name    Salary    Dept       Label     Label
John    80K       Toys       u2        u2
Tom     60K       Shoes                u1

(b) The database fragment at level u2

emp                          self      anyone
Name    Salary    Dept       Label     Label
Tom     60K       Shoes      u1        u1

(c) The database fragment at level u1

Figure 7. A database in the WSQ model corresponding to Figure 5.
5.1 Classical Relational Operators

In the parametric model, when a user $u$ poses a query, the system filters the emp relation to $\mathrm{Dom}(u)$, the domain visible to $u$. Thus by default the system is set to query information belonging to $u$ as well as information belonging to users lower than $u$. On the other hand, in the WSQ model when a user $u$ poses a query to the system, the system executes the query only on the instance of the database available to user $u$. Thus a user can perform classical operators on the relations owned by him/her.

¹ For ease of reading, we make the following notational changes in syntax: the projection $e[X]$ will be denoted as $\Pi_X(e)$, the selection $e[\phi]$ will be denoted as $\sigma(e, \phi)$, and the level shift operator $B_{e_2} e_1$ (explained later in this section) will be denoted as $e_1{\uparrow}e_2$.

Name    Salary    Dept
John    80K       Toys
Tom     60K       Shoes
John    50K       Toys

(a) The relation computed by $\mathrm{emp}{\uparrow}\mathrm{anyone}$

Name    Salary    Dept     User
John    50K       Toys     u1
John    80K       Toys     u2
Tom     60K       Shoes    u2
Tom     60K       Shoes    u3

(b) The relation computed by $(\mathrm{emp} \times \mathrm{self}){\uparrow}\mathrm{anyone}$

Figure 8. Examples of level shift operator in the WSQ model

5.2 The Level Shift Operator

In addition to the classical operators, the WSQ model introduces an operator called the level shift operator. To ease the formalism associated with the level shift operator, let's first introduce a notation: if $e$ is a relational expression and $u$ is a user level, then $e{\uparrow}u$ denotes the relation computed by the expression $e$ with the data available only at level $u$. The level shift operator is of the form $e{\uparrow}U$, where $e$ is any relational expression and $U$ is a single column relational expression containing tuples that are user levels.¹

$e{\uparrow}U = \bigcup_{(u) \in U} (e{\uparrow}u)$

First $e$ is evaluated at every level in $U$, and then the relations thus obtained are unioned together.
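The level shift operator can be sketched as a union over per-level evaluations. The sketch below is an illustration, not the WSQ implementation: the dictionary `DB`, the function `level_shift`, and the representation of expressions as Python functions are assumptions, and the emp instance at level u1 is taken to be John/50K/Toys, as Figures 5 and 8 imply.

```python
# Per-level instances; rows are classical tuples. The u1 instance of emp is
# assumed to be John/50K/Toys, consistent with Figures 5 and 8.
DB = {
    "u3": {"emp": {("Tom", "60K", "Shoes")},
           "self": {("u3",)},
           "anyone": {("u3",), ("u2",), ("u1",)}},
    "u2": {"emp": {("John", "80K", "Toys"), ("Tom", "60K", "Shoes")},
           "self": {("u2",)},
           "anyone": {("u2",), ("u1",)}},
    "u1": {"emp": {("John", "50K", "Toys")},
           "self": {("u1",)},
           "anyone": {("u1",)}},
}

def level_shift(expr, levels):
    """Evaluate expr at every level in `levels` and union the results."""
    out = set()
    for (u,) in levels:
        out |= expr(DB[u])
    return out

emp = lambda db: db["emp"]
emp_x_self = lambda db: {row + label for row in db["emp"] for label in db["self"]}

anyone_at_u3 = DB["u3"]["anyone"]
print(len(level_shift(emp, anyone_at_u3)))         # 3
print(len(level_shift(emp_x_self, anyone_at_u3)))  # 4
```

Shifting emp alone loses the source level of each row, whereas pairing each row with self before the shift keeps it, which is why the second result has one more row than the first.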
Example 7. Let's consider a few examples to illustrate the use of the self and anyone relations and the level shift operator $\uparrow$. In all these examples let's assume that the queries are being posed by the user $u_3$.

• The query "emp" returns the state of the emp relation at user level $u_3$.
• The query "$\mathrm{emp}{\uparrow}\{(u_2)\}$" returns the state of the emp relation at user level $u_2$.
• The query "$\mathrm{emp}{\uparrow}\mathrm{anyone}$" is more interesting. To understand it, note that anyone at level $u_3$ is the relation $\{(u_3), (u_2), (u_1)\}$. Therefore, the query computes the (ordinary) union $(\mathrm{emp}{\uparrow}u_3) \cup (\mathrm{emp}{\uparrow}u_2) \cup (\mathrm{emp}{\uparrow}u_1)$. For the given state of the database, this computation leads to the relation shown in Figure 8 (a).
• The above query disregards the source levels in the final result. If this information is desired, instead of the query "$\mathrm{emp}{\uparrow}\mathrm{anyone}$", one can pose the query "$(\mathrm{emp} \times \mathrm{self}){\uparrow}\mathrm{anyone}$", which gives rise to the relation shown in Figure 8 (b). Note that the information contained in this relation is the same as that in the relation of Figure 1 (b), which is the counterpart of a temporal relation with tuple label stamping. □

¹ To be exact, $U$ is a relational expression which should evaluate to a single column relation containing tuples that are user levels.

As seen above, there are two types of operators in the WSQ model: the classical operators and the level shift operator. The classical operators obviously satisfy the classical identities. For the level shift operator, [WSQ94] lists several identities; the following is a sampling:¹

• $\Pi_X(e){\uparrow}\mu = \Pi_X(e{\uparrow}\mu)$
• $\sigma(e, \phi){\uparrow}\mu = \sigma(e{\uparrow}\mu, \phi)$
• $(e_1 \cup e_2){\uparrow}\mu = (e_1{\uparrow}\mu) \cup (e_2{\uparrow}\mu)$
• $(e_1 - e_2){\uparrow}\mu \supseteq (e_1{\uparrow}\mu) - (e_2{\uparrow}\mu)$
• $e{\uparrow}(\mu_1 \cup \mu_2) = e{\uparrow}\mu_1 \cup e{\uparrow}\mu_2$

6 Discussion
This section includes some general remarks about the WSQ model. Recall the expression $U$ participating in the level shift operator $e{\uparrow}U$. Corresponding to the expression $U$ in the WSQ model, the parametric model has domain expressions. These domain expressions are composed from the primitives $\mathrm{Dom}(u)$, the visibility domain of a user; $\llbracket A \rrbracket$, $\llbracket A \theta B \rrbracket$, $\llbracket A \theta b \rrbracket$, $\llbracket e \rrbracket$; and the operators $\cup$, $\cap$, and $-$. Conceptually, the domain expressions are very simple: they evaluate to user domains. Note in particular that $e$, in the primitive $\llbracket e \rrbracket$, is an arbitrary relational expression. Therefore the domain expressions are recursively composed from other relational, domain, and boolean expressions. The syntax associated with the domain expressions is also very simple when compared to expressions such as $U$ in the WSQ model. (Several examples will be given in the next section.)

In the WSQ model there is a lack of uniformity between the stored and computed relations. For a given database scheme, there is an instance of that database scheme at each user level in the stored database. On the other hand, the computed relations are not placed anywhere in the user hierarchy. Even if the relation computed by $e{\uparrow}U$ were placed at the level of the user posing the query, a corresponding instance would not exist at other levels. This tends to make the level shift operator a terminal operator: that is, once it is applied, it cannot be applied again. It is difficult to use a computed relation and a stored relation as subqueries in a larger query. In contrast, in the parametric model the relational scheme is the same as the one in classical databases, and for each relation scheme in the database scheme there is only one relation in the database. No expression in the parametric model is terminal, in the sense that every expression can be used as a subquery of a larger query.

¹ A full discussion of the cross product for the parametric model would require considerable machinery and is omitted from this paper. Therefore, we have not listed an identity involving the cross product in [WSQ94].
The results of a level shift operator can be unexpected. As evidence, we observe that $(e_1 - e_2){\uparrow}\mu$ can be a proper superset of $(e_1{\uparrow}\mu) - (e_2{\uparrow}\mu)$. This is shown in the following counterexample.

Example 8. Consider the database scheme $\{r(A), s(A)\}$. Suppose that the instances of $r(A)$ and $s(A)$ at different levels are as follows:

$u_3$: $\{a,b\}$ and $\emptyset$, respectively.¹
$u_2$: $\{a,b\}$ and $\{a\}$, respectively.
$u_1$: $\{a,b\}$ and $\{b\}$, respectively.

Given the above, we have

$(r - s){\uparrow}\mathrm{anyone} = (\{a,b\} - \emptyset) \cup (\{a,b\} - \{a\}) \cup (\{a,b\} - \{b\}) = \{a,b\}$
$r{\uparrow}\mathrm{anyone} = \{a,b\} \cup \{a,b\} \cup \{a,b\} = \{a,b\}$
$s{\uparrow}\mathrm{anyone} = \emptyset \cup \{a\} \cup \{b\} = \{a,b\}$
$r{\uparrow}\mathrm{anyone} - s{\uparrow}\mathrm{anyone} = \emptyset$

Therefore, $(r - s){\uparrow}\mathrm{anyone}$ is a proper superset of $r{\uparrow}\mathrm{anyone} - s{\uparrow}\mathrm{anyone}$. □

In contrast, in the parametric model the relational difference operator will always behave as expected. The reason for this is that in the parametric model the user level can never be separated from a value; thus, the distinction between a value such as 55K at two different user levels is not ignored by the system.

6.1 The Restriction Operator in the Parametric Model

Now it is time to introduce an operator that comes closest to the level shift operator of WSQ. Recall the 1-3-selection, $\sigma(e, , \phi)$, for the parametric model for multilevel security. Here $\phi$ is a domain expression, and as explained above $\phi$ is very versatile: it consists of subqueries that are relational, domain, and boolean expressions. We will use the abbreviation $e{\downarrow}\phi$ for $\sigma(e, , \phi)$.² The operator $e{\downarrow}\phi$ is called the restriction operator. The following identities can be proved for the restriction operator:

• $\Pi_X(e){\downarrow}\mu = \Pi_X(e{\downarrow}\mu)$
• $\sigma(e, \phi){\downarrow}\mu = \sigma(e{\downarrow}\mu, \phi)$
• $(e_1 \cup e_2){\downarrow}\mu = (e_1{\downarrow}\mu) \cup (e_2{\downarrow}\mu)$
• $(e_1 - e_2){\downarrow}\mu = (e_1{\downarrow}\mu) - (e_2{\downarrow}\mu)$
• $e{\downarrow}(\mu_1 \cup \mu_2) = e{\downarrow}\mu_1 \cup e{\downarrow}\mu_2$

It is appropriate to think of $e{\downarrow}\phi$ in the parametric model as the counterpart of the level shift operator $e{\uparrow}U$. The reader should be cautioned that the two operators are duals of each other because of the way defaults work in the two models. In the parametric model, when a user poses a query, the query is executed for the whole database, and if the user wants to restrict the computation to a level $u$, every operand relation should be restricted to $u$ by the user. On the other hand, in the WSQ model when a user $u$ poses a query, it is evaluated for the data at level $u$. If the user wants to involve the data at additional levels, he/she should use "${\uparrow}U$" explicitly in the query. This difference by itself is not a shortcoming of either of the two models.

The identities for the $e{\downarrow}\phi$ operator in the parametric model stated above are a direct counterpart of those in the WSQ model. In particular, observe that in the parametric model the identity $(e_1 - e_2){\downarrow}\mu = (e_1{\downarrow}\mu) - (e_2{\downarrow}\mu)$ holds. Thus the difference operator in the parametric model is well behaved. We note that the $e{\downarrow}\phi$ operator in the parametric model works cleanly in every conceivable context. Consider the following remarks about the level shift operator $e{\uparrow}U$ in the WSQ model.

• It is stated in [WSQ94] that because of the unary nature of the relation $U$ in the level shift operator $e{\uparrow}U$, $U$ cannot be involved in a projection, a selection or a cartesian product. In the parametric model these possibilities do not arise because the domain expressions are not relations; they are simply time domains. In addition, the syntax they lead to is simple, powerful, and uniform.
• It is stated in [WSQ94] that $U$ cannot involve the difference operator. In the parametric model no such problem arises: $e{\downarrow}(\mu_1 - \mu_2)$ is allowed, and the natural identity $e{\downarrow}(\mu_1 - \mu_2) = e{\downarrow}\mu_1 - e{\downarrow}\mu_2$ holds.
• It is also stated in [WSQ94] that a cascade of level shift operators does not give rise to interesting identities in general. This is not a problem in the parametric model, where the natural identity $(e{\downarrow}\mu_1){\downarrow}\mu_2 = e{\downarrow}(\mu_1 \cap \mu_2)$ holds.

¹ Strictly speaking $\{a,b\}$ should be written as $\{(a),(b)\}$.
² In [Ga88] $e{\downarrow}\phi$ was written as e~. Note the use of the down arrow in $e{\downarrow}\phi$, as opposed to the up arrow in the level shift operator $e{\uparrow}U$. The arrow directions seem appropriate: the restriction operator $e{\downarrow}\phi$ removes information from $e$, whereas the level shift operator $e{\uparrow}U$ adds information to $e$.
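The contrast drawn in Example 8 can be checked mechanically. The following sketch is illustrative: the per-level dictionaries `R` and `S`, the conversion `to_parametric`, and the value-to-user-element encoding of a single-attribute relation are assumptions of this sketch, not constructs from either model.

```python
from itertools import combinations

LEVELS = ["u1", "u2", "u3"]

# WSQ view: one classical instance of r and s per level (Example 8).
R = {"u3": {"a", "b"}, "u2": {"a", "b"}, "u1": {"a", "b"}}
S = {"u3": set(), "u2": {"a"}, "u1": {"b"}}

def shift(instances, levels):
    """Union of the per-level instances over the given levels."""
    return set().union(*(instances[u] for u in levels))

diff_then_shift = shift({u: R[u] - S[u] for u in LEVELS}, LEVELS)
shift_then_diff = shift(R, LEVELS) - shift(S, LEVELS)
print(sorted(diff_then_shift), sorted(shift_then_diff))  # ['a', 'b'] []

# Parametric view: each value carries its user element (value -> set of levels).
def to_parametric(instances):
    out = {}
    for u, rows in instances.items():
        for v in rows:
            out.setdefault(v, set()).add(u)
    return {v: frozenset(d) for v, d in out.items()}

def difference(r, s):
    """Snapshot-wise difference: remove the levels at which s also has the value."""
    out = {v: d - s.get(v, frozenset()) for v, d in r.items()}
    return {v: d for v, d in out.items() if d}

def restrict(r, mu):
    """Restriction to the user element mu, dropping void tuples."""
    out = {v: d & mu for v, d in r.items()}
    return {v: d for v, d in out.items() if d}

r, s = to_parametric(R), to_parametric(S)
subsets = [frozenset(c) for n in range(4) for c in combinations(LEVELS, n)]
print(all(restrict(difference(r, s), mu)
          == difference(restrict(r, mu), restrict(s, mu))
          for mu in subsets))  # True
```

In the parametric encoding the difference keeps track of the levels at which each value survives, so restricting before or after the difference gives the same result for every user element.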
7 Querying the Multilevel Security Data

In the following we exhaustively cover all queries from [WSQ94].¹ Let's compare how these queries are expressed in the WSQ and the parametric models. We find that the parametric model, where the queries are usually simpler than those in the WSQ model, has a distinct advantage. Recall that the constant me stands for the user who submits a query. We will now use the variable Users to denote the space of all user levels.
Example 9. List my belief about John's department. For this query, the expressions in the algebras of the WSQ and parametric models are given below. The expressions illustrate the difference in the defaults used in the two models. In the WSQ model the query is executed only on the data at the level where the query is submitted. On the other hand, in the parametric model it would be executed on the whole database, necessitating explicit restriction to me.

WSQ model: $\Pi_{\mathrm{Dept}}\, \sigma(\mathrm{emp}, \mathrm{Name} = \mathrm{John})$
Parametric model: $\Pi_{\mathrm{Dept}}\, \sigma(\mathrm{emp}, \mathrm{Name} = \mathrm{John}, \mathrm{me})$

¹ Note that all the queries in this section have appeared in [WSQ94]. They have been adapted to the emp relation, our running example.
Example 10. The query list all users who believe in the existence of John is expressed in the two algebras as follows:

WSQ model: $\Pi_{\mathrm{Label}}((\sigma(\mathrm{emp}, \mathrm{Name} = \mathrm{John}) \times \mathrm{self}){\uparrow}\mathrm{anyone})$
Parametric model: $\llbracket \sigma(\mathrm{emp}, \mathrm{Name} = \mathrm{John}) \rrbracket$

Example 11. The query list the beliefs about John's department is expressed in the SQL-like languages of the WSQ and the parametric models as follows.

WSQ model:
    select Dept, Label
    from emp, self
    believed by anyone
    where Name = John

Parametric model:
    select Dept
    from emp
    where Name = John

Example 12. List the names of all employees anyone believes to exist.

WSQ model: $\Pi_{\mathrm{Name}}(\mathrm{emp}){\uparrow}\mathrm{anyone}$
Parametric model: $\Pi_{\mathrm{Name}}(\mathrm{emp})$

Example 13. Consider the query list all names everyone believes to exist. This query is expressed in the algebra of the WSQ model as follows.

WSQ model: $\Pi_{\mathrm{Name}}(\mathrm{emp}) - \Pi_{\mathrm{Name}}(\Pi_{\mathrm{Name}}(\mathrm{emp}) \times \mathrm{anyone} - ((\Pi_{\mathrm{Name}}(\mathrm{emp}) \times \mathrm{self}){\uparrow}\mathrm{anyone}))$

The above expression is complex because it has to handle the quantifier "for all" ($\forall$) at the relation level. However, note that the English query does not involve quantification at the relational level, but only at the object level. Though relational level quantifications would be complex in the algebra for the parametric model, the object level quantification would not. In the algebra of the parametric model it is expressed as follows.

Parametric model: $\Pi_{\mathrm{Name}}\, \sigma(\mathrm{emp},\ \ldots = \mathrm{Users},\ )$

Example 14. The query list employee names believed to exist at my level but at no level below me is expressed in the two models as follows.

WSQ model:
    select Name
    from emp
    where Name not in (
        select Name
        from emp
        believed by (select Label
                     from anyone
                     where Label not in (
                         select from self)))

Parametric model:
    select Name
    from emp e¹
    where $\llbracket e \rrbracket$ = me
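The parametric side of Examples 10 and 14 can be sketched against the relation of Figure 5. This is a hedged illustration: the domains in `DOM` assume the chain u1 < u2 < u3, the helper names are hypothetical, and Example 14's condition is read here as "the visible domain of the tuple is exactly {me}".

```python
# Figure 5 relation: attribute values are dicts user level -> value.
EMP = [
    {"Name": {"u1": "John", "u2": "John"},
     "Salary": {"u1": "50K", "u2": "80K"},
     "Dept": {"u1": "Toys", "u2": "Toys"}},
    {"Name": {"u2": "Tom", "u3": "Tom"},
     "Salary": {"u2": "60K", "u3": "60K"},
     "Dept": {"u2": "Shoes", "u3": "Shoes"}},
]
DOM = {"u1": {"u1"}, "u2": {"u1", "u2"}, "u3": {"u1", "u2", "u3"}}

def visible(rel, me):
    """Restrict the relation to Dom(me), dropping void tuples."""
    out = []
    for tup in rel:
        cut = {a: {u: v for u, v in val.items() if u in DOM[me]}
               for a, val in tup.items()}
        if cut["Name"]:
            out.append(cut)
    return out

def believers(rel, me, name):
    """Example 10: the user element where an employee called `name` exists."""
    return {u for tup in visible(rel, me)
            for u, v in tup["Name"].items() if v == name}

def names_only_at_my_level(rel, me):
    """Example 14 (as read here): names whose visible domain is exactly {me}."""
    return [next(iter(tup["Name"].values())) for tup in visible(rel, me)
            if set(tup["Name"]) == {me}]

print(sorted(believers(EMP, "u3", "John")))  # ['u1', 'u2']
print(names_only_at_my_level(EMP, "u1"))     # ['John']
print(names_only_at_my_level(EMP, "u3"))     # []
```

Because all information about one employee sits in a single tuple, both queries reduce to a test on that tuple's domain; no union over levels or nested subquery is needed.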
Example 15. Now we consider the query list all names I do not believe exist but some lower users do. In the two models this query is expressed as follows.

WSQ model:
    (select Name
     from emp
     believed by anyone)
    minus
    (select Name
     from emp
     believed by self)

Parametric model:
    select Name
    from emp e
    where me $\notin \llbracket e \rrbracket$

In the WSQ model the following SQL-like expression is mentioned, and it is stated that the expression will not work for the given query.

    select Name
    from emp
    believed by anyone
    where Name not in (
        select Name
        from emp
        believed by self)

As seen above, the SQL query for the parametric model is simpler, and the user may not feel a need for such a complex expression form because in the parametric model the information about a single object resides in a single tuple. Even if such information resided in different tuples, the query in SQL for the parametric model would be as follows, and the problem stated in [WSQ94] would not arise.

    select e.Name
    from emp e
    where e.Name $\downarrow$ e.owner not in (
        select e'.Name
        restricted to e.owner
        from emp e'
        where me $\in \llbracket e' \rrbracket$)
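The parametric query of Example 15 amounts to a membership test on each tuple's domain. The sketch below assumes the chain u1 < u2 < u3 and takes the tuple domains from Figure 5; the names `EMP_DOMAINS`, `DOM`, and `not_believed_by_me` are hypothetical.

```python
# Domain of each employee's tuple, read off Figure 5.
EMP_DOMAINS = {"John": {"u1", "u2"}, "Tom": {"u2", "u3"}}
# Dom(u) for the assumed chain u1 < u2 < u3.
DOM = {"u1": {"u1"}, "u2": {"u1", "u2"}, "u3": {"u1", "u2", "u3"}}

def not_believed_by_me(me):
    """Names visible to `me` whose domain does not contain `me` itself."""
    return sorted(name for name, d in EMP_DOMAINS.items()
                  if d & DOM[me] and me not in d)

print(not_believed_by_me("u3"))  # ['John']
print(not_believed_by_me("u2"))  # []
```

For the user at u3, John is believed only at the lower levels u1 and u2, so he is returned; Tom is believed at u3 and is not.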
8 Conclusions

This paper has shown a fundamental relationship that exists between the parametric model and multilevel security databases. It has also shown how the parametric model for temporal data readily adapts to multilevel security. In this venture, the only changes in the temporal model are as follows:

• Change of the term instant (of time) to the term user (or user level)
• Change of the term temporal element to user element
• Derivation of the user hierarchy in multilevel security as a special case of the user hierarchy in a generic parametric model.

An exhaustive comparison between the WSQ and the parametric models under u-polyinstantiation was given. It has been found that the parametric model leads to a simpler query language than that in the WSQ model. In [GY88, GN93] it has been shown that whereas the query languages in the parametric model handle the natural language constructs "or", "and", and "not" symmetrically, the languages that use tuple timestamps do not achieve this symmetry. The same arguments in [GY88, GN93] would reveal a lack of symmetry in the WSQ model as well as in other models in the multilevel security literature.

Another advantage of the parametric model is that it leads to a seamless integration of ordinary, temporal, spatial, and belief data. This integration in the parametric model would be much tighter than the integration of temporal and multilevel security data in [PM94]. The identities in the parametric model and algebraic optimization have been discussed in [NG92]. That approach to algebraic optimization also applies to the parametric model for multilevel security.

Lastly it must be remembered that the data in the real world has a more complex structure than 1NF. If our language seems simpler, it is because it has imitated the "real" structure rather than tamper with it to fit the 1NF mold. In fact, the structure of multilevel security data is far more complex than what is covered in this paper in terms of u-polyinstantiation. Key-polyinstantiation represents the true form of polyinstantiation, and this form of polyinstantiation has been covered extensively in our works during the last decade. In particular, key-polyinstantiation in multilevel security has been covered in [CG97a, CG97b].

¹ Note "emp e" creates alias e of the emp relation; this is not a cross product of emp and e.

ACKNOWLEDGMENTS. The author wishes to thank Suraj Kothari, Giora Slutzki, Akhilesh Tyagi, Marianne Winslett, anonymous referees, and Opher Etzion for their helpful comments in improving this paper.
REFERENCES

BG89 Gadia, Shashi K. and Gautam Bhargava. A 2-dimensional temporal relational database model for querying errors and updates, and for achieving zero information-loss. Technical Report 89-24, Department of Computer Science, Iowa State University, Ames, Iowa, December 1989.
BG90 Gautam Bhargava and Shashi K. Gadia. The concept of an error in a database: an application of temporal databases. Appeared in Proceedings of INSDOC COMAD'90 International Conference on Management of Data, December 1990. Also available as Tech. Report TR97-15, Computer Science Department, Iowa State University, Ames, 1997.
BG93 Bhargava, Gautam and Shashi K. Gadia. Relational database systems with zero information loss. IEEE Transactions on Knowledge and Data Engineering, Vol 5, pp 76-87, 1993.
BL75 Bell, D.E. and L. J. LaPadula. Secure computer systems: unified exposition and multics interpretation. Tech Report MTR-2997, MITRE, 1975.
CS95 Chen, Fang and Ravi S. Sandhu. The semantics and expressive power of MLR data model. Proceedings of IEEE Symposium on Security and Privacy, 1988.
DLS87 D. Denning, T. Lunt, R. Schell, et al. A multilevel relational data model. Proceedings of IEEE Symposium on Security and Privacy, pp 220-234, 1987.
DLS88 D. Denning, T. Lunt, R. Schell, W. R. Shockley and M. Heckman. The Seaview security model. Proceedings of IEEE Symposium on Security and Privacy, 1988.
Ga86 Gadia, Shashi K. Weak temporal relations. ACM Transactions on Database Systems, 13(4), pp 418-448, December 1988.
Ga88 Gadia, Shashi K. A homogenous relational model and query languages for temporal databases. ACM Transactions on Database Systems, 13(4):418-448, December 1988.
Ga97 Gadia, Shashi K. A bibliography and index of our works on belief data: concept of error and multilevel security. Tech. Report TR97-13, Computer Science Department, Iowa State University, Ames, 1997.
GB89 Gadia, Shashi K. and Gautam Bhargava. A formal treatment of errors and updates in a relational database. 1988-89. An unpublished manuscript, available as Tech. Report TR97-14, Computer Science Department, Iowa State University, Ames, 1997.
GC95 Tsz Shing Cheng and Shashi K. Gadia. An algebra for belief persistence in multi-level security. Version 1, 1995. An unpublished manuscript, available as Tech. Report TR97-16, Computer Science Department, Iowa State University, Ames, 1997.
GC96 Tsz Shing Cheng and Shashi K. Gadia. An algebra for belief persistence in multi-level security. A revised version of 3 incorporating some findings from 4. Version 2, 1996. An unpublished manuscript, available as Tech. Report TR97-18, Computer Science Department, Iowa State University, Ames, 1997.
GN93 Gadia, Shashi K. and Sunil Nair. Temporal databases: A prelude to parametric data. In Ta+93, pp 28-66.
GQ95 Gong, L. and X. Qian. Enriching the expressive power of security labels. IEEE Transactions on Knowledge and Data Engineering, Vol 7, pp 839-841, 1995.
GY88 Gadia, Shashi K. and Chuen-Sing Yeung. A generalized model for relational temporal databases. ACM SIGMOD International Conference on Management of Data, 1988, pp 251-259.
GY91 Gadia, Shashi K. and Chuen-Sing Yeung. Inadequacy of Interval Timestamps in Temporal Databases. Information Sciences, Vol 54, pp 1-22, 1991.
HOT91 Haigh, J., R. O'Brien, and D. Thomsen. The LDV secure relational DBMS model. Database Security IV, pp 265-279, 1991.
JS90 Jajodia, S. and R. Sandhu. Polyinstantiation integrity in multilevel relations. Proceedings of IEEE Symposium on Research in Security and Privacy, pp 104-115, 1990.
JS91 Jajodia, S. and R. Sandhu. Toward a multilevel secure relational data model. Proceedings of ACM-SIGMOD, pp 50-59, 1991.
LDS90 Lunt, T. F., D. E. Denning, R. R. Schell, M. Heckman, and W.R. Shockley. The Seaview security model. IEEE Transactions on Software Engineering, Vol 16, pp 593-607, 1990.
NG92 Nair, Sunil and Shashi K. Gadia. Algebraic optimization in a relational model for temporal databases. Proc. First International Conference on Information and Knowledge Management, pp 169-176, 1992.
PMP94 Pissinou, Niki, Kia Makki, and E. K. Park. Towards a framework for integrating secure models and temporal databases. Proc. of Third International Conference on Information and Knowledge Management, 1994, pp 280-287.
SW92 Smith, K. and M. Winslett. Entity modeling in the MLS relational model. Proceedings of Eighteenth VLDB, pp 199-210, 1992.
Ta+93 Tansel, Abdullah, et al, Eds. Temporal Databases: Theory, Design, and Implementation. Benjamin/Cummings, Redwood City, California, 1993, pp 28-66.
WSQ94 Winslett, Marianne, Kenneth Smith, and Xialei Qian. Formal query languages for secure relational databases. ACM Transactions on Database Systems, Vol 19, 1994, pp 626-662.
An Architecture and Construction of a Business Event Manager *

Ajit K. Patankar 1,** and Arie Segev 2

1 Department of Industrial Engineering and Operations Research, University of California at Berkeley, Berkeley, CA 94720, and Information and Computing Sciences Division, Lawrence Berkeley National Laboratory
ajit@math.lbl.gov
2 Walter A. Haas School of Business, University of California at Berkeley, and Information and Computing Sciences Division, Lawrence Berkeley National Laboratory
segev@math.lbl.gov
Abstract. A business event is a happening of interest in the business world. A business event represents a much higher level of abstraction than the event paradigm used in systems such as graphical user interfaces or database triggers. This paper argues that business event management requires services such as persistence, auditing, and sharing, which are not provided in a satisfactory manner in current systems. This paper presents the architecture and construction of a novel business event manager which supports the above services. The event manager supports external, database and temporal events in a uniform fashion. A SQL-like language is proposed as an interface to the event manager. This language provides a consistent interface for defining event metadata and event detection. As event histories can naturally be modeled as time series, temporal database services are utilized for storing event metadata and histories. Furthermore, we show that the proposed framework is capable of supporting a rich variety of temporal events such as context-dependent and relative temporal events.
* This work was supported by the Applied Mathematical Sciences Research Program of the Office of Energy Research, U.S. Department of Energy under Contract DE-AC03-76SF00098.
** Current affiliation: Center for Strategic Technology, Andersen Consulting, Palo Alto, CA.
O. Etzion, S. Jajodia, and S. Sripada (Eds.): Temporal Databases - Research and Practice, LNCS 1399, pp. 257-280, 1998. © Springer-Verlag Berlin Heidelberg 1998
1 INTRODUCTION
An event is defined as a happening of interest. It is widely recognized that events provide a powerful mechanism for modeling and implementing complex systems. Nevertheless, the use of the event paradigm has been limited to the operating system or user interface level. For example, most of the graphical user interfaces on Windows NT/95 operating systems use an event based state transition model. This research is aimed at exploiting the power and flexibility of the event paradigm at the business system level. Such applications are described in MD89, CS93,SMP95. These event driven business systems require the following services: persistence, auditing, and sharing. These services are either completely lacking or can only be obtained in an indirect fashion in existing active databases. For example, in commercial databases Ill95 users cannot explicitly define a business event and can achieve persistence only by including additional data manipulation statements in the action part of a rule.

This research makes the following contributions:

- A new event framework is proposed that identifies database, external, and temporal events as an exhaustive classification. Schedulability is identified as an important event property that significantly simplifies the event detection process. The framework argues that event histories should be modeled as temporal objects.
- The event framework has been implemented using an event manager which uses an object-relational database. The event manager includes an algorithm based on event queues for the detection of schedulable events.
- A SQL-like language is proposed as an interface to the event manager. This language provides a consistent interface for defining event metadata and event detection. This uniform language also supports composite, context-dependent, and relative events.
- The framework supports a wide range of temporal events and constraints, such as rules in which the condition may include future but scheduled events.

The rich functionality of the event manager allows the registration and detection of events such as:

- An external event - Machine break-down or increase in interest rate.
- A database event - Select on a table or update to an object.
- A temporal event - Any SQL time stamp or every 2 hours after a particular time stamp.
- A context-dependent or relative event - Last working day of every month or a temperature measurement event constrained to occur between the start and end of a furnace operation.
- Composite events - A sequence of 3 consecutive increases in interest rates or attempts to access a confidential database table on a weekend.
The objective of this research is to facilitate the development of powerful event driven application systems. This research separates the following research issues: event definition in active databases, precise semantics of rule systems (e.g., conflict or sequence resolution), and detection and storage of events. This separation is essential because active databases at present do not meet the requirements of application systems such as Computer Integrated Manufacturing or Financial Information Systems SMP95,CS93.

The research proposes an event language which can be used to register, detect or cancel any class of event. The language and the implementation mechanism support a wide range of features required for capturing the complete semantics of event-driven modeling. One such feature is the dynamic specification of the event quantifier operator. The conventional method for specifying an active database rule includes conditions such as Once or Every, i.e., the rule is to be invoked either once or on every future occurrence of the event. The user is responsible for cancelling the rule or the event subscription. In the proposed language, a function is executed to determine if the future occurrences of the event are relevant. The Once or Every specifications are trivial sub-cases of the general event specification framework.

In the proposed framework, the database is entirely responsible for detecting the database events and informing the event manager. External events can be detected in two ways: asynchronously and by polling. External applications are responsible for notifying the event manager of the occurrence of asynchronous external events. The event manager is responsible for detecting all temporal events and the pollable external events. These events are referred to as the schedulable events because their occurrence can be scheduled in advance. We introduce the notion of a renewal process for schedulable events. An Event Queue is proposed as a method for implementing (or detecting) these schedulable events.
Whenever a schedulable event is registered or detected, an appropriate Future Event is inserted on the event queue if necessary. Thus, each schedulable event may renew itself by inserting a future event of its own class. This notion ensures that all future events are detected, and not just the first occurrence of an event. This scheme utilizes minimum storage and provides dynamic repeatability.

The event manager architecture provides support for a wide range of temporal events and constraints. Context dependent and relative temporal events can be supported, which has not been possible in existing active databases. Context dependence of temporal events refers to the calendar points over which an event may occur. Relative ordering refers to constraints on two events such as BEFORE, AFTER, etc. This mechanism also makes it possible to specify powerful rules such as, "Two hours before the start of each working day, turn the furnace on."

The storage of event metadata and histories was facilitated by the use of Illustra/Informix Universal Server, which is a commercially available object-relational database. In particular, the implementation used the Illustra time series datablade, which provided limited but important temporal support. Rather than starting from scratch, the event manager takes an evolutionary viewpoint and extends a conventional active DBMS (i.e., one in which rules are strictly defined over data tables) into a true event-driven system. Thus, the event manager takes advantage of the DBMS services to the extent possible and adds active elements to them only if necessary.

The rest of the paper is organized as follows: An event management framework is presented in Section 2. The event manager architecture is described in Section 3. The event specification language is presented in Section 4. Section 5 describes the implementation, including the event metadata and history storage and the event queue modules. Section 6 describes the support for temporal reasoning. The related work is discussed in Section 7. The paper concludes in Section 8.

2 EVENT MANAGEMENT FRAMEWORK
This section rigorously defines an event management framework. The framework is based on notions of application specific calendars, time series vectors, and a set of event operators.

2.1 Event Model
The event model uses the notion of an application specific calendar and hence it is described first.

Calendar. A calendar is defined as a set of points on the time line. The granularity is a fundamental property of a calendar, and is defined as the distance between two consecutive points in the calendar SS87. The granularity need not be constant but must be a well defined function. A calendar also has associated with it a begin point and an end point, which default to the system start time and infinity respectively. It is assumed that a function ConvGregorian is implemented for each Calendar which converts a Calendar point to a Gregorian time stamp.

Events. An event is a happening of interest and occurs instantaneously GJS92. We model an event $\mathcal{E}$ as a tuple having the structure $\mathcal{E} = \langle s, C, (t, a) \rangle$ where $s$, $C$ and $(t, a)$ respectively represent the surrogate or event identifier, the Calendar associated with the event, and the instantiation of the event history vector. The event history vector is a time series that is updated by every occurrence of an event. The model specifies the atomicity (all or nothing) property for event occurrences. It is also reasonable to assume that no two events occur at the same time. Hence, we do not allow simultaneous events as proposed in MZ95. An alternate definition is given in Ter94, who considers an event as an interval of time. However, this definition is not proper for the following reasons: (1) an interval oriented event can always be defined as a composite event using the begin_interval and end_interval basic events; (2) the storage of interval based event histories may be infeasible, as a large (potentially infinite) number of the begin_interval points may have to be recorded; (3) unless efficient algorithms are developed, at every end_interval point, all the unmatched begin_intervals would have to be inspected.
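The calendar and event definitions above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and method names are hypothetical, the granularity is taken to be constant (the framework allows any well defined function), and the ordering check on occurrences is an assumed way of enforcing that no two occurrences share a time stamp.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, List, Optional, Tuple


@dataclass
class Calendar:
    """A set of points on the time line (hypothetical, constant granularity)."""
    begin: datetime                  # begin point of the calendar
    granularity: timedelta           # distance between consecutive points
    end: Optional[datetime] = None   # None stands for "infinity"

    def conv_gregorian(self, point: int) -> datetime:
        """Convert the point-th calendar point to a Gregorian time stamp."""
        stamp = self.begin + point * self.granularity
        if point < 0 or (self.end is not None and stamp > self.end):
            raise ValueError("point lies outside the calendar")
        return stamp


@dataclass
class Event:
    """The tuple <s, C, (t, a)>: surrogate, calendar, and history vector."""
    surrogate: str
    calendar: Calendar
    history: List[Tuple[datetime, Any]] = field(default_factory=list)

    def record(self, t: datetime, a: Any) -> None:
        # Assumption: occurrences arrive in time order and never share a stamp.
        if self.history and t <= self.history[-1][0]:
            raise ValueError("occurrences must be strictly increasing in time")
        self.history.append((t, a))
```

With an hourly calendar starting at 08:00, `conv_gregorian(3)` yields 11:00 on the same day, and each call to `record` extends the history vector by one `(t, a)` pair.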
Basic or Primitive Events. Basic or primitive events are a set of pre-defined events recognized by the event manager. These events cannot be inferred from any other event in the system. Previous literature CKAK94 simply assumed the availability of a mechanism for the detection of basic events. However, in this paper, we propose an algorithm for the definition and detection of all events.
Composite Events. A composite event is formed by a logical or temporal operation on a set of events. This set may include all previously defined composite events but not the event that is currently being defined. The framework does not support a recursive definition of composite events as the precise syntax of such an operation cannot be evaluated. Moreover, we have not encountered a business modeling scenario that requires recursive composite events.
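As an illustration of a composite event derived from the history of a primitive event, the sketch below detects the "3 consecutive increases in interest rates" example from the introduction. It is only a sketch under stated assumptions: the primitive event is assumed to carry the rate as its single attribute, and the function name and the tuple layout are hypothetical.

```python
from typing import List, Tuple

# One occurrence of the primitive event: (time stamp, attribute value).
Occurrence = Tuple[float, float]


def consecutive_increases(history: List[Occurrence], n: int = 3) -> List[float]:
    """Return the time stamps at which the composite event fires.

    The composite event fires at every occurrence that completes a run of
    at least n consecutive increases of the recorded value.
    """
    fired = []
    run = 0  # length of the current run of increases
    for (_, prev), (t, cur) in zip(history, history[1:]):
        run = run + 1 if cur > prev else 0
        if run >= n:
            fired.append(t)
    return fired
```

The definition refers only to an already recorded primitive history, so it stays within the non-recursive form of composite events described above.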
Event Attributes. Each primitive event may have an optional number of attributes. Conceptually, there are no restrictions on the data type of an attribute. It can be a built-in data type or even a complex object. The attributes of a composite event are the union of the attributes of each component event. Note that the selection of event parameters leads to different definitions of an event. Consider an event Machine_Breakdown with parameter machine id, and two occurrences of the event - (07:45, Furnace1), (08:10, Furnace2). An alternative system design is to define two separate events, one for each Furnace, i.e., Machine_Breakdown_Furnace1 and Machine_Breakdown_Furnace2.

Rule System. Rules in this framework are assumed to be of the form: On event do action. It is the responsibility of the event manager to notify the rule system about occurrences of interesting events in the system. Further issues in rule systems, such as an execution model or priority scheme, are orthogonal to the current research problem.
2.2 Event Vector and Algebra
An event vector (EV) is a sequence of events. Formally, an EV is represented as $EV\langle (A, T) \rangle$ where $A$ are the values of the event attributes and $T$ is a temporal vector. An event may have multiple attributes and it is assumed that each attribute is recorded (but may not have changed from the previous instantiation) at each event point. Thus, null as an event attribute is not allowed.
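A minimal Python sketch of such an event vector follows. The class and its methods are hypothetical; the sketch only shows the two properties stated above (every attribute is recorded at every event point, and nulls are rejected), plus a simple aggregate over one attribute of the kind used by the event vector algebra.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


class EventVector:
    """An event vector EV<(A, T)>: attribute tuples paired with time stamps."""

    def __init__(self, n_attributes: int) -> None:
        self.n_attributes = n_attributes
        self.times: List[float] = []            # the temporal vector T
        self.attrs: List[Tuple[Any, ...]] = []  # one attribute tuple per point

    def append(self, t: float, values: Sequence[Any]) -> None:
        # Every attribute must be recorded at every event point; no nulls.
        if len(values) != self.n_attributes or any(v is None for v in values):
            raise ValueError("each attribute must be recorded and non-null")
        self.times.append(t)
        self.attrs.append(tuple(values))

    def snapshot(self, i: int) -> Tuple[Tuple[Any, ...], float]:
        """The i-th element of the vector, counting from 0."""
        return self.attrs[i], self.times[i]

    def aggregate(self, f: Callable[[List[Any]], Any], k: int,
                  last: Optional[int] = None) -> Any:
        """Apply f to the k-th attribute over all points, or the last few."""
        if last is not None and last <= 0:
            raise ValueError("last must be positive")
        column = [a[k] for a in self.attrs]
        return f(column if last is None else column[-last:])
```

Restricting the aggregate to the last few points is what allows older history to be truncated without changing the result.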
Event Vector Algebra
We introduce an event vector algebra that facilitates the derivation of composite events.

The Notation. An array notation is used to identify the $i$th element of an event vector. For example, $EV\langle (A, T) \rangle_i$ will provide the $i$th snapshot of the event vector. This snapshot will be an n-ary vector $\mathcal{A}$ of the event attributes. The first index of the event vector is 0.

Event Attributes. An EV may have $n$ parameters and each one is accessed by a subscript notation - $A_0, A_1, A_2, \ldots, A_n$. It is possible that a particular EV may have no attributes, i.e., the modeler may be interested in only storing the time points at which the event occurs. This EV is denoted by $\langle (T) \rangle$.

Aggregate Functions. The event manager supports built-in aggregate functions as these are essential for deriving new events from basic event vectors. Formally, an aggregate function ($F$) defined on the $k$th attribute of the EV is as follows. Let there be $i = 0, 1, 2, \ldots, I$ events in an EV,

$$F(\forall i \ \text{s.t.}\ i \le I,\ \langle A_k, T \rangle_i)$$

Multiple aggregate functions can be defined over the same attribute. It is possible to generalize the above definition in at least two directions: a function can be defined over multiple attributes of an EV (e.g., $\langle A_{k_1}, A_{k_2}, \ldots, T \rangle$ in the above definition), and not all time series elements need to be the function parameters (e.g., $i$ can be restricted to $I$, $I - 1$, $I - 2$ only). We have not experienced a need for the first type of extension. But the second extension simply leads to the truncation of event history so as to save memory.

2.3 Event Classification
Basic Events are classified in the following categories:

- Database Events. A database event occurs when there is a database operation such as select, insert, or delete. It is also possible to generalize the definition to include operations on metadata. In this research, we are not particularly concerned about the question of what constitutes a database event. It will be assumed that the DBMS is responsible for reporting the occurrence of a database event to the event manager.
- External Events. External events occur outside the realm of the DBMS. These may include events such as machine break-down in a manufacturing application or a stock price jump in a financial application. Notification of an external event can occur through only two means - polling or asynchronous notification. In the polling mechanism, the event manager polls (or invokes) an external application to determine if an event has occurred. In the asynchronous notification scheme, the external application is responsible for notifying the event manager.
- Temporal Events. A temporal event occurs at a specified point in time. For example, 8/29/30 4:30pm or every 2 hours are time events. The event manager is responsible for self-notifying time events. This mechanism is described in a later section.
- Calendar Events. Although calendar events are a subset of temporal events, these are classified separately because of their usefulness in implementing application systems. A calendar event occurs at a specific point on the calendar. For example, Last Working Day in June or Every Wednesday are calendar events. The event manager also implements the notification scheme for Calendar events.
Schedulability Property. All temporal and external pollable events are schedulable in the sense that their time of occurrence can be predicted a priori. This property is called schedulability and is used extensively in the implementation of the event manager. But note that it is not necessary that an external event will be detected in each polling cycle.
3 EVENT MANAGER ARCHITECTURE
The event manager is an executable program that accepts event registration and detection commands from application programs. This program includes a parser for translating the event specification language, the event queue for detecting schedulable events, and a composite event detection algorithm. Whenever an event is detected, either through the event queue implementation or by a detect command, the following actions are undertaken: (1) an entry in the event history table is inserted, (2) the rule manager is informed, (3) the composite event detection algorithm is informed, if required, and (4) a new schedulable event is inserted in the event queue, if necessary. In the rest of the section, these functions are described in detail. The event manager architecture is shown in Figure 1. Its components are described next.
3.1 Event Manager Components
Event Registration and Detection Language

The event specification language is described in Section 4. This language is the interface through which users register, cancel and detect events.
[Figure: block diagram of the event manager. Recoverable labels: Composite Events, External Events (Poll, Asynch Notification), Clock, Calendar, Temporal Events, Event Notification, Rule Manager.]
Fig. 1. The Event Manager Architecture.
Event Metadata and History Storage

The event metadata is generated from the commands used in the event language. The metadata is modeled by the database schema while the history is modeled as a time series. The event history and metadata are stored in an object-relational database that supports time series as a native data type. The event history refers to the sequence of events that have occurred. It is important to store the event history because it is required for detecting composite events.

Schedulable Event Implementation Module

The event manager is responsible for detecting all schedulable events. A future event queue is maintained for detecting these events. The queue is itself stored as a time series which can be queried by users. We introduce a notion of Event Renewal for all schedulable events. Whenever a schedulable event is defined, the first occurrence of this event is inserted in the event queue. At the time of its detection, this event renews itself in the sense that the next occurrence is again inserted in the event queue. This process continues until either a user cancels the event or the continuation function returns false.
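The renewal scheme can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not the paper's implementation (which stores the queue as a database time series): the class and method names are hypothetical, each event is assumed to recur with a fixed positive period, and cancellation is handled lazily.

```python
import heapq
from typing import Callable, Dict, List, Tuple


class EventQueue:
    """Future event queue for schedulable events, ordered by time."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, str]] = []
        self._period: Dict[str, float] = {}
        self._repeat: Dict[str, Callable[[], bool]] = {}

    def register(self, name: str, first: float, period: float,
                 repeat: Callable[[], bool]) -> None:
        # Only the first occurrence is queued when the event is defined.
        self._period[name] = period
        self._repeat[name] = repeat
        heapq.heappush(self._heap, (first, name))

    def cancel(self, name: str) -> None:
        # A cancelled event is skipped when it reaches the head of the queue.
        self._period.pop(name, None)
        self._repeat.pop(name, None)

    def run_until(self, now: float) -> List[Tuple[float, str]]:
        """Detect every queued event due by `now`, renewing each as it fires."""
        detected = []
        while self._heap and self._heap[0][0] <= now:
            t, name = heapq.heappop(self._heap)
            if name not in self._repeat:
                continue  # cancelled by a user
            detected.append((t, name))
            if self._repeat[name]():
                # Renewal: queue the next occurrence of the same event.
                heapq.heappush(self._heap, (t + self._period[name], name))
            else:
                self.cancel(name)
        return detected
```

At any moment the queue holds at most one future occurrence per registered event, which is the sense in which the scheme uses minimum storage.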
External Process Communicators

The event manager communicates with external processes to detect external events as well as to implement the event queue. It is assumed that this communication is through standard operating system services, and it is not discussed further in this paper.

Composite Event Detection Algorithm

The composite event detection module is distinct from the primitive event detection system. This scheme has two advantages: modularity in the implementation and the possibility of introducing additional composite event operators without having to modify the detection of primitive events. The Sentinel active OODB system CKAK94 also proposes a distinct composite event module.
4 EVENT SPECIFICATION LANGUAGE
The event specification language is used to register and detect all events. Note that, unlike active databases, database events are also explicitly registered and detected using the language. Since the event manager is constructed using the database services, this may appear as an unnecessary overhead. However, a strict separation between the event originating and managing systems is essential for avoiding many problems in current active database systems such as a lack of robustness and inconsistent operations. The event specification language closely follows the SQL syntax and constructs. In particular, this language is intended as a superset of the rule specification language of the emerging SQL-3 standard.

4.1 Notation and Keywords
The language uses the create and delete statements from the SQL language, and introduces an additional detect statement. The create statement is used to register an event using a set of keywords described later in the section. The delete statement only cancels the detection of an event; the event history is not purged. The detect statement is used by either an external application or the database to notify the occurrences of an event. Note that this statement is not required for schedulable events as the event manager itself assumes the responsibility for their detection. A formal grammar is given in Appendix A. This is followed by several examples.

The most important keyword is EVENT, which is similar to the keywords RULE or FUNCTION in the SQL-3 language. This keyword, along with the standard SQL commands such as create or delete, is used to register or cancel an event. Other keywords are described below:

- CALENDAR. Specifies the calendar that is associated with a particular event. As our event framework (Section 2) requires that a calendar be associated with each event, this is a mandatory keyword in a create statement.
- TEMPORAL. Specifies a temporal event.
- DATABASE. Specifies a database event.
- EXTERN. Specifies an external event.
- COMPOSITE. This is used to specify a composite event formed from the algebraic operations on other events.
- FUTURE. This is used to specify a relative event based on the future event queue. The operator is further explained in Section 6.
- DETECT. This is used to inform the event manager of the occurrence of an event. This keyword is followed by the event identifier and the parameter values.
- REPEAT. The parameter after this keyword is a boolean function that is executed after each occurrence of the event. The function output determines whether the event manager should watch out for the next occurrence of the event.
- HISTORY and NUMBER. These keywords specify the length of time for which the event history should be maintained. The keyword NUMBER specifies this length as the number of occurrences of an event rather than a time period.
- PARAM. This keyword is followed by a list of event attributes and their data types. While the event framework places no restrictions on the data types of attributes, the current implementation allows only the native database data types.
- POLL. This keyword is applicable only for external events, and is used to specify an optional polling frequency. It is also followed by the name of a function which is executed to determine if the event has occurred.
- EVERY. This keyword is applicable only for temporal events and is used to specify the interval between two events.
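The role of the REPEAT function can be illustrated with a short Python sketch. The names below are hypothetical stand-ins for functions that would be registered with the event manager; the sketch shows how Once and Every reduce to constant functions, and how a stateful function gives an intermediate policy.

```python
from typing import Callable


def return_true() -> bool:
    """Every: keep detecting all future occurrences."""
    return True


def return_false() -> bool:
    """Once: stop after the first detected occurrence."""
    return False


def at_most(n: int) -> Callable[[], bool]:
    """Build a continuation function that allows n occurrences in total."""
    seen = 0

    def keep_going() -> bool:
        nonlocal seen
        seen += 1          # called once after each detected occurrence
        return seen < n

    return keep_going
```

Because the function is evaluated after each occurrence, the decision to continue can depend on state that did not exist when the event was registered.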
Calendar Specification

The specification of a calendar in this framework is based on SC93. A similar specification was implemented in the Illustra time series datablade Ill95, hence it was directly used in the implementation. A calendar is composed of a calendar pattern and a set of calendar exceptions. A pattern specifies an interval duration and the pattern of valid (on) and invalid (off) intervals. A duration is a natural time interval, e.g., day, hour, week, etc. A working week pattern will be specified as: "{5 on, 2 off},day". A complete calendar requires the specification of the following parameters: a starting timestamp, an optional ending timestamp, a pattern, a pattern-starting timestamp, and an optional set of exceptions. An exception represents either deletions or additions to the pattern defining a calendar. The exceptions may be used to specify holidays or special working days.
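To make the pattern and exception mechanism concrete, here is a minimal Python sketch that expands a day-granularity calendar into its valid days. It does not reproduce the Illustra datablade interface; the function name, the `(length, is_on)` encoding of the pattern, and the requirement of an explicit end date are all assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import AbstractSet, Iterator, List, Tuple


def calendar_days(start: date, end: date, pattern: List[Tuple[int, bool]],
                  pattern_start: date,
                  removed: AbstractSet[date] = frozenset(),
                  added: AbstractSet[date] = frozenset()) -> Iterator[date]:
    """Yield the valid days of a calendar between start and end, inclusive.

    pattern is a list of (length, is_on) runs, e.g. [(5, True), (2, False)]
    for "{5 on, 2 off},day"; it repeats from pattern_start onwards.
    """
    flags = [on for length, on in pattern for _ in range(length)]
    day = start
    while day <= end:
        offset = (day - pattern_start).days % len(flags)
        # Exceptions override the pattern: deletions remove, additions add.
        if (flags[offset] and day not in removed) or day in added:
            yield day
        day += timedelta(days=1)
```

With the pattern anchored on a Monday, a deleted Wednesday models a holiday and an added Saturday models a special working day.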
4.2 Examples
Temporal Events. Consider a lot release policy that releases a new lot to the shop floor every two hours starting from the beginning of a work week. The maximum number of events stored in the event history is 200. This event can be cancelled only manually by a user. It is assumed that a function ReturnTrue is defined which always returns true. Also, there is a WeekHour calendar that models the hours in a work week. This event will be specified using the following query:

    create EVENT LotRel
        CALENDAR WeekHour
        REPEAT ReturnTrue
        NUMBER 200
        EVERY 2
        PARAM ( LotId char(10), ProdCode char(12) );
Database Event. A database event AccConfTab is used to signal attempts to access a confidential data table. The history of this event is maintained for two years. This event would be registered using the following query:

    create EVENT AccConfTab DATABASE
        REPEAT ReturnTrue
        HISTORY 2 year
        PARAM ( UserName char(12), TabName char(24) );

Once this event is detected, the database would inform the event manager by asserting the following statement:

    detect AccConfTab ( 'Joe', 'Payroll' );

It is not necessary to include a time stamp in the above statement as it is implicitly understood to be the current time.
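The two retention policies used in these examples (NUMBER 200 and HISTORY 2 year) can be sketched as follows. This is an in-memory illustration only; the class name is hypothetical, the paper stores histories as database time series, and expressing "2 year" as a fixed number of days is an assumption of the sketch.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Any, Deque, Optional, Tuple


class EventHistory:
    """Event history bounded by a count (NUMBER) or by age (HISTORY)."""

    def __init__(self, number: Optional[int] = None,
                 keep_for: Optional[timedelta] = None) -> None:
        self.keep_for = keep_for
        # A deque with maxlen discards its oldest entry when it is full.
        self.rows: Deque[Tuple[datetime, Tuple[Any, ...]]] = deque(maxlen=number)

    def detect(self, params: Tuple[Any, ...],
               now: Optional[datetime] = None) -> None:
        # The time stamp is implicit: it defaults to the current time.
        now = datetime.now() if now is None else now
        self.rows.append((now, params))
        if self.keep_for is not None:
            # Drop entries older than the retention period.
            while self.rows and now - self.rows[0][0] > self.keep_for:
                self.rows.popleft()
```

A call such as `EventHistory(keep_for=timedelta(days=730)).detect(('Joe', 'Payroll'))` mirrors the detect statement above, with the time stamp filled in by the history itself.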
External Event. A machine in a factory is directly interfaced to an event manager. A factory management application requires that the event manager monitor machine break-down and repair completion operations. The transition of an operational machine into the failed state is indicated by the MachFailure event. The transition from a failed state into the operational state is indicated by the MachRepairCompl event. It is assumed that a machine has only these two states.
The definition of MachRepairCompl would be as follows:

    create EVENT MachRepairCompl EXTERN
        REPEAT ReturnFalse
        HISTORY 1 year
        POLL 5 min
        PARAM ( MachName char(12), RepairPerson char(24) );
The event manager uses the polling method to detect the MachRepairCompl event. The "REPEAT ReturnFalse" part of the definition suggests that the event is detected only once. This is possible because a rule statement can include the definition of an event. Consider a rule that is invoked whenever the MachFailure event is detected:

    on MachFailure do
        alert repairman;
        update_machine_usage_statistics;
        create EVENT MachRepairCompl EXTERN
        -- other statements of the create event
Thus, the detection of event MachRepairCompl is started only when the event MachFailure is detected. This example shows that the event manager has the flexibility to intrinsically model and implement state transitions. The performance is expected to improve because only the events that are appropriate for a state are detected.

5 THE IMPLEMENTATION
The implementation requires the development of three modules: Event Queue, Event Storage Manager, and Database Interface. These modules are briefly described in the following sections; the details are given in [PS95].

5.1 The Event Queue
It is the responsibility of the event manager to detect and notify the rule system if a schedulable event occurs. All these schedulable events are stored in a future event queue. One of the important features of our event manager is that users can access the event queue and use it in the definition of further events. This requirement forces the storage of the event queue as a database object so that it can be queried using standard query facilities.
An Example Consider the following set of schedulable events: 1) an external event (p1) with a polling frequency of 5 min, 2) an absolute temporal event (a1) at Mon 08:30, and 3) a calendar event (c1) defined as End of the business day, or Mon 16:00 after conversion. Let the current time (tnow) be Mon 08:00. These three events are scheduled to occur at 5, 30, and 400 time units from tnow. An event queue for this simple system is shown in Figure 2.
[Figure 2: time line (not to scale) starting at tnow, with events p1, a1, and c1 placed at 5, 30, and 400 time units.]
Fig. 2. The Event Queue
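The queue of Figure 2 can be mimicked with a small Python sketch. The paper stores the queue as a database object so that it can be queried; the in-memory heap below is only a stand-in, and the class and method names are assumptions made for this illustration.

```python
import heapq


class FutureEventQueue:
    """In-memory stand-in for the future event queue, ordered by time."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so events with equal times keep insertion order

    def insert(self, time, event_id):
        heapq.heappush(self._heap, (time, self._seq, event_id))
        self._seq += 1

    def delete(self, event_id):
        # Remove every scheduled occurrence of event_id and restore heap order.
        self._heap = [entry for entry in self._heap if entry[2] != event_id]
        heapq.heapify(self._heap)

    def first_event_time(self):
        return self._heap[0][0] if self._heap else None

    def due(self, now, delta_t=0):
        """Pop and return the ids of all events scheduled at or before now + delta_t."""
        ready = []
        while self._heap and self._heap[0][0] <= now + delta_t:
            ready.append(heapq.heappop(self._heap)[2])
        return ready


# The three events of the example, as offsets from tnow.
queue = FutureEventQueue()
for offset, event_id in [(30, "a1"), (400, "c1"), (5, "p1")]:
    queue.insert(offset, event_id)
```

Calling queue.due(30) pops p1 and a1 in time order and leaves c1 as the next wake-up time.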
Table 1. Event Queue Implementation Notation

Variable or Function    Description
now                     Current time.
δt                      The time interval accuracy used in detecting an event.
next_event              A boolean function that determines if the current event has to be renewed.
time_stamp_nth          The time until the nth future event (from now on) of a particular type.
next_wake_up_time       The time at which the event manager should be awakened.

This implementation has three modules: Insert, Delete, and Detect. The first two modules, as the names suggest, are used to insert a schedulable event into the event queue or delete one from it. The Detect algorithm uses a process called DBCRON, which is modeled after the Unix process CRON. The DBCRON process performs the following functions:

- At a scheduled time, it wakes up and detects events that were supposed to occur at that time. These events are notified to the rule manager.
- It inserts an appropriate number and type of events on the future event queue.
- It decides when to wake itself up for the next cycle.
- It executes a special algorithm to maintain system integrity in case of changes in the time zone, such as daylight saving time.

Using the notation described in Table 1, the event queue algorithm is described below.

while (1) { /* forever */
    now = current_clock_time();
    current_set_of_events = query_event_queue(now);
    while (not_empty(current_set_of_events)) {
        current_event = first_event(current_set_of_events);
        if (external_pollable(current_event)) {
            if (poll_extern(current_event.type)) {
                inform_rule_manager(current_event);
                store_event(current_event, now);
                insert_future_events(current_event, now);
            } /* external event occurred */
            else {
                insert_future_events(current_event.type, poll_freq(current_event.type));
            } /* external event did not occur */
        } /* This is an execution of polling mechanism. */
        else {
            inform_rule_manager(current_event);
            store_event(current_event, now);
            insert_future_events(current_event, now);
        } /* event is temporal */
        delete_event(current_set_of_events, current_event);
    } /* while event set is not empty */
    next_wake_up_time = select_first_event_time(event_queue);
    if (next_wake_up_time == INFINITY) raise_error();
    sleep(next_wake_up_time);
} /* do forever */
Important functions used in the above algorithm are explained next:

- insert_future_events. This function implements the event renewal process described in Section 3. It requires the current time (now) as an argument to ensure that no event is inserted in the interval from now to now + δt. Otherwise, the new event may not be detected in the next cycle. This leads to the discretization of the time line into intervals of δt for the sake of implementation. The value of δt can be determined from performance studies, although theoretically it is possible to have a value of zero.
- query_event_queue. This function selects all the events from the event queue which are scheduled in the range now - δt to now + δt.
- external_pollable. This function polls an external application and determines if the external event has occurred.
- store_event. Once an event is detected, it has to be inserted in the event history. The function store_event stores the event in an appropriate system table.
- select_first_event_time. This function selects the time of the first event on the queue.
- first_event and delete_event. These operators select and delete an event from the current event set. They are required because more than one event may be scheduled for wake up in a given time range.
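The control flow of the detection loop can be tried out with a short Python simulation. It is a simplified sketch under stated assumptions, not the DBCRON implementation: the clock is an integer that jumps to the next wake-up time instead of sleeping, δt is taken as zero, temporal events are one-shot, a pollable event is re-queued after its polling interval whether or not the poll succeeds, and the function and parameter names are hypothetical.

```python
import heapq


def run_dbcron(initial, poll_results, poll_freq, horizon):
    """Simulate the detection loop on an integer clock.

    initial:      list of (time, event_id, kind); kind is "temporal" or "pollable"
    poll_results: dict event_id -> set of times at which a poll reports an occurrence
    poll_freq:    dict event_id -> positive polling interval
    horizon:      the simulation stops once the next wake-up time exceeds this value
    """
    queue = list(initial)
    heapq.heapify(queue)
    detected = []  # stands in for store_event plus inform_rule_manager
    while queue and queue[0][0] <= horizon:
        now = queue[0][0]  # "sleep" until the first event on the queue
        while queue and queue[0][0] == now:
            _, event_id, kind = heapq.heappop(queue)
            if kind == "pollable":
                if now in poll_results.get(event_id, set()):
                    detected.append((now, event_id))
                # Simplification: schedule the next poll in both branches.
                heapq.heappush(queue, (now + poll_freq[event_id], event_id, kind))
            else:
                detected.append((now, event_id))
    return detected


detected = run_dbcron(
    initial=[(5, "p1", "pollable"), (30, "a1", "temporal"), (400, "c1", "temporal")],
    poll_results={"p1": {15}},
    poll_freq={"p1": 5},
    horizon=40,
)
```

With the events of Figure 2, p1 is polled every 5 time units and is reported once, at time 15, while a1 is detected at time 30 and c1 stays on the queue beyond the horizon.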
An alternate algorithm for implementing temporal rules is given in [CSS94]. The differences between the two algorithms are as follows. The algorithm in [CSS94] assumes that all the occurrences of a temporal event are known a priori, as it does not consider quantifier operators on events (i.e., ONCE, EVERY, etc.). This is conceptually wrong because the algorithm requires infinite memory even if a single event is specified with the EVERY operator. Our algorithm also supports the concepts of pollable events and temporal event algebras. Furthermore, their work assumes an integrated rule system which is also responsible for detecting events.

Modification to the System Clock Small changes in the system clock, such as those made for synchronization with external clocks, can be neglected in the algorithm. The above algorithm fails if the system clock is modified substantially, for instance, to accommodate the summer daylight saving adjustment. This change causes either an extra hour or the loss of an hour on the event queue. The semantics of such a change are unclear and need to be investigated. For example, all external pollable events can be moved one hour later or earlier without any loss of information. Absolute temporal events, like 7:30 am 9/15/95, which do not lie in the one-hour zone would also remain unaffected. However, more work is needed to determine the effect on temporal events which lie very close to the change time.

5.2 Event Metadata and History Storage
This research assumes that an object-relational database, such as described in [Kim95], is available. Briefly, such a database supports the following features: columns of complex data types, inheritance, and a production rule system integrated with a database.

Event Metadata The event metadata storage is consistent with the metadata storage approach followed in most relational databases, namely, metadata is stored as system tables. For example, in Illustra, even database rules, functions, and alerters are stored in system tables. The metadata schema is shown in Figure 3. This schema is briefly described next; the implementation is described in [PS95]. The relations are depicted as boxes with the attributes listed inside. The primary keys are in bold whereas the foreign keys are underlined. The directed arcs indicate inheritance from a root table, while undirected arcs show foreign key migration.

The event classification, discussed in Section 2, is a natural hierarchy of object classes. Thus, it is directly implemented as an inheritance of tables. The root table EVENT stores the attributes common to all events, such as the surrogate, calendar, etc. The root table also manages the metadata of events that are detected only once or until manually cancelled by a user. Those events whose future detection is determined using a special function are sub-classed into a DYN_REP table. This table uses a virtual column which points to the function that needs to be executed after each occurrence of the event. The three basic event categories (External, Database, and Temporal) all inherit from the DYN_REP table. A further sub-class of External events is the pollable events, which are implemented using the POLL table.

Metadata of composite and scheduled events is managed in separate tables. The column EXPRESN in table Composite stores the definition of a composite event using pre-defined operators such as SEQ, NOT, etc. The composite event detection algorithm parses these expressions as needed. The scheduled events table has two components: algorithm parameters and the event queue. Algorithm parameters, such as tnow and δt, are stored as regular columns of a table. However, the event queue is implemented using the time series data type provided by an object-relational database. The internal representation of a large time series is a B-Tree for fast access.
Event History Recall from section 2 that our event model is described by a tuple < s, C, (t, a) >. An event history is the (t, a) vector of this tuple. An event history can be directly modeled as an irregular time series [footnote 1], and in fact has been implemented using the Illustra irregular time series data type. This data type captures the precise semantics of an event history, and yet offers the benefits of using a temporal database: storage requirements are minimized and temporal queries are supported. Although conceptually the event attribute can be of any data type, the limitations of the Illustra storage manager preclude the use of all but numeric data types. Even with this limitation, the overall advantages of a temporal database make it much preferable to a relational database.
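The notion of an event history as an irregular time series of (t, a) pairs can be sketched in Python as follows. This is only an in-memory illustration, not the Illustra time series data type: the class name, its methods, and the sample values are assumptions, and numeric attribute values are used to mirror the limitation mentioned above.

```python
from bisect import bisect_left, bisect_right


class EventHistory:
    """Irregular time series: time stamps need not fall on a fixed grid."""

    def __init__(self):
        self._times = []   # sorted time stamps t
        self._values = []  # attribute values a, aligned with _times

    def store(self, t, a):
        # Insert while keeping the series ordered by time.
        i = bisect_right(self._times, t)
        self._times.insert(i, t)
        self._values.insert(i, a)

    def between(self, start, end):
        """Return all (t, a) pairs with start <= t <= end."""
        lo = bisect_left(self._times, start)
        hi = bisect_right(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))


series = EventHistory()
for t, a in [(10, 1.0), (3, 2.5), (47, 0.5)]:
    series.store(t, a)
```

A range query such as series.between(3, 10) is the kind of temporal query that the time series representation makes cheap, because the time stamps are kept sorted.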
5.3 Database Interface

The database interface has two components: Event Metadata Management and Event Detection. These components are described next.

Footnote 1: A regular time series has a data point associated with each point on the calendar; an irregular time series does not have any such restriction.
[Figure 3: schema diagram of the event storage tables, including the Event table, the composite event table, and the event history tables.]
Fig. 3. Event Storage Schema.
Event Metadata Interface Users assert a "create event" command to define a new system event. The parser translates such a statement into a SQL-3 statement so that the metadata is inserted into the appropriate system table. The parser operation is illustrated with the following example. Consider the event LotRel described in section 4.2. This create statement will be converted to a SQL-3 statement of the following form:

insert into Temporal values ('LotRel', 'WeekHour', 200, 1, 2);
-- The WeekHour calendar is predefined and not described here.
The subscription to an event is cancelled whenever a user asserts a "delete event" command. The proposed model does not support modification of the event metadata, as the semantics of such a modification are not precise.

Event Detection Interface This interface makes extensive use of the active database features. The parser converts any "detect event" statement into the following form: "update table Event_History( ..... )". For schedulable events, the event manager itself asserts the table update command. This update statement triggers a rule defined over the event history table. The following actions need to be undertaken once an event is detected: execute the application-specific action and determine the repeatability of the event. These actions are performed using database functions (written in C) whose internal implementation is secondary to the exposition in this paper.
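The idea that a write to the event history table fires a rule can be demonstrated with SQLite from Python. This is a sketch under stated assumptions: SQLite stands in for the object-relational database, an INSERT stands in for the update statement produced by the parser, and the table, column, and trigger names are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event_history (eid TEXT, detected_at TEXT, user_name TEXT, tab_name TEXT);
CREATE TABLE rule_notifications (eid TEXT, detected_at TEXT);
-- The trigger plays the role of the rule defined over the event history table.
CREATE TRIGGER on_event_detected AFTER INSERT ON event_history
BEGIN
    INSERT INTO rule_notifications VALUES (NEW.eid, NEW.detected_at);
END;
""")


def detect(eid, user_name, tab_name):
    """Rough equivalent of: detect AccConfTab ('Joe', 'Payroll');"""
    # The time stamp is implicit: the database supplies the current time.
    conn.execute(
        "INSERT INTO event_history VALUES (?, datetime('now'), ?, ?)",
        (eid, user_name, tab_name),
    )


detect("AccConfTab", "Joe", "Payroll")
notified = conn.execute("SELECT eid FROM rule_notifications").fetchall()
```

After the call, rule_notifications holds one row for AccConfTab, which is where an application-specific action would be attached.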
6 SUPPORTING TEMPORAL REASONING AND CONSTRAINTS
Problems in real-world applications, such as those emerging from the financial, manufacturing, and scheduling worlds, require extensive support for temporal events. The current work in temporal databases does not allow a temporal event to be specified in terms of its relative position with respect to other temporal events. For example, consider a process control rule: "during a furnace operation measure the pressure." Here the event "measurement of pressure" is constrained to occur relative to two events: the start and the end of an operation. A very important advantage of our approach is the support for the implementation of temporal reasoning and constraints. First, we describe basic notions in temporal reasoning and then introduce temporal operators.
6.1 Temporal Reasoning Concepts
The completeness of any event manager has to be judged on the basis of its ability to support the following constructs, and hence they are described next [footnote 2]:

Event time constraint. The event time may be constrained by a lower and an upper bound. The time constraint should be expressible using a natural language calendar.

Interval Time (I-Time). I-Time is the periodic time interval over which a temporal event can take place. It is usually represented by the granularity of a calendar, e.g., business day.

Quantifier. A temporal quantifier indicates the frequency of an event with respect to the I-Time. Typical examples of a Quant operator include EACH, ALL, EVERY, and ONCE. As shown in section 5, quantifiers are sub-cases of the more general functional form of validation to determine if future occurrences of an event are valid.

Past-dependent Events. Let H be the event history. An event e_i is called past-dependent if the occurrence of e_i depends on H.

Footnote 2: Terenziani [Ter94] uses the terms Frame-Time, Quant, I-Time, and Qual-Rel to describe similar notions.
6.2 Supporting Temporal Reasoning
In this section, we briefly review the features of Event Manager that are useful for the implementation of temporal reasoning.
LATE Operator In [MZ95] and in other event algebras, a negation operator is described, such as !E, where E is any primitive event. Although such a definition is useful for defining composite events, it is wrong because this event occurs at all time points except the instances of E. Therefore, we introduce a LATE operator that has the following syntax: create EVENT eid1 LATE {timestamp | tint} eid2. At either the timestamp or tnow + tint, the event manager checks if event eid2 has occurred, and if it has not, asserts event eid1.
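The check performed by the LATE operator can be sketched in Python. The sketch assumes that "has occurred" means "appears in the event history at any time up to the deadline"; the text does not fix the start of that window. The function name, the list-based history, and the event name RepairOverdue are hypothetical.

```python
def check_late(history, eid1, eid2, deadline):
    """At `deadline`, assert eid1 if eid2 has not occurred by then.

    history is a list of (time, event_id) pairs. The function returns the
    asserted (time, event_id) pair, or None when eid2 occurred in time.
    """
    occurred = any(e == eid2 and t <= deadline for t, e in history)
    if occurred:
        return None
    late_event = (deadline, eid1)
    history.append(late_event)  # asserting eid1 also records it in the history
    return late_event


# A machine failed at time 10 and no repair completion was seen by time 60.
overdue_history = [(10, "MachFailure")]
overdue = check_late(overdue_history, "RepairOverdue", "MachRepairCompl", 60)

# Here the repair completed at time 40, so nothing is asserted.
repaired_history = [(10, "MachFailure"), (40, "MachRepairCompl")]
on_time = check_late(repaired_history, "RepairOverdue", "MachRepairCompl", 60)
```

Unlike a negation that holds at every instant where E is absent, the check runs once, at a definite time, which is what makes it implementable on the event queue.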
Future Event Operations Let F be the set of events on the event queue, and let f be any particular event from this set with ft as its detection time. The event specification language supports the creation of a derived future event of the form: create event fd FUTURE ft ± time_diff; If the - operator is used, then the new event fd has to satisfy the requirement that tnow < ft - time_diff.
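A derived future event and its validity requirement can be illustrated with a few lines of Python. The dictionary standing in for the event queue, the function name create_future, and the use of minutes since midnight are assumptions of this sketch; a negative time_diff plays the role of the - operator.

```python
def create_future(queue, fd, f, time_diff, now):
    """Schedule derived event fd at the detection time of f plus time_diff.

    queue maps event ids to detection times. The derived event must lie
    strictly in the future, i.e. now < ft + time_diff.
    """
    fd_time = queue[f] + time_diff
    if fd_time <= now:
        raise ValueError("derived event would not lie in the future")
    queue[fd] = fd_time
    return fd_time


# Times are minutes since midnight: go_to_work is scheduled for 08:00.
work_queue = {"go_to_work": 480}
breakfast_time = create_future(work_queue, "eats_breakfast", "go_to_work", -30, now=420)
```

Here eats_breakfast is scheduled at minute 450, that is 07:30; the same request issued at 07:40 would be rejected because the derived time has already passed.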
Event Constraints Event constraints are of two types: Calendric and Relative. In the event model, a calendar is associated with each event. Also, a temporal database, such as Illustra, prevents the insertion of an element that is inconsistent with a calendar. Hence, calendric constraints are supported in a trivial fashion. To model relative constraints, operators such as AFTER, STARTED BY, FINISHED BY, and DURING are introduced in temporal languages. However, we provide only two operators, FUTURE and LATE, and claim that these are adequate for supporting relative event constraints. Furthermore, our approach leads to precise semantics and implementation of temporal constraints. Note that relative events cannot occur with reference to a database event or an asynchronous external event. Relative events are meaningful only with respect to schedulable events. A relative event may be constrained in two ways: (1) before or after a scheduled event, and (2) between two scheduled events. The first case can be easily implemented using the FUTURE event operator. For example, consider the following rule: Mary eats breakfast before going to work. The "going to work" event can be easily defined by a work-day calendar. The "eats-breakfast" event can be defined as follows: create event eats-breakfast as go-to-work - 30 min. Note that Terenziani's approach does not require that a specific time offset (30 min) be used, which is acceptable for modeling a temporal rule but clearly unsuitable for implementation.
The second case can also be implemented in the same fashion if both scheduled events are on the event queue at the same time. However, in certain cases this is not possible. Consider the event: Measure temperature while the furnace operation is on-going. The event "temperature measurement" is constrained to occur between the operation start and end events. However, the operation end event cannot be inserted on the event queue until the operation start event occurs. Hence, such a temporal constraint has to be implemented in conjunction with the rule system. This is briefly explained with the help of the following rule (see [PS95] for details):

on event oper_start {
    create event oper_end ...
    create event temp_measure as oper_end - 10
}
7 RELATED WORK
A database paradigm of application development was proposed in [MD89]. The notion of a powerful event manager, which is capable of sensing events in the database and the external world, was described in [SAD+94]. However, that work ignores temporal events and the implementation of such an event manager. A conventional C++ object has been extended with an event interface in [AMC93]. This interface enables objects to designate some, possibly all, of their methods as primitive event generators. A classification of events similar to the one in this paper is given in [CKAK94]; however, it ignores calendar events and their implementation. To our knowledge, none of the active databases, in the research or commercial world, maintains a history of database events. This forgetfulness of events makes it impossible to implement composite events. Alert [SPAM91] claims to store a set of events as first-class tuples in an active table. However, this is not accurate, because the effects of an event, i.e., updated, inserted, or deleted tuples, are what is referred to as events. Also, the time sequence of events is not maintained. There have been three important proposals regarding the definition and implementation of composite events [GJS92, CKAK94, GD94]. The composite event operators in all three proposals are quite similar; however, the detection mechanisms are different. ODE [GJS92] uses finite automata and Samos [GD94] uses Petri nets. Snoop [CKAK94] uses event graphs for detection and also introduces parameter contexts to alleviate the problem of monotonic increase in storage space. A Past Temporal Logic (PTL) formalism for specifying events and conditions in active database systems is presented in [PSW95]. That work assumes that a database is designed to represent only the current information, and new values overwrite the old ones. The temporal conditions in rules determine which information is to be saved.
The idea of an extensible calendric system was first introduced in [SS92, SS93]. A system of calendars that allows the specification of natural-language time-based expressions was proposed in [CSS94]. The authors also proposed an algorithm for implementing temporal rules in extensible databases. Many researchers have developed temporal data models. Also, surveys and a book on temporal databases are available in [PSE+94, Sno94, TCG+93]. It appears that the only other publication that deals extensively with the specification of temporal events is [Ter94]. The two key differences between our specification of temporal events and Terenziani's approach are as follows: (1) Definition of an event. In Terenziani's approach an event occurs over an interval of time, while this paper assumes that an event occurs instantaneously. (2) Causal relationship among events. In our framework, there is no direct causal relationship between any two events. A rule triggered by an event may deterministically trigger another event; however, it is not sensible to say that one event causes another event.

8 CONCLUSIONS AND RESEARCH ISSUES
This paper has presented an architecture and construction of an Event Manager which is a closely coupled extension to a temporal, object-relational database. A SQL-like language was proposed for the registration, subscription, and cancellation of events. This event manager supports external, database, and temporal events in a uniform fashion. As event histories can be naturally modeled as time series, temporal database services have been utilized for storing event metadata and histories. An event queue mechanism was implemented for the detection of temporal and pollable external events. The event queue allows the definition of context-dependent and relative temporal events. There are several areas for further research in this context. Retro-active and pro-active database rules were proposed in [EGS94], and we are investigating if these can be implemented using the proposed event management framework. The current event history storage scheme allows only fixed-length data types as event parameters. A closer integration of composite event algorithms also needs to be evaluated. A graphical format for the specification of events would also be useful, as the recent trend is towards the use of fourth-generation languages (4GLs).

A LANGUAGE GRAMMAR
The event language notation is similar to the SQL-3 notation. The following notational conventions will be used:

- A key word will be shown in Capital Letters.
- Square brackets [ ] indicate optional elements.
- Curly braces { } enclose lists from which the user must select one element.
- Vertical bars | are used to separate choices.
- An identifier is enclosed in angle brackets < >.
- A (0,...) shows a list from which items may be repeated any number of times.

Event Registration
create EVENT <Ev_Id>
    {TEMPORAL | DATABASE | EXTERN | COMPOSITE | FUTURE}
    CALENDAR
    REPEAT
    {HISTORY | NUMBER} {time interval | int}
    Composite Event Expr
    Future Event Expr
    EVERY {time interval | int}
    POLL {time_interval}
    PARAM((attr data type) ....)
Event Detection

detect EVENT Ev_Id(attr_val, attr_val,...);

Event Cancellation

delete EVENT Ev_Id;
References

[AMC93] E. Anwar, L. Maugis, and S. Chakravarthy. A New Perspective on Rule Support for Object-Oriented Databases. In Proceedings of ACM SIGMOD International Conference on the Management of Data, pages 99-109, Washington, D.C., 1993.
[CKAK94] S. Chakravarthy, V. Krishnaprasad, E. Anwar, and S.-K. Kim. Composite Events for Active Databases: Semantics, Contexts, and Detection. In Proc. of the 20th Very Large Database (VLDB) Conference, pages 730-739, Santiago, Chile, 1994.
[CS93] R. Chandra and A. Segev. Managing Temporal Financial Data in an Extensible Database. In Proceedings of the 19th Int. Conf. on Very Large Databases, Dublin, Ireland, August 1993.
[CSS94] R. Chandra, A. Segev, and M. Stonebraker. Implementing Calendars and Temporal Rules in Next-Generation Databases. In Proceedings of the 10th Int. Conf. on Data Engineering, February 1994.
[EGS94] O. Etzion, A. Gal, and A. Segev. Retroactive and proactive database processing. In Proceedings of the Fourth International Workshop on Research Issues in Data Engineering (RIDE'94), pages 126-131, Houston, TX, 1994.
[GD94] S. Gatziu and K.R. Dittrich. Detecting Composite Events in Active Databases using Petri Nets. In Proceedings of the Fourth International Workshop on Research Issues in Data Engineering (RIDE'94), pages 2-9, Houston, TX, February 1994.
[GJS92] N. Gehani, H.V. Jagadish, and O. Shmueli. Composite Event Specification in Active Databases: Model and Implementation. In Proc. of the 18th Int. Conf. on Very Large Databases, 1992.
[Ill95] Illustra Time Series Data Blade Manual. Illustra Information Technologies, Inc., Oakland, CA, 1995.
[Kim95] Won Kim. Object-Oriented Database Systems: Promises, Reality, and Future. In Won Kim, editor, Modern Database Systems: The Object Model, Interoperability, and Beyond, pages 255-280. ACM Press, New York, NY, 1995.
[MD89] D.R. McCarthy and U. Dayal. The architecture of an active database management system. In Proc. of ACM SIGMOD Conf., pages 215-224, 1989.
[MZ95] I. Motakis and C. Zaniolo. Composite Temporal Events in Active Databases: A Formal Semantics. In International Workshop on Temporal Databases, pages 332-350, Zurich, Switzerland, 1995.
[PS95] Ajit K. Patankar and Arie Segev. An Architecture and Construction of an Event Manager. Technical Report 37913, Lawrence Berkeley Laboratory, Berkeley, CA 94720, 1995.
[PSE+94] Niki Pissinou, Richard T. Snodgrass, Ramez Elmasri, Inderpal S. Mumick, M. Tamer Ozsu, Barbara Pernici, Arie Segev, Babis Theodoulidis, and Umeshwar Dayal. Towards an Infrastructure for Temporal Database: Report of an Invitational ARPA/NSF Workshop. SIGMOD Record, 23(1):35-51, 1994.
[PSW95] A. Prasad Sistla and O. Wolfson. Temporal conditions and integrity constraints in active database systems. In Proceedings of ACM SIGMOD International Conference on the Management of Data, pages 269-280, San Jose, CA, 1995.
[SAD+94] Michael Stonebraker, Paul Aoki, Robert Devine, Witold Litwin, and Michael Olson. Mariposa: A New Architecture for Distributed Data. In Proc. 10th Int. Conf. on Data Engineering, pages 54-65, Houston, TX, Feb. 1994.
[SC93] Arie Segev and Rakesh Chandra. A data model for time-series analysis. In N. Adam and B. Bhargava, editors, Advanced Database Systems. Notes in Computer Science Series, Springer Verlag, 1993.
[SMP95] Arie Segev, Max Mendel, and Ajit Patankar. An Implementation of a Computer Integrated Manufacturing (CIM) system using an Active, object-relational database. In Proceeding of the second International Conference on Applications of Databases, San Jose, CA, 1995.
[Sno94] Richard T. Snodgrass. Overview of the Special Section on Temporal Database Infrastructure. SIGMOD Record, 23(1):34, 1994.
[SPAM91] U. Schreier, H. Pirahesh, R. Agrawal, and C. Mohan. Alert: An Architecture for Transforming a Passive DBMS into an Active DBMS. In Proc. of the 17th VLDB Conference, Barcelona, Spain, 1991.
[SS87] A. Segev and A. Shoshani. A Logical Modeling of Temporal Databases. In Proceedings of ACM SIGMOD International Conference on the Management of Data, May 1987.
[SS92] M. Soo and R. Snodgrass. Mixed Calendar Query Language Support for Temporal Constants. Technical Report TempIS No. 29, University of Arizona, 1992.
[SS93] M. Soo and R. Snodgrass. Multiple Calendar Support for Conventional Database Management Systems. In Proceedings of the Int. Workshop on an Infrastructure for Temporal Databases, June 1993.
[TCG+93] A. Tansel, J. Clifford, S. Gadia, S. Jajodia, A. Segev, and R. Snodgrass. Temporal Databases. Benjamin/Cummings Publishing Company, Inc., 1993.
[Ter94] P. Terenziani. Dealing with qualitative and quantitative temporal information concerning periodic events. In Proceedings of the 8th Int. Symposium on Methodologies for Intelligent Systems (ISMIS'94), pages 275-284, Charlotte, NC, 1994.
Discovering Unexpected Patterns in Temporal Data Using Temporal Logic

Gideon Berger and Alexander Tuzhilin *

1 Computer Science Department, Courant Institute, New York University, gideon@cs.nyu.edu
2 Information Systems Department, Stern School of Business, New York University, atuzhili@stern.nyu.edu
Abstract. There has been much attention given recently to the task of finding interesting patterns in temporal databases. Since there are so many different approaches to the problem of discovering temporal patterns, we first present a characterization of different discovery tasks and then focus on one task of discovering interesting patterns of events in temporal sequences. Given an (infinite) temporal database or a sequence of events one can, in general, discover an infinite number of temporal patterns in this data. Therefore, it is important to specify some measure of interestingness for discovered patterns and then select only the patterns that are interesting according to this measure. We present a probabilistic measure of interestingness based on unexpectedness, whereby a pattern P is deemed interesting if the ratio of the actual number of occurrences of P to the expected number of occurrences of P exceeds some user-defined threshold. We then make use of a subset of the propositional, linear temporal logic and present an efficient algorithm that discovers unexpected patterns in temporal data. Finally, we apply this algorithm to synthetic data, UNIX operating system calls, and Web logfiles and present the results of these experiments.
1 Introduction
There has been much work done recently on pattern discovery in temporal and sequential databases. Some examples of this work are [14, 27, 17, 10, 25, 16, 8, 18, 9, 22]. Since there are many different types of discovery problems that were addressed in these references, it is important to characterize these problems using some framework. One such characterization was proposed in [10]. In this chapter we review this framework and then focus on one specific problem of discovering unexpected patterns in temporal sequences. To find unexpected patterns in a sequence of events, we assume that each event in the sequence occurs with some probability and assume certain conditional distributions on the neighboring events. Based on this, we can compute an expected number of occurrences of a certain pattern in a sequence. If it turns out that the actual number of occurrences of a given pattern differs significantly from the expected number, then this pattern is certainly unexpected and, therefore, is interesting [23, 24]. We present an algorithm for finding such patterns and test it on several types of temporal sequences, including Web logfiles and sequences of OS system calls.

* This work was supported in part by the NSF under Grant IRI-93-18773.
O. Etzion, S. Jajodia, and S. Sripada (Eds.): Temporal Databases - Research and Practice, LNCS 1399, pp. 281-309, 1998. © Springer-Verlag Berlin Heidelberg 1998
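The comparison of actual and expected occurrence counts described in the introduction can be made concrete with a small Python sketch. It is not the chapter's algorithm: it assumes that events are independent, that event probabilities are estimated by their relative frequencies in the sequence, and that a pattern is a contiguous run of events, whereas the chapter works with conditional distributions and temporal logic operators. The function name and the sample sequence are invented for the illustration.

```python
from collections import Counter


def unexpectedness(sequence, pattern):
    """Return (actual, expected, ratio) for `pattern` as a contiguous run.

    Assumption of this sketch: events are independent, so the probability of
    the pattern at a given position is the product of the event frequencies.
    """
    n, k = len(sequence), len(pattern)
    frequency = Counter(sequence)
    actual = sum(
        1 for i in range(n - k + 1) if tuple(sequence[i:i + k]) == tuple(pattern)
    )
    probability = 1.0
    for event in pattern:
        probability *= frequency[event] / n
    expected = (n - k + 1) * probability
    ratio = actual / expected if expected > 0 else 0.0
    return actual, expected, ratio


events = list("ABABABCC")
actual, expected, ratio = unexpectedness(events, ("A", "B"))
```

For this sequence the pattern A followed by B occurs 3 times against an expected 63/64 occurrences, so it would be reported as unexpected for any threshold below roughly 3.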
[Figure 1: price chart over the period from 5/1/97 to 7/12/97.]
Fig. 1. An example of the head_and_shoulder pattern.
2 Characterization of Knowledge Discovery Tasks in Temporal Databases
The characterization of knowledge discovery tasks in temporal databases proposed in [10] is represented by the 2-by-2 matrix presented in Table 1. The first dimension in this matrix defines the two types of temporal patterns. The first type of temporal pattern specifies how data changes over time and is defined in terms of temporal predicates. For example, the pattern

head_and_shoulder(IBM, 5/1/97, 7/12/97)

indicates that the stock of IBM exhibited the head_and_shoulder trading pattern [15] from 5/1/97 until 7/12/97, as shown in Figure 1. The second type of temporal pattern is rules, such as "if a stock exhibits a head-and-shoulder pattern and investor cash levels are low, then a bearish period is likely to follow."

The second dimension, the validation/generation dimension, refers to the purpose of the discovery task. In validation, the system focuses on a particular pattern and determines whether it holds in the data. For example, we may want to validate whether the head_and_shoulder pattern holds for the IBM stock in a given data set, or whether a certain rule "holds" on the data. The second purpose of discovery can be the generation of new predicates or rules that are previously unknown to the system. For example, the system may attempt to discover new types of trading rules in financial applications.

Categorizing patterns in terms of the above two dimensions leads to a two-by-two classification framework of the knowledge discovery tasks, as presented in Table 1. We will now describe each of the four categories in turn.
             Validation   Generation
Predicates   I            III
Rules        II           IV

Table 1. Types of Knowledge Discovery Tasks.
Class I. The discovery tasks of this type involve the validation of previously defined predicates over the underlying database. For example, assume that we have a temporal database of daily closing prices of stocks at some stock exchange, STOCK(SYMBOL, PRICE, DATE), where SYMBOL is the symbol of a security and PRICE is the closing price of that security on the date DATE. Consider the following predicate specifying that the price of a certain stock bottomed out and is on the rise again over some time interval:

\[ \mathit{bottom\_reversal}(x, t_1, t_2) = (\exists t)\,\bigl(t_1 < t < t_2 \wedge \mathit{decrease}(x, t_1, t) \wedge \mathit{increase}(x, t, t_2)\bigr) \]

where increase(x, $t_1$, $t_2$) and decrease(x, $t_1$, $t_2$) are predicates specifying that the price of security x respectively "increases" and "decreases" over the time interval ($t_1$, $t_2$).$^{1}$ Then we may want to validate that the predicate bottom_reversal(x, $t_1$, $t_2$) holds on the temporal relation STOCK(SYMBOL, PRICE, DATE). This validation can take several forms. For example, we may want to determine for the predicate bottom_reversal whether one of the following holds:

bottom_reversal(IBM, 5/7/93, 8/25/93),
bottom_reversal(IBM, $t_1$, $t_2$),
bottom_reversal(x, 5/7/93, 8/25/93)

$^{1}$ Note that we do not necessarily assume monotonic increases and decreases. The predicates increase and decrease can be defined in more complex ways, and we purposely leave it unspecified how to do this.
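As a rough illustration of Class I validation, the following Python sketch checks bottom_reversal on a single price series. The text deliberately leaves increase and decrease unspecified, so the net-change definitions below are assumptions made only for this example, as are the function names and the use of list indices in place of dates.

```python
def decrease(prices, t1, t2):
    # Assumed definition: the price at t2 is lower than at t1 (net change only).
    return prices[t2] < prices[t1]


def increase(prices, t1, t2):
    # Assumed definition: the price at t2 is higher than at t1 (net change only).
    return prices[t2] > prices[t1]


def bottom_reversal(prices, t1, t2):
    # True if some t with t1 < t < t2 splits the interval into a
    # decrease over (t1, t) followed by an increase over (t, t2).
    return any(
        decrease(prices, t1, t) and increase(prices, t, t2)
        for t in range(t1 + 1, t2)
    )


def reversal_intervals(prices):
    # Second query form: all intervals (t1, t2) on which the predicate holds.
    n = len(prices)
    return [
        (t1, t2)
        for t1 in range(n)
        for t2 in range(t1 + 2, n)
        if bottom_reversal(prices, t1, t2)
    ]


closing = [10, 8, 6, 9, 11]  # hypothetical daily closing prices
print(bottom_reversal(closing, 0, 4))
print(reversal_intervals(closing))
```

The first call corresponds to validating the predicate for fixed arguments; the second corresponds to leaving the time arguments free. An approximate-matching variant would replace the strict comparisons with a tolerance.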
Gideon Berger and Alexander Tuzhilin
The first problem validates that the stock of IBM experienced the "bottom reversal" pattern between 5/7/93 and 8/25/93. The second problem finds all the time periods when IBM's stock had a "bottom reversal," and the last problem finds all the stocks that had "bottom reversals" between 5/7/93 and 8/25/93. One of the main issues in the problems of Class I (the predicate validation problem) is finding approximately matching patterns. For example, for the IBM stock to exhibit the bottom reversal pattern between 5/7/93 and 8/25/93, it is not necessary for the time series of IBM stock to match the predicate bottom_reversal exactly. Another example of the approximate matching problem of Class I comes from speech recognition applications, where sounds and words are matched only approximately against the speech signal. There has been extensive work on Class I problems in the signal processing [20], speech recognition [6, 21], and data mining communities. In the data mining community these types of problems are often referred to as similarity searches and have been studied in [1, 3, 4, 12, 13, 8].

Class II. Discovery tasks of Class II involve validation of previously asserted rules. For example, consider the rule: "If a price correction in a stock is seen before the announcement of big news about the company, then insider trading is likely,"
\[ \mathit{Correction}(stock, t_1, t_2) \wedge \mathit{Big\_news}(stock, t_3) \wedge \mathit{Soon\_after}(t_3, t_2) \rightarrow \mathit{Insider\_trading}(stock, t_1, t_2) \]

where Correction, Big_news, Insider_trading and Soon_after are user-defined predicates (views) defined on relations STOCKS and NEWS. Evaluation of this rule on the data entails finding instances of the variables stock, $t_1$, $t_2$, $t_3$ and the "statistical strength" of the rule (e.g., measured in terms of its confidence and support [2]) that make the rule hold on the data (in statistical terms). As in the case of Class I problems, one of the main issues in rule validation is the problem of approximate matching. The need for approximate matching arises for the following reasons. First, rules hold on data only in statistical terms (e.g., having certain levels of confidence and support). Second, some of the predicates in the rule can match the data only approximately (as is the case with Class I problems from Table 1). Moreover, certain temporal operators are inherently fuzzy. For example, the temporal operator Soon_after($t_1$, $t_2$) is fuzzy and needs to be defined in "fuzzy" terms.$^{2}$

$^{2}$ Note that it is not appropriate to define this operator in terms of the temporal operator Next because of the inherent ambiguity of the term "soon." Although this operator can be defined in many different ways, one natural approach would be through the use of fuzzy logic [28].

Class III. Discovery tasks of Class III involve the discovery of new interesting predicate-based patterns that occur in the database. In order to discover such patterns, the system should know on what it should focus its search, because there are potentially very many new patterns in the database. In other words, the system should know what to look for by letting the user specify what is interesting. For example, the pattern bottom_reversal may be interesting because it provides trading opportunities for the user. Although there are many different measures of interestingness for the user, such as frequency, unexpectedness, volatility, and periodicity [10], the most popular measure used in the literature is the frequency of occurrence of a pattern in the database [17, 16, 18]. In particular, [17, 16] focus on discovering frequent episodes in sequences, whereas [18] discovers frequent patterns in temporal databases satisfying certain temporal logic expressions.

In this chapter, we use a different measure of interestingness. Instead of discovering frequent patterns in the data, we attempt to discover unexpected patterns. While the discovery of frequent patterns sometimes offers useful insight into a problem domain, there are many situations where it does not. Consider, for example, the problem of intrusion detection on a network of workstations. Assume we define our events to be operating system calls made by some process on one of these workstations. We conjecture, then, that patterns of system calls differ for ordinary users as opposed to intruders. Since intrusion is a relatively rare occurrence, the patterns we would discover using frequency as our measure of interestingness would simply be usage patterns of ordinary users, offering us no information about intrusions. Instead, we propose to assign exogenous probabilities to events and then attempt to discover patterns whose number of occurrences differs by some proportion from what would be expected given these probabilities. In the example of intrusion detection, we would assign the probabilities of events to reflect the frequency of events in the absence of intruders. Then, if an intrusion did occur, it would presumably cause some unexpected pattern of system calls, which can be an indication of this event.
As will be demonstrated in Section 3, the new measure of interestingness requires discovery techniques that differ significantly from the methods used for the discovery of frequent patterns. The main reason is that unexpected patterns are not monotone. These notions will be made more precise in Section 3.

Class IV. Discovery tasks of Class IV involve discovery of new rules consisting of interesting relationships among predicates. An example of a temporal pattern of this type is the rule stating that "If a customer buys maternity clothes now, she will also buy baby clothes within the next few months." Discovery tasks of Class IV constitute challenging problems because, in the most general case, they contain problems of Class III (discovery of new predicates) as subproblems. The general problem of discovering interesting temporal rules using the concept of an abstract [11] has been studied in [7]. Discovery of temporal association rules was studied in [5, 25].

In this section, we reviewed a characterization of knowledge discovery tasks, as presented in [10]. In the rest of this chapter, we will focus on one specific Class III problem dealing with discovery of unexpected patterns. In the next section, we will formulate the problem. In Section 4 we will present an algorithm for finding unexpected patterns, and in Section 5 we will present experiments evaluating this algorithm on several applications.
3 Discovering Unexpected Patterns in Sequences: The Problem Formulation
We start this section with an intuitive presentation of the problem and then provide a more formal treatment. We want to find unexpected patterns, defined in terms of temporal logic expressions, in sequences of events. We assume that each event in the sequence occurs with some probability and assume certain conditional distributions on the neighboring events. Based on this, we can compute the expected number of occurrences of a given pattern in a sequence. If the actual number of occurrences of a pattern differs significantly from the expected number, then the pattern is unexpected and, therefore, interesting [23, 24]. In this chapter, we first present a naive algorithm that finds all unexpected patterns (those for which the ratio of the actual number of occurrences to the expected number of occurrences exceeds a certain threshold). After that, we present an improved version of the algorithm that finds most of the unexpected patterns in a more efficient manner. We also experimentally compare the naive and the more efficient algorithms in terms of their performance.

More formally, let $E = \{\alpha, \beta, \gamma, \ldots\}$ be a finite alphabet of events. We use a subset of propositional linear temporal logic to discover temporal patterns over the events. The basic temporal operators of this system are $\alpha B_K \beta$ ($\alpha$ before$_K$ $\beta$), which intuitively means that $\alpha$ occurs followed by an occurrence of $\beta$ within $K$ subsequent events; $\alpha N \beta$ ($\alpha$ next $\beta$), which means that $\alpha$ occurs and the next event is $\beta$; and $\alpha U \beta$ ($\alpha$ until $\beta$), which means that before $\beta$ occurs a sequence of $\alpha$'s occurs. This is often called the strong until [26]. While the before operator is actually redundant, as $\alpha B \beta$ can be expressed as $\neg(\neg\alpha\, U \beta)$, we have chosen to include it separately for simplicity and efficiency. A pattern of events is defined as a conjunction of ground events over these operators. For example, the simplest case is $\alpha N \beta$. Some additional examples are $(\delta\, U ((\alpha N \beta) B_K \gamma))$ and $\alpha N \beta N \gamma$.

In the pattern discovery algorithm presented in Section 4.2 we consider the following fragment of Propositional Linear Temporal Logic (PLTL). The set of formulae of our subset is the least set of formulae generated by the following rules: (1) each atomic proposition p is a formula; (2) if p is a formula and q is a formula containing no temporal operators, then
$p\,U\,q$, $p\,B_K\,q$, $p\,N\,q$, $q\,U\,p$, $q\,B_K\,p$, $q\,N\,p$ are formulae.$^{3}$

$^{3}$ We ignore disjunctions because what seems to occur in practice when disjunctions are allowed is that the disjunction of a very interesting pattern, E, with an uninteresting pattern, F, results in an interesting pattern $E \vee F$. This occurs not because $E \vee F$ truly offers any insight into our problem domain but rather because the interestingness of E "drags up" the interestingness measure of $E \vee F$ to the point where it also becomes interesting. We choose instead to simply report E as an interesting pattern. Our decision to omit conjunctions and negation will be made clear shortly.

We assume an exogenous probability distribution over the events. While these events may be dependent or independent, depending on the problem domain of interest, we assume independence of the events unless explicitly stated otherwise. For instance, in the application we consider in Section 5.3, events are hits on Web pages. In this case the probability that a user goes from Web page P to Web page Q clearly depends on the links that exist on page P. In other cases independence may be more appropriate. In any case, given an a priori set of event probabilities, we can compute expected values for the number of occurrences of any temporal pattern in our string. For example, the expected number of occurrences of $\alpha B \beta$, written $E[\alpha B \beta]$, assuming the events $\alpha$ and $\beta$ are independent, can be computed as follows. Let $X_n$ be the number of occurrences of the pattern $\alpha B \beta$ up to the $n$th element of the input string and $a_n$ the number of $\alpha$'s up to the $n$th element of the input string. Then
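\[
\begin{aligned}
E[X_n] &= \Pr[\beta]\,\bigl(E[X_{n-1}] + E[a_{n-1}]\bigr) + (1 - \Pr[\beta])\,E[X_{n-1}] \\
       &= \Pr[\beta]\,\bigl(E[X_{n-1}] + \Pr[\alpha](n-1)\bigr) + (1 - \Pr[\beta])\,E[X_{n-1}] \\
       &= \Pr[\alpha]\Pr[\beta](n-1) + E[X_{n-1}]
\end{aligned}
\]

Therefore,

\[ E[X_n] - E[X_{n-1}] = \Pr[\alpha]\Pr[\beta](n-1) \]

Also, $E[X_2] = \Pr[\alpha]\Pr[\beta]$. From this recurrence equation, we compute $E[\alpha B \beta]$ for the input string of length $N$ as

\[ E[\alpha B \beta] = \Pr[\alpha]\Pr[\beta]\,\frac{N(N-1)}{2} \]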
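The step from the recurrence to the closed form can be spelled out by summing the increments. The following is only a telescoping sum, under the assumption that $E[X_1] = 0$ (a single event cannot contain an occurrence of $\alpha B \beta$):

```latex
\begin{align*}
E[X_N] &= E[X_1] + \sum_{n=2}^{N} \bigl(E[X_n] - E[X_{n-1}]\bigr) \\
       &= \sum_{n=2}^{N} \Pr[\alpha]\Pr[\beta]\,(n-1) \\
       &= \Pr[\alpha]\Pr[\beta] \sum_{j=1}^{N-1} j
        = \Pr[\alpha]\Pr[\beta]\,\frac{N(N-1)}{2}.
\end{align*}
```

For $N = 2$ this gives $\Pr[\alpha]\Pr[\beta]$, which agrees with the stated value of $E[X_2]$.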
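The expected number of occurrences of patterns of other forms can be similarly computed as

\[ E[\alpha N \beta] = \Pr[\alpha]\Pr[\beta]\,(N-1) \tag{1} \]

\[ E[\alpha B_K \beta] = \Pr[\alpha]\Pr[\beta]\,K\,(N-K) + \Pr[\alpha]\Pr[\beta]\,\frac{K(K-1)}{2} \]

\[ E[\alpha U \beta] = \frac{\Pr[\alpha]\Pr[\beta]}{1-\Pr[\alpha]} \sum_{i=2}^{N-1} \bigl(1 - \Pr[\alpha]^{i}\bigr) + \Pr[\alpha]\Pr[\beta] \]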
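The expectations for next, bounded before, and until can be checked numerically. The Python sketch below is illustrative only: the counting conventions (an occurrence is a pair of positions, and until is counted once per possible start of the run of $\alpha$'s) are assumptions inferred from the form of the expressions, and all function names are hypothetical. It compares each closed form with an exact expectation obtained by enumerating every string of a small length over independent events.

```python
from itertools import product


def count_next(s, a, b):
    # Occurrences of "a N b": a immediately followed by b.
    return sum(1 for i in range(len(s) - 1) if s[i] == a and s[i + 1] == b)


def count_before_k(s, a, b, k):
    # Occurrences of "a B_K b": pairs (i, j) with s[i] == a, s[j] == b, 0 < j - i <= k.
    n = len(s)
    return sum(
        1
        for i in range(n) if s[i] == a
        for j in range(i + 1, min(i + k, n - 1) + 1) if s[j] == b
    )


def count_until(s, a, b):
    # Occurrences of "a U b": each b preceded by an unbroken run of a's,
    # counted once per possible starting position inside that run.
    total = 0
    for j, event in enumerate(s):
        if event != b:
            continue
        i = j - 1
        while i >= 0 and s[i] == a:
            total += 1
            i -= 1
    return total


def expected_next(pa, pb, n):
    return pa * pb * (n - 1)


def expected_before_k(pa, pb, n, k):
    return pa * pb * k * (n - k) + pa * pb * k * (k - 1) / 2


def expected_until(pa, pb, n):
    return pa * pb / (1 - pa) * sum(1 - pa ** i for i in range(2, n)) + pa * pb


def brute_force_expectation(count_fn, probs, n):
    # Exact expectation of count_fn over all strings of n independent events.
    total = 0.0
    for s in product(probs, repeat=n):
        weight = 1.0
        for event in s:
            weight *= probs[event]
        total += weight * count_fn(s)
    return total


probs = {"a": 0.25, "b": 0.25, "c": 0.5}  # hypothetical event probabilities
n, k = 6, 2
print(brute_force_expectation(lambda s: count_next(s, "a", "b"), probs, n),
      expected_next(0.25, 0.25, n))
print(brute_force_expectation(lambda s: count_before_k(s, "a", "b", k), probs, n),
      expected_before_k(0.25, 0.25, n, k))
print(brute_force_expectation(lambda s: count_until(s, "a", "b"), probs, n),
      expected_until(0.25, 0.25, n))
```

Under these conventions each pair of printed values should agree up to floating-point error. The enumeration is exponential in the string length and is meant only as a sanity check on short strings.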
As stated earlier, we search for unexpected temporal patterns in the data, where unexpectedness is defined as follows.

Definition 1. Let P denote some temporal pattern in string S. Let $A[P]$ be the actual number of occurrences and $E[P]$ the expected number of occurrences of pattern P in S. Given some threshold T, we define a pattern P to be unexpected if $A[P]/E[P] > T$. The ratio $A[P]/E[P]$ is called the Interestingness Measure (IM) of the pattern P and will be denoted as $IM(P)$.$^{4}$

This is a probabilistic measure of interestingness whereby a pattern is unexpected if its actual count exceeds its expected count by some proportion T. As the following theorem indicates, however, this is a difficult problem.

Problem (INTERESTINGNESS): Given a string of temporal events $V = v_1, v_2, \ldots, v_r$, does there exist an interesting pattern in V of the form $X_1 B_K X_2 B_K \ldots B_K X_m$ for an arbitrary m?

Theorem 1. The INTERESTINGNESS problem is NP-complete.

Proof: See Appendix.

While we are trying to find interesting patterns that contain a variety of temporal operators in an arbitrary order, this theorem states that finding interesting patterns that only use the BEFORE operator is hard. Furthermore, we would like to put no restrictions on the "interesting" patterns we discover. We would simply like to find all patterns that are interesting. The following theorem, however, shows that it is necessary to impose some bounds on the size of the patterns that we uncover, since in the case of unrestricted patterns the most unexpected pattern will always be the entire string.

Theorem 2. Consider a string of temporal events $V = v_1, v_2, \ldots, v_N$ and a temporal pattern T. If length(T) < N - 1, then there exists another pattern P such that length(P) = length(T) + 1 and $IM(P) \ge IM(T)$, where the length of a pattern is the number of temporal operators in it (one less than the number of events in the pattern).
Proof: Let $A[T]/E[T] = \alpha$ and $A[T] = \beta$, and let $Z = \{z_1, z_2, \ldots, z_m\}$ be the set of all events. We want to prove that $\exists z_i \in Z$ such that

\[ \frac{A[T N z_i]}{E[T N z_i]} \ge \alpha. \]

Assume this is not true for $z_1, z_2, \ldots, z_{m-1}$ and show that it must be true for $z_m$. By this assumption and because of (1),

\[ \frac{A[T N z_i]}{\Pr[T]\Pr[z_i](N-1)} < \alpha \qquad \forall z_i,\ i = 1, 2, \ldots, m-1. \]

Therefore,

\[ A[T N z_i] < \alpha \Pr[T]\Pr[z_i](N-1). \]

Then,

\[
\sum_{i=1}^{m-1} A[T N z_i] < \sum_{i=1}^{m-1} \alpha \Pr[T]\Pr[z_i](N-1)
= \alpha \Pr[T](N-1) \sum_{i=1}^{m-1} \Pr[z_i]
= \alpha \Pr[T](N-1)\bigl(1 - \Pr[z_m]\bigr).
\]

Since $\sum_{i=1}^{m} A[T N z_i] = A[T] = \beta$,

\[ A[T N z_m] > \beta - \alpha \Pr[T](N-1)\bigl(1 - \Pr[z_m]\bigr) \]

\[
\begin{aligned}
\frac{A[T N z_m]}{E[T N z_m]}
&> \frac{\beta - \alpha \Pr[T](N-1)\bigl(1 - \Pr[z_m]\bigr)}{\Pr[T]\Pr[z_m](N-1)} \\
&= \frac{\beta}{\Pr[T]\Pr[z_m](N-1)} - \frac{\alpha \Pr[T](N-1)\bigl(1 - \Pr[z_m]\bigr)}{\Pr[T]\Pr[z_m](N-1)} \\
&= \frac{\beta}{\Pr[T]\Pr[z_m](N-1)} - \frac{\alpha\bigl(1 - \Pr[z_m]\bigr)}{\Pr[z_m]} \\
&= \frac{\alpha}{\Pr[z_m]} - \frac{\alpha\bigl(1 - \Pr[z_m]\bigr)}{\Pr[z_m]}
\qquad \Bigl(\text{since } \frac{\beta}{E[T]} = \frac{\beta}{\Pr[T](N-1)} = \alpha\Bigr) \\
&= \alpha.
\end{aligned}
\]

$^{4}$ Another measure of interestingness is to find patterns P for which $A[P]/E[P] < T$. This problem can be treated similarly. We have chosen not to search for these patterns because they are complementary to the ones described in Definition 1. If a pattern $\neg P$ is found to be interesting in our formulation, then P will be interesting in this complementary formulation for some new threshold. Thus, in the interest of simplicity, we choose to solve these complementary problems separately and ignore negation.
Intuitively, this theorem tells us that given an interesting temporal pattern, there exists a longer pattern that is at least as interesting. In the limit, then, the most interesting pattern will always be the entire string of events, as it is the most unlikely. To cope with this, we restrict the patterns that we look for to be of length less than or equal to some length limit. Of course, the most interesting pattern we find will still be one whose length is equal to the length limit. Nevertheless, an interesting pattern that is not the most interesting often provides valuable insight into a given domain, as we will see later in discussing our experiments.
4 Algorithm

4.1 Naive Algorithm
A naive approach to discovering interesting patterns in an input sequence might proceed as follows. Sequentially scan over the input string, discovering new patterns as we go. When a new pattern is discovered, a record containing the pattern itself as well as a count of the number of occurrences of the pattern is appended to a list of all discovered patterns. This is repeated until all patterns up to a user-defined maximum length have been found. More precisely, the algorithm proceeds as follows.

Definition 2. BEFOREK: A user-defined constant that determines the maximum number of events that X can precede Y by, for $X B_K Y$ to hold.

Input:
- Input String
- Event Probabilities: the exogenously determined probabilities of each atomic event.
- BEFOREK
- The threshold T for interestingness, that is, the value that, if exceeded by the interestingness measure of a pattern, deems it interesting.
- Maximum allowable pattern length (MAXL).

Output:
- All discovered patterns P such that IM(P) > T.

Algorithm:
  Scan the input string to determine the interestingness measure of each
  event in it, and initialize list L with all these events
  WHILE L is not empty DO
    Amongst all the patterns of L, choose the pattern C with the largest
    interestingness measure as the next candidate to be expanded.
    Expand C as follows. Scan the input string looking for occurrences
    of C. When an instance of C is discovered, expand it both as a prefix
    and as a suffix. By this we mean, record all occurrences of (C op X)
    and (X op C), where op ranges over the temporal operators and X ranges
    over all events. Finally, compute the interestingness of all these
    newly discovered patterns C'.
    IF Length(C') < MAXL THEN add C' to the list L.
    Remove C from L.
  END WHILE
  Output interesting patterns.

Note that the algorithm we just presented is tantamount to an exhaustive search and is therefore not very efficient. We propose a more efficient algorithm that, although it is not guaranteed to find all interesting patterns, offers a speedup with minimal loss of accuracy. The idea is to expand on the approach presented in [17] of beginning with small patterns and expanding only those that offer the potential of leading to the discovery of interesting, larger patterns.
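To make the cost of exhaustive search concrete, the Python sketch below enumerates every pattern built only from the next operator. This is a deliberate simplification and not the algorithm above: it omits before and until, treats events as single characters, interprets the length limit as a maximum number of events, and assumes the expected count of an n-event next-pattern is the product of its event probabilities times the number of start positions. The function name is hypothetical.

```python
from collections import Counter
from math import prod


def naive_next_patterns(s, probs, threshold, max_events):
    # Exhaustively score every contiguous pattern e1 N e2 N ... N en, n <= max_events.
    n = len(s)
    interesting = {}
    for length in range(1, max_events + 1):
        # Actual counts of every next-only pattern with `length` events.
        counts = Counter(s[i:i + length] for i in range(n - length + 1))
        for pattern, actual in counts.items():
            # Assumed expectation: independent events, one term per start position.
            expected = prod(probs[e] for e in pattern) * (n - length + 1)
            im = actual / expected
            if im > threshold:
                interesting[pattern] = im
    return interesting


probs = {"A": 0.25, "B": 0.25, "C": 0.50}
found = naive_next_patterns("ABABABABCCCCCCCCCCCC", probs, threshold=2.0, max_events=2)
for pattern, im in sorted(found.items()):
    print(pattern, round(im, 2))
```

Even in this restricted form the number of candidate patterns grows exponentially with the length limit, which is the motivation for a more selective expansion strategy.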
4.2 Main Algorithm
The difficulty involved in finding interesting patterns is in knowing where to look. When interestingness is measured simply by some count (i.e., the number of occurrences exceeds some threshold), as is done in [17], it is obvious that for a pattern to be frequent, its component partial patterns must also be frequent. With this in mind, the technique that has been used in [17] is to expand all patterns whose count exceeds this threshold and stop when no more exist. When using our interestingness measure, however, this is not the case. That is, a pattern can be unexpected while its component sub-patterns are not. This lack of monotonicity in our interestingness measure is most easily understood with an example.

Example: Let the set of events be E = {A, B, C}. Assume the probabilities of these events are Pr[A] = 0.25, Pr[B] = 0.25, and Pr[C] = 0.50. Also assume that these events are independent. Let the threshold T = 2. In other words, for a pattern to be interesting, the actual number of occurrences of the pattern divided by the expected number of occurrences of the pattern must exceed 2.0. Consider the following string of events:

ABABABABCCCCCCCCCCCC

(the length of this string is N = 20). Given our probabilities, E[A] = 5 and E[B] = 5. Also, given the expression for computing expectations for patterns of the form ANB,

\[ E[A N B] = \Pr[A]\Pr[B]\,(N-1) = (0.25)(0.25)(19) = 1.1875 \]

Since A[A] = 4 and A[B] = 4, neither of the events A and B is interesting (in fact, the actual number of occurrences of these events was less than expected), but the pattern ANB, which occurred 4 times, was interesting with

\[ IM(A N B) = \frac{4}{1.1875} = 3.37 \]
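The numbers in this example can be reproduced with a few lines of Python. This is only a check of the arithmetic; the variable names are illustrative.

```python
string = "ABABABABCCCCCCCCCCCC"
probs = {"A": 0.25, "B": 0.25, "C": 0.50}
threshold = 2.0
n = len(string)

# Single event A: actual count versus expected count Pr[A] * N.
actual_a = string.count("A")
expected_a = probs["A"] * n
im_a = actual_a / expected_a

# Pattern A N B: A immediately followed by B, expected Pr[A] * Pr[B] * (N - 1).
actual_anb = sum(1 for i in range(n - 1) if string[i] == "A" and string[i + 1] == "B")
expected_anb = probs["A"] * probs["B"] * (n - 1)
im_anb = actual_anb / expected_anb

print(actual_a, expected_a, round(im_a, 2), im_a > threshold)
print(actual_anb, expected_anb, round(im_anb, 2), im_anb > threshold)
```

The event A falls below the threshold while the longer pattern ANB exceeds it, which is the non-monotone behaviour the example is meant to show.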
This lack of monotonicity in our interestingness measure results in a significantly more complex problem, especially in terms of space complexity. In the algorithm for discovering frequent patterns, significant pruning of the search space can occur with each iteration. That is, when a newly discovered pattern is found to have occurred fewer times than the frequency threshold, it may be discarded, as adding new events to it cannot result in a frequent pattern. With our measure of interestingness, however, this is not the case. The addition of an event to an uninteresting pattern can result in the discovery of an interesting one. This inability to prune discovered patterns leads to an explosion in the amount of space required to find unexpected patterns. Consequently, we are limited to expanding patterns by only single literals at a time and therefore will not discover patterns like $((\alpha B_K \beta) B_K (\gamma N \delta))$, where two patterns of size greater than one are combined via a temporal operator (before, in this example). This is the reason that we have not used conjunctions as part of our fragment of temporal logic. Since our events occur sequentially, it is impossible for conjunctions to arise unless we expand patterns by multiple literals at a time. This does present a limitation of our algorithm, and extending our fragment further is an area we are pursuing currently.

A more efficient algorithm than the naive one for finding unexpected patterns involves sequential scans over the string of events, discovering new patterns with each scan. A list is maintained of those patterns discovered so far, and on each subsequent iteration of the algorithm the "best" pattern is selected from this list for expansion to be the seed for the next scan. When a pattern P is expanded, the input sequence is scanned and occurrences of P are located. For each of these occurrences, all patterns of the forms X op P and P op X are added to the list of discovered patterns, where op is a temporal operator ($N$, $B_K$, or $U$) and X is a variable ranging over all events. Given a pattern to expand, $\alpha B_K \beta$ for example, during the scan we will discover all patterns $((\alpha B_K \beta) N \gamma)$, $(\gamma B_K (\alpha B_K \beta))$, etc., for all events $\gamma$.

The heart of the algorithm is how "best" patterns are chosen. We will explain it formally below (in Definition 4), but would like to give some intuition beforehand. Clearly, we would like to define "best" to mean most likely to produce an interesting pattern during expansion. By Theorem 2, we know that expanding an already interesting pattern must result in the discovery of additional interesting pattern(s).
The question remains, however: amongst the interesting patterns already discovered, which is the best candidate for expansion, and if no interesting patterns remain unexpanded, are there any uninteresting patterns worth expanding? Initially, the algorithm begins with a scan of the input string, counting the number of occurrences (and therefore the frequencies) of individual events. Subsequent to this, we continue to expand best candidates until there are no more candidates worthy of expansion. This notion will be made clear shortly. During each scan of the input string, when a new pattern is discovered,$^{5}$ a PATTERN_RECORD is created for it consisting of the following information:

1. Pattern P (e.g., $((\alpha N \beta) B_K \gamma)$, etc.).
2. Count: How many of these patterns were found.
3. Preremaining_op: One instance of this value is kept for each temporal operator. It represents the number of patterns remaining to be discovered for which P is the prefix and the operator connecting P to its suffix is op. How these values are calculated will be discussed shortly (see Definition 3).
4. Postremaining_op: Identical to Preremaining_op, but for suffixes rather than prefixes.
5. Expanded (boolean): Whether or not P has been expanded.
6. INTERESTINGNESS_LIST: Consists of all events, in decreasing order of interestingness, amongst events that can potentially complete P during expansion. One of these lists is kept for prefixes and one for suffixes, as well as for each operator next, before, and until. That is, for a pattern $P = \alpha N \beta$, for example, if a pattern $\gamma N \delta$ has already been discovered, then the occurrence of $\delta$ in $\gamma N \delta$ cannot possibly complete the pattern $((\alpha N \beta) N X)$. When determining the best candidate for expansion, we will be interested in knowing what events can potentially complete all of the patterns we have already discovered and will, therefore, make use of these lists. In fact, this sorted list represents an ordering of the most interesting events that could complete the pattern they are associated with.$^{6}$

$^{5}$ In the case of the initial scan these will simply be the events.
Definition 3. The FORM(P) of a pattern P is a logical expression with all ground terms in P replaced by variables. For example, if $P = ((\alpha N (\beta B_K \gamma)) B_K \delta)$ then $FORM(P) = ((W N (X B_K Y)) B_K Z)$.

Given the length of the input string, we can determine the number of patterns of each form in the input string. For example, given a string of length M, the number of patterns of form $X N Y$ is $M - 1$. The number of patterns $X B_K Y$ is $(M - K)K + K(K - 1)/2$.

Definition 4. Given a pattern P and an operator op, Actual_Remaining(P op X) is the number of patterns of the form P op X that have yet to be discovered. This value is maintained for each operator op and pattern P. That is, we maintain a value for $P N X$, $P B_K X$, $X B_K P$, etc. Again, X ranges over all events. For example, if there are 20 occurrences of $P = \alpha B_K \beta$ in the input string and 5 patterns of the form $((\alpha B_K \beta) N X)$ have been discovered so far, then Actual_Remaining_Pre_Next$(((\alpha B_K \beta) N X)) = 15$.

We use the following heuristic to determine which discovered pattern is the best one to expand. Given an arbitrary literal $\delta$, the best pattern P for expansion is the pattern for which the value of

\[ E\bigl[A[P\ op\ \delta]\,/\,E[P\ op\ \delta]\bigr] \quad \text{or} \quad E\bigl[A[\delta\ op\ P]\,/\,E[\delta\ op\ P]\bigr] \]

is maximal for some $\delta$.

$^{6}$ For problem domains with a large number of events, in the interest of scalability, partial lists may be substituted where only a list of the most interesting events is maintained.
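The position counts quoted in Definition 3 can be verified directly. The Python sketch below is an illustration with hypothetical function names; it assumes $1 \le K \le M$ and that an instance of $X B_K Y$ is a pair of positions $(i, j)$ with $0 < j - i \le K$.

```python
def positions_next(m):
    # Number of position pairs that can hold a pattern of form X N Y.
    return m - 1


def positions_before_k(m, k):
    # Closed form from the text: (M - K)K + K(K - 1)/2, assuming 1 <= k <= m.
    return (m - k) * k + k * (k - 1) // 2


def positions_before_k_brute_force(m, k):
    # Count pairs (i, j) with 0 < j - i <= k by direct enumeration.
    return sum(1 for i in range(m) for j in range(i + 1, m) if j - i <= k)


for m, k in [(20, 3), (10, 10), (7, 1)]:
    print(m, k, positions_before_k(m, k), positions_before_k_brute_force(m, k))
```

With $K = 1$ the count reduces to $M - 1$, the same as for next, and with $K = M$ it reduces to the number of all ordered position pairs, $M(M-1)/2$.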
This heuristic simply states that the pattern P that is most likely to result in the discovery of an interesting pattern is the one for which there exists a literal $\delta$ such that the expected value of the interestingness measure of the pattern generated when $\delta$ is added to P via one of the temporal operators is maximal over all discovered patterns P and literals $\delta$. It is necessary for us to use the expected value of the interestingness measure because, although we know the actual number of occurrences of both P and $\delta$, we don't know the number of occurrences of P op $\delta$ or $\delta$ op P. How this expectation is computed follows directly from our derivations of expectations in Section 3 and is illustrated in the following example.

Example: If $P = \alpha N \beta$ and op is next, then

\[ E\bigl[A[P N \delta]/E[P N \delta]\bigr] = \frac{(\#P)\,(FR(\delta))}{\Pr[\alpha]\Pr[\beta]\Pr[\delta]\,(K-2)} \]

where
K = length of the input string,
FR($\delta$) = frequency of $\delta$'s that could complete the pattern $((\alpha N \beta) N X)$,
#P = number of occurrences of pattern P.

If op is before,

\[ E\bigl[A[P B_K \delta]/E[P B_K \delta]\bigr] = \frac{(\#P)\,(FR(\delta))\,(BEFOREK)}{\Pr[\alpha]\Pr[\beta]\Pr[\delta]\,(K-2)\,(BEFOREK)} = \frac{(\#P)\,(FR(\delta))}{\Pr[\alpha]\Pr[\beta]\Pr[\delta]\,(K-2)} \]

If $P = \alpha B_K \beta$ and op is next,

\[ E\bigl[A[P N \delta]/E[P N \delta]\bigr] = \frac{(\#P)\cdot(\#\delta)}{\Pr[\alpha]\Pr[\beta]\Pr[\delta]\,(K-2)\,(BEFOREK)} \]

Similar arguments are used for any combination of the operators before, next, and until.$^{7}$ We consider the literal $\delta$ which is most likely to result in the discovery of an interesting pattern when used to complete the pattern P during expansion. We will now argue that this measure accomplishes our goal of expanding patterns most likely to result in the discovery of interesting patterns. The choice of a best candidate for expansion proceeds in two stages. First, recall the purpose of the INTERESTINGNESS_LIST for each discovered pattern.

$^{7}$ For before and until these definitions are slightly erroneous due to losses of patterns at the ends of the input string. These errors are negligible, however, since the length of the input string is much larger than the length of individual patterns of interest.
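A small Python sketch of the first case of the example ($P = \alpha N \beta$, op is next) may help fix ideas. The function names, the dictionary layout, and the numeric inputs below are all hypothetical; only the expression $(\#P)(FR(\delta)) / (\Pr[\alpha]\Pr[\beta]\Pr[\delta](K-2))$ is taken from the text.

```python
def expected_im_next(count_p, fr_delta, pr_alpha, pr_beta, pr_delta, string_length):
    # (#P)(FR(delta)) / (Pr[alpha] Pr[beta] Pr[delta] (K - 2)) for P = alpha N beta.
    expected_actual = count_p * fr_delta
    expected_count = pr_alpha * pr_beta * pr_delta * (string_length - 2)
    return expected_actual / expected_count


def best_completion(count_p, pr_alpha, pr_beta, candidates, string_length):
    # candidates maps each literal to (FR(literal), Pr[literal]).
    # Returns the literal with the largest expected interestingness and its score.
    scores = {
        literal: expected_im_next(count_p, fr, pr_alpha, pr_beta, pr, string_length)
        for literal, (fr, pr) in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical inputs: 4 occurrences of P in a string of length 20.
candidates = {"A": (0.15, 0.25), "C": (0.60, 0.50)}
print(best_completion(4, 0.25, 0.25, candidates, 20))
```

Repeating this for every discovered pattern and every operator, and taking the overall maximum, would correspond to the two-stage choice of the best candidate described in the text.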
Each of the INTERESTINGNESS_LISTs associated with a pattern P is sorted in such a way that the event at the head of the list, when added to P, is most likely to result in the discovery of an interesting pattern. An event $\delta$ will be ahead of an event $\epsilon$ on this list if $A[\delta]/E[\delta] > A[\epsilon]/E[\epsilon]$. While the expected values here are computed in the usual way, the actual values in this case are not simply equal to the counts of $\delta$ and $\epsilon$, respectively, but rather equal to the number of $\delta$'s and $\epsilon$'s that could potentially be added to P.

Lemma 1. Given two events $\delta$ and $\epsilon$ where $\delta$ occurs before $\epsilon$ on the INTERESTINGNESS_LIST, then

\[ E\left[\frac{A[(\alpha N \beta)\ op\ \delta]}{E[(\alpha N \beta)\ op\ \delta]}\right] > E\left[\frac{A[(\alpha N \beta)\ op\ \epsilon]}{E[(\alpha N \beta)\ op\ \epsilon]}\right] \]

Proof: We prove this result for the next operator. Assume

\[ E\left[\frac{A[(\alpha N \beta) N \delta]}{E[(\alpha N \beta) N \delta]}\right] \le E\left[\frac{A[(\alpha N \beta) N \epsilon]}{E[(\alpha N \beta) N \epsilon]}\right] \]

Let N be the length of the input string.$^{8}$ Then

\[ \frac{E[(\#P)(FR(\delta))]}{\Pr[\alpha]\Pr[\beta]\Pr[\delta]} \le \frac{E[(\#P)(FR(\epsilon))]}{\Pr[\alpha]\Pr[\beta]\Pr[\epsilon]} \]

Since #P, FR($\delta$), and FR($\epsilon$) are constants, we can remove them from the expectations and cancel the common factors on each side of the inequality. So,

\[ \frac{FR(\delta)}{\Pr[\delta]} \le \frac{FR(\epsilon)}{\Pr[\epsilon]} \]

\[ \frac{FR(\delta)\,N}{\Pr[\delta]\,N} \le \frac{FR(\epsilon)\,N}{\Pr[\epsilon]\,N} \]

\[ \frac{A[\delta]}{E[\delta]} \le \frac{A[\epsilon]}{E[\epsilon]} \]

This contradicts the assumption that $\delta$ occurs before $\epsilon$ on the INTERESTINGNESS_LIST. The proofs for the temporal operators before and until are done similarly.

So, it is now clear, for each discovered pattern P, which literal when added to P is most likely to produce an interesting pattern and how interesting we expect that pattern to be. In the second stage of choosing the best candidate, we select the already discovered pattern which is likely to produce the most interesting pattern. Intuitively, we are saying that the pattern P most worth expanding is the one for which there exists a literal that is likely, when added to P, to result in the discovery of the most interesting pattern.

$^{8}$ Here FR($\delta$) and FR($\epsilon$) represent the frequencies of $\delta$'s and $\epsilon$'s, respectively, that could complete the pattern $((\alpha N \beta) N X)$. As discussed earlier, this is not equal to the frequency of all of the $\delta$'s and $\epsilon$'s in the input string.
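The ordering of an INTERESTINGNESS_LIST can be sketched in Python as a sort on the ratio of actual to expected counts. The function name, the input layout, and the sample counts are hypothetical; the counts passed in are assumed to be the numbers of occurrences still able to complete the pattern, not the raw event counts.

```python
def interestingness_list(completable_counts, probs, string_length):
    # Sort events by A[event] / E[event], largest first, where A is the number
    # of occurrences that could still complete P and E = Pr[event] * N.
    def ratio(event):
        return completable_counts[event] / (probs[event] * string_length)

    return sorted(completable_counts, key=ratio, reverse=True)


probs = {"A": 0.25, "B": 0.25, "C": 0.50}
completable = {"A": 4, "B": 2, "C": 12}  # hypothetical remaining counts
print(interestingness_list(completable, probs, 20))
```

The event at the head of the returned list is the one the heuristic would try first when completing the pattern. A partial list, as mentioned in the footnote on scalability, could be obtained by truncating the result.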
Given these preliminary motivations, we now present the algorithm.

Input:
- Input String
- Event Probabilities
- BEFOREK: as discussed earlier, we use a bounded version of the before operator. BEFOREK is a user-defined variable that is equal to the maximum distance between two events X and Y for $X B_K Y$ to hold.
- Threshold T for interestingness, that is, the value that, if exceeded by the interestingness measure of a pattern, deems it interesting.
- Value of MIN_TO_EXPAND: the minimum threshold of interestingness that a pattern must have in order to become the next pattern for expansion. The algorithm will terminate if no such pattern remains.
- Maximum allowable pattern length (MAXL).

Output:
- List of interesting patterns, their number of occurrences, and the value of their interestingness measures.

Algorithm:
  Scan the input string to determine the interestingness measure of each
  event in it, and initialize list L with all these events
  WHILE there exists a pattern in L whose interestingness measure is
        greater than MIN_TO_EXPAND DO
    Choose_Next_Candidate: the pattern P such that LENGTH(P) < MAXL which
      maximizes E[A[P op X]/E[P op X]] over all temporal operators op and
      all events X, or the pattern P such that LENGTH(P) < MAXL which
      maximizes E[A[X op P]/E[X op P]] over all temporal operators op and
      all events X
    Scan for patterns for which P is the prefix or suffix
    Insert_Patterns: append newly found patterns to the list of already
      found patterns
    Update_Smaller (discussed below)
    Check_Larger (discussed below)
  END WHILE
  Return interesting patterns
Discovering Unexpected Patterns in Temporal Data Using Temporal Logic
The algorithm continues until there are no more patterns for which (actual remaining/expected remaining) exceeds some minimum threshold MIN_TO_EXPAND, a parameter chosen at the outset.

Update_Smaller: Consider the situation when a pattern $\alpha N \beta$ is chosen as the next candidate. During the scan, patterns of the form $\alpha N \beta N \gamma$, $\alpha N \beta B_K \gamma$, etc., will be discovered. If, for example, M occurrences of $\alpha N \beta N \gamma$ are found, then there are M fewer patterns for which $\beta N \gamma$ is a suffix remaining to be found. This decreases the chance that $\beta N \gamma$ will be chosen as a candidate, and this change needs to be recorded. Likewise, during the scan for $\alpha N \beta$, $\delta N \alpha N \beta$ may be found to occur L times. Therefore, the number of remaining patterns of the form $\delta N \alpha N X$ to be found has decreased by L. Again this needs to be recorded.

Check_Larger: Consider the situation when $\alpha$ is chosen as the next candidate. During the scan we will discover some number of $\alpha N \beta$, say P. As was discussed earlier, this implies that there are P patterns of the form $\alpha N \beta N X$ and patterns of the form $X N \alpha N \beta$ (we can make similar statements for the other operators). Some of these may have already been discovered, however. For example, if $\gamma$ was already chosen as a candidate, and some $\beta N \gamma$ were found, and then $\beta N \gamma$ was chosen and M occurrences of $\alpha N \beta N \gamma$ were found, then the number of remaining patterns of the form $\alpha N \beta N X$ yet to be found is not P but rather P - M. This again needs to be recorded.
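The control flow of the main loop can be illustrated with a deliberately simplified sketch. It is not the authors' implementation: it handles only the NEXT operator, assumes independent events, picks the next candidate by the candidate's own interestingness rather than by the expected interestingness of its extensions, and omits the Update_Smaller and Check_Larger bookkeeping. All function names (`count_next`, `interestingness`, `discover`) are hypothetical.

```python
from math import prod


def count_next(s, pattern):
    """Count occurrences of `pattern` as a run of consecutive events (NEXT only)."""
    k = len(pattern)
    return sum(1 for i in range(len(s) - k + 1) if tuple(s[i:i + k]) == pattern)


def interestingness(s, pattern, probs):
    """Actual / expected number of occurrences, assuming independent events."""
    expected = prod(probs[e] for e in pattern) * (len(s) - len(pattern) + 1)
    return count_next(s, pattern) / expected if expected > 0 else 0.0


def discover(s, probs, threshold, min_to_expand, max_len):
    """Simplified best-first expansion of NEXT-only patterns."""
    # Initial scan: score every atomic event.
    found = {(e,): interestingness(s, (e,), probs) for e in probs}
    expanded = set()
    while True:
        # Simplification: a pattern is a candidate if its own score exceeds
        # MIN_TO_EXPAND and it has not been expanded yet.
        candidates = [p for p in found
                      if p not in expanded and len(p) < max_len
                      and found[p] > min_to_expand]
        if not candidates:
            break
        best = max(candidates, key=lambda p: (found[p], p))
        expanded.add(best)
        # Scan for patterns that have `best` as a prefix or a suffix.
        for e in probs:
            for new in (best + (e,), (e,) + best):
                if new not in found:
                    found[new] = interestingness(s, new, probs)
    return {p: v for p, v in found.items() if v > threshold}


if __name__ == "__main__":
    uniform = {c: 0.25 for c in "abcd"}
    for pattern, score in sorted(discover("abababababcd", uniform, 3.0, 1.0, 3).items()):
        print(" NEXT ".join(pattern), round(score, 3))
```

Because atomic events scoring below `min_to_expand` are never expanded, this sketch, like the main algorithm, can miss patterns that an exhaustive search would find.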
5 Experiments
We conducted experiments on three different problem domains. The first was a simple sequence of independent events. This data was generated synthetically. The second domain we considered was sequences of UNIX operating system calls made by the sendmail program. The third was that of Web logfiles. In the last case, events were dependent.
5.1 Sequential independent events
We used an input string of length 1000 over 26 different events. In this case, we assumed that each event was equally likely and that the events were independent. We searched for patterns P for which Length(P) ≤ 5. Our results are presented in Table 2. The columns of Table 2 are as follows:

Algorithm - The algorithm used. The naive algorithm, presented in Section 4.1, represents essentially an exhaustive search over the input string and is guaranteed to find all interesting patterns. It is included as a benchmark by which we measure the effectiveness of the main algorithm. Percentage is equal to the value for the main algorithm divided by the value for the naive algorithm, times 100, for each column respectively. The first number following each algorithm (2 or 4) is the value of BEFOREK used. The second number (3, 4, or 6) is the interestingness threshold.
Algorithm     # of Scans   # of Expanded Patterns   # of Interesting Patterns
Naive(2,3)    416          2489                     290
Main(2,3)     161          919                      268
Percentage    38.7%        36.9%                    92.4%
Naive(4,3)    416          3105                     259
Main(4,3)     163          1073                     250
Percentage    39.2%        34.6%                    96.5%
Naive(2,4)    416          2489                     168
Main(2,4)     161          919                      164
Percentage    38.7%        36.9%                    97.6%
Naive(4,4)    416          3105                     171
Main(4,4)     163          1073                     166
Percentage    39.2%        34.6%                    97.1%
Naive(2,6)    416          2489                     133
Main(2,6)     161          919                      130
Percentage    38.7%        36.9%                    97.7%
Naive(4,6)    416          3105                     129
Main(4,6)     163          1073                     127
Percentage    39.2%        34.6%                    98.4%

Table 2. Results for independent sequential data.
# of Scans - The number of scans over the input sequence necessary to discover all interesting patterns found.

# of Expanded Patterns - The number of patterns discovered, interesting or otherwise.

# of Interesting Patterns - The number of interesting patterns found.

Based on the results presented in Table 2, the main algorithm did not find all interesting patterns, although it discovered most of them while doing less work than the naive algorithm. Also note that the main algorithm was more accurate as our threshold for interestingness increased. In other words, when our algorithm did miss interesting patterns, they tended not to be the most interesting.
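To give a feel for the scale of the thresholds in this experiment, the following sketch computes the expected number of occurrences of a two-event NEXT pattern under independence, using the expression Pr[A]Pr[B](K - 1) that the chapter quotes for a string of length K, and the smallest count whose actual/expected ratio strictly exceeds a given threshold. The helper names are hypothetical, and the sketch covers only the simplest pattern shape.

```python
from fractions import Fraction
from math import floor


def expected_next_pair(pr_a, pr_b, n):
    """Expected occurrences of (A NEXT B) in a string of length n, independent events."""
    return pr_a * pr_b * (n - 1)


def min_count_to_exceed(threshold, expected):
    """Smallest integer count c with c / expected strictly greater than threshold."""
    return floor(threshold * expected) + 1


if __name__ == "__main__":
    p = Fraction(1, 26)  # 26 equally likely events
    e = expected_next_pair(p, p, 1000)
    print("expected occurrences:", float(e))
    for t in (3, 4, 6):  # the interestingness thresholds used in Table 2
        print("threshold", t, "needs at least", min_count_to_exceed(t, e), "occurrences")
```

Under these assumptions a specific pair is expected roughly 1.5 times in the string, so a handful of repetitions is already enough to pass the lower thresholds.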
5.2 Sequences of OS System Calls
The second domain we investigated was a sequence of operating system calls made by a sendmail program. The events consisted of the 31 different system calls that the program made, and our string consisted of 31769 sequential calls. At the time of these experiments we had no knowledge of the actual probabilities of these events. Therefore, we assumed that system calls are independent of each other and estimated the probabilities of individual events by simply scanning the string and counting the number of actual occurrences of each event. For each event $e_i$ we let $\Pr[e_i]$ = (number of occurrences of $e_i$)/(the total string length). Because of this, the interestingness of each atomic event was by
definition exactly 1. This forced us to assign a value to MIN_TO_EXPAND that did not exceed 1, or else the algorithm would not even begin. This resulted in more scans of the input string than were actually necessary to discover interesting patterns, but nonetheless the improvement we achieved over the naive algorithm was consistent with our experiments in other domains (approximately a factor of three). The following represent a selection of interesting patterns discovered. These were selected because of a combination of their interestingness and our confidence, based on their number of occurrences, that they actually represent significant events. These results were generated on a run where we allowed strings of up to length 5.

EVENT: ((sigblock NEXT setpgrp) NEXT vtrace)  COUNT: 2032  ACT/EXP: 43.1628
EVENT: (((sigblock NEXT setpgrp) NEXT vtrace) NEXT vtrace)  COUNT: 455  ACT/EXP: 83.1628
EVENT: (((sigblock NEXT setpgrp) NEXT vtrace) BEFORE sigvec)  COUNT: 355  ACT/EXP: 52.1150
EVENT: (sigblock NEXT (setpgrp BEFOREK vtrace))  COUNT: 2032  ACT/EXP: 21.5814
EVENT: ((sigblock BEFOREK setpgrp) NEXT vtrace)  COUNT: 2032  ACT/EXP: 21.5814
EVENT: ((sigpause NEXT vtrace) NEXT iseek)  COUNT: 1016  ACT/EXP: 106.672
EVENT: (sigpause BEFOREK (vtrace NEXT iseek))  COUNT: 1016  ACT/EXP: 53.336
EVENT: (sigvec BEFOREK (sigpause NEXT (vtrace NEXT (iseek NEXT iseek))))  COUNT: 29  ACT/EXP: 212.349
EVENT: (sigpause BEFOREK (vtrace BEFOREK iseek))  COUNT: 2032  ACT/EXP: 53.336
EVENT: ((vtrace NEXT iseek) NEXT iseek)  COUNT: 1017  ACT/EXP: 35.5112

In these results COUNT represents the number of occurrences of the pattern EVENT and ACT/EXP represents the interestingness of this pattern. Notice a couple of things. First, the interesting patterns that occurred a reasonable number of times (the ones shown above) were mostly of length 3. There were, of course, more interesting patterns of longer length, but these occurred significantly less often. Also notice that no interesting UNTIL patterns were discovered. This is because we never saw AAAAAAB, i.e. all the occurrences of until were of the form AB or AAB, which were captured by NEXT or BEFORE, and since fewer instances of NEXT and BEFORE were expected these proved more interesting.

These system calls are from the UNIX operating system. In the future we propose to assign probabilities of atomic events based on their frequencies in a period when we are confident no intrusions to the network occurred, and then see if we can discover interesting patterns that correspond to intrusions.
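The remark that frequency-based probability estimates make every atomic event exactly as frequent as expected can be checked with a small sketch. The helper names and the toy call sequence are hypothetical; exact rational arithmetic is used so the ratio comes out as exactly 1 rather than a rounded float.

```python
from collections import Counter
from fractions import Fraction


def frequency_probabilities(events):
    """Estimate Pr[e] as (occurrences of e) / (string length)."""
    n = len(events)
    return {e: Fraction(c, n) for e, c in Counter(events).items()}


def atomic_interestingness(events):
    """Actual / expected count for each single event under the estimates above."""
    n = len(events)
    counts = Counter(events)
    probs = frequency_probabilities(events)
    # Expected count of a single event in a string of length n is Pr[e] * n.
    return {e: Fraction(counts[e]) / (probs[e] * n) for e in counts}


if __name__ == "__main__":
    calls = ["sigblock", "setpgrp", "vtrace", "vtrace", "sigvec"]  # toy data
    print(atomic_interestingness(calls))
```

Since every atomic event scores exactly 1, no single event stands out, and the expansion threshold has to be chosen with that in mind.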
5.3 Web logfiles
Each time a user accesses a Web site, the server on the Web site automatically adds entries to files called logfiles. These therefore summarize the activity on the Web site and contain useful information about every Web page accessed at the site. While the exact nature of the information captured depends on the Web server that the site uses, the only information we made use of was the user identity and the sequence of requests for pages made by each user. The Web site we considered was that of one of the schools at a major university. The users we considered were the two most frequent individual users. It is important to recognize that the Web logfiles simply tell us the hostname from which a request originated. Typically, there are a large number of users who may access a Web site from the same host, and the hostname, therefore, cannot be used to definitively identify individual users. We attempted to identify, with some confidence, frequent hostnames that did indeed represent individual users.

We used two Web logfiles for our experiments. First, we considered a synthetic Web log. This included a Web site with 26 different pages and 236 total links. We used an input string of length 1000 representing 1000 hits on pages of the site. In this case events were hits on Web pages. Probabilities were, of course, not independent. The probability of a user reaching a given Web page is dependent on the page he is currently at. In order to compute a priori probabilities of each page we declared several pages to be equally likely "entrance points" to the Web site. If there were N "entrance points" then each has a $\frac{1}{N}$ chance of occurring. If P is one of these "entrance points", P has K links on it, and one of these links is to page G, then the probability of G occurring is $(\frac{1}{N})(\frac{1}{K})$. By conducting an exhaustive breadth-first search we were able to calculate the probabilities of each
event occurring (i.e. each page being "hit"). When calculating expectations for various patterns, we used conditional probabilities. So, for example, $E[A N B]$ is no longer $\Pr[A]\Pr[B](K - 1)$, where K is the length of the input string. It is now $\Pr[A]\Pr[B|A](K - 1) = \Pr[A](1/\#\text{ of links in page } A)(K - 1)$ if there is a link from A to B and 0 otherwise. Our results for this data are presented in Table 3.

Algorithm     # of Scans   # of Expanded Patterns   # of Interesting Patterns
Naive2        634          1356                     464
Main2         239          528                      437
Percentage    37.7%        38.9%                    94.2%
Naive4        654          1564                     462
Main4         245          568                      437
Percentage    37.5%        36.3%                    94.6%

Table 3. Results for synthetic Web logfile data.

The interestingness threshold for these experiments was 3.0. Once again our algorithm was able to find most interesting patterns while examining much less of the search space than the naive algorithm did.

Finally, we considered data from an actual Web site from one of the schools of a major university. There were 4459 different pages on this site with 37954 different links between pages. We used Web log data collected over a period of nine months and selected the two most frequent individual users of the site, each of whom accounted for more than 1400 hits, and used these sequences of hits as our input strings. Our experiments using this data were less enlightening than when we used synthetic data. The main algorithm found only a handful of interesting patterns of length greater than two. In fact, when we applied the naive algorithm we found that there were few more interesting patterns to be found at all. More specifically, the main algorithm found 2 and 3 interesting patterns of length greater than two in our two input strings, respectively. The naive algorithm found 3 and 3. The primary reason for the lack of interesting patterns of greater length was that the size of the Web site dominated the size of the input string. The fact that there were 4459 pages and our input strings were only of length 1400 made the expected number of occurrences of each event very small. So small, in fact, that even a single occurrence of many events proved interesting. Additional factors that compounded the problem are:

1. Changing Web Structure. The Web architecture from which we built the graph that the algorithm was run on was from a single moment in time (we captured the structure of the Web site, including the links, on a single day, and extrapolated it to 9 months of Web log data). Over this period there were some changes to the Web site. This creates some difficulties in that the Web logfiles showed that users linked from pages to other pages where links did not exist in the Web we were considering. In fact, there were visits to pages in the Web log data that did not exist in the site we were using. This had the effect of forcing the expected number of occurrences of any patterns that included these pages or links to be zero, so that they were never considered interesting either as patterns or candidates.

2. Multiple Sessions. While each input string we used had a length greater than 1400 events, these Web hits spanned many sessions. In fact, the average session length was approximately 10 hits. The last hit from one session immediately preceded the first hit of the next session in our input string. Normally, however, a link did not exist from the last page of the first session to the first page of the next session. Therefore, once again this had the effect of forcing the expected number of occurrences of any patterns that included this sequence of pages to be zero, so that they were never considered interesting either as patterns or candidates.

3. Caching. Consider what sequence of hits appears in Web log data if a user goes to pages A, B, C, D in the following order: A → B → C → B → D. Normally, what occurs is that a request is made (and therefore logged) for page A, then page B, then page C. When the user goes back to page B, however, no request is made of the server because this page has been cached on the user's local machine. Finally, a request for page D will be made and logged. Therefore, this sequence of hits will appear in the Web log data as follows: A → B → C → D. If no link exists from page C to page D then once again the expected number of occurrences of any pattern including this sequence of events will be zero. Given the wide use by Web users of the BACK button, the effect of caching is substantial.

4. Local Files. Finally, many pages that appeared in the Web log data did not appear in the Web site we were using because they were files kept on individuals' local machines in their own directories, rather than on the Web server. These pages had the same effect as the changes made in the Web over the nine month period.

Lessons Learned. The primary cause of our lack of success in finding interesting patterns in the use of our university Web site was the fact that the size of the site was very large in comparison to the size of the input strings we considered. We are planning to obtain Web logfiles spanning a longer period of time, for a smaller and more stable Web site. We are also considering various models to deal with the loss of patterns that we experienced due to the multitude of user sessions.
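The conditional expectation used for the Web log experiments, and the way a missing link forces it to zero, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names and the tiny link graph are hypothetical, and the breadth-first computation of the page probabilities themselves is not reproduced.

```python
from fractions import Fraction


def entrance_link_probability(n_entrances, n_links):
    """(1/N)(1/K): probability of a page reached by one link from an entrance point."""
    return Fraction(1, n_entrances) * Fraction(1, n_links)


def expected_next(pr_a, links, a, b, k):
    """Expected occurrences of (a NEXT b) in a string of length k.

    Uses Pr[a] * (1 / number of links on a) * (k - 1) when a links to b,
    and 0 when no such link is known.
    """
    targets = links.get(a, [])
    if b not in targets:
        return Fraction(0)
    return Fraction(pr_a) * Fraction(1, len(targets)) * (k - 1)


if __name__ == "__main__":
    # Hypothetical site: A links to B; B links to C and D; C links back to B.
    links = {"A": ["B"], "B": ["C", "D"], "C": ["B"]}
    print(expected_next(Fraction(1, 4), links, "B", "D", 1000))  # link exists
    # With caching, A, B, C, B, D is logged as A, B, C, D; C has no link to D,
    # so the logged pair (C NEXT D) has expectation zero.
    print(expected_next(Fraction(1, 4), links, "C", "D", 1000))
```

A pattern whose expectation is zero has no defined actual/expected ratio, which is why such patterns were never scored as interesting or chosen as candidates.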
6 Conclusions and Future Work
In this chapter, we reviewed the characterization of different knowledge discovery tasks in temporal databases (as summarized in Table 1) and focused on a Class III problem of generating unexpected predicates. In particular, we presented an algorithm for finding unexpected patterns, expressed in temporal logic, in sequential databases. We used multiple scans through the database and the step-by-step expansion of the most "promising" patterns in the discovery process. To evaluate the performance of the algorithm, we compared it with the "naive" algorithm that exhaustively discovers all the patterns and showed by how much our algorithm outperforms it. We also used our algorithm for discovering interesting patterns in sequences of operating system calls and in Web logfiles.

In its current implementation, our algorithm discovers temporal patterns only of a certain type (described in Section 3). As future work, we plan to extend our algorithm to include more complex temporal logic expressions. We also plan to extend our methods to discovering unexpected patterns in temporal databases, where the patterns will be expressed in first-order temporal logic. Finally, we plan to apply our algorithm to the problem of intrusion detection, as well as to a more suitable Web site having fewer HTML files and more traffic. We expect to find more interesting patterns for such a site.

We are also interested in pursuing some of the complexity issues that arose in the NP-hardness proof. Specifically, the problem CLIQUE that we reduced from is actually SNP-complete [19], which is a class of languages that has some interesting properties we are investigating. In addition, we are considering application of Ramsey's Theorem to our problem domain.
7 Acknowledgments
The authors wish to thank Bud Mishra from the Computer Science Department at NYU for his general input to the contents of this chapter as well as specifically for his help with the proof of Theorem 1.
A Appendix: Proof of Theorem 1
Problem: Given a string of temporal events $V = v_1, v_2, \ldots, v_r$, does there exist an interesting pattern in V of the form $X_1 B_k X_2 B_k \ldots B_k X_m$ for an arbitrary m? The following proof shows that this problem is NP-complete.

Proof of Theorem 1: We show that our problem is NP-hard by proving that CLIQUE $\le_p$ INTERESTINGNESS. The reduction algorithm begins with an instance of CLIQUE. Let $G = (V, E)$ be an arbitrary graph with $|V|$ vertices and $|E|$ edges. We shall construct a string of events S such that an interesting pattern of the form $e_1 B_k e_2 \ldots B_k e_m$ exists if and only if G has a clique of size m. The string is constructed as follows. Each vertex $v_1, v_2, \ldots, v_{|V|}$ in the graph G will become an event in our string S, i.e. our events are $e_1, e_2, \ldots, e_{|V|}$. Additionally we will make use of $(|V| + |E|)m$ "dummy" events called $d_1, d_2, \ldots, d_{(|V|+|E|)m}$,
Fig. 2. The graph $G(V, E)$ with vertices $v_1, v_2, \ldots, v_6$ and a clique C of size 4, $C = \{v_2, v_3, v_4, v_5\}$.
where m is the value from the CLIQUE problem. Based on each vertex $v_i \in G$ a substring will be created. The associated event $e_i$ will be called the "generator" of this substring and the substring will be "generated" by the event. The concatenation of these substrings will be the string S. Initially, the vertices in G are arbitrarily ordered $1, 2, \ldots, |V|$. Then for each associated event $e_i$, in order, we create the substring based on $e_i$ by listing, again in sorted order, the vertices (actually their associated events) $e_j$ for which there exists an edge $(v_i, v_j) \in E$, plus the event $e_i$ repeated $|V|$ times. For example, the substring generated by $e_2$ for the graph in Figure 2 would be

$$e_1 \underbrace{e_2 e_2 \ldots e_2}_{|V|} e_3 e_4 e_5$$

since there are edges in G from $v_2$ to each of $v_1$, $v_3$, $v_4$, and $v_5$. Following each substring generated in this fashion we concatenate a substring of all the dummy events in sorted order. As will be seen shortly, these dummy events are used to separate the substrings of events $e_i$, and therefore no dummies are needed following the substring generated by $e_{|V|}$. Thus, for the above graph the string

$$S = \underbrace{e_1 e_1 \ldots e_1}_{|V|} e_2 e_5 e_6 \, d_1 \ldots d_{(|V|+|E|)m} \; e_1 \underbrace{e_2 \ldots e_2}_{|V|} e_3 e_4 e_5 \, d_1 \ldots d_{(|V|+|E|)m} \; e_2 \underbrace{e_3 \ldots e_3}_{|V|} e_4 e_5 e_6 \, d_1 \ldots d_{(|V|+|E|)m} \; e_2 e_3 \underbrace{e_4 \ldots e_4}_{|V|} e_5 \, d_1 \ldots d_{(|V|+|E|)m} \; e_1 e_2 e_3 e_4 \underbrace{e_5 \ldots e_5}_{|V|} \, d_1 \ldots d_{(|V|+|E|)m} \; e_1 e_3 \underbrace{e_6 \ldots e_6}_{|V|}$$
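The construction of S described above can be sketched in a few lines. This is an illustrative sketch, not part of the original proof: the function name `build_string` is hypothetical, and the edge list is the one read off from the example string S for the six-vertex graph. The final check compares the length of the generated string with the closed form stated in this appendix, $2|E| + |V|^2 + (|V| - 1)(|V| + |E|)m$.

```python
def build_string(n_vertices, edges, m):
    """Build the event string S for a graph on vertices 1..n_vertices."""
    adj = {i: set() for i in range(1, n_vertices + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    n_dummies = (n_vertices + len(edges)) * m
    dummies = [f"d{j}" for j in range(1, n_dummies + 1)]
    s = []
    for i in range(1, n_vertices + 1):
        # Neighbours in sorted order, with the generator e_i repeated |V| times.
        for j in sorted(adj[i] | {i}):
            s.extend([f"e{j}"] * (n_vertices if j == i else 1))
        # A full block of dummies separates consecutive substrings.
        if i < n_vertices:
            s.extend(dummies)
    return s


if __name__ == "__main__":
    # Edges inferred from the example string S (clique on vertices 2, 3, 4, 5).
    edges = [(1, 2), (1, 5), (1, 6), (2, 3), (2, 4), (2, 5),
             (3, 4), (3, 5), (3, 6), (4, 5)]
    V, E, m = 6, len(edges), 4
    S = build_string(V, edges, m)
    print(len(S), 2 * E + V ** 2 + (V - 1) * (V + E) * m)
```

The construction touches each vertex, each edge endpoint, and each dummy once per block, which is consistent with the claim that S can be built in polynomial time.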
The total length of S will be $2|E| + |V|^2 + (|V| - 1)((|V| + |E|)m)$. This can be seen as follows. The substring generated by $e_i$ will have $|V|$ occurrences of $e_i$ plus one
occurrence of each event $e_j$ such that $(v_i, v_j) \in E$ (that is, $\deg(v_i)$ further events). Summing over all vertices i, the total length of these substrings will equal $2|E| + |V|^2$. In addition there will be a total of $|V| - 1$ occurrences of the substring $d_1 d_2 \ldots d_{(|E|+|V|)m}$ with a total length of $(|V| - 1)((|V| + |E|)m)$. The string S can clearly be constructed in polynomial time as its length is polynomial in the size of the graph. Given that our problem allows for an exogenous assignment of probabilities we will assume that all of the events are equiprobable. That is

$$\Pr[X] = \frac{1}{|V| + (|V| + |E|)m}$$

for $X = e_i$ or $d_j$, $i \in 1 \ldots |V|$, $j \in 1 \ldots (|E| + |V|)m$. Since each dummy event occurs exactly $|V| - 1$ times and each event $e_i$ occurs $|V|$ times in the substring it generates plus an additional $\deg(v_i)$ times elsewhere, these exogenous probabilities are not consistent with the actual probabilities of the events in S, as the events corresponding to vertices occur more frequently than the dummy events. It is possible to define the probabilities so that the assigned probabilities of the dummy events are consistent with their actual frequencies, but this requires a somewhat more complicated construction and proof and offers little insight into the problem, so we have chosen to proceed as described above.

Let BEFOREK $= |V| + |E|$. The expected number of occurrences of a pattern is

$$E[X_1 B_k X_2 B_k \ldots B_k X_L] = \begin{cases} (n - K(L-1))\,K^{L-1}\Pr[X_1]\Pr[X_2]\cdots\Pr[X_L] + \displaystyle\sum_{i=L-1}^{K(L-1)-1} \binom{i}{L-1}\Pr[X_1]\cdots\Pr[X_L], & \text{if } K > 1 \\ (n - K(L-1))\,K^{L-1}\Pr[X_1]\Pr[X_2]\cdots\Pr[X_L], & \text{else} \end{cases}$$

where K = BEFOREK and $n = |S|$. This can be derived in a manner analogous to how expectations were derived in Section 3. It can be seen that in the special case of $L = 2$ this formula reduces to the one derived previously for $X B_K Y$. For the case where $K = |V| + |E|$, $n = 2|E| + |V|^2 + (|V| - 1)((|V| + |E|)m)$, and $L = m$ we will call the value of this expectation e. Let the interestingness threshold

$$T = \frac{2|V|m - 1}{2e}$$

The relevance of this value is that if a pattern of the form $X_1 B_k X_2 \ldots B_k X_m$ is instantiated only with events $e_i$ (no dummies) and it occurs at least $|V|m$ times it will be deemed interesting. If it occurs $|V|m - 1$ times it will not. This will be discussed in further detail shortly.

We must now show that this transformation from CLIQUE to INTERESTINGNESS is a reduction. First, suppose a clique $v_1, v_2, \ldots, v_m$ exists in G and therefore corresponding events $e_1, e_2, \ldots, e_m$ exist in S. Note that here the indexes of the vertices and events are not intended to suggest that the clique
must consist of the first m vertices in the original ordering but rather are used for ease of exposition. Of course these $v_1, \ldots, v_m$ (and $e_1, \ldots, e_m$) could represent any collection of m vertices (events), although we will continue to assume that they are in sorted order. By construction, the substring generated by $e_1$ will include

$$\underbrace{e_1 e_1 \ldots e_1}_{|V|} e_2 e_3 \ldots e_m$$

For an arbitrary i the substring generated by $e_i$ will include (see footnote 9)

$$e_1 e_2 \ldots \underbrace{e_i \ldots e_i}_{|V|} e_{i+1} \ldots e_m$$

Each substring will contain $|V|$ occurrences of the pattern $e_1 B_k e_2 B_k e_3 B_k \ldots B_k e_m$ and there are m such substrings, so the total number of occurrences of this pattern is $|V|m$. Thus

$$\frac{A[e_1 B_k \ldots B_k e_m]}{E[e_1 B_k e_2 \ldots B_k e_m]} = \frac{|V|m}{e} > T$$
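The count of $|V|m$ occurrences can be checked by brute force on the example graph with $|V| = 6$, $|E| = 10$ and $m = 4$. The sketch below is illustrative only: the substrings are transcribed from the example string S, the helper name `count_bounded_before` is hypothetical, and it assumes that $X B_k Y$ holds when Y occurs at most K positions after X, with distance measured as the difference of string indices.

```python
def count_bounded_before(s, pattern, k):
    """Count position tuples p1 < p2 < ... matching `pattern` with each gap <= k."""
    # ways[p] = number of partial matches whose last matched event sits at index p
    ways = {p: 1 for p, ev in enumerate(s) if ev == pattern[0]}
    for ev in pattern[1:]:
        nxt = {}
        for q, x in enumerate(s):
            if x == ev:
                total = sum(w for p, w in ways.items() if 0 < q - p <= k)
                if total:
                    nxt[q] = total
        ways = nxt
    return sum(ways.values())


V, E, m = 6, 10, 4
K = V + E  # BEFOREK
dummies = [f"d{j}" for j in range(1, (V + E) * m + 1)]
# Substrings generated by e1..e6, transcribed from the example string S.
blocks = [
    ["e1"] * V + ["e2", "e5", "e6"],
    ["e1"] + ["e2"] * V + ["e3", "e4", "e5"],
    ["e2"] + ["e3"] * V + ["e4", "e5", "e6"],
    ["e2", "e3"] + ["e4"] * V + ["e5"],
    ["e1", "e2", "e3", "e4"] + ["e5"] * V,
    ["e1", "e3"] + ["e6"] * V,
]
S = []
for i, block in enumerate(blocks):
    S += block
    if i < len(blocks) - 1:
        S += dummies  # dummy block longer than K separates the substrings

clique_count = count_bounded_before(S, ("e2", "e3", "e4", "e5"), K)
other_count = count_bounded_before(S, ("e1", "e2", "e3", "e4"), K)
print(clique_count, other_count)
```

For the clique pattern the count equals $|V|m$, while the second pattern, whose vertices do not form a clique in this graph, occurs far fewer times.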
Conversely, suppose that an interesting pattern of the form $X_1 B_k X_2 \ldots B_k X_m$ exists. We must show that a corresponding clique of at least size m exists in G. The following lemma is the basis for our showing this.

Lemma 1. If an interesting pattern exists then it consists only of events $e_i$, containing no dummy events.

Proof: We have already seen that if a clique of size m exists in G then an interesting pattern exists in S. Thus interesting patterns are possible. What is left to show is that
- if a pattern consists only of dummy events then it cannot be interesting, and
- if a pattern consists of both dummy events and events $e_i$ it cannot be interesting.

Assume we instantiate the pattern $P = X_1 B_k \ldots B_k X_m$ with j dummy events and $m - j$ events $e_i$, where $j = 1 \ldots m$. Note that given our definition of BEFOREK, for any pattern of this form its total length, i.e. the distance in the string S from $X_1$ to $X_m$, can be at most $(|E| + |V|)m$. Therefore, if a pattern contains any dummy events these occurrences must occur only at the beginning or end of the pattern, since any dummy event is part of a substring of $(|E| + |V|)m$ dummy events. That is, there cannot exist a dummy event $d_j$ in the pattern such that an event $e_i$ occurs before $d_j$ in the pattern and an event $e_k$ occurs after it. We will assume, without loss of generality, that the j dummy events all occur at the end of the pattern. We will next count the maximum number of occurrences of patterns of this form.

Footnote 9. There may, of course, be vertices that are not part of the clique that are connected via some edge to $e_i$. These vertices would also be included.
Each of the $m - j$ events $e_i$ generates a substring in S. In that substring the event $e_i$ occurs $|V|$ times and all other events occur once. In addition, in the substring of dummy events immediately following this substring each event occurs once. Thus, there can be at most $|V|$ occurrences of the pattern P that include events from the substring generated for each $e_i$. There are a total of $m - j$ events $e_i$ in the pattern and therefore a maximum of $(m - j)|V|$ occurrences of P that include these substrings. In addition, there exist $|V| - (m - j)$ substrings generated by events not in P. In each of these substrings P can occur at most once, since each event in P occurs at most once in a substring that it did not generate. This can result in a maximum of $|V| - (m - j)$ additional instances of P, for a total of $(m - j + 1)|V| - (m - j)$ occurrences of P. This expression is maximized if $j = 1$, in which case the maximum number of occurrences of P is $m|V| - m + 1$. Since

$$\frac{m|V| - m + 1}{e} < T$$

where e is again the expected number of occurrences of this pattern, this pattern cannot be interesting.

We now know that any interesting pattern can consist only of events $e_i$. We also know that each occurrence of an interesting pattern can include only events generated by a single $e_i$ (since BEFOREK $< (|E| + |V|)m$, the length of the dummy substrings separating event substrings generated by each event). Furthermore, we can use an argument identical to the one used in the proof of the above lemma to show that for at least $m|V|$ occurrences of a pattern to exist, at least $m|V|$ of them must include the generating event from which all the events in this instance came. In other words, if an interesting pattern $e_1 B_k \ldots B_k e_m$ exists then there must be at least $m|V|$ instances which include the $e_i$ that generated the substring from which all the other events came. To see this, note that each time an instance of a pattern that includes a generating event occurs, $|V|$ instances will actually occur, one for each copy of the generating event in the substring it generated. Let us assume that only $(m - 1)|V|$ instances of a pattern exist that include the generating event from which all other events in this instance came (see footnote 10). In all the other substrings generated by events not in the pattern there can be at most one instance of the pattern, since each event occurs at most once in a substring it did not generate. There are $|V| - (m - 1)$ such events, so the total number of instances would only be $m|V| - m + 1$. Therefore, for a pattern to occur at least $m|V|$ times and thus to be interesting there must be $m|V|$ instances that include the generator of the other events in that instance. Since each generator results in $|V|$ instances, there are m generators that are part of instances. The m vertices that correspond to these m events form a clique in G.
This is clearly true since for any of the $e_i$ amongst these m generators there is an edge from itself to each of the other generators.

Finally, note that this problem is also in NP, and therefore NP-complete, since given a certificate (i.e., an instantiation of our pattern in this case) we can check if it is interesting by simply scanning over S. This clearly can be done in polynomial time.

Notice that we have phrased our NP-hardness problem as "Does any interesting pattern exist?" We could have just as easily posed the question "Do p interesting patterns exist?" Our proof can be trivially extended to accomplish this by enforcing that the dummy events always contain $p - 1$ interesting patterns and that the p-th interesting pattern only occurs if a clique of size m exists in G. Our decision to enforce that the dummy events contain no interesting patterns, and thus to pose our question as we did, was rather arbitrary.

Footnote 10. There cannot be any more than this unless there are $m|V|$, since they come in multiples of $|V|$.
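The counting bound used in the proof of Lemma 1 can be spelled out as a short worked derivation. It is a sketch that uses only quantities defined in this appendix; the name $c(j)$ for the maximum number of occurrences of a pattern with j dummy events is introduced here for convenience, and the derivation assumes $|V| > 1$, $m \ge 2$ and $e > 0$.

```latex
\begin{align*}
c(j) &= (m - j + 1)|V| - (m - j) = (m + 1)|V| - m - j\,(|V| - 1), \\
\max_{1 \le j \le m} c(j) &= c(1) = m|V| - m + 1
  && \text{since } |V| > 1 \text{ makes } c(j) \text{ decreasing in } j, \\
\frac{c(1)}{e} &\le \frac{m|V| - 1}{e} < \frac{2|V|m - 1}{2e} = T
  && \text{since } m \ge 2.
\end{align*}
```

For the example graph with $|V| = 6$ and $m = 4$ this gives $c(1) = 21$, which is below the $|V|m = 24$ occurrences needed to exceed T.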
References 1. R. Agrawal, C. Faloutsos, and A. Swami. Efficient similarity search in sequence databases. In In Proc. of the conference on foundations of data organizations and algorithms (FODO), October 1993. 2. R. Agrawal, T. Imielinsky, and A. Swami. Mining association rules between sets of items in large databases. In Proceedings of ACM SIGMOD Conference, pages 207-216, 1993. 3. R. Agrawal, K-I Lin, H.S. Sawhney, and K. Shim. Fast similarity search in the presence of noise, scaling, and translation in time-series databases. In In Proc. of the 21st VLDB conference., 1995. 4. R. Agrawal, G. Psaila, E. Wimmers, and M. Zait. Querying shapes of histories. In In Proc. of the 21st VLDB conference., 1995. 5. R. Agrawal and R. Srikant. Mining sequential patterns. In Proc. of the International Conference on Data Engineering., March 1995. 6. W.A. Ainsworth. Speech recognition by machine. Peter Peregrinus Ltd., London, 1998. 7. D. Berndt. AX: Searching for database regularities using concept networks. In Proceedings of the WITS Conference., 1995. 8. D. J. Berndt and J. Clifford. Finding patterns in time series: A dynamic programming approach. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining. AAAI Press/ The MIT Press, 1996. 9. C. Bettini, X.S. Wang, and S. Jajodia. Testing complex temporal relationships involving multiple granularities and its application to data mining. In Proceedings of PODS Symposium, 1996. 10. J. Clifford, V. Dhar, and A. Tuzhilin. Knowledge discovery from databases: The NYU project. Technical Report IS-95-12, Stern School of Business, New York University, December 1995. 11. V. Dhar and A. Tuzhilin. Abstract-driven pattern discovery in databases. IEEE Transactions on Knowledge and Data Engineering, 5(6), 1993. 12. C. Faloutsos, M. Ranganathan, and Y. Manolopoulos. Fast subsequence matching in time-series databases. In In Proceedings of the SIGMOD conference., 1994. 13. D.Q. 
Goldin and P.C. Kanellakis. On similarity queries for time-series data: constralnt specification and implementation. In In Proe. of the 1st Int'l Conference on the Principles and Practice of Constraint Programming. LNCS 976, September 1995. 14. P. Laird. Identifying and using patterns in sequential data. In Algorithmic Learning Theory, 4th International Workshop, Berlin, 1993.
15. J.B. Little and L. Rhodes. Understanding Wall Street. Liberty Publishing Company, Cockeysville, Maryland, 1978.
16. H. Mannila and H. Toivonen. Discovering generalized episodes using minimal occurrences. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, Oregon, August 1996.
17. H. Mannila, H. Toivonen, and A. Verkamo. Discovering frequent episodes in sequences. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining, Montreal, Canada, August 1995.
18. B. Padmanabhan and A. Tuzhilin. Pattern discovery in temporal databases: A temporal logic approach. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, Oregon, August 1996.
19. C. H. Papadimitriou. Computational Complexity. Addison Wesley, 1994.
20. H.V. Poor. An Introduction to signal detection and estimation. Springer-Verlag, New York, 1988.
21. L.R. Rabiner and S.E. Levinson. Isolated and connected word recognition: Theory and selected applications. In Readings in speech recognition. Morgan Kaufmann Publishers, San Mateo, CA, 1990.
22. P. Seshadri, M. Livny, and R. Ramakrishnan. Design and implementation of sequence database system. In Proceedings of ACM SIGMOD Conference, 1996.
23. A. Silberschatz and A. Tuzhilin. On subjective measures of interestingness in knowledge discovery. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining, Montreal, Canada, August 1995.
24. A. Silberschatz and A. Tuzhilin. What makes patterns interesting in knowledge discovery systems. IEEE Transactions on Knowledge and Data Engineering, 8(6), December 1996.
25. R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In Proceedings of the International Conference on Extending Database Technology, 1996.
26. J. van Leeuwen. Handbook of Theoretical Computer Science: Volume B Formal Models and Semantics. The MIT Press/Elsevier, 1990.
27. J. T.-L. Wang, G.-W. Chirn, T. G. Marr, B. Shapiro, D. Shasha, and K. Zhang. Combinatorial pattern discovery for scientific data: Some preliminary results. In Proceedings of ACM SIGMOD Conference on Management of Data, 1994.
28. L. Zadeh. The role of fuzzy logic in the management of uncertainty in expert systems. In Fuzzy Sets and Systems, vol. 11, pages 199-227. 1983.
Querying the Uncertain Position of Moving Objects
A. Prasad Sistla 1, Ouri Wolfson 2, Sam Chamberlain 3, and Son Dao 4
1 Department of Electrical Engineering and Computer Science, University of Illinois, Chicago, IL 60607, USA
2 Department of Electrical Engineering and Computer Science, University of Illinois, Chicago, IL 60607 and CESDIS, NASA Goddard Space Flight Center, Code 930.5, Greenbelt, MD 20771, USA
3 Army Research Laboratory, Aberdeen Proving Ground, MD, USA
4 Hughes Research Laboratories, Information Sciences Laboratory, Malibu, CA, USA
Abstract. In this paper we propose a data model for representing moving objects with uncertain positions in database systems. It is called the Moving Objects Spatio-Temporal (MOST) data model. We also propose Future Temporal Logic (FTL) as the query language for the MOST model, and devise an algorithm for processing FTL queries in MOST.
O. Etzion, S. Jajodia, and S. Sripada (Eds.): Temporal Databases - Research and Practice, LNCS 1399, pp. 310-337, 1998. © Springer-Verlag Berlin Heidelberg 1998
1 Introduction
Existing database management systems (DBMS's) are not well equipped to handle continuously changing data, such as the position of moving objects. The reason for this is that in databases, data is assumed to be constant unless it is explicitly modified. For example, if the salary field is 30K, then this salary is assumed to hold (i.e. 30K is returned in response to queries) until explicitly updated. Thus, in order to represent moving objects (e.g. cars) in a database, and answer queries about their position (e.g., How far is the car with license plate RWW860 from the nearest hospital?), the car's position has to be continuously updated. This is unsatisfactory since either the position is updated very frequently (which would impose a serious performance and wireless-bandwidth overhead), or the answer to queries is outdated. Furthermore, it is possible that due to disconnection, an object cannot continuously update its position.
In this paper we propose to solve this problem by representing the position as a function of time; it changes as time passes, even without an explicit update. So, for example, the position of a car is given as a function of its motion vector (e.g., north, at 60 miles/hour). In other words, we consider a higher level of data abstraction, where an object's motion vector (rather than its position) is represented as an attribute of the object. Obviously the motion vector of an
object can change (thus it can be updated), but in most cases it does so less frequently than the position of the object.
We propose a data model called Moving Objects Spatio-Temporal (or MOST for short) for databases with dynamic attributes, i.e. attributes that change continuously as a function of time, without being explicitly updated. In other words, the answer to a query depends not only on the database contents, but also on the time at which the query is entered. Our model also allows for uncertainty in the values of dynamic attributes. Furthermore, we explain how to incorporate dynamic attributes in existing data models and what capabilities need to be added to existing query processing systems to deal with dynamic attributes.
Clearly, our proposed model enables queries that refer to future values of dynamic attributes, namely future queries. For example, consider an air-traffic control application, and suppose that each object in the database represents an aircraft and its position. Then the query Q = "retrieve all the airplanes that will come within 30 miles of the airport in the next 10 minutes" can be answered in our model. In [15] we introduced a temporal query language called Future Temporal Logic (FTL). The language is more natural and intuitive to use in formulating future queries such as Q. Unfortunately, due to the difference in data models, the algorithm developed in [15] for processing FTL queries does not work for MOST databases. Therefore, in this paper we develop an algorithm for processing an important subclass of FTL queries for MOST databases.
The answer to future queries is usually tentative in the following sense. Suppose that the answer to the above query Q contains airplane a. It is possible that after the answer is presented to the user, the motion vector of a changes in a way that steers a away from the airport, and the database is updated to reflect this change. Thus a does not come within 30 miles of the airport in the next 10 minutes.
Therefore, in this sense the answer to future queries is tentative, i.e. it should be regarded as correct according to what is currently known about the real world, but this knowledge (e.g. the motion vector) can change.
Continuous queries are another topic that requires new consideration in our model. For example, suppose that there is a relation MOTELS (that resides, for example, in a satellite) giving for each motel its geographic-coordinates, room-price, and availability. Consider a moving car issuing a query such as "Display motels (with availability and cost) within a radius of 5 miles", and suppose that the query is continuous, i.e., the car requests the answer to the query to be continuously updated. Observe that the answer changes with the car movement. When and how often should the query be reevaluated? Our query processing algorithm facilitates a single evaluation of the query; reevaluation has to occur only if the motion vector of the car changes.
We provide two different kinds of semantics, called may and must semantics respectively. Because of possible uncertainty in the values of dynamic attributes, these two different semantics may produce different results. For example, consider the query "Retrieve all airplanes within 5 miles of airplane A". Under the "may" semantics, the answer is the set of all airplanes that are possibly within 5 miles of A. Under the "must" semantics, this will be the set of all airplanes
which are definitely within 5 miles of A. These two answers coincide if there is no uncertainty in the position of A.
Assume that at any point in time the current position of the vehicle is supplied by a Global Positioning System (GPS) on board the vehicle. Assume also that there is a natural, user-friendly way of entering into the database the current position and motion vector of objects. For example, a point on a screen may represent the vehicle's current position, and the driver may draw around it, on the touch-sensitive screen, a circle with a radius of 5 miles; then s/he may name the circle C and indicate that C moves as a rigid body having the motion vector of the vehicle. This way the driver specifies a circle and its motion vector, and the vehicle's computer can create a data representation of its position. The computer can automatically update the motion vector of C when it senses a change in speed or direction. In other applications, such as air-traffic control, there may be other means of entering objects and their motion vector.
Generally, a query in our data model involves spatial objects (e.g. points, lines, regions, polygons) and their motion vector. Some examples of queries are: "Retrieve the objects that will intersect the polygon P within 3 minutes", or, "Retrieve the objects that will intersect P within 3 minutes, and have the attribute PRICE < 100", or, "Retrieve the objects that will intersect P within 3 minutes, and stay in P for 1 minute", or "Retrieve the objects that will intersect P within 3 minutes, stay in the polygon for 1 minute, and 5 minutes later enter another polygon Q".
For performance considerations, in answering queries of this type, we would like to avoid examining each moving object in the database. In other words, we would like to index dynamic attributes.
The problem with a straightforward use of spatial indexing is that since objects are continuously moving, the spatial index has to be continuously updated, an unacceptable solution. Therefore, we introduce one possible method of indexing dynamic attributes, which guarantees logarithmic (in the number of objects) access time.
In summary, in this paper we introduce the MOST data model whose main contributions are as follows.
- A new type of attributes called dynamic attributes. The principles for incorporating dynamic attributes on top of existing DBMS's are outlined.
- Adaptation of FTL as a query language in MOST. Two different semantics for queries, called "may" and "must" semantics, are provided. An efficient algorithm is devised for processing queries specified in an important subclass of FTL when all dynamic variables are deterministic, i.e. when there is no uncertainty in their values at any instant of time. We show how the algorithm for the deterministic case can be used to answer "may" queries when objects are moving on well defined routes and when there is uncertainty in their speeds. Finally, we show how the proposed algorithm can be implemented on top of an existing DBMS.
The rest of this paper is organized as follows. In section 2 we introduce the MOST data model and discuss the types of queries it supports in terms of
database histories. In section 3 we define the FTL query language, i.e. its syntax and semantics in the context of MOST; we also demonstrate the language using examples, and we introduce an algorithm for processing FTL queries. In section 4 we discuss a method of indexing dynamic attributes. In section 5 we discuss several issues related to implementation of the MOST data model, including: MOST on top of existing DBMS's, queries issued by moving objects, and distributed query processing. In section 6 we compare our work to relevant literature, and in section 7 we discuss future work.
2 The MOST data model
The traditional database model is as follows. A database is a set of object-classes. A special database object called time gives the current time at every instant; its domain is the set of natural numbers, and its value increases by one in each clock tick. An object-class is a set of attributes. For example, MOTELS is an object class with attributes Name, Location, Number-of-rooms, Price-per-room, etc.
Some object-classes are designated as spatial. A spatial object class has attribute POSITION denoting the position of the object. Depending on the coordinate system one might be using, the POSITION attribute may have sub-attributes POSITION.X, POSITION.Y and POSITION.Z denoting the x-, y- and z-coordinates of the object. The spatial object classes have a set of spatial methods associated with them. Each such method takes spatial objects as arguments. Intuitively, these methods represent spatial relationships among the objects at a certain point in time, and they return true or false, indicating whether or not the relationship is satisfied at the time. For example, INSIDE(o,P) and OUTSIDE(o,P) are spatial relations. Each one of them takes as arguments a point-object o and a polygon-object P in a database state; and it indicates whether or not o is inside (outside) the polygon P in that state. Another example of a spatial relation is WITHIN-A-SPHERE($r, o_1, \ldots, o_k$). Its first argument is a real number $r$, and its remaining arguments are point-objects in the database. WITHIN-A-SPHERE indicates whether or not the point-objects can be enclosed within a sphere of radius $r$. There may also be methods that return an integer value. For example, the method DIST($o_1, o_2$) takes as arguments two point-objects and returns the distance between the point-objects.
To model moving objects, in subsection 2.1 we introduce the notion of a dynamic attribute, and in subsection 2.2 we relate it to the concept of a database history. In subsection 2.4 we discuss three different types of queries that arise in this model.
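The spatial methods described above can be sketched for the two-dimensional case as follows. This is only an illustrative sketch: the lowercase function names, the tuple representation of point-objects, and the list-of-vertices representation of polygons are assumptions made here, and the distance is returned as a float rather than the integer mentioned in the text. The ray-casting test does not treat points lying exactly on the polygon boundary specially.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]  # assumed representation of a point-object


def dist(o1: Point, o2: Point) -> float:
    """Sketch of DIST(o1, o2): Euclidean distance between two point-objects."""
    return hypot(o1[0] - o2[0], o1[1] - o2[1])


def inside(o: Point, polygon: List[Point]) -> bool:
    """Sketch of INSIDE(o, P) using ray casting to the right of o."""
    x, y = o
    result = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through o ...
        if (y1 > y) != (y2 > y):
            # ... and does it cross strictly to the right of o?
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                result = not result
    return result


def outside(o: Point, polygon: List[Point]) -> bool:
    """Sketch of OUTSIDE(o, P): the negation of INSIDE."""
    return not inside(o, polygon)
```

Each method is evaluated on the positions that hold in a single database state, which is why the functions take plain coordinates and no time argument.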
2.1 Dynamic attributes
Each attribute of an object-class is either static or dynamic. Intuitively, a static attribute of an object is an attribute in the traditional sense, i.e. it changes only
when an explicit update of the database occurs; in contrast, a dynamic attribute changes over time according to some given function, even if it is not explicitly updated. For example, consider an object starting from the origin and moving along the x axis in two dimensional space at 50 to 60 miles per unit time. Then its position at any point in time is given by its x and y coordinates, where the value of its y-coordinate is a static attribute and has value zero, and the value of its x-coordinate is a dynamic attribute that changes with time. Its x-coordinate has value 1 some time in the time interval between $1/60$ and $1/50$ units of time, has value 2 some time in the interval between $2/60$ and $2/50$, and so on (equivalently, after $t$ units of time, the x-coordinate value of the object's position is somewhere between $50t$ and $60t$).
We represent the values of a dynamic attribute at various times by a sequence of value-time pairs of the form $(u_1, t_1), \ldots, (u_i, t_i), (u_{i+1}, t_{i+1}), \ldots, (u_n, t_n)$ where $t_1 < t_2 < \ldots < t_i < t_{i+1} < \ldots$. Such a sequence indicates that, for each $i$ such that $0 < i < n$, the attribute has value $u_i$ during the right open interval $[t_i, t_{i+1})$, and has value $u_n$ at time $t_n$. Usually, $t_1, \ldots, t_n$ denote the time instances when the attribute value changes implicitly. Note that here we are assuming that the attribute takes discrete values.
Formally, a dynamic attribute A is represented by three sub-attributes, A.initialvalue, A.updatetime, and A.function (denoted as A.f). Here A.f is a multi-valued function which takes as argument a sequence of value-time pairs and returns a set of possible value-time pairs of the dynamic attribute. Intuitively, the returned value-time pairs denote the different ways in which the input sequence can be extended to denote values of the variable in the future. At time A.updatetime the value of A is given by A.initialvalue and its value until the next update is defined inductively as follows. Let $v = (u_1, t_1), \ldots, (u_n, t_n)$ be a sequence of value-time pairs denoting the values of A up to time $t_n$, where $u_1 =$ A.initialvalue and $t_1 =$ A.updatetime, and let $(u_{n+1}, t_{n+1})$ be any element in the set C returned by A.f on input $v$. Then the sequence $(u_1, t_1), \ldots, (u_n, t_n), (u_{n+1}, t_{n+1})$ denotes a possible sequence of values of A up to time $t_{n+1}$. Note that by extending $v$ with different elements from C we get different extensions; these different extensions may give different values to A at the same time, denoting the uncertainty in its value.
For any $t >$ A.updatetime, we define the set of possible values of A at time $t$ as follows. Let $(u_1, t_1), \ldots, (u_n, t_n), (u_{n+1}, t_{n+1})$ be a sequence of value-time pairs generated inductively as defined above such that $t_n \le t < t_{n+1}$. Then $u_n$ is a possible value of A at time $t$. The set of all possible values of A at time $t$ is defined by considering all possible such sequences. Thus, we see that the possible values at a future time are defined inductively as a function of the values at previous times. The need for such a definition will be clear when we define database histories in the next subsection. Also, it is to be noted that the uncertainty in the possible value of a dynamic attribute is indicated by the fact that A.f returns a set of values.
As an example, let o.POSITION.X be the dynamic variable denoting the x-coordinate of an object o moving in the direction of the x-axis at a speed ranging between the values $l$ and $h$. Let $u$ be a sequence of value-time pairs ending with
$(u_n, t_n)$. Then the function o.POSITION.X.f, on input $u$, will output the set of values $\{(u_n + 1, t_n + e) : 1/h \le e \le 1/l\}$ denoting that the x-coordinate increases by 1 at any time between $1/h$ and $1/l$ time units after $t_n$. A dynamic variable A is called deterministic if the set of values returned by A.f is a singleton set.
An explicit update of a dynamic attribute may change any of its sub-attributes, except for A.updatetime. There are two possible interpretations of A.updatetime, corresponding to valid-time and transaction-time (see [8]). In the first interpretation, it is the time at which the update occurred in the real world system being modeled, e.g. the time at which the vehicle changed its motion vector. In this case, along with the update, the sensor has to send A.updatetime to the database. In the second interpretation, A.updatetime is simply the time-stamp when the update was committed by the DBMS. In this paper we assume that the database is updated instantaneously, i.e. the valid-time and transaction-time are equal.
When a dynamic attribute is queried, the answer returned by the DBMS gives the range of possible values of the attribute at the time the query is entered. In this sense, our model is different from existing database systems, since, unless an attribute has been explicitly updated, a DBMS returns the same value for the attribute, independently of the time at which the query is posed. So, for example, in our model the answer to the query: "retrieve the possible current x-positions of object o" depends on the value of the dynamic attribute o.POSITION.X at the time at which the query is posed. In other words, the answer may be different for time-points $t_1$ and $t_2$, even though the database has not been explicitly updated between these two time-points.
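As a rough illustration of the three sub-attributes and of the multi-valued function A.f, the following sketch models the integer-valued x-coordinate of the running example. The class name, the field names `low` and `high` (standing for the speed bounds $l$ and $h$), and the closed form used in `possible_values` are assumptions of this sketch. The closed form is a shortcut derived here from the speed bounds: a value $k$ can hold $d$ time units after the update exactly when $\lfloor l d \rfloor \le k \le \lfloor h d \rfloor$. It is not a formula taken from the paper. Exact rational arithmetic is used to avoid rounding at interval endpoints.

```python
from dataclasses import dataclass
from fractions import Fraction
from math import floor


@dataclass
class DynamicAttribute:
    """Hypothetical encoding of A.initialvalue, A.updatetime and A.f."""
    initialvalue: int
    updatetime: Fraction
    low: Fraction   # lower speed bound l (assumed name)
    high: Fraction  # upper speed bound h (assumed name)

    def f(self, last_value, last_time):
        """Summarise the set returned by A.f for a sequence ending in
        (last_value, last_time): the value increases by 1 at some time in
        [last_time + 1/h, last_time + 1/l]."""
        earliest = last_time + Fraction(1) / Fraction(self.high)
        latest = last_time + Fraction(1) / Fraction(self.low)
        return last_value + 1, earliest, latest

    def possible_values(self, t):
        """Possible values of the attribute at query time t."""
        elapsed = Fraction(t) - Fraction(self.updatetime)
        if elapsed < 0:
            raise ValueError("t precedes A.updatetime")
        lo = self.initialvalue + floor(Fraction(self.low) * elapsed)
        hi = self.initialvalue + floor(Fraction(self.high) * elapsed)
        return range(lo, hi + 1)
```

With speed bounds 50 and 60, querying at time $1/60$ yields the values 0 and 1, while querying at time $1/50$ yields only 1, although no explicit update took place in between.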
In this paper we are concerned with dynamic attributes that represent spatial coordinates, but the model can be used for other hybrid systems, in which dynamic attributes represent, for example, temperature, or fuel consumption.
2.2 Database histories
In existing database systems, queries refer to the current database state, i.e. the state existing at the time the query is entered. For example, the query can request the current price of a stock, or the current position of an object, but not future ones. Consequently, existing query languages are nontemporal, i.e. limited to accessing a single (i.e. the current) database state. In our model, the database implicitly represents future states of the system being modeled (e.g. future positions of moving objects), therefore we can envision queries pertaining to the future, rather than the current state of the system being modeled. For example, a moving car may request all the motels that it may reach (i.e. come within 500 yards of) in the next 20 minutes. To interpret this type of queries, i.e. queries referring to dynamic attributes, we need the notion of a database history.
We assume that there is a special variable called time_stamp. A database state is a mapping that associates a set of objects of the appropriate type to each object class and a time value to the time_stamp variable. The value of the time_stamp variable in a database state gives the time when that database state
was created, i.e. the update that created the database state. In any database state, the value of a dynamic attribute A is given by the values of its three sub-attributes A.initialvalue, A.updatetime and A.f. For any object o, we let o.A denote the attribute A of o; if A has a sub-attribute B then we let o.A.B denote the value of the sub-attribute. We denote the value of the attribute A of an object o in a state $s$ by $s$(o.A).
Let $s$ be a database state, and $t$ be any time value greater than or equal to $s$(time_stamp). A possible database state $s'$ corresponding to $s$ at time $t$ is a mapping that associates a set of objects of the appropriate type to each object class and the value $t$ with the variable time_stamp, satisfying the following properties: for each object class C, the set of objects assigned by $s'$ to C is the same as the set of objects assigned by $s$; for each object o present in $s$, the value of an attribute A of o in $s'$ is defined as follows: if A is a static attribute then $s'$(o.A) = $s$(o.A); if A is a dynamic attribute then $s'$ treats A as atomic and assigns it a value $s'$(o.A) which belongs to the set of possible values of A as defined in the previous subsection. For a database state $s$, and any time $t > s$(time_stamp), there can be more than one possible database state corresponding to $s$ at time $t$. However, if all the dynamic attributes are deterministic, i.e. there is no uncertainty, then there can only be one possible database state corresponding to $s$ at time $t$.
A trace is a finite sequence $s_0, \ldots, s_i, \ldots, s_n$ of database states such that for every $i > 0$, $s_i$(time_stamp) $>$ $s_{i-1}$(time_stamp), i.e. the values of the time_stamp are strictly increasing. For any $i > 0$, we say that the attribute A of object o is updated in the state $s_i$ if o is present in both $s_i$ and $s_{i-1}$ and $s_{i-1}$(o.A) $\ne$ $s_i$(o.A). We say that object o is created in state $s_i$ if o is present in $s_i$ but not in $s_{i-1}$. If o is created in $s_i$ then for every dynamic attribute A of o, $s_i$(o.A.updatetime) = $s_i$(time_stamp). Similarly, if a dynamic attribute A of an object o is updated in state $s_i$, then $s_i$(o.A.updatetime) = $s_i$(time_stamp).
Let $\sigma = (s_0, \ldots, s_i, \ldots, s_n)$ be a trace. A possible database history $h$ (briefly, a database history) corresponding to $\sigma$ is an infinite sequence $v_0, v_1, \ldots, v_j, \ldots$ of possible database states such that the following properties are satisfied: (i) for all $j > 0$, $v_j$(time_stamp) $>$ $v_{j-1}$(time_stamp), and $v_j$(time_stamp) increases unboundedly with $j$; (ii) for every $i$ such that $0 \le i \le n$, there exists a $k$ such that $s_i$(time_stamp) = $v_k$(time_stamp); (iii) for each $j \ge 0$, $v_j$ is a possible database state corresponding to $s_i$ where $i$ is the maximum integer such that $0 \le i \le n$ and $s_i$(time_stamp) $\le$ $v_j$(time_stamp). Note that, in a history, the value of time_stamp is monotonically increasing; usually, we assume that these time values denote the instances when the state changes either due to an explicit update, or due to implicit change in the value of a dynamic attribute. There can be many possible database histories corresponding to a trace. However, if all the dynamic variables are deterministic then there is only one history corresponding to a trace.
Consider the example of an object o moving in the x-direction with a speed ranging between 50 and 60 miles per unit time starting with the x-coordinate equal to 0 at time 0. Assume that its x-coordinate is represented by the dynamic
attribute o.POSITION.X. We assume that the x and y coordinates are integer values. The database trace corresponding to this example has only one element consisting of the initial state. Now consider any sequence of possible database states $v_0, v_1, \ldots, v_i, \ldots$. Then $v_0$(o.POSITION.X) = 0, and for any $i > 0$, $v_i$(o.POSITION.X) = $i$ and $v_i$(time_stamp) = $v_{i-1}$(time_stamp) + $e$ where $e$ is any value between $1/60$ and $1/50$. Here the lowest and highest values of $e$ correspond to the cases when the object moves at maximum and minimum speeds respectively.
Consider a database trace denoting the various updates on a given database up to the current time $t$. Let $h$ be a database history corresponding to this trace. The finite sequence consisting of possible database states in $h$ with a lower time-stamp than $t$ is called the past database-history, and the infinite sequence consisting of all possible states in $h$ with a time-stamp higher than the current time $t$ is called the future database-history. Each state in the future history is identical to the state at time $t$, except possibly for the values of the dynamic attributes.
We would like to emphasize at this point that the database history is an abstract concept, introduced solely for providing formal semantics to our temporal query language, FTL. We do not maintain or store the database history anywhere.
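To make the notion of a possible history concrete, the following sketch generates a finite prefix of one possible history for the example object, drawing each increment $e$ at random from the interval between $1/60$ and $1/50$. The function name, the dictionary representation of a state, and the use of random sampling are assumptions made for illustration only; as stated above, the model itself never materialises histories.

```python
import random
from fractions import Fraction


def possible_history_prefix(steps, low=50, high=60, rng=None):
    """Return states v_0 .. v_steps of one possible history (a sketch).

    Each state is a dict holding time_stamp and the integer x-coordinate.
    """
    rng = rng or random.Random()
    e_min, e_max = Fraction(1, high), Fraction(1, low)
    states = [{"time_stamp": Fraction(0), "x": 0}]
    for i in range(1, steps + 1):
        # Pick one admissible delay e between 1/high and 1/low.
        e = e_min + (e_max - e_min) * Fraction(rng.random())
        states.append({"time_stamp": states[-1]["time_stamp"] + e, "x": i})
    return states
```

Different random choices give different histories for the same single-state trace; with `low == high` the attribute is deterministic and the prefix is unique.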
2.3 Object Positions
As indicated earlier, we assume that we have a database denoting information about spatial and other objects. The class of spatial objects has a subclass of moving objects. There may be other subclasses of spatial objects such as polygons etc. All the moving objects are assumed to be point objects, and they have a dynamic attribute called POSITION. If a moving object is moving in 2-dimensional space then we assume that it has sub-attributes POSITION.X and POSITION.Y, each of which itself is a dynamic attribute (similarly, objects moving in 3-dimensional space will have three sub-attributes). On the other hand, objects may be moving on well defined routes (such as highways etc.) and in that case, the position of the object is given by its distance when measured from a fixed point on the route in a particular direction; this distance will be considered as a dynamic attribute.
Although we have used a general multi-valued function to represent the values of a dynamic attribute, one can use one of the following two schemes for representing positions of moving objects. In the first scheme, for an object moving on a well defined route, we specify the motion of the object by two numbers denoting the upper and lower bounds on the speed; for an object moving freely in the two dimensional space, its motion is specified by giving the speed bounds in the X and Y directions. In the second scheme, for an object moving on a route, we specify its distance, from the initial position, by two functions of time that give upper and lower bounds on the distance at any future time; for an object moving freely in two dimensional space, we use pairs of functions for each of the X,Y directions.
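The two schemes can be related by a small sketch for an object on a route. The helper names `bounds_from_speeds` and `distance_interval` are hypothetical, and the sketch assumes continuous (float) distances rather than the discrete values used in subsection 2.1. It shows that the first scheme (two speed bounds) can be viewed as a special case of the second (two bounding functions of time).

```python
from typing import Callable, Tuple

Bound = Callable[[float], float]  # distance bound as a function of time


def bounds_from_speeds(t0: float, v_min: float, v_max: float) -> Tuple[Bound, Bound]:
    """First scheme: turn speed bounds into lower/upper distance functions."""
    def lower(t: float) -> float:
        return v_min * (t - t0)

    def upper(t: float) -> float:
        return v_max * (t - t0)

    return lower, upper


def distance_interval(lower: Bound, upper: Bound, t: float) -> Tuple[float, float]:
    """Second scheme: possible distances from the initial position at time t."""
    lo, hi = lower(t), upper(t)
    if lo > hi:
        raise ValueError("lower bound exceeds upper bound at time t")
    return lo, hi
```

For instance, `distance_interval(*bounds_from_speeds(0.0, 50.0, 60.0), 2.0)` gives the interval from 100 to 120, while arbitrary bounding functions (for example, an upper bound that tightens after the first time unit) can be passed to `distance_interval` directly.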
2.4 Three types of MOST queries
A query is a function which takes as input a database trace and a time value, and outputs a set of values. In our query language, the user can use temporal operators and can refer to the current as well as future possible database states. We define the semantics of a query by referring to the possible histories of the database. We define two different kinds of semantics of a query, called may and must semantics.
In our model, we distinguish between three types of queries: instantaneous, continuous and persistent. The same query may be entered as instantaneous, continuous or persistent, producing different results in each case. These types differ depending on the histories on which the query is evaluated, and on the time when they are evaluated (in contrast, in traditional databases the situation is simpler). For each of these types of queries, we may use either of the two semantics. Which semantics is to be used can be explicitly specified by the user, or the query processor may retrieve answers under both semantics and output both answers.
An instantaneous query is a function of the set of current possible database states, and a continuous query is an instantaneous query evaluated continuously at each instance in the future. Formally, the value of an instantaneous query at time $t$ is defined using the set of possible histories starting at $t$, where $t$ is usually the time when the query is entered. As indicated earlier, the value depends upon the kind of semantics used, may or must. For example, the query Q = "Display the motels within 5 miles of all the current possible positions of vehicle x", when considered as an instantaneous query, returns a set of motels, presented to the user immediately after the query is evaluated. Since there may be an uncertainty in the current position, the set of motels returned depends upon the kind of semantics used. Under the "may" semantics the result is the set of motels within 5 miles of any possible current position. Under the "must" semantics the result is the set of motels which are within 5 miles of every possible current position.
Observe that an instantaneous query may refer to all possible future histories. For example, "Display the motels that I reach within 3 minutes" refers to all the histories, and within each history it refers to states with a time-stamp between now and three minutes later. Under "may" semantics it will output the set of motels reached in three minutes in any of the possible future histories; under "must" semantics it will output the set of motels that will be reached in three minutes in every possible future history.
Obviously, since an instantaneous query is evaluated on an infinite history, its answer may be infinite. For example, the query: "Display the tuples (motel, reaching-time) representing the motels that I will reach, and the time when I will do so" may have an infinite answer. To cope with this situation we will assume in this paper that an instantaneous query pertains to a predefined (but very large) fixed amount of time. There are other ways of dealing with this problem (they involve a finite representation of infinite sets), but these are beyond the scope of this paper.
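The difference between the two semantics for the instantaneous query Q can be sketched for a vehicle on a route, under the assumption (made here, not in the text) that the set of possible current positions is an interval from `lo` to `hi` along the route and that each motel sits at a fixed route coordinate. The function names are hypothetical.

```python
def may_answer(motels, lo, hi, radius):
    """Motels within `radius` of SOME possible position in [lo, hi]."""
    return {name for name, m in motels.items()
            if lo - radius <= m <= hi + radius}


def must_answer(motels, lo, hi, radius):
    """Motels within `radius` of EVERY possible position in [lo, hi]."""
    return {name for name, m in motels.items()
            if hi - radius <= m <= lo + radius}
```

With motels at route coordinates 6, 16.5, 11 and 30, possible positions between 10 and 12, and a radius of 5, the "may" answer contains the first three motels while the "must" answer contains only the motel at 11. When `lo == hi` the two answers coincide, matching the remark that the semantics differ only under uncertainty.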
To motivate the second type of query, assume that a satisfactory motel is not found as a result of the instantaneous query Q, since, for example, the price is too high for the value. However, the answer to Q changes as the car moves, even if the database is not updated. Thus, the traveler may wish to make the query continuous, i.e. request the system to regard it as an instantaneous query being continuously reissued at each clock tick (while the car is moving), until cancelled (e.g. until a satisfactory motel is found). Formally, a continuous query at time $t$ is a sequence of instantaneous queries, one for each point in time $t' > t$ (i.e. the query is considered on the infinite history starting at time $t'$). If the answer to a continuous query is presented to the user on a screen, the display may change over time, even if the database is not updated.
Clearly, continuously evaluating a query would be very inefficient. Rather, when a continuous query is entered our processing algorithm evaluates the query once, and returns a set of tuples. Each tuple consists of an instantiation p of the predicate's variables (i.e. an answer to the query when considered in the noncontinuous sense) and a time interval begin to end. The tuple (p, begin, end) indicates that p is in the answer of the instantaneous queries from time begin until the time end. The set of tuples produced in response to a continuous query CQ is called Answer(CQ).
Obviously, an explicit update of the database may change a tuple in Answer(CQ). For example, it is possible that the query evaluation algorithm produces the tuple (o, 5, 7), indicating that o satisfies the query between times 5 and 7. If the speed of the object o is updated before time 5, the tuple may need to be replaced by, say (o, 6, 7), or it may need to be deleted. Therefore, a continuous query CQ has to be reevaluated when an update occurs that may change the set of tuples Answer(CQ). In this sense Answer(CQ) is a materialized view. However, a continuous query in our model is different from a materialized view, since the answer to a continuous query may change over time even if the database is not updated.
Finally, the third type of query is a persistent query. Formally, a persistent query at time t is defined as a sequence of instantaneous queries at each future time t' ≥ t, where the instantaneous query at t' has two arguments: (i) the database trace as of t' and (ii) the time value t; note that the semantics of this instantaneous query is defined using the possible histories with respect to the database trace at t'. Observe that, in contrast to a continuous query, the different instantaneous queries comprising a persistent query have the same starting point in the possible histories. These histories may differ for the different instantaneous queries due to database updates executed after time t. To realize the need for persistence, consider the query R = "retrieve the objects whose speed in the direction of the X-axis doubles within 10 minutes". Suppose that the query is entered as persistent at time 0. Assume that for some object o, at time 0 the value of the dynamic attribute POSITION.X changes according to the function 5t (recall that t is time, i.e. the speed is 5). At time 0 no objects will be retrieved, since for each object, the speed is identical in all future database states; only the location changes from state to state. Suppose further
A. Prasad Sistla, Ouri Wolfson, Sam Chamberlain, and Son Dao
that after one minute the function is explicitly updated to 7t, and after another minute it is explicitly updated to 10t. Then, the speed in the X direction has changed from 5 at time 0 to 10 at time 2, and hence, at time 2 object o should be retrieved as an answer to R. But if we consider the query R as instantaneous or continuous, o will never be retrieved, since starting at any point in time, the speed of o is identical in all states of the future database history. When entered as persistent, the query R is considered as a sequence of instantaneous queries, all operating on the history that starts at time 0. At time 2 this history reflects a change of the speed from 5 to 10 within two minutes, thus o will be retrieved at that time. In summary, the three types of queries are illustrated in the following figure.
Fig. 1. Database history. (a) An instantaneous query at time t is defined with respect to the set of possible future histories Ht (i.e. the future history beginning at t). (b) A continuous query at time t is a sequence of instantaneous queries at each time t' ≥ t. (c) A persistent query at time t is a sequence of instantaneous queries, all at time t. The queries are evaluated at each time t' ≥ t when the database is updated.
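The difference between evaluating the query R as continuous and as persistent can be made concrete with a small sketch. The Python below is illustrative only: it assumes discrete one-minute states and a finite horizon, represents the database trace as a list of (update_time, speed) pairs, and uses hypothetical helper names (speed_history, doubles_within, persistent_answer, continuous_answer) that do not come from the paper.

```python
# Illustrative sketch; helper names, the one-minute discretization and the
# finite horizon are assumptions, not part of the MOST model itself.

def speed_history(trace, start, horizon):
    """Speed of the object in each state from `start` to `horizon`,
    using only the updates recorded in `trace` (sorted by update time)."""
    speeds = []
    for t in range(start, horizon + 1):
        known = [s for (update_time, s) in trace if update_time <= t]
        speeds.append(known[-1])  # most recent update at or before t
    return speeds

def doubles_within(speeds, window):
    """True if some later state within `window` steps has at least twice the speed."""
    n = len(speeds)
    return any(
        speeds[j] >= 2 * speeds[i]
        for i in range(n)
        for j in range(i + 1, min(i + window, n - 1) + 1)
    )

def persistent_answer(trace, entered_at, now, horizon, window=10):
    """Evaluate on the history that starts when the query was entered."""
    seen = [(t, s) for (t, s) in trace if t <= now]
    return doubles_within(speed_history(seen, entered_at, horizon), window)

def continuous_answer(trace, now, horizon, window=10):
    """Evaluate on the future history that starts at the current time."""
    seen = [(t, s) for (t, s) in trace if t <= now]
    return doubles_within(speed_history(seen, now, horizon), window)

# Trace from the example: 5t at time 0, 7t at time 1, 10t at time 2.
trace = [(0, 5), (1, 7), (2, 10)]
persistent = [persistent_answer(trace, 0, now, 20) for now in range(3)]
continuous = [continuous_answer(trace, now, 20) for now in range(3)]
```

Under these assumptions the persistent evaluation first succeeds at time 2, while the continuous evaluation never does, because every future history it considers has a constant speed.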
In contrast to continuous queries, the evaluation of persistent queries requires saving information about the way the database is updated over time, and we postpone the subject of persistent query evaluation to future research. Observe that persistent queries are relevant even in the absence of dynamic variables. In [15] we developed an algorithm for processing FTL persistent queries. Unfortunately, that algorithm does not work when the queries involve dynamic variables. Observe that continuous and persistent queries can be used to define temporal triggers. Such a trigger is simply one of these two types of queries, coupled with an action and possibly an event.

3 The FTL language
In this section we first motivate the need for our language (subsection 3.1), then we present the syntax (3.2) and semantics (3.3) of FTL. In subsection 3.4 we demonstrate the language through some examples, and in subsection 3.5 we present our query processing algorithm.
3.1 Motivation
A regular query language such as SQL or OQL can be used for expressing temporal queries on moving objects; however, this would be cumbersome. The reason is that these languages do not have temporal operators, i.e. keywords that are natural and intuitive in the temporal domain. Consider for example the query Q: "Retrieve the pairs of objects o and n such that the distance between o and n stays within 5 miles until they both enter polygon P". Assume that for each predicate G there are functions begin_time(G) and end_time(G) that give the beginning and ending times of the first time interval during which G is satisfied; also assume that "now" denotes the current time. Then the query Q would be expressed as follows.

RETRIEVE o,n FROM Moving-Objects WHERE begin_time(DIST(o, n) < 5) < now and end_time(DIST(o, n) < 5) > begin_time(INSIDE(o, P) ∧ INSIDE(n, P))

At the end of subsection 3.2 we show how the query Q is expressed in our proposed language, FTL. Clearly, the query in FTL is simpler and more intuitive. The SQL and OQL queries may be even more complex when considering the fact that the spatial predicates may be satisfied for more than one time interval. Thus, we may need the functions begin_time1 and end_time1 to denote the beginning and ending times of the first time interval, begin_time2 and end_time2 to denote the beginning and ending of the second time interval, etc.
3.2 Syntax
The FTL query language enables queries pertaining to the future states of the system being modeled. Since the language and system are designed to be installed on top of an existing DBMS, the FTL language assumes an underlying nontemporal query language provided by the DBMS. However, the FTL language is not dependent on a specific underlying query language, or, in other words, can be installed on top of any DBMS. This installation is discussed in section 4.1. The formulas (i.e. queries) of FTL use two basic future temporal operators, Until and Nexttime. Other temporal operators, such as Eventually, can be expressed in terms of the basic operators. The symbols of the logic include various type names, such as relations, integers, etc. These denote the different types of object classes and constants in the database. We assume that, for each n ≥ 0, we have a set of n-ary function symbols and a set of n-ary relation symbols. Each n-ary function symbol denotes a function that takes n arguments of particular types, and returns a value. For example, + and * are function symbols denoting addition and multiplication on the integer type. Similarly, ≤ and ≥ are binary relation symbols denoting arithmetic comparison operators. The function symbols are also used to denote atomic queries, i.e. queries in the underlying
nontemporal query language (e.g. OQL). We assume that all atomic queries retrieve single values. For example, the function "RETRIEVE (o.height) WHERE o.id = 100" denotes the query that retrieves the height of an object whose id is 100. Atomic queries can have variables appearing in them. For example, "RETRIEVE (o.height) WHERE o.id = y" has the variable y appearing free in it; for a given value of the variable y, it retrieves the height of the object whose id is given by y. Functions of arity zero denote constants and relations of arity zero denote propositions. The formulas of the logic are formed using the function and relation symbols, the object classes and variables, the logical symbols ¬ and ∧, the assignment quantifier ←, square brackets [ ], and the temporal modal operators Until and Nexttime. In our logic, the assignment is the only quantifier. It binds a variable to the result of a query in one of the database states of the history. One of the advantages of using this quantifier rather than the First Order Logic (FOL) quantifiers is that the problem of safety is avoided. This problem is more severe when database histories (rather than database states) are involved. Also, the full power of FOL is unnecessary for the sequence of database states in the history. The assignment quantifier allows us to capture the database atomic query values at some point in time and relate them to atomic query values at later points in time.
A term is a variable or the application of a function to other terms. For example, time + 10 is a term; if x, y are variables and f is a binary function, then f(x, y) is a term; the query "RETRIEVE (o.height) WHERE o.id = y" specified above is also a term. Well formed formulas of the logic are defined as follows. If t1, ..., tn are terms of appropriate type, and R is an n-ary relation symbol, then R(t1, ..., tn) is a well formed formula. If f and g are well formed formulas, then ¬f, f ∧ g, f Until g, Nexttime f and ([x ← t] f) are also well formed formulas, where x is a variable and t is a term of the same type as x and may contain free variables; such a term t may represent a query on the database. A variable x appearing in a formula is free if it is not in the scope of an assignment quantifier of the form [x ← t]. In our system, a query is specified by the following syntax: RETRIEVE <target-list> WHERE <semantic-spec> <condition>. Here <condition> is an FTL formula in which all the free variables are object variables. The <target-list> specification is a list of attributes of all object variables appearing free in the condition part. The clause <semantic-spec> can be one of the two key words may or must, and it specifies the semantics to be used in processing the query. We call a query a "may" query if its semantic clause is the key word "may"; otherwise the query is called a "must" query. For example, the following query retrieves the pairs of objects o and n such that, on all possible future histories, the distance between o and n stays within 5 miles until they both enter polygon P (the FTL formula is the argument of the WHERE clause):
RETRIEVE o,n WHERE must DIST(o, n) < 5 Until (INSIDE(o, P) ∧ INSIDE(n, P))

3.3 Semantics
Intuitively, the semantics are specified in the following context. Let s0 be the state of the database when a query f is entered. The formula f is evaluated on the history starting with s0. We define the formal semantics of our logic as follows. We assume that each type used in the logic is associated with a domain, and all the objects of that type take values from that domain. We assume a standard interpretation for all the function and relation symbols used in the logic. For example, ≤ denotes the standard less-than-or-equal-to relation, and + denotes the standard addition on integers. We will define the satisfaction of a formula at a state on a history with respect to an evaluation, where an evaluation is a mapping that associates a value with each variable. For example, consider the formula [x ← RETRIEVE(o)] Nexttime RETRIEVE(o) ≠ x, which is satisfied when the value of some attribute of o differs in two consecutive database states. The satisfaction of the subformula RETRIEVE(o) ≠ x depends on the result of the atomic query that retrieves o from the current database, as well as on the value of the variable x. The value associated with x by an evaluation is the value of o in the previous database state. The definition of the semantics proceeds inductively on the structure of the formula. If the formula contains no temporal operators and no assignment quantifiers, then its satisfaction at a state of the history depends exclusively on the values of the database variables in that state and on the evaluation. A formula of the form f Until g is satisfied at a state with respect to an evaluation p, if and only if one of the following two cases holds: either g is satisfied at that state, or there exists a future state in the history where g is satisfied and until then f continues to be satisfied.
A formula of the form Nexttime f is satisfied at a state with respect to an evaluation, if and only if the formula f is satisfied at the next state of the history with respect to the same evaluation. A formula of the form [x ← t] f is satisfied at a state with respect to an evaluation, if and only if the formula f is satisfied at the same state with respect to a new evaluation that assigns the value of the term t to x and keeps the values of the other variables unchanged. A formula of the form f ∧ g is satisfied if and only if both f and g are satisfied at the same state; a formula of the form ¬f is satisfied at a state if and only if f is not satisfied at that state. In our formulas we use the additional propositional connectives ∨ (disjunction) and ⇒ (logical implication), both of which can be defined using ¬ and ∧. We will also use the additional temporal operators Eventually and Always, which are defined as follows. The temporal operator Eventually f asserts that f is satisfied at some future state, and it can be defined as true Until f. Actually, in our context a more intuitive notation is often later f, but we will use the
traditional Eventually f. The temporal operator Always f asserts that f is satisfied at all future states, including the present state, and it can be defined as ¬ Eventually ¬ f. We would like to emphasize that, although the above context implies that f is evaluated at each database state, our processing algorithm avoids this overhead. Let Q be an instantaneous query specified at time t using the syntax given at the end of the last subsection. Let the FTL formula f denote the condition part of Q, and let T denote the target list of Q. We define the semantics based on the <semantic-spec> clause in Q. Let σ be the database trace denoting the sequence of updates up to t. Let H be the set of all possible future database histories corresponding to σ as of now, i.e. as of time t. For any h ∈ H, let F_h be the set of all evaluations p of the free variables in f such that f is satisfied at the beginning of h with respect to the evaluation p. Let R_h denote the set of all tuples obtained by applying some evaluation in F_h to the target list T, i.e. R_h = {p(T) : p ∈ F_h}. Let May_Answer(Q) = ∪_{h ∈ H} R_h and Must_Answer(Q) = ∩_{h ∈ H} R_h. If Q is a "may" query, then we define the semantics of Q, i.e. the answer to Q, to be May_Answer(Q), and if Q is a "must" query its semantics is defined to be Must_Answer(Q). Thus, it is easy to see that the answer computed for a "may" query indicates possibility with respect to at least one of the possible future histories, while the answer computed for a "must" query denotes definiteness of the result. Both these answers coincide when all the dynamic attributes are deterministic, i.e. H contains a single history.
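The "may" and "must" semantics can be illustrated with a small sketch. The Python below is not the paper's processing algorithm: it assumes that each possible history is given as a finite list of states (a truncation of the infinite history), that a state is a dictionary keyed by object id, and that the hypothetical helpers until and answers stand in for formula evaluation and for the union and intersection over H.

```python
# Illustrative sketch; the state encoding and all names are assumptions.

def until(f, g, history, obj, i=0):
    """f Until g at state i: g holds now, or g holds at a later state
    and f holds at every state before it."""
    for state in history[i:]:
        if g(state, obj):
            return True
        if not f(state, obj):
            return False
    return False  # g never held within this finite prefix

def answers(histories, objects, formula):
    """Return (May_Answer, Must_Answer) over a non-empty list of histories."""
    per_history = [{o for o in objects if formula(h, o)} for h in histories]
    may = set().union(*per_history)          # satisfied on at least one history
    must = set.intersection(*per_history)    # satisfied on every history
    return may, must

def near(state, o):
    return state[o]["near"]

def inside(state, o):
    return state[o]["inside"]

def formula(history, o):
    # Condition part: near(o) Until inside(o), evaluated at the first state.
    return until(near, inside, history, o)

def st(a_near, a_in, b_near, b_in):
    """Build one state for the two hypothetical objects 'a' and 'b'."""
    return {"a": {"near": a_near, "inside": a_in},
            "b": {"near": b_near, "inside": b_in}}

h1 = [st(True, False, True, False), st(True, True, True, True)]
h2 = [st(True, False, True, False), st(True, True, False, False)]
may, must = answers([h1, h2], ["a", "b"], formula)
```

Here object "b" satisfies the formula on h1 only, so it appears in the "may" answer but not in the "must" answer, while "a" appears in both.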
3.4 Examples
In this subsection, we show how to express some queries in FTL. For expressive convenience, we also introduce the following real-time (i.e. bounded) temporal operators. These operators can be expressed using the previously defined temporal operators and the time object (see [15]). Eventually_within_c (g) asserts that the formula g will be satisfied within c time units from the current position. Eventually_after_c (g) asserts that g holds after at least c units of time. Always_for_c (g) asserts that the formula g holds continuously for the next c units of time. The formula (g until_within_c h) asserts that there exists a future instance within c units of time where h holds, and until then g continues to be satisfied. The following query retrieves all the objects o of type "civilian" that may enter a restricted area P within three units of time from the current instance.

(I) RETRIEVE o WHERE may (o.type = "civilian" ∧ P.type = "restricted" ∧ Eventually_within_3 INSIDE(o, P))
The following query retrieves all the civilian objects o that definitely (i.e. must) enter a restricted area P within three units of time, and stay in P for another 2 units of time.
(II) RETRIEVE o WHERE must (o.type = "civilian" ∧ P.type = "restricted" ∧ Eventually_within_3 (INSIDE(o, P) ∧ Always_for_2 INSIDE(o, P)))
The following query retrieves all the objects o that may enter the polygon P within three units of time, stay in P for two units of time, and after at least five units of time enter another polygon Q.

(III) RETRIEVE o WHERE may (Eventually_within_3 (INSIDE(o, P) ∧ Always_for_2 (INSIDE(o, P)) ∧ Eventually_after_5 INSIDE(o, Q)))
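The bounded operators used in queries (I) to (III) can be read operationally. The sketch below is a simplified illustration, not the paper's definition: it assumes one state per clock tick, a finite prefix of the history given as a list, and it treats a window that runs past the known prefix as unsatisfied for Always_for_c. The combinator names are hypothetical.

```python
# Illustrative sketch; one state per clock tick and all names are assumptions.

def eventually_within(c, g):
    """g holds at some state between tick i and tick i + c (inclusive)."""
    def holds(history, i):
        last = min(i + c, len(history) - 1)
        return any(g(history, k) for k in range(i, last + 1))
    return holds

def always_for(c, g):
    """g holds at every state from tick i through tick i + c."""
    def holds(history, i):
        if i + c >= len(history):  # window runs past the known prefix
            return False
        return all(g(history, k) for k in range(i, i + c + 1))
    return holds

def conj(f, g):
    """Conjunction of two formulas evaluated at the same state."""
    return lambda history, i: f(history, i) and g(history, i)

def inside_p(history, i):
    # Each state is simply a boolean: is the object inside P at this tick?
    return history[i]

# Temporal part of query (II): Eventually_within_3 (INSIDE ∧ Always_for_2 INSIDE).
query_2 = eventually_within(3, conj(inside_p, always_for(2, inside_p)))

stays = [False, False, True, True, True, False]
leaves_early = [False, False, True, True, False, False]
enters_late = [False, False, False, False, True, True, True]
results = [query_2(h, 0) for h in (stays, leaves_early, enters_late)]
```

Only the first history satisfies the formula at tick 0: the object enters P at tick 2 and remains there through tick 4.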
3.5 Algorithm for evaluation of MOST queries
Earlier, in subsection 2.3, we indicated two different ways of representing the positions of moving objects. In the remainder of this paper, we use the first of these schemes. For an object o moving on a route, we assume that o.ubs and o.lbs, respectively, denote the upper and lower bounds on the speed of the object and that these bounds are positive; we also assume that the attribute o.route gives the identity of the route on which the object is traveling. We say that an object o is moving freely in 2-dimensional space if its velocities in the x and y directions are independent. For such an object o, we let o.X.ubs and o.X.lbs denote the upper and lower bound speeds in the direction of the x-axis, and o.Y.ubs and o.Y.lbs represent the corresponding speeds in the direction of the y-axis; each of these speeds can be positive or negative. (Note that for an object that moves on a route, the direction of its motion is determined by the route and its speed will give its state of motion at that point; on the other hand, for an object moving freely in 2-dimensional space we need to know its speeds in both the x and y directions.) For a moving object, any of the above sub-attributes can be explicitly updated. In this subsection, we consider the problem of evaluating queries in the MOST model. An FTL formula f is said to be a restricted conjunctive formula if it has no negations appearing in it, the only temporal operators appearing in it are until, until_within_c and Eventually_within_c, and the time_stamp or the time variable does not appear in it; the last condition implies that for every query q that appears on the right hand side of an assignment in f (i.e. as in x ← q) the value returned by q at any time is independent of the time when it is evaluated and is only a function of the values of the free variables in q and the current positions of the objects. This condition also ensures that satisfaction of a non-temporal predicate when an object is at a particular position depends only on the position of the object and not on the time when it reached the position. Also, note that f does not contain the Nexttime operator.
The following theorem shows that the problem of evaluating a "may" query whose condition part is a conjunctive FTL formula is PSPACE-hard when the objects are moving freely in 2-dimensional space. This theorem is proved by exhibiting a straightforward reduction from the model-checking problem for conjunctive formulas, which is a known PSPACE-hard problem [10].

THEOREM 1: Given a MOST database D modeling objects moving freely in 2-dimensional space, and given a "may" query whose condition part is given by a conjunctive FTL formula containing one free moving object variable, the problem of evaluating the query is PSPACE-hard.

Now, we consider the problem of evaluating "may" queries where the objects are moving on routes. Consider a query Q whose condition part is given by a conjunctive formula f with one free moving object variable o. Now consider an object, say o1, whose speed is in the range [l, u]. There are many possible histories corresponding to the varying speeds of o1. Let h be the possible history corresponding to the case where the object moves with the highest speed u at all times. Intuitively, it seems to be the case that if there is a possible history h' such that h' satisfies f at the first state with respect to the evaluation where the variable o is assigned object o1, then f is also satisfied at the beginning of h with respect to the same evaluation. This is due to the following properties: (a) in both the histories object o1 goes through the same positions (possibly at different times), (b) all the time bounds in the formula f are only upper bounds, and if these bounds are met when the object is moving at a lower speed then they will definitely be met when the object is moving at a higher speed, and (c) time does not appear anywhere else in the formula; this ensures that satisfaction of a non-temporal predicate at a particular time depends only on the position of the object and not on the time when it reached the position.
Now, we have the following theorem.

THEOREM 2: Let f be a conjunctive FTL formula with one free object variable o ranging over moving objects, o1 be an object moving on a route with speed in the range [l, u], p be an evaluation in which o is mapped to the object o1, and h be a history in which o1 is moving with the maximum speed u. Then, f is satisfied at the beginning of some possible history with respect to the evaluation p iff it is satisfied at the beginning of h with respect to p.

Proof: Let h' be any possible history that satisfies f at the beginning with respect to the evaluation p. For each i > 0, let s_i and t_i denote the i-th states in h and h' respectively. Since in a history a new state is added whenever the position of any object changes, it is the case that the distance of any object in successive states of a history either remains unchanged or changes by 1. Hence, we can divide a history into a sequence of sub-sequences B_0, B_1, ..., B_i, ... of successive states such that, for each i ≥ 0, the distance of object o1 in any two states of B_i is the same, and its distance in a state in B_i differs from that in a state in B_{i+1} by 1. Let B_0, B_1, ..., B_i, ... be the sequence of sub-sequences corresponding to h; similarly, let C_0, C_1, ..., C_i, ... be such a sequence corresponding to h'. Since in both the histories o1 starts from the same initial position, it is the case that for each i ≥ 0, the distance of o1 in any state in B_i equals its distance in any state
in C_i. For each i ≥ 0, we say that every state in B_i corresponds to every state in C_i and vice versa. Let g be a subformula of f. Now, by a simple induction on the length of g, we show that (*) if g is satisfied at t_i in h' and s_j is any state in h that corresponds to t_i, then g is also satisfied at s_j in h. The proof is as follows. If g is an atomic formula then (*) holds because the satisfaction of g, with respect to an evaluation, depends only on the position of object o1, and it is independent of the time. The non-trivial case in the induction is when g is of the form g1 until_within_c g2 where c is a positive constant. Assume that g is satisfied at t_i in h'. This implies that there exists some i' > i such that g2 is satisfied at t_{i'}, and for all k, i ≤ k < i', g1 is satisfied at t_k; furthermore, the difference in the value of the time_stamp variable in the states t_{i'} and t_i is bounded by c. Clearly, there is a state s_{j'} in h that appears after s_j and that corresponds to t_{i'}; furthermore, every state appearing between s_j and s_{j'} corresponds to some state appearing between t_i and t_{i'}. By induction, we see that g2 is satisfied at s_{j'}, and g1 is satisfied at s_j and at all states appearing after s_j but before s_{j'}. Also, the distance traversed by o1 from state s_j to s_{j'} is the same as that between t_i and t_{i'}. Since in history h, o1 is traveling at a higher speed, it is the case that the difference in the values of time_stamp in states s_{j'} and s_j is smaller than that between t_{i'} and t_i. From all this, we see that the formula g1 until_within_c g2 is also satisfied at state s_j in h. The other cases in the proof are straightforward. Theorem 2 shows that, in order to answer the "may" queries whose condition part is a restricted conjunctive formula with a single free variable that ranges over moving objects, it is enough to consider the single history where the objects are moving at the maximum speed. This corresponds to the deterministic case.
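A simple numeric check illustrates the consequence of Theorem 2 for one special case. The sketch below is only an illustration under strong assumptions: the object moves on a straight route toward a region that begins a given distance ahead, only constant-speed histories are sampled (the theorem itself covers histories with varying speed), and the function names are hypothetical.

```python
# Illustrative sketch; the 1-D route, constant-speed sampling and all names
# are assumptions made for this example only.

def enters_within(distance, speed, c):
    """Eventually_within_c INSIDE for a region starting `distance` ahead,
    when the object moves at a constant positive `speed`."""
    return distance / speed <= c

def may_enter_brute_force(distance, lbs, ubs, c, samples=50):
    """'May' answer by trying many constant speeds in [lbs, ubs]."""
    speeds = [lbs + (ubs - lbs) * k / (samples - 1) for k in range(samples)]
    speeds.append(ubs)  # make sure the upper bound itself is tried
    return any(enters_within(distance, v, c) for v in speeds)

def may_enter_max_speed(distance, ubs, c):
    """'May' answer using only the maximum-speed history."""
    return enters_within(distance, ubs, c)

cases = [(30, 5, 10, 3), (30, 5, 10, 2), (12, 1, 4, 3), (12, 1, 4, 2.5)]
agreement = [
    may_enter_brute_force(d, lo, hi, c) == may_enter_max_speed(d, hi, c)
    for (d, lo, hi, c) in cases
]
```

In every sampled case the brute-force answer coincides with the answer obtained from the maximum-speed history alone, as the theorem predicts for upper-bounded time constraints.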
In the remainder of this section we present an algorithm for evaluating FTL queries for the case when the objects are moving at constant speeds on different routes. Our algorithm works for the class of queries given by conjunctive formulas, and for the case when all the dynamic variables are deterministic. A conjunctive formula is an FTL formula without negation, without the Nexttime operator and without any reference to the time_stamp variable. Even though conjunctive formulas cannot explicitly refer to the time_stamp variable, one can express real-time properties using the real-time temporal operators. Note that the class of conjunctive formulas is a superset of the class of restricted conjunctive formulas. In practice, most queries are indeed expressed by conjunctive formulas. For instance, all the example queries we use in this paper are such. One of the main reasons for the restriction to conjunctive formulas is safety (i.e. finiteness of the result); negation may introduce infinite answers. The handling of negation can be incorporated in the algorithm, but this is beyond the scope of this paper. An additional restriction of the algorithm is that it works only for continuous and instantaneous queries (i.e. not for persistent queries). For a query CQ specified by the formula f with free variables (x1, ..., xk) the algorithm returns a relation called Answer(CQ) (this relation was originally discussed in subsection 2.4), having k + 2 attributes. The first k attributes give
an instantiation p to the variables, and the last two attributes give a time interval during which the instantiation p satisfies the formula. The system uses this relation to answer continuous and instantaneous queries as follows. For a continuous query CQ, the system presents to the user at each clock-tick t, the instantiations of the tuples having an interval that contains t. So, for example, if Answer(CQ) consists of the tuples (2, 10,15), and (5, 12,14), then the system displays the object with id = 2 between clock ticks 10 and 15, and between clock-ticks 12 and 14 it also displays the object with id = 5. For an instantaneous query, the system presents to the user the instantiations of the tuples having an interval that contains the current clock-tick.
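The way the system serves a continuous query from Answer(CQ) can be sketched in a few lines. This is a minimal illustration assuming single-variable instantiations and closed intervals of integer clock ticks; the function name visible_at is hypothetical.

```python
# Illustrative sketch; `visible_at` is a hypothetical name, and intervals are
# assumed to be closed ranges of integer clock ticks.

def visible_at(answer, tick):
    """Instantiations whose interval [begin, end] contains the given clock tick."""
    return [p for (p, begin, end) in answer if begin <= tick <= end]

# Answer(CQ) from the example in the text: tuples (id, begin, end).
answer_cq = [(2, 10, 15), (5, 12, 14)]

shown_at_11 = visible_at(answer_cq, 11)
shown_at_13 = visible_at(answer_cq, 13)
shown_at_16 = visible_at(answer_cq, 16)
```

An instantaneous query corresponds to a single call with the current clock tick, while a continuous query corresponds to repeating the call at every tick without reevaluating the formula.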
The FTL query processing algorithm

Let f(x1, x2, ..., xk) be a conjunctive FTL formula with free variables x1, x2, ..., xk such that the variable time_stamp is also not referenced in it. We assume that the system has a set of objects O. Some of these objects are stationary and the others are mobile. The positions (i.e. the X, Y and Z coordinates) of the stationary objects are assumed to be fixed, while the positions of the mobile objects are assumed to be dynamic variables. Without loss of generality we assume that the time when we are evaluating the query is zero. The current database state reflects the positions of objects as of this time, and furthermore, we assume that for each dynamic variable we have functions denoting how these variables change over time. As a consequence, the values of static variables at any time are the same as their values at time zero, and the values of dynamic variables at any time in the future are given by the functions which are stored in the database. Thus, the future history of the database is implicitly defined. For each subformula g of f (including f itself), our algorithm computes a relation Rg. Let g(x1, ..., xk) be a subformula containing free variables x1, ..., xk. The relation Rg will have (k + 2) attributes; the first k attributes correspond to the k variables; the last two attributes in each tuple specify the beginning and ending of a time interval; we call this the interval of the tuple. Each tuple in Rg denotes an instantiation p of values to the free variables in g and an interval I (specified by the last two columns) during which the formula g is satisfied with respect to p. The algorithm computes Rg inductively, for each subformula g, in increasing order of the length of the subformula. To do this it executes a sequence of one or more SQL queries whose result will be the desired relation Rg. We only describe how to generate these SQL queries.
After the termination of the algorithm, we will have the relation Rf corresponding to the original formula f. The base case in our algorithm is when g is an atomic predicate R(x1, ..., xk) such as a spatial relation etc. In this case, we assume that there is a routine which, for each possible relevant instantiation of values to the free variables in g, gives us the intervals during which the relation R is satisfied. Clearly, this routine has to use the initial positions and the functions according to which the dynamic variables change. For example, if R is the predicate DIST(x1, x2) ≤ 5, then the routine gives, for each relevant object pair o1, o2, the time intervals
during which the distance between them is ≤ 5 (for this example, if we assume that all objects are point objects, that x1 ranges over moving objects and x2 ranges over stationary objects, and that we have a relational database containing information about the routes and speeds of moving objects and about the positions of stationary objects on the routes, then we can write an SQL query that computes a relation giving the ids of objects and the time intervals during which the predicate R is satisfied). We assume that the relations given by the atomic predicates are all finite. For cases where these relations are infinite in size, we need to use some finite representations for them and work with these representations; this is beyond the scope of this paper and will be discussed in a later paper. For the case when g is not an atomic predicate, we compute the relation Rg inductively based on the outermost connective of g as given below.

- Let g = g1 ∧ g2. In this case, let R1, R2 be the relations computed for g1 and g2 respectively, i.e. Ri = Rgi for i = 1, 2. For a given instantiation p, if g1 is satisfied during interval I1 and g2 is satisfied during I2, then g is satisfied during the interval I1 ∩ I2. The relation Rg for g is computed by joining the relations R1 and R2 as follows: the join condition is that common variable attributes should be equal and the interval attributes should intersect; the retrieved tuple copies all the variable values, and the interval in the tuple will be the intersection of the intervals of the joining tuples. It is fairly easy to see how we can write a single SQL query that computes Rg from Rg1 and Rg2.

- Let g = g1 Until g2, and let R1 and R2 be the relations corresponding to g1 and g2 respectively. Let p + 2, q + 2 be the number of columns in R1 and R2 respectively. First, we compute another relation S from R1 as follows.
We define a chain in R1 to be a set T of tuples in R1 that give the same values to the first p columns and such that the following property is satisfied: if l denotes the lowest value of the left end points of all intervals of tuples in T and u denotes the highest value of the right end points of these tuples, then every time point in the interval [l, u] is covered by an interval of some tuple in T (i.e., the interval [l, u] is the union of all the intervals in T); we define T to be a maximal chain if no proper superset of it is a chain. The relation S is obtained by having one tuple corresponding to each maximal chain T in R1, whose first p columns have the same values as those in T and whose interval is the interval [l, u] as defined above. For example, if a maximal chain has three tuples with intervals [10, 20], [15, 30] and [11, 40], then these will be represented by a single tuple whose interval is [10, 40]. The resulting relation S satisfies the following property. For any two tuples t, t' ∈ S, if t, t' match on the first p columns (i.e. columns corresponding to the variables), then their intervals will be disjoint and furthermore these intervals will not even be consecutive; the non-consecutiveness of the intervals means that there is a non-zero gap separating intervals in tuples that give identical values to corresponding variables.
A. Prasad Sistla, Ouri Wolfson, Sam Chamberlain, and Son Dao
The following SQL query computes S from R1. For any tuple t, we let t.l and t.u denote the left and right end points of the interval of t.
    SELECT <list>, t1.l, t2.u
    FROM R1 t1, R1 t2
    WHERE COND-B AND NOT EXISTS (
        SELECT t3
        FROM R1 t3, R1 t4
        WHERE COND-C AND NOT EXISTS (
            SELECT t5
            FROM R1 t5
            WHERE COND-D))
In the above query, <list> in the target list is the list of the first p attributes of t1. COND-B specifies that t1 and t2 give identical values to the first p columns, that t1.l < t2.u, and that there is no other tuple whose interval contains t2.u + 1 or t1.l - 1; the latter condition guarantees maximality of the chain. The WHERE clause of the outermost query states that t1.l and t2.u denote the left and right ends of a chain. This is indicated by stating that there are no tuples t3 and t4 whose intervals intersect with the interval [t1.l, t2.u], such that t3.u < t4.l and such that there is a gap between t3.u and t4.l; COND-C specifies the first of the two conditions; the existence of a gap between t3.u and t4.l is indicated by the innermost subquery starting with the clause "NOT EXISTS"; this subquery states that there is no tuple t5 whose interval intersects with the interval [t3.u, t4.l]; COND-D states the latter condition. COND-B, COND-C and COND-D also specify that the first p columns of t1 through t5 match.
Observe that if t1, t2 are any two tuples belonging to S and R2, respectively, such that their intervals intersect, t1.l ≤ t2.l, and their values on common columns match, then g is satisfied throughout the interval [t1.l, t2.u]. Now, the relation Rg is computed from S and R2 as follows. Let A be the union of all column names in S and R2 that correspond to variables. The relation Rg will contain |A| + 2 columns. For each t1 ∈ S and t2 ∈ R2 that satisfy the above properties, the relation Rg will contain a tuple t such that t.l = t1.l, t.u = t2.u, and the first |A| columns of t contain the corresponding values from t1 or t2. It is fairly straightforward to write an SQL query that computes Rg from S and R2.
- Let g = g1 until_within_c g2 and R1, R2 be the relations corresponding to g1 and g2 respectively. Let S be the relation computed from R1 as given in the previous case (i.e. the case for "until").
Let t1 ∈ S and t2 ∈ R2 be tuples that match on common columns and such that their intervals intersect and such that t1.l ≤ t2.l. Let d = max{t1.l, t2.l - c}. It should be easy to see that g is satisfied throughout the interval [d, t2.u] with respect to the evaluation given by the columns corresponding to variables in t1 and t2. For every such
Querying the Uncertain Position of Moving Objects
tuples t1 and t2, there will be a tuple t in Rg with t.l = d, t.u = t2.u and such that the variable columns in t have the same values as in t1 or t2. It should be easy to write an SQL query that computes Rg from S and R2.
- Let g = g1 until_after_c g2. Recall that g is satisfied at some point if g2 is satisfied at some point which is at least c time units later and until then g1 is satisfied. Let R1, R2, S, t1 and t2 be as in the previous case. Let e = min{t1.u, t2.u}. Also assume that t1.l < e - c. Now, it is easy to see that g is satisfied throughout the interval [t1.l, e - c]. Corresponding to each t1, t2 satisfying the above conditions, the relation Rg will have a tuple t such that t.l = t1.l, t.u = e - c and the variable columns in t have the same values as the corresponding columns in t1 or t2. We can easily write an SQL query that computes Rg from S and R2.
- Let g = [y ← q] g1, and let R1 be the relation corresponding to g1. The atomic query q may have some free variables. For example, q may be height(o), denoting the height attribute of the object given by the variable o. We assume that the value of q is given by a relation Q with p + 3 columns, where the first p columns correspond to the free variables in q, the (p + 1)st column is the value of q and the last two columns specify a time interval. Each tuple t in Q denotes the value of the atomic query q during the interval specified by the last two columns, for the instantiation of free variables specified by the first p columns; the value of the query is given by the (p + 1)st column. In the above example, Q will have four columns; the first column gives the object id, the third and fourth columns give an interval and the second column gives the height of the object during this interval.
Now the relation R for g is obtained by joining Q and R1, where the join condition requires that columns corresponding to common variables should be equal, the column corresponding to the variable y in R1 should be equal to the (p + 1)st column of Q, and the time intervals should intersect. For two joining tuples t1 in R1 and t2 in Q, in the output tuple we copy all variable columns from t1 and t2 except the one corresponding to variable y, and the time interval in the output tuple will be the intersection of the time intervals in t1 and t2.
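The interval computations in the inductive cases above (conjunction and the three until operators) can be sketched in Python as follows. This is a minimal illustration rather than the SQL queries referred to in the text. It assumes that an interval is a pair (l, u), that s_iv comes from a tuple of S and r2_iv from a tuple of R2, and that the two tuples already agree on their common variable columns. All function names are hypothetical.

```python
def intersect(i1, i2):
    """Intersection of two closed intervals, or None if they are disjoint."""
    l, u = max(i1[0], i2[0]), min(i1[1], i2[1])
    return (l, u) if l <= u else None


def conj(i1, i2):
    """g = g1 AND g2: satisfied on the intersection of the two intervals."""
    return intersect(i1, i2)


def _joinable(s_iv, r2_iv):
    # Shared precondition of the until cases: intervals intersect and t1.l <= t2.l.
    return intersect(s_iv, r2_iv) is not None and s_iv[0] <= r2_iv[0]


def until(s_iv, r2_iv):
    """g = g1 Until g2: satisfied on [t1.l, t2.u]."""
    return (s_iv[0], r2_iv[1]) if _joinable(s_iv, r2_iv) else None


def until_within(s_iv, r2_iv, c):
    """g = g1 until_within_c g2: satisfied on [max(t1.l, t2.l - c), t2.u]."""
    if not _joinable(s_iv, r2_iv):
        return None
    return (max(s_iv[0], r2_iv[0] - c), r2_iv[1])


def until_after(s_iv, r2_iv, c):
    """g = g1 until_after_c g2: satisfied on [t1.l, e - c], e = min(t1.u, t2.u)."""
    if not _joinable(s_iv, r2_iv):
        return None
    e = min(s_iv[1], r2_iv[1])
    # The text assumes t1.l < e - c; otherwise no tuple is produced.
    return (s_iv[0], e - c) if s_iv[0] < e - c else None


print(until((10, 40), (30, 50)), until_within((10, 40), (30, 50), 5),
      until_after((10, 40), (30, 50), 5))
```

A full evaluator would wrap these functions in a join over the variable columns, as the text describes; only the interval arithmetic is shown here.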
4
Discussion
In this section we first discuss the implementation of our proposed data model on top of existing DBMSs (subsection 4.1), then we discuss architectural issues, particularly the implications of disconnection and memory limitations of computers on moving objects (4.2), and various query processing strategies in a mobile distributed system (4.3).
4.1
Implementing MOST on top of a DBMS
The system proposed in this paper (including an FTL language interpreter) can be implemented by a software system, called MOST, built on top of an existing DBMS. Such a system can add the capabilities discussed in this paper to the
DBMS as follows. We store each dynamic attribute A as its sub-attributes; two of the sub-attributes are A.initialvalue and A.updatetime; the other sub-attributes specify the future values together with their uncertainty. When A is the position of a moving object, the other sub-attributes may be the upper and lower bounds on the speed, or upper and lower bound functions of time that denote the actual position and its uncertainty. Any query posed to the DBMS is first examined (and possibly modified) by the MOST system, and so is the answer of the DBMS before it is returned to the user. In the rest of this subsection we sketch the modifications to queries and answers of the underlying DBMS. For simplicity our exposition will assume the relational model and SQL for the underlying DBMS. However, the same ideas can be extended to the object-oriented model.
In section 4, we considered the problem of evaluating "may" queries in a MOST database system modeling the motion of objects. We showed that this problem is computationally hard when objects are moving in two-dimensional space. We then considered the case when objects are moving on well-defined routes and when there is uncertainty in their speeds, specified by an upper and a lower bound. We showed that for "may" queries whose condition part is given by a restricted conjunctive FTL formula, the query processing can be reduced to the deterministic case, more specifically to the case when objects are traveling at the maximum speed. After this, we presented a method for processing "may" queries for the deterministic case when the condition part is given by an arbitrary conjunctive FTL formula f. This method inductively computes a relation Rg corresponding to each subformula g of f. For the case when g has no temporal operators, we assumed that the relation Rg is computed by some application-dependent method.
The computation of Rg, for the case when g contains temporal operators, is achieved by translation into SQL queries that refer to previously computed relations corresponding to smaller subformulas. Thus we can implement the above method on top of an existing DBMS that supports SQL, provided we have a method for computing the relations Rg for the case when g has no temporal operators.
In this subsection, we address the problem of evaluating "may" queries whose condition part has no temporal operators and when there is uncertainty in the values of dynamic attributes. Our method applies to any type of uncertainty (i.e. it is not limited to the case of moving objects whose speeds are specified to lie between two bounds). Our method can be employed, as specified in the previous paragraph, to process non-temporal subformulas in the algorithm of section 4.
Now consider any "may" query whose condition part is non-temporal. If the query does not contain a reference to a dynamic attribute, the query is simply passed to the DBMS and the answer is returned to the user. Now assume that the query contains references to dynamic attributes, but not temporal operators. We will distinguish between references in the SELECT and WHERE clauses. If the query contains a reference to a dynamic attribute A only in the SELECT clause (i.e. in the target list), then the MOST system modifies the query as follows. Instead of A, the query retrieves the sub-attributes of A
from the DBMS, and the MOST system computes the current range of possible values of A for each retrieved object before returning it to the user.
Assume now that the WHERE clause is F, which is a boolean combination of atoms (for example, an atom may be A > 5). Consider first the case where there is only a single atom p that refers to dynamic attributes in F. Before passing the original query Q to the DBMS, the MOST system replaces Q by two queries, Q1 and Q2. The transformation is based on the following equivalence: F = (F′ ∧ p) ∨ (F″ ∧ ¬p), where F′ is F with p replaced by true and F″ is F with p replaced by false. Q1 and Q2 are defined as follows. The target list of Q1 and Q2 consists of the target list of Q, plus the sub-attributes of the dynamic attributes in p. The FROM clause of Q1 and Q2 is identical to that of Q. The WHERE clause of Q1 is F′ and that of Q2 is F″. Q1 and Q2 are submitted to the underlying DBMS, and the results are processed as follows before returning them to the user. The atom p is evaluated on each tuple in the result of Q1, and the atom ¬p is evaluated on each tuple in the result of Q2. (To do these evaluations the MOST system computes the current values of the dynamic attributes appearing in p using the retrieved sub-attributes.) The tuples that do not satisfy the respective atoms are eliminated, and the projection of the union of the resulting tuples on the original target list is returned to the user.
If the WHERE clause has multiple atoms referencing dynamic attributes, then we can proceed as follows. Let p1, ..., pk be all such atoms. We first write F as (F′ ∧ p1) ∨ (F″ ∧ ¬p1). We can repeat the above procedure for the other atoms to rewrite F into an equivalent condition of the form (F1 ∧ G1) ∨ (F2 ∧ G2) ∨ ... ∨ (Fr ∧ Gr), where the clauses F1, F2, ..., Fr do not contain any dynamic attributes, and each clause Gi is a condition involving the atoms p1, ..., pk. In the worst case, r may be as much as 2^k.
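The single-atom transformation can be sketched in Python as follows. The sketch is a simplified illustration under several assumptions that are not part of the text: the table is a list of dictionaries, the dynamic attribute A is deterministic and linear (a single hypothetical sub-attribute A_speed instead of bounds), and the list comprehensions stand in for the queries Q1 and Q2 submitted to the DBMS. It checks that evaluating F′ and F″ first and the atom afterwards gives the same answer as evaluating F directly.

```python
def current_value(row, now):
    """Current value of dynamic attribute A from its sub-attributes (assumed linear)."""
    return row["A_initialvalue"] + row["A_speed"] * (now - row["A_updatetime"])


def answer_split(rows, f, p, now):
    """Evaluate WHERE F through Q1 (p := true) and Q2 (p := false)."""
    q1 = [r for r in rows if f(r, True)]    # stands in for the DBMS query with F'
    q2 = [r for r in rows if f(r, False)]   # stands in for the DBMS query with F''
    # The MOST layer evaluates p on Q1's result and (not p) on Q2's result.
    keep1 = [r for r in q1 if p(current_value(r, now))]
    keep2 = [r for r in q2 if not p(current_value(r, now))]
    return keep1 + keep2                    # disjoint, since p and not p exclude each other


def answer_direct(rows, f, p, now):
    """Reference answer: evaluate F with the actual truth value of p."""
    return [r for r in rows if f(r, p(current_value(r, now)))]


ROWS = [
    {"id": 1, "type": "truck", "A_initialvalue": 0, "A_updatetime": 0, "A_speed": 2},
    {"id": 2, "type": "truck", "A_initialvalue": 0, "A_updatetime": 0, "A_speed": 1},
    {"id": 3, "type": "car", "A_initialvalue": 3, "A_updatetime": 0, "A_speed": 0},
    {"id": 4, "type": "car", "A_initialvalue": 10, "A_updatetime": 0, "A_speed": 0},
]


def F(row, p_value):
    # F = (type = 'truck' AND p) OR (type = 'car' AND NOT p), with p standing for "A > 5".
    return (row["type"] == "truck" and p_value) or (row["type"] == "car" and not p_value)


def P(a):
    return a > 5


print([r["id"] for r in answer_split(ROWS, F, P, now=4)])
```

With uncertain attributes, the evaluation of p and ¬p would operate on the range of possible values rather than on a single value; that part is left out of the sketch.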
However, by identifying terms with common subexpressions, in practice we can get r to be much smaller. As explained earlier, corresponding to each Fi, we create a query Qi whose WHERE clause is Fi; the condition Gi is evaluated on each tuple in the result of Qi by computing the current values of the dynamic attributes mentioned in Qi. All these results are combined to obtain the answer to the main query.
4.2
Continuous queries from moving objects
Consider a centralized DBMS equipped with the MOST capability. Suppose that a continuous query CQ is issued from a moving object M. M may or may not be one of the objects represented in the database. After the centralized DBMS computes the set Answer(CQ), there are two approaches to transmitting it to M: immediate and delayed. In the immediate approach, the whole set is transmitted immediately after being computed. For each tuple (S, begin, end), the computer in M presents S between times begin and end. However, remember that explicit updates of the database may result in changes to Answer(CQ). If so, the relevant changes are transmitted to M.
The immediate approach may have to be adjusted, depending on the memory limitations at M. For example, M's memory may fit only B tuples, and the set Answer(CQ) may be larger. In this case, the set Answer(CQ) needs to be sorted by the begin attribute, and transmitted in blocks of B tuples. The delayed approach of transmitting the set Answer(CQ) to M is the following. Each tuple (S, begin, end) in the set is transmitted to M at time begin. The computer at M immediately displays S, and keeps it on display until time
end. Of course, intermediate approaches, in which subsets of Answer(CQ) are transmitted to M periodically, are possible. The choice between the immediate and delayed approaches depends on several factors. First, it depends on the probability that an update to Answer(CQ) can be propagated to M (i.e. that M is not disconnected) before the effects of the update need to be displayed. Second, it depends on the frequency of updates to Answer(CQ), and the cost of propagating these updates to M.
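The memory-limited variant of the immediate approach and the delayed approach can be sketched as follows. The sketch only illustrates the ordering and batching of tuples; the function names and the representation of Answer(CQ) as a list of (S, begin, end) triples are assumptions made for the example, and no actual transmission is modeled.

```python
def blocks_for_transmission(answer, b):
    """Immediate approach with limited memory: sort by begin, send blocks of at most b tuples."""
    ordered = sorted(answer, key=lambda t: t[1])
    return [ordered[i:i + b] for i in range(0, len(ordered), b)]


def delayed_schedule(answer):
    """Delayed approach: each tuple (S, begin, end) is sent at its own begin time."""
    return [(t[1], t) for t in sorted(answer, key=lambda t: t[1])]


answer_cq = [("s3", 30, 40), ("s1", 5, 10), ("s2", 12, 50)]
print(blocks_for_transmission(answer_cq, b=2))
print(delayed_schedule(answer_cq))
```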
4.3
Distributed query processing
Assume now that each object represented in the database is equipped with a computer, and the database is distributed among the moving objects. In particular, assume that the distribution is such that each object resides in the computer on the moving vehicle it represents, but nowhere else. This is a reasonable architecture in case there are very frequent updates to the attributes of the moving object. For example, if the motion vector of the object changes frequently, then these changes may only be recorded at the moving object itself, rather than transmitting each change to other moving objects or to a centralized database. Assume that each query is issued at some moving object. We distinguish between three types of MOST queries.
The first, called self-referencing query, is a predicate whose truth value can be determined by examining only the attributes of the object issuing the query. For example, "Will I reach the point (a,b) in 3 minutes" or "When will I reach the point (a,b)" are self-referencing queries. Clearly, self-referencing queries can be answered without any inter-computer communication.
The second type, called object query, is a predicate whose truth value can be determined for an object independently of other objects. For example, "Retrieve the objects that will reach the point (a,b) in 3 minutes" is an object query; for each object we can determine whether or not it satisfies the predicate, independently of other objects. To answer an object query, a mobile computer needs to be able to communicate with the other mobile computers. Assuming this capability, there are two ways of processing such a query issued from mobile object M. The first is to request that the object of each mobile computer be sent to M; then M processes the query. The second is to send the query to all the other mobile computers; each computer C for which the predicate is satisfied sends the object C to M.
The second approach is more efficient since it processes the query in parallel, at all the mobile computers. The second approach is also more efficient for continuous queries. In this case, the remote computer
C evaluates the predicate each time the object C changes, and transmits C to M when the predicate is satisfied. Using the first approach, C would have to transmit C to M every time the object C changes.
The third type of query, called relationship query, is a predicate whose truth value can only be determined given two or more objects. For example, the query "Retrieve the objects that will stay within 2 miles of each other for at least the next 3 minutes" is a relationship query. The most efficient way to answer a relationship query is to send all the objects to a central location. The most natural location is the computer issuing the query. When a relationship query is presented at mobile computer M, it requests the objects from all other mobile computers. Then M processes the query.
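The two ways of processing an object query can be contrasted with a small simulation. It is a sketch under simplifying assumptions that do not come from the text: objects move forward along a one-dimensional route at constant speed, each list element stands for the object held by one mobile computer, and the number of objects transferred to M is used as a rough proxy for communication cost. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MovingObject:
    oid: str
    position: float  # position on the route at the current time (assumed 1-D)
    speed: float     # distance units per minute, assumed constant


def will_reach(obj, point, minutes):
    """True if obj passes `point` within `minutes` while moving forward."""
    return obj.position <= point <= obj.position + obj.speed * minutes


def ship_objects(computers, predicate):
    """First approach: every computer sends its object to M, and M filters."""
    received = list(computers)  # one object transferred per computer
    return [o for o in received if predicate(o)], len(received)


def ship_query(computers, predicate):
    """Second approach: each computer evaluates the predicate locally."""
    received = [o for o in computers if predicate(o)]  # only matches are sent to M
    return received, len(received)


fleet = [MovingObject("a", 0, 1), MovingObject("b", 8, 1), MovingObject("c", 20, 2)]


def query(o):
    return will_reach(o, point=10, minutes=3)


print(ship_objects(fleet, query)[1], ship_query(fleet, query)[1])
```

Both approaches return the same objects; they differ in how many objects cross the network and in where the predicate is evaluated.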
5
Comparison to relevant work
One area of research that is relevant to the model and language presented in this paper is temporal databases [13, 11, 12]. The main difference between our approach and the temporal database works is that, by and large, those works assume that the database varies at discrete points in time, and that between updates the values of database attributes are constant ([13] uses interpolation functions to some extent). In contrast, here we assume that dynamic attributes change continuously, and consequently the temporal data model is different from the data model presented in this paper. Thus, it is also not clear if and how temporal extensions that deal with incomplete information (see [18, 19]) are applicable to our context. Additionally, temporal languages other than FTL can be used to query MOST databases, but any other processing algorithm will have to be modified to handle dynamic attributes.
Another relevant area is constraint databases (see [6] for a survey). In this sense, our dynamic attributes can be viewed as a constraint, or a generalized tuple, such that the tuples satisfying the constraint are considered to be in the database. Constraint databases have been separately applied to the temporal domain (see [3, 4, 1]) and to the spatial domain (see [7]). However, the integrated application for the purpose of modeling moving objects has not been considered. Furthermore, the model underlying such an application differs from ours, and is thus perhaps inappropriate for modeling moving objects. The main difference is that in constraint databases all the tuples (or objects) that satisfy the constraint (in our case the values of the function at all time points) are considered to be in the database simultaneously. In contrast, in our model these values are not in the database at the same time; at any point in time a different value is in the database.
Methods in object-oriented systems are also relevant to our model.
In an object-oriented system, the value of a dynamic attribute may be computed by a method (i.e. a program stored with the data) using the sub-attributes of a dynamic attribute. However, in this case, as far as the DBMS is concerned the method is a black-box, and the only way to answer a query such as "retrieve the objects that will intersect a polygon P at some time between now and 5pm" is
to evaluate the query at every point in time between now and 5pm. In contrast, in our model we "open" the black box, i.e. expose to the DBMS the way the dynamic attribute changes. Thus the DBMS can currently compute which objects will intersect the polygon in the future.
Another body of relevant work is location-dependent software systems (e.g. [14, 16, 3]). There are three differences between that work and our work presented in this paper. First, although independent of a particular database management system, our work pertains to the incorporation of mobility in database systems. Second, our work pertains to situations where the mobile clients are aware not only of their current location, but also of their movement, i.e. their future location. Indeed, for airplanes and cars moving on the highway, this is often the case. Third, in our model the answer to a query depends not only on the location of the client posing the query, but also on the time at which the query is posed.
In our earlier work [15] we introduced FTL for specifying trigger conditions in active databases. The algorithm presented there does not work in the MOST model, since it can only deal with static attributes. In [17] we considered the same issues as here, but we did not deal with imprecision; namely, a dynamic attribute of an object has a unique value at a particular time, rather than a set of possible values.
6
Conclusion and future work
In this paper we introduced the MOST data model for representing moving objects. It has two main aspects. The first is the novel notion of dynamic attributes, i.e. attributes that change continuously as time passes without being explicitly updated. There can be uncertainty in the value of the dynamic variables. Such variables are represented by sub-attributes that specify their values over time. For moving objects, these sub-attributes specify upper and lower bounds on the speeds of the objects; or they give a pair of functions of time, and at any time the variable may have any value in the range whose lower and upper bounds are specified by the two functions. A user can query future states of database values. This motivates the second aspect of our data model, namely the query language, FTL. It enables the specification of future queries, i.e. queries that refer to future states of the database.
In support of the new data model, in this paper we developed algorithms for processing queries specified in FTL, we discussed a method of indexing dynamic attributes, and we discussed methods for building the capabilities of MOST on top of existing database management systems. We also identified several types of queries arising in the new data model, namely instantaneous, continuous and persistent queries. We also discussed issues of query processing in a mobile and distributed environment.
In the future, we intend to implement the MOST data model on top of an existing DBMS, e.g. Sybase. We intend to further explore various processing methods for the three types of queries, particularly in mobile and distributed
environments. We intend to experimentally compare various mechanisms for indexing dynamic attributes.
References
1. M. Abadi, Z. Manna: Temporal logic programming. Journal of Symbolic Computation. (1989)
2. S. Acharya, B. Badrinath, T. Imielinski, J. Navas: A WWW-Based Location Dependent Information Service for Mobile Clients. Rutgers Univ. TR. (1995)
3. M. Baudinet, M. Niezette, P. Wolper: On the representation of infinite data and queries. ACM Symposium on Principles of Database Systems. (1991)
4. J. Chomicki, T. Imielinski: Temporal deductive databases and infinite objects. ACM Symposium on Principles of Database Systems. (1988)
5. G. Hjaltason, H. Samet: An indexing scheme for time dependent data in CLINSYS. Unpublished manuscript.
6. P. Kanellakis: Constraint programming and database languages. ACM Symposium on Principles of Database Systems. (1995)
7. J. Paredaens, J. van den Bussche, D. Van Gucht: Towards a theory of spatial database queries. ACM Symposium on Principles of Database Systems. (1994)
8. R. Snodgrass, I. Ahn: The temporal databases. IEEE Computer. (1986)
9. H. Samet: The design and analysis of spatial data structures. Addison Wesley. (1990)
10. A. P. Sistla, E. M. Clarke: Complexity of Propositional Linear Temporal Logics. Journal of the Association for Computing Machinery. 3 (1985) 32
11. R. Snodgrass: The Temporal Query Language TQuel. ACM Trans. on Database Systems. 2 (1987) 12
12. R. Snodgrass, ed.: Special Issue on Temporal Databases. Data Engineering. (1988)
13. A. Segev, A. Shoshani: Logical Modeling of Temporal Data. Proc. of the ACM-Sigmod International Conf. on Management of Data. (1987)
14. B. Schilit, M. Theimer, B. Welch: Customizing mobile applications. USENIX Symposium on Location Independent Computing. (1993)
15. P. Sistla, O. Wolfson: Temporal Triggers in Active Databases. IEEE Transactions on Knowledge and Data Engineering (TKDE). 3 (1995) 7
16. G. Voelker, B. Bershad: Mobisaic: An information system for a Mobile Wireless Computing Environment. Workshop on Mobile Computing Systems and Applications. (1994)
17. A. P. Sistla, O. Wolfson, S. Chamberlain, S. Dao: Modeling and Querying Moving Objects. Thirteenth International Conference on Data Engineering. (1997)
18. C. Dyreson, R. Snodgrass: Valid-time Indeterminacy. International Conf. on Data Eng. (1993)
19. S. Gadia, S. Nair, Y-C. Poon: Incomplete Information in Relational Temporal Databases. Eighteenth VLDB. (1992)
Temporal Database Bibliography Update*
Yu Wu, Sushil Jajodia, and X. Sean Wang
Center for Secure Information Systems, Department of Information & Software Systems Engineering, George Mason University, Fairfax, VA 22030-4444
http://www.isse.gmu.edu/~csis/tdb/
This is the seventh bibliography concerning temporal databases. There have been six previous bibliographies, which are:
(1) Vassilis J. Tsotras and Anil Kumar. Temporal database bibliography update. ACM SIGMOD Record, 25(1):41-51, March 1996.
(2) Nick Kline. An update of the temporal database bibliography. ACM SIGMOD Record, 22(4):66-80, December 1993.
(3) Michael D. Soo. Bibliography on temporal databases. ACM SIGMOD Record, 20(1):14-23, March 1991.
(4) Robert B. Stam and Richard T. Snodgrass. A bibliography on temporal databases. Data Engineering Bulletin, 11(4):53-61, December 1988.
(5) Edwin McKenzie. Bibliography: Temporal databases. ACM SIGMOD Record, 15(4):40-52, December 1986.
(6) A. Bolour, T.L. Anderson, L.J. Dekeyser, and Harry K.T. Wong. The role of time in information processing: A survey. ACM SIGMOD Record, 12(3):27-50, April 1982.
In this bibliography, we collect 331 new temporal database papers. Most of these papers were published in 1996-1997, some in 1995, and some will appear in 1997 or 1998. Compared with the earlier bibliographies, this one adopts a different classification method. We divide papers into 12 categories, namely: (1) Models; (2) Database Designs; (3) Query Languages; (4) Constraints; (5) Time Granularities; (6) Implementations; (7) Access Methods; (8) Real-Time Databases; (9) Sequence Databases; (10) Data Mining; (11) Concurrency; and (12) Other Papers. If a paper addresses several topics, it appears in multiple categories. We would like to thank the many authors who provided us with their comments and suggestions. Any errors in the entries herein, however, are entirely our own.
We apologize in advance for any errors, misclassifications, or omissions and welcome any corrections.
* The work is partly supported by the NSF research grant IRI-9633541. The work of Wang was also partially supported by the NSF research grant IRI-9409769.
O. Etzion, S. Jajodia, and S. Sripada (Eds.): Temporal Databases - Research and Practice, LNCS 1399, pp. 338-366, 1998. © Springer-Verlag Berlin Heidelberg 1998
1
Models
Eric Allen, Geoffrey Edwards, and Yvan Bedard. Qualitative causal modeling in temporal GIS. In Proc. of the International Conference COSIT'95 on Spatial Information Theory: A Theoretical Basis for GIS. Lecture Notes in Computer Science, volume 988, pages 397-412, September 1995.
B. Benatallah and M.-C. Fauvet. Le point sur l'évolution du schéma d'une base d'objets. L'OBJET : logiciels, bases de données, réseaux, 1997.
B. Benatallah and M.-C. Fauvet. Evolution de schema et adaptation des instances. In Actes du 13ème Congrès INFORSID, May 1995.
B. Benatallah. Un compromis : modification et versionnement de schema. In Actes des 12e Journées Bases de Données Avancées, pages 373-393, October 1996.
B. Benatallah. Towards an hybrid approach for object-oriented schema evolution management. In 4th Magrebian Conference on Artificial Intelligence and Software Engineering, pages 97-113, April 1996.
J. van Benthem. Remodeling temporal geometry. In Proc. of the Third International Workshop on Temporal Representation and Reasoning (TIME'96), 1996.
E. Bertino, E. Ferrari, and G. Guerrini. T_Chimera: A temporal object-oriented data model. International Journal on Theory and Practice of Object Systems (TAPOS), 3(2):103-125, 1997.
C. Bettini, X. Sean Wang, and S. Jajodia. Temporal semantic assumptions and their use in databases. IEEE Transactions on Knowledge and Data Engineering. To appear. (Revised version of ISSE-TR-95-105, ISSE Technical Report, George Mason University.)
Sarah Boyd. Summarizing time-varying data. In Proc. of the 14th National Conference on Artificial Intelligence, 1997.
Michael H. Böhlen, Richard T. Snodgrass, and Michael D. Soo. Coalescing in temporal databases. In Proc. of the 22nd International Conference on Very Large Data Bases, pages 180-191, September 1996.
H. Rebecca Callison. A time-sensitive object model for real-time systems. ACM Transactions on Software Engineering and Methodology (TOSEM), 4(3):287-317, July 1995.
J.-F. Canavaggio, P.-C. Scholl, and M.-C. Fauvet. Un modèle multigranulaire du temps pour un SGBD temporel.
Technical Report RR 962-I, Laboratoire de Logiciels, Systèmes et Réseaux, IMAG, October 1996.
J.-F. Canavaggio. Types temporels pour un SGBD à objets. In Actes de Journées Jeunes Chercheurs MATIS, Archamps, France, December 1996.
C. De Castro. On the concept of transaction atomicity in distributed temporal relational databases. In Proc. of the 9th ISCA International Conference on Parallel and Distributed Computing Systems (PDCS'96), 1996.
L. Chittaro and C. Combi. Temporal indeterminacy in deductive databases: an approach based on the event calculus. In The 2nd International Workshop on
Active, Real-Time and Temporal Database Systems (ARTDB-97), September 1997.
Jui-Shang Chiu and Arbee L.P. Chen. A note on "incomplete relational database models based on intervals". IEEE Transactions on Knowledge and Data Engineering, 8(1):189-191, February 1996.
J. Chomicki and P.Z. Revesz. Constraint-based interoperability of spatiotemporal databases. In International Symposium on Large Spatial Databases, 1997.
C. Claramunt and M. Theriault. Toward semantics for modeling spatio-temporal processes with GIS. In Advances in GIS Research II. 7th International Symposium on Spatial Data Handling, volume 1, 1996.
C. Claramunt, M. Theriault, and C. Parent. A qualitative representation of evolving spatial entities in two dimensional topological spaces. Innovations in GIS V, 1997.
James Clifford, Curtis Dyreson, Tomas Isakowitz, Christian S. Jensen, and Richard T. Snodgrass. On the semantics of "now" in databases. ACM Transactions on Database Systems, 22(2):171-214, June 1997.
C. Combi, F. Pinciroli, and G. Pozzi. Managing time granularity of narrative clinical information: The temporal data model TIME-NESIS. In Proc. of the Third International Workshop on Temporal Representation and Reasoning (TIME'96), 1996.
C. Combi, G. Cucchi, and F. Pinciroli. Applying object-oriented technologies in modeling and querying temporally-oriented clinical databases dealing with temporal granularity and indeterminacy. IEEE Transactions on Information Technology in Biomedicine, 1997.
S. Conrad and G. Saake. Evolving temporal behaviour in information systems. In Participant's Proc. of Higher-order Algebra, Logic, and Term Rewriting (2nd Int. Workshop), pages 7:1-16, 1995.
S. Conrad and G. Saake. Specifying evolving temporal behaviour. In Proc. of the 2nd Workshop of ModelAge Project (ModelAge'96), pages 51-65, January 1996.
S. Conrad and G. Saake. Extending temporal logic for capturing evolving behaviour. In Proc. of the 10th Int.
Symposium on Methodologies for Intelligent Systems (ISMIS'97), October 1997.
S. Dao. Modeling and querying moving objects. In Proc. of the 13th International Conference on Data Engineering (ICDE13), pages 422-432, April 1997.
C.E. Dyreson and R.T. Snodgrass. Supporting valid-time indeterminacy. ACM Transactions on Database Systems, 23(1), March 1998.
N. Edelweiss, J.P.M. Oliveira, and G. Kunde. Evolução de esquemas conceituais: o conceito de papel. In Anais XXIII Conferencia Latinoamericana de Informatica, November 1997.
N. Edelweiss and J.P.M. Oliveira. Roles representing the evolution of objects. In Proc. of Argentine Symposium on Object Orientation of the 26th Jornadas Argentinas de Informática e Investigación Operativa, pages 57-65, August 1997.
I. Eini, V. Goebel, and B. Skjellaug. A temporal data model for multimedia database systems. Technical report, Research report 252, Department of Informatics, University of Oslo, June 1997.
M.-C. Fauvet, J.-F. Canavaggio, and P.-C. Scholl. TEMPOS : un modèle d'historiques pour un SGBD temporel à objets. In Actes des 13e Journées de Bases de Données Avancées, Grenoble (France), September 1997.
M.-C. Fauvet, J.-F. Canavaggio, and P.-C. Scholl. Modeling histories in object DBMS. In Proc. of the 8th International Conference on Database and Expert Systems Applications (DEXA '97), Toulouse (France), September 1997.
A. Gal and D. Dori. Combining simultaneous values and temporal data dependencies. In Proc. of the Third International Workshop on Temporal Representation and Reasoning (TIME'96), 1996.
A. Gal, O. Etzion, and A. Segev. TALE - a temporal active language and execution model. Advanced Information Systems Engineering, pages 60-81, May 1996.
A. Gal, O. Etzion, and A. Segev. A language for the support of constraints in temporal active databases. In Proc. Workshop on Constraints, Databases and Logic Programming, pages 42-58, Portland, Oregon, December 1995.
I.A. Goralwalla, Y. Leontiev, M.T. Özsu, and D. Szafron. Modeling temporal primitives: Back to basics. In The Sixth International Conference on Information and Knowledge Management (CIKM'97), November 1997.
I.A. Goralwalla, M.T. Özsu, and D. Szafron. A framework for temporal data models: Exploiting object-oriented technology. In Proc. of the 13th International Conference and Exhibition on Technology of Object-Oriented Languages and Systems (TOOLS'97) USA, July 1997.
Fabio Grandi. Temporal interoperability in multi+temporal databases. Journal of Database Management, Winter 1998.
Fabio Grandi and Maria Rita Scalas. Extending temporal database concepts to the world wide web. Technical report, Tech. Rep. CSITE 004-97, University of Bologna, Italy, 1997. H. Gregersen and C.S. Jensen. Temporal entity-relationship models--a survey. Technical report (TR-3), TimeCenter, January 1997. Christian S. Jensen and Richard T. Snodgrass. Semantics of time-varying information. Information Systems, 21(4):311-352, 1996.
Yu Wu, Sushil Jajodia, and X. Sean Wang
E.T. Keravnou. Engineering time in medical knowledge-based systems through time-axes and time-objects. In Proc. of the Third International Workshop on Temporal Representation and Reasoning (TIME'96), 1996. Lefteris M. Kirousis, Paul G. Spirakis, and Philippas Tsigas. Simple atomic snapshots: A linear complexity solution with unbounded time-stamps. Information Processing Letters, 58(1):47-53, April 1996. G. Knolmayer and T. Myrach. Zur Abbildung zeitbezogener Daten in betrieblichen Informationssystemen. Wirtschaftsinformatik 38, 1:63-74, 1996. Henry F. Korth. Database system architectures for time-constrained applications. In Workshop on Databases: Active and Real-Time, 1996. Ajay D. Kshemkalyani. Temporal interactions of intervals in distributed systems. Journal of Computer and System Science, 52(2):287-298, April 1996. Jae Young Lee, Ramez Elmasri, and Jongho Won. Specification of calendars and time series for temporal databases. In Proc. of the 15th International Conference on Conceptual Modeling, pages 341-356, October 1996. J.Z. Li, I.A. Goralwalla, M.T. Özsu, and D. Szafron. Modeling video temporal relationships in an object database management system. In Proc. of Multimedia Computing and Networking (MMCN97), San Jose, California, February 1997. Jane W.-S. Liu. Validation of timing properties. ACM Computing Surveys, 28(4es):184, December 1996. Robert Morris and Lina Khatib. Entities and relations for historical databases. In Fourth International Workshop on Temporal Representation and Reasoning, 1997. P. Mueller and S.M. Sripada. The ChronoBase temporal deductive database system. In Proc. of Workshop on Temporal Reasoning in Deductive and Object-Oriented Databases (DOOD'95), December 1995. M.A. Orgun. Incorporating an implicit time dimension into the relational model and algebra. RAIRO Informatique Théorique et Applications/Theoretical Informatics and Applications, 30(3):231-260, 1996. P.G. O'Donoghue and M.E.C. Hull. 
Using timed CSP during object oriented design of real-time systems. Information and Software Technology, 38(2):89-102, February 1996. N.L. Sarda. Modeling valid time: an efficient representation. In Proc. of the International Conference on Database Systems for Advanced Applications (DASFAA '97), April 1997. N.L. Sarda and P.V.S.P. Reddy. Handling of alternatives and events in temporal databases. Technical report, Technical Report, IIT Bombay, 1997.
B. Skjellaug and A.-J. Berre. Multi-dimensional time support for spatial data models. Technical report, Research report 253, Department of Informatics, University of Oslo, May 1997. B. Skjellaug and A.-J. Berre. A uniform temporal geographic data model framework. Technical report, SINTEF Report STF40 A97046, SINTEF Telecom and Informatics, Oslo, 1997. S.M. Sripada. On snapshot, rollback, historical and temporal deductive databases. In Proc. of Conference on Management of Data (COMAD'92), December 1992. S.M. Sripada and P. Mueller. The generalized chronobase temporal data model. Meta-Logics and Logic Programming, 1995. A. Steiner and M.C. Norrie. A temporal extension to a generic object data model. Technical report (TR-15), TimeCenter, 1997. A. Steiner and M.C. Norrie. Temporal object role modelling. In Proc. of 9th International Conference on Advanced Information Systems Engineering (CAiSE'97), June 1997. Abdullah Uz Tansel. Temporal relational data model. IEEE Transactions on Knowledge and Data Engineering, 9(3):464-479, May 1997. P. Terenziani. Towards an ontology dealing with periodic events. In Proc. of 12th European Conference on Artificial Intelligence (ECAI'96), pages 43-47, 1996. Susan V. Vrbsky. A data model for approximate query processing of real-time databases. Data & Knowledge Engineering, 21(1):79-102, December 1996. Jongho Won, Ramez Elmasri, and Jason Dongwon Kwak. Comparing the expressiveness of temporal data models based on time dimension representation. In The 3rd International Workshop on Next Generation Information Technologies and Systems (NGITS'97), 1997. May Yuan. Temporal GIS and spatio-temporal modeling. In Third International Conference/Workshop on Integrating GIS and Environmental Modeling CD-ROM, January 1996. E. Zimányi, C. Parent, S. Spaccapietra, and A. Pirotte. TERC+: A temporal conceptual model. In Proc. of the International Symposium on Digital Media Information Base (DMIB '97), November 1997.
2
Database Designs
C. De Castro, F. Grandi, and M.R. Scalas. Schema versioning for multitemporal relational databases. Information Systems, 22(5):249-290, 1997. C. Claramunt, C. Parent, and M. Theriault. An entity-relationship model for spatio-temporal processes. In Proc. of Database Semantics DS-7, IFIP, Chapman & Hall, 1997.
A. Gal and O. Etzion. Parallel execution model for updating temporal databases. International Journal of Computer Systems Science and Engineering, 12(5):317-327, September 1997. H. Gregersen, C.S. Jensen, and L. Mark. Evaluating temporally extended ER-models. In CAiSE'97/IFIP 8.1 International Workshop on Evaluation of Modeling Methods in Systems Analysis and Design, June 1997. C.S. Jensen and R.T. Snodgrass. Semantics of time-varying attributes and their use for temporal database design. Technical Report (TR-1), TimeCenter, 1997. C.S. Jensen and R.T. Snodgrass. Temporally enhanced database design. Object-Oriented Data Modeling Themes. To appear. T.G. Kirner and A.M. Davis. Requirements specification of real-time systems: temporal parameters and timing-constraints. Information and Software Technology, 38(12):735-741, December 1996. P.G. O'Donoghue and M.E.C. Hull. Using timed CSP during object oriented design of real-time systems. Information and Software Technology, 38(2):89-102, February 1996. Maria Ligia B. Perkusich, Angelo Perkusich, and Ulrich Schiel. Object oriented real-time database design and hierarchical control systems. In Proc. of the 1st International Workshop on Active and Real-Time Database Systems. Workshops in Computing, pages 104-121, June 1995. Anne Pons and Rudolf K. Keller. Schema evolution in object databases by catalogs. In International Database Engineering and Applications Symposium (IDEAS), 1997. Trout, Tai, and Lee. Real-time system design using complementary processing design methodology. In Workshop on Databases: Active and Real-Time, 1996. X. Sean Wang, Claudio Bettini, Alexander Brodsky, and Sushil Jajodia. Logical design for temporal databases with multiple granularities. ACM Transactions on Database Systems, 22(2):115-170, June 1997.
3
Query Languages
M. Aritsugi, T. Tagashira, T. Amagasa, and Y. Kanamori. An approach to spatio-temporal queries-interval-based contents representation of images. In The 8th International Conference on Database and Expert Systems Applications (DEXA 97), Lecture Notes in Computer Science 1308, pages 202-213, September 1997.
Leopoldo Bertossi and Bernardo Siu. Answering historical queries in databases. In XVI International Conference of the Chilean Computer Science Society, 1996. J. Bair, M. Böhlen, C.S. Jensen, and R.T. Snodgrass. Notions of upward compatibility of temporal query languages. Wirtschaftsinformatik, 39(1):25-34, February 1997. M. Böhlen and C.S. Jensen. Seamless integration of time into SQL. Technical report, Technical Report R-96-2049, Aalborg University, Department of Computer Science, December 1996. M. Böhlen, C.S. Jensen, and B. Skjellaug. Spatio-temporal database support for legacy applications. Technical report (TR-20), TimeCenter, July 1997. J.-F. Canavaggio and M. Dumas. Manipulation de valeurs temporelles dans un SGBD à objets. In Actes du 15e congrès INFORSID, Toulouse, juin 1997. T.P.C. Carvalho and N. Edelweiss. A visual query system implementing a temporal object-oriented model with roles on a relational database. In Proc. of the 17th International Conference of the Chilean Computer Science Society, November 1997. T.P. Carvalho and N. Edelweiss. Um sistema visual de consulta para um modelo de dados temporal orientado a objetos. In Anais 24 Seminário Integrado da Sociedade Brasileira de Computação, pages 337-348, August 1997. L. Chittaro, C. Combi, E. Cervesato, R. Cervesato, F. Antonini-Canterin, G.L. Nicolosi, and D. Zanuttini. Specifying and representing temporal abstractions of clinical data by a query language based on the event calculus. In 1997 Annual Conference of Computers in Cardiology. IEEE Computer Press, 1997. J. Chomicki. Temporal query languages: A survey. In International Conference on Temporal Logic, pages 506-534, 1994. C. Combi, G. Cucchi, and F. Pinciroli. Applying object-oriented technologies in modeling and querying temporally-oriented clinical databases dealing with temporal granularity and indeterminacy. IEEE Transactions on Information Technology in Biomedicine, 1997. S. Dao. Modeling and querying moving objects. In Proc. 
of the 13th International Conference on Data Engineering (ICDE13), pages 422-432, April 1997. H. Darwen. Normalising temporal tables. Technical report, ISO/IEC JTC 1/SC 21/WG 3 DBL LHR-33r2, December 1995. H. Darwen. Fixes for NORMALIZE ON. Technical report, ISO/IEC JTC 1/SC 21/WG 3 DBL MCI-162, April 1996. H. Darwen. The outstanding fix for NORMALIZE ON. Technical report, ISO/IEC JTC 1/SC 21/WG 3 DBL MAD-222, December 1996. Debabrata Dey, Terence M. Barron, and Veda C. Storey. A complete temporal relational algebra. The VLDB Journal, 5(3):167-180, July 1996.
John D.N. Dionisio and Alfonso F. Cardenas. Mquery: A visual query language for multimedia, timeline and simulation data. Journal of Visual Languages and Computing, 7(4):377-401, December 1996. C.E. Dyreson and R.T. Snodgrass. Supporting valid-time indeterminacy. ACM Transactions on Database Systems, 23(1), March 1998. Experts' Contribution. Expanded tables operations. Technical report, ISO/IEC JTC 1/SC 21/WG 3 DBL MCI-67, April 1996. M.-C. Fauvet, J.-F. Canavaggio, and P.-C. Scholl. Expressions de requêtes temporelles dans un SGBD à objets. In Actes des 12e Journées Bases de Données Avancées, pages 225-250, Cassis (France), October 1996. Sonia Fernandes, Ulrich Schiel, and Tiziana Catarci. Visual query operators for temp databases. In Fourth International Workshop on Temporal Representation and Reasoning, 1997. Sonia Fernandes, Ulrich Schiel, and Tiziana Catarci. Using formal operators for visual queries in historical databases. In XII Simposio Brasileiro de Banco de Dados, pages 375-387, October 1997. S. Ginsburg and X. Sean Wang. Regular sequence operations and their use in database queries. In The Journal of Computer and System Sciences. To appear. Fabio Grandi. TSQL2: A standard query language for temporal databases (in Italian). Rivista di Informatica. To appear. H.V. Jagadish, Alberto O. Mendelzon, and Tova Milo. Similarity-based queries. In Proc. of the 14th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 36-45, 1995. N. Jukic and Susan V. Vrbsky. Feasibility of aggregates in temporal constrained queries. Information Systems, 21(7):595-614, 1996. Flip Korn, H.V. Jagadish, and Christos Faloutsos. Efficiently supporting ad hoc queries in large datasets of time sequences. In Proc. of the 1997 ACM SIGMOD International Conference on Management of Data, 1997. M. Koubarakis. The complexity of query evaluation in indefinite temporal constraint databases. 
Theoretical Computer Science, Special Issue on Uncertainty in Databases and Deductive Systems, 171:25-60, January 1997. N.A. Lorentzos and Y.G. Mitsopoulos. SQL extension for interval data. IEEE Transactions on Knowledge and Data Engineering, 9(3):480-499, 1997. N.A. Lorentzos and H. Darwen. Extension to SQL2 binary operations for temporal data. In Proc. of 3rd HERMIS Conference, pages 462-469, September 1996. Invited paper. N.A. Lorentzos and Y.G. Mitsopoulos. SQL extension for valid and transaction time support. Technical report, Informatics Laboratory, Agricultural University of Athens, 1996.
T. Myrach, G.F. Knolmayer, and R. Barnert. On ensuring keys and referential integrity in the temporal database language TSQL2. In Proc. of the 2nd International Baltic Workshop on Databases and Informations Systems, volume 1, pages 171-181, 1996. T. Myrach. TSQL2: Der Konsens über eine temporale Datenbanksprache. Informatik-Spektrum 20, 3:143-150, 1997. Wilfred Ng and Mark Levene. OSQL: An extension to SQL to manipulate ordered relational databases. In International Database Engineering and Applications Symposium (IDEAS), 1997. Hans Nilsson, Torbjörn Törnkvist, and Claes Wikstrom. Amnesia-A distributed real-time primary memory DBMS with a deductive query language. In Proc. of the 12th International Conference on Logic Programming, June 1995. Mehmet Orgun and Anthony A. Faustini. The chronolog family of languages. Journal of Symbolic Computation, 22(5/6):722-724, November 1996. T. Padron-McCarthy and T. Risch. Optimizing performance-polymorphic declarative database queries. In 2nd International Workshop on Real-Time Databases (RTDB'97), September 1997. T. Padron-McCarthy and T. Risch. Performance-polymorphic execution of real-time queries. In 1st Workshop on Real-Time Databases: Issues and Applications (RTDB-96), March 1996. Bruce Rex and John Risch. Animation query language for the visualization of temporal data. In Third International Conference/Workshop on Integrating GIS and Environmental Modeling CD-ROM, January 1996. R.T. Snodgrass, M. Böhlen, C.S. Jensen, and A. Steiner. Transitioning temporal support in TSQL2 to SQL3. Technical report (TR-8), TimeCenter, March 1997. Alexander D. Stoyenko and Thomas J. Marlowe. A case for better language and compiler support for real-time database systems. In Proc. of the 1st International Workshop on Active and Real-Time Database Systems (ARTDB-95). Workshops in Computing, pages 46-49, June 1995. Hyun Su. Semantics-based time-alignment operations in temporal query processing and optimization. 
Information Sciences, 103/01-4:37-70, September 1997. UK Experts. IXSQL: An alternative approach to temporal data. Technical report, ISO/IEC JTC 1/SC 21/WG 3 DBL YOW-031, April 1995. Abdullah Uz Tansel and Erkan Tin. The expressive power of temporal relational query languages. IEEE Transactions on Knowledge and Data Engineering, 9(1):120-134, January 1997. David Toman. Point-based temporal extension of SQL. In The 1997 International Conference on Deductive and Object-Oriented Databases, 1997.
David Toman and Damian Niwinski. First-order queries over temporal databases inexpressible in temporal logic. In Proc. of the International Conference on Extending Database Technology (EDBT'96), 1996. Susan V. Vrbsky. A data model for approximate query processing of real-time databases. Data & Knowledge Engineering, 21(1):79-102, December 1996.
4
Constraints
Suad Alagic. A temporal constraint system for object-oriented databases. In The 2nd International Workshop on Constraint Database Systems (CDB'97). Lecture Notes in Computer Science, volume 1191, pages 208-218, January 1997. Elisabeth André and Thomas Rist. Coping with temporal constraints in multimedia presentation planning. In Proc. of the 13th National Conference on Artificial Intelligence, August 1996. C. Bettini, X. Sean Wang, and S. Jajodia. Satisfiability of quantitative temporal constraints with multiple granularities. In Proc. of 3rd International Conference on Principle and Practice of Constraint Programming, 1997. V. Brusoni, L. Console, B. Pernici, and P. Terenziani. Qualitative and quantitative temporal constraints and relational databases: theory, architecture, and applications. IEEE Transactions on Knowledge and Data Engineering. Submitted. Jan Chomicki and Damian Niwinski. On the feasibility of checking temporal integrity constraints. Journal of Computer and System Science, 51(1):523-535, August 1995. Carlo Combi and Giorgio Cucchi. GCH-OSQL: A temporally-oriented object-oriented query language based on a three-valued logic. In Fourth International Workshop on Temporal Representation and Reasoning (TIME-97), 1997. A. Datta and I. Viguier. Providing real-time response, state recency and temporal consistency in databases for rapidly changing environments. In Information Systems. To appear. Frank Dignum, Hans Weigand, and Egon Verharen. Meeting the deadline: On the formal specification of temporal deontic constraints. In Zbigniew Ras and Maciej Michalewicz, editors, Proc. of the 9th International Symposium on Foundations of Intelligent Systems (ISMIS'96), pages 243-252, June 1996. Anne Doucet, Marie-Christine Fauvet, Stephane Gancarski, Genevieve Jomier, and Sophie Monties. Using database versions to implement temporal integrity constraints. In The 2nd International Workshop on Constraint Database Systems (CDB'97). 
Lecture Notes in Computer Science, volume 1191, pages 219-233, January 1997.
Huang and Gruenwald. Impact of timing constraints on real-time database recovery. In Workshop on Databases: Active and Real-Time, 1996. M. Jixin and B. Knight. Building temporal constraints into knowledge bases for process control-an examination. Engineering Applications of Artificial Intelligence, 9(1):95-96, February 1996. T.G. Kirner and A.M. Davis. Requirements specification of real-time systems: temporal parameters and timing-constraints. Information and Software Technology, 38(12):735-741, December 1996. M. Koubarakis. From local to global consistency in temporal constraint networks. In The Special Issue of Theoretical Computer Science Dedicated to the 1st International Conference on Principles and Practice of Constraint Programming (CP95), volume 173, pages 89-112, February 1997. M. Koubarakis. Representation and querying in temporal databases: the power of temporal constraints. In Proc. of the 9th International Conference on Data Engineering, pages 327-334, April 1993. C. Martin and J. Sistac. An integrity constraint checking method for temporal deductive databases. In Proc. of the Third International Workshop on Temporal Representation and Reasoning (TIME'96), 1996. C. Martin and J. Sistac. Applying transition rules to bitemporal deductive databases for integrity constraint checking. In Proc. of the International Workshop on Logic in Databases (LID'96), pages 117-134, July 1996. S. Parthasarathy. Building temporal constraints into knowledge bases for process control-an examination. Engineering Applications of Artificial Intelligence, 8(4):473, August 1995. E. Schwalb and R. Dechter. Processing disjunctions of temporal constraints. In Proc. of the Third International Workshop on Temporal Representation and Reasoning (TIME'96), 1996. P. Terenziani. Integrating calendar-dates and qualitative temporal constraints in the treatment of periodic events. IEEE Transactions on Knowledge and Data Engineering, 1997. To appear. P. Terenziani. 
Qualitative and quantitative temporal constraints about numerically quantified periodic events. In Proc. of 4th International Workshop on Temporal Representation and Reasoning (TIME'97), pages 94-101, 1997. Richard West, Karsten Schwan, Ivan Tacic, and Mustaque Ahamad. Exploiting temporal and spatial constraints on distributed shared objects. In The 17th International Conference on Distributed Computer Systems, May 1997. Ming Xiong, John A. Stankovic, Krithi Ramamritham, Donald F. Towsley, and Rajendran Sivasankaran. Maintaining temporal constraints: Issues and algorithms. In Proc. of the 1st International Workshop on Real-Time Databases: Issues and Applications, March 1996.
5
Time Granularities
C. Bettini, X. Sean Wang, and S. Jajodia. A general framework and reasoning models for time granularity. In Proc. of the Third International Workshop on Temporal Representation and Reasoning (TIME'96), 1996. C. Bettini, X. Sean Wang, and S. Jajodia. A general framework for time granularity and its application to temporal reasoning. Annals of Mathematics and Artificial Intelligence. To appear. (Revised version of ISSE-TR-96-10.) C. Bettini, X. Sean Wang, S. Jajodia, and J.-L. Lin. Discovering temporal relationship with multiple granularities in time sequences. IEEE Transactions on Knowledge and Data Engineering. To appear. (Revised version of ISSE-TR-96-06, ISSE Technical Report, George Mason University.) C. Bettini, X. Sean Wang, and S. Jajodia. Satisfiability of quantitative temporal constraints with multiple granularities. In Proc. of 3rd International Conference on Principle and Practice of Constraint Programming, 1997. C. Combi, F. Pinciroli, and G. Pozzi. Managing time granularity of narrative clinical information: The temporal data model TIME-NESIS. In Proc. of the Third International Workshop on Temporal Representation and Reasoning (TIME'96), 1996. C. Combi, G. Cucchi, and F. Pinciroli. Applying object-oriented technologies in modeling and querying temporally-oriented clinical databases dealing with temporal granularity and indeterminacy. IEEE Transactions on Information Technology in Biomedicine, 1997. C. Combi, F. Pinciroli, and G. Pozzi. Managing different time granularities of clinical information by an interval-based temporal data model. Methods of Information in Medicine, 34(5):458-474, 1995. H. Darwen. Temporal SQL: Support for data type period. Technical report, ISO/IEC JTC 1/SC 21/WG 3 DBL YOW-23, 1995. Hong Lin. Efficient conversion between temporal granularities. Technical report (TR-19), TimeCenter, June 1997. Edjard Mota, David Robertson, and Alan Smaill. Nature time: Temporal granularity in simulation of ecosystems. 
Journal of Symbolic Computation, 22(5/6):665-698, November 1996. J.M. Sykes. More elements of type period. Technical report, ISO/IEC JTC 1/SC 21/WG 3 DBL MCI-044, March 1996. J.M. Sykes. Periods of integers. Technical report, ISO/IEC JTC 1/SC 21/WG 3 DBL MAD-151, November 1996. X. Sean Wang, Claudio Bettini, Alexander Brodsky, and Sushil Jajodia. Logical design for temporal databases with multiple granularities. ACM Transactions on Database Systems, 22(2):115-170, June 1997.
6
Implementations
Suad Alagic. A temporal constraint system for object-oriented databases. In The 2nd International Workshop on Constraint Database Systems (CDB'97). Lecture Notes in Computer Science, volume 1191, pages 208-218, January 1997. Rohan F.M. Aranha, Venkatesh Ganti, Srinivasa Narayanan, C.R. Muthukrishnan, S.T.S. Prasad, and Krithi Ramamritham. Implementation of a real-time database system. Information Systems, 21(1):55-74, March 1996. V. Brusoni, L. Console, B. Pernici, and P. Terenziani. LaTeR: an efficient, general purpose manager of temporal information. IEEE Expert, 21(4), 1997. T.P.C. Carvalho and N. Edelweiss. A visual query system implementing a temporal object-oriented model with roles on a relational database. In Proc. of the 17th International Conference of the Chilean Computer Science Society, November 1997. T.P. Carvalho and N. Edelweiss. Um sistema visual de consulta para um modelo de dados temporal orientado a objetos. In Anais 24 Seminário Integrado da Sociedade Brasileira de Computação, pages 337-348, August 1997. Jean-Raymond Gagne and John Plaice. A non-standard temporal deductive database system. Journal of Symbolic Computation, 22(5/6):649-664, November 1996. I.A. Goralwalla, D. Szafron, M.T. Özsu, and R.J. Peters. Managing schema evolution using a temporal object model. In The 16th International Conference on Conceptual Modeling (ER'97), November 1997. Iakovos Motakis and Carlo Zaniolo. Temporal aggregation in active database rule. In Proc. of the 1997 ACM SIGMOD International Conference on Management of Data, 1997. J. Mylopoulos, A. Borgida, M. Jarke, and M. Koubarakis. Representing knowledge about information systems in Telos. Database Application Engineering with DAIDA, pages 31-64, 1993. T. Myrach. Die Schlüsselproblematik bei der Umsetzung temporaler Konzepte in das relationale Datenmodell. Rundbrief des GI-Fachausschusses 5.2 "Informationssystem-Architekturen" 2, 2:13-15, 1995. T. Myrach. 
Realisierung zeitbezogener Datenbanken: Ein Vergleich des herkömmlichen relationalen Datenmodells mit einer temporalen Erweiterung. Wirtschaftsinformatik 39, 1:35-44, 1997. Krithi Ramamritham, Rajendran M. Sivasankaran, John A. Stankovic, Donald F. Towsley, and Ming Xiong. Integrating temporal, real-time, and active databases. SIGMOD Record, 25(1):8-12, March 1996. Earl Rennison and Lisa Strausfeld. The millennium project: constructing a dynamic 3+D virtual environment for exploring geographically, temporally and
categorically organized historical information. In Proc. of the International Conference COSIT'95 on Spatial Information Theory: A Theoretical Basis for GIS. Lecture Notes in Computer Science, volume 988, pages 69-91, September 1995. Suryanarayana M. Sripada. Efficient implementation of the event calculus for temporal database applications. In Proc. of the 12th International Conference on Logic Programming (ICLP 1995), pages 99-113, June 1995. A. Steiner and M.C. Norrie. Implementing temporal database in object-oriented systems. In Proc. of 5th International Conference on Database Systems for Advanced Applications (DASFAA '97), April 1997. K. Torp, C.S. Jensen, and M. Böhlen. Layered temporal DBMSs--concepts and techniques. In Proc. of 5th International Conference on Database Systems for Advanced Applications (DASFAA '97), April 1997. K. Torp, L. Mark, and C.S. Jensen. Efficient differential timeslice computation. IEEE Transactions on Knowledge and Data Engineering. To appear. K. Torp, C.S. Jensen, and R.T. Snodgrass. Stratum approaches to temporal DBMS implementation. Technical report (TR-5), TimeCenter, March 1997. K. Torp, R.T. Snodgrass, and C.S. Jensen. Correct and efficient timestamping of temporal data. Technical report (TR-4), TimeCenter, March 1997. C. Vassilakis, N.A. Lorentzos, P. Georgiadis, and Y.G. Mitsopoulos. ORES: Design and implementation of a temporal DBMS. Technical report, Technical Report, Informatics Laboratory, Agricultural University of Athens, 1996. T. Zurek. Parallel temporal joins. In "Datenbanksysteme in Büro, Technik und Wissenschaft" (BTW), German Database Conference, pages 269-278, March 1997. In English. T. Zurek. Optimal interval partitioning for temporal databases. In Proc. of the 3rd Basque International Workshop on Information Technology (BIWIT'97), pages 140-147, July 1997. T. Zurek. Optimisation of partitioned temporal joins. In Proc. of the 15th British National Conference on Databases (BNCOD'97), pages 101-115. 
Springer, July 1997. T. Zurek. Parallel processing of temporal joins. Informatica, 21, 1997.
7
Access Methods
Laura Amadesi and Fabio Grandi. An adaptive split policy for the time-split B-Tree. Data & Knowledge Engineering. To appear. L. Arge and J.S. Vitter. Optimal dynamic interval management in external memory. In Proc. of the 37th IEEE Symposium on Foundations of Computer Science, October 1996.
J. Van den Bercken, B. Seeger, and P. Widmayer. A generic approach to bulk loading multidimensional index structures. In Proc. of the 23rd International Conference on Very Large Data Base (VLDB'96), pages 406-415, 1997. E. Bertino, E. Ferrari, and G. Guerrini. Navigational accesses in a temporal object model. IEEE Transactions on Knowledge and Data Engineering. To appear. A. Cappelli, C. De Castro, and M.R. Scalas. A modular history-oriented access structure for bitemporal data. In Proc. of the 22nd Seminar on Current Trends in Theory and Practice of Informatics (SOFSEM'95), pages 369-374, 1995. Cheng Hian Goh, Hongjun Lu, Beng Chin Ooi, and Kian-Lee Tan. Indexing temporal data using existing B+-trees. Data & Knowledge Engineering, 18(2):147-165, March 1995. Brajesh Goyal, Jayant R. Haritsa, S. Seshadri, and V. Srinivasan. Index concurrency control in firm real-time database systems. In Proc. of the 21st International Conference on V