Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis and J. van Leeuwen
1473
Xavier Leroy Atsushi Ohori (Eds.)
Types in Compilation Second International Workshop, TIC '98 Kyoto, Japan, March 25-27, 1998 Proceedings
Springer
Series Editors
Gerhard Goos, Karlsruhe University, Germany
Juris Hartmanis, Cornell University, NY, USA
Jan van Leeuwen, Utrecht University, The Netherlands

Volume Editors

Xavier Leroy
INRIA Rocquencourt
Domaine de Voluceau, B.P. 105, F-78153 Le Chesnay, France
E-mail: [email protected]

Atsushi Ohori
Research Institute for Mathematical Sciences, Kyoto University
Kitashirakawa-Oiwakecho, Sakyo-ku, Kyoto 606-8502, Japan
E-mail: [email protected]

Cataloging-in-Publication data applied for

Die Deutsche Bibliothek - CIP-Einheitsaufnahme
Types in compilation : second international workshop ; proceedings / TIC '98, Kyoto, Japan, March 25-27, 1998. Xavier Leroy ; Atsushi Ohori (ed.). - Berlin ; Heidelberg ; New York ; Barcelona ; Budapest ; Hong Kong ; London ; Milan ; Paris ; Singapore ; Tokyo : Springer, 1998 (Lecture notes in computer science ; Vol. 1473) ISBN 3-540-64925-5
CR Subject Classification (1991): F.3, D.2, D.3, D.4

ISSN 0302-9743
ISBN 3-540-64925-5 Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

© Springer-Verlag Berlin Heidelberg 1998
Printed in Germany

Typesetting: Camera-ready by author
SPIN 10638677 06/3142 - 5 4 3 2 1 0
Printed on acid-free paper
Preface
This volume constitutes the proceedings of the second International Workshop on Types in Compilation (TIC'98), held at the Research Institute for Mathematical Sciences, Kyoto University, Japan, March 25-27, 1998.

Types (in the broadest sense of the word) play a central role in many of the advanced compilation techniques developed for modern programming languages. Standard or non-standard type systems and type analyses have been found to be useful for optimizing dynamic method dispatch in object-oriented languages, for reducing run-time tests in dynamically typed languages, for guiding data representations and code generation, for program analysis and transformation, for compiler verification and debugging, and for establishing safety properties of distributed or mobile code. The Types in Compilation workshops bring together researchers to share new ideas and results in this area.

For TIC'98, the program committee received seventeen submissions in response to the call for papers, and selected thirteen of them. Each submission received at least four reviews, written by the program committee members or their subreferees (their names appear below). The program committee also invited five additional speakers to complement the presentations of the regular papers.

The 1998 Types in Compilation workshop was sponsored by the Research Institute for Mathematical Sciences, Kyoto University, and organized in cooperation with the Association for Computing Machinery Special Interest Group on Programming Languages (ACM SIGPLAN) and the Japan Society for Software Science and Technology Special Interest Group in Programming (JSSST SIG Programming). Their support is gratefully acknowledged.
June 1998
Xavier Leroy
Program Chair, TIC'98
Organization
Conference chair: Atsushi Ohori (Kyoto University)

Organizing committee: Craig Chambers (University of Washington), Robert Harper (Carnegie Mellon University), Xavier Leroy (INRIA Rocquencourt), Robert Muller (Boston College), Atsushi Ohori (Kyoto University), Simon Peyton-Jones (Glasgow University)

Program chair: Xavier Leroy (INRIA Rocquencourt)

Program committee: Craig Chambers (University of Washington), Urs Hölzle (University of California, Santa Barbara), Satoshi Matsuoka (Tokyo Institute of Technology), Yasuhiko Minamide (Kyoto University), Simon Peyton-Jones (Glasgow University), Zhong Shao (Yale University), Andrew Wright (InterTrust STAR Lab)
Local arrangements: Atsushi Ohori (Kyoto University), Yoshikazu Sato (Oki Electric)

Additional referees: Kenichi Asai, Haruo Hosoya, Atsushi Igarashi, Didier Rémy, Toshihiro Shimizu, Valery Trifonov, Steve Weeks
Table of Contents

Introduction .......................................................... 1
    Xavier Leroy

Typed intermediate languages

Compiling Java to a Typed Lambda-Calculus: A Preliminary Report ....... 9
    Andrew Wright, Suresh Jagannathan, Cristian Ungureanu, Aaron Hertzmann

Stack-Based Typed Assembly Language .................................. 28
    Greg Morrisett, Karl Crary, Neal Glew, David Walker

How Generic is a Generic Back End? Using MLRISC as a Back End
for the TIL Compiler ................................................. 53
    Andrew Bernard, Robert Harper, Peter Lee

Program analyses

A Toolkit for Constructing Type- and Constraint-Based Program
Analyses (invited talk) .............................................. 78
    Alexander Aiken, Manuel Fähndrich, Jeffrey S. Foster, Zhendong Su

Optimizing ML Using a Hierarchy of Monadic Types ..................... 97
    Andrew Tolmach

Type-Directed Continuation Allocation ............................... 116
    Zhong Shao, Valery Trifonov

Program transformations and code generation

Polymorphic Equality - No Tags Required ............................. 136
    Martin Elsman

Optimal Type Lifting ................................................ 156
    Bratin Saha, Zhong Shao

Formalizing Resource Allocation in a Compiler ....................... 178
    Peter Thiemann

Memory management

An Approach to Improve Locality Using Sandwich Types ................ 194
    Daniela Genius, Martin Trapp, Wolf Zimmermann

Garbage Collection via Dynamic Type Inference - A Formal Treatment .. 215
    Haruo Hosoya, Akinori Yonezawa

Partial evaluation and run-time code generation

Strong Normalization by Type-Directed Partial Evaluation and
Run-Time Code Generation ............................................ 240
    Vincent Balat, Olivier Danvy

Determination of Dynamic Method Dispatches Using Run-Time Code
Generation .......................................................... 253
    Nobuhisa Fujinami

Distributed computing

Type-Based Analysis of Concurrent Programs (abstract of invited talk) 272
    Naoki Kobayashi

A Type-Based Semantics for User-Defined Marshalling in Polymorphic
Languages ........................................................... 273
    Dominic Duggan

Author Index ........................................................ 299
Introduction

Xavier Leroy
INRIA Rocquencourt, Domaine de Voluceau, 78153 Le Chesnay, France
1 Types in Programming Languages
Most programming languages are equipped with a type system that detects type errors in the program, such as using a variable or result of a given type in a context that expects data of a different, incompatible type. Such type checking can take place either statically (at compile-time) or dynamically (at run-time). Type checking has proved to be very effective in catching a wide class of programming errors, from the trivial (misspelled identifiers) to the fairly deep (violations of data structure invariants). It makes programs considerably safer, ensuring the integrity of data structures and the type-correct interconnection of program components.

Safety is not the only motivation for equipping programming languages with type systems, however. Another motivation, which came first historically, is to facilitate the efficient compilation of programs. Static typing restricts the set of programs to be compiled, possibly eliminating programs containing constructs that are difficult to compile efficiently or even to compile correctly at all. Also, static typing guarantees certain properties and invariants on the data manipulated by the program; the compiler can take advantage of these semantic guarantees to generate better code. The “Types in Compilation” workshops are dedicated to the study of these interactions between type systems and the compilation process.
2 Exploiting Type Information for Code Generation and Optimization
An early example of a type system directed towards efficient compilation is that of Fortran. The Fortran type system introduces a strict separation between integer numbers and floating-point numbers at compile-time. The main motivation for this separation, according to Fortran’s designers, was to avoid the difficulties of handling mixed arithmetic at run-time [2, chapter 6]. Thanks to the type system, the compiler “knows” when to generate integer arithmetic operations, floating-point arithmetic operations, and conversions between integers and floats. Since then, this separation has permeated hardware design: most processor architectures provide separate register sets and arithmetic units for integers and for floats.
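As a present-day aside (not drawn from the text above), OCaml makes the same compile-time separation visible in its surface syntax, which makes the effect on code generation easy to see:

```ocaml
(* Integer and floating-point arithmetic are distinct operations,
   and every conversion is explicit, so the compiler always knows
   which arithmetic unit and register class to target. *)
let int_sum   = 1 + 2                    (* integer add *)
let float_sum = 1.0 +. 2.0               (* floating-point add *)
let mixed     = float_of_int 3 +. 0.5    (* conversion made explicit *)
```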
In turn, this architectural bias makes it nearly impossible to generate efficient numerical code for a language whose type system does not statically distinguish floating-point numbers from integers.

Another area where compilers rely heavily on static typing is the handling of variable-sized data. Different data types have different natural memory sizes: for instance, double-precision floats usually occupy more space than integers; the size and memory layout of aggregate data structures such as records and arrays vary with the sizes and number of their elements. Precise knowledge of size information is required to generate correct code that allocates and operates over data structures. This knowledge is usually derived from the static typing information: the type of a datum determines its memory size and layout. Languages without static typing cannot be compiled as efficiently: all data representations must fit a default size, if necessary by boxing (heap-allocating and handling through a pointer) data larger than the default size, an expensive operation. Statically typed languages whose type system is too flexible to allow this determination of size information in all cases (e.g. because of polymorphism, type abstraction, or subtyping) make it more difficult, but not impossible, to exploit unboxed data representations: see [31, 21, 34, 16, 39, 22, 33, 28] for various approaches.

Guarantees provided by the type system can also enable powerful program optimizations. For instance, in a strongly-typed language (whose type system does not allow “casts” between incompatible types), two pointers that have incompatible types cannot alias, i.e. cannot point to the same memory block. This guarantees that load and store operations through those two pointers cannot interfere, thus allowing more aggressive code motion and instruction scheduling [13]. One can also envision different heap allocation strategies for objects of different types, as exemplified by the paper by Genius et al. in these proceedings.

Another area where type information is useful is the optimization of method dispatch in object-oriented languages. General method dispatch is an expensive operation, involving a run-time lookup of the code associated with the method in the object’s method suite, followed by a costly indirect jump to that code. In a class-based language, if the actual class to which the object belongs is known at compile-time, a more efficient direct invocation of the method code can be generated instead. If the code of the method is small enough, it can even be expanded in-line at the point of call. Simple examination of the static type of the object and of the class hierarchy of the program uncovers many opportunities for this optimization. For instance, if the static type of the object is a class C that has no subclasses, the compiler knows that the actual class of the object is C and can generate direct invocations for all methods of the object [10, 15, 5].
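To make the dispatch optimization concrete, here is a hypothetical sketch in OCaml, with records of closures standing in for objects; all names are invented for illustration and do not come from the cited papers:

```ocaml
(* An "object" is a record of closures: its method suite. *)
type shape = { area : unit -> float }

let make_circle r = { area = (fun () -> 3.14159 *. r *. r) }

(* General dispatch: fetch the method from the suite, then call
   it through an indirect jump. *)
let total_area shapes =
  List.fold_left (fun acc s -> acc +. s.area ()) 0.0 shapes

(* If analysis shows every shape here was built by [make_circle]
   (the "class" has no other implementation), the compiler may
   replace the indirect call by a direct, inlinable one: *)
let circle_area r = 3.14159 *. r *. r
let total_area_direct radii =
  List.fold_left (fun acc r -> acc +. circle_area r) 0.0 radii
```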
3 Program Analyses and Optimizations Based on Non-standard Type Systems
There are many points of convergence between, on the one hand, algorithms for type checking and type inference and, on the other hand, static analyses of programs intended to support code optimization. This should not come as a surprise: both static analyses and type inference algorithms attempt to reconstruct semantic information that is implicit in the program source, and propagate that information through the program, recording it at each program point. More formal evidence is that both static analyses and type inference problems can be recast in the common framework of abstract interpretation [9].

What is more remarkable is that essentially identical algorithms are used for type inference and for certain program analyses. For instance, unification between first-order terms, as used for type inference in the Hindley-Milner type system of ML and Haskell, is also at the basis of several fast program analyses such as Steensgaard’s aliasing analysis [37] and Henglein’s tagging analysis [18]; a minimal unifier of this kind is sketched at the end of this section. Baker [6] reflects informally on this connection between Hindley-Milner type inference and several program analyses.

Another technique that has attracted considerable interest recently, both from a type inference standpoint and a program analysis standpoint, consists in setting up systems of set inclusion constraints (set inequations) and solving them iteratively. This technique has been used to perform type inference for type systems with subtyping [25, 3, 14]. The same technique is also at the basis of several flow analyses for functional and object-oriented languages [35, 36, 17, 1, 32, 19, 11]. These analyses approximate the flow of control and data in the presence of first-class functions and objects, and are very effective at optimizing function applications and method invocations, and also at eliminating dynamic type tests in dynamically-typed languages. Palsberg and O’Keefe [29] draw a formal connection between those two areas by proving the equivalence between a flow analysis (0-CFA) and a type inference algorithm (for the Amadio-Cardelli type system with subtyping and recursive types). The paper by Aiken et al. in these proceedings surveys the use of set inclusion constraints and equality (unification) constraints for program analyses.

Several non-standard type systems have been developed to capture more precisely the behavior of programs and support program transformations. The effect systems introduced by Lucassen and Gifford [23, 20] enrich function types with effects approximating the dynamic behavior of the functions, such as input-output or operations on the store. This information is useful for code motion and automatic parallelization. Jouvelot, Talpin and Tofte [38, 40] use region annotations on the types of data structures and functions to determine aliasing and lifetime information on data structures. The ML compiler developed by Tofte et al. [8] relies on this lifetime information to manage memory as a stack of regions with compiler-controlled explicit deallocation of regions instead of a conventional garbage collector. Tolmach’s paper in these proceedings presents a reformulation of simple effect systems as monadic type systems. Shao and Trifonov’s paper develops a type system to keep track of the use of first-class continuations in a program, thus allowing interoperability between languages that support callcc and languages that do not.

Finally, non-standard type systems can also be used to record and exploit the results of earlier program analyses. For instance, Dimock et al. [12] and Banerjee [7] develop rich type systems that capture and exploit the flow information produced by flow analyses. Another example is Thiemann’s paper in these proceedings, which develops a type system that captures resource constraints that appear in compilers during register allocation.
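As promised above, here is a minimal sketch of the unification engine shared by Hindley-Milner inference and near-linear analyses such as Steensgaard’s [37]. It is invented for illustration; the occurs check is omitted for brevity:

```ocaml
(* First-order unification over a toy type language, using
   mutable links (union-find with path compression). *)
type ty =
  | TVar of link ref
  | TInt
  | TArrow of ty * ty
and link = Unbound | Link of ty

let rec repr t =
  match t with
  | TVar ({ contents = Link t' } as r) ->
      let t'' = repr t' in
      r := Link t'';                       (* path compression *)
      t''
  | _ -> t

let rec unify a b =
  match repr a, repr b with
  | TVar r, TVar r' when r == r' -> ()     (* same variable *)
  | TVar r, t | t, TVar r -> r := Link t   (* bind the variable *)
  | TInt, TInt -> ()
  | TArrow (a1, a2), TArrow (b1, b2) -> unify a1 b1; unify a2 b2
  | _ -> failwith "constructor clash"
```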
4 Types at Run-Time
Many programming languages require compiled programs to manipulate some amount of type information at run-time. Interesting compilation issues arise when trying to make these run-time manipulations of types as efficient as possible. A prime example is the compilation of run-time type tests in dynamically-typed languages such as Scheme and Lisp: many clever tagging schemes have been developed to support fast run-time type tests. Another example is object-oriented languages such as Java, Modula-3, or C++ with run-time type inspection, where programs can dynamically test the actual class of an object. Again, clever encodings of the type hierarchy have been developed to perform those tests efficiently.

Even if the source language is fully statically typed, compilers and run-time systems may need to propagate type information to run-time in order to support certain operations. A typical example is the handling of non-parametric polymorphic operations such as polymorphic equality in ML and type classes in Haskell [41]. Another example is the handling of polymorphic records presented in [27]. There are several ways to precompile the required type information into an efficient form: one is to attach simple tags to data structures; another is to pass extra arguments (type representations or dictionaries of functions) to polymorphic functions, as sketched below. Elsman’s paper in these proceedings compares the performance of these two approaches in the case of ML’s polymorphic equality.

Passing run-time representations of type expressions as extra arguments to polymorphic functions allows many type-directed compilation techniques to be applied to languages with polymorphic typing. The TIL compiler [39] and the Flint compiler [33] rely on run-time passing of type expressions (taken from extensions of the Fω type system) to handle unboxed data structures in polymorphic functions and modules with abstract types. Constructing and passing these type expressions at run-time entails some execution overhead. The paper by Saha and Shao in these proceedings shows how to minimize this overhead by lifting those type-related computations out of loops and functions so that they all take place once at the beginning of program execution.

Non-conservative garbage collectors also require some amount of type information at run-time in order to distinguish pointers from non-pointers in memory roots and heap blocks. The traditional approach is to use tags on run-time values. Alternatively, Appel [4] suggested attaching source types to blocks of function code and reconstructing type information for all reachable objects at run-time, using a variant of ML type reconstruction. The paper by Hosoya and Yonezawa in these proceedings is the first complete formalization of this approach.
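The dictionary-passing option mentioned above can be made concrete with a small, hedged OCaml sketch; this is an invented illustration, not Elsman’s implementation:

```ocaml
(* The compiler rewrites a polymorphic function that uses equality
   so that it takes the comparison as an extra argument; each
   monomorphic call site supplies a type-specific "dictionary". *)
let rec member (eq : 'a -> 'a -> bool) (x : 'a) (l : 'a list) =
  match l with
  | [] -> false
  | y :: ys -> eq x y || member eq x ys

(* Call sites pass the appropriate dictionary: *)
let _ = member ( = ) 3 [1; 2; 3]          (* here a primitive int comparison *)
let _ = member String.equal "b" ["a"; "b"]  (* a string comparison *)
```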
Communicating static type information to the run-time system can be challenging, as it requires close cooperation from the compiler back-end. For instance, a type-directed garbage collector needs type information to be associated with registers and stack locations at garbage collection points; cooperation from the register allocator is needed to map the types of program variables onto the registers and stack slots. The paper by Bernard et al. in these proceedings discusses their experience with coercing a generic back-end into propagating type information.

Another operation that relies heavily on run-time type information is marshaling and un-marshaling between arbitrary data structures and streams of bytes, a crucial mechanism for persistence and distributed programming. In these proceedings, Duggan develops rich type systems to support marshaling in the presence of user-defined marshaling operations for some data types.
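The role of run-time type representations in marshaling can be illustrated with a small, invented sketch: a value of type 'a repr describes the shape of values of type 'a, and the marshaller recurses over it. User-defined marshaling, in the spirit of Duggan’s paper, could be grafted on as an extra constructor carrying a user-supplied function:

```ocaml
(* Run-time type representations as an OCaml GADT. *)
type _ repr =
  | Int  : int repr
  | Pair : 'a repr * 'b repr -> ('a * 'b) repr
  | List : 'a repr -> 'a list repr

(* The marshaller is driven entirely by the representation. *)
let rec marshal : type a. a repr -> a -> string = fun r v ->
  match r, v with
  | Int, n -> string_of_int n
  | Pair (ra, rb), (a, b) ->
      "(" ^ marshal ra a ^ "," ^ marshal rb b ^ ")"
  | List ra, xs ->
      "[" ^ String.concat ";" (List.map (marshal ra) xs) ^ "]"

let _ = marshal (List (Pair (Int, Int))) [(1, 2); (3, 4)]
```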
5 Typed Intermediate Languages
In traditional compiler technology, types are checked on the source language, but the intermediate representations used in the compilation process are essentially untyped. The intermediate representations may sometimes carry type annotations introduced by the front-end, but no provision is made for type-checking these intermediate representations again. Recently, several compilers have been developed that take the opposite approach: their intermediate representations are equipped with typing rules and type-checking algorithms, and their various passes are presented as type-preserving transformations that, given a well-typed input, must produce a well-typed term of the target intermediate language.

The need for typed intermediate representations is obvious in compilers that require precise type information to be available until run-time, such as TIL and Flint [39, 33], or at least until late in the compilation process. Without requiring that each compiler pass be type-preserving and its output typable, it is nearly impossible to ensure the propagation of correct type information throughout the whole compiler.

Even in compilers that do not rely as crucially on types, typed intermediate languages can be extremely useful for debugging the compiler itself. During compiler development and testing, the type-checkers for the intermediate representations can be run on the outcome of every program transformation performed by the compiler. This catches a large majority of programming errors in the implementation of the transformations. In contrast with traditional compiler testing, which shows that the generated code is incorrect but does not indicate which pass is erroneous, type-checking the intermediate representations pinpoints precisely the culprit pass. The Glasgow Haskell compiler was one of the first to exploit this technique systematically [30].

So far, typed intermediate representations as described above have been applied almost exclusively to compiling functional languages. The paper by Wright et al. in these proceedings develops a typed intermediate language for compiling Java, and discusses the difficult issue of making explicit the “self” parameter to methods in a type-preserving way.
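To illustrate the re-typechecking discipline described above, here is a deliberately tiny typed intermediate language and checker, invented for this introduction rather than taken from any of the compilers cited:

```ocaml
(* A minimal typed IL: every binder carries its type, so the IL
   can be re-checked after each transformation pass. *)
type ty = TInt | TFun of ty * ty

type exp =
  | Var of string
  | Lit of int
  | Lam of string * ty * exp          (* parameter carries its type *)
  | App of exp * exp

let rec check env e =
  match e with
  | Var x -> List.assoc x env
  | Lit _ -> TInt
  | Lam (x, t, body) -> TFun (t, check ((x, t) :: env) body)
  | App (f, arg) ->
      (match check env f with
       | TFun (t1, t2) when t1 = check env arg -> t2
       | _ -> failwith "ill-typed IR: suspect the last transformation")

(* Run after every pass during compiler testing: *)
let assert_well_typed e = ignore (check [] e)
```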
Typed intermediate languages usually do not go all the way down to code generation. For instance, Glasgow Haskell preserves types through its high-level program transformations, but the actual code generation is mostly untyped. The TIL compiler goes several steps further, in particular by performing the conversion of functions into closures in a type-preserving manner [24]. The paper by Morrisett et al. in these proceedings shows how to go all the way to assembly code: it proposes a type system for assembly code that can type-check reasonably optimized assembly code, including most uses of a stack.
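The idea behind type-preserving closure conversion [24] can be sketched as follows: a closure pairs a code pointer with its environment, and the environment’s type is hidden behind an existential quantifier, encoded here with an OCaml GADT. This is an illustration, not the construction used by TIL:

```ocaml
(* A closure is closed code plus its environment; the environment
   type 'env is existentially quantified, so closures built from
   different environments share the same type. *)
type ('a, 'b) closure =
  | Clos : ('env -> 'a -> 'b) * 'env -> ('a, 'b) closure

let apply c x =
  match c with
  | Clos (code, env) -> code env x

(* Converting [fun y -> x + y]: the free variable x moves into an
   explicit environment, and the code becomes a closed function. *)
let make_adder x = Clos ((fun env y -> env + y), x)
let five = apply (make_adder 2) 3
```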
6 Other Applications of Types
While the discussion above has concentrated on core compiler technology for functional and object-oriented languages, types have also found many exciting and sometimes unexpected applications in other areas of programming language implementation.

For instance, type-directed partial evaluation is an interesting alternative to traditional partial evaluation based on source-level reductions. The paper by Balat and Danvy in these proceedings presents a type-directed partial evaluator that also uses run-time code generation. The paper by Fujinami presents a partial evaluator and run-time code generator for C++.

Languages for distributed programming based on process calculi are another area where the exploitation of type information is crucial to obtain good performance. Kobayashi’s abstract in these proceedings surveys this topic.

Types have interesting applications in the area of language-based security for mobile code. Java applets have popularized the idea that foreign compiled code can be locally verified for type-correctness before execution. This local type-checking of compiled code then enables language-based security techniques that rely on typing invariants, such as the Java “sandbox”. Advances in typed intermediate languages have an important impact in this area. For instance, while Java code verification is performed on unoptimized bytecode for an abstract machine, the paper by Morrisett et al. in these proceedings shows that similar verifications can be carried out on optimized machine code. Lee and Necula’s work on proof-carrying code [26] shows how to generalize this approach to the verification of arbitrary specifications.

In conclusion, there has been considerable cross-fertilization between type systems and compilers, and we hope to see more exciting new applications of types in the area of programming language implementations in the near future.
References

1. Ole Agesen, Jens Palsberg, and Michael Schwartzbach. Type inference of Self: analysis of objects with dynamic and multiple inheritance. In Proc. European Conference on Object-Oriented Programming – ECOOP’93, 1993.
2. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: principles, techniques, and tools. Addison-Wesley, 1986.
3. Alexander S. Aiken and Edward L. Wimmers. Type inclusion constraints and type inference. In Functional Programming Languages and Computer Architecture 1993, pages 31–41. ACM Press, 1993.
4. Andrew W. Appel. Run-time tags aren’t necessary. Lisp and Symbolic Computation, 2(2), 1989.
5. David Bacon and Peter Sweeney. Fast static analysis of C++ virtual function calls. In Object-Oriented Programming Systems, Languages and Applications ’96, pages 324–341. ACM Press, 1996.
6. Henry G. Baker. Unify and conquer (garbage, updating, aliasing, ...) in functional languages. In Lisp and Functional Programming 1990. ACM Press, 1990.
7. Anindya Banerjee. A modular, polyvariant, and type-based closure analysis. In International Conference on Functional Programming 1997, pages 1–10. ACM Press, 1997.
8. Lars Birkedal, Mads Tofte, and Magnus Vejlstrup. From region inference to von Neumann machines via region representation inference. In 23rd symposium Principles of Programming Languages, pages 171–183. ACM Press, 1996.
9. Patrick Cousot. Types as abstract interpretations. In 24th symposium Principles of Programming Languages, pages 316–331. ACM Press, 1997.
10. Jeffrey Dean, David Grove, and Craig Chambers. Optimization of object-oriented programs using static class hierarchy analysis. In Proc. European Conference on Object-Oriented Programming – ECOOP’95, pages 77–101. Springer-Verlag, 1995.
11. Greg DeFouw, David Grove, and Craig Chambers. Fast interprocedural class analysis. In 25th symposium Principles of Programming Languages, pages 222–236. ACM Press, 1998.
12. Allyn Dimock, Robert Muller, Franklyn Turbak, and J. B. Wells. Strongly typed flow-directed representation transformations. In International Conference on Functional Programming 1997, pages 11–24. ACM Press, 1997.
13. Amer Diwan, Kathryn S. McKinley, and J. Eliot B. Moss. Type-based alias analysis. In Programming Language Design and Implementation 1998, pages 106–117. ACM Press, 1998.
14. Jonathan Eifrig, Scott Smith, and Valery Trifonov. Type inference for recursively constrained types and its application to OOP. In Mathematical Foundations of Programming Semantics, volume 1 of Electronic Notes in Theoretical Computer Science. Elsevier, 1995.
15. Mary F. Fernández. Simple and effective link-time optimization of Modula-3 programs. In Programming Language Design and Implementation 1995, pages 103–115. ACM Press, 1995.
16. Robert Harper and Greg Morrisett. Compiling polymorphism using intensional type analysis. In 22nd symposium Principles of Programming Languages. ACM Press, 1995.
17. Nevin Heintze. Set-based analysis of ML programs. In Lisp and Functional Programming ’94, pages 306–317. ACM Press, 1994.
18. Fritz Henglein. Global tagging optimization by type inference. In Lisp and Functional Programming 1992. ACM Press, 1992.
19. Suresh Jagannathan and Andrew Wright. Polymorphic splitting: An effective polyvariant flow analysis. ACM Transactions on Programming Languages and Systems, 20(1):166–207, 1998.
20. Pierre Jouvelot and David K. Gifford. Algebraic reconstruction of types and effects. In 18th symposium Principles of Programming Languages, pages 303–310. ACM Press, 1991.
21. Xavier Leroy. Unboxed objects and polymorphic typing. In 19th symposium Principles of Programming Languages, pages 177–188. ACM Press, 1992.
22. Xavier Leroy. The effectiveness of type-based unboxing. In Workshop Types in Compilation ’97. Technical report BCCS-97-03, Boston College, Computer Science Department, June 1997.
23. John M. Lucassen and David K. Gifford. Polymorphic effect systems. In 15th symposium Principles of Programming Languages, pages 47–57. ACM Press, 1988.
24. Yasuhiko Minamide, Greg Morrisett, and Robert Harper. Typed closure conversion. In 23rd symposium Principles of Programming Languages, pages 271–283. ACM Press, 1996.
25. John C. Mitchell. Coercion and type inference. In 11th symposium Principles of Programming Languages, pages 175–185. ACM Press, 1984.
26. George C. Necula. Proof-carrying code. In 24th symposium Principles of Programming Languages, pages 106–119. ACM Press, 1997.
27. Atsushi Ohori. A polymorphic record calculus. ACM Transactions on Programming Languages and Systems, 17(6):844–895, 1995.
28. Atsushi Ohori and Tomonobu Takamizawa. An unboxed operational semantics for ML polymorphism. Lisp and Symbolic Computation, 10(1):61–91, 1997.
29. Jens Palsberg and Patrick O’Keefe. A type system equivalent to flow analysis. In 22nd symposium Principles of Programming Languages, pages 367–378. ACM Press, 1995.
30. Simon L. Peyton-Jones. Compiling Haskell by program transformation: a report from the trenches. In European Symposium on Programming 1996, volume 1058 of Lecture Notes in Computer Science. Springer-Verlag, 1996.
31. Simon L. Peyton-Jones and John Launchbury. Unboxed values as first-class citizens in a non-strict functional language. In Functional Programming Languages and Computer Architecture 1991, volume 523 of Lecture Notes in Computer Science, pages 636–666, 1991.
32. John Plevyak and Andrew Chien. Precise concrete type inference for object-oriented languages. In Object-Oriented Programming Systems, Languages and Applications ’94, pages 324–340. ACM Press, 1994.
33. Zhong Shao. Flexible representation analysis. In International Conference on Functional Programming 1997, pages 85–98. ACM Press, 1997.
34. Zhong Shao and Andrew Appel. A type-based compiler for Standard ML. In Programming Language Design and Implementation 1995, pages 116–129. ACM Press, 1995.
35. Olin Shivers. Control-flow analysis in Scheme. In Programming Language Design and Implementation 1988, pages 164–174. ACM Press, 1988.
36. Olin Shivers. Control-Flow Analysis of Higher-Order Languages. PhD thesis, Carnegie Mellon University, May 1991.
37. Bjarne Steensgaard. Points-to analysis in almost linear time. In 23rd symposium Principles of Programming Languages, pages 32–41. ACM Press, 1996.
38. Jean-Pierre Talpin and Pierre Jouvelot. The type and effect discipline. Information and Computation, 111(2):245–296, 1994.
39. D. Tarditi, G. Morrisett, P. Cheng, C. Stone, R. Harper, and P. Lee. TIL: a type-directed optimizing compiler for ML. In Programming Language Design and Implementation 1996, pages 181–192. ACM Press, 1996.
40. Mads Tofte and Jean-Pierre Talpin. Region-based memory management. Information and Computation, 132(2):109–176, 1997.
41. Philip Wadler and Stephen Blott. How to make ad-hoc polymorphism less ad hoc. In 16th symposium Principles of Programming Languages, pages 60–76. ACM Press, 1989.
Compiling Java to a Typed Lambda-Calculus: A Preliminary Report

Andrew Wright 1, Suresh Jagannathan 2, Cristian Ungureanu 2, and Aaron Hertzmann 3

1 STAR Laboratory, InterTrust Technologies Corp., 460 Oakmead Parkway, Sunnyvale, CA 94086, [email protected]
2 NEC Research Institute, 4 Independence Way, Princeton, NJ 08540, {suresh,cristian}@research.nj.nec.com
3 Media Research Laboratory, New York University, 715 Broadway, New York, NY 10003, [email protected]
1 Introduction
A typical compiler for Java translates source code into machine-independent byte code. The byte code may be either interpreted by a Java Virtual Machine, or further compiled to native code by a just-in-time compiler. The byte code architecture provides platform independence at the cost of execution speed. When Java is used as a tool for writing applets--small ultra-portable programs that migrate across the web on demand--this tradeoff is justified. However, as Java gains acceptance as a mainstream programming language, performance rather than platform independence becomes a prominent issue. To obtain high-performance code for less mobile applications, we are developing an optimizing compiler for Java that bypasses byte code and, just like optimizing compilers for C or Fortran, translates Java directly to native code.

Our approach to building an optimizing compiler for Java has two novel aspects: we use an intermediate language based on lambda-calculus, and this intermediate language is typed. Intermediate representations based on lambda-calculi have been instrumental in developing high-quality implementations of functional languages such as Scheme [13, 19] and Standard ML [3]. By using an intermediate language based on lambda-calculus to compile Java, we hope to gain the same organizational benefits in our compiler. The past few years have also seen the development in the functional programming community of a new approach to designing compilers for languages like ML and Haskell based on typed intermediate languages [15, 20]. By emphasizing formal definition of a compiler's intermediate languages with associated type systems, this approach yields several benefits. First, properties such as type safety of the intermediate languages can be studied mathematically outside the sometimes messy environment of compiler source code. Second, type checkers can be implemented for the intermediate languages, and by running these type checkers on the intermediate programs after various transformations, we can detect a large class of errors in transformations. Indeed, by running a type checker after each transformation, we may be able to localize a bug causing incorrect code to a specific transformation, without even running the generated code. Finally, a formal definition of a typed intermediate language serves as complete and precise documentation of the interface between two compiler passes. In short, using typed intermediate languages leads to higher levels of confidence in the correctness of compilers.

Our compiler first performs ordinary Java type checking on the source program, and then translates the Java program into an intermediate language (IL) of records and first-order procedures. The translation (1) converts an object into a record containing mutable fields for instance variables and immutable procedures for methods; (2) replaces a method call with a combination of record field selections and a first-order procedure call; (3) makes the implicit self parameter of a method explicit by adding an additional parameter to the procedure representing that method and passing the record representing the object as an additional argument at calls; and (4) replaces Java's complex name resolution mechanisms with ordinary static scoping. The resulting IL program typechecks since the source program did, but its typing derivation uses record subtyping where the derivation for the Java program used inheritance subtyping.

In contrast to our approach, traditional compilers for object-oriented languages typically perform analyses and optimizations on a graphical representation of a program. Nodes represent arithmetic operations, assignments, conditional branches, control merges, and message sends [8]. In later stages of optimization, message send nodes may be replaced with combinations of more primitive operations to permit method dispatch optimization. In earlier stages of optimization, program graphs satisfy an informal type system which is essentially that of the source language. In later stages, program graphs are best viewed as untyped, like the representations manipulated by conventional compilers for procedural languages.

By compiling Java using a typed lambda-calculus, we hope to gain increased confidence in the correctness of the generated code. Indeed, for languages like Java that are used to write web-based applications, whether mobile or not, correctness is vital. Incorrect code generated by the compiler could lead to a security breach with serious consequences. Additionally, by translating Java into an intermediate language of records and procedures, we hope to leverage not only optimizations developed for object-oriented languages [8], but also optimizations developed for functional languages [3, 15, 20] such as Standard ML and Haskell, as well as classical optimizations for static-single-assignment representations of imperative languages [7]. In particular, representing objects as records exposes their representations to optimization. The representations of objects can be changed by transformations on IL programs, and the type system ensures that the resulting representations are consistent. Even for optimizations like inlining and copy propagation that do not explicitly change object representations, the type system provides valuable assurance that representations remain consistent.

Unfortunately, the problem of designing a sound type system that incorporates object-oriented features into a record-based language appears to have no simple solution. With a straightforward translation of objects into records and a natural type system, contravariance in the subtyping rule for function types foils the necessary subtyping relation between the types of records that represent Java objects. The problem is that making the implicit recursion through an object's self parameter explicit as an additional argument to each method leads to function types that are recursive in both covariant and contravariant positions, and hence permit no subtyping. More sophisticated type systems that can express the necessary subtyping exist [2, 5, 16], but these type systems require more complex encodings of objects and classes. Object calculi that keep self-recursion implicit [1, 5] are more complex than record calculi and do not expose representations in a manner suitable for an intermediate language.

Rather than devise an unwieldy IL and translation, we take a more pragmatic approach. We assume that a Java program is first type-checked by the Java type-checker before it is translated into the IL. Now, optimizations and transformations performed on the IL must ensure that (1) IL typing is preserved, and (2) safety invariants provided by the Java type-checker are not violated. To satisfy the first requirement, self parameters in the IL are assigned type T (top), the type that is the supertype of any record type. To satisfy the second requirement, typecase operations are inserted within method bodies to recover the appropriate type of self parameters as dictated by the Java type system. The resulting IL program is typable and performs runtime checks at typecase expressions to ensure it is safe with respect to Java typing. However, since the source program has passed the Java type-checker, these checks should never fail. Failure indicates a compiler bug. During compiler development, these checks remain in the generated object code. For production code, the code generator simply omits the checks. In either case, we lose the ability to statically detect errors in transformations that misuse self parameters. On the other hand, we can still detect a large class of type errors involving misuse of other parameters and variables, and we gain the benefit of a simple, typed intermediate language that is easy to work with.

The remainder of the paper is organized as follows. The next section presents a core IL of records and procedures. Following that, Section 3 illustrates the translation from Java to our IL with several examples. Section 4 concludes with a summary of related work.

2 Language
The following grammar defines the types of our explicitly-typed intermediate language for Java:
    t  ::= pt | rt | t1 × ... × tn → t | tag
    rt ::= μα.{tag : tag, x1 : ft1, ..., xn : ftn} | μα.{{tag : tag, x1 : ft1, ..., xn : ftn}} | α
    ft ::= pt array | rt array | vt
    vt ::= t var | t
    pt ::= boolean | byte | short | int | long | char | float | double | void
where x ∈ Var is a set of variables and α ∈ TyVar is a set of type variables used for recursive type definitions. There are four kinds of types t: primitive types pt, function types t1 × ... × tn → t, ordered record types {x1 : ft1, ..., xn : ftn}, and unordered record types {{x1 : ft1, ..., xn : ftn}}. Two additional kinds, mutable variable types t var and mutable array types pt array and rt array, are not full-fledged types in their own right, but may be used as types of fields in records and as types of variables.

Several restrictions, which are motivated below, apply to the formation of types. The field names x1 ... xn of a record type must be distinct. The first field of an unordered record type must be named tag and of type tag. Tags encode the static type of an object, and are used to inspect the type of a record at runtime. An ordered record type need not include a field named tag of type tag, but if it does, this field must appear first. Unordered record types are considered equal under different orderings of their second through last fields; that is,

    {{tag : tag, x2 : ft2, ..., xn : ftn}} = {{tag : tag, permute(x2 : ft2, ..., xn : ftn)}}

where permute yields an arbitrary permutation of its arguments. The fields of ordered record types may not be rearranged. Both kinds of record types may be recursive if prefixed by the binding operator μ, hence

    t = μα.{x1 : ft1, ..., xn : ftn} = {x1 : ft1[α ↦ t], ..., xn : ftn[α ↦ t]}

and

    t = μα.{{x1 : ft1, ..., xn : ftn}} = {{x1 : ft1[α ↦ t], ..., xn : ftn[α ↦ t]}}
where t'[α ↦ t] denotes the substitution of t for free occurrences of α in t'.

Figure 1 defines the subtyping relation on types. The relation allows a longer ordered record type to be a subtype of a shorter record type, provided the sequence of field names of the shorter type is a prefix of the sequence of field names of the longer type, and provided that the types of like-named fields are subtypes. Since the fields of unordered record types can be reordered arbitrarily (except for the first), a longer unordered record type is a subtype of any shorter unordered record type with a subset of the longer type's fields. An ordered record type is also a subtype of an unordered record type with the same fields. The subtyping relation includes the usual contravariant rule for function types, as well as a covariant rule for array types.

Our translation uses ordered record types to represent Java classes. In the intermediate language, subtyping on ordered record types expresses Java's single inheritance class hierarchy. Because field offsets for ordered record types can be computed statically, the translation can implement access to a member of a Java object with efficient record-field selection operations. For example, our translation could represent objects of the following Java classes:

    class A {
        int i;
        int get_i() { return i; }
        A f( A x ) { i = 0; return x; }
    }

    class B extends A {
    }
    pt <: pt

    t1 <: t2    t2 <: t3
    --------------------
          t1 <: t3

    t var <: t var

          t <: t'
    -------------------
    t array <: t' array

    t1' <: t1  ...  tn' <: tn    t <: t'
    -------------------------------------------
    t1 × ... × tn → t  <:  t1' × ... × tn' → t'

    ft1 <: ft1'  ...  ftn <: ftn'
    --------------------------------------------------------------------------
    {x1 : ft1, ..., xn : ftn, ..., xn+m : ftn+m} <: {x1 : ft1', ..., xn : ftn'}

    ft2 <: ft2'  ...  ftn <: ftn'
    ------------------------------------------------------------------------------------------------------
    {{tag : tag, x2 : ft2, ..., xn : ftn, ..., xn+m : ftn+m}} <: {{tag : tag, x2 : ft2', ..., xn : ftn'}}

    ft1 <: ft1'  ...  ftn <: ftn'
    --------------------------------------------------------
    {x1 : ft1, ..., xn : ftn} <: {{x1 : ft1', ..., xn : ftn'}}

    α <: α' ⇒ t <: t'    α ∉ t'    α' ∉ t
    --------------------------------------
              μα.t <: μα'.t'

Fig. 1. Subtyping relation.
with the following IL types:

    tA = μα.{ tag : tag,
              i : int var,
              f : {{tag : tag}} × α → α,
              get_i : {{tag : tag}} → int }

    tB = { tag : tag,
           i : int var,
           f : {{tag : tag}} × tA → tA,
           get_i : {{tag : tag}} → int }
(In fact the translated types are not quite this simple; see Section 3.) The type {{tag : tag}} plays the role of T discussed in the introduction, since any record type containing a tag field is a subtype of this type. The Java typing rules permit an object of class B to be passed to methods like f that expect an A. Since tB <: tA, values of type tB can be passed to both IL functions f. A reference to any field of a record of type tA or tB is implemented as a fixed-offset access into the record.

Since Java interfaces permit multiple inheritance, ordered record types cannot support the necessary subtyping for interface types. Hence our translation uses unordered record types to represent interfaces. Accessing a particular field of a record of unordered type is more expensive, as record values with different field orders can belong to the same unordered record type. The field access operation for unordered record types determines the actual order of a value's fields from the initial tag field required of the unordered type. For example, consider the following Java interface and its corresponding IL type:

    interface J {
        int get_i();
        A f( A x );
    }

    tJ = {{ tag : tag,
            get_i : {{tag : tag}} → int,
            f : {{tag : tag}} × tA → tA }}
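The cost difference between the two access forms can be sketched in Java. This model is purely illustrative and not part of the IL: the Tag and Rec classes, the offsets map, and the slots array are all assumptions made for exposition.

    import java.util.HashMap;
    import java.util.Map;

    class Tag {
        // Hypothetical: assume a tag also records the field layout of the
        // records built with it, so unordered access can recover offsets.
        Map<String, Integer> offsets = new HashMap<>();
    }

    class Rec {
        Tag tag;           // the initial tag field required of unordered types
        Object[] slots;    // the remaining fields, in this record's actual order

        // e.x at an ordered type: the offset is a compile-time constant.
        Object ordered(int staticOffset) {
            return slots[staticOffset];
        }

        // e@x at an unordered type: the offset must first be looked up via the tag.
        Object unordered(String field) {
            return slots[tag.offsets.get(field)];
        }
    }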
    e ::= v                                             syntactic value
          let d1 ... dn in e                            binding
          {x1 : ft1 = i1, ..., xn : ftn = in}           record construction
          x                                             variable reference
          x := e                                        variable update
          e.x                                           ordered record field selection
          e.x := e                                      ordered record field update
          e@x                                           unordered record field selection
          e@x := e                                      unordered record field update
          e; e                                          sequencing
          e(e1, ..., en)                                procedure invocation
          r(e1, ..., en)                                primitive invocation
          if e then e else e                            conditional
          typecase e of g1 as x1 => e1 ...
            gn as xn => en else e                       type conditional
          try e e                                       exception handler
          raise e                                       exception raise
          e[e]                                          array element selection
          e[e] := e                                     array update for primitive types
          e[e] :={} e                                   array update for record types

    v ::= c                                             simple constant
          g                                             tag
          λ x1 : vt1, ..., xn : vtn . e                 first-order procedure
          {x1 : ft1 = i1, ..., xn : ftn = in}           record of values (all i values)

    i ::= e                                             initial value
          e1 ... en                                     array construction

    d ::= x : vt = e                                    value declaration
          rec x1 : t1 = v1 ... xn : tn = vn             set of recursive value declarations
          g ≈ t <: g1 ... gn                            tag declaration

Fig. 2. Expression syntax.

If we amend class B to implement interface J, type tB does not change, and we have tB <: tJ. (Again, the translated types are not quite this simple; see Section 3.)

Figure 2 specifies the expressions e, values v, and declarations d of our intermediate language, where g ∈ Tag are tags, c ∈ Const are basic constants, and r ∈ Prim are primitive operations. Constants, tags, and procedures are values, as well as records where all initializers are values. Primitive operations can only appear in call position. Procedures are called by value, bind their arguments as usual, and must be first-order: the only free variables a procedure is allowed are global variables bound by top-level let-expressions. A declaration di = (x : vt = e) appearing in an expression let d1 ... dn in e' binds x of type vt in di+1 through dn and e'. A recursive declaration di = rec xa : ta = va ... xz : tz = vz binds xa ... xz of types ta ... tz in all of va ... vz and e'. A tag declaration di = g ≈ t <: g1 ... gn introduces tag g and associates it with type t and tags g1 ... gn. Tags g1 ... gn are called supertags of g. Conversely, g is a subtag of g1 ... gn. The translation places a tag in a record field named tag when the type a record was constructed with may need to be recovered by a language operation like typecase.

In a record construction, a field type t var indicates that the field is mutable, but its initializer must be a value of type t. Similarly, declarations of type t var introduce mutable variables and must have initializers of type t. Mutable fields and variables are automatically "dereferenced" when accessed. (There are no values of type t var.) The expressions e.x and e@x access fields of records of ordered and unordered type, respectively. The expressions e.x := e and e@x := e update such fields. The unordered record operations e@x and e@x := e use the initial tag field of a record to determine the appropriate offset into the record.

Ordinary if-expressions provide boolean conditionals, and typecase tests the tag of a record-valued expression. A typecase expression evaluates the first clause g as x => e for which g is a supertag of the record's tag. In the clause body e, x is bound to the record, but with a more precise type. The expression try e1 e2 evaluates e1 with e2 as an exception handler. If e1 raises no exception, its value is returned as the value of the try-expression. If e1 raises exception v, the expression e2(v) is evaluated and its value becomes the value of the try-expression. The expression raise e evaluates e to a record v and raises an exception.

Since arrays can only appear within records in our IL, the three expressions for accessing and updating arrays actually operate on records. These operations retrieve or modify array elements associated with a record field named array. Another field named length stores an array's size. The assignment operation e1[e2] :={} e3 for arrays whose elements are records sets the element of e1.array at index e2 to the value of e3. Due to the covariant rule for array subtyping, this operation must also perform a runtime check to ensure that the value of e3 is a subtype of the runtime array component type. Hence a third field named elemtag holds a tag representing the component type of the array. Since Java arrays are implicitly subtypes of the Java class Object, our translation places additional fields such as clone and getClass in records that represent arrays. We explain our rationale for this treatment of arrays below.

IL expressions must obey a collection of type checking rules. To simplify the presentation, we describe these rules in two groups. Figure 3 defines the first group of rules which concern simple expressions and procedures. The function D strips var off a type:

    D(ft) = t     if ft = t var
    D(ft) = ft    otherwise
    TypeOf(c) = pt
    --------------
     A ⊢ c : pt

    A(x) = vt
    --------------
    A ⊢ x : D(vt)

    A[x1 ↦ t1, ..., xn ↦ tn] ⊢ e : t
    ---------------------------------------------------
    A ⊢ λ x1 : t1, ..., xn : tn . e : t1 × ... × tn → t

    A(x) = t var    A ⊢ e : t
    --------------------------
       A ⊢ x := e : void

    A ⊢ e0 : t1 × ... × tn → t    A ⊢ e1 : t1  ...  A ⊢ en : tn
    ------------------------------------------------------------
                     A ⊢ e0(e1, ..., en) : t

    TypeOf(r) = pt1 × ... × ptn → t    A ⊢ e1 : pt1  ...  A ⊢ en : ptn
    -------------------------------------------------------------------
                         A ⊢ r(e1, ..., en) : t

    A ⊢ e1 : boolean    A ⊢ e2 : t    A ⊢ e3 : t
    ---------------------------------------------
          A ⊢ if e1 then e2 else e3 : t

    A ⊢ e1 : t1    A ⊢ e2 : t2
    ---------------------------
         A ⊢ e1; e2 : t2

    A ⊢ d1 ⇒ A1  ...  A + A1 + ... + An-1 ⊢ dn ⇒ An    A + A1 + ... + An ⊢ e : t
    ------------------------------------------------------------------------------
                           A ⊢ let d1 ... dn in e : t

    A ⊢ e : D(vt)
    -----------------------------
    A ⊢ (x : vt = e) ⇒ x ↦ vt

    T(g) = t    G(g) = {g1, ..., gn}    t <: T(g1)  ...  t <: T(gn)
    ----------------------------------------------------------------
                  A ⊢ (g ≈ t <: g1 ... gn) ⇒ ∅

    A[x1 ↦ t1, ..., xn ↦ tn] ⊢ v1 : t1  ...  A[x1 ↦ t1, ..., xn ↦ tn] ⊢ vn : tn
    ------------------------------------------------------------------------------
    A ⊢ (rec x1 : t1 = v1 ... xn : tn = vn) ⇒ x1 ↦ t1, ..., xn ↦ tn

    A ⊢ e : t    t <: t'
    ---------------------
         A ⊢ e : t'

Fig. 3. Typing rules for simple expressions.
A is a type assignment that maps variables to types. The rules also refer to two global maps T and G. Map T : Tag → Type associates types with tags, and map G : Tag → P(Tag) associates sets of tags with tags. An IL expression e is typable if there exist maps T and G and a typing derivation concluding ⊢ e : t.

Most of the typing rules for simple expressions are standard; we discuss only the exceptions. The last three rules produce environments for declarations. The rule for a tag declaration g ≈ t <: g1 ... gn requires the global map T to associate g with type t, and the map G to associate g with the set {g1, ..., gn}. T allows the type associated with g to be recovered by language operations such as typecase. G abstracts the Java type hierarchy and allows language operations such as typecase to test relations in this hierarchy. For soundness, the typing rule requires that the types associated with tags related in G be similarly related under subtyping; that is, if g is declared to be a subtag of g', then T(g) <: T(g').

Figure 4 defines the typing rules for records and related expressions. We explain only the non-standard rules here. A tag has type tag, provided that T and G associate it with appropriate types and supertags.

    g ∈ Dom(T)    g ∈ Dom(G)
    -------------------------
          A ⊢ g : tag

    A ⊢ e1 : ft1  ...  A ⊢ en : ftn
    --------------------------------------------------------------------
    A ⊢ {x1 : ft1 = e1, ..., xn : ftn = en} : {x1 : ft1, ..., xn : ftn}

      where: if x1 = tag and ft1 = tag, then e1 = g and T(g) = {x1 : ft1, ..., xn : ftn};
             if xi = array and fti = t array, then ei = e1' ... em' and xj = length and ej = m and i < j;
             if xi = array and fti = rt array, then xk = elemtag and ek = g' and T(g') = rt and k < i < j.

    A ⊢ e : D(vt)
    --------------
     A ⊢ e : vt

    A ⊢ e1 : t  ...  A ⊢ en : t
    ----------------------------
     A ⊢ e1 ... en : t array

    A ⊢ e : {..., x : ft}
    ----------------------
      A ⊢ e.x : D(ft)

    A ⊢ e1 : {..., x : t var}    A ⊢ e2 : t
    ----------------------------------------
            A ⊢ e1.x := e2 : void

    A ⊢ e : {{tag : tag, x : ft}}
    ------------------------------
          A ⊢ e@x : D(ft)

    A ⊢ e1 : {{tag : tag, x : t var}}    A ⊢ e2 : t
    ------------------------------------------------
                A ⊢ e1@x := e2 : void

    A ⊢ e1 : {..., array : t array, ..., length : int}    A ⊢ e2 : int
    -------------------------------------------------------------------
                          A ⊢ e1[e2] : t

    A ⊢ e1 : {..., array : pt array, ..., length : int}    A ⊢ e2 : int    A ⊢ e3 : pt
    -----------------------------------------------------------------------------------
                              A ⊢ e1[e2] := e3 : void

    A ⊢ e1 : {..., elemtag : tag, ..., array : rt array, ..., length : int}
    A ⊢ e2 : int    A ⊢ e3 : t    t <: {{tag : tag}}
    ------------------------------------------------------------------------
                         A ⊢ e1[e2] :={} e3 : void

    A ⊢ e0 : {{tag : tag}}    A[x1 ↦ T(g1)] ⊢ e1 : t  ...  A[xn ↦ T(gn)] ⊢ en : t    A ⊢ en+1 : t
    -----------------------------------------------------------------------------------------------
            A ⊢ typecase e0 of g1 as x1 => e1 ... gn as xn => en else en+1 : t

    A ⊢ e1 : t    A ⊢ e2 : {{tag : tag}} → t
    -----------------------------------------
             A ⊢ try e1 e2 : t

    A ⊢ e : {{tag : tag}}
    ----------------------
    A ⊢ raise e : void

Fig. 4. Typing rules for records and related expressions.

Record expressions receive ordered record types with several restrictions. First, if the first field is named tag and has type tag, then its initializer must be a tag g whose type in T is the type of the entire record. This ensures that a record's type can be recovered from its tag. Second, a field may have an array initializer of length m if and only if the field's name is array and there is a field named length whose initializer is the constant m. This restriction ensures that the length field can be used for bounds checking accesses to the array. Third, if an array field is present of type rt array where rt is a record type, then the record must include a field named elemtag whose initializer is a tag corresponding to rt. Array update uses the elemtag field to perform its runtime type check. The third and fourth typing rules handle initializers for fields.

The rules for array access and update require e1 to be a record containing array and length fields. The rule for update where the component type is a record type additionally requires an elemtag field. The rule for typecase requires that the expression being tested have a record type including a tag field. For each clause gi as xi => ei, variable xi is bound in ei to T(gi), since the typing rule for record construction ensures that any record containing tag gi will have type T(gi). Finally, the typing rules for exception constructs require the exception be a tagged record, as the translation uses typecase within handlers to distinguish different exceptions.

Provided that array access and update operations perform bounds checks, this type system is sound. But to achieve high performance code, we need to lift array bounds checks out of loops or eliminate them entirely. Our IL is designed so that a safe array access operation can be replaced with a combination of an explicit test and a corresponding unsafe operation. For instance, we replace e1[e2] with

    let a = e1
        i = e2
    in if i >= 0 & i < a.length
       then unsafe a[i]
       else raise IndexOutOfBoundsException

The explicit tests so introduced can then be optimized as usual.
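The same rewrite can be pictured in Java, though this is only an analogy: Java exposes no unsafe array operation, so the a[i] in the then-branch merely stands in for the IL's check-free access.

    class Bounds {
        // Models the expansion of e1[e2]: an explicit guard around an access
        // that no longer needs its own check.
        static int checkedGet(int[] a, int i) {
            if (i >= 0 && i < a.length)
                return a[i];   // stands in for the unsafe, check-free access
            else
                throw new IndexOutOfBoundsException();
        }
    }

Once the test is explicit, a loop such as for (i = 0; i < a.length; i++) already establishes the guard, so standard optimizations can hoist or remove the per-access test.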
3 Translation
The translation from Java to our intermediate language of records and procedures:

- replaces method dispatch with simple record accesses and a first-order procedure call;
- passes object state explicitly through this parameters that are treated no differently from any other function parameter;
- supports efficient implementation of member access by representing objects as ordered records;
- replaces Java's complex mechanisms for name resolution (visibility keywords, overloading, super, inner classes, and packages) with ordinary static scoping;
- flattens the class inheritance hierarchy by explicitly including record fields defined by superclasses;
- expresses method sharing among objects of the same class by placing procedures that implement the methods in a shared record;
- accommodates subtyping between Java classes by assigning type T to this and using typecase to recover the appropriate type;
    class Point {
        int x;
        int y = 3;
        Point() { x = 2; }
        public void mv( int dx, int dy ) { x += dx; y += dy; }
        public boolean eq( Point other ) { return (x == other.x && y == other.y); }
        Point like() { Point p = new Point(); p.x = x; p.y = y; return p; }
    }

    class ColorPoint extends Point {
        int c;
        public boolean eq( Point other ) {
            if ( other instanceof ColorPoint )
                return super.eq( other ) && c == ((ColorPoint) other).c;
            else
                return false;
        }
        ColorPoint sc( int c ) { this.c = c; return this; }
        ColorPoint() { super(); }
        ColorPoint( int c ) { super(); this.c = c; }
    }

Fig. 5. Example Java classes.
- uses type tags on records to support runtime type tests and casts;
- accommodates interface subtyping by using unordered record types;
- lifts static methods and constructor and initialization procedures out of classes and represents them as top-level procedures;
- expresses class initialization as explicit tests and calls that can be optimized;
- replaces implicit conversions on primitive types with explicit operations, eliminates widening conversions in favor of implicit subtyping, and expresses narrowing conversions with typecase;
- expresses local control constructs (for, while, break, etc.) with uses of tail-recursive procedures;
- places lock and unlock instructions where control enters or leaves synchronized blocks.

In this section, we illustrate some aspects of this translation with examples. All Java objects implicitly extend class Object and hence have members such as clone and getClass, but we omit such members in these examples to simplify the presentation.

Figure 5 presents Java code defining two classes Point and ColorPoint. In the example, a Point object contains x and y coordinate fields, and methods to move a point (mv), test whether two points are the same (eq), and clone a new point from the current one (like). Class ColorPoint inherits from Point and adds a color field c. ColorPoint overrides the eq method of Point and also provides a new method sc to set its color. The ColorPoint class declares two constructors. The first initializes a new ColorPoint object with a default color; the second sets the color field explicitly to the color supplied as an argument.
    tp = μα.{ tag : tag,
              methods : { mv : {{tag : tag}} × int × int → void,
                          eq : {{tag : tag}} × α → boolean,
                          like : {{tag : tag}} → α },
              x : int var,
              y : int var }

    tc = μβ.{ tag : tag,
              methods : { mv : {{tag : tag}} × int × int → void,
                          eq : {{tag : tag}} × tp → boolean,
                          like : {{tag : tag}} → tp,
                          sc : {{tag : tag}} × int → β },
              x : int var,
              y : int var,
              c : int var }

Fig. 6. Types for Point and ColorPoint objects in the IL.
Figure 6 presents the record types corresponding to Point and ColorPoint. In general, records corresponding to objects include a tag field, a methods field, and fields for the instance variables, both explicit and inherited. The methods field contains a record of functions corresponding to the instance methods of the class, both explicit and inherited. Initially, this record is shared by all objects of the class, although optimizations may replace it with a record of specialized functions in certain objects. The functions take an additional first argument which is the object record itself. The IL types do not include fields for constructors or static methods as these procedures are called directly without selecting them from an object.

The types of mv and eq in tc and tp are the same. This is because Java requires that an overriding method be of the same type as the overridden one. Since tc has at least the same fields as tp, and since the members in the shared prefix have the same type, we have tc <: tp. Hence a record denoting a ColorPoint can be passed to a function that expects a record denoting a Point.

A program in our intermediate language consists of a set of mutually recursive values corresponding to methods, constructors, and method tables. Other than references to other top-level definitions, these procedures have no free variables. Notably, this is supplied as an explicit argument, unlike its treatment in Java and other object-based languages. This property facilitates code-movement optimizations on our IL such as inlining.
    let tagP ≈ tp
    rec newP : → tp =
          λ . { tag : tag = tagP,
                methods : ... = Pmethods,
                x : int var = 0,
                y : int var = 0 }
        initP : tp → void =
          λ this : tp . this.y := 3; this.x := 2
        Pmethods : ... = { mv : ... = mvP, eq : ... = eqP, like : ... = likeP }
        mvP : {{tag : tag}} × int × int → void =
          λ this : {{tag : tag}}, dx : int, dy : int .
            typecase this of
              tagP as this => this.x := this.x + dx; this.y := this.y + dy
              else raise CompilerError
        eqP : {{tag : tag}} × tp → boolean =
          λ this : {{tag : tag}}, other : tp .
            typecase this of
              tagP as this => if not(this.x == other.x) then false
                              else this.y == other.y
              else raise CompilerError
        likeP : {{tag : tag}} → tp =
          λ this : {{tag : tag}} .
            typecase this of
              tagP as this => let it = newP() in initP(it); it
              else raise CompilerError
    in ...

Fig. 7. Translation of Point class.
Figure 7 shows the translation of the Point class. We elide some types that are obvious from context. The translation generates a procedure newP for constructing new Point objects, a procedure initP for initializing them, a record of functions corresponding to the methods of the class, and the functions themselves. Each method function dispatches on the type of its first argument. A tag encodes the static type of an object; this type is examined at runtime using typecase. Thus, if mv is invoked by an object that is not a Point, the argument tag supplied in the call will not be a subtag of tagP, and a runtime exception will be raised. Such an error will not be caught at compile-time because the type expected by mv for this argument is T = {{tag : tag}}. Indeed, T is the self type expected by all translated methods.

Figure 8 shows the translation of the ColorPoint class. An interesting aspect of ColorPoint's definition is its use of super. Calls to super in ColorPoint constructors are translated to calls to initP. The call super.eq( other ) becomes a direct call to eqP since Java's semantics dictate that such uses of super bypass the usual dynamic method dispatch.
    let ... code for Point ... in
    let tagC ≈ tc <: tagP
    rec newC : → tc =
          λ . { tag : tag = tagC,
                methods : ... = Cmethods,
                x : int var = 0,
                y : int var = 0,
                c : int var = 0 }
        initC1 : tc → void = λ this : tc . initP(this)
        initC2 : tc × int → void = λ this : tc, c : int . initP(this); this.c := c
        Cmethods : ... = { mv : ... = mvP, eq : ... = eqC, like : ... = likeP, sc : ... = scC }
        eqC : {{tag : tag}} × tp → boolean =
          λ this : {{tag : tag}}, other : tp .
            typecase this of
              tagC as this =>
                if (typecase other of tagC as other => true else false)
                then (if not(eqP(this, other)) then false
                      else this.c == (typecase other of
                                        tagC as other => other
                                        else raise CastException).c)
                else false
              else raise CompilerError
        scC : {{tag : tag}} × int → tc =
          λ this : {{tag : tag}}, c : int .
            typecase this of
              tagC as this => this.c := c; this
              else raise CompilerError
    in ...

Fig. 8. Translation of ColorPoint class.
Figure 9 illustrates a Java interface Widget and its corresponding type t w in our IL. Since the classes that implement Widget may have methods in different orders, the methods field of t w has an unordered record type. If we amend Point to implement Widget, the translated types t~, and t~ for Point and ColorPoint, also shown in Figure 9, include a tag field in their methods record to achieve the subtyping t~ <: t~ <: t w .
23 interface Widget { boolean eq( Point other ); void mv( int dx, int dy );
} t w = { tag: tag,
methods: ~ tag: tag, eq: ~tag : t a g } x t~p -~ boolean, my:
}
~tag : t a g } x
i n t x int -+ void
} t~ = pc~. { tag:
tag,
methods: { tag: tag, my: {tag : t a g } x int x int --~ void, eq: ~tag : t a g } x a --~ boolean, like: ~tag : t a g } -~ a
}, x: int var, y:
int var
} t~ = #f~. { tag:
tag,
methods: { tag: tag, my: ~tag : t a g } x int x int -~ void, eq: ~tag : t a g } x t~ -~ boolean, like: {tag : t a g } --~ tip, sc: ~tag : t a g } x int -~
},
x: i n t vat,
y: int c: i n t
vat, vat
Fig. 9. Interface Widget and types for Widget, Point, and ColorPoint.
4
Related Work
Optimizations for object-oriented languages, type systems for object-oriented languages, and typed intermediate languages are three topics that have been investigated independently by other researchers and relate to the work presented here. O p t i m i z a t i o n s for O b j e c t s An important issue addressed by optimizing compilers for object-oriented languages is reducing the overhead introduced by encoding polymorphism. Statically-typed object-oriented languages such as Java support polymorphism
24 through subclassing. Subclasses share implementations with their parents. Because methods can be overridden to provide alternative implementations, the exact method invoked at a call site may not be easily determined at compile time. Indeed, without aggressive analyses, compilers are unlikely to determine the control flow of a program that makes any significant use of inheritance. On the other hand, relying only on intraprocedural optimization may not be effective because methods are usually short and make frequent calls to other methods. There are two main ways of eliminating the dispatch at a call x.(...). Either (i) the value of the receiver x can be of only one type T, in which case we can call T's method f directly, or (ii) x can be of any of the types in a set S, but all types in S share the same implementation of f , in which case we can call f directly. Concrete type inference and class hierarchy analysis are two well-known analyses that have been devised to address the issue of dispatch elimination. Concrete Type Inference 14, 17, 9, 10 is a form of flow analysis that identifies, for each expression, the set of possible types its values may belong to. When a receiver is found to have only one possible type, the method dispatch can be replaced by a direct function call to that type's method. Class Hierarchy Analysis 9, 4, 10 is a program analysis that, based solely on the program's class structure, identifies a set of types S that share the same implementation of method . An example of such a set is the set containing class C and all subclasses of C that do not override f . Such sets can be computed either from programmer's annotations ("final" in Java) or from inspection of the complete class hierarchy. The analysis can be adapted to work, although less beneficially, in the presence of separate compilation, where implementations are separated from interfaces. In such cases it is still possible to eliminate method dispatch at link time 11. Even if the above analyses are unable to identify a call site as calling a unique function, it may still be possible to optimize the program by using a type-case statement with execution branching on the exact type of the value to code specific to each possible type 6. Message splitting is a variation of this technique which consists of duplicating not only the method call on each branch of the type case, but subsequent statements as well, whenever this enables further optimizations. Dynamically typed languages, and to a lesser extent statically typed languages, could benefit from type feedback--information about the set of concrete types that a receiver is observed to have during program's execution. Comparison of type feedback with either class hierarchy analysis 9 or concrete type inference 12 shows it to be a valuable technique. In contrast to our typed intermediate language, the intermediate language on which these optimizations have typically been performed is an untyped controlflow graph. Low-level nodes in the graph are used to represent arithmetic operations, assignments, conditional branches, etc. High-level nodes are used to represent the semantics of method calls 6. High-level nodes help the compiler postpone code-generation decisions for method dispatch until after optimizations aimed at replacing method calls with direct function calls are performed, Re-
25 maining method dispatches are then translated into more primitive operations, and the code is then subject to further intra-procedural optimizations. This approach is well-suited for implementing dynamically typed languages, where a method dispatch can be a rather heavy-weight construct. On the other hand, in a statically typed language with single inheritance such as Java, method dispatch consists of fetching a function pointer from a record from a known offset, and calling that function. We believe that in such a setting, an intermediate language based on first order functions and records is a viable alternative. All the complicated constructs of the source language, including method dispatch, are translated into simpler operations. Flow analysis techniques used to drive interprocedural optimizations for functional languages can be directly applied to our intermediate language and need not be modified to understand the nuances of method dispatch. By having available the function tables constructed for each type, analyses can still compute a reasonably precise conservative approximation to the set of methods called at a call site, facilitating optimizations like inlining.
Type Systems for Objects In designing our typed IL for Java, we considered and rejected several alternatives. A naive attempt to translate Java into a record-based IL uses the same language and type system as ours, but gives self parameters the object's record type rather than T. That is, my, eq, and like in class Point all expect a value of type tp for their first argument. This solution fails because, translating ColorPoint the same way, we no longer have tc <: tp due to contravariant subtyping of functions. Hence many Java-typable programs are not typable under such a translation. Several object calculi have most of the language features found in Java and support the necessary subtyping 1, 5. However, in these calculi, self parameters are implicitly bound, and method dispatch is not broken down into separate function selection and procedure call mechanisms. Consequently it would be difficult to adapt existing techniques for optimizing procedural languages to such calculi. Moreover, the complexity of these calculi make them inappropriate as the foundation for an IL. Finally languages that employ a split-self semantics represent an object as a pair of a record containing the object's state and a record containing the object's code 16. They use existential types to achieve subtyping, and include pack and unpack operations to manipulate values of existential type. The encoding of objects in this style is complex and unwieldy for use in a compiler.
Typed Intermediate Languages Several advanced functional language implementations have embraced the use of a typed intermediate language to express optimizations and transformations 18, 20. The motivation for using a typed intermediate language holds equally well in the context of a Java implementation. Like most functional languages, Java has a rich type system and requires aggressive compiler optimization to achieve
26 acceptable performance. However, while the intermediate language type systems developed for functional language implementations have been based on a polymorphic A-calculus, the type system in our IL more closely reflects features found in Java. Thus, it provides record subtyping to express single inheritance, unordered record types to express interfaces, and a tag type to express runtime type inspection. To summarize, our typed intermediate language for Java serves three major roles: (1) it gives us increased confidence in the correctness of optimizations; (2) it exposes salient properties of an object's representation that may be then optimized; and (3) it facilitates type-specific decisions throughout the compiler and runtime system. We are confident that a typed intermediate language of this kind will be instrumental in realizing a high-performance Java implementation. References 1. ABADI, M., AND CARDELLI,L. A Theory of Objects. Springer-Verlag, 1996. 2. ABADI, M., CARDELLI,L., AND VISWANATHAN,R. An Interpretation of Objects and Object Types. In Proceedings of the Conference on Principles of Programming Languages (1996), pp. 392-406. 3. APPEL, A. W. Compiling with Continuations. Cambridge University Press, 1991. 4. BACON, D., AND SWEENEY, P. Fast static analysis of C++ virtual function calls. OOPSLA '96 Conference on Object-Oriented Programming Systems, Languages, and Applications (1996). 5. BRUCE, K. B., CARDELLI,L., AND PIERCE, B. C. Comparing Object Encodings. In Theoretical Aspects of Computer Software (TACS), Sendal, Japan (Sept. 1997). 6. CHAMBERS, C. The Design and Implementation of the SELF Compiler, an Optimizing Compiler for Object-Oriented Programming Languages. PhD thesis, Stanford University, March 1992. 7. CYTRON,R., FERRANTE,J., ROSEN, B. K., WEGMAN,M. N., AND ZADECK,F. K. Efficiently Computing Static Single Assignment Form and the Control Dependence Graph. TOPLAS 13, 4 (October 1991), 451-490. 8. DEAN, J., DEFOUW, G., GROVE, D., LITVINOV,V., AND CHAMBERS,C. Vortex: An optimizing compiler for object-oriented languages. OOPSLA '96 Conference on Object-Oriented Programming Systems, Languages, and Applications (1996), 83-100. 9. DEAN, J., GROVE, D., AND CHAMBERS,C. Optimization of object-oriented programs using static class hierarchy analysis. ECOOP (1995). 10. DIWAN, A., MOSS, E., AND McKINLEY, K. Simple and effective analysis of statically-typed object-oriented programs. OOPSLA '96 Conference on ObjectOriented Programming Systems, Languages, and Applications (1996). 11. FERNANDEZ,M. F. Simple and effective link-time optimization of modula-3 programs. Proceedings of the Conference on Programming Language Design and Implementation (1995), 103-115. 12. HOLZLE,U., AND AGESEN, 0. Dynamic versus static optimization techniques for object-oriented languages. OOPSLA '95 Conference on Object-Oriented Programming Systems, Languages, and Applications (1995). 13. KRANZ, D., KELSEY, R., REES, J. A., HUDAK, P., PHILBIN, J., AND ADAMS, N . I . Orbit: An optimizing compiler for scheme. ACM SIGPLAN Conference Proceedings (1986).
27 14. PALSBERG, J., AND SCHWARTZBACH,M. I. Object-oriented type inference. OOP-
15.
16.
17.
18. 19. 20.
SLA '91 Conference on Object-Oriented Programming Systems, Languages, and Applications (1991), 146-161. PEYTON-JONES, S., LAUNCHBURY, J., SHIELDS, M., AND TOLMACH, A. Briding the gulf: A common intermediate language for ML and Haskell. In Proceedings of the Conference on Principles of Programming Languages (1998), ACM Press, pp. 49-61. PIERCE, B. C., AND TURNER, D. N. Simple type-theoretic foundations for objectoriented programming. Journal of Functional Programming 4, 2 (Apr. 1994), 207247. A preliminary version appeared in Principles of Programming Languages, 1993, and as University of Edinburgh technical report ECS-LFCS-92-225, under the title "Object-Oriented Programming Without Recursive Types". PLEVYAK,J., AND CHIEN, A. A. Precise concrete type inference for object-oriented languages. OOPSLA '94 Object-Oriented Programming Systems, Language, and Applications (1994), 324-340. SHAO, Z. Flexible Representation Analysis. In Proceedings of the International Conference on Functional Programming (1997), ACM Press, pp. 85-98. STEELE JR., G. L. Rabbit: a compiler for scheme. Master's thesis, Massachusetts Institute of Technology, May 1977. TARDITI, D., MORRISETT, G., CHENG, P., STONE, C., HARPER, R., AND LEE, P. TIL: A Type-Directed Optimizing Compiler for ML. In Proceedings of the Conference on Programming Language Design and Implementation (1996), ACM Press, pp. 181-192.
Stack-Based Typed Assembly Language *

Greg Morrisett, Karl Crary, Neal Glew, and David Walker

Cornell University
Abstract. In previous work, we presented a Typed Assembly Language (TAL). TAL is sufficiently expressive to serve as a target language for compilers of high-level languages such as ML. This work assumed such a compiler would perform a continuation-passing style transform and eliminate the control stack by heap-allocating activation records. However, most compilers are based on stack allocation. This paper presents STAL, an extension of TAL with stack constructs and stack types to support the stack allocation style. We show that STAL is sufficiently expressive to support languages such as Java, Pascal, and ML; constructs such as exceptions and displays; and optimizations such as tail call elimination and callee-saves registers. This paper also formalizes the typing connection between CPS-based compilation and stack-based compilation and illustrates how STAL can formally model calling conventions by specifying them as formal translations of source function types to STAL types.
1 Introduction and Motivation
Statically typed source languages have efficiency and software engineering advantages over their dynamically typed counterparts. Modern type-directed compilers [19, 25, 7, 32, 20, 29, 12] exploit the properties of typed languages more extensively than their predecessors by preserving type information computed in the front end through a series of typed intermediate languages. These compilers use types to direct sophisticated transformations such as closure conversion [18, 31, 17, 1, 21], region inference [8], subsumption elimination [9, 11], and unboxing [19, 22, 28]. Without types these transformations are, in many cases, less effective or impossible. Furthermore, the type translation partially specifies the corresponding term translation and often captures the critical concerns in an elegant and succinct fashion. Strong type systems not only describe but also enforce many important invariants. Consequently, developers of type-based compilers may invoke a type-checker after each code transformation, and if the output fails to type-check, the developer knows that the compiler contains an internal error. Although type-checkers for decidable type systems will not catch all compiler errors, they have proven themselves valuable debugging tools in practice [24].

* This material is based on work supported in part by the AFOSR grant F49620-97-1-0013, ARPA/RADC grant F30602-96-1-0317, ARPA/AF grant F30602-95-1-0047, and AASERT grant N00014-95-1-0985. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not reflect the views of these agencies.
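A minimal Java sketch of the re-checking practice described above; every name here (Program, Transform, Checker) is hypothetical, invented only to illustrate the workflow:

    import java.util.List;

    class CheckedPipeline {
        interface Program {}
        interface Transform {
            Program apply(Program p);
            String name();
        }
        interface Checker { boolean check(Program p); }

        // Re-run the intermediate-language type-checker after every pass, so a
        // typing bug is pinned to the transformation that introduced it.
        static Program run(Program p, List<Transform> passes, Checker checker) {
            for (Transform t : passes) {
                p = t.apply(p);
                if (!checker.check(p))
                    throw new IllegalStateException(
                        "pass " + t.name() + " produced ill-typed code");
            }
            return p;
        }
    }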
Despite the numerous advantages of compiling with types, until recently, no compiler propagated type information through the final stages of code generation. The TIL/ML compiler, for instance, preserves types through approximately 80% of compilation but leaves the remaining 20% untyped. Many of the complex tasks of code generation including register allocation and instruction scheduling are left unchecked; types are not used to specify or explain these low-level code transformations.

These observations motivated our exploration of very low-level type systems and corresponding compiler technology. In Morrisett et al. [23], we presented a typed assembly language (TAL) and proved that its type system was sound with respect to an operational semantics. We demonstrated the expressiveness of this type system by sketching a type-preserving compiler from an ML-like language to TAL. The compiler ensured that well-typed source programs were always mapped to well-typed assembly language programs and that they preserved source level abstractions such as user-defined abstract data types and closures. Furthermore, we claimed that the type system of TAL did not interfere with many traditional compiler optimizations including inlining, loop-unrolling, register allocation, instruction selection, and instruction scheduling.

However, the compiler we presented was critically based on a continuation-passing style (CPS) transform, which eliminated the need for a control stack. In particular, activation records were represented by heap-allocated closures as in the SML of New Jersey compiler (SML/NJ) [2, 3]. For example, Figure 1 shows the TAL code our heap-based compiler would produce for the recursive factorial computation. Each function takes an additional argument which represents the control stack as a continuation closure. Instead of "returning" to the caller, a function invokes its continuation closure by jumping directly to the code of the closure, passing the environment of the closure and the result in registers.

Allocating continuation closures on the heap has many advantages over a conventional stack-based implementation. First, it is straightforward to implement control primitives such as exceptions, first-class continuations, or user-level lightweight coroutine threads [3, 31, 34]. Second, Appel and Shao [5] have shown that heap allocation of closures can have better space properties, primarily because it is easier to share environments. Third, there is a unified memory management mechanism (namely the garbage collector) for allocating and collecting all kinds of objects, including activation frames. Finally, Appel and Shao [5] have argued that, at least for SML/NJ, the locality lost by heap-allocating activation frames is negligible.

Nevertheless, there are also compelling reasons for providing support for stacks. First, Appel and Shao's work did not consider imperative languages, such as Java, where the ability to share environments is greatly reduced, nor did it consider languages that do not require garbage collection. Second, Tarditi and Diwan [14, 13] have shown that with some cache architectures, heap allocation of continuations (as in SML/NJ) can have substantial overhead due to a loss of locality. Third, stack-based activation records can have a smaller memory footprint than heap-based activation records. Finally, many machine architectures have hardware mechanisms that expect programs to behave in a stack-like fashion. For example, the Pentium Pro processor has an internal stack that it uses to predict return addresses for procedures so that instruction pre-fetching will not be stalled [16]. The internal stack is guided by the use of call/return primitives which use the standard control stack.

Clearly, compiler writers must weigh a complex set of factors before choosing stack allocation, heap allocation, or both. The target language must not constrain these design decisions. In this paper, we explore the addition of a stack to our typed assembly language in order to give compiler writers the flexibility they need. Our stack typing discipline is remarkably simple, but powerful enough to compile languages such as Pascal, Java, or ML without adding high-level primitives to the assembly language. More specifically, the typing discipline supports stack allocation of temporary variables and values that do not escape, stack allocation of procedure activation frames, exception handlers, and displays, as well as optimizations such as callee-saves registers. Unlike the JVM architecture [20], our system does not constrain the stack to have the same size at each control-flow point, nor does it require new high-level primitives for procedure call/return. Instead, our assembly language continues to have low-level RISC-like primitives such as loads, stores, and jumps. However, source-level stack allocation, general source-level stack pointers, general pointers into either the stack or heap, and some advanced optimizations cannot be typed.

A key contribution of the type structure is that it provides a unifying declarative framework for specifying procedure calling conventions regardless of the allocation strategy. In addition, the framework further elucidates the connection between a heap-based continuation-passing style compiler, and a conventional stack-based compiler. In particular, this type structure makes explicit the notion that the only differences between the two styles are that, instead of passing the continuation as a boxed, heap-allocated tuple, a stack-based compiler passes the continuation unboxed in registers and the environments for continuations are allocated on the stack. The general framework makes it easy to transfer transformations developed for one style to the other. For instance, we can easily explain the callee-saves registers of SML/NJ [2-4] and the callee-saves registers of a stack-based compiler as instances of a more general CPS transformation that is independent of the continuation representation.

2 Overview of TAL and CPS-Based Compilation

In this section, we briefly review our original proposal for typed assembly language (TAL) and sketch how a polymorphic functional language, such as ML, can be compiled to TAL in a continuation-passing style, where continuations are heap-allocated.

Figure 2 gives the syntax for TAL. Programs (P) are triples consisting of a heap, register file, and instruction sequence. Heaps map labels to heap values which are either tuples of word-sized values or code sequences. Register files map registers to word-sized values. Instruction sequences are lists of instructions
    (H, {}, I) where

    H = l_fact:
          code[]{r1:⟨⟩, r2:int, r3:τk}.
            bneq r2, l_nonzero
            unpack [α, r3], r3       % zero branch: call k (in r3) with 1
            ld r4, r3(0)             % project k code
            ld r1, r3(1)             % project k environment
            mov r2, 1
            jmp r4                   % jump to k
        l_nonzero:
          code[]{r1:⟨⟩, r2:int, r3:τk}.
            sub r4, r2, 1            % n-1
            malloc r5[int, τk]       % create environment for cont in r5
            st r5(0), r2             % store n into environment
            st r5(1), r3             % store k into environment
            malloc r3[∀[].{r1:⟨int¹, τk¹⟩, r2:int}, ⟨int¹, τk¹⟩]  % create cont closure in r3
            mov r2, l_cont
            st r3(0), r2             % store cont code
            st r3(1), r5             % store environment ⟨n, k⟩
            mov r2, r4               % arg := n-1
            mov r3, pack [⟨int¹, τk¹⟩, r3] as τk   % abstract environment type
            jmp l_fact               % recursive call
        l_cont:
          code[]{r1:⟨int¹, τk¹⟩, r2:int}.          % r2 contains (n-1)!
            ld r3, r1(0)             % retrieve n
            ld r4, r1(1)             % retrieve k
            mul r2, r3, r2           % n × (n-1)!
            unpack [α, r4], r4       % unpack k
            ld r3, r4(0)             % project k code
            ld r1, r4(1)             % project k environment
            jmp r3                   % jump to k
        l_halt:
          code[]{r1:⟨⟩, r2:int}.
            mov r1, r2
            halt[int]                % halt with result in r1

    and I = malloc r1[]              % create empty environment (⟨⟩)
            malloc r2[]              % create empty environment (⟨⟩)
            malloc r3[∀[].{r1:⟨⟩, r2:int}, ⟨⟩]   % create halt closure in r3
            mov r4, l_halt
            st r3(0), r4             % store cont code
            st r3(1), r2             % store environment ⟨⟩
            mov r2, 6                % load argument (6)
            mov r3, pack [⟨⟩, r3] as τk          % abstract environment type
            jmp l_fact               % begin fact with {r1 = ⟨⟩, r2 = 6, r3 = halt cont}

Fig. 1. Typed Assembly Code for Factorial
    types                  τ ::= α | int | ∀[∆].Γ | ⟨τ1^φ1, ..., τn^φn⟩ | ∃α.τ
    initialization flags   φ ::= 0 | 1
    label assignments      Ψ ::= {ℓ1:τ1, ..., ℓn:τn}
    type assignments       ∆ ::= α1, ..., αn
    register assignments   Γ ::= {r1:τ1, ..., rn:τn}
    registers              r ::= r1 | ... | rk
    word values            w ::= ℓ | i | ?τ | w[τ] | pack [τ, w] as τ'
    small values           v ::= r | w | v[τ] | pack [τ, v] as τ'
    heap values            h ::= ⟨w1, ..., wn⟩ | code[∆]Γ.I
    heaps                  H ::= {ℓ1 ↦ h1, ..., ℓn ↦ hn}
    register files         R ::= {r1 ↦ w1, ..., rn ↦ wn}
    instructions           ι ::= aop rd, rs, v | bop r, v | ld rd, rs(i) | malloc r[τ1, ..., τn] |
                                 mov rd, v | st rd(i), rs | unpack [α, rd], v
    arithmetic ops         aop ::= add | sub | mul
    branch ops             bop ::= beq | bneq | bgt | blt | bgte | blte
    instruction sequences  I ::= ι; I | jmp v | halt[τ]
    programs               P ::= (H, R, I)

Fig. 2. Syntax of TAL
that the first component is now initialized. The instruction malloc r[τ1, ..., τn] heap-allocates a new tuple with uninitialized fields and places its label in register r.
TAL code types (∀[α1, ..., αn].Γ) describe code blocks (code[α1, ..., αn]Γ.I), which are instruction sequences, that expect a register file of type Γ and in which the type variables α1, ..., αn are held abstract. In other words, Γ serves as a register file pre-condition that must hold before control may be transferred to the code block. Code blocks have no post-condition because control is either terminated via a halt instruction or transferred to another code block.
The type variables that are abstracted in a code block provide a means to write polymorphic code sequences. For example, the polymorphic code block

    code[α]{r1:α, r2:∀[].{r1:⟨α^1, α^1⟩}}.
      malloc r3[α, α]
      st   r3(0), r1
      st   r3(1), r1
      mov  r1, r3
      jmp  r2
roughly corresponds to a CPS version of the ML function fn (x:α) => (x, x). The block expects upon entry that register r1 contains a value of the abstract type α, and r2 contains a return address (or continuation label) of type ∀[].{r1:⟨α^1, α^1⟩}. In other words, the return address requires register r1 to contain an initialized pair of values of type α before control can be returned to this address. The instructions of the code block allocate a tuple, store into the tuple two copies of the value in r1, move the pointer to the tuple into r1, and then jump to the return address in order to "return" the tuple to the caller. If the code block is bound to a label ℓ, then it may be invoked by simultaneously instantiating the type variable and jumping to the label (e.g., jmp ℓ[int]).
Source languages like ML have nested higher-order functions that might contain free variables and thus require closures to represent functions. At the TAL level, we represent closures as a pair consisting of a code block label and a pointer to an environment data structure. The type of the environment must be held abstract in order to avoid typing difficulties [21], and thus we pack the type of the environment and the pair to form an existential type. All functions, including continuation functions introduced during CPS conversion, are thus represented as existentials. For example, once CPS converted, a source function of type int -> () has type (int, (() → void)) → void.† After closures are introduced, the code will have type:

    ∃α1.((α1, int, ∃α2.((α2, ()) → void, α2)) → void, α1)

Finally, at the TAL level the function will be represented by a value with the type:

    ∃α1.⟨∀[].{r1:α1, r2:int, r3:∃α2.⟨∀[].{r1:α2, r2:⟨⟩}^1, α2^1⟩}^1, α1^1⟩

† The void return types are intended to suggest the non-returning aspect of CPS code.
Here, α1 is the abstracted type of the closure's environment. The code for the closure requires that the environment be passed in register r1, the integer argument in r2, and the continuation in r3. The continuation is itself a closure where α2 is the abstracted type of its environment. The code for the continuation closure requires that the environment be passed in r1 and the unit result of the computation in r2.
To apply a closure at the TAL level, we first use the unpack operation to open the existential package. Then the code and the environment of the closure pair are loaded into appropriate registers, along with the argument to the function. Finally, we use a jump instruction to transfer control to the closure's code. Figure 1 gives the CPS-based TAL code for the following ML expression, which computes six factorial:

    let fun fact n = if n = 0 then 1 else n * (fact (n - 1))
    in fact 6 end
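To see the structure that Figure 1 compiles, it may help to view the factorial after CPS conversion but before closure conversion. The following SML rendering is our own sketch, not compiler output: the continuation k plays the role of register r3, and the anonymous continuation corresponds to l_cont, whose free variables n and k become the environment tuple.

    (* CPS factorial: each call receives an explicit continuation. *)
    fun fact (n : int, k : int -> 'a) : 'a =
      if n = 0 then k 1                          (* zero branch: call k with 1 *)
      else fact (n - 1, fn res => k (n * res))   (* l_cont: multiply, then call k *)

    val _ = fact (6, fn r => r)                  (* initial (halt) continuation *)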
3 Adding Stacks to TAL
In this section, we show how to extend TAL to achieve a Stack-based Typed Assembly Language (STAL). Figure 3 defines the new syntactic constructs for the language. In what follows, we informally discuss the dynamic and static semantics for the modified language, leaving formal treatment to Appendix A.
types                 τ ::= ··· | ns
stack types           σ ::= ρ | nil | τ::σ
type assignments      Δ ::= ··· | ρ, Δ
register assignments  Γ ::= {r1:τ1, ..., rn:τn, sp:σ}
word values           w ::= ··· | ns
register files        R ::= {r1 ↦ w1, ..., rn ↦ wn, sp ↦ S}
stacks                S ::= nil | w::S
instructions          ι ::= ··· | salloc n | sfree n | sld rd, sp(i) | sst sp(i), rs

Fig. 3. Additions to TAL for Simple Stacks
Operationally, we model stacks (S) as lists of word-sized values. Uninitialized stack slots are filled with nonsense (ns). Register files now include a distinguished register, sp, which represents the current stack. There are four new instructions that manipulate the stack. The salloc n instruction places n words of nonsense on the top of the stack. In a conventional machine, assuming stacks grow towards lower addresses, an salloc instruction would correspond to subtracting n from the current value of the stack pointer. The sfree n instruction removes the top n words from the stack, and corresponds to adding n to the current stack pointer. The sld r, sp(i) instruction loads the ith word of the stack into register r, whereas the sst sp(i), r instruction stores register r into the ith word. Note that the instructions ld and st cannot be used with the stack pointer. A program becomes stuck if it attempts to execute:

- sfree n and the stack does not contain at least n words,
- sld r, sp(i) and the stack does not contain at least i + 1 words, or else the ith word of the stack is ns, or
- sst sp(i), r and the stack does not contain at least i + 1 words.

As in the original TAL, the typing rules for the modified language prevent well-formed programs from becoming stuck. Stacks are described by stack types (σ), which include nil and τ::σ. The latter represents a stack of the form w::S where w has type τ and S has type σ. Stack slots filled with nonsense have type ns. Stack types also include stack type variables (ρ), which may be used to abstract the tail of a stack type. The ability to abstract stacks is critical for supporting procedure calls and is discussed in detail later. As before, the register file for the abstract machine is described by a register file type (Γ) mapping registers to types. However, Γ also maps the distinguished register sp to a stack type σ. Finally, code blocks and code types support polymorphic abstraction over both types and stack types.
One of the uses of the stack is to save temporary values during a computation. The general problem is to save on the stack n registers, say r1 through rn, of types τ1 through τn, perform some computation e, and then restore the temporary values to their respective registers. This would be accomplished by the following instruction sequence, where the comments (delimited by %) show the stack's type at each step of the computation.
    salloc n             % ns::ns:: ··· ::ns::σ
    sst  sp(0), r1       % τ1::ns:: ··· ::ns::σ
      ...
    sst  sp(n-1), rn     % τ1::τ2:: ··· ::τn::σ
    code for e           % τ1::τ2:: ··· ::τn::σ
    sld  r1, sp(0)       % τ1::τ2:: ··· ::τn::σ
      ...
    sld  rn, sp(n-1)     % τ1::τ2:: ··· ::τn::σ
    sfree n              % σ
If, upon entry, ri has type τi and the stack is described by σ, and if the code for e leaves the state of the stack unchanged, then this code sequence is well-typed. Furthermore, the typing discipline does not place constraints on the order in which the stores or loads are performed.
It is straightforward to model higher-level primitives, such as push and pop. The former can be seen as simply salloc 1 followed by a store to sp(0), whereas the latter is a load from sp(0) followed by sfree 1. Also, a "jump-and-link" or "call" instruction which automatically moves the return address into a register or onto the stack can be synthesized from our primitives. To simplify the presentation, we did not include these instructions in STAL; a practical implementation, however, would need a full set of instructions appropriate to the architecture.
The stack is commonly used to save the current return address, and temporary values, across procedure calls. Which registers to save and in what order is usually specified by a compiler-specific calling convention. Here we consider a simple calling convention where it is assumed there is one integer argument and one unit result, both of which are passed in register r1, and the return address is passed in the register ra. When invoked, a procedure may choose to place temporaries on the stack as shown above, but when it jumps to the return address, the stack should be in the same state as it was upon entry. Naively, we might expect the code for a function obeying this calling convention to have the following STAL type:
    ∀[].{r1:int, sp:σ, ra:∀[].{r1:⟨⟩, sp:σ}}

Notice that the type of the return address is constrained so that the stack must have the same shape upon return as it had upon entry. Hence, if the procedure pushes any arguments onto the stack, it must pop them off. However, this typing is unsatisfactory for two reasons. The first problem is that there is nothing preventing the procedure from popping off values from the stack and then pushing new values (of the appropriate type) onto the stack. In other words, the caller's stack frame is not protected from the function's code. The second problem is much worse: such a function can only be invoked from states where the stack is exactly described by σ. This effectively prevents invocation of the procedure from two different points in the program. For example, there is no way for the procedure to push its return address on the stack and jump to itself. The solution to both problems is to abstract the type of the stack using a stack type variable:
sp:p, ra:VH.{rl : int,-~p:p}}
To invoke a function with this type, the caller must instantiate the bound stack type variable ρ with the current type of the stack. As before, the function can only jump to the return address when the stack is in the same state as it was upon entry. However, the first problem above is addressed because the type checker treats ρ as an abstract stack type while checking the body of the code. Hence, the code cannot perform an sfree, sld, or sst on the stack. It must first allocate its own space on the stack; only this space may be accessed by the function, and the space must be freed before returning to the caller.² The second problem is solved because the stack type variable may be instantiated in different ways. Hence multiple call sites with different stack states, including recursive calls, may now invoke the function. In fact, a recursive call will usually instantiate the stack variable with a different type than the original call because, unless it is a tail call, it will need to store its return address on the stack.

² Some intuition on this topic may be obtained from Reynolds' theorem on parametric polymorphism [26], but a formal proof is difficult.
(H, {sp ↦ nil}, I) where

H = l_fact:
      code[ρ]{r1:⟨⟩, r2:int, sp:ρ, ra:τρ}.
        bneq r2, l_nonzero[ρ]          % if n = 0 continue
        mov  r1, 1                     % result is 1
        jmp  ra                        % return
    l_nonzero:
      code[ρ]{r1:⟨⟩, r2:int, sp:ρ, ra:τρ}.
        sub  r3, r2, 1                 % n - 1
        salloc 2                       % save n and return address to stack
        sst  sp(0), r2
        sst  sp(1), ra
        mov  r2, r3
        mov  ra, l_cont[ρ]
        jmp  l_fact[int::τρ::ρ]        % recursive call fact(n - 1)
    l_cont:
      code[ρ]{r1:int, sp:int::τρ::ρ}.
        sld  r2, sp(0)                 % restore n and return address
        sld  ra, sp(1)
        sfree 2
        mul  r1, r2, r1                % result is n * fact(n - 1)
        jmp  ra                        % return
    l_halt:
      code[]{r1:int, sp:nil}.
        halt [int]

and I =
        malloc r1[]                    % environment
        mov  r2, 6                     % argument
        mov  ra, l_halt                % return address for initial call
        jmp  l_fact[nil]

and τρ = ∀[].{r1:int, sp:ρ}

Fig. 4. STAL Factorial Example
Figure 4 gives stack-based code for the factorial example of the previous section. The function is invoked by moving its environment (an empty tuple) into r1, the argument into r2, and the return address label into ra, and jumping to the label l_fact. Notice that the nonzero branch must save the argument and current return address on the stack before jumping to the fact label in a
recursive call. It is interesting to note that the stack-based code is quite similar to the heap-based code of Figure 1. Indeed, the code remains in a continuation-passing style, but instead of passing the continuation as a heap-allocated tuple, the environment of the continuation is passed in the stack pointer and the code of the continuation is passed in the return address register. To more fully appreciate the correspondence, consider the type of the TAL version of l_fact from Figure 1:

    ∀[].{r1:⟨⟩, r2:int, r3:∃α.⟨∀[].{r1:α, r2:int}^1, α^1⟩}
We could have used an alternative approach where we pass the components of the continuation closure unboxed in separate registers. To do so, the caller must unpack the continuation, and the function must abstract the type of the continuation's environment, resulting in a quantifier rotation:

    ∀[α].{r1:⟨⟩, r2:int, r3:∀[].{r1:α, r2:int}, r4:α}
1:0, 2:im,
r :im}, ,p:p}
is essentially the same! Indeed, the only difference between a CPS-based compiler, such as SML/NJ, and a conventional stack-based compiler is that, for the latter, continuation environments are allocated on a stack. Our type system describes this well-known connection elegantly.
Our techniques can be applied to other calling conventions and do not appear to inhibit most optimizations. For instance, tail calls can be eliminated in CPS simply by forwarding a continuation closure to the next function. If continuations are allocated on the stack, we have the mechanisms to pop the current activation frame off the stack and to push any arguments before performing the tail call. Furthermore, the type system is expressive enough to type this resetting and adjusting for any kind of tail call, not just a self tail call. As another example, some CISC-style conventions push the arguments, the environment, and then the return address on the stack, and return the result on the stack. With this convention, the factorial code would have type:
    ∀[ρ].{sp:(∀[].{sp:int::ρ})::⟨⟩::int::ρ}

Callee-saves registers (registers whose values must be preserved across function calls) can be handled in the same fashion as the stack pointer. In particular, the function holds abstract the type of the callee-saves register and requires that the register have the same type upon return. For instance, if we wish to preserve register r3 across a call to factorial, we would use the type:

    ∀[ρ, α].{r1:⟨⟩, r2:int, r3:α, ra:∀[].{sp:ρ, r1:int, r3:α}, sp:ρ}
Translating this type back into a boxed, heap-allocated closure, we obtain:

    ∀[α].{r1:⟨⟩, r2:int, r3:α, ra:∃β.⟨∀[].{r1:β, r2:int, r3:α}^1, β^1⟩}
This is the type of the callee-saves approach of Appel and Shao [4]. Thus we see how our correspondence enables transformations developed for heap-based compilers to be used in traditional stack-based compilers and vice versa. The generalization to multiple callee-saves registers and other calling conventions should be clear. Indeed, we have found that the type system of STAL provides a concise way to declaratively specify a variety of calling conventions.
4 Exceptions
We now consider how to implement exceptions in STAL. We will find that a calling convention for function calls in the presence of exceptions may be derived from the heap-based CPS calling convention, just as was the case without exceptions. However, implementing this calling convention will require that the type system be made more expressive by adding compound stack types. This additional expressiveness will turn out to have uses beyond exceptions, allowing most compiler-introduced uses of pointers into the midst of the stack.
4.1 Exception Calling Conventions
In a heap-based CPS framework, exceptions are implemented by passing two continuations: the usual continuation and an exception continuation. Code raises an exception by jumping to the latter. For an integer to unit function, this calling convention is expressed as the following TAL type (ignoring the outer closure and environment):

    ∀[].{r1:int, ra:∃α1.⟨∀[].{r1:α1, r2:⟨⟩}^1, α1^1⟩, re:∃α2.⟨∀[].{r1:α2, r2:exn}^1, α2^1⟩}

Again, the caller might unpack the continuations:

    ∀[α1, α2].{r1:int, ra:∀[].{r1:α1, r2:⟨⟩}, ra':α1, re:∀[].{r1:α2, r2:exn}, re':α2}

Then the caller might (erroneously) attempt to place the continuation environments on stacks, as before:

    ∀[ρ1, ρ2].{r1:int, ra:∀[].{sp:ρ1, r1:⟨⟩}, sp:ρ1, re:∀[].{sp:ρ2, r1:exn}, sp':ρ2}

Unfortunately, this calling convention uses two stack pointers, and STAL has only one stack.³ Observe, though, that the exception continuation's stack is necessarily a tail of the ordinary continuation's stack. This observation leads to the following calling convention for exceptions with stacks:

    ∀[ρ1, ρ2].{r1:int, sp:ρ1 ∘ ρ2, ra:∀[].{sp:ρ1 ∘ ρ2, r1:⟨⟩}, re':ptr(ρ2), re:∀[].{sp:ρ2, r1:exn}}

³ Some language implementations use a separate exception stack; with some minor modifications, this calling convention would be satisfactory for such implementations.
This type uses two new constructs we now add to STAL (see Figure 5). When σ1 and σ2 are stack types, the stack type σ1 ∘ σ2 is the result of appending the two types. Thus, in the above type, the function is presented with a stack with type ρ1 ∘ ρ2, all of which is expected by the regular continuation, but only a tail of which (ρ2) is expected by the exception continuation. Since ρ1 and ρ2 are quantified, the function may still be used for any stack so long as the exception continuation accepts some tail of that stack.
To raise an exception, the exception is placed in r1 and control is transferred to the exception continuation. This requires cutting the actual stack down to just that expected by the exception continuation. Since the length of ρ1 is unknown, this cannot be done by sfree. Instead, a pointer to the desired position in the stack is supplied in re', and is moved into sp. The type ptr(σ) is the type of pointers into the stack at a position where the stack has type σ. Such pointers are obtained simply by moving sp into a register.
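As a concrete reading of the append operation, the following SML fragment is our own sketch (not part of STAL) of a representation of stack types together with the append σ1 ∘ σ2; the equations it implements are the definitional equalities of Appendix A, namely nil ∘ σ = σ and (τ::σ1) ∘ σ2 = τ::(σ1 ∘ σ2).

    (* Stack types: a variable rho, nil, a cons, or an append that is
       stuck on a leading variable.  Component types are elided. *)
    datatype ty = Int | Ns | Code            (* ... other types elided ... *)
    datatype sty =
        SVar of string                       (* rho *)
      | SNil                                 (* nil *)
      | SCons of ty * sty                    (* tau::sigma *)
      | SApp of string * sty                 (* rho o sigma *)

    (* Append two stack types, pushing the operation under conses. *)
    fun append (SNil, s2) = s2
      | append (SCons (t, s1), s2) = SCons (t, append (s1, s2))
      | append (SVar r, s2) = SApp (r, s2)
      | append (SApp (r, s1), s2) = SApp (r, append (s1, s2))

The last clause reflects the associativity equation (σ1 ∘ σ2) ∘ σ3 = σ1 ∘ (σ2 ∘ σ3).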
4.2 Compound Stacks
The additional syntax to support exceptions is summarized in Figure 5. The new type constructors were discussed above. The word value ptr(i) is used by the operational semantics to represent pointers into the stack; the element pointed to is i words from the bottom of the stack. (See Figure 7 for details.) Of course, on a real machine, these would be implemented by actual pointers. The instructions mov rd, sp and mov sp, rs save and restore the stack pointer, and the instructions sld rd, rs(i) and sst rd(i), rs allow for loading from and storing to pointers.
types         τ ::= ··· | ptr(σ)
stack types   σ ::= ··· | σ1 ∘ σ2
word values   w ::= ··· | ptr(i)
instructions  ι ::= ··· | mov rd, sp | mov sp, rs | sld rd, rs(i) | sst rd(i), rs

Fig. 5. Additions to TAL for Compound Stacks
The introduction of pointers into the stack raises a delicate issue for the typing system. When the stack pointer is copied into a register, changes to the stack are not reflected in the type of the copy, and can invalidate a pointer. Consider the following incorrect code:

    mov  r1, sp       % begin with sp : τ::σ, sp ↦ w::S  (τ ≠ ns)
                      % r1 : ptr(τ::σ)
    sfree 1           % sp : σ, sp ↦ S
    salloc 1          % sp : ns::σ, sp ↦ ns::S
    sld  r2, r1(0)    % r2 : τ  but  r2 ↦ ns
When execution reaches the final line, r1 still has type ptr(τ::σ), but this type is no longer consistent with the state of the stack; the pointer in r1 points to ns. To prohibit erroneous loads of this sort, the type system requires that the pointer rs be valid in the instructions sld rd, rs(i), sst rd(i), rs, and mov sp, rs. An invariant of our system is that the type of sp always describes the current stack, so using a pointer into the stack will be sound if that pointer's type is consistent with sp's type. Suppose sp has type σ1 and r has type ptr(σ2); then r is valid if σ2 is a tail of σ1 (formally, if there exists some σ' such that σ1 = σ' ∘ σ2). If a pointer is invalid, it may be neither loaded from nor moved into the stack pointer. In the above example the load will be rejected because r1's type τ::σ is not a tail of sp's type, ns::σ.
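Using the representation sketched at the end of Section 4.1, the tail condition has a direct syntactic reading. The following function is again our own illustration rather than the formal definition; it checks the condition for stack types whose candidate tail occurs literally as a suffix.

    (* sigma2 is a tail of sigma1 iff sigma1 = sigma' o sigma2 for
       some sigma'; here we simply search for sigma2 as a suffix. *)
    fun isTail (s1 : sty, s2 : sty) : bool =
      s1 = s2 orelse
      (case s1 of
         SCons (_, rest) => isTail (rest, s2)
       | SApp (_, rest)  => isTail (rest, s2)   (* a tail of rest is a tail of rho o rest *)
       | _ => false)

The formal system must additionally reason modulo the stack-type equalities of Appendix A, which a literal suffix search does not capture.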
4.3 Using Compound Stacks
Recall the type for a function in the presence of exceptions:

    ∀[ρ1, ρ2].{sp:ρ1 ∘ ρ2, r1:int, ra:∀[].{sp:ρ1 ∘ ρ2, r1:⟨⟩}, re':ptr(ρ2), re:∀[].{sp:ρ2, r1:exn}}
An exception may be raised within the body of such a function by restoring the handler's stack from re' and jumping to the handler. A new exception handler may be installed by copying the stack pointer to re' and making forthcoming function calls with the stack type variables instantiated to nil and ρ1 ∘ ρ2. Calls that do not install new exception handlers would attach their frames to ρ1 and pass on ρ2 unchanged.
Since exceptions are probably raised infrequently, an implementation could save a register by storing the exception continuation's code pointer on the stack, instead of in its own register. If this convention were used, functions would expect stacks with the type ρ1 ∘ (τhandler::ρ2) and exception pointers with the type ptr(τhandler::ρ2), where τhandler = ∀[].{sp:ρ2, r1:exn}.
Vpl, p .{sp:pl o (r::p2), One application of this tool would be for implementing Pascal with displays. The primary limitation of this tool is that if more than one piece of data is stored amidst the stack, although quantification may be used to avoid specifying the precise locations of that data, function calling conventions would have to specify in what order data appears on the stack. It appears that this limitation could be removed by introducing a limited form of intersection type, but we have not yet explored the ramifications of this enhancement. 5
Related and Future
Work
Our work is partially inspired by Reynolds [27], which uses functor categories to "replace continuations by instruction sequences and store shapes by descriptions of the structure of the run-time stack." However, Reynolds was primarily concerned with using functors to express an intermediate language of a semantics-based compiler for Algol, whereas we are primarily concerned with type structure for general-purpose target languages.
Stata and Abadi [30] formalize the Java bytecode verifier's treatment of subroutines by giving a type system for a subset of the Java Virtual Machine language. In particular, their type system ensures that for any program control point, the Java stack is of the same size each time that control point is reached during execution. Consequently, procedure call must be a primitive construct (which it is in JVML). In contrast, our treatment supports polymorphic stack recursion, and hence procedure calls can be encoded with existing assembly-language primitives.
Tofte and others [8, 33] have developed an allocation strategy involving regions. Regions are lexically scoped containers that have a LIFO ordering on their lifetimes, much like the values on a stack. As in our approach, polymorphic recursion on abstracted region variables plays a critical role. However, unlike the objects in our stacks, regions are variable-sized, and objects need not be allocated into the region that was most recently created. Furthermore, there is only one allocation mechanism in Tofte's system (the stack of regions) and no need for a garbage collector. In contrast, STAL only allows allocation at the top of the stack and assumes a garbage collector for heap-allocated values. However, the type system for STAL is considerably simpler than the type system of Tofte et al., as it requires no effect information in types.
Bailey and Davidson [6] also describe a specification language for modeling procedure calling conventions and checking that implementations respect these conventions. They are able to specify features, such as a variable number of arguments, that our formalism does not address. However, their model is explicitly tied to a stack-based calling convention and does not address features such as exception handlers. Furthermore, their approach does not integrate the specification of calling conventions with a general-purpose type system.
Although our type system is sufficiently expressive for compilation of a number of source languages, it falls short in several areas. First, it cannot support general pointers into the stack because of the ordering requirements; nor can stack and heap pointers be unified so that a function taking a tuple argument can be passed either a heap-allocated or a stack-allocated tuple. Second, threads and advanced mechanisms for implementing first-class continuations, such as the work by Hieb et al. [15], cannot be modeled in this system without adding new primitives.
However, we claim that the framework presented here is a practical approach to compilation. To substantiate this claim, we are constructing a compiler called TALC that maps the KML language [10] to a variant of STAL described here, suitably adapted for the Intel IA32 architecture. We have found it straightforward to enrich the target language type system to include support for other type constructors, such as references, higher-order constructors, and recursive types. The compiler uses an unboxed stack allocation style of continuation passing.
Although we have discussed mechanisms for typing stacks at the assembly language level, our techniques generalize to other languages. The same mechanisms, including the use of polymorphic recursion to abstract the tail of a stack, can be used to introduce explicit stacks in higher level calculi. An intermediate language with explicit stacks would allow control over allocation at a point where more information is available to guide allocation decisions.
6 Summary
We have given a type system for a typed assembly language with both a heap and a stack. Our language is flexible enough to support the following compilation techniques: CPS using both heap allocation and stack allocation, a variety of procedure calling conventions, displays, exceptions, tail call elimination, and callee-saves registers. A key contribution of the type system is that it makes procedure calling conventions explicit and provides a means of specifying and checking calling conventions that is grounded in language theory. The type system also makes clear the relationship between heap allocation and stack allocation of continuation closures, capturing both allocation strategies in one calculus.
References

1. Andrew W. Appel and Trevor Jim. Continuation-passing, closure-passing style. In Sixteenth ACM Symposium on Principles of Programming Languages, pages 293-302, Austin, January 1989.
2. Andrew W. Appel and David B. MacQueen. Standard ML of New Jersey. In Martin Wirsing, editor, Third International Symposium on Programming Language Implementation and Logic Programming, pages 1-13, New York, August 1991. Springer-Verlag. Volume 528 of Lecture Notes in Computer Science.
3. Andrew W. Appel. Compiling with Continuations. Cambridge University Press, 1992.
4. Andrew Appel and Zhong Shao. Callee-saves registers in continuation-passing style. Lisp and Symbolic Computation, 5:189-219, 1992.
5. Andrew Appel and Zhong Shao. An empirical and analytic study of stack vs. heap cost for languages with closures. Journal of Functional Programming, 1(1), January 1993.
6. Mark Bailey and Jack Davidson. A formal model of procedure calling conventions. In Twenty-Second ACM Symposium on Principles of Programming Languages, pages 298-310, San Francisco, January 1995.
7. Lars Birkedal, Nick Rothwell, Mads Tofte, and David N. Turner. The ML Kit (version 1). Technical Report 93/14, Department of Computer Science, University of Copenhagen, 1993.
8. Lars Birkedal, Mads Tofte, and Magnus Vejlstrup. From region inference to von Neumann machines via region representation inference. In Twenty-Third ACM Symposium on Principles of Programming Languages, pages 171-183, St. Petersburg, January 1996.
9. Val Breazu-Tannen, Thierry Coquand, Carl A. Gunter, and Andre Scedrov. Inheritance as implicit coercion. Information and Computation, 93:172-221, 1991.
10. Karl Crary. KML Reference Manual. Department of Computer Science, Cornell University, 1996.
11. Karl Crary. Foundations for the implementation of higher-order subtyping. In ACM SIGPLAN International Conference on Functional Programming, pages 125-135, Amsterdam, June 1997.
12. Allyn Dimock, Robert Muller, Franklyn Turbak, and J. B. Wells. Strongly typed flow-directed representation transformations. In ACM SIGPLAN International Conference on Functional Programming, pages 11-24, Amsterdam, June 1997.
13. Amer Diwan, David Tarditi, and Eliot Moss. Memory subsystem performance of programs using copying garbage collection. In Twenty-First ACM Symposium on Principles of Programming Languages, pages 1-14, January 1994.
14. Amer Diwan, David Tarditi, and Eliot Moss. Memory system performance of programs with intensive heap allocation. ACM Transactions on Computer Systems, 13(3):244-273, August 1995.
15. Robert Hieb, R. Kent Dybvig, and Carl Bruggeman. Representing control in the presence of first-class continuations. In ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 66-77, June 1990. Published as SIGPLAN Notices, 25(6).
16. Intel Corporation. Intel Architecture Optimization Manual. Intel Corporation, P.O. Box 7641, Mt. Prospect, IL, 60056-7641, 1997.
17. David Kranz, R. Kelsey, J. Rees, P. R. Hudak, J. Philbin, and N. Adams. ORBIT: An optimizing compiler for Scheme. In Proceedings of the ACM SIGPLAN '86 Symposium on Compiler Construction, pages 219-233, June 1986.
18. P. J. Landin. The mechanical evaluation of expressions. Computer Journal, 6(4):308-320, 1964.
19. Xavier Leroy. Unboxed objects and polymorphic typing. In Nineteenth ACM Symposium on Principles of Programming Languages, pages 177-188, Albuquerque, January 1992.
20. Tim Lindholm and Frank Yellin. The Java Virtual Machine Specification. Addison-Wesley, 1996.
21. Y. Minamide, G. Morrisett, and R. Harper. Typed closure conversion. In Twenty-Third ACM Symposium on Principles of Programming Languages, pages 271-283, St. Petersburg, January 1996.
22. Gregory Morrisett. Compiling with Types. PhD thesis, Carnegie Mellon University, 1995. Published as CMU Technical Report CMU-CS-95-226.
23. Greg Morrisett, David Walker, Karl Crary, and Neal Glew. From System F to typed assembly language. In Twenty-Fifth ACM Symposium on Principles of Programming Languages, San Diego, January 1998. Extended version published as Cornell University technical report TR97-1651, November 1997.
24. G. Morrisett, D. Tarditi, P. Cheng, C. Stone, R. Harper, and P. Lee. The TIL/ML compiler: Performance and safety through types. In Workshop on Compiler Support for Systems Software, Tucson, February 1996.
25. Simon L. Peyton Jones, Cordelia V. Hall, Kevin Hammond, Will Partain, and Philip Wadler. The Glasgow Haskell compiler: a technical overview. In Proc. UK Joint Framework for Information Technology (JFIT) Technical Conference, July 1993.
26. John C. Reynolds. Types, abstraction and parametric polymorphism. In Information Processing '83, pages 513-523. North-Holland, 1983. Proceedings of the IFIP 9th World Computer Congress.
27. John Reynolds. Using functor categories to generate intermediate code. In Twenty-Second ACM Symposium on Principles of Programming Languages, pages 25-36, San Francisco, January 1995.
28. Zhong Shao. Flexible representation analysis. In ACM SIGPLAN International Conference on Functional Programming, pages 85-98, Amsterdam, June 1997.
29. Z. Shao. An overview of the FLINT/ML compiler. In Workshop on Types in Compilation, Amsterdam, June 1997. ACM SIGPLAN. Published as Boston College Computer Science Dept. Technical Report BCCS-97-03.
30. Raymie Stata and Martin Abadi. A type system for Java bytecode subroutines. In Twenty-Fifth ACM Symposium on Principles of Programming Languages, San Diego, January 1998.
31. Guy L. Steele Jr. Rabbit: A compiler for Scheme. Master's thesis, MIT, 1978.
32. D. Tarditi, G. Morrisett, P. Cheng, C. Stone, R. Harper, and P. Lee. TIL: A type-directed optimizing compiler for ML. In ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 181-192, Philadelphia, May 1996.
33. Mads Tofte and Jean-Pierre Talpin. Implementation of the typed call-by-value λ-calculus using a stack of regions. In Twenty-First ACM Symposium on Principles of Programming Languages, pages 188-201, January 1994.
34. Mitchell Wand. Continuation-based multiprocessing. In Proceedings of the 1980 LISP Conference, pages 19-28, August 1980.
A Formal STAL Semantics
This appendix contains a complete technical description of our calculus, STAL. The STAL abstract machine is very similar to the TAL abstract machine (described in detail in Morrisett et al. [23]). The syntax appears in Figure 6. The operational semantics is given as a deterministic rewriting system in Figure 7. The notation a[b/c] denotes capture-avoiding substitution of b for c in a. The notation a{b ↦ c} represents map update:

    {b1 ↦ c1, ..., bn ↦ cn}{b ↦ c} = {b ↦ c, b1 ↦ c1, ..., bn ↦ cn}    if b ∉ {b1, ..., bn}
    {b1 ↦ c1, b2 ↦ c2, ..., bn ↦ cn}{b ↦ c} = {b1 ↦ c, b2 ↦ c2, ..., bn ↦ cn}    if b = b1

To make the presentation simpler for the branching rules, some extra notation is used for expressing sequences of type and stack type instantiations. We introduce a new syntactic class (ψ) for type sequences:
    ψ ::= · | τ, ψ | σ, ψ

The notation w[ψ] stands for the obvious iteration of instantiations; the substitution notation I[ψ/Δ] is defined by:

    I[·/·] = I
    I[τ, ψ / α, Δ] = I[τ/α][ψ/Δ]
    I[σ, ψ / ρ, Δ] = I[σ/ρ][ψ/Δ]
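As an aside, the map-update operation a{b ↦ c} defined above can be read as the following SML sketch (ours; the formal development uses abstract finite maps, not association lists):

    (* a{b |-> c}: replace the binding for b if present, else add it. *)
    fun update [] (b, c) = [(b, c)]
      | update ((b', c') :: rest) (b, c) =
          if b = b' then (b, c) :: rest
          else (b', c') :: update rest (b, c)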
The static semantics is similar to TAL's but requires extra judgments for definitional equality of types, stack types, and register file types, and uses a more compositional style for instructions. Definitional equality is needed because two stack types (such as (int::nil) ∘ (int::nil) and int::int::nil) may be syntactically different but represent the same type. The judgments are summarized in Figure 8, the rules for type judgments appear in Figure 9, and the rules for term judgments appear in Figures 10 and 11. The notation Δ', Δ denotes appending Δ' to the front of Δ, that is:

    ·, Δ = Δ
    (α, Δ'), Δ = α, (Δ', Δ)
    (ρ, Δ'), Δ = ρ, (Δ', Δ)

As with TAL, STAL is type sound:
Proposition A.1 (Type Soundness). If ⊢ P and P ↦* P', then P' is not stuck.

This proposition is proved using the following two lemmas.

Lemma 1 (Subject Reduction). If ⊢ P and P ↦ P', then ⊢ P'.

A well-formed terminal state has the form (H, R{r1 ↦ w}, halt [τ]), where there exists a Ψ such that ⊢ H : Ψ and Ψ; · ⊢ w : τ wval.

Lemma 2 (Progress). If ⊢ P, then either P is a well-formed terminal state or there exists P' such that P ↦ P'.
types                 τ ::= α | int | ns | ∀[Δ].Γ | ⟨τ1^φ1, ..., τn^φn⟩ | ∃α.τ | ptr(σ)
stack types           σ ::= ρ | nil | τ::σ | σ1 ∘ σ2
initialization flags  φ ::= 0 | 1
label assignments     Ψ ::= {ℓ1:τ1, ..., ℓn:τn}
type assignments      Δ ::= · | α, Δ | ρ, Δ
register assignments  Γ ::= {r1:τ1, ..., rn:τn, sp:σ}
registers             r ::= r1 | ··· | rk
word values           w ::= ℓ | i | ns | ?τ | w[τ] | w[σ] | pack [τ, w] as τ' | ptr(i)
small values          v ::= r | w | v[τ] | v[σ] | pack [τ, v] as τ'
heap values           h ::= ⟨w1, ..., wn⟩ | code[Δ]Γ.I
heaps                 H ::= {ℓ1 ↦ h1, ..., ℓn ↦ hn}
register files        R ::= {r1 ↦ w1, ..., rn ↦ wn, sp ↦ S}
stacks                S ::= nil | w::S
instructions          ι ::= aop rd, rs, v | bop r, v | ld rd, rs(i) | malloc r[τ1, ..., τn] |
                            mov rd, v | mov sp, rs | mov rd, sp | salloc n | sfree n |
                            sld rd, sp(i) | sld rd, rs(i) | sst sp(i), rs | sst rd(i), rs |
                            st rd(i), rs | unpack [α, rd], v
arithmetic ops        aop ::= add | sub | mul
branch ops            bop ::= beq | bneq | bgt | blt | bgte | blte
instruction sequences I ::= ι; I | jmp v | halt [τ]
programs              P ::= (H, R, I)

Fig. 6. Syntax of STAL
(H, R, I) ↦ P where

if I =                       then P =
add rd, rs, v; I'            (H, R{rd ↦ R(rs) + R̂(v)}, I')
                               and similarly for mul and sub
beq r, v; I'                 (H, R, I') when R(r) ≠ 0,
                               and similarly for bneq, blt, etc.
beq r, v; I'                 (H, R, I''[ψ/Δ]) when R(r) = 0,
                               where R̂(v) = ℓ[ψ] and H(ℓ) = code[Δ]Γ.I'',
                               and similarly for bneq, blt, etc.
jmp v                        (H, R, I'[ψ/Δ]) where R̂(v) = ℓ[ψ] and H(ℓ) = code[Δ]Γ.I'
ld rd, rs(i); I'             (H, R{rd ↦ wi}, I')
                               where R(rs) = ℓ, H(ℓ) = ⟨w0, ..., wn-1⟩, and 0 ≤ i < n
malloc rd[τ1, ..., τn]; I'   (H{ℓ ↦ ⟨?τ1, ..., ?τn⟩}, R{rd ↦ ℓ}, I') where ℓ ∉ H
mov rd, v; I'                (H, R{rd ↦ R̂(v)}, I')
mov rd, sp; I'               (H, R{rd ↦ ptr(|S|)}, I') where R(sp) = S
mov sp, rs; I'               (H, R{sp ↦ wj:: ··· ::w1::nil}, I')
                               where R(sp) = wn:: ··· ::w1::nil and R(rs) = ptr(j) with 0 ≤ j ≤ n
salloc n; I'                 (H, R{sp ↦ ns:: ··· ::ns::R(sp)}, I')  (n copies of ns)
sfree n; I'                  (H, R{sp ↦ S}, I') where R(sp) = w1:: ··· ::wn::S
sld rd, sp(i); I'            (H, R{rd ↦ wi}, I')
                               where R(sp) = w0:: ··· ::wn-1::nil and 0 ≤ i < n
sld rd, rs(i); I'            (H, R{rd ↦ wj-i}, I')
                               where R(rs) = ptr(j), R(sp) = wn:: ··· ::w1::nil, and 0 ≤ i < j ≤ n
sst sp(i), rs; I'            (H, R{sp ↦ w0:: ··· ::wi-1::R(rs)::S}, I')
                               where R(sp) = w0:: ··· ::wi::S and 0 ≤ i
sst rd(i), rs; I'            (H, R{sp ↦ wn:: ··· ::wj-i+1::R(rs)::wj-i-1:: ··· ::w1::nil}, I')
                               where R(rd) = ptr(j), R(sp) = wn:: ··· ::w1::nil, and 0 ≤ i < j ≤ n
st rd(i), rs; I'             (H{ℓ ↦ ⟨w0, ..., wi-1, R(rs), wi+1, ..., wn-1⟩}, R, I')
                               where R(rd) = ℓ, H(ℓ) = ⟨w0, ..., wn-1⟩, and 0 ≤ i < n
unpack [α, rd], v; I'        (H, R{rd ↦ w}, I'[τ/α]) where R̂(v) = pack [τ, w] as τ'

where  R̂(v) = R(r)                       when v = r
       R̂(v) = w                          when v = w
       R̂(v) = R̂(v')[τ]                   when v = v'[τ]
       R̂(v) = pack [τ, R̂(v')] as τ'      when v = pack [τ, v'] as τ'

Fig. 7. Operational Semantics of STAL
Judgment                     Meaning
Δ ⊢ τ                        τ is a valid type
Δ ⊢ σ                        σ is a valid stack type
⊢ Ψ                          Ψ is a valid heap type (no context is used because heap
                             types must be closed)
Δ ⊢ Γ                        Γ is a valid register file type
Δ ⊢ τ1 = τ2                  τ1 and τ2 are equal types
Δ ⊢ σ1 = σ2                  σ1 and σ2 are equal stack types
Δ ⊢ Γ1 = Γ2                  Γ1 and Γ2 are equal register file types
Δ ⊢ τ1 ≤ τ2                  τ1 is a subtype of τ2
Δ ⊢ Γ1 ≤ Γ2                  Γ1 is a register file subtype of Γ2
⊢ H : Ψ                      the heap H has type Ψ
Ψ ⊢ S : σ                    the stack S has type σ
Ψ ⊢ R : Γ                    the register file R has type Γ
Ψ ⊢ h : τ hval               the heap value h has type τ
Ψ; Δ ⊢ w : τ wval            the word value w has type τ
Ψ; Δ ⊢ w : τ^φ fwval         the word value w has flagged type τ^φ (i.e., w has type τ,
                             or w is ?τ and φ is 0)
Ψ; Δ; Γ ⊢ v : τ              the small value v has type τ
Ψ; Δ; Γ ⊢ ι ⇒ Δ'; Γ'         given a context of type Ψ; Δ; Γ, the instruction ι is well
                             formed and produces a context of type Ψ; Δ'; Γ'
Ψ; Δ; Γ ⊢ I                  I is a valid sequence of instructions
⊢ P                          P is a valid program

Fig. 8. Static Semantics of STAL (judgments)
[Fig. 9 (Static Semantics of STAL, Judgments for Types): inference rules for type, stack type, and register file type validity, definitional equality, and subtyping; not reproduced.]
[Fig. 10 (STAL Static Semantics, Term Constructs except Instructions): typing rules for programs, heaps, stacks, register files, heap values, word values, and small values; not reproduced.]
[Fig. 11 (STAL Static Semantics, Instructions): typing rules for the individual instructions, judgment Ψ; Δ; Γ ⊢ ι ⇒ Δ'; Γ'; not reproduced.]
How Generic is a Generic Back End? Using MLRISC as a Back End for the TIL Compiler*
(Preliminary Report)

Andrew Bernard**, Robert Harper, and Peter Lee

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
Abstract. We describe the integration of MLRISC, a "generic" compiler back end, with TIL, a type-directed compiler for Standard ML. The TIL run-time system uses a form of type information to enable partially tag-free garbage collection. We show how we propagate this information through the final phases of the compiler, even though the back end is unaware of the existence of this information. Additionally, we identify the characteristics of MLRISC that enable us to use it with TIL and suggest ways in which it might better support our compiler. Preliminary performance measurements show that we pay a significant cost for using MLRISC, relative to a custom back end.
1 Introduction
We describe how we integrated MLRISC, a "generic" compiler back end, with TIL, a type-directed compiler for the Standard ML (SML) programming language. A type-directed compiler uses variable type information to guide successive translations between intermediate languages [13]. Type-directed compilers rely on complete variable type information for most or all phases of compilation; thus, types are preserved by intermediate code transformations during each phase. A generic compiler back end translates a low-level intermediate language into machine code; the intention is that the back end does not depend on a particular source language or front-end implementation technology. A compiler back end could itself be type directed (both Typed Assembly Language [12] and Proof-Carrying Code [14, 15] encode variable type information at the assembly-language level), although this is not common practice, and, as a matter of fact, MLRISC is not type directed.
TIL translates a source program through a succession of typed intermediate languages until it arrives at conventional assembly code. The typed intermediate languages used in TIL specify types explicitly at all variable binding sites. Thus, TIL can always determine the type of a variable without having to resort to type inference. The universal availability of type information permits TIL to check types after any phase of compilation; this helps to ensure the correctness of compiler optimizations during development. Type information also allows TIL to perform additional optimizations that are not directly available to conventional compilers [13, 10].
A principal benefit of type-directed compilation is that it facilitates sound tag-free garbage collection [18]. Sound garbage collection requires that the heap pointers used by an executing program be identified unambiguously. Tag-free garbage collection permits these heap pointers to be identified without perturbing the run-time representations of values. In TIL's run-time model, word-sized values (e.g., integers and heap pointers) are not tagged, whereas composite values contain tags for their constituent locations. The location of a word-sized value is tagged instead of the value itself; this "out-of-band" tagging scheme allows a single location tag, or trace value, to identify many different run-time values. A trace table is a static encoding of trace values either for a given procedure activation or for a set of static storage locations. A procedure activation requires trace values for the machine registers used by the procedure as well as its stack frame slots. A complete set of trace tables can be synthesized at compile time for a program because all the possible procedure activation shapes can be statically determined.
RTL (Register Transfer Language) is the lowest-level intermediate language used in TIL. A back end for TIL thus translates RTL to assembly language. RTL is an imperative language that resembles the instruction set of a RISC processor, but it also provides complex primitives that are tailored to SML. An RTL pseudo register identifies a procedure-local, word-sized storage location: pseudo registers are mapped to machine registers and stack slots by the back end. RTL is not a typed intermediate language, but RTL does annotate pseudo registers with trace values; these are similar to, but distinct from, run-time trace values (the latter are a translation of the former).
There are actually two versions of the TIL compiler: TIL1 [17, 13] is the first-generation compiler; its successor is called TILT. We will refer to the TIL compiler when the discussion applies to either compiler interchangeably. Both compilers share a common RTL language: the back end of TIL1 translates RTL directly to assembly language. This back end operates on RTL itself and explicitly propagates trace values through the spilling and register allocation phases. As only the trace value component of this back end was customized for TIL1, it largely duplicates other work.
MLRISC,¹ on the other hand, is a compiler back end implemented in SML that transforms an abstract intermediate language into the assembly language for a particular processor architecture. Taking RTL, MLRISC, and the TIL run-time system as given, our task is to transform RTL code into MLRISC code, and to make the object code generated by MLRISC compatible with the TIL run-time system. We impose an additional constraint on our implementation: we may not customize the interface of MLRISC specifically for TIL, as this would make it necessary to track such customizations across new versions of MLRISC. As MLRISC does not propagate trace information, we cannot use it as a "drop in" replacement for the TIL1 back end; we must have an additional mechanism that derives run-time trace values from RTL trace values. Run-time trace values are encoded with references to machine register numbers and stack slots. This means that our mechanism must operate in concert with MLRISC, because the global register allocation and spilling phases of MLRISC assign these locations. This is the principal difficulty addressed by our work: how do we translate abstract trace values to concrete trace values "in parallel" with the abstract-to-concrete code translation performed by MLRISC? Note that significant correctness questions are inevitably raised by the specification of such a translation, because trace values represent invariants (much in the same way that types represent invariants) that may be perturbed by the back end. Trace tables betray an implicit expectation by the run-time system that the object code produced by the back end will (loosely) reflect the original type structure of the program. As the back end is presented with no explicit type structure, how can we expect its code transformations to respect such a type structure? Another contribution of our work is that we describe the implicit constraints that we expect MLRISC to satisfy to ensure the soundness of the trace value translation.
In the remainder of this paper, we focus on how TIL communicates trace information to the run-time system by way of MLRISC. In Section 2 we detail the compiler and run-time system, whereas in Section 3 we present MLRISC. We discuss in Section 4 the techniques we use to marry MLRISC to RTL and the TIL run-time system, and Section 5 is an assessment of our experience. In Section 6 we propose improvements to the current implementation, and in Section 7 we draw together summarizing conclusions.

* This research was sponsored in part by the Advanced Research Projects Agency CSTO under the title "The Fox Project: Advanced Languages for Systems Software", ARPA Order No. C533, issued by ESC/ENS under Contract No. F19628-95-C-0050, and in part by the National Science Foundation under Grant No. CCR-9502674. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing official policies, either expressed or implied, of the Defense Advanced Research Projects Agency, the National Science Foundation, or the U.S. Government.
** This material is based on work supported under a National Science Foundation Graduate Fellowship. Any opinions, findings, conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
2 TIL
The TIL compiler is characterized by its aggressive use of type information. TIL compiles programs written in the Standard ML '97 programming language to DEC Alpha assembly language. All transformations (i.e., compiler phases) in TIL are based on explicitly typed intermediate languages. However, RTL, the lowest-level intermediate language used in TIL, is not typed in the same sense that the other intermediate languages are typed. RTL pseudo registers are tagged with trace values that represent a degenerated form of type information that is tailored to the run-time system. Figure 1 is a depiction of the intermediate languages used in TILT; see Morrisett [13] for a description of the intermediate languages used in TIL1.

¹ We chose to use MLRISC as a back end for our compiler because our research does not directly address back end implementation technology. With MLRISC, we hope to leverage the work of a larger group of researchers.

SML → Abstract Syntax → HIL → MIL → RTL → MLRISC → Assembly Language

Fig. 1. Intermediate Languages in TILT
HIL and MIL
The TILT elaborator translates programs from abstract syntax to HIL (Highlevel Intermediate Language). HIL is an explicitly-typed refinement of the SML programming language, including the module system; a detailed discussion of HIL is beyond the scope of this paper (see Harper and Stone 11 for further details). HIL is translated to MIL (Mid-level Intermediate Language) by the phase splitter, which is responsible for eliminating modules and breaking abstraction barriers. MIL is a lower-level, explicitly-typed, polymorphic intermediate language that does not provide modules. 2.2
RTL
TILT translates MIL to RTL (Register Transfer Language) after performing closure conversion, determining data representations, and making heap allocation explicit, among other things. RTL resembles an abstract assembly language in which there are an unbounded supply of local pseudo registers for each procedure. Pseudo registers are identified by positive integers and are automatically mapped to machine registers and stack slots by the back end. Each pseudo register is annotated with a trace value that classifies the kinds of values that the register can contain---one can think of trace values as degenerated type information that is present only for the benefit of the run-time system. RTL trace values are derived directly from MIL variable types. Figure 3 contains one possible RTL translation 2 of the SML function in Figure 23. Pseudo registers are given names (e.g., x) in this example to clarify 2 This is not the actual RTL code currently produced by TILT for this function: it has been simplified by hand to clarify its correspondence to the original SML code. This correspondence is obscured by the poor RTL code that TILT currently generates. 3 This function was not written to compute anything interesting. Rather, the particular intermediate code it translates to helps to illustrate points later in this paper.
57 the presentation; in an actual RTL program, pseudo registers are identified by positive integers. Variables introduced by the compiler are prefixed by an underscore. Pseudo register trace values are written following the pseudo-register name in parentheses--trace values encode the "traceability" of a pseudo register, perhaps by projecting the contents of another pseudo register (Table 1). Trace values identify pointers into the heap to the run-time system--we explain the role of run-time trace values in Section 2.3. Section 2.3 also contains a discussion of type environments, of which _ t e n v l ( t r a c e ) is one example.
fun f(x, n: int, l: int list, l2: int list) =
  g(x, n, if length l > 0 then hd l else 1)

Fig. 2. An SML Function to Take the Head of a List
trace          A pointer into the heap
notrace_int    An integer
notrace_code   A pointer to machine code
notrace_real   A floating-point number
label          A pointer to data, but not into the heap
compute path   May be a pointer into the heap: path is an expression that
               evaluates (at run time) to the actual trace value
unset          Uninitialized
locative       A pointer into the middle of an item in the heap (cannot be traced)

Table 1. Trace Values for an RTL Pseudo Register
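One might model these annotations in SML roughly as follows. This is our own sketch for illustration, not TIL's actual datatype, and the representation of paths is an assumption:

    (* A path names a pseudo register and a series of projections from
       it; e.g. "compute _tenvl(trace).0" projects field 0 of _tenvl. *)
    type path = int * int list          (* pseudo register, projection offsets *)

    datatype trace_value =
        Trace                           (* a pointer into the heap *)
      | NotraceInt                      (* an integer *)
      | NotraceCode                     (* a pointer to machine code *)
      | NotraceReal                     (* a floating-point number *)
      | Label                           (* a pointer to data, but not into the heap *)
      | Compute of path                 (* may be a heap pointer; decided at run time *)
      | Unset                           (* uninitialized *)
      | Locative                        (* pointer into the middle of a heap item *)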
On entry to the body of the procedure, the argument pseudo registers listed in the procedure header contain the values of the actual arguments passed to the procedure. Similarly, on exit from the body, the result pseudo register listed in the procedure header will be copied into the actual machine-level result register, if necessary. The arguments and results of each procedure call are simply listed in order, as RTL uses an implicit calling convention. When a given pseudo register needs to be moved to/from a specific argument/result machine register according to a particular calling convention, this code is generated as part of the call/entry/exit sequence.
2.3 The Run-Time System
Run-time Type Information Certain benefits arise from the use of typed intermediate languages: for example, types can be checked after compiler passes to help ensure correctness. Additionally, type-based information can be used at
procedure f:
    arguments = _tenvl(trace), x(compute _tenvl(trace).0),   ; arguments to f
                n(notrace_int), l(trace), l2(trace)
    results   = _t3(notrace_int)                             ; result of f
{
    call    "length"
            arguments = l(trace)
            results   = _t1(notrace_int)      ; _t1 <- length(l)  (call #1)
    bcndi2  le, _t1(notrace_int), 0, _L1      ; if _t1 <= 0 goto _L1
    call    "hd"
            arguments = l(trace)
            results   = _t2(notrace_int)      ; _t2 <- hd(l)  (call #2)
    br      _L2                               ; goto _L2
_L1:
    mv      1, _t2(notrace_int)               ; _t2 <- 1
_L2:
    call    "g"
            arguments = _tenvl(trace), x(compute _tenvl(trace).0),
                        n(notrace_int), _t2(notrace_int)
            results   = _t3(notrace_int)      ; _t3 <- g(x, n, _t2)  (call #3)
}

Fig. 3. A Translation of the Code in Figure 2 to RTL
run time for dynamic type dispatch and tag-free garbage collection [13]. In TIL, the main ramification for the back end is that the run-time system uses a simple form of type information to reclaim storage.
The TIL run-time system uses a tracing, copying garbage collector to reclaim unused values in the heap. When the garbage collector is invoked, it must determine the locations of all the heap pointers that are in use so that it does not reclaim accessible memory. The garbage collector is said to trace these pointers to determine the layout of the objects they address, and to copy these objects to new locations. Traceable pointers may reside in registers, on the stack, in the data segment, or in the heap. However, these locations may also contain a variety of non-pointer values (e.g., word-sized integers) that cannot be safely traced. The "traceability" of a given location is determined by its type: trace values, which are derived directly from types, specify to the garbage collector which locations should be traced. Machine registers and stack frame slots are tagged with trace values according to a static table that is indexed by the return address of an active procedure. Static (i.e., global) storage locations are tagged by a corresponding set of static tables. The header of a heap value contains trace values for its slots. Tagging locations is potentially more efficient than tagging values because a single location tag can be shared for many values.
Note that the usual run-time model for garbage collection is not tag free. A typical implementation of tagged garbage collection makes the representations of heap pointers and non-heap pointers disjoint by encoding "tags" in the low-order bits of each word. This approach introduces extra overhead, constrains the range of representable values, and complicates interoperability with other languages. Part of the purpose of TIL is to determine whether these pitfalls can be avoided in a garbage-collected programming language implementation.
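As a point of comparison, the conventional tagged scheme can be modeled in a few lines. This sketch is ours, not TIL's; it shows why tagging costs one bit of integer range while keeping word-aligned heap pointers distinguishable:

    (* A minimal model of dynamic tag bits, assuming word-aligned heap
       pointers whose low bit is always zero. *)
    fun tagInt (n : word) : word = Word.orb (Word.<< (n, 0w1), 0w1)
    fun untagInt (w : word) : word = Word.>> (w, 0w1)
    fun isHeapPointer (w : word) : bool = Word.andb (w, 0w1) = 0w0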
Representing Type Information At run time, vestigial type information is represented as type environments and trace tables. Type environments supply type information for variables whose types cannot be resolved at compile time (e.g., polymorphic variables). For example, in Figure 2, the type of x is polymorphic and thus cannot be statically determined by the compiler--x could take the value 3, "three", [1, 1, 1], or any of a number of values that have distinct representations at run time. A type environment for f has the caller pass an explicit representation of x's type so that both the run-time system and f can operate on it [18, 10]. Type environments are needed only for functions such as f where complete type information is not available at compile time. A type environment is a record of values that encode properties of types that are important to the run-time system. Note that type environments are unlike the explicit value descriptors used in many other language implementations, in that type environments are constructed only in contexts where they are specifically needed, as opposed to being an integral part of every value.
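To make the idea concrete, the following SML sketch shows one hypothetical shape such an elaboration could take. The tyrep datatype and the rewritten f are our illustration, not TILT's actual encoding (TILT's type environments are records of values derived from types); f's callee g is passed as a parameter only to keep the sketch self-contained:

    (* A hypothetical run-time representation of types. *)
    datatype tyrep = INT | REAL | STRING | LIST of tyrep | TUPLE of tyrep list

    (* f from Figure 2, rewritten so the caller supplies x's type. *)
    fun f g (tenv : tyrep) (x, n : int, l : int list, l2 : int list) =
      g (tenv, x, n, if length l > 0 then hd l else 1)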
Trace tables map machine registers, stack slots, and static locations to trace values. A trace table gives a value of yes to those locations that are known to contain pointers into the heap. Other trace values allow the status of a location to depend on a type environment or on the dynamic caller's trace table: the trace values that can be attached to a storage location by a trace table are documented in Table 2. This table resembles Table 1 because run-time trace values are derived from the corresponding RTL trace values. By contrast, objects in the heap have special headers that specify trace values for locations in the object.
yes                  Contains a pointer into the heap
no                   Does not contain a pointer into the heap
callee id            Contains the saved value of a callee-save register: id
                     identifies a machine register in the dynamic caller's
                     activation whose trace value should be used for this
                     location.
stack offset, index  A polymorphic location: offset is the offset in the
                     current stack frame of a pointer to a type environment
                     that contains the trace value of this location at index
                     index.
global label, index  A polymorphic location: label is the label of a type
                     environment in the data segment that contains the trace
                     value of this location at index index.
unset                Uninitialized
impossible           Contains a heap pointer, but cannot be traced

Table 2. Trace Values for a Trace Table Storage Location
Locating Pointers Trace tables are consulted by the run-time system only when the garbage collector is invoked. This invocation takes the form of a library call from an active procedure that is unable to allocate storage. At this time, the collector must locate and trace all pointers into the heap: this process is nontrivial because pointers can potentially reside in any machine register, stack location, or static variable, and because the pointers themselves contain no identifying information (e.g., tags). For static variables, we create a table in a known location that points to the trace tables for all static regions. Tracing the stack is more difficult because the stack depends on the dynamic behavior of the program. However, there are only a statically computable number of distinct activations, each of which can be determined by the compiler. Thus, we index a static table for each activation record according to the return address of a call site in the corresponding procedure. Because the collector is invoked via a procedure call, we simply use the return address in its activation record to locate the trace table of the most recent stack frame. Each trace table includes the size of its stack frame, so we can use these offsets to "walk up" the stack. This means that we need to generate a trace table for each direct call to the collector and for each call to another procedure that might indirectly call the collector (4).

(4) In practice, we simply assume that all procedures might indirectly call the collector.
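A sketch of the walk just described, as a self-contained SML model: the table lookup, the stack read, and the per-slot tracing action are hypothetical operations passed in as parameters, and we assume (as in Figure 4) that each frame saves its caller's return address at offset 0.

    (* Walk the stack from the most recent frame, tracing each frame's
       slots according to its trace table, until a sentinel return
       address is reached. *)
    fun walkStack lookupTable readWord traceSlot stopRA (ra, sp) =
      if ra = stopRA then ()
      else
        let
          val {frameSize, entries} = lookupTable ra
        in
          (* trace the slots of this frame *)
          List.app (fn (offset, tv) => traceSlot (sp + offset, tv)) entries;
          (* the saved return address (offset 0) locates the caller's table *)
          walkStack lookupTable readWord traceSlot stopRA
                    (readWord sp, sp + frameSize)
        end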
At collection time, a callee-save register initialized by an active procedure may have had its value saved on the stack by another procedure, or the original value may be left intact. The trace table of the most recent procedure activation holds the correct trace values for the machine register file at the time the collector is invoked. If any callee-save registers are not allocated by the most recently called procedure, then their trace values are determined according to the trace table of the next most recently called procedure: this is the function of the callee trace value. A callee trace value is also possible when a callee-save register is saved to the stack (and the register is presumably overwritten)--in this case, the proper stack location is given the callee trace value and the trace table of the next most recent activation record is consulted to determine the status of the stack location. This process can continue as long as the trace value of a location is specified as callee.

Example In Figure 4, we show a possible DEC Alpha Assembly Language translation of the example function of Figure 2, whereas in Figure 5 we document the registers used in Figure 4. $_tenv, $x, $n, $l, and $l2 are assigned to callee-save temporaries in this procedure (e.g., $11, $12, etc. according to the standard calling convention). $_tenv contains the type environment for the function. The procedure begins by allocating a stack frame of size 32 and saving the return address and the callee-save registers on the stack. The arguments to the procedure are then moved from registers defined by the calling convention into the corresponding callee-save temporaries. Next, a call is made to length with the value of l as an argument (the standard calling convention requires all calls to jump through $pv). The result of the length call is then compared against zero and a branch is taken to _L1 if it is not strictly positive. Assuming the branch is not taken, a call is made to hd and the result is saved in $t2; otherwise $t2 gets the value 1. The two control paths next converge at a call to g with arguments x, n, and $t2--the result of this call becomes the result of f after it restores the caller's register file from the stack frame. The ldgp instructions are a peculiarity of the Alpha standard calling convention.

The important things to notice in Figure 4 are the three call sites (commented (call #n)) for which we must construct trace tables. These trace tables must correctly identify the trace status of values in the register file and the local stack frame at the time of the corresponding call. In Figure 6, we show a trace table for call #2--notice that the trace values of stack slots saving callee-save registers depend on the dynamic caller's trace table (these slots are given trace status callee n).
3 MLRISC
MLRISC [8] is a generic compiler back end developed by Lal George at Bell Laboratories. MLRISC is "generic" in the sense that it can be used to compile many different programming languages. The interface language to MLRISC, also called "MLRISC", is essentially an architecture-independent assembly language: MLRISC is thus suited to compiling programming languages for which a translation to assembly language is feasible; to date, MLRISC has been used to compile SML
f:
    ldgp   $gp, 0($pv)       ; set global pointer
    subl   $sp, 32, $sp      ; alloc frame
    stl    $ra, 0($sp)       ; save return address
    stl    $_tenv, 8($sp)    ; save callee save
    stl    $x, 12($sp)
    stl    $n, 16($sp)
    stl    $l, 20($sp)
    stl    $l2, 24($sp)
    mov    $arg0, $_tenv     ; get arguments
    mov    $arg1, $x
    mov    $arg2, $n
    mov    $arg3, $l
    mov    $arg4, $l2
    stl    $_tenv, 28($sp)   ; save type environment
    mov    $l, $arg0         ; $t1 <- length(l)
    lda    $pv, length
    jsr    $ra, ($pv)        ; (call #1)
    ldgp   $gp, 0($ra)       ; set global pointer
    cmple  $res, 0, $t0
    bne    $t0, _L1          ; if $t1 <= 0 goto _L1
    mov    $l, $arg0         ; $t2 <- hd(l)
    lda    $pv, hd
    jsr    $ra, ($pv)        ; (call #2)
    ldgp   $gp, 0($ra)       ; set global pointer
    mov    $res, $t2
    br     $zero, _L2        ; goto _L2
_L1:
    lda    $t2, 1            ; $t2 <- 1
_L2:
    mov    $_tenv, $arg0     ; $t3 <- g(x, n, $t2)
    mov    $x, $arg1
    mov    $n, $arg2
    mov    $t2, $arg3
    lda    $pv, g
    jsr    $ra, ($pv)        ; (call #3)
    ldgp   $gp, 0($ra)       ; set global pointer
    ldl    $ra, 0($sp)       ; restore return address
    ldl    $_tenv, 8($sp)    ; restore callee save
    ldl    $x, 12($sp)
    ldl    $n, 16($sp)
    ldl    $l, 20($sp)
    ldl    $l2, 24($sp)
    addl   $sp, 32, $sp      ; dealloc frame
    jmp    $zero, ($ra)      ; return

Fig. 4. A Translation of the SML function in Figure 2 to DEC Alpha Assembly Language
$sp    Stack pointer        $argn   Argument n
$pv    Call address         $res    Result
$ra    Return address       $tn     Caller-save temporary n
                            $zero   Always zero

Fig. 5. Registers Used in Figure 4

$_tenv    yes             Always trace
$x        stack 28, 0     Trace according to $_tenv
$n        no              Never trace
8($sp)    callee $_tenv   Use dynamic caller's trace value for $_tenv
12($sp)   callee $x       Use dynamic caller's trace value for $x
16($sp)   callee $n       Use dynamic caller's trace value for $n
20($sp)   callee $l       Use dynamic caller's trace value for $l
24($sp)   callee $l2      Use dynamic caller's trace value for $l2
28($sp)   yes             Always trace

Fig. 6. A Trace Table for Call #2 of Figure 4
and Tiger [3]. Our compiler differs from other compilers using MLRISC [4, 2, 3], however, in that TIL does not use dynamic tag bits to distinguish heap pointers from other word-sized values.
In MLRISC, as in RTL, local storage locations are identified by numbered pseudo registers. Pseudo registers are transparently mapped to machine registers or spilled to the stack by MLRISC. Pseudo registers in MLRISC, however, carry no trace values or other type information; there are distinct classes of integer, floating-point, and condition-code pseudo registers, but an integer pseudo register that happens to be used as a heap pointer is not distinguished in any way. The principal challenge in integrating MLRISC with TIL, then, is to propagate pseudo-register trace values to the run-time system in the form of run-time trace values. Because pseudo registers are transformed into machine registers and stack slots, trace values for these locations will be based on the code transformations performed by MLRISC (e.g., register allocation, spilling).
In Figure 7, we show the SML function in Figure 2 as it might be translated to MLRISC. Pseudo registers are given names (e.g., x) in this example to clarify the presentation; in an actual MLRISC program, pseudo registers are identified by positive integers (e.g., 500). Machine registers are referred to by a small positive integer (e.g., 16) and can be used interchangeably with pseudo registers in MLRISC code. Names generated by the compiler are prefixed by an underscore; _tenvl contains the type environment (see Section 2.3) for the function and cs1 through cs5 are used to hold the saved values of the callee-save registers. Following the Alpha standard calling convention, this code uses machine registers 16 through 20 to hold arguments and register 0 to hold the result; in MLRISC, unlike RTL, calling conventions are explicitly specified in terms of primitive operations. An MLRISC procedure is a sequence of imperative statements, each of which may refer to applicative expressions; the terms "statement" and "expression" have the normal connotations of programming language terminology. In Table 3 and Table 4, we document the MLRISC constructs used in this example. Expressions can be nested to an arbitrary depth, so, in general, a single statement can generate many assembly language instructions.
bcc cexp, label    Branch to label label if the result of evaluating
                   conditional expression cexp is true.
call addr          Call the procedure at the address formed by evaluating
                   expression addr.
copy dst, src      Copy the registers listed in src into the corresponding
                   registers listed in dst; this is a "parallel" operation:
                   no register can appear more than once in the union of src
                   and dst. copy statements are coalesced [9] by MLRISC
                   whenever possible.
mv dst, exp        Move the result of evaluating expression exp into
                   register dst.
jmp addr           Jump to the code at the address formed by evaluating
                   expression addr.
ret                Return from the current procedure.
store32 addr, exp  Store the result of evaluating expression exp as a 32-bit
                   value at the address formed by evaluating expression addr.

Table 3. Selected MLRISC Statements
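Tables 3 and 4 (the expressions are documented below) suggest how a client might model the interface as ML datatypes. This is our illustrative reconstruction of the constructs used in this paper, not MLRISC's actual signature:

    datatype exp =
        ADD of exp * exp               (* add (exp1, exp2) *)
      | SUB of exp * exp               (* sub (exp1, exp2) *)
      | CONST of unit -> int           (* const fn: resolved at final code generation *)
      | LABEL of string                (* label string *)
      | LI of int                      (* li n *)
      | LOAD32 of exp                  (* load32 addr *)
      | REG of int                     (* reg id *)
    datatype cexp = CMP of string * exp * exp   (* cmp (cmp, exp1, exp2) *)
    datatype stm =
        BCC of cexp * string           (* bcc cexp, label *)
      | CALL of exp                    (* call addr *)
      | COPY of int list * int list    (* copy dst, src *)
      | MV of int * exp                (* mv dst, exp *)
      | JMP of exp                     (* jmp addr *)
      | RET                            (* ret *)
      | STORE32 of exp * exp           (* store32 addr, exp *)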
4 Techniques
This section discusses the translation techniques we use to integrate MLRISC with TIL. In Section 4.1 we touch on the technology that translates RTL code to MLRISC code, whereas in Section 4.2 we outline how we construct trace tables for MLRISC from RTL trace values. Finally, in Section 4.3 we justify the correctness of trace values for translated code.
4.1 From RTL to MLRISC
Translating RTL "instructions" to MLRISC "statements" is relatively straightforward--the principal difficulty lies in generating efficient code for conditional branches. RTL provides two forms of conditional branch instruction: one that compares a pseudo register against zero, and one that compares two pseudo registers. The current translation from MIL to RTL favors the former kind of branch, even for comparisons between two pseudo registers. It does this by storing the boolean result of each comparison in a third pseudo register and then testing the third pseudo register against zero. Although this idiom matches the use of conditionals in certain RISC architectures (e.g., the Alpha), it cannot
f:  mv      gp, reg pv                                ; set global pointer
    mv      sp, sub(reg sp, const frame)              ; alloc frame
    store32 add(reg sp, li 0), reg ra                 ; save return address
    copy    cs1, cs2, cs3, cs4, cs5,                  ; save callee save
            11, 12, 13, 14, 15
    copy    _tenvl, x, n, l, l2,                      ; get arguments
            16, 17, 18, 19, 20
    store32 add(reg sp, const _tenvl_offset),         ; save type environment
            reg _tenvl
    copy    16, l                                     ; _t1 <- length(l)
    mv      pv, label "length"
    call    reg pv                                    ; (call #1)
    mv      gp, reg pv                                ; set global pointer
    copy    _t1, 0
    bcc     cmp(le, reg _t1, li 0), _L1               ; if _t1 <= 0 goto _L1
    copy    16, l                                     ; _t2 <- hd(l)
    mv      pv, label "hd"
    call    reg pv                                    ; (call #2)
    mv      gp, reg pv                                ; set global pointer
    copy    _t2, 0
    jmp     label _L2                                 ; goto _L2
_L1: mv     _t2, li 1                                 ; _t2 <- 1
_L2: copy   16, 17, 18, 19,                           ; _t3 <- g(x, n, _t2)
            _tenvl, x, n, _t2
    mv      pv, label "g"
    call    reg pv                                    ; (call #3)
    mv      gp, reg pv                                ; set global pointer
    copy    _t3, 0
    copy    0, _t3
    mv      ra, load32(add(reg sp, li 0))             ; restore return address
    copy    11, 12, 13, 14, 15,                       ; restore callee save
            cs1, cs2, cs3, cs4, cs5
    mv      sp, add(reg sp, const frame)              ; dealloc frame
    ret                                               ; return

Fig. 7. A Translation of the SML function in Figure 2 to MLRISC
add (exp1, exp2)       Evaluates to the result of adding the results of
                       evaluating exp1 and exp2.
cmp (cmp, exp1, exp2)  Evaluates to true if the results of evaluating exp1
                       and exp2 are ordered according to comparison cmp.
                       This expression evaluates to a condition code, as
                       opposed to an integer.
const fn               Evaluates to the result of calling the function fn
                       during the final code generation phase. const allows
                       constants in the final assembly language program to
                       depend on the results of earlier phases (e.g.,
                       spilling).
label string           Evaluates to the address of label string.
li n                   Evaluates to the integer n.
load32 addr            Evaluates to the result of loading a 32-bit value from
                       the address formed by evaluating expression addr.
reg id                 Evaluates to the contents of pseudo register id.
sub (exp1, exp2)       Evaluates to the result of subtracting the result of
                       evaluating exp2 from the result of evaluating exp1.

Table 4. Selected MLRISC Expressions
be expressed efficiently in MLRISC, because there is no statement to move the result of a comparison directly into an integer pseudo register. We address this problem by "preprocessing" the RTL code into two-operand conditional branch form whenever the result of a compare instruction is used by an immediately following branch instruction, and the boolean result is not used anywhere else in the procedure.
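The following SML sketch shows the shape of this preprocessing over a toy RTL instruction list. The instruction datatype and the liveness check are simplified stand-ins (in particular, scanning the remaining instructions approximates "not used anywhere else in the procedure" and is only valid for straight-line code); the real pass must respect the procedure's full control flow.

    datatype instr =
        CMP of string * int * int * int      (* dst := src1 <cond> src2 *)
      | BNZ of int * string                  (* branch to label if reg <> 0 *)
      | BCMP of string * int * int * string  (* fused two-operand branch *)
      | OTHER of string

    fun uses r (CMP (_, a, b, _)) = a = r orelse b = r
      | uses r (BNZ (a, _)) = a = r
      | uses r (BCMP (_, a, b, _)) = a = r orelse b = r
      | uses _ (OTHER _) = false

    fun fuse (CMP (c, a, b, r) :: BNZ (r', lab) :: rest) =
          if r = r' andalso not (List.exists (uses r) rest)
          then BCMP (c, a, b, lab) :: fuse rest
          else CMP (c, a, b, r) :: fuse (BNZ (r', lab) :: rest)
      | fuse (i :: rest) = i :: fuse rest
      | fuse [] = []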
Although RTL and MLRISC treat pseudo registers in much the same way (i.e., as local storage locations for a procedure), one cannot interchange the two notions. For example, the MLRISC translation of an RTL instruction that refers to pseudo register 500 cannot simply refer to pseudo register 500, because pseudo registers in MLRISC code must be allocated explicitly through MLRISC. To overcome this difficulty, we maintain a mapping from RTL pseudo registers to MLRISC pseudo registers and allocate a new pseudo register from MLRISC whenever we see an RTL pseudo register that does not have an existing mapping. As MLRISC pseudo registers, unlike RTL pseudo registers, carry no explicit trace values, we construct a separate mapping from MLRISC pseudo registers to run-time trace values. Storing the trace values "off to the side" allows us to forget about RTL pseudo registers entirely for the later phases of the translation.
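A minimal sketch of this mapping, using a mutable association list for clarity (a real implementation would use a hash table; newMlriscReg stands in for MLRISC's pseudo-register allocator and is modeled here as a counter):

    val regMap : (int * int) list ref = ref []   (* RTL reg -> MLRISC reg *)
    val nextReg = ref 0                          (* stand-in allocator state *)

    fun newMlriscReg () = (nextReg := !nextReg + 1; !nextReg)

    fun translateReg (rtlReg : int) : int =
      case List.find (fn (r, _) => r = rtlReg) (!regMap) of
          SOME (_, mlriscReg) => mlriscReg
        | NONE =>
            let val m = newMlriscReg ()
            in regMap := (rtlReg, m) :: !regMap; m end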
Our translation "forces" certain pseudo registers to spill by manually replacing them with memory accesses; this transformation is accomplished as a separate pass over the MLRISC code just before we pass it to MLRISC. Pseudo registers that are forced to spill include those holding type environments referred to by trace values, those saving callee-save registers in the presence of exception handlers, as well as any global registers that do not fit in the machine register file. Because the trace value of a given pseudo register can refer to a type environment on the stack to resolve its status (e.g., stack in Table 2), we must ensure that these type environments are in fact on the stack and not being held in a machine register. The callee-save registers must be restored to their former values at the end of an exception handler, so we force the pseudo registers that are used to save these registers to be spilled to the stack so that we can later restore them. Finally, for performance, TIL reserves a small number of machine registers to hold global values that are used by most procedures (e.g., the current heap and limit pointers). Unfortunately, certain machine architectures (most notably, the Intel x86) do not have enough registers for this scheme to be feasible, so we rewrite code using these registers with references to global memory locations.
4.2 Constructing Trace Tables
The most interesting part of the translation from RTL to MLRISC is constructing trace tables for call sites. As MLRISC does not explicitly propagate type information, we construct trace tables by passing trace values "around" MLRISC's code generator. Trace tables are represented as data pseudo operations that are compiled into the data segment of the program.
Because trace tables are encoded in terms of machine register numbers and stack offsets, and because trace values are attached indirectly to pseudo registers, we must account for the results of spilling and register allocation during trace table generation. For example, if pseudo register 500 has the run-time trace value yes and is mapped to machine register 12, then a trace table should contain a yes entry for machine register 12. This implies that we must generate code and trace table data in separate phases--first we translate the code to obtain a pseudo-register mapping, then we generate trace table data based on this mapping. We must also generate a single trace table for all the static locations in a module: this is accomplished by mapping the RTL label for each static location to a corresponding MLRISC label.
In Figure 8 we illustrate how an RTL module containing procedures and static variables is transformed into MLRISC code statements and data directives. Note that for reasons of expediency, trace tables are generated first in terms of RTL data directives which are then translated to MLRISC data directives. This allows us to reuse the trace table module from the TIL1 back end.

    Procedures (RTL code) ----------------------> Text (MLRISC code)
    Trace tables (RTL data) --+
    Register map (MLRISC) ----+-----------------> Data (MLRISC data)
    Globals (RTL data) -------+

Fig. 8. Generating MLRISC Code and Trace Tables from RTL
The results of register allocation and spilling are not difficult to obtain from MLRISC. The mapping from pseudo registers to machine registers is exported as a data structure by the MLRISC interface. We can construct the mapping from spilled pseudo registers to stack offsets because MLRISC spills pseudo registers via a "call back" to our code. It is important to understand that the mappings used by these phases must be accessible for our technique to work--for a back end that does not export this information, we cannot determine how the pseudo registers are represented at run time, and therefore cannot construct trace tables from pseudo-register trace values. This problem is explored further in Section 5.2.
4.3 Register Allocation
Suppose that in Figure 7, pseudo registers l2 (trace value yes) and _t2 (trace value no) have both been mapped to machine register 4 by MLRISC's register allocator. Which trace value should we give machine register 4 when constructing a trace table? Obviously, we cannot resolve this conflict with just the pseudo-register mapping--we should look at the code to determine which pseudo register was defined most recently on the control path to the call site in question. However, because the code may contain arbitrary branches and loops, a linear scan will not resolve this ambiguity in general. Notice that pseudo registers can be mapped to the same machine register only if they have non-overlapping live ranges: otherwise, definitions of the pseudo registers would interfere with each other. Thus, for a given call site we can resolve a conflicting register assignment by choosing the trace value of the pseudo register that is live across the call site [7], as there can be only one.
Figure 7 additionally illustrates a deeper problem, in that the correct trace value for machine register 4 at call #3 depends on the run-time contents of pseudo register _t1: if _t1 is greater than zero, then machine register 4 will be overwritten by the result of call #2. This example suggests that there are cases where the code generation transformations induced by the back end will make it impossible to give a fixed trace value to a particular machine register. Fortunately, such unpredictable trace values can only arise when none of the pseudo registers in question are live across the call site--otherwise, the generated code would be incorrect, because the definition of one pseudo register could interfere with the later use of another. Returning to our example, we know that neither l2 nor _t2 can be used after call #3, because such a use might read the value of the wrong pseudo register. This observation suggests that we must take into account the next use of a pseudo register after the call site as well as its definition before the call site--it is not sufficient to simply note the trace value at the most recent definition. We can give a machine register the trace value no if the contents of that register will not be used after the corresponding call site: because the register will not be used, its contents need not be retained by the garbage collector. This happy coincidence allows us to give the trace value no to machine register 4 for our example.
Thus, to construct a trace table for a given call site, we map the pseudo registers live across the site through the pseudo-register mapping and pair the resulting machine registers with the trace values for the corresponding pseudo registers. All other machine registers are given the trace value no. Liveness analysis has the added benefit of minimizing storage retained during garbage collection. This, in turn, enhances performance by reducing the load on the collector and also enables certain programs to terminate that would not otherwise [16]. Our liveness analysis is based on well-understood data-flow techniques [1]. Note that the call-site liveness analysis must be at least as precise as the register-allocation liveness analysis for this technique to work. We give trace values to stack slots holding spilled pseudo registers with an analogous technique--in Figure 9 we show the construction of a trace table for call #2 of Figure 7 from the live pseudo-register set, a run-time trace value mapping, and a sample MLRISC register mapping.
Live Set:  _tenvl, x, n

MLRISC Register Map:         RTL Trace Map:
_tenvl -> 11                 _tenvl -> yes
x      -> 12                 x      -> compute _tenvl.0
n      -> 13                 n      -> no

Trace Table:
11 -> yes
12 -> stack 12, 0
13 -> no

Fig. 9. Constructing a Trace Table
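The construction of Figure 9 amounts to a composition of finite maps. A minimal SML sketch, with the maps passed in as functions and trace values left as strings purely for illustration:

    (* Pair each pseudo register live across the call site with its
       machine register and trace value; machine registers not in the
       result implicitly get the trace value "no". *)
    fun traceTable (live : int list)
                   (regMap : int -> int)      (* pseudo -> machine register *)
                   (traceOf : int -> string)  (* pseudo -> run-time trace value *)
        : (int * string) list =
      List.map (fn p => (regMap p, traceOf p)) live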
5 Assessment

5.1 TIL
RTL is not well-suited as a source language for MLRISC because the languages are similar enough that the translation between them is essentially wasted work. The principal difference between RTL and MLRISC is that RTL pseudo registers are annotated with trace information; as we describe in Section 4.1, it is not difficult to simulate this capability for MLRISC pseudo registers, so there is no compelling reason to use RTL as a separate translation step. We thus see the use of RTL as an intermediate language as vestigial: we plan to translate directly from the MIL intermediate language in the future. We originally decided to translate from RTL to expedite development of the compiler, as MIL was in a fluid state of development at the time.
5.2 MLRISC
This section seeks to identify the specific features of MLRISC that made it possible for us to integrate it with TIL. We also suggest additional features not found in MLRISC that would have made our job easier, or would have resulted in more efficient code generation. We speculate that our experience may be of use to designers of other generic back ends [5, 6]. We divide the relevant features into two classes: those that are essential to the propagation of type information, and those that can enhance performance or simplify translation when using type-directed techniques. An underlying theme of our characterization is that the back end needs to do more than simply emit assembly code on behalf of the client--it should also return information about how the translation was accomplished.
We first present a brief summary of our conclusions. These are the key features of MLRISC that enable the type-directed translation techniques of our compiler:

- A visible pseudo-register-to-machine-register mapping
- A machine-register mapping that is unique for a given pseudo register
- A visible pseudo-register-to-stack-slot mapping
- A spilled pseudo register will not also be mapped to a machine register
These are features not found in MLRISC that might enhance the performance of our translation:

- Visible liveness information
- An extensible spill mechanism
Essential Features As the TIL run-time system uses trace tables that contain machine register numbers, a client-accessible pseudo-register mapping is needed to propagate trace values. Input trace values are attached to pseudo registers, so we must be able to uncover which pseudo registers are mapped to which machine registers if we are to encode trace value mappings for the latter. Although it might be possible to deduce the pseudo-register mapping by comparing the output object code with the input pseudo code, this is likely to be difficult for a back end that performs aggressive optimizations (e.g., global instruction scheduling).
The shape of the mapping to machine registers can also present problems to the implementor. MLRISC maps each pseudo register to at most one machine register [9]; thus, when an unspilled pseudo register is live across a call site, we can always precisely identify which machine register it is mapped to. If a single pseudo register might be mapped to one of several machine registers at different points in the code, then MLRISC would need to tell us the mapping ranges for each register. Again, it might be possible to deduce this information from the object code--or even from the pseudo code, if we know the back end's register allocation algorithm--but such a deduction algorithm is likely to be complex and correspondingly inefficient.
Given the mapping to machine registers, we must have a similar mapping to stack slots for pseudo registers that have been spilled transparently by the back end. For MLRISC, there is no special interface to this information, but because we implement the spill mechanism ourselves (5), we can easily reconstruct it. If a back end does not provide a customizable spill mechanism, it must allow the client to query the spill status and location of a given pseudo register so that trace value mappings can be constructed for the stack.
Note that to simplify trace table generation, we ensure that a spilled pseudo register will never be mapped to a machine register (i.e., a spilled pseudo register is always on the stack for its entire lifetime). This is accomplished by allocating a new temporary and rewriting the instruction referring to the spilled pseudo register with a reference to the temporary instead; a store instruction (or load in the case of a reload) is then appended (or prepended) to the rewritten instruction. Because the lifetime of the temporary is only between the rewritten instruction and the store (or load), we can assume that it will never be live across a call site [7], and thus need not be traced. Although such a temporary will never be traced, the stack slot containing the original value still might be traced if it is live across the call site in question. Making the "spilled to the stack" and "mapped to a machine register" states exclusive for a given pseudo register simplifies the process of constructing trace tables--if we could not assume this, then we would have to track where the pseudo register is moved to or from the stack and which location(s) (stack slot or register) its next use(s) expect it to be in. As this information is not exported by MLRISC, it is not clear how we would recover it. Our assumption about the lifetimes of spill temporaries may not hold in the presence of global instruction scheduling, but as MLRISC does not currently perform global instruction scheduling, our implementation is sound for the moment. To implement our translation in the presence of global instruction scheduling, we would need access to the liveness analysis of the back end, or we would need to be able to constrain the scheduling of instructions referring to potentially traceable values. Note that the former solution has the added benefit of reducing the overhead incurred by our translation.

(5) When MLRISC decides to spill a pseudo register, it calls a client-supplied function to return an architecture-specific code sequence for the spill.
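The spill rewrite described above (redirecting a definition of a spilled register through a short-lived temporary) can be sketched abstractly; the instruction type and the three operations are parameters here, since the real pass is phrased in terms of MLRISC statements:

    (* Redirect a definition of spilled pseudo register r through a fresh
       temporary, then store the temporary to r's stack slot.  The
       temporary is live only between the two instructions, so it is
       never live across a call site and need not be traced. *)
    fun spillDef newTemp rewriteReg mkStore (instr, r, slot) =
      let val tmp = newTemp ()
      in [rewriteReg (instr, r, tmp), mkStore (slot, tmp)]
      end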
Desirable Features It is unfortunate that we must perform a data-flow analysis to determine liveness across call sites, because this work will be largely duplicated by MLRISC's register allocator. It would be more efficient if we could derive call-site liveness from the register allocator's own liveness information. This might be accomplished in a hypothetical version of MLRISC by returning the pseudo registers that are live into and out of each basic block as part of the translation to assembly language. Additionally, each call site must be isolated in its own basic block: this could be done by TIL if MLRISC were to provide a way to explicitly delimit basic blocks. Note that because spilling is performed in a pass prior to register allocation, liveness information may be lost for spilled pseudo registers when they are replaced by memory accesses--unless MLRISC retains information showing that they cannot be aliased, it will have to assume that they are always live after the first definition. It might be possible to deduce the liveness of spilled pseudo registers by analyzing the spill and reload patterns via another data-flow analysis, but this seems counterproductive. It is not correct to simply assume that spilled pseudo registers are always live, because the contents of a given stack slot may not be initialized until after the first call site.
Our translation to MLRISC includes a "forced spill" pass (see Section 4.1) that replaces certain pseudo registers with memory accesses. As MLRISC implements a similar spilling pass, it would save implementation time if MLRISC were to allow the client to provide an additional spill set as a part of code generation. This would avoid the extra spill pass that currently handles our special spill cases.
5.3 Performance
Benchmarks Because the optimizer in the TILT compiler is still under construction, we cannot yet take meaningful performance measurements of the object code produced by MLRISC in conjunction with TILT. However, because TILT and TIL1 use the same RTL intermediate language, we can use MLRISC as a back end for the TIL1 compiler: in Table 5, we present the relative execution times of some of the benchmark programs from Tarditi et al. [17]. These measurements show that by using MLRISC as a back end for TIL1, we introduce a significant amount of overhead into the generated code. We believe that this overhead is due to complications in the translation of RTL code to MLRISC code, and is not due to MLRISC itself.
Program        TIL1    TIL1-MLRISC   TIL1-MLRISC/TIL1
FFT            2.02    2.49          1.23
Knuth-Bendix   2.28    2.70          1.18
Lexgen         2.66    3.09          1.16
Life           2.07    2.51          1.21
Matmult        2.66    2.61          0.98
Simple         11.91   14.03         1.18

Table 5. TIL1-MLRISC Execution Time Relative to TIL1
The execution times in Table 5 are the time in seconds required to execute the programs on a DEC Alpha 3000/600 workstation with 96 MB of RAM. This workstation has a 175 MHz Alpha 21064 processor with 8 KB primary instruction and data caches and a 2 MB unified secondary cache. Each figure is the arithmetic mean of ten consecutive runs of the corresponding program. See Tarditi et al. [17] for descriptions of the benchmark programs.
We made one change to MLRISC for the purpose of benchmarking: MLRISC ordinarily generates floating-point arithmetic instructions with the sud flags set in the instruction word. Because these instructions are emulated in software on our workstation, we replaced them with the equivalent "garden-variety" instructions (e.g., addt instead of addt/sud). The sud flags control the precise semantics of floating-point operations--see the Alpha Architecture Handbook for more information. As use of the sud flags makes the FFT benchmark about 300 times slower on our workstation, it is not meaningful to take performance measurements with them set.
We used a calling convention without integer callee-save registers for these benchmarks because, when used with TIL1, MLRISC often allocates pseudo registers to callee-save registers in a way that violates the constraints of our trace table encoding. In particular, our encoding requires that the contents of a callee-save register either be saved on the stack or be left in the original machine register during the activation of a procedure. When used with TIL1, however, MLRISC often allocates the pseudo registers used to save the callee-save registers to other (different) callee-save registers--this is not expressible in our trace table encoding. We have encountered this problem with much less frequency when using MLRISC as a back end for TILT, but it remains unresolved.
Target Code The principal techniques outlined in this paper for interfacing MLRISC to TIL operate only on type information, and therefore should not have a direct effect on object code quality. However, there are sources of inefficiency in the code transformations performed by our translation. Additionally, the limitations of these techniques may introduce performance-limiting constraints when used with other back ends.
To elaborate on the former point, the details of the translation from RTL to MLRISC have a significant effect on the ultimate quality of the object code, as is indicated by the discussion of conditional branch translation in Section 4.1. It is clear, however, that this particular difficulty arises as an artifact of an unfortunate mismatch between the semantics of conditional values in RTL and MLRISC, and does not represent a general problem with the interaction between TIL and MLRISC.
Another valid question might arise about whether the "forced spill" phase outlined in Section 5.2 will introduce so many new spills as to significantly degrade performance. Although it seems unlikely that the indiscriminate spilling of type environments will have a measurable effect on performance, one cannot so easily dismiss the spilling of the callee-save registers and the pervasive global registers. Note, however, that in each of these cases, spilling is introduced as a consequence of constraints imposed by the run-time system, and not as a consequence of a poor interaction between the compiler and the back end. Thus, the forced spill phase is really a function of the run-time architecture used with TIL and will be required in some form whether or not the object code is generated by MLRISC. A general discussion of the performance of type-directed run-time
architectures is beyond the scope of this paper, but see Tarditi et al. [17] and Morrisett [13].
A potential performance problem that is directly related to the use of MLRISC as a back end for TIL concerns the constraints that our techniques impose on a back end to simplify trace table generation. These restrictions are discussed in Section 5.2, and although none of them appear to be especially restrictive, it will be difficult to demonstrate this without measurements. Unfortunately, this is particularly awkward to do for our technology, as only one of these constraints (spilled pseudo registers) can be alleviated in MLRISC. Even if we were to remove this limitation, however, we would not be able to execute the resulting code because of the absence of trace tables. It might be more productive to examine individual measurements of these code generation features on other compiler platforms and then use the results as a guide to forming conclusions about the potential drawbacks of our techniques.
Compilation Speed A final performance consideration relates to how the use of our techniques affects the speed of compilation. Because we perform an extra liveness analysis before code generation (see Section 4.3), there is a potential for inefficiency here. Preliminary measurements show that our use of MLRISC has a significant performance cost: the combined RTL-to-MLRISC translation and the subsequent MLRISC code generation phases together perform at less than half the speed of the TIL1 back end when used with TILT. These same measurements also indicate that the bulk of the time is being spent in translation code external to MLRISC; MLRISC on its own is usually faster than the TIL1 back end. Unfortunately, we have not yet isolated the source of this inefficiency: the extra liveness analysis by itself only accounts for a fraction of the translation overhead--it typically consumes less than 5 percent of the total compilation time. It is certainly possible that most of the translation overhead is caused by unoptimized code on our end. In our opinion, it is too early to draw meaningful conclusions about the performance of the translation itself, as there may still be room for substantial optimization.
6 Future Work
In this section, we discuss features of MLRISC that are currently underutilized. MLRISC is able to perform inter-procedural register allocation on procedures in the same call graph. We do not currently take advantage of this feature, but hope to utilize it once we fully understand the complications with regard to trace table construction. Because ML programs typically use function calls for looping, the performance benefits of this optimization may be significant when the compiler has not entirely optimized away procedure calls. MLRISC does not currently perform any global instruction scheduling, but we expect that it will eventually do so. We anticipate that this optimization will introduce complications into our call-site liveness analysis because the live ranges of pseudo registers will be perturbed across basic blocks. For example,
if the first definition of a traceable pseudo register is moved forward past a call site, then the garbage collector will trace an uninitialized value at that call site if no corrective action is taken. Because basic blocks in ML programs tend to be so small that local instruction scheduling has little benefit, we think it will be important to find a solution to this problem that does not unduly constrain the back end. This topic is also discussed in Section 5.2.
MLRISC provides condition-code pseudo registers in addition to integer and floating-point pseudo registers. We currently do not use these pseudo registers because RTL does not distinguish condition codes from integers. A direct translation from MIL might make it easier to take advantage of these registers and also to correct some additional inefficiencies in the translation of conditional branches.
Finally, we hope to isolate the source of the current translation inefficiency so that using MLRISC with TILT is not significantly slower than using the TIL1 back end. We also hope to improve the performance of the object code generated by MLRISC once implementation of the TILT optimizer is complete.
7 Conclusion
We have presented our approach to integrating MLRISC, a generic back end, with TIL, a type-directed compiler. Our work is a solution to a specific instance of a more general problem: how can abstract trace information be mapped to concrete trace information, given that the correct mapping is a function of a parallel code translation performed by the back end? Register allocation and spilling are the critical code translations that must be reproduced to translate trace information. MLRISC exports its register and spill mappings: it is this property of MLRISC that makes it possible to use it with our compiler.
As important parts of TILT are still being developed, we cannot draw definitive conclusions yet about the merits of our approach. It is currently unclear if the use of MLRISC will give us a significant improvement in object code quality. It is also unclear whether the "scaffolding" we have constructed around MLRISC can be made efficient enough not to seriously degrade compilation time.
It is reasonable to object to our use of RTL as an intermediate language between MIL and MLRISC, because RTL serves essentially the same purpose as MLRISC. We chose to retain RTL from TIL1 only to better compartmentalize our development effort. One could argue that some of the problems we have encountered are due more to the use of RTL than to the use of MLRISC. In particular, we expect that in a hypothetical translation from MIL to MLRISC, a redundant liveness analysis on MIL code would be less onerous due to its more structured control flow. This would appear to undermine our contention that the back end should export the results of its liveness analysis for use by the rest of the compiler. However, we do not believe that simply performing the call-site liveness analysis on MIL code is an adequate long-term solution, because it is not clear that liveness of variables in MIL code necessarily corresponds to liveness of machine registers and stack slots in machine code---we even know that this
correspondence will not hold in the presence of global instruction scheduling. For this reason, we think that the availability of liveness information from MLRISC will be crucial to the long-term success of this effort.
Our work attests that MLRISC is "generic enough" to be reused as the back end of our compiler, even though TIL is substantially different from Standard ML of New Jersey [4], the compiler for which MLRISC was originally developed. Reuse has attendant costs, however, and the most significant of these appear to be related to the speed of compilation. We suggest that generic compiler technology is a valuable asset, but that more developers will benefit from it if interfaces are made flexible enough to encompass dissimilar compilation strategies.
Acknowledgements Special thanks to Lal George for his helpful insights during the integration of MLRISC and TIL.
References

1. A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986.
2. Andrew W. Appel. A runtime system. Lisp and Symbolic Computation, 3:343-380, 1990.
3. Andrew W. Appel. Modern Compiler Implementation in ML. Cambridge University Press, 1998.
4. Andrew W. Appel and David B. MacQueen. Standard ML of New Jersey. In Third International Symposium on Programming Language Implementation and Logic Programming, pages 1-13. Springer-Verlag, August 1991.
5. Andrew W. Appel et al. The national compiler infrastructure project.
6. Robert P. Wilson et al. SUIF: An infrastructure for research on parallelizing and optimizing compilers. Technical report, Computer Systems Laboratory, Stanford University.
7. Lal George. Personal communication.
8. Lal George. MLRISC: Customizable and reusable code generators. Technical report, Bell Labs, December 1996. Submitted to PLDI.
9. Lal George and Andrew W. Appel. Iterated register coalescing. ACM Transactions on Programming Languages and Systems, 18(3):300-324, May 1996.
10. Robert Harper and Greg Morrisett. Compiling polymorphism using intensional type analysis. In Conference Record of the 22nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 130-141. ACM, January 1995.
11. Robert Harper and Chris Stone. A type-theoretic interpretation of Standard ML. Technical report, Carnegie Mellon University, 1997. Submitted for publication.
12. Greg Morrisett, David Walker, Karl Crary, and Neal Glew. From System F to typed assembly language. In Conference Record of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 85-97. ACM, January 1998.
13. J. Gregory Morrisett. Compiling with Types. PhD thesis, Carnegie Mellon University, December 1995. Published as CMU Technical Report CMU-CS-95-226.
14. George C. Necula. Proof-carrying code. In Conference Record of the 24th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM, January 1997.
15. George C. Necula and Peter Lee. The design and implementation of a certifying compiler. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, New York, 1998. ACM Press.
16. Zhong Shao and Andrew W. Appel. Space-efficient closure representations. In Conference on Lisp and Functional Programming, June 1994.
17. D. Tarditi, G. Morrisett, P. Cheng, C. Stone, R. Harper, and P. Lee. TIL: A type-directed optimizing compiler for ML. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 181-192, New York, May 21-24, 1996. ACM Press.
18. Andrew Tolmach. Tag-free garbage collection using explicit type parameters. In Proceedings 1994 ACM Conference on Lisp and Functional Programming, June 1994.
A Toolkit for Constructing Type- and Constraint-Based Program Analyses

Alexander Aiken, Manuel Fähndrich, Jeffrey S. Foster, Zhendong Su
University of California, Berkeley* **
Abstract. BANE (the Berkeley Analysis Engine) is a publicly available toolkit for constructing type- and constraint-based program analyses (1). We describe the goals of the project, the rationale for BANE's overall design, some examples coded in BANE, and briefly compare BANE with other program analysis frameworks.

1 Introduction
Automatic program analysis is central to contemporary compilers and software engineering tools. Program analyses are also arguably the most difficult components of such systems to develop, as significant theoretical and practical issues must be addressed in even relatively straightforward analyses. Program analysis poses difficult semantic problems, and considerable effort has been devoted to understanding what it means for an analysis to be correct [CC77]. However, designing a theoretically well-founded analysis is necessary but not sufficient for obtaining a useful analysis. Demonstrating utility requires implementation and experimentation, preferably with large programs. Many plausible analyses are not beneficial in practice, and others require substantial modification and tuning before producing useful information at reasonable cost. It is important to prototype and realistically test analysis ideas, usually in several variations, to judge the cost/performance trade-offs of multiple design points. We know of no practical analytical method for showing utility, because the set of programs that occur in practice is a very special, and not easily modeled, subset of all programs. Unfortunately, experiments are relatively rare because of the substantial effort involved.
BANE (for the Berkeley ANalysis Engine) is a toolkit for constructing type- and constraint-based program analyses. A goal of the project is to dramatically lower the barriers to experimentation and to make it relatively easy for researchers to realistically prototype and test new program analysis ideas (at

* Authors' address: Electrical Engineering and Computer Science Department, University of California, Berkeley, 387 Soda Hall #1776, Berkeley, CA 94720-1776. Email: {aiken,manuel,jfoster,zhendong}@cs.berkeley.edu
** Supported in part by an NDSEG fellowship, NSF National Young Investigator Award CCR-9457812, NSF Grant CCR-9416973, and gifts from Microsoft and Rockwell.
(1) The distribution may be obtained from the BANE homepage at http://bane.cs.berkeley.edu.
least type- and constraint-based ideas). To this end, in addition to providing constraint specification and resolution components, the BANE distribution also provides parsers and interfaces for popular languages (currently C and ML) as well as test suites of programs ranging in size from a few hundred to tens of thousands of lines of code.
BANE has been used to implement several realistic program analyses, including an uncaught exception inference system for ML programs [FA97,FFA98], points-to analyses for C [FFA97,FFSA98], and a race condition analysis for a factory control language [AFS98]. Each of these analyses also scales to large programs--respectively at least 20,000 lines of ML, 100,000 lines of C, and production factory control programs. These are the largest programs we have available (the peculiar syntax of the control language precludes counting lines of code).
2 System Architecture
Constraint-based analysis is appealing because elaborate analyses can be expressed with a concise and simple set of constraint generation rules. These rules separate analysis specification (constraint generation) from implementation (constraint resolution). Implementing an analysis using BANE involves only writing code to (1) generate the appropriate constraints from the program text and (2) interpret the solutions of the constraints. Part (1) is usually a simple recursive walk of the abstract-syntax tree, and part (2) is usually testing for straightforward properties of the constraint solutions. The system takes care of constraint representation, resolution, and transformation. Thus, BANE frees the analysis designer from writing a constraint solver, usually the most difficult portion of a constraint-based analysis to design and engineer.
In designing a program analysis toolkit one soon realizes that no single formalism covers both a large fraction of interesting analyses and provides uniformly good performance in an implementation. BANE provides a number of different constraint sorts: constraint languages and associated resolution engines that can be reused as appropriate for different applications. Each sort is characterized by a language of expressions, a constraint relation, a solution space, and an implementation strategy. In some cases BANE provides multiple implementations of the same constraint language as distinct sorts because the different implementations provide different engineering trade-offs to the user. Extending BANE with new sorts is straightforward.
An innovation in BANE is support for mixed constraints: the use of multiple sorts of constraints in a single application [FA97]. In addition to supporting naturally multi-sorted applications, we believe the ability to change constraint languages allows analysis designers to explore fine-grain engineering decisions, targeting subproblems of an analysis with the constraint system that gives the best efficiency/precision properties for the task at hand. Section 3 provides an example of successively refining an analysis through mixed constraints.
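As an illustration of part (1), here is a toy constraint-generation pass for a small lambda language, written against a hypothetical BANE-like interface (fresh variables, an arrow constructor, and an emitted inclusion constraint); the real BANE API differs, and this sketch simply accumulates constraints in a list:

    datatype exp = VAR of string | LAM of string * exp | APP of exp * exp
    datatype sexp = SVAR of int | ARROW of sexp * sexp

    val counter = ref 0
    fun fresh () = (counter := !counter + 1; SVAR (!counter))

    val constraints : (sexp * sexp) list ref = ref []
    fun constrain c = constraints := c :: !constraints   (* lhs <= rhs *)

    fun gen env (VAR x) =
          (case List.find (fn (y, _) => y = x) env of
               SOME (_, v) => v
             | NONE => fresh ())
      | gen env (LAM (x, e)) =
          let val a = fresh ()
          in ARROW (a, gen ((x, a) :: env) e) end
      | gen env (APP (e1, e2)) =
          let val r = fresh ()
          in constrain (gen env e1, ARROW (gen env e2, r)); r end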
∪ : Set Set → Set
∩ : Set Set → Set
¬{c1, ..., cn} : Set   for any set of Set-constructors ci ∈ Σ_Set
0 : Set
1 : Set

Fig. 1. Operations in the sort Set.
Mixed constraint systems are formalized using a many-sorted algebra of expressions. Each sort s includes a set of variables V_s, a set of constructors Σ_s, and possibly some other operations. Each sort has a constraint relation ⊆_s. Constraints and resolution rules observe sorts; that is, a constraint X ⊆_s Y implies X and Y are s-expressions. The user selects the appropriate mixture of constraints by providing constructor signatures. If S is the set of sorts, each n-ary constructor c is given a signature

    c : ι1 ... ιn → s

where each ιi is a sort s' ∈ S or an overlined sort s̄'. Overlined sorts mark contravariant arguments of c; the rest are covariant arguments. For example, let sort Term be a set of constructors Σ_Term and variables V_Term with no additional operations. Pure terms over Σ_Term and V_Term are defined by giving constructor signatures

    c : Term ... Term → Term        (arity(c) arguments, for each c ∈ Σ_Term)

As another example, let Set be a sort with the set operators in Figure 1 (the set operations plus least and greatest sets). Pure set expressions are defined by the signatures

    c : Set ... Set → Set           (for each c ∈ Σ_Set)
There are many examples of program analyses based on equations between T e r m (e.g., DM82,Hen92,Ste96) and based on inclusion constraints between Set expressions (e.g., And94,AWL94,EST95,FFK+96,Hei94). The literature also has natural examples of mixed constraint systems, although they have not been recognized previously as a distinct category. For example, many e f f e c t s y s t e m s GJSO92 use a function space constructor 9 -4.:
Term Set T e r m - + T e r m
where the Set expressions are used only to carry the set of latent effects of the function. These three examples--terms, set expressions, and a mixed language with set and term components--illustrate that by altering the signatures of constructors
81 a range of analysis domains can be realized. For example, a flow-based analysis using set expressions can be coarsened to a unification-based analysis using terms. Similarly, a term-based analysis can be refined to an effect analysis by adding a Set component to the --+ constructor. 2.1
The Framework
From the user's point of view, our framework consists of a number of sorts of expressions together with resolution rules for constraints over those expressions. In addition, the user must provide constructor signatures specifying how the different sorts are combined. In this section we focus on the three sorts Term, FlowTerm, and Set. The distributed implementation also supports a Row sort R~m89 for modeling records. Besides constructors and variables a sort may have arbitrary operations peculiar to that sort; for example, sort Set includes set operations. Each sort s has a constraint relation _Cs and resolution rules. Constraints and resolution rules preserve sorts, so that X C_s Y implies X and Y are s-expressions. For example, for the Term sort, the constraint relation CT,rm is equality, and the resolution rules implement term unification for constructors with signatures Term... Term --+ Term. For clarity we write the constraint relation of term unification as " = t " instead of CT.r~. The resolution rules in Figure 2 are read as left-to-right rewrite rules. The leftand right-hand sides of rules are conjunctions of constraints. Sort FlowTerm has the expressions of sort Term but a different set of resolution rules (see Figure 2b). FlowTerm uses inclusion instead of equality constraints. The inclusion constraints are more precise, but also more expensive to resolve, requiring exponential time in the worst case. For certain applications, however, FlowTerm is very efficient HM97. We write C ~ for the FlowTerm constraint relation. The constructor rules connect constraints of different sorts. For example, in sort FlowTerm the rule
s A c(T~,..., T,) C_,, c(T~,..., T ' ) - S ^ T1 _c,, T ^ . - . ^ T , c_,. T" if
c
:
~I
" " " /~n
---)"
FlowTerm
says constraints propagate structurally to constructor arguments; this is where FlowTerm has a precision advantage over Term (see below). Note this rule preserves sorts. The rule for constructors of sort Term (Figure 2a) is slightly different because _CT,rmis equality, a symmetric relation. Thus, constraints on constructor arguments are also symmetric:
S A f(TI,... ,Tn) =t f(T~,... ,Ttn) ~ SAT1 g,, T~ AT~ C,~ T1 A ...A T,, C_,,, T~ A T~ C_,,, Tn if f : L1 . . . e n --+ Term Figure 2c shows the rules for the Set sort. In addition to the standard rules AW93, Set includes special rules for set complement, which is problematic in the presence of contravariant constructors. We deal with set complement using
82
S A f ( T ~ , . . . ,Ta) = t / ( T ~ , . . .
,T'~) =- S A T ~ C_~ Ti AT~ a_~ T~ ^ ' " A TaC_~.T'AT'C_~.T,~ if / : ~ l ' " ~ a - ~ T e r m
S A f ( . . . ) ----~g(... ) ----inconsistent
if f r g
(a) Resolution rules for sort Term.
S A e ( T ~ , . . . ,Ta) C_, c(Tf,... ,T') - - S A T x C_, Tf ^ ... ATe g ~ T" if c : ~ 9 9 9 ~ a --+ FlowTerm SAc(...) C_ftd(...)----inconsistent i f e • d S ^ a c_, c(T~,.. . ,Ta) -- S ^ a = e(a~,.. . ,a~)^a~
S Ac(T1,... ,T=) C z, a ~ S A a
C_~,T~
ai fresh, c:~1--.~-+FlowTerm = c ( a l , . . . , a n ) AT~ C~, ai ai fresh, C : t l . . . t a - + F l o w T e r m
(b) Resolution rules for sort FlowTerm.
S A 0 C_, T = S SATC_,I =S S A c(T~, . . . ,Ta) C, c(Ti, . . . ,T" ) - S ^ T I
S A c ( . . . ) C, d ( . . . )
C~I T ~ A . . . A T a
c_~ T"
if c : el " " t a --+ Set -- inconsistent ff c ~- d - - S A T x C, T A T 2 C _ , T -- S A T C_, TI A T C_, T2 =S =S = S A TI n Ta C, T2
S A T~ UT2 C, T S A T C, T 1 N T 2 SAaC_,a SAaNTC_,a s ^ T~ c_, Pat(T2, T3) S A a N T ~ C_.T2 = S A a
C_, Pat(T2,TI)
SA-~{cl,.. ,ca}C_,-~{dl,...,dm} -S S A c ( . . . ) C . - ~ { d l , . . . ,din} -- S
if{dl,...,dm}C_{cl,...,ca} if c r { d l , . . . ,d,~}
(c) Resolution rules for sort Set. SAX
C, ~ A a
C~ Y =_ S A X
C~ ~ A ~
C~ Y A X C~ Y
S A T~ C_~T2 =-- S A T 2 C_~T~ (d) General rules. Fig. 2. Resolution rules for constraints.
83 two mechanisms. First, explicit complements have the form --{cl, 999 , c,}, which has all values of sort Set except those with head constructor cl, 9.. ,c,~. Second, more general complements are represented implicitly. Define ~ R to be the set such t h a t / ~ n ~ R = 0 and R U ~ R = 1 (in all solutions). Now define
Pat(T, R)
=
(T
6"I
R)
U
-~R
The operator Pat 2 encapsulates a disjoint union involving a complement. Pat is equivalent to in power to disjoint union, but constraint resolution involving Pat does not require computing complements. Of course, wherever Pat(T, R) is used the set ~ R must exist; this is an obligation of the analysis designer (see FA97 for details). Given the definitions of Pat and -~{cl,... ,c,~}, basic set theory shows the rules in Figure 2c are sound. Our specification of sort Set is incomplete. We have omitted some rules for simplifying intersections and some restrictions on the form of solvable constraints. The details may be found in AW93,FA97. Figure 2d gives two general rules that apply to all sorts. The first rule expresses that C~ is transitive. The second flips constraints that arise from contravariant constructor arguments. We now present a small example of a mixed constraint system. Consider an effect system where each function type carries a set of atomic effects (e.g., the set of globally visible variables that may be modified by invoking the function). Let the constructors have signatures
9-~.:FlowTerm int:FlowTerm al,... ~an:Set
Set FlowTerm -+ FlowTerm
(the atomic effects)
The following constraint
OLa,u~2/3 ~ t
int -~ int
is resolved as follows:
a~u~2# ~ t
int -~ int
GC~intAaIUa2 C sTA~Eft int int ___ifG A a l U a 2 C_B 7 A ~ - C f t int G ~ int A al U a2 C_. -y A ~ ----int
Thus in all solutions a and/~ are both i n t and 7 is a superset of al U a2. 2.2
Scalability
The main technical challenge in BANE is to develop methods for sealing constraint-based analyses to large programs. Designing for scalability has led to
2 Pat, stands for "pattern," because it is used most often to express pattern matching.
84 a system with a significantly different organization than other program analysis systems Hei94,AWL94. To handle very large programs it is essential that the implementation be structured so that independent program components can be analyzed separately first and the results combined later. Consider the following generic inference rule where expressions are assigned types under some set of assumptions A and constraints C
A, C ~- el : T1 A, C F- e2 : 7"2 A, C ~- Eel, e2 : T where Eel,e2 is a compound expression with subexpressions el and e2. In all other implementations we know of, such inference systems are realized by accumulating a set of global constraints C. In BANE one can write rules as above, but the following alternative is also provided: A, C1F- el : 7"I A, C2 ~" e2 : T2
A, C1 A C2 F- Eel, e2 : T C1 contains only the constraints required to type el (similarly for C~ and e2). This structure has advantages. First, separate analysis of program components is trivial by design rather than added as an afterthought. Second, the running time of algorithms that examine the constraints (e.g., constraint simplification, which replaces constraint systems by equivalent, and smaller, systems) is guaranteed to be a function only of the expression being analyzed; in particular, the running time is independent of the rest of the program. Note that this design changes the primitive operation for accumulating constraints from adding individual constraints to a global system to combining independent constraint systems. Because this latter operation is more expensive, BANE applications tend to use a mixture of the two forms of rules to obtain good overall performance and scalability. Many other aspects of the BANE architecture have been engineered primarily for scalability FA96. The emphasis on scalability, plus the overhead of supporting general user-specified constructor signatures, has a cost in runtime performance, but this cost appears to be small. For example, a BANE implementation of the type inference system for core Standard ML performs within a factor of two of the hand-written implementation in the SML/NJ compiler. In other cases a well-engineered constraint library can substantially outperform hand-written implementations. BANE implementations of a class of cubic-time flow analyses can be orders of magnitude faster than special-purpose systems because of optimizations implemented in the solver for BANE's set constraint sort FFSA98.
3
The B A N E Interface by Example
This section presents a simple analysis written in BANE. We show by example how an analysis can be successively refined using mixed constraints. BANE is
85 a library written in Standard ML of New Jersey MTH90. Writing a program analysis using BANE requires ML code to traverse abstract syntax while generating constraints and ML code to extract the desired information from the solutions of the constraints. For reasons of efficiency, BANE's implementation is stateful. B A N E provides the notion of a current constraint system (CCS) into which all constraints are added. Functionality to create new constraint systems and to change the CCS are provided, so one is not limited to a single global constraint system. For simplicity, the examples in this section use only a single constraint system.
3.1
A Trivial Example: Simple T y p e Inference for a L a m b d a Calculus
This example infers types for a lambda calculus with the following abstract syntax: datatype
ast =
Var of s t r i n g I I n t of int I Fn of {formal:string,
body:ast}
I App of {function:ast, argument:ast}
The syntax includes identifiers (strings), primitive integers, abstraction, and application. The language of types consists of the primitive type i n t , a function type --~, as well as type variables v. T ::= v l int l f -+ T
The first choice is the sort of expressions and constraints to use for the type inference problem. All that is needed in this case are terms and term equality; the appropriate sort is Term (structure Bane .Term). To make the code more readable, we rebind this structure as structure T y p e S o r t . structure
TypeSort
= Bane.Term
B A N E uses distinct M L types for expressions of distinct sort. In this case, type expressions have M L type type ty = TypeSort.T Bane.expr Next, we need the type constructors for integers and functions. T h e integer type constructor can be formed using a constant signature, and a standard function type constructor is predefined. val int_tycon = Cons.new {name="int", signa=TypeSort, constSig} val fun_tycon = TypeSort.funCon The constant integer type is created by applying the integer constructor to an empty listof arguments. W e also define a function to apply the function type constructor to the domain and range, using the generic function Bane. Common. c o n s : 'a constructor * genE list -> 'a expr that applies a constructor of sort
86
A b x : Ax
VAR
A F- i :
int
lINT
A F e l :T1 Aa fresh
e2 : r2
a fresh
Ax~-'-~a-e:T A b- A x . e : a - - - + r
lABS
"rl ---- "r2 --+ ot
AF
el e2 : a
APP
Fig. 3. Type inference rules for example lambda calculus
' a to a list of arguments. In general, constructor arguments can have a variety of distinct sorts with distinct ML types. Since ML only allows homogeneously typed lists, BANE uses an ML type genE for expressions of any sort. The lack of subtyping in ML forces us to use conversion functions T y p e S o r t . toGenE to convert the domain and range from T y p e S o r t . T Bane. e x p r to Bane. genE. v a l intTy = Bane.Common.cons (int_tycon, ) fun runTy (domain,range) = Common.cons (fun_tycon, TypeSort. toGenE TypeSort.toGenE
domain,
range
)
Finally, we define a function for creating fresh type variables by specializing the generic function Bane. Y a r . f r e s h Y a r : ' a Bane. s o r t -> ' a B a n e . e x p r . We also bind operator == to the equality constraint of T y p e S o r t . fun freshTyVar () = Bane.Var.freshVar infix = = val op == = TypeSort.unify
TypeSort.sort
With these auxiliary bindings, the standard type inference rules in Figure 3 are translated directly into a case analysis on the abstract syntax. T y p e environments are provided by a module with the following signature: signature ENV = sig type name = string type
'a
env
val empty val insert val find
: ~a e n v : 'a e n v * n a m e * 'a -> 'a e n v : 'a e n v * n a m e -> ~a o p t i o n
end
The type of identifiers is simply looked up in the environment. If the environment contains no assumption for an identifier, an error is reported. fun e l a b o r a t e case
ast
of
env
ast =
87
Vat x => (case Ear.find (env, x) of SOME ty => ty NONE => )
The integer case is even simpler: i Int i => intTy
Abstractions are typed by creating a fresh unconstrained type variable for the lambda bound formal, extending the environment with a binding for the formal, and typing the body in the extended environment. I Fn ~formal,body} => let val v = freshTyVar O val env' = Env.insert (env,formal,v) val body_ty = elaborate env' body in runTy (v, body_ty) end
For applications we obtain the function type t y l and the argument type t y 2 via recursive calls. A fresh type variable r e s u l t stands for the result of the application. Type t y l must be equal to a function type with domain t y 2 and range r e s u l t . The handler around the equality constraint catches inconsistent constraints in the case where t y l is not a function, or the domain and argument don't agree. I App {function,argument} => let val tyl = elaborate env function val ty2 = elaborate env argument val result = freshTyVar () val fty = funTy (ty2, result) in (tyl == fry) handle exn => ; result
end
We haven't specified whether our type language for lambda terms includes recursive types. The Term sort allows recursive solutions by default. If only nonrecursive solutions are desired, an occurs check can be enabled via a BANE option: Bane.Flags.set (SOME TypeSort.sort) "occursCheck";
As an example, consider the Y combinator
Y = Af.(Ax.f (x x))(Ax.f (x x)) Its inferred type is where the type variable a is unconstrained. With the occurs check enabled, type inference for Y fails.
88 3.2
Type Inference with Flow Information
The simple type inference described above yields type information for each lambda term or fails if the equality constraints have no solution. Suppose we want to augment type inference to gather information about the set of lambda abstractions to which each lambda expression may evaluate. We assume the abstract syntax is modified so that lambda abstractions are labeled: Fn of {formal:string, body:ast, label:string}
Our goal is to refine function types to include a label-set, so that the type of a lambda term not only describes the domain and the range, but also an approximation of the set of syntactic abstractions to which it may evaluate. The function type constructor thus becomes a ternary constructor f u n ( d o m , rng, labels). The resulting analysis is similar to the flow analysis described in Mos96. The natural choice of constraint language for label-sets is obviously set constraints, and we bind the structure L a b e l S e t to one particular implementation of set constraints: structure LabelSet =Bane. SetIF
We define the new function type constructor containing an extra field for the label-set by building a signature with three argument sorts, the first two being T y p e sorts and the last being a LabelSet sort. Note how the variance of each constructor argument is specified in the signature through the use of functions T y p e S o r t . c t v _ a r g (contravariance) and T y p e S o r t . cov_arg (covariance). Resolution of equality constraints itself does not require variance annotations, but other aspects of BANE do. val funSig = TypeSert.newSig (args= TypeSert.ctv_arg TypeSert.genSert, TypeSort. cov arg TypeSort. genSort, TypeSort. cov_arg LabelSet. genSort, attributes= ) val fun_tycon = Bane. Cons.new {name="fun", signa=funSig)
We are now using a mixed constraint language: types are terms with embedded label-sets. Constraints between types are still equality constraints, and as a result, induced constraints between label sets are also equalities. The type rules for abstraction and application are easily modified to include label information. a fresh Ax ~+ ~ F- e : T {/} C_ e e fresh A F- AZx.e : fun(s, r, c)
ABS
AF-el :rl A ~- e2 : 7-2 ~, C fresh 71 -- fun(T2, C~,e) A F- el e2 :
APP
Because Term constraints generate equality constraints on the embedded Sets, the label-sets of distinct abstractions may be equated during type inference. As a result, the lABS rule introduces a fresh label-set variable e along with a constraint {1} C_ e to correctly model that the lambda abstraction evaluates to
89 itself. (Note that this inclusion constraint is between Set expressions.) Using a constrained variable rather than a constant set {/} allows the label-set to be merged with other sets through equality constraints. The handling of arroweffects in region inference is similar TT94. The label-set variable c introduced by each use of the lAPP rule stands for the set of abstractions potentially flowing to that application site. The code changes required to accommodate the new rules are minimal. For abstractions, the label is converted into a constant set constructor with the same name through Con-*.new. A constant set expression is then built from the constructor and used to constrain the fresh label-set variable l a b e l v a r . Finally, the label-set variable is used along with the domain and range to build the function type of the abstraction.
I Fn {formal,body,label} => let val v = freshTyVar 0 val env' = Env.insert (env,formal,v) val body_ty = elaborate env' body (* create a new constant constructor *) v a l c = Cons.new {name=label, signa=LabelSet.constSig} val lab = Common.cons (c,) val labelvar = freshLabelVar () in (lab <= labelvar); runty (v, body_ty, labelvar)
end
The changes to the implementation of lAPP are even simpler, requiring only the introduction of a fresh label-set variable. The label-set variable may be stored in a map for later inspection of the set of abstractions flowing to particular application sites.
I App {function,argument} => let val tyl = elaborate env function val ty2 = elaborate env argument val result = freshTyVar () val labels = freshLabelVar 0 val fty = funTy (ty2, result, labels) in (tyl == fty) handle exn => ; result end
We now provide a number of examples showing the information gathered by the flow analysis. Consider the standard lambda encodings for values true, false,
90
nil, and cons, and their inferred types. true )~truex.~trUezy.x false = )~falSex.)~faLsely.y nil = )r cons = )~c~ cl tl.)~C2x.)~C3y.y hd tl =
c~ As A~ c~ \ true C el A true1 C_ e2 c~ - ~ ~ - ~ / ~ \ false C_ ez A false1 C e2 o~ _t~/~ - ~ c~ \ nil C s A nil1 C e2
~ - ~ B - ~ ~ - ~ (~ - ~ Z - ~ 6) ~ , 6 \ cons C
eI
A
C1 __(~e2
c 2 C_ e 3
A c 3 C_
A
e6
The analysis yields constrained types T \ C, where the constraints C describe the label-set variables embedded in type T. (To improve the readability of types, function types are written using the standard infix form with label-sets on the arrow.) For example, the type of nil
a-t~-~a
\ niIC_elAnill C_E2
has the label-set el on the first arrow, and associated constraint nil C el. The label-set is extracted from the final type using the following BANE code fragment: val ty = elaborate error baseEnv e val labels = case C o m m o n . d e C o n s (fun_tycon, ty) of SOME dom,rng,lab => LabelSet.tlb (LabelSet.fromGenE
lab)
I ~0NE =>
The function C o m m o n . d e C o n s is used to decompose constructed expressions. In this case we match the final type expression against the pattern fun(dora, rng, lab). If the match succeeds, deCons returns the list of arguments to the constructor. In this case we are interested in the least solution of the label component lab. We obtain this information via the function L a b e l S e t . t l b , which returns the transitive lower-bound (TLB) of a given expression. The T L B is a list of constructed expressions c ( . . . ) , in our case a list of constants corresponding to abstraction labels. A slightly more complex example using the lambda expressions defined above is head = .~headl.l nil ()~headlx.,)~head2y.x) ((OL ~ /'1 - ~ OL) - ' ~
head (cons true nil) : a ~
~~
head C_ eTA nil C_ elA nil1 C e2A head1 C_e4A head2 C_e5 c~ \ true C_ez A truel C e2
The expression head (cons true nil) takes the head of the list containing true. Even though the function head is defined to return nil if the argument is the empty list, the flow analysis correctly infers that the result in this case is true.
91 The use of equality constraints m a y cause undesired approximations in the flow information. Consider an example taken from Section 3.1 of Mossin's thesis Mos96 select
=
Aselectx.Aselly.Asel2f.if x t h e n f X e l s e f y
T h e select function takes three arguments, x, y, and z, and depending on the t r u t h value of x, returns the result of applying f to either x or y. The abbreviation i f p t h e n el e l s e e2 stands for the application p el e2. The type constraints for the two applications of f cause the flow information of x and y to be merged. As a result, the application
select true false (Az.z) does not resolve the condition of the if-then-else to true. To observe the approximation directly in the result type, we modify the example slightly: select' = Aselectx.ASelly.Asel2f.if
x then f x x else f y x
Now f is applied to two arguments, the first being either x or y, the second being x in both cases. We modify the example use of select such t h a t f now ignores its first argument and simply returns the second, i . e . x . The expression thus evaluates to true. select' true false (Az.Aw.w) T h e inferred type for this application is
true U false C el true1 U false1 C_ e2 where the label-set of the function type indicates t h a t the result can be either true or false. This approximation can be overcome through the use of subtyping.
3.3
T y p e Inference w i t h F l o w I n f o r m a t i o n and S u b t y p i n g
T h e inclusion relation on label-sets embedded within types can be lifted to a natural subtyping relation on structural types. This idea has been described in the context of control-flow analysis in HM97, for a more general flow analysis in Mos96, and for more general set expressions in FA97. A subtype-based analysis where sets are embedded within t e r m s can be realized in B A N E through the use of the FlowTerm sort. The FlowTerm sort provides inclusion constraints instead of equality for the same language and solution space as the Term sort. To take advantage of the extra precision of subtype inference in our example, we first change the T y p e S o r t structure to use the FlowTerm sort. structure
TypeSort
= Bane.FlowTerm
92
The definition of the function type constructor with labels remains the same, although the domain and range are now of sort FlowTerm.
val funSig
val
=
fun_tycon
TypeSort.newSig {args=TypeSort.ctv_arg TypeSort.genSort, TypeSort.cov_arg TypeSort.genSort, TypeSort.cov_arg LabelSet.genSort, attributes=} = Bane.Cons.new
{name="fun",
signa=funSig}
The inference rules for abstraction and application change slightly. In the ABS rule, it is no longer necessary to introduce a fresh label-set variable, since label sets are no longer merged in the subtype approach. Instead the singleton set can be directly embedded within the function type. In the APP rule, we simply replace the equality constraint with an inclusion.
Ax ~ a b e :T A E- AZx.e: fun(a, T, {l))
A } - e l :Wl A - e 2 :T2 "7"1 _Cfun(T2, a, e)
ABS
A ~- el e2: a
APP
Note that the inclusion constraint in the APP rule allows subsumption not only on the label-set of the function, but also on the domain and the range, since fun(dom, range, labels) C_ fun(r2, a, e) r
7-~ C_ dom A range C a A labels C e
We return to the example of the previous section where flow information was merged: select' true false (Az.Aw.w) Using subtype inference, the type of this expression is T \ T = T "~
true C
T "~
T
61
truel C ~2
The flow information now precisely models the fact that only true is passed as the second argument to Az.Aw.w. 4
Analysis
Frameworks
We conclude by comparing BANE with other program analysis frameworks. There have been many such frameworks in the past; see for example ATGL96,AM95,Ass96,CDG96,DC96,HMCCR93,TH92,Ven89,YH93. Most frameworks are based on standard dataflow analysis, as first proposed by Cocke Coc70 and developed by Kildall Ki173 and Kam and Ullman KU76, while others are based on more general forms of abstract interpretation Ven89,YH93.
93 In previous frameworks the user specifies a lattice and a set of transfer functions, either in a specialized language AM95, in a Yacc-like system TH92, or as a module conforming to a certain interface ATGL96,CDG96,DC96,HMCCR93. The framework traverses a program representation (usually a control flow graph) either forwards or backwards, calling user-defined transfer functions until the analysis reaches a fixed point. A fundamental distinction between BANE and these frameworks is the interface with a client analysis. In BANE, the interface is a system of constraints, which is an explicit data structure that the framework understands and can inspect and transform for best effect. In other frameworks the interface is the transfer and lattice functions, all of which are defined by the client. These functions are opaque--their effect is unknown to the framework--which in general means that the dataflow frameworks have less structure that can be exploited by the implementation. For example, reasoning about termination of the framework is impossible without knowledge of the client. Additionally, using transfer functions implies that information can flow conveniently only in one direction, which gives rise to the restriction in dataflow frameworks that analyses are either forwards or backwards. An analysis that is neither forwards nor backwards (e.g., most forms of type inference) is at best awkward to code in this model. On the other hand, dataflow frameworks provide more support for the task of implementing traditional dataflow analyses than BANE, since they typically manage the control flow graph and its traversal as well as the computation of abstract values. With BANE the user must write any needed traversal of the program structure, although this is usually a simple recursive walk of the abstract syntax tree. Since BANE has no knowledge of the program from which constraints are generated, BANE cannot directly exploit any special properties of program structure that might make constraint solving more efficient. While there is very little experimental evidence on which to base any conclusion, it is our impression that an analysis implemented using the more general frameworks with user-defined transfer functions suffers a significant performance penalty (perhaps an order of magnitude) compared with a special-purpose implementation of the same analysis. Note that the dataflow frameworks target a different class of applications than BANE, and we do not claim that BANE is particularly useful for traditional dataflow problems. However, as discussed in Section 2.2, we do believe for problems with a natural type or constraint formulation that BANE provides users with significant benefits in development time together with good scalability and good to excellent performance compared with hand-written implementations of the same analyses.
5
Conclusions
BANE is a toolkit for constructing type- and constraint-based program analyses. An explicit goal of the project is to make realistic experimentation with program analysis ideas much easier than is now the case. We hope that other researchers
94 find BANE useful in this way. The BANE distribution is available on the World Wide Web from h t t p : / / b a n e . c s . b e r k e l e y , edu.
References AFS98
AM95 And94 Ass96
ATGL96
AW93
AWL94
cc77
CDG96
Coc70 DC961 DM82 EST95 FA96
A. Aiken, M. Ffiandrich, and Z. Su. Detecting Races in Relay Ladder Logic Programs. In Tools and Algorithms for the Construction and Analysis of Systems, 4th International Conference, TA CAS'98, volume 1384 of LNCS, pages 184-200, Lisbon, Portugal, 1998. Springer. M. Alt and F. Martin. Generation of efficient interprocedural analyzers with PAG. Lecture Notes in Computer Science, 983:33-50, 1995. L. Andersen. Program Analysis and Specialization for the C Programming Language. PhD thesis, DIKU, University of Cophenhagen, May 1994. U. Assmann. How to Uniformly Specify Program Analysis and Transformation with Graph Rewrite Systems. In Proceedings of the Sixth International Conference on Compiler Construction (CC '96), pages 121-135. Springer-Verlag, April 1996. A. Adl-Tabatabai, T. Gross, and G. Lueh. Code Reuse in an Optimizing Compiler. In Proceedings of the A CM Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA '96), pages 51-68, October 1996. A. Aiken and E. Wimmers. Type Inclusion Constraints and Type Inference. In Proceedings of the 1993 Conference on Functional Programming Languages and Computer Architecture, pages 31-41, Copenhagen, Denmark, June 1993. A. Aiken, E. Wimmers, and T.K. Lakshman. Soft Typing with Conditional Types. In Twenty-First Annual ACM Symposium on Principlesof Programming Languages, pages 163-173, January 1994. P. Cousot and R. Cousot. Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Contruction or Approximation of Fixed Points. In Fourth Annual ACM Symposium on Principles of Programming Languages, pages 238-252, January 1977. C. Chambers, J. Dean, and D. Grove. Frameworks for Intra- and Interprocedural Dataitow Analysis. Technical Report 96-11-02, Department of Computer Science and Engineering, University of Washington, November 1996. J. Cocke. Global Common Subexpression Elimination. ACM SIGPLAN Notices, 5(7):20-24, July 1970. M. Dwyer and L. Clarke. A Flexible Architecture for Building Data Flow Analyzers. In Proceedings of the 18th International Conference on Software Engineering (ICSE-18), Berlin, Germany, March 1996. L. Damns and R. Milner. Principle Type-Schemes for Functional Programs. In Ninth Annual ACM Symposium on Principles of Programming Languages, pages 207-212, January 1982. J. Eifrig, S. Smith, and V. Trifonov. Sound Polymorphic Type Inference for Objects. In OOPSLA '95, pages 169-184, 1995. M. F~tmdrich and A. Aiken. Making Set-Constraint Based Program Analyses Scale. In First Workshop on Set Constraints at CP'96, Cambridge, MA, August 1996. Available as Technical Report CSD-TR-96-917, University of California at Berkeley.
95 FA9~
FFA9~ FFA98 FFK+96
FFSA98
GJSO92 Hei94 Hen92 HM97 HMCCR93
Ki1731 KU76 Mos96 MTH90 R6m89
Ste96
M. FfiJandrich and A. Aiken. Program Analysis Using Mixed Term and Set Constraints. In Proceedings of the 4th International Static Analysis Symposium, pages 114-126, 1997. J. Foster, M. F~ihndrich, and A. Aiken. Flow-Insensitive Points-to Analysis with Term and Set Constraints. Technical Report UCB//CSD-97-964, University of California, Berkeley, July 1997. M. F~ihndrich, J. Foster, and A. Aiken. Tracking down Exceptions in Standard ML Programs. Technical Report UCB/CSD-98-996, EECS Department, UC Berkeley, February 1998. C. Flanagan, M. Flatt, S. Krishnamurthi, S. Weirich, and M. Felleisen. Catching Bugs in the Web of Program Invariants. In Proceedings of the 1996 A CM SIGPLAN Conference on Programming Language Design and Implementation, pages 23-32, May 1996. M. F~ihndrich, J. Foster, Z. Su, and A. Aiken. Partial Online Cycle Elimination in Inclusion Constraint Graphs. In Proceedings of the ACM SIGPLAN '98 Conference on Programming Language Design and Implementation, 1998. D. Gifford, P. Jouvelot, M. Sheldon, and J. O'Toole. Report on the FX91 Programming Language. Technical Report MIT/LCS/TR-531, Massachusetts Institute of Technology, February 1992. N. Heintze. Set Based Analysis of ML Programs. In Proceedings of the 1994 A CM Conference on LISP and Functional Programming, pages 30617, June 1994. F. Henglein. Global Tagging Optimization by Type Inference. In Proceedings of the 1992 ACM Conference on Lisp and Functional Programming, pages 205-215, July 1992. N. Heintze and D. McAllester. Linear-Time Subtransitive Control Flow Analysis. In Proceedings of the 1997 ACM SIGPLAN Conference on Programming Language Design and Implementation, June 1997. M. Hall, J. Mellor-Crummey, A. Carle, and R. Rodriguez. FIAT: A Framework for Interprocedurai Analysis and Transformation. In U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, editors, Proceedings of the 6th International Workshop on Parallel Languages and Compilers, pages 522-545, Portland, Oregon, August 1993. Springer-Verlag. G. A. Kildall. A Unified Approach to Global Program Optimization. In ACM Symposium on Principles of Programming Languages, pages 194206, Boston, MA, October 1973. ACM, ACM. J. Kam and J. Ullman. Global Data Flow Analysis and Iterative Algorithms. Journal of the ACM, 23(1):158-171, January 1976. Christian Mossin. Flow Analysis of Typed Higher-Order Programs. PhD thesis, DIKU, Department of Computer Science, University of Copenhagen, 1996. Robin Milner, Mads Tofte, and Robert Harper. The Definition of Standard ML. MIT Press, 1990. D. R6my. Typechecking records and variants in a natural extension of ML. In Conference Record of the Sixteenth Annual ACM Symposium on Principles of Programming Languages, Austin, Texas, pages 60-76, January 1989. B. Steensgaard. Points-to Analysis in Almost Linear Time. In Proceedings of the 23rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 32-41, January 1996.
96 TH92
TT94
VenS9
YH93
S. Tjiang and J. Hennessy. Shaxlit - A tool for building optimizers. In Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, pages 82-93, July 1992. M. Tofte and J.-P. Talpin. Implementation of the Typed Call-by-Value A-Calculus using a Stack of Regions. In Twenty-First Annual ACM Symposium on Principles of Programming Languages, pages 188-201, 1994. G. A. Venkatesh. A framework for construction and evaluation of highlevel specifications for program analysis techniques. In Proceedings of the ACM SIGPLAN '89 Conference on Programming Language Design and Implementation, pages 1-12, 1989. K. Yi and W. Harrison, III. Automatic Generation and Management of Interprocedural Program Analyses. In Proceedings of the Twnetieth Annual A CM Symposium on Principles of Programming Languages, pages 246-259, January 1993.
Optimizing Using
a Hierarchy
ML
of Monadic
Types
Andrew Tolmach* Pacific Software Research Center Portland State University & Oregon Graduate Institute Dept. of Computer Science, P.S.U., P.O. Box 751, Portland, O1% 97207, USA
apt@cs, pdx. edu
A b s t r a c t . We describe a type system and typed semantics that use a hierarchy of monads to describe and delimit a variety of effects, including non-termination, exceptions, and state, in a call-by-value functional language. The type system and semantics can be used to organize and justify a variety of optimizing transformations in the presence of effects. In addition, we describe a simple monad inferencing algorithm that computes the minimum effect for each subexpression of a program, and provides more accurate effects information than local syntactic methods.
1
Introduction
Optimizers are often implemented as engines that repeatedly apply improving transformations to programs. Among the most important transformations are propagation of values from their defining site to their use site, and hoisting of invariant computations out of loops. If we use a pure (side-effect-free) language based on the lambda calculus as our compiler intermediate language, these transformations can be neatly described by the simple equations for beta-reduction
(Beta)
let x
:
el in e~ = e2e,/~
and for the exchange and hoisting of bindings
(Exchange)
(RecHoist)
l e t Xl = el i n ( l e t x2 = e2 i n e3) = l e t x2 = e2 i n ( l e t Xl = el i n e3) (xi r FV(e2); x2 r FV(el)) l e t r e c f x = ( l e t y = el i n e2) i n e3 = l e t y = el i n ( l e t r e c f x = e2 i n e3)
(x, I r FY(ei); y r FY(e3)) where FV(e) is the set of free variables of e. The side conditions nicely express the data dependence conditions under which the equations are valid. Either * Supported, in part, by the US Air Force Materiel Command under contract F1962893-C-0069 and by the National Science Foundation under grant CCR-9503383.
98 orientation of the equation generates a valid transformation.1 Effective compilers for pure, lazy functional languages (e.g., 14) have been conceived and built on the basis of such transformations, with considerable advantages for modularity and correctness. It would be nice to apply similar methods to the optimization of languages like ML, which have side effects such as I/O, mutable state, and exceptions. Unfortunately, these "rearranging" transformations are not generally valid for such languages. For example, if we apply (Beta) (oriented left-to-right) in a situation where evaluating el performs output and x is mentioned twice in e2, evaluating the resulting expression might produce the output twice. In fact, once an eager evaluation order is fixed, even non-termination becomes a "side effect." For example, (RecHoist) is not valid unless el is known to be terminating (and free of other effects too, of course). A similar challenge long faced lazy functional languages at the source level: how could one obtain the power of side-effecting operations without invalidating simple "equational reasoning" based on (Beta) and similar rules? The effective solution discovered in that context is to use monads 9, 13. An obvious idea, therefore, is to use monads in an internal representation (IR) for compilers of call-by-value languages. Some initial steps in this direction were recently taken by Peyton Jones, Launchbury, Shields, and Tolmach 11. The aim of that work was to design an IR suitable for both eager and lazy source languages. In this paper we pursue the use of monads with particular reference to eager languages (only), and address the question of how to discover and record several different sorts of effects in a single, unified monadic type system. We introduce a hierarchy of monads, ordered by increasing "strength of effect," and an inference algorithm for annotating source program subexpressions with their minimal effect. Past approaches to coping with effects have fallen into two main camps. One approach (used, e.g., by SML of New Jersey 1 and the TIL compiler 17) is to fall back on a weaker form of (Beta), called (Betav), which is valid in eager settings. (Betav) restricts the bound expression e to variables, constants, and A-abstractions; since "evaluating" these expressions never actually causes any computation, they can be moved and substituted with impunity. To augment this rule, these compilers use local syntactic analysis to discover expressions that are demonstrably pure and terminating. Local syntactic analysis must assume that calls to unknown functions may be impure and non-terminating. Still, this form of analysis can be quite effective, particularly if the compiler inlines functions enthusiastically. The other approach (used, e.g., by the ML Kit compiler 4) uses a sophisticated effect inference system 15 to track the latent effects of functions on a very detailed basis. The goals of this school are typically more far-reaching; the aim is to use effects information to provide more generous 1 Of course, the fact that a transformation is valid doesn't mean that applying it will necessarily improve the program. For example, (Beta) (oriented left-to-right) is not an improving transformation if el is expensive to compute and x appears many times in e2; similarly, (RecHoist) (oriented left-to-right) is not improving if f is not applied in e~.
99 polymorphic generalization rules (e.g., as in 21,16), or to perform significantly more sophisticated optimizations, such as automatic parallelization 6 or stackallocation of heap-like data 18. In support of these goals, effect inference has generally been used to track store effects at a fine-grained level. Our approach is essentially a simple monomorphic variant of effect inference applied to a wider variety of effects (including non-termination, exceptions, and IO), cast in monadic form, and intended to support transformational codemotion optimizations. We infer information about latent effects, but we do not attempt to calculate effects at a very fine level of granularity. In return, our inference system is particularly simple to state and implement. However, there is nothing fundamentally new about our system as compared with that of Talpin and Jouvelot 15, except our decision to use a monadic syntax and validate it using a typed monadic semantics. A practical advantage of the monadic syntax is that it makes it easy to reflect the results of the effect inference in the program itself, where they can be easily consulted (and kept up to date) by subsequent optimizations, rather than in an auxiliary data structure. An advantage of the monadic semantics is that it provides a natural foundation for probing and proving the correctness of transformations in the presence of a variety of effects. In related work, Wadler 20 has recently and independently shown that Talpin and Jouvelot's effect inference system can be applied in a monadic framework; he uses an untyped semantics, and considers only store effects. In another independent project, Benton and Kennedy are prototyping an ML compiler with an IR that describes effects using a monadic encoding similar to ours 3.
2
Source
Language
This section briefly describes an ML-like source language we use to explain our approach. The call-by-value source language is presented in Fig. 1. It is a simple, monomorphic variant of ML, expressed in A-normal form 5, which names the result of each computation and makes evaluation order completely explicit. The class c o n s t includes primitive functions as well as constants. The L e t construct is monomorphic; that is, Let (x, e l , e2) has the same semantics and typing properties as would App (Abs (x, e2), el) (were this legal A-normal form). The restriction to a monomorphic language is not essential (see Sect. 5). All functions are unary; primitives like P l u s take a two-element tuple as argument. For simplicity of presentation, we restrict L e t r e c to single functions. The types of constants are given in Fig. 2. Exceptions carry values of type Exn, which are nullary exception constructors. R a i s e takes an exception constructor; rather than providing a means for declaring such constructors, we assume an arbitrary pool of constructor constants. Handle catches all exceptions that are raised while evaluating its first argument and passes the associated exception value to its second argument, which must be a handler function expecting an Exn. The body of the handler function may or may not choose to reraise the exception depending on its value, which may be tested using EqExn.
100 datatype typ = Int
type varty = vat * typ
I Bool
datatype value = Vat of vat
Exn
Tup of typ l i s t -> of typ * typ
Const
datatype c o n s t = Integer of int True I F a l s e DivByZero I ... Plus I Minus Times Divide
of const
datatype exp = Val of value Abs of varty * exp App of value * value If of value * exp * exp Let of varty * exp * exp Letrec of varty * varty * exp * exp Tuple of value list Project of int * value R a i s e of value Handle of exp * value
EqInt I L t I n t EqBool I EqExn WriteInt
Fig. 1. Abstract syntax for source language (presented as ML datatype) Integer _ True,False DivByZero Plus,Minus,Times,Divide
:
Int
: Bool : Exn : TupInt, Int E q I n t , L t I n t : TupInt, Int EqBool : TupBool~Bool EqExn : TupExn, Exn WriteInt : Int -> Tup
-> Int -> Bool -> Bool -> Boo1
Fig. 2. Typings for constants in initial environment
The primitive function D i v i d e has the potential to raise a particular exception DivByZero. We supply W r i t e I n t as a paradigmatic state-altering primitive; internal side-effects such as ML reference manipulations would be handled similarly. All other primitives are pure and guaranteed to terminate. T h e semantics of the remainder of the language are completely ordinary.
3
Intermediate Representation with Monadic Types
Figure 3 shows the abstract syntax of our monadic intermediate representation (IR). (For an example of the code, look ahead to Fig. 11.) For the most part, terms are the same as in the source language, but with the addition of m o n a d annotations on L e t and Handle constructs and a new Up construct; these are described in detail below.
101
datatype monad = ID
LIFT
EXN J ST
datatype mtyp = M of monad * vtyp and vtyp = Int Bool
Exn Tup of vtyp l i s t -> of vtyp * mtyp type varty = vat * vtyp datatype value = Vat of vat Const of const datatype exp = Val of value Abs of varty * exp App of value * value If of value * exp * exp Let of monad * monad * varty * exp * exp Letrec of varty * varty * exp * exp Tuple of value list Project of int * value Raise of mtyp * value Handle of monad * exp * value Up of monad * monad * exp
Fig. 3. Abstract syntax for monadic typed intermediate representation
Integer _ True,False DivByZero Plus,Minus,Times Divide EqInt,LtInt SqBool
: Int : Bool : Exn : TupInt, Int -> M(ID,Int) : Tup lint, Int -> M (EXN, Int ) : TupInt, Int -> M(ID,Bool) : Tup Bool, Boo1 -> M(ID,Bool) EqExn : TupExn, Exn -> M(ID,Bool) WriteInt : Int -> M(ST,Tup)
F i g . 4. Monadic typings for constants in initial environment
102 Values have ordinary value types (vtyps); expressions have monadic types (mtyps), which incorporate a v t y p and a monad (possibly the identity monad, ID). Since this is a call-by-value language, the domain of each arrow types is a v t y p , but the codomain is an arbitrary mtyp. The monadic types for the constants are specified in Fig. 4. The typing rules are given in Fig. 5. In this figure, and throughout our discussion, t ranges value types, m over monads, v over values, c over constants, x , y , z , f over variables, and e over expressions. For this presentation, we use four monads arranged in a simple linear order. In order of "increasing effect," these are: ID, the identity monad, which describes pure, terminating computations. - LIFT, the lifting monad, which describes pure but potentially non-terminating computations. EXN, the monad of exceptions and lifting, which describes computations that may raise an (uncaught) exception, and are potentially non-terminating. - ST, the monad of state, exceptions, and lifting, which describes computations that may write to the "outside world," may raise an exception, and are potentially non-terminating. -
-
We write m l < m2 iff m t precedes rn2 on this list. Intuitively, m l < m2 implies that computations in m2 are "more effectfur' than those in m l ; they can provoke any of the effects in ml and then some. This particular hierarchy captures a number of distinctions that are useful for transforming ML programs. We discuss the extension of our approach to more elaborately stratified monadic structures in Sect. 6. More formally, suppose for each monad m we are given the standard operations unitm, which turns values into null computations in m, and bind, a, which composes computations in m, and that the usual monad laws hold: (Le~)
bind~ (unit~x) k = k x
(Right)
bind~ e unit~ = e
(Assoc)
bindm e (Ax.bindm (k x) h) = bindm(bindm e k) h
Moreover, suppose that for each value type t and monad m, A4~m(Tt) gives the domain of values of type M( m , t). Then ml < m2 implies that there exists an unique embedding Upm~_,m 2 which, for every value type t, maps A4~ml(Tt) to A4m2(Tt). The up functions, sometimes called monad morphisms or lifting functions 10, obey these laws: (Unit)
upm~__+~~ o unitm~ = unitm2
(Bind)
uPr~l__,m2(bind,~1 e k) = bindm2(UPml.~m2 e) (UPm~__,m2 o k )
The up functions can also be viewed as generalizations of unit operations, since, by (Unit), UPiD__,m = unit~. Fig. 6 gives semantic interpretations for types as
103
E(v) = t
E~% Var v : t Typeof(c) = t E ~-v Const c : t
El-,v:t E ~- Val v : M(ID,%)
E+
{x : t l } I- e : M(m2,t2)
Et-Abs(X:tl,e)
: M ( I D , t l -> M ( m 2 , t 2 ) )
E ~v Vl : tl -> M(m2,t2)
E ~-~ v2 : tl E ~ App(vl ,v2) : M(m2,t2)
E~v:Bo01
E~-el :M(m,t) EF-e2:M(m,t) E ~- I f ( v , e l , e 2 ) : M(rn,t)
E I - e l :M(ml,tl)
E+{x:tl}l-e2:M(m2,t2)
E }- L e t ( m l , m 2 , x
: tl,el,e2)
: M(m2,t2)
E+{f:to E+
-> M(ml,tl),X:to}t-el : M ( m l , t l ) {.f : to -> M ( m l , t l ) } I- e2 : M(~rft2,t2)
E F- L e t r e c ( f
(ml <_m2)
(LIFT < m l )
: to -> M(ml ,$1) , x : t o , e l ,e2) : M(m2 ,t2)
EF-.Vl:tl ... E~-~v~:t~ E F- T u p l e ( v l , . . . , v n ) : M ( I D , T u p t l , . . .
Et-,v:Tuptl,...,t, E ~ Project(i,v)
,in)
(l
E F-. v : Exn E ~- Raise(M(EXN,t) ,v) : M(EXN,t) EF-e:M(m,t)
E~-vv:Exn
-> M ( m , t )
(EXN_<m)
E I- Handle(m,e,v) : M ( m , t ) E F- e : M ( m l , t )
(ml ~
m2)
E ~- U p ( m l ,m2 ,e) : M(m2 ,t)
F i g . 5. T y p i n g rules for i n t e r m e d i a t e language
104 complete partial orders (CT~O's), and for our monads, together with the associated up and bind functions. Note that the following laws hold under these semantics: (Id) (Compose)
UPm_,m = id UPmo_~m 2 : Up~l__~m 2 o Upmo__~ml
(m 0 _~ m l _~ m2)
A typed semantics for terms is given in Figs. 7 and 8. Environments p map identifiers to values. This semantics is largely predictable. However, the Let construct now serves to make the composition of monadic computations explicit, and the tip construct makes monadic coercions explicit. Intuitively, Let(ml,m2, (x,tl) ,el,e2) evaluates el, which has monadic type M(ml , t ) , performing any associated effects, binds the resulting value to x : tl, and then evaluates e2, which has monadic type M(m2,t2). Thus, it essentially plays the role of the usual monadic bind operation; in particular, if ml = m2, the semantic interpretation of the above expression in environment p is just bindml ($elp)(Ay.Ce2px
:= y)
However, our typing rules (Fig. 5) require only that m2 > ml; i.e., e2 may be in a more effectful monad than el The semantics of a general "mixed-monad" L e t is bindm2 (UPrn,_~m 2 (Celip) )( Ay.Ce2px := y) The term Let (tip (ml ,m2 , e l ) ,m2, ( x , t ) ,el ,e2) has the same semantics, so the more general form of Let is strictly redundant. But this form is useful, because it makes it easier to state (and recognize left-hand sides for) many interesting transformations involving L e t whose validity depends on the monad ml rather than on m2. For example, a "non-monadic" Let, for which (Beta) is always valid, is simply one in which ml = ID. Further examples will be shown in Sect. 4. The semantics of the "non-proper morphism" Handle (e, v) deserve special attention. Expression e may be in either EXN or ST, and the meaning of Handle depends on which; the ST version must manipulate the state component. Note that there are two plausible ways to combine state with exceptions. In the semantics we have given (as in ML), handling an exception does not alter the state, but it would be equally reasonable to revert the state on handle. Incidentally, we don't have to give a semantics when e is in ID or LIFT, because the typing rule for Handle disallows these cases. Of course, such cases might appear in source code; to generate monadic IR for them, e can be coerced into EXN with an explicit tip, or the Handle can be omitted altogether in favor of e, which by its type cannot raise an exception! A R a i s e expression is handled similarly; the typing rules force it into monad EXN, so semantics need only be given for that case, but the whole expression may be coerced into ST by an explicit Up if necessary.
105
T : vtyp TInt TBool TExn TTuptl,... , t ~ TTup
M
:monad --~ C P O
-~ C ~ O -
Z
= Z = Z : Tit1 ---- 1
(0 r e p r e s e n t s false) x...
x
Tt~
(n > o)
--~ C ~ O
MIIDc
= c
~LIFTC
:
J~4EXSC J~STC
C•
= (Ok(c) + Fail(Z))• : S t a t e --+ ( ( O k ( c ) + F a i l ( Z ) )
• State)•
bindiD x k = k x bindLiFT x k = k a
_L bindExN x k = k a
Fail(b)• J_ bindsT x k s = k a s'
(fail(b), s')•
if x = a • ifx =_L if x = O k ( a ) • if x = F a i l ( b ) • if x = _l_ if x s -- ( O k ( a ) , s ' ) • if x s --- ( F a i l ( b ) , s ' ) •
.J_ ~m---+m
X
if x s =
_J_
~- X
UBID_~LIF T X =
X•
uPID-~EX~ x = Ok(x)• UPID_,S T X S = ( O k ( x ) , s ) • UPLIFT~XN X = O k ( a ) • _l_ UPLIF~_~S T x s = ( O k ( a ) , s ) • -J-
uPzxs_.s T x s ---- ( O k ( a ) , s ) • (Fail(b), S)z _l_
F i g . 6. S e m a n t i c s of t y p e s a n d m o n a d s
if x = a • i f x = _l_
if x = a • if x=
_l_
if x ---- O k ( a ) • if x ---- F a i l ( b ) • if x = _l_
106
v : (v~Zue : t) -* E ~ .
VIw
-* 7"ti
vlp = p(v)
( I n t e g e r i)p
)Const
YConst )Const
= i
Truep = 1 Falsep : 0
VConst P l u s p ...)Coast Dividep . . . 2Coast WriteIntp )Const
DivByZerop .
.
= = = =
plus divideby writeint divbyO
.
plus (al, as)
= az + as
divideby (al, as) = O k ( a i / a s ) • Fail( divbyO) • ~teint
State = Z a s = ( O k ( ) , a p p e n d ( s , a))• divbyO = 42
if a2 # 0 if as = 0 (sequence w r i t t e n out so far) (arbitrary fixed integer)
F i g . 7". Semantics of values
s
( e x p : MCm,t) ) ~ E n v --r A 4 l m l ( T I t ) EIv,,z v l p = V l v l p C|Abs (x, e)p = ~ y . e ~ l o ~ := u
s
= (v,,,Io) (vI,,~lp) = ,/(v,,lo) ( e ~ , l o ) (EI~,Ip) = ge2(p.f := fix(~f.)w.gell(pI : = f ' , x : = v))) = ( V l " ~ l p , . . . , VI,,.lo) = proj,(VMp) ERaise (M(EXN,t), v) p = (Fail()vp))• gHandle ( m , e, v) p = handlem(Eep)()vp) := y) CLet ( r n l , m 2 , x , e l , e2) p = bindm2 (up.~ 1_,.~2 (Eelp))(~y.Ee2px EUp ( m l ,rtt2 ,e)p = , , p , . l ~ , . ~ (Ei~lp) g I f (v,el ,es)p Letrec (f, X, e l , e2) p g Tuple (Vl . . . . . vn ) p E P r o j e c t (i, v) p
if v at af = at af p r o j , ( v l , . . . , v=) ---- vl handle~xN x h = O k ( a ) • ha _L handlesT x h s = ( O k ( a ) , s ' ) • has' l F i g . 8. Semantics of expressions
ifv#O ifv=O if x = O k ( a ) • if x = F a i l ( a ) • if x = & if x s = ( O k ( a ) , s ' ) • if x s = ( F a i l ( a ) , s ' ) • ifx s=_L
107 (LetLe~) (LetRight)
Let (m2 ,ms , x , U p ( m l ,m2 ,el) ,e2) ---- Let (ml ,ms , x , e l ,e2)
(ml < . ~ < ms) Let(ml,m2,x,e,Up(ID,m2,Val(Yax
(LetAssoc)
x))) =Up(ml,m2,e)
(-~1 < m . ) L e t ( m 2 , m s , x , L e t ( m l , m 2 , y , e l , e 2 ) , e3) = Let(ml,ma,y,el,Let(m2,ma,x,e2,e3))
(ml _< m2 < ms; y • FV(es)) (IdentUp) (ComposeUp)
Up(re,re,e)
= e
U p ( m l , m 3 ,e) ----Up(m2 ,ms, (Up(m1 ,m2 ,e)))
(ml _< m~ _< ms) Up(m2 , m 4 ,Let (rnl ,m2 ,x ,el ,e2) ) =
(LetUp)
L e t (m3 , m 4 , x , Up (rf~l , m 3 , e l ) , Up (m2 , m 4 , e 2 ) )
(ml < m2,-~s < m~) Fig. 9. Generalized monad laws
4
Transformation
Rules
In this section we attempt to motivate our IR, and in particular our choice of monads, by presenting a number of useful transformation laws. These laws can can be proved correct with respect to the denotational semantics of Sect. 3. The proofs are straightforward but tedious, so are omitted here. Of course, this is by no means a complete set of rules needed by an optimizer; there are many others, both general-purpose and specific to particular operators. Also, as noted earlier, not all valid transformations are improvements. Figure 9 gives general rules for manipulating monadic expressions. (LetLeft), (LetRight), and (LetAssoc) are generalizations of the usual (Left), (Pdght), and (Assoc) laws for a single monad, which can be recovered from these rules by setting ml -- ID and m2 = m3 in (LetLeft), setting m l = m2 in (LetRight), and setting m l -- m2 = m3 in (LetAssoc). (IdentUp) and (ComposeUp) are just the (Ident) and (Compose) laws stated in IR syntax; they let us do housekeeping on coercions. Law (Unit) is the special case of (ComposeUp) obtained by setting m l -- ID. (LetUp) permits us to move expressions with suitably weak effects in and out of coercions; (Bind) is the special case of (LetUp) obtained by setting rnl = m2 and m3 = m4, All these laws have variants involving L e t r e c , in which L e t r e c ( f , x , el, e2 ) : M( m , t) behaves just like Let ( ID, m , f , Abs ( x , el ) , e2 ) ; we omit the details of these. Figure 10 lists some valid laws for altering execution order. We have full beta reduction for variables bound in the ID monad (BetaID). In general, the order of two bindings can be exchanged if there is no data dependence between them, and if either of them is in the ID monad (ExchangeID) or both are in or below the LIFT monad (ExchangeLIFT). The intuition for the latter rule is that
108 it harmless to reorder two expressions even if one or both may not terminate, because we cannot detect which one causes the non-termination. On the other hand, there is no similar rule for the EXN monad, because we can distinguish different raised exceptions according to the constructor value they carry. This is the principal difference between LIFT and EXN for the purposes of code motion. Rule (RecHoistID) states that it always valid to lift a pure expression out of a L e t r e c (if no data dependence is violated). (RecHoistEXN) reflects a much stronger property: it is valid to lift a non-terminating or exception-raising expression of a L e t r e c if the recursive function is guaranteed to be executed at least once. This is the principal advantage of distinguishing EXN from the more general ST monad, for which the transform is not valid. Although the left-hand side of (RecHoistEXN) may seem a crude way to characterize functions guaranteed to be called at least once, and unlikely to appear in practice, it arises naturally if we systematically introduce loop headers for recursions 2, according to the following law:
Letrec(f,x,el,e2) :M(m,t) = (Hdr)
Let (ID, m , f , Abs (z, L e t r e c ( / ' , x , el f ' / f ,
hpp ( f ' , z) ) ) , e2)
(f' r FY(el); 1' r z) (HandleHoistExn) says that an expression that cannot raise an exception can always be hoisted out of a Handle. Finally, (IfI-IoistID), (ThenHoistID), and (AbsHoistID) show the flexibility with which ID expressions can be manipulated; these are more likely to be useful when oriented right-to-left ("hoisting down" into conditionally executed code). As before, all these rules have variants involving L e t r e c in place of Let (ID . . . . ), which we omit here. As a (rather artificial) example of the power of these transformations, consider the code in Fig. 11. The computation of w is invariant, so we would like to hoist it above recursive function r. Because the binding for w is marked as pure and terminating, it can be lifted out of the i f using (IfHoistID), and can then be exchanged with the pure bindings for s and t using (ExchangeID). This positions it to be lifted out of r using (RecHoistID). Note that the monad annotations tell us that w is pure and terminating even though it invokes the unknown function g, which is actually bound to h. The example also exposes the limitations of monomorphic effects: if f were also applied to an impure function, then g and hence w would be marked as impure, and the binding for w would not be hoistable. In practice, it might be desirable to clone separate copies of f, specialized according to the effectfulness of their g argument. Worse yet, consider a function that is naturally parametric in its effect, such as map. Such a function will always be pessimistically annotated with an effect reflecting the most-effectful function passed to it within the program. The obvious solution is to give functions like map a generic type abstracted over a monad variable, analogous to an effect variable in the system of Talpin and Jouvelot 15. We believe our system can be extended to handle such generic types, but we have not examined the semantic issues involved in detail.
109
Let ( I D , m , x , e l ,e2)
(BetaID)
=
e2el/x
Let ( m l , m 3 , x l ,el , L e t ( m 2 , m 3 , x 2 , e 2 ,e3)) = Let (m2 ,m3 ,x2 ,e2 , L e t ( m l ,m3 , x l ,el ,e3) ) (ml = IDor m2 = ID; Xl r BY(e2); x2 r B Y ( e l ) )
(ExchangelD)
Let ( m l , m 3 , x l ,el ,Let (m2 ,m3 ,x2 ,e2 ,e3)) = L e t ( m 2 , m a , x 2 , e 2 , L e t ( m l , m 3 , x l ,el ,e3)) ( m l , m2 _< LIFT; Xl 9~ BY(e2); x2 {L B Y ( e l ) )
(ExchangeLIFT)
(RecHoistlD)
L e t r e c ( f , x ,Let (ID ,m2 , y , e l ,e2) ,e3) :M(m3 ,t) = Let (ID, m3, y, e 1, L e t r e c (st , x, e2, ea) ) ( l , x ~ B Y ( e l ) ; Y r FY(e3))
(RecHoistEXN)
L e t r e c ( / , x , L e t ( m l , m 2 , y, e l , e2),App ( / , v) ) = Let ( m l , m 2 , y , e l , L e t r e c ( f , x , e2, hpp ( , v) ) ) (ml < EXN;f, x r F Y ( e l ) ; y # v)
(HandleHoistEXN)
H a n d l e ( m 2 , L e t ( m l ,m2 , x , e l ,e2) ,v) Let ( m l ,m2 , x , e l ,Handle (m2 ,e2 , v ) )
(IfHoistlD)
If(v,Let(ID,m,x,el,e2),e3) Let(ID,m,x,el,If(v,e2,e3)) (x C FV(e3);x # v)
(ThenHoistID)
If(v,el,Let(ID,m,x,e2,e3)) = Let(ID,m,x,e2,If(V,el,e3)) (x r B Y ( e l ) ; X # V)
=
=
(AbsHoistID)
hbs(x : t,Let(ID,m,y,el,e2) ) = L e t ( I D , I D , y , e l , A b s ( x : t,e2))
(x • FV(el); y # x) F i g . 10. Code motion laws for monadic expressions
110 let f:(Int -> M(ID,Int * Int)) -> M(ST,Int) fn (g:Int->M(ID,Int * Int)) => letrer r (x:Int) : M(ST,Int) = letID t:Int * Int = (x,1) in letID s:Bool = EqInt(t)
=
in if s then
Up(ID,ST,O) else letID w:Int * Int = g(3) in letID y:Int = Plus(w) in letID z:Int * int = (x,y) in letEXN x':Int = Divide(z) in letST dummy:() = WriteInt(x')
in r ( x ' ) i n r(lO) in let h:Int->M(ID,Int
* Int) = fn (p:Int)
=>
(p,p)
i n f (h)
Fig. 11. Example of intermediate code, presented in an obvious concrete analogue of the abstract syntax
5
Monad
Inference
It would be possible to translate source programs into type-correct IR programs by simply assuming that e v e r y expression falls into the maximally-effectful monad (ST in our case). Every source Let would become a LetST, every variable and constant would be coerced into ST, and every primitive would return a value in ST. Peyton Jones et al. 11 suggest performing such a translation, and then using the monad laws (analogous to those in Fig. 9) and the worker-wrapper transform 12 to simplify the result, hopefully resulting in some less-effectful expression bindings. The main objection to this approach is that it doesn't allow calls to unknown functions (for which worker-wrapper doesn't apply) to return non-ST results. For example, in the code of Fig. 11, no local syntactic analysis could discover that argument function g is pure and terminating. To obtain better control over effects, we have developed an inference algorithm for computing the minimal monadic effect of each subexpression in a program. Pure, provably terminating expressions ave placed in ID, pure but potentially non-terminating expressions in LIFT, and so forth. The algorithm deals with the latent monadic effects in functions, by recording them in the result types. As an example, it produces the annotations shown in Fig. 11. The input to the algorithm is an typed program in the source language; the output is a program in the monadically typed IR. The term translation is essentially trivial, since the source and target have identical term structure, except for the possible need for Up terms in the target. Consider, for example, the source term I f (x,Val y , R a i s e z). Since Val y is a value, its translation is in the TD monad, whereas the translation of Raise z must be in the EXN or ST
111
EF-el =~ e~:M(ml,t) E~-e2 =~ e~:M(ml,t) (ml E ~ - I f ( V , e l , e 2 ) : t =~ Up(ml,rn2,If(v,e~,e~)) :M(rn2,t)
EF-.v:Bool
<m2)
E t- el =~ e~ : M ( m l , t l ) E -~ {x : t l } J- e2 ~ e !2:M(m2,t2) (ml
EbRaise(t,v):t
E ~ - , v : E x n (EXN<m) =~ Up(EXN,m,Raise(M(EXN,t),v)) :M(m,t)
Fig. 12. Selected translation rules
monad. To glue together these subterm translations we must insert a coercion around the translation of the Val term. Up terms serve exactly this purpose; they add the necessary flexibility to the system to permit all monad constraints to be met. Such a coercion is potentially needed around each subterm in the program. To develop a deterministic, syntax-directed, translation, we turn each typing rule in Fig. 5 (exceptUp) into a translation rule, simply by recording the inferred type and monad information in the appropriate annotation slots of the output, combining the translations of subterms in the obvious manner, and wrapping an Up term around the result. As examples, Fig. 12 shows the translation rules corresponding to the typing rules for I f , Let, and Raise. Each free type and monad in the translated typed term is initially set to a fresh variable; the translation algorithm generates a set of constraints relating these variables just as in an ordinary type inference algorithm. We discuss the solution of these constraints below. As specified here, the translation is profligate in its introduction of Up coercion terms, most of which will prove (after constraint resolution) to be unnecessary identity coercions. We use a postprocessing step to remove unneeded coercions using the (IdentUp) rule. The translation algorithm generates constraints between types and between monads. Type constraints can be solved using ordinary unification, except that unifying the codomain mtyps of two arrow types requires that their monad components be equated as well as their v t y p components. The interesting question is how to record and resolve constraints on the monad variables. Such constraints are introduced explicitly by the side conditions in the Let, L e t r e c , and Up rules, implicitly by the equating of monads from subexpressions in the I f and Handle rules, and (even more) implicitly as a result of ordinary unification of arrow types, which mention monads in their codomains. The side-condition constraints are all inequalities of the form ml > m2, where ml is a monad variable and m2 is a variable or an explicit monad. The implicit constraints are all equalities ml -- m2; for uniformity, we replace these by a pair of inequalities: ml _> m2 and m2 > m l . We collect constraints as a side-effect of the translation process, simply by adding them to a global list. It is very common for there to be circularities among the monad constraints. To solve the constraint system, we view it as a directed graph with a node for each
112 monad and monad variable, and an edge from ml to m2 for each constraint m l > m2. We then partition the graph into its strongly connected components, and sort the components into reverse topological order. We process one component at a time, in this order. Since > is anti-symmetric, all the nodes in a given component must be assigned the same monad; once this has been determined, it is assigned to all the variables in the component before proceeding to the next component. To determine the minimum possible correct assignment for a component, we consult all the edges from nodes in that component to nodes outside the component; because of the order of processing, these nodes must already have received a monad assignment. The maximum of these assignments is the minimum correct assignment for this component. If there are no such edges, the minimum correct assignment is ID. This algorithm is linear in the number of constraints, and hence in the size of the source program. To summarize, we perform monad inference by first translating the source program into a form padded with coercion operators and annotated with monad variables, meanwhile collecting constraints on these variables, and then solving the resulting constraint system to fill in the variables in the translated program. The resulting program will contain many null coercions of the form Up ( m , m , e); these can be removed by a single postprocessing pass. Our algorithm is very similar to a that of Talpin and Jouvelot 15, restricted to a monomorphic source language. Both algorithms generate essentially the same sets of constraints. Talpin and Jouvelot solve the effect constraints using an extended form of unification rather than by a separate mechanism. It would be natural to extend our algorithm to handle Hindley-Milner polymorphism for both types and monads in the Talpin-Jouvelot style. The idea is to generalize all free type and effect variables in l e t definitions and allow different uses of the bound identifier to instantiate these in different ways. In particular, parametric functions like map could be used with many different monads, without one use "polluting" the others. Functions not wholly parametric in their effects would place a minimum effect bound on permissible instantiations for monad variables. Supporting this form of monad polymorphism seems desirable even if there is no type polymorphism (e.g., because the program has already been explicitly monomorphized 19). In whole-program compilation of a monad-polymorphic program, the complete set of effect instantiations for each polymorphic definition would be known. This set could be used to put an upper effect bound on monad variables within the definition body and hence determine what transformations are legal there. Alternatively, it could be used to guide the generation of effect-specific clones as suggested in the previous section. In a separate-compilation setting, monad polymorphism in a library definition would still be useful for client code, but not for the library code: in the absence of complete information about uses of a definition, any variable monad in the body of the definition would need to be treated as ST, the most "effectful" monad, for the purposes of performing transformations within the body.
113 6
Extending
the
Monad
Hierarchy
Our basic approach is not restricted to the linearly-ordered set of monads presented in Sect. 3. It extends naturally to any collection of monads and up embedding operations that form a lattice, with ID as the lattice b o t t o m element. It is clearly reasonable to require a partial order; this is equivalent to requiring that (Ident) and (Compose) hold. From the partial order requirement, the distinguished role for ID, and the assumption that each monad obeys (Left), (Right), and (Assoc), and each up operation obeys (Unit) and (Bind), we can prove the laws of Fig. 9. (The validity of the laws in Fig. 10 naturally depends on the specific semantics of the monads involved.) By also insisting that any two monads in the collection have a least upper bound under embedding, we guarantee that any two arbitrary expressions (e.g., the two arms of an i f ) can be coerced into a (unique) common monad, and hence that the monad inference mechanism of Sect. 5 will work. One might be tempted to describe such a lattice by specifying a set of "primitive" monads encapsulating individual effects, and then assuming the existence of arbitrary "union" monads representing combinations of effects. As the Handle discussion in Sect. 3 indicates, however, there is often more than one way to combine two effects, so it makes no sense to talk in a general way about the "union" of two monads. Instead, it appears necessary to specify explicitly, for every monad m in the lattice, -
a semantic interpretation for m; a definition for bindm; a definition of Upm_+m, for each m < m~; 2 for each non-proper morphism NP introduced in m, a definition of npm, for every m' _> m.
The lack of a generic mechanism for combining monads since it turns the proofs of many transformation laws into We conjecture that restricting attention to up operations monad transformers 10 might help organize such proofs 7
Status
is rather unfortunate, lengthy case analyses. that represent natural into simpler form.
and Conclusions
We believe our approach to inferring and recording effects shows promise in its simplicity and its semantic clarity. It remains to be seen whether effects information of the kind described here can be used to improve the performance of ML code in any significant way. To answer this question, we have extended the IR described here to a version that supports full Standard ML; we have implemented the monad inference algorithm for this version, and are currently measuring its effectiveness using the backend of our RML compiler system 19. 2 Since the (Ident) and (Compose) laws must hold in a partial order, it suffices to define upm_.m, for just enough choices of m, m ~ to guarantee the existence of least upper bounds, since these definitions will imply the definition for other pairs of monads.
114
Acknowledgements We have benefitted from conversations with John Launchbury and Dick Kieburtz, and from exposure to the ideas in their unpublished papers 7, 8. The comments of the anonymous referees also motivated us to clarify the relationship of our algorithm with the existing work of Talpin and Jouvelot. Phil Wadler m a d e helpful c o m m m e n t s on an earlier draft.
References 1. A. Appel. Compiling with Continuations. Cambridge University Press, 1992. 2. A. Appel. Loop headers in A-calculus or CPS. Lisp and Symbolic Computation, 7(4):337-343, 1994. 3. N. Benton, July 1997. Personal communication. 4. L. Birkedal, M. Tofte, and M. Vejlstrup. From region inference to yon Neumann machines via region representation inference. In 23rd A CM Symposium on Principles of Programming Languages (POPL'96), pages 171-183. ACM Press, 1996. 5. C. Flanagan, A. Sabry, B. F. Duba, and M. Felleisen. The essence of compiling with continuations. Proc. SIGPLAN Conference on Programming Language Design and Implementation, 28(6):237-247, June 1993. 6. D. Gifford, P. Jouvelot, J. Lucassen, and M. Sheldon. FX-87 REFERENCE MANUAL. Technical Report MIT-LCS//MIT/LCS/TR-407, Massachusetts Institute of Technology, Laboratory for Computer Science, Sept. 1987. 7. R. Kieburtz and J. Launchbury. Encapsulated effects. (unpublished manuscript), Oct. 1995. 8. R. Kieburtz and J. Launchbury. Towards algebras of encapsulated effects. (unpublished manuscript), 1997. 9. J. Launchbury and S. Peyton Jones. State in Haskell. Lisp and Symbolic Computation, pages 293-351, Dec. 1995. 10. S. Liang, P. Hudak, and M. Jones. Monad transformers and modular interpreters. In 22nd ACM Symposium on Principles of Programming Languages (POPL '95), Jan. 1995. 11. S. Peyton Jones, J. Launchbury, M. Shields, and A. Tolmach. Bridging the gulf: a common intermediate language for ml and haskel. In P5th ACM Symposium on Principles of Programming Languages (POPL'98), pages 49-61, San Diego, Jan 1998. 12. S. Peyton Jones and J. Launchbury. Unboxed values as first class citizens. In Proc. Functional Programming Languages and Computer Architecture (FPCA '91), pages 636-666, Sept. 191. 13. S. Peyton Jones and P. Wadler. Imperative functional programming. In 20th A CM Symposium on Principles of Programming Languages (POPL'93), pages 7184, Jan. 1993. 14. S. Peyton Jones. Compiling Haskell by program transformation: A report from the trenches. In Proceedings of ESOP'96, volume 1058 of Lecture Notes in Computer Science, pages 18-44. Springer Verlag, 1996. 15. J.-P. Talpin and P. Jouvelot. Polymorphic type, region and effect inference. Journal of Functional Programming, 2:245-271, 1992. 16. J.-P. Talpin and P. Jouvelot. The type and effect discipline. Information and Computation, 111(2):245-296, June 1994.
115 17. D. Tarditi. Design and Implementation of Code Optimizations for a Tgpe-Directed Compiler for Standard ML. PhD thesis, Carnegie Mellon University, Dec. 1996. Technical Report CMU-CS-97-108. 18. M. Tofte and J.-P. Talpin. Region-based memory management. Information and Computation, 132(2):109-176, 1 Feb. 1997. 19. A. Tolmach and D. Oliva. From ML to Ada: Strongly-typed language interoperability via source translation. Journal of Functional Programming, 1998. (to appear). 20. P. Wadler. The marriage of effects and monads. (unpublished manuscript), Mar. 1998. 21. A. Wright. Typing references by effect inference. In Proc. 4th European Symposium on Programming (ESOP '9~), volume 582 of Lecture Notes in Computer Science, Feb. 1992.
Type-Directed Continuation Allocation* Zhong Shao and Valery Trifonov Dept. of Computer Science Yale University New Haven, CT 06520-8285
{shao, tri fonov}@cs, yale. edu
Abstract. Suppose we translate two different source languages, L1 and L2, into
the same intermediate language; can they safely interoperate in the same address space and under the same runtime system? If L1 supports first-class continuations (call/cc) and L2 does not, can L~ programs call arbitrary L1 functions? Would the fact of possibly calling L1 impose restrictions on the implementation strategy of L2? Can we compile L~ functions that do not invoke call/cc using more efficient techniques borrowed from the L2 implementation? Our view is that the implementation of a common intermediate language ought to support the so-called pay-as-you-go efficiency: first-order monomorphic functions should be compiled as efficiently as in C and assembly languages, even though they may be passed to arbitrary polymorphic functions that support advanced control primitives (e.g. call/cc). In this paper, we present a typed intermediate language with effect and resource annotations, ensuring the safety of inter-language calls while allowing the compiler to choose continuation allocation strategies.
1
Introduction
Safe interoperability requires resolving a host of issues including mixed data representations, multiple function calling conventions, and different implementation protocols. Existing approaches to language interoperability either separate code written in different languages into different address spaces or have the unsafe, ad hoc and insecure foreign function call interface. We position our further discussion of language interoperability in the context of a system hosting multiple languages, each safe in isolation. The supported languages may range from first-order monomorphic (e.g. a safe subset of C, or safe-C for short) to higher-order languages with advanced control, e.g. ML with first-class continuations. We assume that all languages have type systems which ensure runtime safety o f accepted programs. In other words, in this paper we do not attempt to solve the problem of cooperating safely with programs written in unsafe languages, which in general can * This research was sponsored in part by the DARPA ITO under the title "Software Evolution using HOT Language Technology", DARPA Order No. D888, issued under Contract No. F30602-96-2-0232, and in part by an NSF CAREER Award CCR-9501624, and NSF Grant CCR-9633390. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the U.S. Government.
117 only be achieved at the expense of "sandboxing" the unsafe calls or complex and incomplete analyses of the unsafe code. We believe that interoperability requires a serious and more formal treatment. As a first step, this paper describes a novel type-based technique to support principled language interoperation among languages with different protocols for allocation of activation records. Our framework allows programs written in multiple languages with overlapping features to interact with each other safely and reliably, yet without restricting the expressiveness of each language. An interoperability scheme for activation record allocation should be safe: it should not be possible to violate the runtime safety of a language by calling a foreign function; - expressive: the scheme should allow inter-language function calls; - efficient: a language implementation should not be forced to use suboptimal methods for its own features in order to provide support for other languages' features. For instance a language that does not use call/cc should not have to be implemented using heap-based allocation of activation records. -
Our solution is to ensure safety by using a common typed intermediate language 22 into which all of the source languages are translated. To maintain safety in an expressive interoperability scheme the type system is extended with annotations of the effects of the evaluation of a term, e.g. an invocation of call/cc, and polymorphic types with effect variables, allowing a higher-order function to be invoked with arguments coming from languages with different sets of effects. The central novelty of our approach is the introduction of annotations of the resources necessary for the realization of the effects of an evaluation; for instance a continuation heap may be required when invoking call/cc. Thus our type system can be used to support implementation efficiency by keeping track of the available language-dependent resources, and safety by allowing semantically correct inter-language function calls but banning semantically incorrect ones. In addition to providing safety, making resource handling explicit also opens new opportunities for code optimization beyond what a foreign function call mechanism can offer. A common intermediate language like FLINT 21, 22 will likely support a very rich set of features to accommodate multiple source languages. Some of these features may impose implementation restrictions; for example, a practical implementation of first-class continuations (as in SML/NJ or Scheme) often requires the use of advanced stack representations 8 or heap-based activation records 20. However in some cases stack-based allocation may be more efficient, and ideally we would like to have a compiler that can take advantage of it as long as this does not interfere with the semantic correctness of first-class continuations. Similarly, when compiling a simple safe-C-like language with no advanced control primitives (e.g,, call/cc) into FLINT, we may prefer to compile it to code that uses the simple sequential stack of standard C; programs written in ML or Scheme using these safe-C functions must then follow the same allocation strategy when invoking them. This corresponds to the typical case of writing low-level systems modules in C and providing for their use in other languages, therefore we assume this model in the sequel, but the dual problem of compiling safe-C functions
118 calling arbitrary ML functions by selectively imposing heap allocation on safe-C is similarly represented and solved within our system. Thus our goal is efficient and expressive interoperability between code fragments written in languages using possibly different allocation disciplines for activation records, for instance, ML with heap allocation and safe-C with stack allocation. The following properties of the interoperability framework are essential for achieving this goal: - ML and safe-C code should interoperate safely with each other within the same address space. - All invocations of safe-C functions in ML functions should be allowed (provided they are otherwise type-correct). Only the invocations of ML functions that do not capture continuations should be allowed in safe-C functions. - Any activation record that can potentially be captured as part of a first-class continuation should always be allocated on the heap (or using some fancy stack-chunkbased representations 8). - It should be possible to use stack allocation for activation records of ML functions when they are guaranteed not to be captured with a first-class continuation. - The selection of allocation strategy should be decoupled from the actual function call. -
The last property gives the compiler the freedom to switch allocation strategies more efficiently, instead of following a fixed foreign function interface mechanism. For example, an implementation of ML may use heap allocation of activation records by default to provide support for continuation capture. However, in cases when the compiler can prove that a function's activation record is not going to be accessible from any captured continuation, its allocation discipline is ambiguous; stack allocation may be preferred if the function invokes, or is invoked by, safe-C functions which use stack allocation. This specialization of code to a different allocation strategy effectively creates regions of ML code compiled in "safe-C mode" with the aim of avoiding the switch between heap and stack allocation on every cross-language call. In general, the separation of the selection of allocation strategy from the call allows its treatment as a commodity primitive operation and subjects it to other code-motion optimizations, e.g. hoisting it out of loops. The proposed method can be applied to achieving more efficient interoperability with existing foreign code as well, although obviously in this case the usual friction between safety and efficiency can only be eased but not removed. In particular the possibility to select the allocation strategy switch point remains, thus higher efficiency can still be achieved while satisfying a given safety policy by specializing safe code to "unsafe mode" (e.g. for running with stack allocation within a sand-box).
2
A Resourceful Intermediate Language
To satisfy the requirements for efficient interoperability, outlined in the previous section, we define an A-normal-form-based typed intermediate language RL (Figure 1)
119 with types having effect and resource annotations. Intuitively, an effect annotation such as CC indicates that a computation m a y capture a continuation by performing call/cc; a resource annotation such as H (continuation heap) or S (continuation stack) means that the corresponding runtime resource must be available to the computation.l Nontrivial effects can be primitive, effect variables, or unions o f effects; commutativity and associativity o f the union with 0 as a unit are consistent with the typing rules and we assume them for brevity of notation. Each effect can only occur when the proper resources are available, e.g. CC would require the use o f heap-based activation record allocation. Both the effect and resource usage annotations are inferred during the translation from the source language to the intermediate language, and can be used to assist code generation and to check the validity o f cross-language function calls.
RESOURCES
stack continuation allocation heap continuation allocation
T ~:~
IH EFFECTS
none call with current continuation effect variable, t E E f f V a r union of effects
t~::= 0 I r162 It
I~v~,
TYPES Typ ~ ( r : : = f~
where/3 E B a s i c T y p resource/effect-annotated function type
a rcont
resource-annotated continuation type
I Vt
bounded effect-polymorphic type
VALUES AND TERMS
v ::= c x I ,V x : a . e I At < r. v I x~" e ::= let r x = e in e I (v) r (e)~ use r (e) Qx x callcc x throwa
constant c E Const variable x E Var resource-annotated abstraction bounded effect abstraction effect application resource-annotated binding resource-annotated value adding spurious effects resource selection application xx first class continuations
Fig. 1. Syntax of a resource-aware intermediate language RL
The resources required and effects produced by a function are made explicit in its type. A continuation can potentially produce all effects possible with the set o f resources available at the point o f its capture; for that reason continuation types only have a resource annotation. 1 In this paper, we focus on application of this system to interoperability issues related to continuation allocation, but more diverse sets of resources will be necessary in a realistic language.
120 Function abstractions are annotated with the resources they may require and will maintain. In a higher-order language the effect of the evaluation of a function application may depend on the effects of its functional arguments; this dependence is expressed by means of effect polymorphism. Polymorphic abstractions introduce variables ranging over the set of possible effects of the term. Since the possible effects are determined by the available resources, we have bounded effect polymorphism; the relation # < r (defined in the context of an effect environment in Figure 3) reflects the dependence between effects and resources, e.g. that callcc can only be performed if continuations are heap-allocated. The effect application x# instantiates the body of the polymorphic abstraction to which x is bound. The language construct use r (e) serves to mark the point where a change in the allocation strategy for activation records is required. Instead of having effect subsumption the language is equipped with a construct (e)~, for explicitly increasing the set of effects of e to include #.
Example 1. The use of resource annotations to select allocation strategies is shown in the RL code below which includes extra type annotations for clarity. let H
applyTolnt
= (At _< H. ,,~H f: Int --~ Int. @f 42) H t : V t < H. (Int ~ Int) ~ Int -
addl_CC
t
t
-- (~H x:lnt. let H c = (,~H k: Int Hcont. let H z = @succ x in throwInt in c a l l c c
addl_Pure
k Z) H
c) H
: l n t - ~ Int cc = (~s x:lnt. ~ succ x} H : l n t - ~ Int
addl_Wrapped = {~H x:lnt, use s (~ addl_Pure x)) H : Int -~ Int 0
in
9 (applyTolntCC) @(applyTolnt(~)
addl_CC ; addl_Wrapped
The function applyTolnt is polymorphic in the effect of its parameter, but the parameter's resource requirements are fixed - it must use heap allocation. We consider two applications of applyTolnt. The argument in the first, addl_CC, is a function invoking callcc, which consequently uses heap allocation; on the other hand the argument in the second application, addl_Pure, is pure and uses stack allocation. It is therefore incorrect to apply applyTolnt to addl_Pure. We use a wrapper to coerce it to the proper type:
121 we apply applyTolnt to addl_Wrapped whose activation record is heap-allocated, and whose function is to switch to stack allocation (via use s) before calling addl_Pure. Heap allocation is resumed upon return from addl_Pure.
3
Two Source Languages
To further illustrate the advantages of this system we consider the problem of translating into R L two source languages (Figure 2): a language H L with control operators (callcc and throw), implemented using heap-based allocation of activation records, and a language SL which always uses stack allocation. H L also allows declaring at the top of a program the identifiers of entities imported from SL code. The type systems of these languages are assumed monomorphic for simplicity, since polymorphism in types is largely orthogonal to the effect polymorphism of RL.
SL TYPES SL TERMS
rsL ::= /3 I ~'sL--r rs~ esL::= c I z I ,~z:rsL. esL l esLesL I l e t z = e s L i n e s L r,L : : = /~ I r , , - + r.~ I r.~ c o n t
HL TYPES HL TERMS HL PROGRAMS
callcc euL I t h r o w r . L e.L e.L pilL:: = enL I external(SL) x :rsL inpnL
Fig. 2. Syntax of the source languages SL and HL
The resource annotations in RL provide information about handling of the stack and heap resources, necessary in the following situations: - when calling from H L a function written in SL, which may require switching from heap allocation of activation records to allocation on the stack used by SL; the heap resource must be preserved for use upon return from SL code. - when calling an H L function from SL code, which is only semantically sound when the evaluation of the function does not capture a continuation, since part of the continuation data is stack-allocated; the type system maintains information about the possible effects of the evaluation, in this case whether calico might be invoked. - when selecting an allocation strategy for H L functions called (directly or indirectly) from within SL code; either their activation records must be allocated on the SL stack, or the latter must be preserved and restored upon return to SL. - when selecting an allocation strategy for H L code invoking SL functions but not callcc, in order to optimize resource handling.
Example 2. Consider a program consisting of a main fragment in H L invoking the external SL function applyTolnt with the H L function add1 as an argument; the call is meaningful because add1 does not invoke callcc. Only the SL type of the external function is given to the H L program which is separately compiled without access to the detailed effect annotations inferred from the code of the SL fragment.
122 SL fragment applyTolnt:
)ff : Int -~ Int. succ (f 42) The result of its separate compilation into RL, which uses stack allocation (for details of the translation we refer the reader to Section 5) is
applyTolnt = At_< S. A s f: Int -~ Int. let s x = 9 f 42 in 9 succ x t : V t < S . (Int -~ Int) ~+ Int -
t
t
HL fragment main: external(SL) applyTolnt : (Int -+ Int) --+ Int i n let add1 = Ax : Int. succ x i n applyTolnt add1
The result of its separate compilation into RL is
main ---- A H applyTolnt:Vt ~ S. (Int s Int) _s Int. t let H applyTolnt_H = (At <_S. A H f: Int -~ Int. t let H f_S = I)~s x:Int, use H (9 f X)) H in use s (9 (applyTolntt)
add1
f_S)) H
: Vt<_S. (Int -~ Int) -~ Int t = (•H x:Int. 9 succ x) H : Int -~ Int 0
in 9 applyTolnt_He
add1
: ( V t <- S" (lnt -~t lnt) -~ ~ lnt) -~ 0 Int The translation infers polymorphic effect types using a simplified version 2 of standard effect inference 23. The resource annotations are fixed by the source language; the type of an external SL function in an HL program is annotated with the SL resources. In the code produced after translation the external functions are coerced to match the resources of HL using automatically generated wrappers. In the above code, the parameter f of applyTolnt_H is wrapped to f_S before passing it to applyTolnt; the function of the wrapper is to switch from the stack allocation discipline used by SL to heap allocation before invoking the code for f, and resume stack allocation upon return. Dually, the call to applyTolnt itself is wrapped to enable stack allocation inside HL code. 2 As presented here our system does not keep track of regions associated with effects.
123 Since the full RL type of the SL fragment is not available to it, the effect inference must conseratively approximate the effects of the SL functions. It treats the external applyTolnt in the HL fragment as an effect-polymorphic parameter in order to allow its invocations with arguments with different effects. The price we pay for inference with this polymorphism in the case of separate compilation is that we assume that the effects of these invocations are the maximal allowed with the resources shared between the languages (in Example 2 we lose no precision since SL has no effects, but the approximation is reflected in the effect annotation 13of the type of the parameter of main). The following code, constructed mechanically given the inferred and expected types of applyTolnt, coerces the actual type of applyTolnt to the approximation used in the typing of main and performs the top-level application, thus linking the modules.
let H applyTolnt_Glue = (At_< S. A s f: Int _~s Int. (9 applyTolntt t
00> H
: V t < S . (Int s Int) s Int -
t
0
in 9 main applyTolnt_Glue More precise inference of the resulting effects is possible when the external function is a pre-compiled library routine whose RL type (with its precise effect annotations) is available when compiling main. In those cases we can take advantage of the letpolymorphism in inferring a type of main (in a setting similar to that of Example 1). However even the approximated effects obtained during separate compilation carry information that can be exploited for the optimization of inter-language calls, observing that the range of effects of a function is limited by the resources of its source language. In Example 2, after inlining and applying results of Section 4.4 (Theorem 2), the code for main can be optimized to eliminate the unnecessary switch to heap allocation in the instance of f_S. This yields
main = (A H applyTolnt:Vt <S._ (Int t
Int) s0 Int.
let H
addl
= (A H x:lnt. @succ x~ H
(* now dead code*)
addl_S = (A s x:lnt. 9 succ x) H in use s (9 (applyTolnt0)
addl_S)) H
Thus the HL function addl has been effectively specialized for the stack allocation strategy used by SL. Example3. Another optimization is merging of regions with the same resource requirements, illustrated on the following HL code fragment.
external(SL) intFn : Int ~ Int in intFn (intFn 42) which is naively translated to the RL function (shown after inlining of the parameter wrapper)
124 At_< S. A H intFn :Int s Int. t let H x = (use s (9 intFn 42)) H in use s (9 intFn x) After combining the two use s (-) constructs the equivalent RL term is At < S. AH intFn :lnt 2> Int. t use s ( let s x = (9 intFn 42) s in 9 intFn x) A generalization of this transformation makes possible lifting o f use r (.) constructs out of a loop when the resources r are sufficient for all effects of the loop. Since in general a resource wrapper must restore resources upon return, a tail call moved into its scope effectively becomes non-tail; thus lifting a wrapper's scope over a recursive tail call is only useful when the wrapper is lifted out of the enclosing function as well, i.e. out o f the loop.
4
Semantics o f RL
4.1 Static Semantics Correctness of resource use is ensured by the type system shown in Figure 3, which keeps track of the resources necessary for the evaluation of a term and a conservative estimate of the effects of the evaluation. An effect environment A specifies the resource bounds of effect variables introduced by effect abstractions and effect-polymorphic types. The rules for effect sequents reflect the dependence of effects on resources (in this language this boils down to the dependence of the call/cc effect CC on the heap allocation resource H) and form the basis of effect polymorphism. The function MazEff yields the maximal effect possible with a given resource; in this system we have MazEff(S) = 0 and MazEff(H) = CC. Rule (Eft-max) effectively states that the resource r ~ can be used instead of resource r if r ~ provides for all effects possible under r. In the sequents assigning types to values and terms the type environment _P maps free variables to types. Type judgments for values associate with a value v and a pair of environments A and _P only a type ~r, since values have no effects and therefore their evaluation requires no resources of the kind we control. The function 0 maps constants to their predefined types. Sequents for terms have the form r; ,4; _P ~ e : ~ a , where r represents the available allocation resource, a is the type of e, and # represents the effects of its evaluation. Rules (Exp-let) and (Exp-val) establish the correspondence between the resource annotations in these constructs and the currently available allocation resource; the effect of lifting a value to a term is none, while the effect o f sequencing two computations via let is the union of their effects. Any effect allowed with the current resource may be added to the effects of a term using rule (Exp-spurious). The central novelty is the use r' (.) construct for resource manipulation; its typing rule (Exp-use) imposes the crucial restriction that the effect # of the term e must be
125
EFFECT ENVIRONMENT FORMATION (Env-eft-empty)
(Env-eft-ext)
TYPES
(Typ-fun)
(Typ-basic)
,4 ~ # <_ r
~A
~0 Fa ,4t, t < r
(Typ-cont)
TYPE ENVIRONMENT FORMATION (Env-typ-empty)
,4~a
(Env-typ-ext)
~`4
,4 ~ r
,4 F r O
,4 ~ a, a '
,4 ~" a 4 a '
O~CC
,4 I-= ~ r c o n t
,4 I - " a
,4 ~ r ~ , x : a
(Typ-poly) F~ A ,4t, t < r F~ a ,4 ~ Vt<_r. cr
EFFECTS
(Eft-empty)
(Eft-CO)
,4 ~ O < r
A ~CC
TERMS
(Eft-var)
(Eft-combine)
I--aA `4(t) = r
(Exp-let) r;A;F
~e:-a
,41-"#'V/~"_
A;F ,4 I-" M a x E f f ( r ) ,4 ~ # <_ r'
/LV/~ I
a'
(Exp-val)
(Eft-max) `4 I-" p <_ r
I-= e': --a'
r;,4;F F~let ~ x = e i n e ' :
~p'
,4 F" t < r
r;A;F=,x:a
F~ v : a
r; `4; F F" (v) r : - a
g r'
r
(Exp-spurious) r; A ; F ~ e : - a
VALUES
r; A ; F
(Val-const)
(Val-var)
,4 I-r F ,4; F ~ c : O(c)
,4 I-r F F ( x ) = a ,4; F ~ x : a
A F" p ' < r
~ (e)~, :
a t~ V ,u/
(Exp-use) r'; A; F I-" e : - a
A I-" # < r
P
r; `4; F ~ user' (e) : -o"
(Yal-abs)
p
,4FF ,4~a r;A;F~,x:a I-'e: ~a'
(Exp-app) A ~ r
r ( x ) = ~' 4 o
A ; U~ ~ ,k~ x : a. e : a -S-~a '
r; A ; F I~ ~ x x ' : - a
(Val-poly)
p
(Exp-callcc)
`4 ~ F `4t, t < r ; F F~v:a , 4 ; F F~ A t < r . v : V t < _ r . a
A I- r F
F(x)=arcont--~a
r; A ; F ~ callcc x : - - a
(Val-tapp) F(x)=Vt
,4; r
r ( x ' ) = o'
P
I~ z ~ :
,4 F" p < r
~lta
pvCC
(Exp-throw) A V r
a ~ ~' r(~! = ~,~r
r; A; F F" t b r o w a
Fig. 3. The R L type system
r(~') =
x x : Ma=Eff(r) 0"1
126 supported by the resource r available before the alternative resource r ' is selected. This ensures the correctness of the propagation of # outside the scope of the use r' (-). The rules for application and callcc set the correspondence between the available resource and the resource required by the invoked function. In addition, (Exp-callcc) and (Exp-throw) specify that the continuation type is annotated with the same resource, which is needed by the context captured in the continuation and therefore must be matched when it is reactivated. The effect of evaluating a callcc includes CC, while the effect of a t h r o w is that of the rest of the computation, which we estimate as the maximal possible with the current resource. By induction on the structure of a typing derivation it follows that if a term has a type in a given environment, it has exactly one type, and the presence of type annotations allows its effective computation, i.e. there exists a function EffTypeO such that
EffTypeOf (r, A, F, e) = (#, a) if and only if r;/1; F ~ e : - a . We will also use the function TypeOf with the same arguments, returning the type a only. 4.2
Dynamic Semantics
The operational semantics of RL (Figure 4) is defined by means of a variant of the tail-call-safe CaEK machine (Flanagan et al. 4). The machine configuration is a tuple (e, E, O, p) where e is the current term to be evaluated, E is the environment mapping variables to machine values, O is a heap of objects (closures), and p is a tuple of machine resources. Depending on the allocation strategy used, p is either a continuation stack S, recording (as in the original CaEK machine) the context of the evaluation as a sequence of activation records, or a pair of a current continuation k and a continuation heap K . In the latter form k is a continuation handle and K is a mapping from ContHandles to activation records which offers non-sequential access. In neither case does a function application (a pp) perform additional allocations of activation records, so both strategies are tail-call safe. Machine values are either small constants or pointers into other structures where larger objects are allocated. All closures are allocated on the heap (the function 7 at the bottom of the figure shows the details). The activation records created when evaluating a letr-expression may be allocated either on the continuation heap K (transition rule (let H)) or on the continuation stack S (rule (letS)). An activation record represents a continuation, and in our small language there are only three possibilities: the computation either halts or continues by binding a variable to a computed value or by restoring a resource. Rules (val H) and (val s) perform the binding, depending on the allocation mode. The evaluation of use r (e) selects the activation record allocation strategy for e, e.g. use s (e) selects stack-based allocation for e (transition rule (useS)). When the current allocation resource is already r we define use r (.) as a no-op; if a change of resource is performed, an activation record is pushed on (the top of) the new allocation resource. Correspondingly, heap-based allocation is restored by transition rule (resume H) after the evaluation of e.
127
SEMANTIC DOMAINS
MachineValBw::= E E h E Object B o ::= 0 E k E ActRcdB a::= K E S::=
Constc I Ptrh I Contk Var --r Machine Val HeapLocs Closure (x,e, E) I TyAbs (t,r, v) HeapLocs --> Object ContHandles Bind(x,e,E,k) I Resumes I Halt ContHandles --+ ActRcd Bind (x, e, E, S) I Resume (k, K) I Halt
machine values environment heap locations closures (objects) object heap continuation handles activation records activation record heap activation record stack
TRANSITION RULES
(app)
<~ Xl X2, E, O, p) I.~1 <e', Elx I P..).E(x2), O, D)
where E(xl) = Ptr h, O(h) = Closure <x', e', E') FOR HEAP-ALLOCATED ACTIVATION RECORDS
(let H)
(let H x = el in e2, E, H, (k, K)) ~-~1 <el, E, H,
(valH)
<
(callcc)
(throwa Xl x2, E, H, (k, K)) ~-+1 (c', E'x' ~ E(x2), O,
(uses)
<uses (e), E, H,
((V)H,E, H,
FOR STACK-ALLOCATED ACTIVATION RECORDS
(let s )
(val s)
<
useH) (resumeH)
<use H (e), E, H, <S)) ~->1 <e, E, H,
<
REPRESENTATION OF
VALUES
3' (c, E, O)=(Const c, O) 3' ()~r x: o'. e, E, O)=> 7 ( x , E , O ) = ( E ( x ) , O)
7(At
7(xp,E,O)
= 7(~/tv,E,O)ifE(x)
= Ptr h', O(h') = TyAbs (t,r,v), and ~ p < r
Fig.4. Semantics of RL
128 Another no-op is the increase of effect sets (.)~ which only serves type-checking purposes.
4.3
Soundness of the Type System
The type system maintains the property that the effects of well-typed programs are possible with their available resources, formalized in the following statement, proved by induction on the typing derivation.
Lemma 1. l f r; A; F ~ e : -~tr is a valid typing judgment, then A b ~ tz < r. Semantically this behavior of well-typed programs is expressed as soundness with respect to resource use, extending the standard soundness for safety of the type system, in the following theorem.
Theorem 1. If r; 0; 0 ~ e : -fftr, then the configuration (e, ~, 0, Halt r) either diverges or evaluates to the configuration ((v) r, E, O, (Halt r)) (for some v, E and 0), where Halt s ~ (Halt), and Halt H A= (k, K ) for some k and K such that K ( k ) : Halt. This result is a corollary of the standard properties of progress and subject reduction of the system, the proofs of which we sketch below. To simplify the proofs, we introduce a type-annotated version of the semantics, which maintains type information embedded in the runtime representation. Thus the representation of an abstraction in the typeannotated version is 7(At x : a . e, E, O) =
(Ptr h, Oh ~ Closure' (r, x, a, e, EIFv(e)_z))
In addition, the runtime environment E is extended to keep the type of each value in its codomain; the value component of E is denoted by VE and the type component by TE. The following definitions are helpful in defining typability of configurations. Definition 1. The bottom hot(p) of an allocation resource p is defined as follows: 1. if p = (S), then bot(p) = bot(S'), if S = Bind ( x ' , e ' , E ' , S ' ) , and bot(p) = S otherwise; 2. if p = ( k , K ) , then hot(p) = b o t ( ( k ' , K ) ) , if K ( k ) = Bind ( x ' , e ' , E ' , k ' ) , and hot(p) = K ( k ) otherwise. Definition 2. The outermost continuation heap outerCont(p) reachable from allocation resource p is 1. 2. 3. 4.
IV if p = (k, K ) and hot(p) = Halt; outerCont((S)) if p = (k, K ) and bot(p) = Resume S; 0, if p = (S) and bot(p) = Halt," outerCont((k, K ) ) if p = (S) and hot(p) = Resume (k, K).
Definition 3. A configuration closed in type environment F is typable under resource r with a result type a and an effect #, written r; F k ~ (e, E , O, p) : -ffa, if for some a', lz'
129
1. Dora (r) n D o , n ( E ) = 0; and 2. r;O;F, I E ~ e : 7 a ' ; and 3. F b ~ (p, E , 0 ) E a' d+ a; and IM
4. for each x E Dora (E), (a) if VE(x) = Const c, then TE(x) = tg(c); (b) if VE(x) = Ptr h and O(h) = Closure' {rl, x l , a l , el, El), then O;T E1 ~ At1 Xl : trl. el : TE(x), and similarly for type abstractions; (c) if VE(x) = Cont k, then TE(x) = al r, cont and F ~ (k, outerCont(p)), E , 0 E al ~ a~ ltl
and
=
v
a n d F ~ ( p , E , O ) E a' r
f o r some
and
a if
1. r = S and p = (Halt) (i.e. an empty stack) and a = a' a n d # = 0; or 2. r = S a n d p = (Bind ( x l , e l , E 1 , S 1 ) ) a n d S ; 1", Xl :0 "t ~-e ( e l , E l , O, $1) : -~a; or
3. r
Sandp ~ (Resume(k',K'))andF
bp ( ( k ' , K ' ) , E , O )
H
C a ' ~ o',
and similarly for r = H. Note that the environment may contain reachable variables bound to continuations even when the current allocation resource is a stack. Type correctness of these continuations cannot be verified with the stack resource, instead we have to find the corresponding continuation heap. However in this case the type system guarantees that the only continuation heap to which there are references in the environment is the outermost continuation heap, if such exists. The reason is that although it is possible to switch to heap allocation after executing in stack allocation mode, there are no invocations of calico allowed since they would introduce the CC effect, which is not possible under the stack resource (cf. typing rule (Exp-use) in Figure 3). We can now formulate the progress and subject reduction properties. L e m m a 2 (Progress). If r; 0 ~-~ (e, E , O, p) : -fig where r corresponds to p (i.e. r = S i f p = (S), r = H i f p = ( k , K ) ) , a n d p ~ Halt r, then there exists C such that (e, E , O, to> I--9.1 C.
Lemma 3
(Subject reduction). If C : (e, E , O, p} and r; 0 F-c C : -fia where r
corresponds to p, and C ~-~1 C ' = (e', E', 0 ' , p'), then r'; 0 F-~ C' : -fira where r' corresponds to p', i.t = ~' V I.t~, and the rule for this transition is (callcc) only if # : CC V #", for some I.t'1 and #". In brief, in the case when e # (v} r, the proofs proceed by examining the structure of the typing derivation for r; 0; F, TE F-- e : ~ra'; together with condition 4 of Definition 3 this yields that the values in the environment and on the heaps have the correct shape for the appropriate transition rule. In the case when e has the form (v} r the proofs ! r inspect the structure of the derivation of/~ ~ (p, E, O) E a ~-4 a, which parallels the decision tree for the transition rules (val) and (resume) and the halting state.
130
4.4
Resource Transformations
Effect inference and type correctness with respect to resource use allow the compiler to modify the continuation allocation strategy of a program fragment and preserve its meaning. The following definitions adapt the standard notions of ordering and observational equivalence of open terms to the resource-based system.
Definition 4. A context C is a term with a hole o; the result of placing a term e in the hole of C is denoted by Ce and may result in capturing effect and lambda variables free in e. The hole of a context C is of type (r, "4, F) =~ -fia if Ce is typeable whenever r; "4; 1" ~ e : -~a. Definition 5. S; A; 1~ ~ e E_ e' : -ffa iffor all contexts C with hole of type (r, "4, F ) ==~ -ffa, all typed environments E closing Ce and heaps 0 closing E, and continuation stacks S, the configuration (Ce', E , O, (S)) converges if (Ce, E , O, (S)) converges. Furthermore, $ ; ' 4 ; F F-~ e ..~ e' : -ffa i f S ; ' 4 ; F F-~ e E_ e' : -fia and S;,4;F ~e'Ee:~a. One possible optimization is the conversion of heap-allocating code to stack-based strategy provided the code does not invoke callcc or throw, as per the following theorem.
Theorem 2. If H; "4; F ~ e : ~a, then S; "4; F t-- use H (e) ~ StkContza (e; F) : ~a, where StkCont is the transformation defined as follows.
StkContA ((v)H; /') = (V) S StkContA ((e)~,; StkContA (use H (e); StkCont,a (use s (e); StkCont,a (9 x, x2;
(StkCont,a (e; F))~, StkCont,a (e; F ) e let s x~ = (As x :F(x2). user (9 x, x)) s in 9 x~ x2 StkContza (let H x = el in e2; F) = let s x = StkContA (el; /') in StkConta (e2; F,, x : T y p e O f (H, "4, F, e2))
5
F) = F) = F) = F)=
T r a n s l a t i o n f r o m H L to R L
Programs in language L E {HL, SL} are translated into RL by an algorithm shown in Figure 5. The algorithm infers the effect and resource annotations of a term using fairly standard techniques. It is presented in the form of an inference system for judgments of the form .4;/" 1-s eHL =~ "4' t- e : ~a, where esL, "4, and T' are inputs corresponding respectively to the L term to translate (also overloaded to HL top-level programs) and the inherited effect and type environments, initially empty. The outputs of the translation are e, A r, #, and a, which stand for the translated term, the inferred effect environment, and the effect and type of e in environments "4~ and F; thus the output of the algorithm satisfies H; "4~;/" t--, e : ~a. The function T~ maps a language name to the resources available to a program in this language: TC(HL) = H and 7~(SL) = S.
131
(Translate-external)
0 . ' = CtoseAlt(MazS(AnnotateS(7., Dora (A))), S) 0." = CtoseAU ( A n n o t a t e S ( r , Dom (.4)), S) Zl; Fx, x : 0." }-tiLp =~ A ' ~- e' : -fi0. A; F I--HLexternal(SL) x : r in p A ' ~- A H X :OJ. let s x = WrapS(e, x, 0.') in e' : $(cr' --~ or) where Annotate r (/3, V) = / 3 Annotate r (7" cont, V) = (Annotate ~ (~-, V)) "cont Annotate ~ (7" --> 7"', V ) = 0. .S+ 0., where t E EffVar - V, t 0. = Annotate ~ (7", V U {t}), 0.' = Annotate ~ (7", V U {t} U fev(0.)) Wrap;' (C, x, V t < r " . 0 . ) = A t < r " . Wrap;' (Clet # x ' = (xt) r' in e, x', 0.) Wrap:' (C, x, 0.1 4 0.2) = ,V' Xl :0.~. let ~' x~ = (Wrap:, (., x l , 0.'a)) ~' in Wrap~' (Clet # x2 = ~ x x~ in e, x2, 0.2) where 0.~ = ConvertTyperr ' (0.1)
Wrap;' (c, ~,/3) =
c(~y'
(Translate-app)
A; F I-c el ~ Aa I- e~ : ~ 0 . ,
A'; S t- a, ~ (0.2 ~ a)
A,; F I-L e2 ~ Ll2 f- el : ~0.2
t r re'u(0.1) Ufev(0.2) U Dora (A2) xl r FV(e) A2 I-1 A'; F I-z el e2 =~ A ' I- let H Xl = e~ in l e t H x2 = e~ in 9 xl x2 : - - S a
i~l Vl~2VSt
where ~;0./0/1-0l,~0"
z/%2;S2~Slo.2"~ S1~ S='ITI~u(S2~'~'S2~')
A1;SI-O.I'~O.~
z~ 1 r' ~2; S - 0.1 -~ 0"2 ~ a~ --~ 0/9 A; S F- al ,~ a2 A; S I- 0.1 ~ c o n t ~ a2 ~cont M i n E n v ( t < r) = t <_ r M i n E n v (@ _< r) = @
A; S ~- a ~ a' A' = M i n E n v ( S t < r) A I-1 A'; S\{ 0 ~- Vt _
M i n E n v (1~ V I~ <_ r) = M i n E n v (1~ <_ r ) f q M i n E n v (#2 <_ r) M i n E n v (CC _< H) = @
(Translate-let) A ; F I-z et =~ Ax I- e~ : ~0.1
(Crtl, A2) = C l o s e ( 0 . 1 , A l , r )
A 2 ; / ' = , x : 0.~ I-z: e2 =~ A ' I- e I : ~0.2
A; F I-s let x = el in e2 => A ' I- let H x = e~ in e~ :
0.2 /~i V/~2
(Translate-abs) a = AnnotateS(r, D o r a ( A ) )
A ; F ~ , x : 0. ~-~e :=~ A ' ~- e' : - 0 : F
za; F ~-c k z : r. e ~ za' ~- (~" x:0.. e')" : - ( a -~ 0.') $
t,
(Translate-calico) A;_r' I-tiLe =:~ A ' F e' :
--(0.
Hcont-~ 0.)
A; F I-tiLc a l l c c e =r A' I- l e t H x = e' in c a l l c c x : - - 0 .
/~V/~lVCC
Fig. 5. Typed translation from HL to RL
132 Several auxiliary functions are shown in the figure, and the definitions of several simpler functions are as follows. The lub of two resources is defined by r U r = r and 5 U H = H. The function N for merging two effect environments is defined as (/11 q A2)(t) = Ax (t) .J A 2 (t) if t E Dora (A i) N Dora (A2), and (Ai ~ A2) (t) = Ai (t) on the rest of Dom (Ai) U Dom (A2). The free effect variables of a type a are denoted by fev(a); the function Close(a, A, F) returns the pair (Vti< A(ti). a, A\{U}), where {~} = fev(a) - fev(F), and similarly we have CloseAll (a, r) = Vti < r. a where : IcY(a).
Separately compiled external functions are treated as parameters of the compiled HL fragment and are wrapped to convert the HL resources (continuation heap) to SL resources (continuation stack). The wrapping is performed by an auxiliary function invoked as Wraprr' (C, x, a), which produces a term coercing x from type a to type ConvertTgpe~' (a) with resource annotations r' in place of r, and places it in context C. When compiling separately, the effects of an external function are approximated conservatively by applying Maz r to the effect-annotated declared type of the function; by definition Maxr(a) is ai r ~ Max r (a2) when a = ai -S-ra2, and a otherwise. MaxEff(r)
#
This allows the view of external functions as effect-polymorphic without restricting their actual implementations. 6
Related
Work
and Conclusions
The work presented in this paper is mainly inspired by recent research on effect inference 5, 10, 11, 23, 24, efficient implementation of first-class continuations 2, 8, 20, 1, monads and modular interpreters 30, 12, 29, 13, typed intermediate languages 7, 25, 21, 17, 16, 3, and foreign function call interface 9, 18. In the following, we briefly explain the relationship of these work with our resource-based approach. - Effect systems. The idea of using effect-based type systems to support language
interoperation was first proposed by Gifford and Lucassen 6, 5. Along this direction, many researchers have worked on various kinds of effect systems and effect inference algorithms 10, 11, 23, 24, 28. The main novelty of our effect system is that we imposed a "resource-based" upper-bound to the effect variables. Effect variables in all previous effect systems are always universally quantified without any upper bounds, so they can be instantiated into any effect expressions. Our system limits the quantification over a finite set of resources--this allows us to take advantage of the effect-resource relationship to support advanced compilation strategies. - Efficient call/cc. Many people have worked on designing various strategies to support efficient implementation of first-class continuations 2, 8, 20, 1. To support a reasonably efficient call/cc, compilers today mostly use "stack chunks" (a linked list of smaller stacks) 2, 8 or they simply heap allocate all activation records 20. Both of these representations are incompatible with those used by traditional languages such as C and C++ where activation records are allocated on a sequential stack. First-class continuations thus always impose restrictions and interoperability challenges to the underlying compiler. In fact, many existing compilers choose not to support call/cc, simply because call/cc is not compatible with standard C
133
-
-
-
-
calling conventions. The techniques presented in this paper provide opportunities to support both efficient call/cc and interoperability with code that use sequential stacks. Threads. Implementing threads does not necessarily require first-class continuations but only an equivalent of one-shot continuations 1. A finer distinction between these classes of continuations is useful, however the issues of incorporating linearity in the type system to ensure safety in the presence of one-shot continuations are beyond the scope of this paper. Monads and modular interpreters. The idea of using resources and effects to characterize the run-time configuration of a function is inspired by recent work on monad-based interactions and modular interpreters 30, 12, 29, 13. Unlike in the monadic approach, our system provides a way of switching the runtime context "horizontally" from one to another via the use r (e) construct. Typed intermediate languages. Typed intermediate languages have received much attention lately, especially in the HOT (i.e., higher-order and typed) language community. However, recent work 7, 15, 22, 17, 3, 16, 14 has mostly focused on the theoretical foundations and general language design issues. The type system in this paper focused on the problem of compiling multiple source languages into a common typed intermediate format. We plan to incorporate the resource and effect annotations into our FLINT intermediate language 22. Foreign function call interface. The interoperability problem addressed in this paper has much in common with frameworks for multi-lingual programming, such as ILU, CORBA 27, and Microsoft's COM 19. It also relates to the foreign function call interfaces in most existing compilers 9, 18. Although these work do address many of the low-level problems, such as converting data representations between languages or passing information to remote processes, their implementations do not provide any safety guarantees (or if they do, they would require external programs run in a separate address space). The work presented in this paper focuses on interfacing programs running in the single address space with much higher performance requirements. We emphasize building a safe, efficient, and robust interface across multiple HOT languages.
We believe what we have presented in this paper is a good first-step towards a fully formal investigation on the topic of safe fine-grain language interoperations. We have concentrated on the issues of first-class continuations in this paper, but the framework presented here should also apply to handle other language features such as states, exceptions, and non-termination. The effect system described in this paper is also very general and useful for static program analysis: because it supports effect polymorphism, effect information is accurately propagated through high-order functions. This is clearly much more informative than the single one-bit (or N-bit) information seen in the simple monad-based calculus 16, 26. There are many hard problems that must be solved in order to support a safe and fine-grained interoperation between ML and safe-C, for instance, the interactions between garbage collection and explicit memory allocation, between type-safe and unsafe language features etc. We plan to pursue these problems in the future.
134
Acknowledgment We are grateful to the anonymous referees for their valuable comments.
References 1
C. Bruggeman, O. Waddell, and K. Dybvig. Representing control in the presence of oneshot continuations. In Proc. ACM SIGPLAN '96 Conf. on Prog. Lang. Design and Implementation, pages 99-107, New York, June 1996. ACM Press. 2 W. D. Clinger, A. H. Hartheimer, and E. M. Ost. Implementation strategies for continuations. In 1988 ACM Conference on Lisp and Functional Programming, pages 124-131, New York, June 1988. ACM Press. 3 A. Dimock, R. Muller, F. Turbak, and J. B. Wells. Strongly typed flow-directed representation transformations. In Proc. 1997 A CM SIGPLAN lnternational Conference on Functional Programming (ICFP'97), pages 11-24. ACM Press, June 1997. 4 C. Flanagan, A. Sabry, B. E Duba, and M. Felleisen. The essence of compiling with continuations. In Proc. ACM SIGPLAN '93 Conf. on Prog. Lang. Design and Implementation, pages 237-247, New York, June 1993. ACM Press. 5 D. K. Gifford et al. FX-87 reference manual. Technical Report MIT/LCS/TR-407, M.I.T. Laboratory for Computer Science, September 1987. 6 D. Gifford and J. Lucassen. Integrating functional and imperative programming. In 1986 ACM Conference on Lisp and Functional Programming, New York, August 1986. ACM Press. 7 R. Harper and G. Morrisett. Compiling polymorphism using intensional type analysis. In Twenty-second Annual ACM Symp. on Principles of Prog. Languages, pages 130-141, New York, Jan 1995. ACM Press. 8 R. Hieb, R. K. Dybvig, and C. Bruggeman. Representing control in the presence of firstclass continuations. In Proc. ACM SIGPLAN '90 Conf. on Prog. Lang. Design and Implementation, pages 66-77, New York, 1990. ACM Press. 9 L. Huelsbergen. A portable C interface for Standard ML of New Jersey. Technical memorandum, AT&T Bell Laboratories, Murray Hill, NJ, January 1996. 10 P. Jouvelot and D. K. Gifford. Reasoning about continuations with control effects. In Proc. ACM SIGPLAN '89 Conf. on Prog. Lang. Design and Implementation, pages 218-226. ACM Press, 1989. 11 P. Jouvelot and D. K. Gifford. Algebraic reconstruction of types and effects. In Eighteenth Annual A CM Symp. on Principles of Prog. Languages, pages 303-310, New York, Jan 1991. ACM Press. 12 J. Launchbury and S. Peyton Jones. Lazy functional state threads. In Proc. ACMSIGPLAN '94 Conf. on Prog. Lang. Design and Implementation, pages 24-35, New York, June 1994. ACM Press. 13 S. Liang, P. Hudak, and M. Jones. Monad transformers and modular interpreters. In Proc. 22rd Annual ACM SIGPLAN-SIGACT Symp. on Principles of Programming Languages, pages 333-343. ACM Press, 1995. 14 G. Morrisett, D. Walker, K. Crary, and N. Glew. From system F to typed assembly language. In Proc. 25rd Annual ACM SIGPLAN-SIGACT Symp. on Principles of Programming Languages, page (to appear). ACM Press, 1998. 15 G. Morrisett. Compiling with Types. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, December 1995. Tech Report CMU-CS-95-226.
135 16
17 18 19 20 21 22 23 24 25
26
27
28
29 30
S. Peyton Jones, J. Launchbury, M. Shields, and A. Tolmach. Bridging the gulf: a common intermediate language for ML and Haskell. In Proc. 25rd Annual ACM SIGPLAN-SIGACT Syrup. on Principles of Programming Languages, page (to appear). ACM Press, 1998. S. Peyton Jones and E. Meijer. Henk: a typed intermediate language. In Proc. 1997ACM SIGPLAN Workshop on Types in Compilation, June 1997. S. Peyton Jones, T. Nordin, and A. Reid. Green card: a foreign-language interface for Haskell. Available at http://www.dcs.gla.ac.uk:80/simonpj/green-card.ps.gz, 1997. D. Rogerson. Inside COM: Microsoft's Component Object Model. Microsoft Press, 1997. Z. Shao and A. W. Appel. Space-efficient closure representations. In 1994 ACM Conference on Lisp and Functional Programming, pages 150-161, New York, June 1994. ACM Press. Z. Shao. An overview of the FLINT/ML compiler. In Proc. 1997 A CM SIGPLAN Workshop on Types in Compilation, June 1997. Z. Shao. Typed common intermediate format. In Proc. 1997 USENIX Conference on Domain Specific Languages, pages 89-102, October 1997. J.-P. Talpin and P. Jouvelot. Polymorphic type, region, and effect inference. Journal of Functional Programming, 2(3), 1992. J.-P. Talpin and P. Jouvelot. The type and effect discipline. Information and Computation, 111(2):245-296, June 1994. D. Tarditi, G. Morrisett, P. Cheng, C. Stone, R. Harper, and P. Lee. TIL: A type-directed optimizing compiler for ML. In Proc. ACM SIGPLAN '96 Conf. on Prog. Lang. Design and Implementation, pages 181-192. ACM Press, 1996. D. Tarditi. Design and Implementation of Code Optimizationsfor a Type-Directed Compiler for Standard ML. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, December 1996. Tech Report CMU-CS-97-108. The Object Management Group. The common object request broker: Architecture and specifications (CORBA). Revision 1.2., Object Management Group (OMG), Framingham, MA, December 1993. M. Tofte and J.-P. Talpin. Implementation of the typed call-by-value )~-calculus using a stack of regions. In Proc. 21st Annual ACM SIGPLAN-SIGACT Symp. on Principles of Programming Languages, pages 188-201. ACM Press, 1994. P. Wadler. The essence of functional programming (invited talk). In Nineteenth Annual ACM Symp. on Principles of Prog. Languages, New York, Jan 1992. ACM Press. P. Wadler. How to declare an imperative (invited talk). In International Logic Programming Symposium, Portland, Oregon, December 1995. MIT Press.
P o l y m o r p h i c Equality - N o Tags Required Martin Elsman Department of Computer Science, University of Copenhagen Universitetsparken 1, DK-2100 Copenhagen O, Denmark. E-mail: mael@diku, dk
A b s t r a c t . Polymorphic equality is a controversial language construct. While being convenient for the programmer, it has been argued that polymorphic equality (1) invites to violation of software engineering principles, (2) lays a serious burden on the language implementor, and (3) enforces a runtime overhead due to the necessity of tagging values at runtime. We show that neither (2) nor (3) are inherent to polymorphic equality by showing that one can compile programs with polymorphic equality into programs without polymorphic equality in such a way that there is no need for tagging or for runtime type analysis. Also, the translation is the identity on programs that do not use polymorphic equality. Experimental results indicate that even for programs that use polymorphic equality, the translation gives good results.
1
Introduction
Often, statically typed languages, like ML, provide the programmer with a generic function for checking structural equality of two values of the same type. To avoid the possibility of testing functional values for equality, the type system of Standard ML 11 distinguishes between ordinary type variables, which may be instantiated to any type, and equality type variables, which may be instantiated only to types that admit equality (i.e., types not containing ordinary type variables or function types). In this paper, we show how polymorphic equality may be eliminated entirely in the front-end of a compiler by a type based translation called equality elimination. The translation is possible for expressions that are typable according to the Standard ML type disciplineIll. We make three main contributions: 1. Identification and application of equality elimination in a call-by-value language without garbage collection, including treatment of parametric datatypes and side effects. Equality elimination eliminates the last obligation for tagging values in Standard ML and opens for efficient data representations and easier foreign language interfacing. 2. Measurements of the effect of equality elimination in the ML Kit with Regions 23, 3, 5 (from hereon just the Kit) and a discussion of the possibilities for data representations made possible by the translation. 3. Demonstration of semantic correctness of the translation. It has been considered non-trivial to demonstrate semantic correctness for type classes in Haskell 13, Sect. 4.
137 As an example of equality elimination, consider the following ML program, which declares the function member using polymorphic equality to test if a given value is among the elements of a list: let fun member y = false member y (x::xs) = (y = x) orelse member y xs in (member 5 3,5, member true false) end
The function member gets type scheme Ve.e -4 e list -4 bool, where e is an equality type variable (i.e., a type variable that ranges over equality types). In the example, the function member is used with instances int and bool, which both admit equality. On the other hand, the function map, presented below, gets type scheme Vc~/3.(a -4 8) -4 a list -4 fl list, where c~ and/3 are ordinary type variables and hence may be instantiated to any particular type. f u n map f map f
= (x::xs)
= f x
:: m a p f xs
To eliminate polymorphic equality, it is possible to pass extra arguments to equality polymorphic functions as member above - one for each abstracted equality type variable in the type scheme for the function. Using type information, the example is translated into the program l e t fun member eq y = false I member eq y ( x : : x s ) = eq ( y , x ) o r e l s e member eq y xs i n (member eq i n t 5 3 , 5 , member e q b o o l t r u e f a l s e ) end
For each use of an equality polymorphic function, appropriate instances of the equality primitive are passed as arguments. In the translated program above, e q _ i n t and eq_bool denote primitive equality functions for testing integers and booleans for equality. These primitive functions are functions on base types and can be implemented efficiently by the backend of a compiler without the requirement that values be tagged. An important property of the translation is that it is the identity on expressions that do not use polymorphic equality. Thus, one pays for polymorphic equality only when it is used. In particular, the translation is the identity on the map function. In the next section, we give an overview of related work. The language that we consider is described in the sections to follow. We then proceed to present a translation for eliminating polymorphic equality. In Sect. 7 and Sect. 8, we demonstrate type correctness and semantic soundness of the translation. In Sect. 9 and Sect. 10, we show how the approach is extended to full ML and how it is implemented in the Kit. We then proceed to present experimental results. Finally, we conclude.
138 2
Related
Work
A type based dictionary transformation similar to equality elimination allows type classes in Haskell to be eliminated at compile time 25, 13, 16. However, the motivation for equality elimination is different from the motivation behind the dictionary transformation, which is to separate dictionary operations from values at runtime. In lazy languages such as Haskell, tagging cannot be eliminated even if tag-free garbage collection is used. A more aggressive elimination of dictionaries is possible by generating specialised versions of overloaded functions 8. This technique does not work well with separate compilation and may lead to unnecessary code duplication. No work on dictionary transformations demonstrates semantic soundness. Harper and Stone present an alternative semantics to Standard ML in terms of a translation into an intermediate typed language 6. Similar to the translation we present here, polymorphic equality is eliminated during the translation. However, because the semantics of their source language is given by the translation, they cannot show correctness of the translation. Ohori demonstrates how Standard ML may be extended with polymorphic record operations in such a way that these operations can be translated into efficient indexing operations 14. His translation is much similar to equality elimination in that record indices are passed to instantiations of functions that use record operations polymorphically; Ohori demonstrates both type correctness and semantic soundness for the approach. The TIL compiler, developed at Carnegie Mellon University, uses intensional polymorphism and nearly tag-free garbage collection to allow tag-free representations of values at runtime 20. An intermediate language of TIL allows a function to take types as arguments, which can then be inspected by the function. This means that polymorphic equality can be encoded in the intermediate language, thus, eliminating the primitive notion of polymorphic equality. However, for nearly tag-free garbage collection, records and other objects stored in the heap are still tagged in order for the garbage collector to trace pointers. It has been reported, however, that the nearly tag-free scheme can be extended to an entirely tag-free scheme 21. 3
Language
and Types
We consider a typed lambda calculus extended with pairs, conditionals, a polymorphic equality primitive, and a let-construct to allow polymorphic bindings. First, we introduce some terminology. A finite map is a map with finite domain and if f and g are such maps we denote by Dom(f) the domain of f and by Ran(f) the range of f. Further, we write f + g to mean the modification of f by g with domain Dom(f) U Dom(g) and values (f + g)(x) = if x E Dom(g) then g(x) else f(x). We assume a denumerably infinite set of equality type variables, ranged over by e, and a denumerably infinite set of ordinary type variables, ranged over by
139 c~. Types, ranged over by % and type schemes, ranged over by a, are defined as follows:
{ TlXT2
",-::--c I o, I a ::= Vg'd.T
I bool
A type T admits equality if either v = bool or T -- e or T = T: X T2 and T1 and T2 admit equality. 3.1
Substitutions
A substitution S is a pair (S ~, Sa), where S e is a finite m a p from equality t y p e variables to types such that, for all T 6 Ran(S~), r admits equality and S a is a finite m a p from ordinary type variables to types. When A is any object and S = (S e, S a) is a substitution, we write S ( A ) to m e a n simultaneous capture free substitution of S ~ and S a on A. For any type scheme a = V e : - - . e n ~ : " ' ' a m . T and t y p e r ' , we say t h a t T' is an instance of a (via S), written a > T', if there exists a substitution S = ({el ~-+ T : , . . . , e n ~ Tn},{a: ~-~ T~,...,t~m ~-+ T~}) such t h a t S(7-) = T'. T h e instance list of S, written il(S), is the pair ( I T : , . . . , T,, ITS,..., ~'~). Pairs of the above form are referred to as instance lists and we use il to range over them. When A is any object, we denote by ftv(A) a pair of a set of equality type variables free in A and a set of ordinary type variables free in A. Further, we denote by fetv(A) the set of equality type variables t h a t occur free in A. 3.2
Typed Expressions
In the following, we use x and y to range over a denumerably infinite set of lambda variables. The g r a m m a r for t y p e d expressions is as follows:
I e:e2
I (el,e2)I
e {
{ let x : a =el ine2
I true { false { if e then e I else e2 { eq r W e sometimes abbreviate x(,)
4
with x.
Static Semantics
The static semantics for the language is described by a set of inference rules. Each of the rules allows inferences a m o n g sentences of the form A, TE b- e : T, where, A is a set of equality type variables, TE is a type environment, m a p p i n g l a m b d a variables to type schemes, e is a t y p e d l a m b d a expression, and T is a type. Sentences of this form are read "under assumption (A, TE), the expression e has type T." A type T is well-ormed with respect to a set of equality t y p e variables A, written A }- r, if A _D fetv(T). Moreover, an instance list il = ( T : , . . . , r n , T~,...,T'm ) is well-formed with respect to a set of equality type variables A, written A ~- il, if A ~- n , i = 1..n and A }- r~, i = 1..m.
140 A , T E F e :7-
Expressions
A , T E F el : 7"1 --~ T2 "4, T E F e2 : 71
"4, T E + { x ~ T} b e : T' ,4, T E b )~x : T.e : T -'~ V' (1)
"4, T E F el : "rl "4, T E F e2 : ~'2
i E {1,2}
,4, T E b ( e l , e2) :T1 x T2
(3)
a(s)
"4, T E b x a ( s ) : T
(2)
,4, T E b e : "rl x T2
,4, T E F lh e : Ti
=
T E ( x ) >_ T via S
"4
,4, T E F el e2 : T2
ftv( 'a)
n
(4)
ftv(,4, TE) = 0
,4 U fetv(e-), T E b el : ~-
(5)
"4, T E F t r u e : bool
A , T E + { x ~ a } F e2 : T' "4, T E F l e t x : a = el i n e 2 : T '
,4, T E F e : bool ,4, T E l - e l : T ,4, T E F e2 : T
(7)
A, TE b f a l s e : bool
,4, TE ~- i f e t h e n el e l s e e2 : T
(9)
(6)
(8)
A b ~- 7 admits equality A, T/~ F eq~ ~ - r X T ~ - ~ - ~ (10)
There are only a few comments to note about the rules. In the rule for applying the equality primitive to values of a particular type, we require the type to be well-formed with respect to quantified equality type variables. Similarly, in the variable rule, we require the instance list be well-formed with respect to quantified equality type variables. For simplifying the type system in languages with imperative updates and polymorphism, there is a tendency to restrict polymorphism to bindings of nonside-effecting terminating expressions. This tendency is known as the v a l u e res t r i c t i o n , which is enforced by b o t h the Objective Caml system 10 and the Standard ML language 11. To simplify the presentation, we do not enforce the value restriction in rule 6. We return to this issue later, in Sect. 8. 5
Dynamic
Semantics
The dynamic semantics for the language is, as the static semantics, described by a set of inference rules. An u n t y p e d e x p r e s s i o n may be obtained from a typed expression by eliminating all type information. In the rest of this section, we use e to range over untyped expressions.
141 A dynamic environment, g, maps lambda variables to values, which again are defined by the grammar: V ::----clos(Ax.e,g) I true I false
(Vl,V2)
eq
The rules of the dynamic semantics allow inferences among sentences of the forms g F e ~ v and -eq (Vl,V2) 1~ v, where g is a dynamic environment, e is an untyped expression, and v, vl, and v2 are values. Sentences of the former form are read "under assumptions g, the expression e evaluates to v." Sentences of the latter form are read "equality of values vl and v2 is v."
Expressions
gFe~v
~(x) = v g Fx ~ v
(11)
g t- el ~. clos(Ax.e, go) g~-e2~.v go + {x ~ v} ~- e ~ v' (13) g F el e2 ~ v'
g ~- true ~ true
g F- eq ~ eq
(12)
g F Ax.e ~ clos(Ax.e, g)
g I- el ,0- eq g I- e2 ,~ v l'eq V "U"v' (14) g t- el e2 ~ v'
(15)
g t- false ~ false
g t- el ~ Vl
(17)
g 5 e2 ,~ v
g~if ethenel
elsee2~v
g ~- e2 ~. v2
g ~ (el,e2) ~ (vl,v:)
i 6 {1, 2} g F e ~ (vl, v2) (19) g ~- Tri e # vi
g ~- e ,~ false
(16)
(18)
gt-e,~true
g F-el~.V
gFif ethenel
else e2~v
(20)
g t- el # vx
(21)
g + {x ~ vl } ~- e2 ~ v2
g F l e t x = el i n e2 ~ v2
(22)
142
Equality o f Values
I ~eq (Vl,V2) ~ V I
Vl =V2 Vl,V2 E {true,false} (23) ~eq (Vl,V2) ~true
~eq (v11,v21)~false (24) ~eq ((Vll,V12),(v21,v22)) ~ false
Vl ~V2 Vl,V2 E {true,false} (25) ~-eq (Vl, V2) ~Lfalse
~eq (VII,V21) ~ t r u e ~eq (V12,V22)~V (26) ~eq ((V11,V12),(V21,V22))'~ v
The dynamic semantics includes rules for the polymorphic equality primitive (rules 23 through 26). If the equality primitive is only ever applied to values of type bool (if rules 24 and 26 are not used), the primitive need not distinguish booleans from values of pair type and no runtime tagging is required.
6
Equality
Elimination
In this section, we present inference rules for translating typable expressions into typable expressions for which the equality primitive is used only with instance bool. The translation is the identity for typable expressions that do not use polymorphic equality. A translation environment, E, is a finite map from equality type variables to lambda variables. We occasionally need to construct a function for checking structural equality on a pair of values of the same type. We define a relation that allows inferences among sentences of the form E ~-eq T $ e, where E is a translation environment, ~- is a type, and e is an expression. Sentences of this form are read "under the assumptions E, e is an equality function for values of type T."
Equality Function Construction
E(c) = x
E -eq ~ ~ x
E t-eq r ::~ e I
(27)
E ~-eq T1 ~ el e = e2
E ~-eq bool ~ eqboo I E ~-eq T2 ~ e2 x),
2
(2s)
x fresh
x))
e' = i f el (Trl (Trl x), 71"1 (71"2 x)) t h e n e e l s e f a l s e E I%q T1 x ~'2 =~ Ax : (T1 X ~-2) X (~'l X :r2).e' (29)
143 Each rule for the translation of expressions allows inferences among sentences of the form E ~- e => e', where e and e' are expressions and E is a translation environment. Sentences of this form are read "under the assumptions E, e translates to e'."
Expressions
E F- e :=> e'
Ef-e~e' E F- Ax : T.e => AX : r.e' (30)
E t- el => e~
E ~- el ~ e~ E }- e2 => e~ (32) E }- (el, e2) =~ (e~, e~)
it =
E F- e => e' E ~- r i e => lri e'
n > 0
E }-eq
Ti ----~ e i
i :
E ~-eq r => e
1..n
E ~- x~, ~ ( . . . ( x i , e l ) - - ' e , )
E t- e2 => e~
E ~- el e2 ~ e~ e~
(34)
E f- eqr =~ e
.(31)
(33)
(35)
ff -- Vgl 9"" gn~.T Yl "'" yn fresh n > O E + {~1 ~-+ Yl,--., ~n ~-+ Yn) t- el => e~ r~ = e i x ei--~ bool i = l . . n e'l' = Ayl : T 1 . ' ' ' .Ay,~ : T,~.e'l E ~ e2 :::> e'2 a I = V61 " ' ' 6 n ~ . T 1
El-letx:a=el
E t- true =~ true
--+ " ' " -"> T n ~
i n e 2 :=>let x : a '
(37)
T
=e~' i n e ~
(36)
E F false ~ false (38)
Et-e=~e' E } - e l =~e~ Et-e2~e~ E t- i f e t h e n el e l s e e2 :=> i f e' t h e n e~ e l s e e~ (39)
In the translation rule for the let-construct, we generate abstractions for equality functions for each bound equality type variable in the type scheme for the letbound variable. Accordingly, in the rule for variable occurrences, appropriate equality functions are applied according to type instances for abstracted equality type variables. In rule 35, we generate a function for checking equality of values of type T.
144 7
Type
Correctness
In this section, we demonstrate that the translation preserves types and that all typable expressions may be translated. T y p e Preservation
7.1
We first give a few definitions for relating type environments and translation environments. D e f i n i t i o n 1. ( E x t e n s i o n ) A type s c h e m e rr = V e l ' " e n ~ . T
extends a n o t h e r type s c h e m e 0' -~ Ve~ 9 . . e m' a "~ . T', w r i t t e n ~ ~- a ' , i f n = m a n d e i -: Q~, i = 1..n and ~ = ~' and T = (el X el -4 bool) - 4 . . . -4 (en • en - 4 bool) - 4 7'. A type e n v i r o n m e n t T E ' extends a n o t h e r type e n v i r o n m e n t T E , w r i t t e n T E ' >- T E , i f D o m ( T E ' ) _D D o m ( T E ) and T E ' ( x ) ~- T E ( x ) f o r all x 9
Dom(TE).
Definition 2. (Environment Matching)
A
translation
environment
E
matches a type e n v i r o n m e n t T E , w r i t t e n E E_ T E , if T E ( E ( e ) ) = e x e - 4 bool f o r all ~ E Dom(E). The following proposition states that the equality function generated for a specific type that admits equality has the expected type. P r o p o s i t i o n 1. I f E ~-eq T ~
e a n d E E_ T E and A F- 7- t h e n A , T E ~- e :
T • T - 4 bool. Proof. By induction over the structure of ~-.
We can now state a proposition saying that the translation preserves types.
Proposition 2. ( T y p e Preservation)
I f A , T E }- e : T a n d E ~- e ~
e I and
E E T E I >- T E t h e n A , T E I F- e I : ~-. Proof. By induction over the structure of e. We show the three interesting cases.
CASE e ---- Xil From (5), we have T E ( x ) = a and a = Vcl ""enc~.~" and a > ~- via S and A }- il and A, TE b xil : ~-, where il = i l ( S ) . Prom (34), we have that il = (T1,..., Tn, ...) and E }-eq Ti ~ ei, i = 1..n and E ~- xis =~ ('''(,Til el)...en). Because A F il, we have A F Ti, i = 1..n and because E E_ T E t follows from assumptions, we have by Proposition 1 that A, T E ' F ei : Ti • Ti -'4 bool, i=l..n.
Because T E I >- T E follows from assumptions, we have T E ' ( x ) = a', where O'l ~--- V e l . - . e n ~ . ( e l X e l -"} bool) -~ . . . - 4 (~n x En - 4 bool).T ~, and because > T via S and S ( ~ i ) = Ti, i = 1..n follows from i l ( S ) = (0:1,..., Tn, ...), we have a' > 7" via S, where r tt = (T1 X 71 - 4 bool) - 4 .." --+ (Tn x v,~ - 4 bool) - 4 T. Because A ~- il, we can now apply (5) to get A, T E I ~- xi~ : v '~.
145 Now, because A, T E ~ ~ ei : ri • vi --~ bool, i = 1..n, we can apply (2) n times to get A , T E ' F- ( . . . (x~l e z ) " " en) : T, as required. CASE e = l e t x : a = e l • From (6), we have a = V e l " - c n ~ . T and f t v ( ~ l . . - ~ n ~ ) A ftv(A, TE) = ~ and A U { ~ I , . . . , c n } , T E K el : T and A , T E + { x ~ a } ~- e2 : 7-t a n d A , T E K e : T ~. Further, from (36), we have Yl "'" yn fresh and E + {~1 ~-~ Y l , - . . , Cn ~-+ Y n } ~el ~ e~ and ri = r x ei --~ bool, i = 1..n and e~t = Ayl : Vl.'--.Ayn : Tn.ell and E F- e2 =~ e~ and a t = V ~ l " " e n ~ . T 1 --+ " " --+ Tn --+ T and E t- e :=# l e t x : a' = e~' i n e~. It now follows from assumptions and from the definitions of extension and matching t h a t E + {el ~ y l , . . . , e n ~ y n } U T E " >- T E , where T E " = T E ' + { y l ~-~ T1, . . . , Yn ~-~ Tn }, because D o m ( T E ' ) A {Yl,... Yn} = 0 and D o m ( E ) A {r en} = 0 can be assumed by a p p r o p r i a t e renaming of bound type variables of a. We can now apply induction to get A U { e l , . . . , en}, T E " e~ : T. By applying (1) n times, we get A tA { r en}, T E ' F- e~lt : T1 --~ " " --~ T n --~ T.
To apply induction the second time, we observe t h a t E E T E t + { x ~-~ a ~} >a } by assumptions and definitions of matching and extension and because D o m ( T E t) A {x} = 0 can be assumed by appropriate renaming of x in e. By induction, we have A, T E ' + { x ~ a t } K e~2 : T ~. Because we can assume f t v ( c l . . , end) A f t v ( T E ' ) = 0 by appropriate renaming of bound equality type variables and type variables in a, we can apply (6) to get A, T E t F- l e t x : a t = e~t i n e~ : 7 ~, as required. TE + {x ~
CASE e
=
eqr
From (10), we have A F- T and T admits equality and A, T E K
eqr : v • T --~ bool. From (35), we have E ~-eq T => e ~ and E K e % =~ e t. By assumptions, we have E U T E t and because A }- T, we can apply Proposition 1 to get A, T E t F- e t : T • ~- --~ bool, as required.
7.2
Typable Expressions are Translatable
We now demonstrate t h a t all typable expressions m a y indeed be translated by the translation rules. The following proposition states t h a t for a type t h a t admits equality it is possible to construct a function t h a t checks for equality on pairs of values of this type.
Proposition 3. I f T a d m i t s equality and fetv(T) C D o m ( E ) t h e n there exists an expression e such that E }-eq T ~ e. Proof. By induction over the structure of T.
T h e following proposition states t h a t all typable expressions m a y be translated.
Proposition 4. (Typable Expressions are Translatable) I f A , T E K e : T and A = D o m ( E ) a n d D o m ( T E ) A R a n ( E ) = 0 t h e n there exists e' s u c h that EF-e~e t.
146 Proof. By induction over the structure of e. We show the three interesting cases.
~
From (5), we have T E ( x ) = a a n d a >_ r via S and A I- i l ( S ) ~l(s) : T. Let i l ( S ) be written as (T1,..., Tn, ...). Because A ~- i l ( S ) , we have A ~- ri, i = 1..n, hence, fetv(Ti) C Dora(E), i = 1..n. Further, from the definition of substitution, we have vi admits equality, i = 1..n. We can now apply Proposition 3 to get, there exists an expression ei such t h a t E ~-eq Ti ~ ei, i = 1..n. By applying (34), we have E ~- xil(s) =~ ( " " (xil(s) 6 1 ) . . . en), as required. From (6), we have a = Vg~.T and ftv(g*(~) A ftv(A, T E ) = ~ and A U f e t v ( ~ , T E ~ 61 : V and A, T E + {x ~ a} t- e2 : T ' and A , T E F e : T I. Write g* as 61 "-" 6n and let yl "'" Yn be fresh. Further, let E ' = E + {61 ~'+ y l , . . . , e n ~-+ y~}. By assumptions, we have A U f e t v ( ~ = D o m ( E ' ) and D o m ( T E ) N R a n ( E ' ) = 0. We can now apply induction to get, there exists an expression e~ such t h a t E ' F- 61 =~ e~. Also, let e~' = Ayl : ~ 2 . " " .Ayn : Tn.e~, where Ti = 6i X Ci -+ bool, i = 1..n. By assumptions and by appropriate renaming of x in e, we have D o m ( T E + {x ~ a}) n R a n ( E ) = 0, hence, we can apply induction to get, there exists e~ such t h a t E ~- e2 =~ e~. Letting a ' = Vgd.T1 --+ -.. -+ ~'n ~ v, we can apply (36) to get E }- e ~ l e t x : a ' = e~' i n e~, as required. CASE
e ---- l e t x : a = el i n e2
CASE e =
eq~_ From (10), we have A }- T and T admits equality and A, T E ~-
eq~ : T X T --+ bool. Because A = D o m ( E ) follows from assumptions and A F- T, we have fetv(7-) C_ D o m ( E ) , hence, from Proposition 3, we have, there exists an expression e' such t h a t E F-eq T ~ e'. From (35), we now have E ~- e =~ e', as required.
8
Semantic
Soundness
In this section, we demonstrate semantic soundness of the translation inspired by other proofs of semantic soundness of type systems 9, 22. Because equality functions are represented differently in the original p r o g r a m and the translated program, the operational semantics m a y assign different values to them. For this reason, we define a notion of s e m a n t i c equivalence between values corresponding to the original p r o g r a m and values corresponding to the translated program. We write it F ~ v : T ~ V'. T h e t y p e is needed to correctly interpret the values and to ensure well-foundedness of the definition. T h e environment F is formally a pair (T'e, F a) providing interpretations of equality type variables and ordinary type variables in ~'. Interpretations are n o n - e m p t y sets ) of pairs (Vl,V2) of values. We often abbreviate projections f r o m / " and injections in F. For instance, when F = ( F ~, F~), we w r i t e / ' ( e ) to m e a n F~(c) and F + {c~ ~-~ V} to mean ( F 6, F ~ + {(~ ~ ~))), for any 6, a, and V. - F ~ t r u e : bool ~ t r u e
147 1" ~ false : bool ~ false
-
!
!
-- 1" ~ ( V l , V2): 7"1 X T2 ~, (Vl, V2) iff 1" ~ Vl : 7"1 ~ V~ and 1" ~ v2:7"2 ~ v~ - 1" ~ e q : bool x bool --+ bool ,~ e q
- F ~ eq : T X T --~ bool ~ clos(Ax.e, g) iff for all values Vl, v2, v~ such that 1"~Vl:TXT~V~ and ~-eqVl~V2, w e h a v e g + { x ~ v ~ } ~ - e l ~ v 2 - 1" ~ c l o s ( A x . e , g ) : r~ --+ T2 ~ c l o s ( A x . e ' , g ' ) iff for all values v~, v2, v~ such that 1" ~ v~ : T1 ~ V~ and g + {x ~-~ Vl} ~- e ~ v2, there exists a value v~ such that g' + {x ~ v~ ) }- e' lI v~ and 1" ~ v2 : 7"2 ~ v~ - 1"~v:~v'iff(v,v') 9 - 1" ~ v : r ~ v' iff (v,v') 9 1"(~) T h e semantic equivalence relation extends to type schemes and environments: -- 1" ~
V : VOll'''Oln.T
~
V ! iff for all
interpretations ;l ~ . . . l ; ~ , we h a v e
~ c l o s ( I m . . . . Ayn.e,g) iff for all interpretations 12~-.. 12~12~.-. lyre, values Vl.. "Vn and semantic environments 1"', such that 1"' ~ eq : ei x ei --+ bool ~ vi, i = 1..n and 1"' = 1" + {ex ~-~ ~ ) , " ' , en ~ ;~, c~1 ~ 1 ) ~ , ' " , c~m ~ 12~}, we have there exists a value v' such that 1"' ~ v : r ~-, v ~ and g + {Yl ~ v l , . . . , yn ~ v n } i- e ~ v' - 1" ~ g : T E "~E g' iff Dora(g) = D o m ( T E ) and Dom(g) C_ Dom(g') and for all x 9 Dom(s we have 1" ~ s : T E ( x ) ~-, s Further, for all e 9 Dom(E) we have 1" ~ e q : e x e --+ bool ,~ g ' ( E ( e ) ) - 1" ~ v : V e l . . . e n C ~ l . . . a ~ . r
The following proposition states that a generated equality function for a given type has the expected semantics. We leave elimination of type information from typed expressions implicit. P r o p o s i t i o n 5. If E ~'eq T : ~ e and f o r all ~ 6 D o m ( E ) we have F ~ eq : -+ bool ~ E(E(e)) then there exists a value v such that g }- e ~ v and F~eq:rxr-+bool,,~v.
e xe
Proof. By induction over the structure of r.
The semantic equivalence relation is closed with respect to substitution. P r o p o s i t i o n 6. L e t S be a substitution ({61 I--ff T 1 , . . . , 6 n ~ T n } , {Ol1 T ~ , ' - - , a m ~ ~-~}). Define 1)~ = {(v,v') F ~ v : T, ~ v'}, i = 1..n and v? = {(v,v') I r :r' i = 1..m. T h e n F + {el ~ V { , . . . 6 n r
v :
~ V~,(*l ~ V ~ , " ' , a m
~ V ~ } ~ v : T ,~ v' i f f
v'.
Proof. By induction over the structure of T.
We can now state a semantic soundness proposition for the translation. Proposition
7. ( S e m a n t i c
Soundness)
If A, T E ~- e : r a n d E }- e :=~ e'
and 1" ~ s : T E ~-'E g' and g t- e ~ v then there exists a value v' such that s e I lI v I and F ~ v : T ~, VI.
148 Proof. B y induction over the s t r u c t u r e of e. We show t h e three interesting cases.
I CASE e =
x~z, il = ( r l , . . - ,
T~, T I , - ' - , Tin), n _> 1
'
I
P r o m assumptions, (11),
(5), the definition of semantic equivalence, a n d the definition of instantiation, we h a v e / " ~ v : a ~ v" and v" = $ ' ( x ) a n d a = V e l . . - e ~ c q - - . C~m.T' a n d T E ( x ) = a and S = ({el ~ n , ' " , e ~ ~ Tn},{al ~ T~,''',am ~ r ~ } ) . Because n _> 1, we have v" = c l o s ( ~ y l . . . . 9 ~-Y n . e t , g"~), for some l a m b d a variables Yl "'" yn, expression e ~, and d y n a m i c environment $ " . Prom a s s u m p t i o n s and (34), we have F ~ $ : T E "~E $~ and E F-eq Ti ==~ ei, i = 1..n, hence, we can apply P r o p o s i t i o n 5 n times to get, there exist values vi, i = 1..n such t h a t F ~ eq : Ti X 7-i -+ bool ~ vi a n d C ~ F- ei ~ Vi, i = 1..n. Letting 2 = { ( v , v ' ) F ~ v : 7"i ~ v'}, i = 1..n a n d Y~ = { ( v , v ' ) l F ~ v : T ~V'},i= 1..m a n d F ' = F + { e l ~-+ 1, "',e,~ ~+ 2~,cq ~-+ V ~ , . . . , a m ~-+ 2~}, we can a p p l y P r o p o s i t i o n 6 to get F ~ ~ eq : Ei • ei --> bool ~ vi, i = 1..n. From the definition of semantic equivalence, we now have, there exists a value v ~ such t h a t F ~ ~ v : T ~ ~ V~ and C" + {Yl ~-~ v x , ' " , y n ~-r v n } Fe' ~ v t. Now, because v " = $ ' ( x ) and E' }- ei ~ vi, i = 1..n, we can derive $ ' f- ( . . . (x e l ) . . , en) ~ v' from (13), (11), and (12)9 B y a p p l y i n g P r o p o s i t i o n 6 again, we get F ~ v : r ~-. v ~, as required 9 I CASE e = eqr,
p r o m assumptions, (17), (35), and the definition of semantic
equivalence, we have from P r o p o s i t i o n 5 t h a t there exists a value v ~ such t h a t ~ F- e ~ ~ v I a n d F ~ eq : T ~ • r I --+ bool .~. v ~, as required 9 CASE e = l e t x : ~r = e I i n e2, a 9
= ~1
**
n > 1 I Write ~ in the form
*~n~ 9
Ot
a~ . . . am. Let V . 9 ~ V ~ 9 9 Y~ be interpretations, let v~ q . . . v,~q be values, and let F ' be a semantic environment such t h a t F ~ = F + {ca ~ ~ ) , ' " , e n ~-~ )ne, a~ ~ V ~ , ' " , a m ~-+ Vr~} a n d F ' ~ e q : ~i • r ~ bool ,~ v~~q , i = 1..n. p r o m assumptions and from (36), we have yl Yn are chosen fresh a n d E ~ ~e~ =~ e and e~~ = Ay~ : TX.--..Ay~ : Tn.e~l, where ri = ei • ei -~ bool, i = 1..n and E ~ = E + {el ~-> y ~ , - " , e n ~-4 Yn). p r o m the definition of semantic equivalence, we can now establish F ~ ~ e q : e • e --+ bool ~ g " ( E ' ( e ) ) , for all e ~ D o m ( E ' ) , where g " = S ~ + {Yx ~-> v~q, " ' ' , y n ~ veq}, and hence _F~ ~ $ : T E .~E' s P r o m assumptions, (6), and (22), we have A U fetv(e~ . . . e n ) , T E ~ e~ : T and l ~- e~ ~ v~ a n d because we have E ~ ~- e~ =~ e~, we can a p p l y induction t o get, there exists a value v~ such t h a t $" }- e 3) v~ and F ' ~ vx : T ~ V~. Letting v'~' = clos(Ay~. 9 9.Ay~.e~, g), we have from the definition of semantic equivalence t h a t /" ~ v~ : a ~ v~~ and F ~ s ~ Vx} : T E + { x a} ~ E $ ' + {x ~ v~). From assumptions and from (22), (6), and (36), we have s and A, T E + { x ~ a } ~ - e ~ : T ' a n d $ l - e 2 = ~ e ~ , h e n c e , we can apply induction a second time to get, there exists a value v~ such t h a t " ' "
+ {x
e
and r
We can now apply (12) to get $ ' ~- e~' $~ }- e' ~ v~, as required.
~ v~',
hence, we can apply (22) to get
We now r e t u r n to the value restriction issue9 T h e t r a n s l a t i o n rule for the let-construct does not preserve semantics unless (1) e~ is k n o w n to t e r m i n a t e
149 and not to have side effects or (2) no equality type variables are generalised. In the language we consider, (1) is always satisfied. For Standard ML, the value restriction always enforces either (1) or (2). However, the restriction is enforced by limiting generalisation to so called non-expansive expressions, which include function applications. Adding such a requirement to the typing rule for the letconstruct makes too few programs typable; to demonstrate type correctness for the translation, applications of functions to generated equality functions must also be considered non-expansive. 9
Extension
to Full ML
It is straightforward to extend equality elimination to allow imperative features and to allow a letrec-construct for declaration of recursive functions. We now demonstrate how the approach is extended to deal with parametric datatypes and modules.
9.1
D a t a t y p e Declarations
In Standard ML, lists may be implemented by a datatype declaration d a t a t y p e ~ list = : : o f a • ~ list I nil Because lists are declared to be parametric in the type of the elements, it is possible to write polymorphic functions to manipulate the elements of any list. In general datatype declarations may be parametric in any number of type variables and they may even be declared mutually recursive with other datatype declarations. The datatype declaration for lists elaborates to the type environment {list ~
(t, { : : ~ Vc~.c~ x c~ t --+ ~ t, nit ~ V a . ~
t})}
where t is a fresh type name 11. Every type name t possess a boolean attribute that denotes whether t admits equality. In the example, t will indeed be inferred to admit equality. This property of the type name t allows values of type T t to be checked for equality if T admits equality. When a datatype declaration elaborates to a type environment, an equality function is generated for every fresh type name t in the type environment such that t admits equality. For a parametric datatype declaration, such as the list datatype declaration, the generated equality function is parametric in equality functions for parameters of the datatype. The Kit does not allow all valid ML programs to be compiled using equality elimination. Consider the datatype declaration d a t a t y p e c~ t = A of (a • a) t I B of a Datatypes of the above form are called n o n - u n i f o r m datatypes 15, page 86. It is possible to declare non-uniform datatypes in ML, but they are of limited
150 use, because ML does not support polymorphic recursive functions. In particular, it is not possible to declare a function in ML that checks values of nonuniform datatypes for structural equality. However, the problem is not inherent to equality elimination. Adding support for polymorphic recursion in the intermediate language would solve the problem. Other compilation techniques also have troubles dealing with non-uniform datatypes. The TIL compiler developed at Carnegie Mellon University does not support non-uniform datatypes due to problems with compiling constructors of such datatypes in the framework of intensional polymorphism 12, page 166. 0.2
Modules
The translation extends to Standard ML Modules 11. However, to compile functors separately, structures must contain equality functions for each type name that admits equality and that occurs free in the structure. Moreover, when constraining a structure to a signature, it is necessary to enforce the implementation of a function to follow its type by generating appropriate stub code. The body of a functor may then uniformly extract equality functions from the formal argument structure.
10
Implementation
The Kit compiles the Standard ML Core language by first elaborating and translating programs into an intermediate typed lambda language. At this point, polymorphic equality is eliminated. Then, a simple optimiser performs various optimisations inspired by 1 and small recursive functions are specialised as suggested in 17. The remaining phases of the compiler are based on region inference 24. Each value generated by the program resides in a region and region inference is the task of determining when to allocate and deallocate regions. Various analyses determine how to represent different regions at runtime 3. Some regions can be determined to only ever contain word-sized unboxed values, such as integers and booleans. Such regions need never be allocated. Other regions can be determined to only ever hold one value at runtime. Such regions may be implemented on the stack. Other regions are implemented using a stack of linked pages. The backend of the Kit implements a simple graph coloring technique for register allocation and emits code for the HP PA-RISC architecture 5.
10.1
Datatype Representation
The Kit supports different schemes for representing datatypes at runtime. The simplest scheme implements all constructed values (except integers and booleans) as boxed objects at runtime. Using this scheme, the list 1,2, for instance, is represented as shown in Fig. 1.
151
Jl Fig. 1. Boxed representation of the list 1, 2 with untagged integers. The Standard ML of New Jersey compiler version 110 (SML/NJ) implements lists as shown in Fig. 2, using the observation that pointers are four-aligned on most modern architectures 1. In this way, the two least significant bits of pointers to constructed values may be used to represent the constructor. However, because SML/NJ implements polymorphic equality and garbage collection by following pointers, only one bit remains to distinguish constructed values.
tta 1.
2. Inilp
Fig. 2. Unboxed representation of the list 1, 2 with tagged tuples and tagged integers. Utilising the two least significant bits of pointers to constructed values, we say that a type name associated with a datatype declaration is unboxed if the datatype binding declares at-most three unary constructors (and any number of nullary constructors) and for all argument types T for a unary constructor, T is not a type variable and T is not unboxed (for recursion, we initially assume that the declared type names of the declaration are unboxed.) A type I- is unboxed if it is on the form (T1,..., Tn) t and t is unboxed. The Kit treats all values of unboxed types as word-sized unboxed objects. Using this scheme, lists are represented uniformly at runtime as shown in Fig. 3. Efficient unboxed representations of many tree structures are also obtained using this scheme. In the context of separate compilation of functors, as implemented in Standard ML of New Jersey, version 0.93, problems arise when a unique representation of datatypes is not used 2. If instead functors are specialised for each application, no restrictions are enforced on datatype representations and no representation overhead is introduced by programming with Modules. Current research addresses this idea. 11
Experimental
Results
In this section, we present some experimental results obtained with the Kit and the Standard ML of New Jersey compiler version 110 (SML/NJ). The purpose of the experiments are (1) to assess the feasibility of eliminating polymorphic
152
rl
fnilr
Fig. 3. Unboxed representation of the list 1, 2 with untagged tuples and untagged integers. equality, (2) to assess the importance of efficient datatype representations, and (3) to compare the code generated by the Kit with that generated by SML/NJ. All tests are run on a HP PA-RISC 9000s700 computer. For SML/NJ, executables are generated using the expor t Fn built-in function. We use KitT to mean the Kit with a tagging approach to implement polymorphic equality. Further, KitE is the Kit with equality elimination enabled. In KitE, tagging of values is disabled as no operations need tags at runtime. Finally, KitEE is KitE with efficient representation of datatypes enabled. All versions of the Kit generate efficient equality checks for values that are known to be of base type (e.g., int or real). Measurements are shown for eight benchmark programs. Four of these are non-trivial programs based on the SML/NJ distribution benchmarks ( l i f e , mandelbrot, k n u t h - b e n d i x and simple). The program f i b 3 5 is the simple Fibonacci program and m e r g e s o r t is a program for sorting 200,000 pseudo-random integers. The programs l i f e and k n u t h - b e n d i x use polymorphic equality extensively. The program l i f e m is a monomorphic version of l i f e for which polymorphic functions are made monomorphic by insertion of type constraints. The program s i e v e computes all prime numbers in the range from 1 to 2000, using the Sieve of Eratosthenes. Running times for all benchmarks are shown in Fig. 4. Equality elimination, and thus, elimination of tags, appears to have a positive effect on the running time for most programs. In particular, the l i f e benchmark runs 48 percent faster under KitE than under KitT. However, programs do exist for which equality elimination has a negative effect on the running time of the program. There are potentially two reasons for a slowdown. First, extra function parameters to equality polymorphic functions may lead to less efficient programs. Second, functions generated by K i t e and KitEE for checking two structural values for equality do not check if the values are located on the same address. This check is performed by the polymorphic equality primitive of KitT. In principle, such a check could also be performed by equality functions generated by K i t e and KitEE. The k n u t h - b e n d i x benchmark runs slightly slower under KitE than under KitT. Not surprisingly, efficient representation of datatypes improves the running time of most programs - with up to 40 percent for the s i e v e benchmark. The Kit does not implement the minimum typing derivation technique for decreasing the degree of polymorphism 4. Decreasing the degree of polymorphism has been reported to have a great effect on performance; it makes it possible to transform slow polymorphic equality tests into fast monomorphic ones 19, 18. Due to the decrease in polymorphism, the l i f e m benchmark is 12 percent faster than the l i f e benchmark (under KitEE).
153 time
fibS5
10.9 9.01 life 35.4 lifem 35.2 mergesort 12.9 mandelbrot 35.4 knurh-bendix 26.4 simple 47.1 sieve
10.2 6.18 18.5 16.2 11.9 32.3 26.7 40.7
10.1 3.71 18.1 16.0 9.25 31.9 23.3 40.6
18.5 9.24 5.28 5.25 15.9 7.17
17.7 15.5
Fig. 4. Running times in seconds for code generated by three versions of the Kit and SML/NJ, measured using the UNIX time program. Space usage for the different benchmarks is shown in Fig. 5. No b e n c h m a r k p r o g r a m uses more space due to elimination of equality. For p r o g r a m s allocating a large amount of memory, equality elimination, and thus, elimination of tags, reduces m e m o r y significantly - with up to 31 percent for the s i m p l e program. Efficient d a t a t y p e representation reduces space usage further up to 33 percent for the m e r g e s o r t program.
Space usage
I KitTI
KitEIKitEESHL/NJII 108 108 108 1,380 sieve 1,248 1,052 736 6,180 life 428 376 272 1,408 lifem 428 376 272 1,420 mergesort 16,000 13,000 8,728 18,000 mandelbrot 304 296 296 712 Imuth-bendix 4,280 3,620 2,568 2,724 simple 1,388 960 748 2,396 fib35
Fig. 5. Space used for code generated by the three versions of the Kit and SML/NJ. All numbers are in kilobytes and indicate maximum resident memory used, measured using the UNIX top program. Sizes of executables for all benchmarks are shown in Fig. 6. Equality elimination does not seem to have a dramatic effect on the sizes of the executables. Efficient d a t a t y p e representation reduces sizes of executables with up to 22 percent for the l i f e benchmark. The Kit and S M L / N J are two very different compilers. There can be dramatic differences between using region inference and reference tracing garbage collection, thus, the numbers presented here should be read with caution. The Kit currently only allows an argument to a function to be passed in one register. Moreover, the Kit does not allocate floating point numbers in registers. Instead, floating point numbers are always boxed. The b e n c h m a r k programs mandelbrot and s i m p l e use floating point operations extensively. No doubt, efficient calling
154 Program size KitTKitEKitEESML/NJJ fib35
mergesort mandelbrot 8 12 knuth-bendix 160 168
72 72 20
0 29 17 17 40
12 140
37 71
simple
328
199
sieve
life lifem
0 16 92 92i 20
0 20 92 92 24
356 352
106
Fig. 6. Sizes of executables (with the size of the empty program subtracted) for code generated by three versions of the Kit and SML/NJ. All numbers are in kilobytes. conventions and register allocation of floating point numbers will improve the quality of the code generated by the Kit.
12
Conclusion
The translation suggested in this paper makes it possible to eliminate polymorphic equality completely in the front-end of a compiler. Experimental results show that equality elimination can lead to important space and time savings even for programs that use polymorphic equality. Although tags may be needed at runtime to implement reference tracing garbage collection, it is attractive to eliminate polymorphic equality at an early stage during compilation. Various optimisations, such as boxing analysis 9, 7, must otherwise treat polymorphic equality distinct from other primitive operations. Checking two arbitrary values for equality may cause both values to be traversed to any depth. This is in contrast to how other polymorphic functions behave. Further, no special demands are placed on the implementor of the runtime system and the backend of the compiler. For instance, there is no need to flush all values represented in registers into the heap prior to testing two values for equality. Acknowledgements. I would like to thank Lars Birkedal, Niels Hallenberg, Fritz Henglein, Tommy Hcjfeld Olesen, Peter Sestoft, and Mads Tofte for valuable comments and suggestions.
References 1. Andrew Appel. Compiling With Continuations. Cambridge University Press, 1992. 2. Andrew Appel. A critique of Standard ML. In Journal of Functional Programming, pages 3(4):391-429, October 1993. 3. Lars Birkedal, Mads ToRe, and Magnus Vejlstrup. From region inference to von Neumann machines via region representation inference. In 23st ACM Symposium on Principles of Programming Languages, January 1996.
155 4. Nikolaj Bjorner. Minimal typing derivations. In ACM Workshop on Standard ML and its Applications, June 1994. 5. Martin Elsman and Niels Hallenberg. An optimizing backend for the ML Kit using a stack of regions. Student Project, July 1995. 6. Robert Harper and Chris Stone. An interpretation of Standard ML in type theory. Technical report, Carnegie Mellon University, June 1997. CMU-CS-97-147. 7. Fritz Henglein and Jesper J0rgensen. Formally optimal boxing. In 21st ACM Symposium on Principles of Programming Languages, pages 213-226, January 1994. 8. Mark Jones. Dictionary-free overloading by partial evaluation. In ACM Workshop on Partial Evaluation and Semantics-Based Program Manipulation, Orlando, Florida, June 1994. 9. Xavier Leroy. Unboxed objects and polymorphic typing. In 19th ACM Symposium on Principles of Programming Languages, pages 177-188, 1992. 10. Xavier Leroy. The Objective Carol system. Software and documentation available on the Web, 1996. 11. Robin Milner, Mads Torte, Robert Harper, and David MacQueen. The Definition of Standard ML (Revised). MIT Press, 1997. 12. Greg Morrisett. Compiling with Types. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, December 1995. 13. Martin Odersky, Philip Wadler, and Martin Wehr. A second look at overloading. In 7'th International Conference on Functional Programming and Computer Architecture, June 1995. 14. Atsushi Ohori. A Polymorphic Record Calculus and its Compilation. ACM Transactions on Programming Languages and Systems, 17(6), November 1995. 15. Chris Okasaki. Purely Functional Data Structures. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, September 1996. 16. John Peterson and Mark Jones. Implementing type classes. In ACM Symposium on Programming Language Design and Implementation, June 1993. 17. Manuel Serrano and Pierre Weis. Bigloo: a portable and optimizing compiler for strict functional languages. In Second International Symposium on Static Analysis, pages 366-381, September 1995. 18. Zhong Shao. Typed common intermediate format. In 1997 USENIX Conference on Domain-Specific Languages, Santa Barbara, CA, Oct 1997. 19. Zhong Shao and Andrew Appel. A type-based compiler for Standard ML. Technical report, Yale University and Princeton University, November 1994. 20. David Tarditi, Greg Morrisett, Perry Cheng, Chris Stone, Robert Harper, and Peter Lee. TIL: A type-directed optimizing compiler for ML. In ACM Symposium on Programming Language Design and Implementation, 1996. 21. David Tarditi, Greg Morrisett, Perry Cheng, Chris Stone, Robert Harper, and Peter Lee. The TIL/ML compiler: Performance and safety through types. In Workshop on Compiler Support for Systems Software, 1996. 22. Mads Torte. Type inference for polymorphic references. Information and Computation, 89(1), November 1990. 23. Mads Tofte, Lars Birkedal, Martin Elsman, Niels Hallenberg, Tommy H0jfeld Olesen, Peter Sestoft, and Peter Bertelsen. Programming with regions in the ML Kit. Technical report, Department of Computer Science, University of Copenhagen, April 1997. 24. Mads Tofte and Jean-Pierre Talpin. Region-based memory management. Information and Computation, 132(2):109-176, 1997. 25. Philip Wadler and Stephen Blott. How to make ad-hoc polymorphism less ad hoc. In 16th ACM Symposium on Principles of Programming Languages, January 1989.
Optimal Type Lifting* Bratin Saha and Zhong Shao Dept. of Computer Science Yale University New Haven, CT 06520-8285 {saha, shao}@cs, yale. e d u
A b s t r a c t . Modern compilers for ML-like polymorphic languages have used explicit run-time type passing to support advanced optimizations such as intensional type analysis, representation analysis and tagless garbage collection. Unfortunately, maintaining type information at run time can incur a large overhead to the time and space usage of a program. In this paper, we present an optimal type-lifting algorithm that lifts all type applications in a program to the top level. Our algorithm eliminates all run-time type constructions within any core-language functions. In fact, it guarantees that the number of types built at run time is strictly a static constant. We present our algorithm as a type-preserving source-to-source transformation and show how to extend it to handle the entire SML'97 with higher-order modules.
1
Introduction
Modern compilers for ML-like polymorphic languages 16,17 usually use variants of the Girard-Reynolds polymorphic A-calculus 5, 26 as their intermediate language (IL). Implementation of these ILs often involves passing types explicitly as p a r a m e t e r s 32, 31, 28 at runtime: each polymorphic type variable gets instantiated to the actual type through run-time type application. Maintaining type information in this manner helps to ensure the correctness of a compiler. More importantly, it also enables m a n y interesting optimizations and applications. For example, both pretty-printing and debugging on polymorphic values require complete type information at runtime. Intensional type analysis 7, 31, 27, which is used by some compilers 31, 28 to support efficient d a t a representation, also requires the propagation of type information into the target code. Runtime type information is also crucial to the implementation of tag-less garbage collection 32, pickling, and type dynamic 15. * This research was sponsored in part by the DARPA ITO under the title "Software Evolution using HOT Language Technology", DARPA Order No. D888, issued under Contract No. F30602-96-2-0232, and in part by an NSF CAREER Award CCR9501624, and NSF Grant CCR-9633390. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the U.S. Government.
157
However, the advantages of runtime type passing do not come for free. Depending on the sophistication of the type representation, run-time type passing can add a significant overhead to the time and space usage of a program. For example, Tolmach 32 implemented a tag-free garbage collector via explicit type passing; he reported that the memory allocated for type information sometimes exceeded the memory saved by the tag-free approach. Clearly, it is desirable to optimize the run-time type passing in polymorphic code 18. In fact, a better goal would be to guarantee that explicit type passing never blows up the execution cost of a program. Consider the sample code below - we took some liberties with the syntax by using an explicitly typed variant of the Core-ML. Here A denotes type abstraction, A denotes value abstraction, xG denotes type application and x(e) denotes term application. pair = As. Ax:s*s. let f = At.Ay:t .... (x , y) in ,,. fs*e(x) ...
main
= Ac~. A a : ~ .
let doit = Ai:Int. let elem = Array.subG*c~ (a,i) in ... pair G (elem) ... loop = Anl :Int. An2 :Int. Ag :Int--~Unit. if nl <= n2 (g(nl) ; loop (nl + 1, n2, g) ) else () in loop (1,n,doit) Here, f is a polymorphic function defined inside function pair; it refers to the parameter x of pair so f cannot be easily lifted outside pair. Function main executes a loop: in each iteration~ it selects an element elem of the array a and then performs some computation (i.e, pair) on it. Executing the function dolt results in three type applications arising from the Array. sub function, pair, and f. In each iteration, sub and pair are applied to types G * G and G respectively. A clever compiler m a y do a loop-invariunt removal I to avoid the repeated type construction (e.g., G * G) and application (e.g., pairs). But optimizing type applications such as f Is*s is less obvious; f is nested inside pair, and its repeated type applications are not apparent in the doi~ function. W e m a y type-specialize f to get rid of the type application but in general this m a y lead to substantial code duplication. Every time dolt is called, pairs gets executed
and then every time p a i r is called, f s*s will be executed. Since loop calls d o l t repeatedly and each such call generates type applications of p a i r and f, we are forced to incur the overhead of repeated type construction and application. If the type representation is complicated, this is clearly expensive.
158 In this paper, we present an algorithm that minimizes the cost of run-time type passing. More specifically, the optimization eliminates all type application inside any core-language function - it guarantees that the amount of type information constructed at runtime is a static constant. This guarantee is important because it allows us to use more sophisticated representations for run-time types without having to worry about the run-time cost of doing so. The basic idea is as follows. We lift all polymorphic function definitions and type applications in a program to the "top" level. By top level, we mean "outside any core-language function." Intuitively, no type application is nested inside any function abstraction (A); they are nested only inside type abstractions (A). All type applications are now performed once and for all at the beginning of execution of each compilation unit. In essence, the code after our type lifting would perform all of its type applications at "link" time. 1 In fact, the number of type applications performed and the amount of type information constructed can be determined statically. This leads us to a natural question. Why do we restrict the transformation to type applications alone? Obviously the transformation could be carried out on value computations as well but what makes type computations more amenable to this transformation is the guarantee that all type applications can be lifted to the top level. Moreover, while the transformation is also intended to increase the runtime efficiency, a more important goal is to ensure that type passing in itself is not costly. This in turn will allow us to use a more sophisticated runtime type representation and make greater use of type information at runtime. We describe the algorithm in later sections and also prove that it is both typepreserving and semantically sound. We have implemented it in the F L I N T / M L compiler 28 and tested it on a few benchmarks. We provide the implementation results at the end of this paper.
2
The Lifting Algorithm for Core-ML
This section presents our optimal type lifting algorithm. We use an explicitly typed variant of the Core-ML calculus 6 (Figure 1) as the source and target languages. The type lifting algorithm (Figure. 2) is expressed as a type-directed program transformation that lifts all type applications to the top-level.
2.1
The language
We use an explicitly typed variant of the Core-ML calculus 6 as our source and target languages. The syntax is shown in Figure 1. The static and dynamic semantics are standard, and are given in the Appendix (Section 7). Here, terms e consist of identifiers (x), integer constants (i), function abstractions, function applications, and let expressions. We differentiate between 1 We axe not referring to "link time" in the traditional sense. Rather, we axe referring to the run time spent on module initialization and module linkage (e.g., functor application) in a ML-style module language.
159
(con~8)
~ ::= t I I n t I ~I ~ ~2
(tEr?7~8) E ::= i I X I .~x:~.e I ~XlX2 I let x = e in e' l let x =
A~.e.
in e f x ~
(vterms) e, ::= i I x I Ax:#.e I let x = ev in e~ I let x = A~.ev in e~ I x~-~ Fig. 1. An explicit Core-ML calculus
monomorphic and polymorphic let expressions in our language. We use ~ (and ~-) to denote a sequence of type variables tl, ..., tn (and types) so V~. # is equivalent to V t l . . . V t n . p . The v t e r m s (ev) denote values - terms t h a t are free of side-effects. There are several aspects of this calculus t h a t are worth noting. First, we restrict polymorphic definitions to value expressions (ev) only, so t h a t moving type applications and polymorphic definitions is semantically sound 33. Variables introduced by normal A-abstraction are always monomorphic, and polymorphic functions are introduced only by the let construct. In our calculus, type applications of polymorphic functions are never curried and therefore in the algorithm in Figure 2, the exp rule assumes t h a t the variable is monomorphic. The tapp rule also assumes that the type application is not curried and therefore the newly introduced variable v (bound to the lifted type application) is monomorphic and is not applied further to types. Finally, following SML 17, 16, polymorphic functions are not recursive. 2 This restriction is crucial to proving that all type applications can be lifted to the top level. T h r o u g h o u t the paper we take a few liberties with the syntax: we allow ourselves infix operators, multiple definitions in a single l e t expression to abbreviate a sequence of nested l e t expressions, and t e r m applications t h a t are at times not in A-Normal form 4. We also use indentation to indicate the nesting.
2.2
Informal description
Before we move on to the formal description of the algorithm, we will present the basic ideas informally. Define the depth of a t e r m in a p r o g r a m as the number of )~(value) abstractions within which it is nested. Consider the terms outside all value abstractions to be at depth zero. Obviously, terms at depth zero occur outside all loops in the program. In a strict language like ML, all these terms are evaluated once and for all at the beginning of program execution. To avoid repeated t y p e applications, the algorithm therefore tries to lift all of t h e m to depth zero. But since we want to lift type applications, we must also lift the polymorphic functions to depth zero. The algorithm scans the input p r o g r a m and collects all the type applications and polymorphic functions occuring at depths greater t h a n zero and adds t h e m to a list H. (In the algorithm given in Figure 2, the depth is implicitly 2 Our current calculus does not support recursive functions but they can be easily added. As in SML, recursive functions are always monomorphic.
160 assumed to be greater than zero). When the algorithm returns to the top level of the program, it dumps the expressions contained in the list.
We will illustrate the algorithm on the sample code given in Section 1. In the example code, f Is*s is at depth 1 since it occurs inside the Ax, A r r a y . sub a * a and p a i r c~ are at depth 2 since they occur inside the Aa and Ai. We want to lift all of these type applications to depth zero. Translating main first, the resulting code becomes -
pair
= As. Ax:s*s. let f = A t . A y : t . . . . (x , y) in ... fs*s (x) ...
main
= As. let Vl = A r r a y . s u b G , G v2 = pair G in A a : G . let
dolt
-- Ai:Int. let e l e m in
...
= Vl (a,i)
v2(elem)
...
loop = Anl :Int. An2 :Int. A E :Int-+Unit. if nl <= n2
(gCnl) ; loop (nl +i ,n2 ,g) ) () in loop(1,n,doit) else
We then lift the type application of f (inside p a i r ) . This requires us to lift f ' s definition by abstracting over its free variables. In the resulting code, all type applications occur at depth zero. Therefore when main is called at the beginning of execution, v l , v2 and v3 get evaluated. During execution, when the function l o o p runs through the array and repeatedly calls function d o i t , none of the type applications need to be performed - the type specialised functions v l , v2 and v3 can be used instead.
161 p a i r ffi As. let
f = At.Ax:s*s. Ay:t . . . .
(x , y)
v s ffi f s * s in
Ax:s*s
....
(v3(x))(x)
...
main = As. let vl = A r r a y . s u b ~ * G v2 = p a i r G in
Aa:~. let d o l t = A i :Int. let
elem
in
...
= vl(a,i) v2(elem)
...
loop = Anl :Int. An2 :Int. A g :I n t - + U n i t . if nl
(g(nl) ; loop (nl +1 ,n2 ,g) ) e l s e () in l o o p ( 1 , n , d o i t )
2.3
Formalization
Figure 2 shows the type-directed lifting algorithm. The translation is defined as a relation of the form F F- e : # :~ e'; H ; F , t h a t carries the meaning t h a t F F- e : # is a derivable typing in the input program, the translation of the input t e r m e is the t e r m e', and F is the set of free variables of e'. (The set F is restricted to the monorphically typed free variables of e') The header H contains the polymorphic functions and type applications occuring in e at depths greater t h a n zero. The final result of lifting a closed t e r m e of type # is L E T ( H , e') where the algorithm infers 0 ~- e : # :~ e'; H ; 0. The function L E T ( H , e) expands a list of bindings H = ( x l , e l ) , . . . , (xn,en) and a t e r m e into the resulting t e r m let zl = el in . . . i n let xn = e,~ in e. T h e environment F maps a variable to its type and to a list of the free variables in its definition. In the algorithm, we use standard notation for lists and operations on lists; in addition, the functions List and Set convert between lists and sets of variables using a canonical ordering. The functions A* and @* are defined so t h a t ) J L . e and @*vL reduce to AXl :#1 . . . . )~x,~ :#,.e and @(... (@vxl) ...)xn, respectively, where L = xl : # 1 , . - . , xn : #hi. Rules (exp) and (app) are just the identity transformations. Rule (fn) deals with abstractions. We translate the b o d y of the abstraction and return a header H containing all the type applications and type functions in the t e r m e. The translation of monomorphic l e t expressions is similar. We translate each of the subexpressions replacing the old terms with the translated t e r m s and return this as the result of the translation. The header H of the translation is the concatenation of the headers H1 and/-/2 from the translation of the subexpressions.
162
r(~) = (,, _) F k x : ~ =~ x;O;{x : #}
Fki
: Int=~ i;0;0
r(x~) = (~i,-) r(~l) = O, -+ ~ , - ) F F- @xix2 : #2 =v @xlx2;0;{xl : m ~ ~2,x2 : m }
(app)
Fx ~-~ (1~,_) }- e : #' =V e'; H; F F F- Ax : #.e : # -+ #' =~ Ax : I~.e'; H; F \ { x : Iz}
~)
!
F F- el : I~1 =~ e'I;H1;F1 Fx ~, (#1,-) l- e~ :/~2 =~ e2;H2;F2 F ~- let x = el in e2:~2 =~ let x = e~ in e'2;H1IH2;F1 U ( F 2 \ ( x : jui})
(~/~)
F } - e l :1~1 =~e~;H1;F1 L=List(F1) Fx ~ (V~.~u,, L) l- e2:/.,2 : ~ e~, " H 2 ; F2 /" F- let x
A ~ . e l i n e ~ : i z 2 ~ e 2 ; ( ix ,
A tm~ . L E T (
H
1,~*L.e~))::H2;F2
H~
(tapp)
F(x) ----(V~./~, L) v a fresh variable F I- x~-7: #ilti# =~ ~*vL;(v, x~-T~; Set(L) Y H.
Fig. 2. The Lifting Translation
The real work is done in the last two rules which deal with type expressions. In rule (tfn), we first translate the body of the polymorphic function definition. H1 now contains all the type expressions t h a t were in el and F1 is the free variables of e~. We then translate the body of the l e t expression(e2). T h e result of the translation is only e~; the polymorphic function introduced by the l e t is added to the result header H r so t h a t it is lifted to the top level. The polymorphic function body (in H r ) is closed by abstracting over its free variables F1 while the h e a d e r / / 1 is dumped right after the type abstractions. Note t h a t since H r will be lifted to the top level, the expressions in/-/1 will also get lifted to the top level. The (tapp) rule replaces the type application by an application of the newly introduced variable (v) to the free variables(L) of the corresponding function definition. The type application is added to the header and lifted to the top level where it gets bound to v. Note t h a t the free variables of the translated t e r m do not include the newly introduced variable v. This is because when the header is written out at the top level, the translated expression remains in the scope of the d u m p e d header.
163
Proposition 1. Suppose I" t- e : # ~ e'; H; F. Then in the expression LET(H,e~), the term e ~ does not contain any type application and H does not contain any type application nested inside a value(A) abstraction.
This propostion can be proved by a simple structural induction on the structure of the source term e. T h e o r e m 1 (Full Lifting). Suppose F ~- e : # ~ e~; H; F. Then the expression LET(H,e'), does not have any type application nested inside a value abstraction. The theorem follows from Proposition 1. In the Appendix, we prove the type preservation and the semantic soundness theorems. 2.4
A closer look
There are two transformations taking place simultaneously. One is the lifting of type applications and the other is the lifting of polymorphic function definitions. At first glance, the lifting of function definitions may seem similar to lambda lifting 10. However the lifting in the two cases is different. Lambda lifting converts a program with local function definitions into a program with global function definitions whereas the lifting shown here preserves the nesting structure of the program. The lifting of type applications is similar in spirit to the hoisting of loop invariant expressions outside a loop. It could be considered as a special case of a fully lazy transformation 9, 24 with the maximal free subexpressions restricted to be type applications. However, the fully-lazy transformation as described in Peyton Jones 24 will not lift all type applications to the top level. Specifically, type applications of a polymorphic function that is defined inside other functions will not be lifted to the top level. Minamide 18 uses a different approach to solve this problem. He lifts the construction of type parameters from within a polymorphic function to the call sites of the function. This lifting is recursively propagated to the call sites at the top level. At runtime, type construction is replaced by projection from type parameters. His method eliminates the runtime construction of types and replaces it by projection from type records. The transformation also does not rely on the value restriction for polymorphic definitions. However, he requires a more sophisticated type system to type-check his transformation; he uses a type system based on the qualified type system of Jones 12 and the implementation calculus for the compilation of polymorphic records of Ohori 21. Our algorithm on the other hand is a source-to-source transformation. Moreover, Minamide's algorithm deals only with the Core-ML calculus whereas we have implemented our algorithm on the entire SML'97 language with higher-order modules. Jones 11 has also worked on a similar problem related to dictionary passing in Haskell and Gofer. Type classes in these languages are implemented by passing
164 dictionaries at runtime. Dictionaries are tuples of functions that implement the methods defined in a type class. Consider the following Haskell 8 example f
:: E q a => a -> a -> B o o l
f x y = (Ix
== y)
~
(y
== Ix)
The actual type of f is Eqa ~ a ~ a ~ B o o l . Context reduction leads to the type specified in the example. Here a means a list of elements of type a. E q a means that the type a must be an instance of the equality class. In a naive implementation, this function would be passed a dictionary for E q a and the dictionary for E q a would be constructed inside the function. Jones optimises this by constructing a dictionary for E q a at the call site of f and passing it in as a parameter. This is repeated for all overloaded functions so that all dictionaries are constructed statically. But this approach does not work with separately compiled modules since f ' s type in other modules does not specify the dictionaries that are constructed inside it. In Gofer 11, instance declarations are not used to simplify the context. Therefore the type of f in the above example would be Eqa ~ a -+ a ~ B o o l . Jones' optimisation of dictionary passing can now be performed in the presence of separately compiled modules. However, we now require a more complicated type system to typecheck the code. Assume two functions f and g have the same type (# --~ #'). Both f and g can be passed as a parameter to h in (h = Ax:# --+ #'.e). However, f and g could, in general, be using different dictionaries (d/ and dg). This implies that after the transformation, the two functions will have different types - df =~ # --+ #' and dg ~ # --~ #'. Therefore, we can no longer use f and g interchangeably. 3
The
Lifting
Algorithm
for FLINT
Till now, we have considered only the Core-ML calculus while discussing the algorithm. But what happens when we take into account the module language as well? To handle the Full-ML langauge, we compile the source code into the F L I N T intermediate language. The details of the translation are given in 29. F L I N T is based upon a predicative variant of the Girard-Reynolds polymorphic Acalculus 5, 26, with the term language written in A-normal form 4. It contains the following four syntactic classes: kinds (~), constructors (#), types (a) and terms (e), as shown in Figure 3. Here, kinds classify constructors, and types classify terms. Constructors of kind /2 name monotypes. The monotypes are generated from variables, from I n t , and through the --~ constructor. The application and abstraction constructors correspond to the function kind K1 --~ /~2. Types in Core-FLINT include the monotypes, and are closed under function spaces and polymorphic quantification. We use T(/z) to denote the type corresponding to the constructor # (when # is of kind/2). The terms are an explicitly
165
typed A-calculus (but in A-normal form) with explicit constructor abstraction and application.
(kinds)
(cons)
::= t IInt I ~1 --~ #2 I At::/~.~L~ I ~1~~2 (types) a : : = T( # ) I '~' ~ "~ I Vt::,~.,~ (terms) e ::= i x let x = el in e2 I @xxxz I ACx: T(l~).e I Amx :a.e I let x = At/:: ki.ev in ez
x~i
(values) ev ::= i I x I let x = e~ in e~ I Aex: T(,u).e I Amx:a.e let x = A~ :: ki.e~ in e~ I x~ui Fig. 3. Syntax of the Core-FLINT calculus
In ML, structures are the basic module unit and functors abstract over structures. Polymorphic functions may now escape as p a r t of structures and get initialized later at a functor application site. In the F L I N T translation 29, functors are represented as a polymorphic definition combined with a polymorphic abstraction ( f c t = A T :: ki.Amx : a.e). The variable x in the functor definition is polymorphic since the parameterised structure m a y contain polymorphic components. In the functor body e, the polymorphic components of x are instantiated by type application. Functor applications are a combination of a type application and a t e r m application, with the type application instantiating the type p a r a m e t e r s (t~s). Though abstractions model b o t h functors and functions, the translation allows us to distinguish between them. In the F L I N T calculus, Acx: T(#).e denotes functions, whereas Amx:a.e denotes functors. The rest of the t e r m calculus is standard. This calculus complicates the lifting since type applications arising from an abstracted variable (the variable x in f c t above) can not be lifted to the top level. This also differs from the Core-ML calculus in t h a t type applications m a y now be curried to model escaping polymorphic functions. However, the module calculus obeys some nice properties. Functors in a program always occur outside any Core-ML functions. T y p e applications arising out of functor p a r a m e t e r s (when the input structure contains a polymorphic component) can therefore be lifted outside all functions. Escaping polymorphic functions occur outside Core-ML functions. Therefore the corresponding curried type application is not nested inside Core-ML functions. Therefore a F L I N T source program can be converted into a well-formed prog r a m satisfying the following constraints - All functor abstractions (Am) occur outside function abstractions (At). - No partial type application occurs inside a function abstraction. We now redefine the depth of a t e r m in a p r o g r a m as the n u m b e r of function abstractions within which it is nested with depth 0 terms oceuring outside all
166
function abstractions only. Note that depth 0 terms may not occur outside all abstractions since they may be nested inside functor abstractions. We then perform type lifting as in Figure 2 for terms at depth greater than zero and lift the polymorphic definitions and type applications to depth 0. For terms already at depth zero, the translation is just the identity function and the header returned is empty. We illustrate the algorithm on the example code in Figure 4. The syntax is not totally faithful to the FLINT syntax in Figure 3 but it makes the code easier to understand. In the code in Figure 4, F is a functor which takes the structure
F = Ato.AmX:S, f = A%. let
id
=
Atl.ACx2.x2
vl
=
...
in
Vl
idInt(3)
....
v2 = (#1(X)) Eto
Fig. 4. Example FLINT code
X as a parameter. The type S denotes a structure type. Assume the first component of X (#1(X)) is a polymorphic function which gets instantiated in the functor body(v2), f is a locally defined function in the functor body. According to the definition of depth above, and v2 are at depth 0 even though they are nested inside the functor abstraction(AX). Moreover, the type application (#l(X))t0 is also at depth 0 and will therefore not be lifted. It is only inside the function f that the depth increases which implies that the type application idInt occurs at d > 0. The algorithm will lift the type application to just outside the function abstraction (Av), it is not lifted outside the functor abstraction (AX). The resulting code is shown in Figure 5. Is the reformulation merely an artifice to get around the problems posed by FLINT ? No, the main aim of the type lifting transformation is to perform all the type applications during "link" time---when the top level code is being executed--and eliminate runtime type construction inside functions. Functors are top level code and are applied at "link" time. Moreover they are nonrecursive. Therefore having type applications nested only inside functors results in the type applications being performed once and for all at the beginning of program execution. As a result, we still eliminate runtime type passing inside functions. To summarize, we note that depth 0 in Core-ML (according to the definition above) coincides with the top level of the program since Core-ML does not
167
F ffi Ato.AmX:S. f ffi let
id ffi Atl.Acx2.x2 Zl ffi id Int .. (Other type e x p r e s s i o n s in
let
.....
vl . . . . in
v2
in f's body)..
ACv. (type l i f t e d
z1(3)
body
of f)
....
V1
ffi (#1(x)) to
Fig. 5. FLINT code after type lifting
have functors; therefore the Core-ML translation is merely a special case of the translation for FLINT.
4
Implementation
We have implemented the type-lifting algorithm in the F L I N T / M L compiler version 1.0 and the experimental version of S M L / N J v109.32. All the tests were performed on a Pentium Pro 200 Linux workstation with 64M physical RAM. Figure 6 shows CPU times for executing the Standard ML benchmark suite with type lifting turned on and turned off. The third column (New Time) indicates the execution time with lifting turned on and the next column (Old Time) indicates the execution time with lifting turned off. The last column gives the ratio of the new time to the old time. Benchmark Simple Vliw
Description
A fluid-dynamics program A VLIW instruction scheduler lexgen lexical-analyzer generator ML-Yacc The ML-yacc M a n d e l b r o t Mandelbrot curve construction Kb-comp Knuth-Bendix Algorithm Ray A ray-tracer Life The Life Simulation Boyer A simple theorem prover
New Time Old Time Ratio
7.04 4.22 2.38 1.05 4.62 2.98 10.68 2.80 0.49
9.78 4.31 2.36 1.11 4.62 3.11 10.66 2.80 0.52
0.72 0.98 1.01 0.95 1.0 0.96 1.01 1.0 0.96
Fig. 6. Type Lifting Results
The current F L I N T / M L and S M L / N J compilers maintain a very minimal set of type information. Types are represented by integers since the compiler only
168 needs to distinguish primitive types (e.g., i n t , real) and special record types. As a result, runtime type construction and type application are not expensive. The test results therefore yield a moderate speedup for most of the benchmarks and a good speedup for one benchmark--an average of about 5% for the polymorphic benchmarks. Simple has a lot of polymorphic function calls occuring inside loops and therefore benefits greatly from lifting. Boyer and mandelbrot are monomorphic benchmarks (involving large lists) and predictably do not benefit from the optimization. Our algorithm makes the simultaneous uncurrying of both value and type applications difficult. Therefore at runtime, a type application will result in the formation of a closure. However, these closures are created only once at linktime and do not represent a significant penalty. We also need to consider the closure size of the lifted functions. The (tapp) rule in Figure 2 introduces new variables (the set L) which may increase the number of free variables of a function. Moreover after type applications are lifted, the type specialised functions become free variables of the function body. On the other hand, since all type applications are lifted, we no longer need to include the free type variables in the closure which decreases the closure size. We believe therefore that the increase in closure size, if any, does not incur a significant penalty. This is borne out by the results on the benchmark suite none of the benchmarks slows down significantly. The creation of closures makes function application more expensive since it involves the extraction of the environment and the code. However, in most cases, the selection of the code and the environment will be a loop invariant and can therefore be optimised. The algorithm is implemented in a single pass by a bottom up traversal of the syntax tree. The (tfn) rule shown in Figure 2 simplifies the implementation considerably by reducing the type information to be adjusted. In the given rule, all the expressions in/-/1 are dumped right in front of the type abstraction. Note however that we require to dump only those terms (in//1) which contain any of the t~s as free type variables. The advantage of dumping all the expressions is that the de Bruijn depth of the terms in/-/1 remains the same even after lifting. The algorithm needs to adjust the type information only while abstracting the free variables of a polymorphic definition. (The types of the abstracted variables have to be adjusted.) The implementation also optimises the number of variables abstracted while lifting a definition - it remembers the depth at which a variable is defined so that variables that will still remain in scope after the lifting are not abstracted. 5
Related
Work
and Conclusions
Tolmach 32 has worked on a similar problem and proposed a method based on the lazy substitution on types. He used the method in the implementation of the tag-free garbage collector. Minamide 18 proposes a refinement of Tolmach's method to eliminate runtime construction of type parameters. The speedups
169 obtained in our method are comparable to the ones reported in his paper. Mark P. Jones 11 has worked on the related problem of optimising dictionary passing in the implementation of type classes. In their study of the type theory of Standard ML, Harper and Mitchell 6 argued that an explicitly typed interpretation of ML polymorphism has better semantic properties and scales more easily to cover the full language. The idea of passing types to polymorphic functions is exploited by Morrison et al. 19 in the implementation of Napier. The work of Ohori on compiling record operations 21 is similarly based on a type passing interpretation of polymorphism. Jones 12 has proposed evidence passing---a general framework for passing data derived from types to "qualified" polymorphic operations. Harper and Morisett 7 proposed an alternative approach for compiling polymorphism where types are passed as arguments to polymorphic routines in order to determine the representation of an object. The boxing interpretation of polymorphism which applies the appropriate coercions based on the type of an object was studied by Leroy 14 and Shao 27. Many modern compilers like the FLINT/ML compiler 28, TIL 31 and the Glasgow Haskell compiler 22 use an explicitly typed language as the intermediate language for the compilation. Lambda lifting and full laziness are part of the folklore of functional programming. Hughes 9 showed that by doing lambda lifting in a particular way, full laziness can be preserved. Johnsson 10 describes different forms of lambda lifting and the pros and cons of each. Peyton Jones 25, 23, 24 also described a number of optimizations which are similar in spirit but have totally different aims. Appel 2 describes let hoisting in the context of ML. In general, using correctness preserving transformations as a compiler optimization 1, 2 is a well established technique and has received quite a bit of attention in the functional programming area. We have proposed a method for minimizing the cost of runtime type passing. Our algorithm lifts all type applications out of functions and therefore eliminates the runtime construction of types inside functions. The amount of type information constructed at run time is a static constant. We can guarantee that in Core-ML programs, all type applications will be lifted to the top level. We are now working on making the type representation in FLINT more comprehensive so that we can maintain complete type information at runtime.
6
Acknowledgements
We would like to thank Valery Trifonov, Chris League and Stefan Monnier for many useful discussions and comments about earlier drafts of this paper. We also thank the annonymous referees who suggested various ways of improving the presentation.
References 1. A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, MA, 1986.
170 2. A. W. Appel. Compiling with Continuations. Cambridge University Press, 1992. 3. N. de Bruijn. A survey of the project AUTOMATH. In To H. B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism, pages 579-606. Edited by J. P. Seldin and J. R. Hindley, Academic Press, 1980. 4. C. Flanagan, A. Sabry, B. F. Duba, and M. Felleisen. The essence of compiling with continuations. In Proc. ACM SIGPLAN '93 Conf. on Prog. Lang. Design and Implementation, pages 237-247, New York, June 1993. ACM Press. 5. J. Y. Girard. Interpretation FonctionneUe et Elimination des Coupures dans l'Arithmetique d'Ordre Superieur. PhD thesis, University of Paris VII, 1972. 6. R. Harper and J. C. Mitchell. On the type structure of Standard ML. ACM Trans. Prog. Lang. Syst., 15(2):211-252, April 1993. 7. R. Harper and G. Morrisett. Compiling polymorphism using intensional type analysis. In Twenty-second Annual A CM Syrup. on Principles of Prog. Languages, pages 130-141, New York, Jan 1995. ACM Press. 8. P. Hudak, S. P. Jones, and P. W. et al. Report on the programming language Haskell, a non-strict, purely functional language version 1.2. SIGPLAN Notices, 21(5), May 1992. 9. R. Hughes. The design and implementation of programming languages. PhD thesis, Programming Research Group, Oxford University, Oxford, UK, 1983. 10. T. Johnsson. Lambda Lifting: Transforming Programs to Recursive Equations. In The Second International Conference on Functional Programming Languages and Computer Architecture, pages 190-203, New York, September 1985. SpringerVerlag. 11. M. P. Jones. Qualified Types: Theory and Practice. PhD thesis, Oxford University Computing Laboratory, Oxford, july 1992. Technical Monograph PRG-106. 12. M. P. Jones. A theory of qualified types. In The 4th European Symposium on Programming, pages 287-306, Berlin, February 1992. Spinger-Verlag. 13. M.P. Jones. Dictionary-free overloading by partial evaluation. In Proceedings of the ACM SIGPLAN Workshop on Partial Evaluation and Semantics-Based Program Manipulation, pages 107-117. University of Melbourne TR 94/9, June 1994. 14. X. Leroy. Unboxed objects and polymorphic typing. In Nineteenth Annual ACM Symp. on Principles of Prog. Languages , pages 177-188, New York, Jan 1992. ACM Press. Longer version available as INRIA Tech Report. 15. X. Leroy and M. Mauny. Dynamics in ML. In The Fifth International Conference on Functional Programming Languages and Computer Architecture, pages 406-426, New York, August 1991. Springer-Verlag. 16. R. Milner, M. Tofte, and R. Harper. The Definition of Standard ML. MIT Press, Cambridge, Massachusetts, 1990. 17. R. Milner, M. Tofte, R. Harper, and D. MacQueen. The Definition of Standard ML (Revised). MIT Press, Cambridge, Massachusetts, 1997. 18. Y. Minamide. Full lifting of type parameters. Technical report, RIMS, Kyoto University, 1997. 19. R. Morrison, A. Dearle, R. C. H. Connor, and A. L. Brown. An ad hoc approach to the implementation of polymorphism. ACM Trans. Prog. Lang. Syst., 13(3), July 1991. 20. G. Nadathur. A notation for lambda terms II: Refinements and applications. Technical Report CS-1994-01, Duke University, Durham, NC, January 1994. 21. A. Ohori. A compilation method for ML-style polymorphic record calculi. In Nineteenth Annual A CM Syrup. on Principles of Prog. Languages, New York, Jan 1992. ACM Press.
171 22. S. Peyton Jones. Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine. Journal of Functional Programming, 2(2):127-202, April 1992. 23. S. Peyton Jones. Compiling haskell by program transformation: a report from trenches. In Proceedings of the European Symposium on Programming, Linkoping, April 1996. 24. S. Peyton Jones and D. Lester. A modular fully-lazy lambda lifter in haskell. Software - Practice and Experience, 21:479-506, 1991. 25. S. Peyton Jones, W. Partain, and A. Santos. Let-floating: moving bindings to give faster programs. In Proc. International Conference on Functional Programming (ICFP'96), New York, June 1996. ACM Press. 26. J. C. Reynolds. Towards a theory of type structure. In Proceedings, Colloque sur la Programmation, Lecture Notes in Computer Science, volume 19, pages 408-425. Springer-Verlag, Berlin, 1974. 27. Z. Shao. Flexible representation analysis. In Proc. 1997 A C M SIGPLAN International Conference on Functional Programming (ICFP'97}, pages 85-98. ACM Press, June 1997. 28. Z. Shao. An overview of the FLINT/ML compiler. In Proc. 1997 A C M SIGPLAN Workshop on Types in Compilation, June 1997. 29. Z. Shao. Typed cross-module compilation. Technical Report YALEU/DCS/RR1126, Dept. of Computer Science, Yale University, New Haven, CT, November 1997. 30. Z. Shao and A. W. Appel. A type-based compiler for Standard ML. In Proc. A C M SIGPLAN '95 Conf. on Prog. Lang. Design and Implementation, pages 116-129. ACM Press, 1995. 31. D. Tarditi, G. Morrisett, P. Cheng, C. Stone, R. Harper, and P. Lee. TIL: A typedirected optimizing compiler for ML. In Proc. A C M SIGPLAN '96 Conf. on Prog. Lang. Design and Implementation, pages 181-192. ACM Press, 1996. 32. A. Tolmach. Tag-free garbage collection using explicit type parameters. In Proc. 1994 ACM Conf. on Lisp and Functional Programming, pages 1-11, New York, June 1994. ACM Press. 33. A. K. Wright. Polymorphism for imperative languages without imperative types. Technical Report Tech Report TR 93-200, Dept. of Computer Science, Rice University, Houston, Texas, February 1993.
7
Appendix
In this section, we give the proofs of the type preservation theorem and the semantic-soundness theorem. Figure 7 gives the typing rules. Figure 8 gives a slightly modified version of the translation algorithm. The type environment Fm binds monomorphic variables while the environment Fp binds polymorphic variables. N o t a t i o n 1 (A*F.e a n d @*zF) We use A*F.e and @*zF to denote repeated abstractions and applications respectively. I f F -~ { x l , ..., x n } , then A*F.e reduces to Axl : pl.(...(AXn : #n.e)..) where # l , . . . # n are the types o x l , . . . , x n in Frn. Similarly @*z F reduces to @(..(@ZXl)..)Xn.
172
F F i:Int F~{x:m}
(In)
F- e : ~ 2
F F xl:p'---~p F F x2:p' F F ~xlx2:p F ~- ev : jts
F ~J {x : V~/.I~I} -- e : f/2
F }- l e t x = A ~ . e ~
(tapp)
(la)
x:F(x)
F F AX:pl.e:ttl-+p2
(app)
( qn)
F F
r
ine:p2
F F x : Vti.p p~lt,p
F x~:
F F- e i : m r~{x:m} F- e ~ : p 2 F b let x = el in e2 : p2 F i g . 7. Static Semantics
N o t a t i o n 2 ( T ( L ) ) I L is a set of variables, then T ( L ) refers to the types of the variables in L in the environment Fro. I f L = { X l , X 2 , . . . , x n } and the types of the variables are respectively #1, ..., #n, then T ( L ) -+ Tis shorthand for Ill ~ ('." --~ (#n --~ T)..). T h r o u g h o u t t h i s section, we a s s u m e u n i q u e v a r i a b l e b i n d i n g s - v a r i a b l e s a r e never redefined in t h e p r o g r a m .
7.1
Type preservation
Before we p r o v e t h e t y p e s o u n d n e s s of t h e t r a n s l a t i o n , we will define a c o u p l e of p r e d i c a t e s on t h e h e a d e r - - FH a n d w e l l - t y p e d n e s s of H . I n t u i t i v e l y , FH d e n o t e s t h e t y p e t h a t we a n n o t a t e w i t h each e x p r e s s i o n in H d u r i n g t h e t r a n s l a t i o n a n d w e l l - t y p e d n e s s ensures t h a t t h e t y p e we a n n o t a t e is t h e c o r r e c t t y p e . T o g e t h e r t h e s e two e n s u r e t h a t t h e h e a d e r f o r m e d is well t y p e d .
D e f i n i t i o n 1 ( T h e h e a d e r t y p e e n v i r o n m e n t - FH). If H = (h0 . . . hn), t h e n FH = Fho... Fh,~. If hi : : = (x = e, T), t h e n F ~ : = x ~ r .
D e f i n i t i o n 2 (Let H in e ) . If H = ho ... hn, t h e n L e t H i n e is s h o r t h a n d for let ho in ... let hn in e. T h e t y p i n g rule is as follows - - Fm F Let H in e : p iff Fm; FH F e : #.
173
rm(x) = u Fm;F~,;H~-x:#:::~x;~;{x}
(e~)
/"re(X1) : Ul -'~ #2
(~pp)
I'm; Fp; H
1-
F~;Fp;Hbi:Int~i;O;O Fro(X2) : /.11
~XlX2 : ~~2 ::~ ~XlX2; 0; {Xi, X2}
rmX ~-~ #; Fv; H J- e : #~ :0 e'; H1; F Fro; Fp; H F- Ax : #.e : I~ --+ I~' =~ Ax : #.e'; Hi; F \ { x : i~}
(/,0
!
(teO
F,~; Fp; H I- el : m ~ el; Hi; F1 /'mix ,-+ m; Fp; H t- e~ : t ~ =~ e~; H2; F2
Fm;Fp;H t- let x = el in e2 : #2 =~ let x = e~ in e~;H1 + H2;F1 (J ( F 2 \ { x } ) F,.; Fv; H b el :~1 ~ e'l; H~; Fx nx = (x = A~.Let H~ in A *Fl.el,' u --~ ,11) Fm;Fpx ~-~ (V~.#I,F1);H + H1 t- e2 : 1~2 =~ e~;H2;F2
(q,O
Fm; Fp; H t- let x ----A~.el in e2 :/~2 =~ e~; H1 + H2;/;'2
(t~pp)
Fp(x) = (V~.#, F) r , ( x ) -- V ~ . T ( F ) --+/~ z a fresh variable #,/t,# =~ @*zF; (z = x~7, T ( F ) --+ ~ui/ti#); F
Fro; Fp; H b x ~ 7 :
F i g . 8. The Lifting Translation
D e f i n i t i o n 3 ( H is w e l l t y p e d ) . H is well t y p e d if ho...hn a r e well t y p e d , hi is well t y p e d if ho...hi-1 a r e well typed and --
- hi ::= (x = A ~ . L e t Hx in e , V ~ . # ) , t h e n Fho..h,_l ~- L e t H1 in e : #. - hi ::= (z = x~7,
#i/til~),
t h e n Fho...h i ~- Z : #i/ti#
L e m m a 1. Suppose F m ; F p ; H F- e ~ e ' ; H ' ; F . I f x E Fm and x does not occur free in H , then x does not occur free in H + H ~. P r o o f . T h i s is p r o v e d b y i n d u c t i o n on t h e s t r u c t u r e of e. 2 (Type Preservation). Suppose Fro; Fp; H t- e : # =~ e'; Ha; F . I f H is well typed then H + 111 is well typed and if Fro; Fp F- e : # then Fm; FH t- L e t H1 in e' : #
Theorem
P r o o f . T h e p r o o f is b y i n d u c t i o n on t h e s t r u c t u r e of e. W e will c o n s i d e r o n l y
t f n a n d tapp.
174 H t
C a s e t a p p . To prove that if H is well-typed, H § (z = x~-7, T ( F ) -+ #i/ti/~) is also well-typed and Fro; FH I- L e t H ~ in @*zF : l~i/til~ Since we assume H is well typed, we need to prove H ~ is well typed. By the precondition on the translation FH ~- x : VF/.T(F) -+ #. Since F consists of the free variables of x, T(F) cannot have any of the t~s as a free type variable. Therefore FH+H, ~ Z : T ( F ) -~ #i/ti# which proves that H ' is well-typed. This also leads to Fro; I'H+H' ~- @*zF : #i/tip. C a s e t f n ---- To prove - given H is well-typed, H + / / 1 q- H2 is also well-typed and Fm ; FH ~- L e t HI + H2 in e~ : #2. By the inductive assumption on the translation of el, H + H~ is well-typed and Fro; FH ~ L e t H~ in e~ : #1. Since the variables in F1 are bound in Fm (and not in H~), this implies that Fm; FH b L e t H~ in A*FI.e~ : T(F1) --+ #1. Since A*FI.e~ is closed with respect to monomorphic variables, we no longer require the environment Fro. Therefore FH ~ L e t H~ in A*FI.e~ : T(F1) --> #1. This implies H1 is well-typed. Again by induction, if H +/-/1 is well-typed, then H + / / 1 + / / 2 is well-typed and Fro; FH+H1 ~- L e t H2 in e~ : #2- This implies that Fro; FH+HI+H2 ~ e~ : #2 which leads to the type preservation theorem. 3 7.2
Semantic soundness
The operational semantics is shown in Figure 9. There are only three kinds of values - integers, function closures and type function closures. (values) v ::= i
C l o s ( x " , e , a ) l Clost(Fi,e,a)
Definition 4 (Type of a Value). - F}-i:int - i f / " ~- A x : # . e : # ~ #' , then 1" ~- Clos(x~', e, a) : I~ ~ p~ - if F ~- AFi.ev : VF/.# , then F ~- Clos~(Fi,ev,a) : V~i.# N o t a t i o n 3 The notation a : F F- e -> v means that in a value environment a respecting F, e evaluates to v. If a respects F , then a(x) = v and F ( x ) = # implies F F- v : #. N o t a t i o n 4 The notation a(x ~-+ v) means that in the environment a, x has the value v. Whereas ax ~-~ v means that the environment a is augmented with the given binding.
175
a F- i --+ i
a }- x --~ a ( z )
a ~- )~x:p.e --+ C l o s ( x ' , e , a )
(app) (tin) (let) (tapp)
a ~- x l -+ Clos(xt~,e,a ~)
a ~- x2 -+ v'
a t- ~ X l X 2
a' T x ~-~ v~ ~- e -+ v
--~ v
a ~- AV~.e~ ~ Clos ~(V~, e~, a) a~e~vl
a+x~v~-e2--+v at-let
x = e l
a } - x ~ - > C i o s t ( ~ , e v , a ')
ine2-+v
a'i-ev~ui/ti--~v
a t- z-pT ~ v
Fig. 9. Operational Semantics
We need to define the notion of equivalence of values before we can prove t h a t two terms are semantically equivalent. Definition 5 (Equivalence of Values). -
E q u i v a l e n c e o f I n t i ~ i' iff 9 E~-i:intand/"F-i':intandi=i'.
- E q u i v a l e n c e o f C l o s u r e s C l o s ( x ~, e, a) ~ C l o s ( x ~, e', a') iff 9 F f- C l o s ( x tt, e, a) : I.t -+ p' and F ~ t- C l o s ( x ~, e', a ~) : # -+ #'. 9 VVl,V~ such t h a t F ~- Vl : p and F ' t- v~ : # and Vl ~ v~. 9 a:F+x~+vl ~ - e - - + v a n d a * : F ' + x ~ - + V ' l t - e ' - + v ' and v ~ v' E q u i v a l e n c e o f T y p e C l o s u r e s C l o s t ( ~ , ev, a) ~ C l o s t ( ~ , ev, a') iff 9 E ~- C l o s t ( ~ , e v , a ) : V ~ . # and F ' ~- C l o s t ( ~ , % , a' ' ) : V~./~ and 9 a : r t- e,/~,/t, --~ v and a ' : F ' t- e'viPJt, -+ v' and v ~ v'. Definition 6 (Equivalence of terms). S u p p o s e a : 1" F- e --+ v and a ~ : F ' }- e' --~ v'. T h e n the t e r m s e and e' are s e m a n t i c a l l y e q u i v a l e n t iff v ~ v'. W e d e n o t e this as a : F F- e ~ a' : F ' F- e'.
Before we get into the proof, we want to define a couple of predicates on the header - a H and well-formedness of H . Intuitively aH represents the addition of new bindings in the environment as the header gets evaluated. Well-formedness of the header ensures t h a t the lifting of polymorphic functions and type applications is semantically sound.
176
Definition 7 (The header value environment aH is e q u a l to aho . . . ah~ a n d ah# is -
-
all).
- if h i : : = (x = A ~ . e , r ) t h e n ahj : = x ~ C l o s t { ~ , e , aho...h~_l) -- i f h k : : = (z = x N , r ) t h e n ah~ : = z ~ v w h e r e hj : : = x ~-~ C l o s t ( ~ , e , ah) for s o m e j < k a n d ah : r h ~- e ~ i / t d
-~ v
Definition 8 (Let H in e). S u p p o s e H = h l . . . h n . T h e n L e t H i n e is s h o r t h a n d for let hi ... in let hn in e. If hj : : = (x = e,T), t h e n let hj is s h o r t h a n d for let x = e. P r o m t h e o p e r a t i o n a l s e m a n t i c s we get am : Fm }- L e t H in e ~ am : Fro;all : F H }- e.
D e f i n i t i o n 9 ( H is w e l l - f o r m e d w . r . t am : Fm; ap : Fp). H is w e l l - f o r m e d w.r.t, am : Fro; ap : Fp, if h o , . . . , hn a r e well-formed. A h e a d e r e n t r y hj is w e l l - f o r m e d if all its p r e d e c e s s o r s h 0 , . . . , h i - 1 a r e w e l l - f o r m e d a n d - If h i : : = (x = A ~ . e , T ) , a n d Fp(x) = ( V F / . # , F ) t h e n am : Fm; ap : Fp F- x ~ ~ am : l'm; aho...hj : Fho...h~ F- let z = xD-~ in @* z F - If hj ::-- (z = Xh-7,T), t h e n hj is well-formed. H is w e l l - f o r m e d w.r.t, am:Fro; ap:Fp will b e a b b r e v i a t e d in this s e c t i o n t o H is well-formed.
T h e o r e m 3 ( S e m a n t i c S o u n d n e s s ) . Suppose F m ; F p ; H F- e : # ~ e ' ; H 1 ; F . I f a m : F m ; a p : F p F- e --+ v and H is well-formed w.r.t a m : F m ; a p : F p then am :Fm ; a H : F H }- L e t H1 in e' --} v ~ and v ~ v ~. P r o o f . T h e p r o o f is b y i n d u c t i o n on t h e s t r u c t u r e of e. W e will c o n s i d e r t h e
tapp a n d t i n cases here.
C a s e t a p p = To p r o v e - If H is w e l l - f o r m e d t h e n am : Fm ; ap : Fp ~- x ~ ~ am : Fm ; a H : FH }- L e t H1 in @* z F S u b s t i t u t i n g L e t H1 in t h e a b o v e e q u a t i o n leads t o am : Fm ; ap : Fp t- x~-~ ~ arn : I~m ; a H : FH ~- let z = x)-7
in @* z F
B y t h e p r e c o n d i t i o n on t h e t r a n s l a t i o n rule I'p(x) = (VF/.#, F ) a n d t h e r e exists s o m e hj E H such t h a t hj : : = (x = AFi.e, T). Since H is w e l l - f o r m e d , h j is w e l l - f o r m e d as well a n d t h e r e f o r e b y definition
a m : F m ; a p : F p F- xD-7 ~ am:l"m;aho...hj :Fho...hj ~- let z = x~7
in @ * z F
B u t since we a s s u m e u n i q u e v a r i a b l e b i n d i n g s , no hk for k > j r e b i n d s x. This leads to -
a m : F m ; a p : l " p ~- x)-7
~ am:Fm;aH:FH
which is w h a t we w a n t t o prove.
F- let z = xD-7 in @ * z F
177
C a s e t f n --~ To prove - given H is well-formed am:Fm;ap:Pp - let x = AFi.el in e2 ~ am:l"m;aH:PH ~- Let Ha + H2 in e~ which means we m u s t prove t h a t if and then
am :Fro; apx ~-~ Clost : Ppx ~ (V~-/.Ul , F) t- e2 -+ v am:Fm;aH+g~ :FH+H~ ~ Let H2 in e~ -+ v ~ v ~. v ~ .
Assume for t h e time being t h a t H + Ha is well-formed. T h e n t h e inductive hypothesis on the translation of e2 leads to the above condition. We are therefore left with proving t h a t H + H 1 is well-formed. B y a s s u m p t i o n , H is well-formed, therefore we must prove t h a t / / 1 is well-formed. A c c o r d i n g to the definition we need to prove t h a t I
I
.
I
.
am:I",ap.F;
I
~- x~-
=
I
I
.
a m : F ' , a H + H , :FH+H, F- let z = x~-
in @*zF
In the above e q u a t i o n all1 : : x ~-~ Clo8 t(~i, Let H~ in A*F.e~,aH>, therefore the operational semantics leads to z w+ Clos(FT(F), e~#i/ti, a H + aH~u,/t,) This implies t h a t we m u s t prove -
a~m : F~; a~ : Fp F- x~7
= a~m(F ) : F~n; aH : FH + aHi u,/t,
: FH; u,/t,
~ ei #dl ti
In the source t e r m x ~ Clost(Fi, el, am + ap) which implies t h a t ' '. '. ' am:r;,,ap.s
xm
= a m : rm;ap:rp
etm/td
Therefore we need to prove t h a t -
am:Fm;ap:Fp t- el#Jt~
..~ a~m(F) 91"m, ' 9aH :FHWaH,I#,/t,
:FH~t~,/t,
' ti }- elIZi/ (1)
B u t a ~ ( F ) = am(F) since variables are b o u n d only once. F consists of all the free variables of e~ t h a t are b o u n d in a m a n d therefore in am. Hence evaluating e~ in am(F) is equivalent to evaluating it in am 9 So proving E q n 1 reduces to proving
am :Fro; ap :Fp - elizi/ti
..~ am :/'m; aH :I'H + aH~t,,/t,
: FH~g,/t,
- e~#i/ti
which follows from the inductive a s s u m p t i o n on the translation of el.
Compiling Java to a Typed Lambda-Calculus: A Preliminary Report Andrew Wright1 , Suresh Jagannathan2, Cristian Ungureanu2 , and Aaron Hertzmann3 1
1
STAR Laboratory, InterTrust Technologies Corp., 460 Oakmead Parkway, Sunnyvale, CA 94086 [email protected] 2 NEC Research Institute, 4 Independence Way, Princeton, NJ 08540 {suresh,cristian}@research.nj.nec.com 3 Media Research Laboratory, New York University, 715 Broadway, NewYork, NY 10003 [email protected]
Introduction
A typical compiler for Java translates source code into machine-independent byte code. The byte code may be either interpreted by a Java Virtual Machine, or further compiled to native code by a just-in-time compiler. The byte code architecture provides platform independence at the cost of execution speed. When Java is used as a tool for writing applets—small ultra-portable programs that migrate across the web on demand—this tradeoff is justified. However, as Java gains acceptance as a mainstream programming language, performance rather than platform independence becomes a prominent issue. To obtain highperformance code for less mobile applications, we are developing an optimizing compiler for Java that bypasses byte code, and, just like optimizing compilers for C or Fortran, translates Java directly to native code. Our approach to building an optimizing compiler for Java has two novel aspects: we use an intermediate language based on lambda-calculus, and this intermediate language is typed. Intermediate representations based on lambdacalculi have been instrumental in developing high-quality implementations of functional languages such Scheme [13,19] and Standard ML [3]. By using an intermediate language based on lambda-calculus to compile Java, we hope to gain the same organizational benefits in our compiler. The past few years have also seen the development in the functional programming community of a new approach to designing compilers for languages like ML and Haskell based on typed intermediate languages [15,20]. By emphasizing formal definition of a compiler’s intermediate languages with associated type systems, this approach yields several benefits. First, properties such as type safety of the intermediate languages can be studied mathematically outside the sometimes messy environment of compiler source code. Second, type checkers can be implemented for the intermediate languages, and by running these type X. Leroy and A. Ohori (Eds.): TIC’98, LNCS 1473, pp. 9–27, 1998. c Springer-Verlag Berlin Heidelberg 1998
checkers on the intermediate programs after various transformations, we can detect a large class of errors in transformations. Indeed, by running a type checker after each transformation, we may be able to localize a bug causing incorrect code to a specific transformation, without even running the generated code. Finally, a formal definition of a typed intermediate language serves as complete and precise documentation of the interface between two compiler passes. In short, using typed intermediate languages leads to higher levels of confidence in the correctness of compilers. Our compiler first performs ordinary Java type checking on the source program, and then translates the Java program into an intermediate language (IL) of records and first-order procedures. The translation (1) converts an object into a record containing mutable fields for instance variables and immutable procedures for methods; (2) replaces a method call with a combination of record field selections and a first-order procedure call; (3) makes the implicit self parameter of a method explicit by adding an additional parameter to the procedure representing that method and passing the record representing the object as an additional argument at calls; and (4) replaces Java’s complex name resolution mechanisms with ordinary static scoping. The resulting IL program typechecks since the source program did, but its typing derivation uses record subtyping where the derivation for the Java program used inheritance subtyping. In contrast to our approach, traditional compilers for object-oriented languages typically perform analyses and optimizations on a graphical representation of a program. Nodes represent arithmetic operations, assignments, conditional branches, control merges, and message sends [8]. In later stages of optimization, message send nodes may be replaced with combinations of more primitive operations to permit method dispatch optimization. In earlier stages of optimization, program graphs satisfy an informal type system which is essentially that of the source language. In later stages, program graphs are best viewed as untyped, like the representations manipulated by conventional compilers for procedural languages. By compiling Java using a typed lambda-calculus, we hope to gain increased confidence in the correctness of the generated code. Indeed, for languages like Java that are used to write web-based applications, whether mobile or not, correctness is vital. Incorrect code generated by the compiler could lead to a security breach with serious consequences. Additionally, by translating Java into an intermediate language of records and procedures, we hope to leverage not only optimizations developed for object-oriented languages [8], but also optimizations developed for functional languages [3,15,20] such as Standard ML and Haskell, as well as classical optimizations for static-single-assignment representations of imperative languages [7]. In particular, representing objects as records exposes their representations to optimization. The representations of objects can be changed by transformations on IL programs, and the type system ensures that the resulting representations are consistent. Even for optimizations like inlining and copy propagation that do not explicitly change object representations, the type system provides valuable assurance that representations remain consistent. 
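To make the debugging benefit concrete, a compiler driver can simply re-run the IL type checker between passes. The following sketch is our illustration, not code from the compiler described here; the names Program, Transformation, and TypeChecker are hypothetical stand-ins.

    import java.util.List;

    // Illustrative sketch only: re-check IL typing after every pass so that a
    // type error localizes the bug to the pass that introduced it.
    interface Program {}
    interface Transformation { Program apply(Program p); }
    interface TypeChecker { static void check(Program p) { /* verify IL typing */ } }

    class Pipeline {
        static Program optimize(Program program, List<Transformation> passes) {
            for (Transformation pass : passes) {
                program = pass.apply(program);  // the pass may contain a bug
                TypeChecker.check(program);     // fails fast, before code generation
            }
            return program;
        }
    }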
Unfortunately, the problem of designing a sound type system that incorporates object-oriented features into a record-based language appears to have no
simple solution. With a straightforward translation of objects into records and a natural type system, contravariance in the subtyping rule for function types foils the necessary subtyping relation between the types of records that represent Java objects. The problem is that making the implicit recursion through an object's self parameter explicit as an additional argument to each method leads to function types that are recursive in both covariant and contravariant positions, and hence permit no subtyping. More sophisticated type systems that can express the necessary subtyping exist [2,5,16], but these type systems require more complex encodings of objects and classes. Object calculi that keep self-recursion implicit [1,5] are more complex than record calculi and do not expose representations in a manner suitable for an intermediate language. Rather than devise an unwieldy IL and translation, we take a more pragmatic approach. We assume that a Java program is first type-checked by the Java type-checker before it is translated into the IL. Now, optimizations and transformations performed on the IL must ensure that (1) IL typing is preserved, and (2) safety invariants provided by the Java type-checker are not violated. To satisfy the first requirement, self parameters in the IL are assigned type ⊤ (top), the type that is the supertype of any record type. To satisfy the second requirement, typecase operations are inserted within method bodies to recover the appropriate type of self parameters as dictated by the Java type system. The resulting IL program is typable and performs runtime checks at typecase expressions to ensure it is safe with respect to Java typing. However, since the source program has passed the Java type-checker, these checks should never fail. Failure indicates a compiler bug. During compiler development, these checks remain in the generated object code. For production code, the code generator simply omits the checks. In either case, we lose the ability to statically detect errors in transformations that misuse self parameters. On the other hand, we can still detect a large class of type errors involving misuse of other parameters and variables, and we gain the benefit of a simple, typed intermediate language that is easy to work with. The remainder of the paper is organized as follows. The next section presents a core IL of records and procedures. Following that, Section 3 illustrates the translation from Java to our IL with several examples. Section 4 concludes with a summary of related work.
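The runtime check performed by typecase corresponds closely to a checked downcast. As a rough Java analogy (ours, not the paper's IL; the class names are hypothetical):

    class PointXY { int x, y; }

    class MvSketch {
        // A translated method receives self at type ⊤ and recovers its precise
        // type with a runtime-checked dispatch, which should never fail for code
        // produced from type-correct Java.
        static void mv(Object self, int dx, int dy) {
            if (self instanceof PointXY p) {   // plays the role of typecase
                p.x += dx; p.y += dy;
            } else {
                throw new IllegalStateException("CompilerError"); // compiler bug
            }
        }
    }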
2 Language
The following grammar defines the types of our explicitly-typed intermediate language for Java:

t  ::= pt | rt | t* → t | tag | α
rt ::= µα.{ tag : tag, (x : ft)* } | µα.{{ tag : tag, (x : ft)* }}
ft ::= t | vt | pt array | rt array
vt ::= t var
pt ::= boolean | byte | short | int | long | char | float | double | void
where x ∈ Var is a set of variables and α ∈ TyVar is a set of type variables used for recursive type definitions. There are four kinds of types t: primitive types pt, function types t1 · · · tn → t, ordered record types {x1 : ft1 · · · xn : ftn}, and unordered record types {{x1 : ft1 · · · xn : ftn}}. Two additional kinds, mutable variable types t var and mutable array types pt array and rt array, are not full-fledged types in their own right, but may be used as types of fields in records and as types of variables. Several restrictions, which are motivated below, apply to the formation of types. The field names x1 . . . xn of a record type must be distinct. The first field of an unordered record type must be named tag and of type tag. Tags encode the static type of an object, and are used to inspect the type of a record at runtime. An ordered record type need not include a field named tag of type tag, but if it does, this field must appear first. Unordered record types are considered equal under different orderings of their second through last fields; that is, {{tag : tag, x2 : ft2 · · · xn : ftn}} = {{tag : tag, permute(x2 : ft2, . . . , xn : ftn)}} where permute yields an arbitrary permutation of its arguments. The fields of ordered record types may not be rearranged. Both kinds of record types may be recursive if prefixed by the binding operator µ, hence t = µα.{x1 : ft1 · · · xn : ftn} = {x1 : ft1[α ↦ t] · · · xn : ftn[α ↦ t]} and t = µα.{{x1 : ft1 · · · xn : ftn}} = {{x1 : ft1[α ↦ t] · · · xn : ftn[α ↦ t]}}, where t′[α ↦ t] denotes the substitution of t for free occurrences of α in t′. Figure 1 defines the subtyping relation on types. The relation allows a longer ordered record type to be a subtype of a shorter record type, provided the sequence of field names of the shorter type is a prefix of the sequence of field names of the longer type, and provided that the types of like-named fields are subtypes. Since the fields of unordered record types can be reordered arbitrarily (except for the first), a longer unordered record type is a subtype of any shorter unordered record type with a subset of the longer type's fields. An ordered record type is also a subtype of an unordered record type with the same fields. The subtyping relation includes the usual contravariant rule for function types, as well as a covariant rule for array types. Our translation uses ordered record types to represent Java classes. In the intermediate language, subtyping on ordered record types expresses Java's single inheritance class hierarchy. Because field offsets for ordered record types can be computed statically, the translation can implement access to a member of a Java object with efficient record-field selection operations. For example, our translation could represent objects of the following Java classes:

class A {
  int i;
  A f( A x ) { i = 0; return x; }
}
class B extends A {
  int get_i() { return i; }
}
pt <: pt
t var <: t var
t1 <: t2,  t2 <: t3  ⟹  t1 <: t3
t1′ <: t1, ..., tn′ <: tn,  t <: t′  ⟹  t1 · · · tn → t <: t1′ · · · tn′ → t′
t <: t′  ⟹  t array <: t′ array
ft1 <: ft1′, ..., ftn <: ftn′  ⟹  {x1 : ft1 · · · xn : ftn · · · xn+m : ftn+m} <: {x1 : ft1′ · · · xn : ftn′}
ft1 <: ft1′, ..., ftn <: ftn′  ⟹  {{x1 : ft1 · · · xn : ftn · · · xn+m : ftn+m}} <: {{x1 : ft1′ · · · xn : ftn′}}
{x1 : ft1 · · · xn : ftn} <: {{x1 : ft1 · · · xn : ftn}}
(α <: α′ ⇒ t <: t′),  α ∉ t′,  α′ ∉ t  ⟹  µα.t <: µα′.t′

Fig. 1. Subtyping relation.
with the following IL types:

tA = µα. { tag : tag, i : int var, f : {{tag : tag}} × α → α }

tB = { tag : tag, i : int var, f : {{tag : tag}} × tA → tA, get_i : {{tag : tag}} → int }
(In fact the translated types are not quite this simple; see Section 3.) The type {{tag : tag}} plays the role of ⊤ discussed in the introduction since any record type containing a tag field is a subtype of this type. The Java typing rules permit an object of class B to be passed to methods like f that expect an A. Since tB <: tA, values of type tB can be passed to both IL functions f. A reference to any field of a record of type tA or tB is implemented as a fixed-offset access into the record. Since Java interfaces permit multiple inheritance, ordered record types cannot support the necessary subtyping for interface types. Hence our translation uses unordered record types to represent interfaces. Accessing a particular field of a record of unordered type is more expensive as record values with different field orders can belong to the same unordered record type. The field access operation for unordered record types determines the actual order of a value's fields from the initial tag field required of the unordered type. For example, consider the following Java interface and its corresponding IL type:

interface J {
  int get_i();
  A f( A x );
}
tJ = {{ tag : tag, get_i : {{tag : tag}} → int, f : {{tag : tag}} × tA → tA }}
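To illustrate the cost difference (our sketch, not the paper's runtime), an ordered-record access uses a compile-time slot index, while an unordered-record access must first consult the tag for the value's actual layout. The Tag record and its layout map are hypothetical:

    import java.util.Map;

    class FieldAccess {
        record Tag(String name, Map<String, Integer> layout) {}

        // e.x on an ordered record: the offset is a compile-time constant.
        static Object selectOrdered(Object[] rec, int staticOffset) {
            return rec[staticOffset];              // one indexed load
        }

        // e@x on an unordered record: slot 0 is always the tag, which maps
        // field names to this particular value's offsets.
        static Object selectUnordered(Object[] rec, String field) {
            Tag tag = (Tag) rec[0];
            return rec[tag.layout().get(field)];   // tag-directed lookup
        }
    }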
e ::= v                                       syntactic value
   |  let d* in e                             binding
   |  { (x : ft = f)* }                       record construction
   |  x                                       variable reference
   |  x := e                                  variable update
   |  e.x                                     ordered record field selection
   |  e.x := e                                ordered record field update
   |  e@x                                     unordered record field selection
   |  e@x := e                                unordered record field update
   |  e; e                                    sequencing
   |  e(e*)                                   procedure invocation
   |  r(e*)                                   primitive invocation
   |  if e then e else e                      conditional
   |  typecase e of [g as x ⇒ e]* [else e]    type conditional
   |  try e e                                 exception handler
   |  raise e                                 exception raise
   |  e[e]                                    array element selection
   |  e[e] := e                               array update for primitive types
   |  e[e] :={} e                             array update for record types

v ::= c                                       simple constant
   |  g                                       tag
   |  λ (x : t)* . e                          first-order procedure
   |  { (x : ft = v)* }                       record of values

f ::= e                                       initial value
   |  [e*]                                    array construction

d ::= x : vt = e                              value declaration
   |  rec [ (x : t = v)* ]                    set of recursive value declarations
   |  g ≈ t ≺: g*                             tag declaration

Fig. 2. Expression syntax.
If we amend class B to implement interface J, type tB does not change, and we have tB <: tJ . (Again, the translated types are not quite this simple; see Section 3.) Figure 2 specifies the expressions e, values v, and declarations d of our intermediate language, where g ∈ Tag are tags, c ∈ Const are basic constants, and r ∈ Prim are primitive operations. Constants, tags, and procedures are values, as well as records where all initializers are values. Primitive operations can only
appear in call position. Procedures are called by value, bind their arguments as usual, and must be first-order: the only free variables a procedure may contain are global variables bound by top-level let-expressions. A declaration di ≡ (x : vt = e) appearing in an expression let d1 · · · dn in e binds x of type vt in di+1 through dn and e. A recursive declaration di ≡ rec [ xa : ta = va · · · xz : tz = vz ] binds xa . . . xz of types ta . . . tz in all of va . . . vz and e. A tag declaration di ≡ g ≈ t ≺: g1 · · · gn introduces tag g and associates it with type t and tags g1 · · · gn. Tags g1 · · · gn are called supertags of g. Conversely, g is a subtag of g1 · · · gn. The translation places a tag in a record field named tag when the type a record was constructed with may need to be recovered by a language operation like typecase. In a record construction, a field type t var indicates that the field is mutable, but its initializer must be a value of type t. Similarly, declarations of type t var introduce mutable variables and must have initializers of type t. Mutable fields and variables are automatically "dereferenced" when accessed. (There are no values of type t var.) The expressions e.x and e@x access fields of records of ordered and unordered type, respectively. The expressions e.x := e and e@x := e update such fields. The unordered record operations e@x and e@x := e use the initial tag field of a record to determine the appropriate offset into the record. Ordinary if-expressions provide boolean conditionals, and typecase tests the tag of a record-valued expression. A typecase expression evaluates the first clause [g as x ⇒ e] for which g is a supertag of the record's tag. In the clause body e, x is bound to the record, but with a more precise type. The expression try e1 e2 evaluates e1 with e2 as an exception handler. If e1 raises no exception, its value is returned as the value of the try-expression. If e1 raises exception v, the expression e2(v) is evaluated and its value becomes the value of the try-expression. The expression raise e evaluates e to a record v and raises an exception. Since arrays can only appear within records in our IL, the three expressions for accessing and updating arrays actually operate on records. These operations retrieve or modify array elements associated with a record field named array. Another field named length stores an array's size. The assignment operation e1[e2] :={} e3 for arrays whose elements are records sets the element of e1.array at index e2 to the value of e3. Due to the covariant rule for array subtyping, this operation must also perform a runtime check to ensure that the type of the value of e3 is a subtype of the runtime array component type. Hence a third field named elemtag holds a tag representing the component type of the array. Since Java arrays are implicitly subtypes of the Java class Object, our translation places additional fields such as clone and getClass in records that represent arrays. We explain our rationale for this treatment of arrays below. IL expressions must obey a collection of type checking rules. To simplify the presentation, we describe these rules in two groups. Figure 3 defines the first group of rules, which concern simple expressions and procedures. The function D strips var off a type: D(ft) = t if ft = t var; ft otherwise.
TypeOf(c) = pt  ⟹  A ⊢ c : pt
A(x) = vt  ⟹  A ⊢ x : D(vt)
A[x1 ↦ t1] · · · [xn ↦ tn] ⊢ e : t  ⟹  A ⊢ λ x1 : t1 · · · xn : tn . e : t1 · · · tn → t
A(x) = t var,  A ⊢ e : t  ⟹  A ⊢ x := e : void
A ⊢ e0 : t1 · · · tn → t,  A ⊢ e1 : t1, ..., A ⊢ en : tn  ⟹  A ⊢ e0(e1 · · · en) : t
TypeOf(r) = pt1 · · · ptn → t,  A ⊢ e1 : pt1, ..., A ⊢ en : ptn  ⟹  A ⊢ r(e1 · · · en) : t
A ⊢ e1 : boolean,  A ⊢ e2 : t,  A ⊢ e3 : t  ⟹  A ⊢ if e1 then e2 else e3 : t
A ⊢ e1 : t1,  A ⊢ e2 : t2  ⟹  A ⊢ e1; e2 : t2
A ⊢ d1 ⇒ A1, ..., A + A1 + · · · + An−1 ⊢ dn ⇒ An,  A + A1 + · · · + An ⊢ e : t  ⟹  A ⊢ let d1 · · · dn in e : t
A ⊢ e : D(vt)  ⟹  A ⊢ (x : vt = e) ⇒ [x ↦ vt]
T(g) = t,  G(g) = {g1, . . . , gn},  t <: T(g1), ..., t <: T(gn)  ⟹  A ⊢ (g ≈ t ≺: g1 · · · gn) ⇒ [ ]
A[x1 ↦ t1 · · · xn ↦ tn] ⊢ v1 : t1, ..., A[x1 ↦ t1 · · · xn ↦ tn] ⊢ vn : tn  ⟹  A ⊢ rec [ x1 : t1 = v1 · · · xn : tn = vn ] ⇒ [x1 ↦ t1 · · · xn ↦ tn]
A ⊢ e : t,  t <: t′  ⟹  A ⊢ e : t′

Fig. 3. Typing rules for simple expressions.
A is a type assignment that maps variables to types. The rules also refer to two global maps T and G. The finite map T : Tag → Type associates types with tags, and the finite map G : Tag → P(Tag) associates sets of tags with tags. An IL expression e is typable if there exist maps T and G and a typing derivation concluding [ ] ⊢ e : t. Most of the typing rules for simple expressions are standard; we discuss only the exceptions. The last three rules produce environments for declarations. The rule for a tag declaration g ≈ t ≺: g1 · · · gn requires the global map T to associate g with type t, and the map G to associate g with the set {g1, . . . , gn}. T allows the type associated with g to be recovered by language operations such as typecase. G abstracts the Java type hierarchy and allows language operations such as typecase to test relations in this hierarchy. For soundness, the typing rule requires that the types associated with tags related in G be similarly related under subtyping; that is, if g is declared to be a subtag of g′, then T(g) <: T(g′). Figure 4 defines the typing rules for records and related expressions. We explain only the non-standard rules here. A tag has type tag, provided that T
g ∈ Dom(T),  g ∈ Dom(G)  ⟹  A ⊢ g : tag

A ⊢ e1 : ft1, ..., A ⊢ en : ftn  ⟹  A ⊢ {x1 : ft1 = e1 · · · xn : ftn = en} : {x1 : ft1 · · · xn : ftn},
provided that:
  – if x1 = tag and ft1 = tag, then e1 = g and T(g) = {x1 : ft1 · · · xn : ftn};
  – if xi = array and fti = t array, then ei = [e1 · · · em], xj = length, ej = m, and i < j;
  – if xi = array and fti = rt array, then xk = elemtag, ek = g′, T(g′) = rt, and k < i < j.

A ⊢ e : D(vt)  ⟹  A ⊢ e : vt
A ⊢ e1 : t, ..., A ⊢ en : t  ⟹  A ⊢ [e1 · · · en] : t array
A ⊢ e : {· · · x : ft}  ⟹  A ⊢ e.x : D(ft)
A ⊢ e : {{tag : tag, x : ft}}  ⟹  A ⊢ e@x : D(ft)
A ⊢ e1 : {· · · x : t var},  A ⊢ e2 : t  ⟹  A ⊢ e1.x := e2 : void
A ⊢ e1 : {{tag : tag, x : t var}},  A ⊢ e2 : t  ⟹  A ⊢ e1@x := e2 : void
A ⊢ e1 : {· · · array : t array · · · length : int},  A ⊢ e2 : int  ⟹  A ⊢ e1[e2] : t
A ⊢ e1 : {· · · array : pt array · · · length : int},  A ⊢ e2 : int,  A ⊢ e3 : pt  ⟹  A ⊢ e1[e2] := e3 : void
A ⊢ e1 : {· · · elemtag : tag · · · array : t array · · · length : int},  A ⊢ e2 : int,  A ⊢ e3 : t,  t <: {{tag : tag}}  ⟹  A ⊢ e1[e2] :={} e3 : void
A ⊢ e0 : {{tag : tag}},  A[x1 ↦ T(g1)] ⊢ e1 : t, ..., A[xn ↦ T(gn)] ⊢ en : t,  A ⊢ en+1 : t  ⟹  A ⊢ typecase e0 of [g1 as x1 ⇒ e1] · · · [gn as xn ⇒ en] [else en+1] : t
A ⊢ e1 : t,  A ⊢ e2 : {{tag : tag}} → t  ⟹  A ⊢ try e1 e2 : t
A ⊢ e : {{tag : tag}}  ⟹  A ⊢ raise e : void

Fig. 4. Typing rules for records and related expressions.
and G associate it with appropriate types and supertags. Record expressions receive ordered record types with several restrictions. First, if the first field is named tag and has type tag, then its initializer must be a tag g whose type in T is the type of the entire record. This ensures that a record’s type can be recovered from its tag. Second, a field may have an array initializer of length m if and only if the field’s name is array and there is a field named length whose
initializer is the constant m. This restriction ensures that the length field can be used for bounds checking accesses to the array. Third, if an array field is present of type rt array where rt is a record type, then the record must include a field named elemtag whose initializer is a tag corresponding to rt. Array update uses the elemtag field to perform its runtime type check. The third and fourth typing rules handle initializers for fields. The rules for array access and update require e1 to be a record containing array and length fields. The rule for update where the component type is a record type additionally requires an elemtag field. The rule for typecase requires that the expression being tested have a record type including a tag field. For each clause [gi as xi ⇒ ei], variable xi is bound in ei at type T(gi), since the typing rule for record construction ensures that any record containing tag gi will have type T(gi). Finally, the typing rules for exception constructs require that the exception be a tagged record, as the translation uses typecase within handlers to distinguish different exceptions. Provided that array access and update operations perform bounds checks, this type system is sound. But to achieve high-performance code, we need to lift array bounds checks out of loops or eliminate them entirely. Our IL is designed so that a safe array access operation can be replaced with a combination of an explicit test and a corresponding unsafe operation. For instance, we replace e1[e2] with

let a = e1
    i = e2
in if i ≥ 0 & i < a.length
   then unsafe a[i]
   else raise IndexOutOfBoundsException

The explicit tests so introduced can then be optimized as usual.
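For concreteness, the subtag test that typecase relies on can be pictured as a reachability check in the declared supertag map G. The helper below is our illustration, not code from the paper:

    import java.util.Map;
    import java.util.Set;

    class Tags {
        // typecase selects the first clause whose guard is a supertag of the
        // scrutinee's tag; with G mapping each tag to its declared supertags,
        // "g is a subtag of h" is reachability from g to h.
        static boolean isSubtag(String g, String h, Map<String, Set<String>> G) {
            if (g.equals(h)) return true;            // every tag is a subtag of itself
            for (String sup : G.getOrDefault(g, Set.of()))
                if (isSubtag(sup, h, G)) return true;
            return false;
        }
    }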
3 Translation
The translation from Java to our intermediate language of records and procedures:

• replaces method dispatch with simple record accesses and a first-order procedure call;
• passes object state explicitly through this parameters that are treated no differently from any other function parameter;
• supports efficient implementation of member access by representing objects as ordered records;
• replaces Java's complex mechanisms for name resolution (visibility keywords, overloading, super, inner classes, and packages) with ordinary static scoping;
• flattens the class inheritance hierarchy by explicitly including record fields defined by superclasses;
• expresses method sharing among objects of the same class by placing procedures that implement the methods in a shared record;
• accommodates subtyping between Java classes by assigning type ⊤ to this and using typecase to recover the appropriate type;
class Point {
  int x;
  int y = 3;
  Point() { x = 2; }
  public void mv( int dx, int dy ) { x += dx; y += dy; }
  public boolean eq( Point other ) {
    return (x == other.x && y == other.y);
  }
  Point like() {
    Point p = new Point();
    p.x = x; p.y = y;
    return p;
  }
}

class ColorPoint extends Point {
  int c;
  public boolean eq( Point other ) {
    if ( other instanceof ColorPoint )
      return super.eq( other ) && c == ((ColorPoint) other).c;
    else
      return false;
  }
  ColorPoint sc( int c ) { this.c = c; return this; }
  ColorPoint() { super(); }
  ColorPoint( int c ) { super(); this.c = c; }
}
Fig. 5. Example Java classes.
• uses type tags on records to support runtime type tests and casts;
• accommodates interface subtyping by using unordered record types;
• lifts static methods and constructor and initialization procedures out of classes and represents them as top-level procedures;
• expresses class initialization as explicit tests and calls that can be optimized;
• replaces implicit conversions on primitive types with explicit operations, eliminates widening conversions in favor of implicit subtyping, and expresses narrowing conversions with typecase;
• expresses local control constructs (for, while, break, etc.) with uses of tail-recursive procedures;
• places lock and unlock instructions where control enters or leaves synchronized blocks.

In this section, we illustrate some aspects of this translation with examples. All Java objects implicitly extend class Object and hence have members such as clone and getClass, but we omit such members in these examples to simplify the presentation.

Figure 5 presents Java code defining two classes Point and ColorPoint. In the example, a Point object contains x and y coordinate fields, and methods to move a point (mv), test whether two points are the same (eq), and clone a new point from the current one (like). Class ColorPoint inherits from Point and adds a color field c. ColorPoint overrides the eq method of Point and also provides a
tP = µα. { tag: tag,
           methods: { mv: {{tag : tag}} × int × int → void,
                      eq: {{tag : tag}} × α → boolean,
                      like: {{tag : tag}} → α },
           x: int var,
           y: int var }

tC = µβ. { tag: tag,
           methods: { mv: {{tag : tag}} × int × int → void,
                      eq: {{tag : tag}} × tP → boolean,
                      like: {{tag : tag}} → tP,
                      sc: {{tag : tag}} × int → β },
           x: int var,
           y: int var,
           c: int var }
Fig. 6. Types for Point and ColorPoint objects in the IL.
new method sc to set its color. The ColorPoint class declares two constructors. The first initializes a new ColorPoint object with a default color; the second sets the color field explicitly to the color supplied as an argument. Figure 6 presents the record types corresponding to Point and ColorPoint. In general, records corresponding to objects include a tag field, a methods field, and fields for the instance variables, both explicit and inherited. The methods field contains a record of functions corresponding to the instance methods of the class, both explicit and inherited. Initially, this record is shared by all objects of the class, although optimizations may replace it with a record of specialized functions in certain objects. The functions take an additional first argument which is the object record itself. The IL types do not include fields for constructors or static methods as these procedures are called directly without selecting them from an object. The types of mv and eq in tC and tP are the same. This is because Java requires that an overriding method be of the same type as the overridden one. Since tC has at least the same fields as tP , and since the members in the shared prefix have the same type, we have tC <: tP . Hence a record denoting a ColorPoint can be passed to a function that expects a record denoting a Point. A program in our intermediate language consists of a set of mutually recursive values corresponding to methods, constructors, and method tables. Other than references to other top-level definitions, these procedures have no free variables. Notably, this is supplied as an explicit argument, unlike its treatment in
let tagP ≈ tP
rec [ newP: → tP = λ.
        { tag: tag = tagP, methods: . . . = Pmethods,
          x: int var = 0, y: int var = 0 }
      initP: tP → void = λthis:tP .
        this.y := 3; this.x := 2
      Pmethods: . . . = { mv: . . . = mvP, eq: . . . = eqP, like: . . . = likeP }
      mvP: {{tag : tag}} × int × int → void =
        λthis:{{tag : tag}}, dx:int, dy:int .
          typecase this of
            [tagP as this ⇒ this.x := this.x + dx; this.y := this.y + dy]
            [else raise CompilerError]
      eqP: {{tag : tag}} × tP → boolean =
        λthis:{{tag : tag}}, other:tP .
          typecase this of
            [tagP as this ⇒ if not(this.x == other.x) then false
                            else this.y == other.y]
            [else raise CompilerError]
      likeP: {{tag : tag}} → tP =
        λthis:{{tag : tag}} .
          typecase this of
            [tagP as this ⇒ let it = newP() in initP(it); it]
            [else raise CompilerError]
    ]
in . . .
Fig. 7. Translation of Point class.
Java and other object-based languages. This property facilitates code-movement optimizations on our IL such as inlining. Figure 7 shows the translation of the Point class. We elide some types that are obvious from context. The translation generates a procedure newP for constructing new Point objects, a procedure initP for initializing them, a record of functions corresponding to the methods of the class, and the functions themselves. Each method function dispatches on the type of its first argument. A tag encodes the static type of an object; this type is examined at runtime using typecase. Thus, if mv is invoked by an object that is not a Point, the argument tag supplied in the call will not be a subtag of tagP, and a runtime exception will be raised. Such an error will not be caught at compile-time because the type expected by mv for this argument is ⊤ = {{tag : tag}}. Indeed, ⊤ is the self type expected by all translated methods. Figure 8 shows the translation of the ColorPoint class. An interesting aspect of ColorPoint's definition is its use of super. Calls to super in ColorPoint constructors are translated to calls to initP. The call super.eq( other ) becomes a direct call to eqP since Java's semantics dictate that such uses of super bypass the usual
let . . . code for Point . . . in
let tagC ≈ tC ≺: tagP
rec [ newC: → tC = λ.
        { tag: tag = tagC, methods: . . . = Cmethods,
          x: int var = 0, y: int var = 0, c: int var = 0 }
      initC1: tC → void = λthis:tC . initP(this)
      initC2: tC × int → void = λthis:tC , c:int . initP(this); this.c := c
      Cmethods: . . . = { mv: . . . = mvP, eq: . . . = eqC, like: . . . = likeP,
                          sc: . . . = scC }
      eqC: {{tag : tag}} × tP → boolean =
        λthis:{{tag : tag}}, other:tP .
          typecase this of
            [tagC as this ⇒
              if (typecase other of [tagC as other ⇒ true] [else false])
              then (if not(eqP(this, other)) then false
                    else this.c == (typecase other of
                                      [tagC as other ⇒ other]
                                      [else raise CastException]).c)
              else false]
            [else raise CompilerError]
      scC: {{tag : tag}} × int → tC =
        λthis:{{tag : tag}}, c:int .
          typecase this of
            [tagC as this ⇒ this.c := c; this]
            [else raise CompilerError]
    ]
in . . .
Fig. 8. Translation of ColorPoint class.
dynamic method dispatch. Uses of typecase capture the runtime behavior of instanceof and narrowing conversions. In particular, (typecase other . . . ) takes the tagC branch if other.tag is tagC or any subtag of tagC. All records containing such a tag are guaranteed to represent ColorPoints, or belong to subclasses of ColorPoint. Figure 9 illustrates a Java interface Widget and its corresponding type tW in our IL. Since the classes that implement Widget may have methods in different orders, the methods field of tW has an unordered record type. If we amend Point to implement Widget, the translated types tP and tC for Point and ColorPoint, also shown in Figure 9, include a tag field in their methods record to achieve the subtyping tC <: tP <: tW .
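To make the calling convention concrete, here is a small executable model (ours, not the paper's IL) of objects as records with a shared methods table and an explicit self argument; all class names are hypothetical:

    import java.util.function.ObjIntConsumer;

    class DispatchDemo {
        static class Methods {
            ObjIntConsumer<PointRec> mvX;  // mvX(self, dx): stands in for mv
        }
        static class PointRec {
            int x, y;
            Methods methods;               // shared by all Points
        }
        public static void main(String[] args) {
            Methods shared = new Methods();
            shared.mvX = (self, dx) -> self.x += dx;  // one copy of the code
            PointRec p = new PointRec();
            p.methods = shared;
            p.methods.mvX.accept(p, 5);    // dispatch = field selection + call
            System.out.println(p.x);       // prints 5
        }
    }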
interface Widget {
  boolean eq( Point other );
  void mv( int dx, int dy );
}

tW = { tag: tag,
       methods: {{ tag: tag,
                   eq: {{tag : tag}} × tP → boolean,
                   mv: {{tag : tag}} × int × int → void }} }

tP = µα. { tag: tag,
           methods: { tag: tag,
                      mv: {{tag : tag}} × int × int → void,
                      eq: {{tag : tag}} × α → boolean,
                      like: {{tag : tag}} → α },
           x: int var,
           y: int var }

tC = µβ. { tag: tag,
           methods: { tag: tag,
                      mv: {{tag : tag}} × int × int → void,
                      eq: {{tag : tag}} × tP → boolean,
                      like: {{tag : tag}} → tP,
                      sc: {{tag : tag}} × int → β },
           x: int var,
           y: int var,
           c: int var }
Fig. 9. Interface Widget and types for Widget, Point, and ColorPoint.
4 Related Work
Optimizations for object-oriented languages, type systems for object-oriented languages, and typed intermediate languages are three topics that have been investigated independently by other researchers and relate to the work presented here.

Optimizations for Objects

An important issue addressed by optimizing compilers for object-oriented languages is reducing the overhead introduced by encoding polymorphism. Statically-typed object-oriented languages such as Java support polymorphism
through subclassing. Subclasses share implementations with their parents. Because methods can be overridden to provide alternative implementations, the exact method invoked at a call site may not be easily determined at compile time. Indeed, without aggressive analyses, compilers are unlikely to determine the control flow of a program that makes any significant use of inheritance. On the other hand, relying only on intraprocedural optimization may not be effective because methods are usually short and make frequent calls to other methods.

There are two main ways of eliminating the dispatch at a call x.f(. . .). Either (i) the value of the receiver x can be of only one type T, in which case we can call T's method f directly, or (ii) x can be of any of the types in a set S, but all types in S share the same implementation of f, in which case we can call f directly. Concrete type inference and class hierarchy analysis are two well-known analyses that have been devised to address the issue of dispatch elimination.

Concrete Type Inference [14,17,9,10] is a form of flow analysis that identifies, for each expression, the set of possible types its values may belong to. When a receiver is found to have only one possible type, the method dispatch can be replaced by a direct function call to that type's method.

Class Hierarchy Analysis [9,4,10] is a program analysis that, based solely on the program's class structure, identifies a set of types S that share the same implementation of method f. An example of such a set is the set containing class C and all subclasses of C that do not override f. Such sets can be computed either from programmers' annotations ("final" in Java) or from inspection of the complete class hierarchy. The analysis can be adapted to work, although less beneficially, in the presence of separate compilation, where implementations are separated from interfaces. In such cases it is still possible to eliminate method dispatch at link time [11].

Even if the above analyses are unable to identify a call site as calling a unique function, it may still be possible to optimize the program by using a type-case statement with execution branching on the exact type of the value to code specific to each possible type [6]. Message splitting is a variation of this technique which consists of duplicating not only the method call on each branch of the type case, but subsequent statements as well, whenever this enables further optimizations.

Dynamically typed languages, and to a lesser extent statically typed languages, could benefit from type feedback—information about the set of concrete types that a receiver is observed to have during a program's execution. Comparison of type feedback with either class hierarchy analysis [9] or concrete type inference [12] shows it to be a valuable technique.

In contrast to our typed intermediate language, the intermediate language on which these optimizations have typically been performed is an untyped control-flow graph. Low-level nodes in the graph are used to represent arithmetic operations, assignments, conditional branches, etc. High-level nodes are used to represent the semantics of method calls [6]. High-level nodes help the compiler postpone code-generation decisions for method dispatch until after optimizations aimed at replacing method calls with direct function calls are performed. Remaining method dispatches are then translated into more primitive operations,
and the code is then subject to further intra-procedural optimizations. This approach is well-suited for implementing dynamically typed languages, where a method dispatch can be a rather heavy-weight construct. On the other hand, in a statically typed language with single inheritance such as Java, method dispatch consists of fetching a function pointer from a record at a known offset, and calling that function. We believe that in such a setting, an intermediate language based on first-order functions and records is a viable alternative. All the complicated constructs of the source language, including method dispatch, are translated into simpler operations. Flow analysis techniques used to drive interprocedural optimizations for functional languages can be directly applied to our intermediate language and need not be modified to understand the nuances of method dispatch. By having available the function tables constructed for each type, analyses can still compute a reasonably precise conservative approximation to the set of methods called at a call site, facilitating optimizations like inlining.

Type Systems for Objects

In designing our typed IL for Java, we considered and rejected several alternatives. A naive attempt to translate Java into a record-based IL uses the same language and type system as ours, but gives self parameters the object's record type rather than ⊤. That is, mv, eq, and like in class Point all expect a value of type tP for their first argument. This solution fails because, translating ColorPoint the same way, we no longer have tC <: tP due to contravariant subtyping of functions. Hence many Java-typable programs are not typable under such a translation. Several object calculi have most of the language features found in Java and support the necessary subtyping [1,5]. However, in these calculi, self parameters are implicitly bound, and method dispatch is not broken down into separate function selection and procedure call mechanisms. Consequently it would be difficult to adapt existing techniques for optimizing procedural languages to such calculi. Moreover, the complexity of these calculi makes them inappropriate as the foundation for an IL. Finally, languages that employ a split-self semantics represent an object as a pair of a record containing the object's state and a record containing the object's code [16]. They use existential types to achieve subtyping, and include pack and unpack operations to manipulate values of existential type. The encoding of objects in this style is complex and unwieldy for use in a compiler.

Typed Intermediate Languages

Several advanced functional language implementations have embraced the use of a typed intermediate language to express optimizations and transformations [18,20]. The motivation for using a typed intermediate language holds equally well in the context of a Java implementation. Like most functional languages, Java has a rich type system and requires aggressive compiler optimization to achieve acceptable performance. However, while the intermediate language type
systems developed for functional language implementations have been based on a polymorphic λ-calculus, the type system in our IL more closely reflects features found in Java. Thus, it provides record subtyping to express single inheritance, unordered record types to express interfaces, and a tag type to express runtime type inspection. To summarize, our typed intermediate language for Java serves three major roles: (1) it gives us increased confidence in the correctness of optimizations; (2) it exposes salient properties of an object’s representation that may be then optimized; and (3) it facilitates type-specific decisions throughout the compiler and runtime system. We are confident that a typed intermediate language of this kind will be instrumental in realizing a high-performance Java implementation.
References

1. Abadi, M., and Cardelli, L. A Theory of Objects. Springer-Verlag, 1996.
2. Abadi, M., Cardelli, L., and Viswanathan, R. An interpretation of objects and object types. In Proceedings of the Conference on Principles of Programming Languages (1996), pp. 392–406.
3. Appel, A. W. Compiling with Continuations. Cambridge University Press, 1991.
4. Bacon, D., and Sweeney, P. Fast static analysis of C++ virtual function calls. In OOPSLA'96 Conference on Object-Oriented Programming Systems, Languages, and Applications (1996).
5. Bruce, K. B., Cardelli, L., and Pierce, B. C. Comparing object encodings. In Theoretical Aspects of Computer Software (TACS), Sendai, Japan (Sept. 1997).
6. Chambers, C. The Design and Implementation of the SELF Compiler, an Optimizing Compiler for Object-Oriented Programming Languages. PhD thesis, Stanford University, March 1992.
7. Cytron, R., Ferrante, J., Rosen, B. K., Wegman, M. N., and Zadeck, F. K. Efficiently computing static single assignment form and the control dependence graph. TOPLAS 13, 4 (October 1991), 451–490.
8. Dean, J., DeFouw, G., Grove, D., Litvinov, V., and Chambers, C. Vortex: An optimizing compiler for object-oriented languages. In OOPSLA'96 Conference on Object-Oriented Programming Systems, Languages, and Applications (1996), 83–100.
9. Dean, J., Grove, D., and Chambers, C. Optimization of object-oriented programs using static class hierarchy analysis. In ECOOP (1995).
10. Diwan, A., Moss, E., and McKinley, K. Simple and effective analysis of statically-typed object-oriented programs. In OOPSLA'96 Conference on Object-Oriented Programming Systems, Languages, and Applications (1996).
11. Fernandez, M. F. Simple and effective link-time optimization of Modula-3 programs. In Proceedings of the Conference on Programming Language Design and Implementation (1995), 103–115.
12. Hölzle, U., and Agesen, O. Dynamic versus static optimization techniques for object-oriented languages. In OOPSLA'95 Conference on Object-Oriented Programming Systems, Languages, and Applications (1995).
13. Kranz, D., Kelsey, R., Rees, J. A., Hudak, P., Philbin, J., and Adams, N. I. Orbit: An optimizing compiler for Scheme. In ACM SIGPLAN Conference Proceedings (1986).
14. Palsberg, J., and Schwartzbach, M. I. Object-oriented type inference. In OOPSLA'91 Conference on Object-Oriented Programming Systems, Languages, and Applications (1991), 146–161.
15. Peyton-Jones, S., Launchbury, J., Shields, M., and Tolmach, A. Bridging the gulf: A common intermediate language for ML and Haskell. In Proceedings of the Conference on Principles of Programming Languages (1998), ACM Press, pp. 49–61.
16. Pierce, B. C., and Turner, D. N. Simple type-theoretic foundations for object-oriented programming. Journal of Functional Programming 4, 2 (Apr. 1994), 207–247. A preliminary version appeared in Principles of Programming Languages, 1993, and as University of Edinburgh technical report ECS-LFCS-92-225, under the title "Object-Oriented Programming Without Recursive Types".
17. Plevyak, J., and Chien, A. A. Precise concrete type inference for object-oriented languages. In OOPSLA'94 Conference on Object-Oriented Programming Systems, Languages, and Applications (1994), 324–340.
18. Shao, Z. Flexible representation analysis. In Proceedings of the International Conference on Functional Programming (1997), ACM Press, pp. 85–98.
19. Steele Jr., G. L. Rabbit: A compiler for Scheme. Master's thesis, Massachusetts Institute of Technology, May 1977.
20. Tarditi, D., Morrisett, G., Cheng, P., Stone, C., Harper, R., and Lee, P. TIL: A type-directed optimizing compiler for ML. In Proceedings of the Conference on Programming Language Design and Implementation (1996), ACM Press, pp. 181–192.
Formalizing Resource Allocation in a Compiler

Peter Thiemann

Department of Computer Science, University of Nottingham, Nottingham NG7 2RD, England
[email protected]
Abstract. On the basis of an A-normal form intermediate language we formally specify resource allocation in a compiler for a strict functional language. Here, resource is to be understood in the most general sense: registers, temporaries, data representations, etc. All these should be (and can be, but have never been) specified formally. Our approach employs a non-standard annotated type system for the formalization. Although A-normal form turns out not to be the ideal vehicle for this investigation, we can prove some basic properties using the formalization.
1 Resource Allocation
Resource allocation in the back end of a compiler is often poorly specified. More often than not, register allocation, administration of temporaries, and representation conversions are only specified procedurally [1,6]. Code generators based on such algorithmic specifications can be hard to maintain or prove correct. Even the authors of such code generators are sometimes not aware of all the invariants that must be preserved. Therefore, we investigate a declarative approach to resource allocation in the back end of a compiler. The approach is based on an annotated type system of implementation types that makes resource allocation and conversion explicit. The use of type conversion rules enables us to defer memory and register allocation until the context of use forces an allocation. For example, a constant initially leads to an annotation of the type of the variable holding the constant without generating any code. The annotation holds the "immediate value" of the constant. There are type conversion rules that change the annotation from "immediate value" to "value in register k" and generate a corresponding piece of code if the context of use requires the value of the variable in a register. Further conversion rules create or remove indirection. The indirection rules move a value to memory and change the annotation to "value in memory at address Rk + i", where Rk is an address in register k and i is an offset. The indirection-removing rules work the other way round. The indirection rules usually apply to the arguments of function calls or to values that are put into data structures. Spilling the contents of registers is another application of the last kind of conversion rules. Other rules may make direct use of immediate values, for example when generating instructions with immediate operands.
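As an illustrative model of these location annotations (ours, not the paper's formal syntax, which appears in Fig. 3 below), the alternatives can be captured as a small Java datatype:

    // Where a value lives determines whether code must be emitted to access it.
    sealed interface Loc permits NotAllocated, Imm, Reg, Mem {}
    record NotAllocated() implements Loc {}             // present only in the type
    record Imm(int n) implements Loc {}                 // "immediate value n"
    record Reg(int k) implements Loc {}                 // "value in register k"
    record Mem(int offset, Loc base) implements Loc {}  // value in memory at base + offset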
The resulting high degree of flexibility allows for arbitrary intra-module calling conventions. Since the calling convention is part of every function's type, each function "negotiates" its convention with all its call sites. Contrast this with the algorithm used in the SML/NJ compiler [2] where the first call encountered by the code generator determines the calling convention for a procedure. Obviously, this is one pragmatic way of negotiating, but surely not a declarative one (nor a democratic one). External functions can have arbitrary calling conventions, too, as long as their implementation type is known. If the external functions are unknown, any standard calling convention (including caller-saves/callee-saves registers) can be enforced just by imposing a suitable implementation type. The same holds for exported functions, where the only requirement is that their implementation type is also exported, for example, in an interface file. Implementation types can also model some other worthwhile optimizations. For example, a lightweight closure does not contain all free variables of a function. It can only be used correctly if all variables that it is not closed over are available at all call sites. Implementation types can guarantee the correctness of a variant of lightweight closure conversion (cf. [20]). In our case, this conversion does not take place at the level of the source language; it rather happens while translating to actual machine code. The translation ensures that the values that are not put into the closure are available at all call sites.

1.1 Overview
In the next section, we define the source language, its operational semantics, the implementation type language, and the target language. The introduction and discussion of the typing rules is the subject of Section 3. Section 4 documents some properties of the system. Finally, we discuss related work (Sec. 5) and draw conclusions (Sec. 6).
2 Language
We have chosen a simply typed lambda calculus in A-normal form, a typical intermediate language used in compilers, as the starting point of our investigation. Compiling with A-normal forms [5] is said to yield the principal benefits of compiling with continuations (explicit control flow, naming of intermediate results, making continuations explicit) without incurring the overhead of actually transforming the program to continuation-passing style and without complicating the types in the intermediate language.

2.1 Terms
We strengthen our requirements somewhat with respect to the usual definition of A-normal form. Figure 1 defines the terms of restricted A-normal form. There are computation terms a and value terms v. Value terms are integer constants n, variables x, or lambda abstractions λx.a. Computation terms either sequentialize computations let x = . . . in a or they return a result, which can either be the value of a variable or it can take the form of a tail call to some function. Usually, A-normal form [5] only requires the arguments of applications and primitives to be values v. Restricted A-normal form requires variables x in all these places. With this restriction, no resource allocation occurs "inside" of a term and resource conversions can be restricted to occur between some let and its body, without lack of generality.

a ::= let x = v in a | let x = x @ x in a | let x = x + x in a | x | x @ x
v ::= n | x | λx.a

Fig. 1. Restricted A-normal form: terms
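As a small illustration (our example, in the paper's notation), the source term (λx. x + 1) 2 must name every intermediate value before it is in restricted A-normal form:

let f = λx. (let one = 1 in let y = x + one in y) in
let two = 2 in
f @ two

Both the operand of the application and both arguments of + are variables, as the grammar requires.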
2.2 Operational semantics
The semantics is defined by a fairly conventional CEK machine (see Fig. 2). A machine state is a triple (a, ρ, κ) where

- a ∈ Term is a term in A-normal form,
- ρ ∈ Env = Var ⇀ Val is an environment, and
- κ ∈ K is a continuation, where K = Void + Env × Var × Term × K.

Here, partial functions are denoted by ⇀, ρ|F restricts the domain of ρ to F, Void is a one-element set, and + denotes disjoint union of sets. A value ∈ Val is either Num(n) or Fun(ρ, λy.a) where ρ ∈ Env and λy.a ∈ Term. Inspection of the last rule reveals that the semantics enforces proper tail recursion, because the function call in tail position does not create a new continuation. The transitions that state additional constraints are undefined if the constraints are not met.
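For example (our illustration, following the rules of Fig. 2): evaluating let x = 2 in let y = 3 in let z = x + y in z from the empty environment proceeds as

(let x = 2 in . . . , ∅, void) ↦ (let y = 3 in . . . , [x ↦ Num(2)], void) ↦ (let z = x + y in z, [x ↦ Num(2), y ↦ Num(3)], void) ↦ (z, [. . . , z ↦ Num(5)], void).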
2.3 Types
Figure 3 defines implementation types, which form an extension of simple types. In an implementation type, each type constructor carries a location l. If the location is ∅ ("not allocated") then the information corresponding to the type constructor is only present in the type. For example, (going beyond the fragment considered in this paper) if the product type constructor × carries the annotation ∅ then only its components are allocated as prescribed by their locations; the pair itself is not physically represented. If the location is imm n ("immediate value n") then the value corresponding to the type carrying this location is known to be the integer n. The name comes from the immediate addressing mode that is present in many architectures, and
(let x = n in a, ρ, κ)       ↦  (a, ρ[x ↦ Num(n)], κ)
(let x = y in a, ρ, κ)       ↦  (a, ρ[x ↦ ρ(y)], κ)
(let x = λy.a′ in a, ρ, κ)   ↦  (a, ρ[x ↦ Fun(ρ|FV(λy.a′), λy.a′)], κ)
(let x = w @ z in a, ρ, κ)   ↦  (a′, ρ′[y ↦ ρ(z)], (ρ|FV(a)\{x}, x, a, κ))   if ρ(w) = Fun(ρ′, λy.a′)
(let x = w + z in a, ρ, κ)   ↦  (a, ρ[x ↦ Num(m + n)], κ)                   if ρ(w) = Num(m) and ρ(z) = Num(n)
(z, ρ′, (ρ, x, a, κ))        ↦  (a, ρ[x ↦ ρ′(z)], κ)
(w + z, ρ′, (ρ, x, a, κ))    ↦  (a, ρ[x ↦ Num(m + n)], κ)                   if ρ′(w) = Num(m) and ρ′(z) = Num(n)
(w @ z, ρ, κ)                ↦  (a′, ρ′[y ↦ ρ(z)], κ)                        if ρ(w) = Fun(ρ′, λy.a′)

Fig. 2. Operational semantics
τ ::= (σ; l)
σ ::= int | τ2 −(F,P,M,F′,k)→ cont l τ1 | cont F
l ::= ∅ | imm n | A reg n
A ::= ε | mem(i, A)

Fig. 3. Syntax of implementation types
immediate values are expected to take part in generating instructions using immediate addressing. If the location is reg k then the value of that type is resident in register k. In addition, the register might hold an indirection, i.e., the address of a block of memory where the value is stored at some offset i: mem(i, reg k). In general, this indirection step may be repeated an arbitrary number of times, which is expressed by mem(i, A reg k). There are two syntactic categories for types. τ ranges over implementation types, i.e., τ is a pair of a "stripped" implementation type σ and the location of its top-level type constructor. For this paper, σ ranges over int, the type of integers, and τ2 −(F,P,M,F′,k)→ cont l τ1, the type of functions that map objects of type τ2 to objects of type τ1 involving a continuation closure at location l, and cont F, the type of a continuation. The annotation F, P, M, F′, k on the function arrow is reminiscent of effect systems [7,11]. It determines the latent resource usage of the function, which becomes effective when the function is called. It is explained in Section 3 together with the judgements of implementation typing. The last alternative, σ = cont F, is the type of a continuation identifier. This type carries the location information of the current continuation, which would otherwise be lost (see Sec. 3).
2.4 Additional conventions
The architecture of a real processor places certain limits on the use of registers. For example, processors may have

- dedicated floating-point registers;
- special address registers ("pointers" to closures and tuples);
- special register(s) for continuations;
- special register(s) for condition codes.

In addition, the number of such registers is limited. These restrictions are modeled by a function Regs
Regs : TypeConstructor → P(RegisterNames) that maps a type constructor to a set of register names (which might be represented by integers). Occasionally, we apply Regs to a stripped implementation type σ when it should be applied to the top-level type constructor of σ. We do not define Regs here since it depends on the particular architecture that we want to generate code for.
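A possible instance of Regs for a small machine might look as follows; the register numbering and type-constructor keys here are our own assumptions, purely for illustration:

    import java.util.Map;
    import java.util.Set;

    class RegsExample {
        // Integers may live in general-purpose registers 0-7, function values
        // (code/closure pointers) in address registers 8-9, and continuations
        // only in a dedicated register 10.
        static final Map<String, Set<Integer>> REGS = Map.of(
            "int",  Set.of(0, 1, 2, 3, 4, 5, 6, 7),
            "->",   Set.of(8, 9),
            "cont", Set.of(10));
    }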
2.5 Target code
The target code of the translation is an assembly language for an abstract RISC processor. It has the following commands, expressed in a rather suggestive way, with Rk denoting a register reference and Ma denoting a memory reference. Here, k, j ∈ RegisterNames, a, i are memory addresses for data, t is a symbolic label for a code address, and n is an integer.

t :                      label declaration
Rk := n                  load numeric constant
Rk := t                  load address constant
Rk := Ri + Rj            arithmetic operation
Rk := n + Rj             arithmetic operation
Rk := Mi+Rj              load indirect with offset
Mi+Rj := Rk              store indirect with offset
Rk := Allocate(n)        memory allocation
Goto t                   unconditional jump
Goto Ri                  unconditional indirect jump
The infix operator ";" performs concatenation of code sequences. We identify singleton code sequences with single instructions. For simplicity, we assume that all data objects have a standard representation of the same size (which might be a pointer). The state of the abstract processor is a triple (C, R, M ) where C is a code sequence, R is the register bank (a mapping from a finite set of register names to data), and M is the memory (a mapping from an infinite set of addresses to data). The program store, which maps labels (code addresses) to code sequences, is left
implicit. The instruction Allocate(n) returns the address of a contiguous block of memory of size n. It guarantees that there is no overlap with previously allocated blocks, i.e., it never returns the same address twice. Some of our proofs exploit this guarantee by relying on the uniqueness of data addresses for identification. In practice there will be a garbage collector that maps the infinite address space into a finite one, which removes old unreachable addresses from the system.

3 Typing
The typing judgement is reminiscent of that of an effect system [7, 11]. The typing process determines a translation to abstract assembly code as defined above. Therefore, we use a translation judgement Γ, P, F, S ⊢ a : τ; M, F′, C to describe both together. In every judgement,

- Γ is a type assumption, i.e., a list of pairs x : τ. By convention, type assumptions are extended by appending a new pair on the right, as in Γ, x : τ. The same notation Γ, x : τ also serves to extract the rightmost pair from a type assumption.
- P is a set of preserved registers. The translation guarantees that all registers k ∈ P hold the same value after evaluation of a as before, but during evaluation of a these values may be spilled and register k may hold different values temporarily. Members of P correspond to callee-saves registers.
- F, F′ are sets of fixed registers. The translation guarantees that a register k ∈ F is not used as long as there is some reference to it in the type assumption or in the context modeled by F′. Furthermore, it expects that the context of a handles the registers mentioned in F′ in the same way. Members of F must not be spilled. However, if there is no reference remaining to some k ∈ F then k may be removed from F. The main use of F and F′ is lightweight closure conversion and avoiding the allocation of closures altogether. In both cases, the type assumption contains a variable w of type τ2 −(F1,P1,M1,F1′,k1)→cont l τ1 where F1 describes the components of the closure that have not been allocated (i.e., they reside in registers drawn from F1). Consequently, F1 ⊆ F must hold at a call site a = let x = w @ z in a′ so that all registers in F1 contain the correct values.
- S is a list of reloads of the form (k, i1 ... ip) · ... where register k points to a spill area and i1 through ip are the spilled registers. The notation ε is used for the empty list of reloads, i.e., when all values are either implicit or reside in registers: S = ε means that nothing is currently spilled.
- M is a set of registers that are possibly modified while evaluating a.
- C is code of the target machine (see Sec. 2.5).
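To summarize the shape of the judgement, the following sketch bundles its components into OCaml records. The names, the string placeholders for types and code, and the split into inputs and outputs are our own illustration of the bookkeeping, not the paper's definitions.

    module RegSet = Set.Make (Int)

    (* The translation judgement  Γ, P, F, S ⊢ a : τ; M, F', C. *)
    type judgement_in = {
      gamma : (string * string) list;   (* Γ: ordered assumptions x : τ *)
      preserved : RegSet.t;             (* P: callee-saves registers *)
      fixed : RegSet.t;                 (* F: must not be spilled while referenced *)
      reloads : (int * int list) list;  (* S: pending reloads (k, [i1; ...; ip]) *)
    }

    type judgement_out = {
      modified : RegSet.t;              (* M: registers possibly modified by a *)
      fixed_out : RegSet.t;             (* F': fixed registers handed to the context *)
      code_out : string list;           (* C: emitted target code (placeholder) *)
    }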
Before we start discussing the typing rules proper, we need to define the set of registers referenced from a type assumption.

Definition 1. The reference set of a location, type, or type assumption is the set of registers that the location, type, or type assumption refers to:
- Refer ε = ∅,  Refer (imm n) = ∅,  Refer (reg n) = {n};
- Refer (int; l) = Refer l;
- Refer (τ2 −(F,P,M,F′,k)→cont l′ τ1; l) = Refer l ∪ F;
- Refer (cont F; l) = Refer l ∪ F;
- Refer Γ = ⋃_{x:τ ∈ Γ} Refer τ.
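A direct reading of these equations as a recursive function might look as follows. The location and type grammars are a simplified rendering of the paper's implementation types (in particular, only the register annotation F of function and continuation types is kept).

    module RegSet = Set.Make (Int)

    type location =
      | Imm of int                 (* immediate: refers to no register *)
      | Reg of int                 (* register n: refers to {n} *)
      | Mem of int * location      (* mem(i, l): refers to whatever l refers to *)

    type ty =
      | Int of location                       (* (int; l) *)
      | Cont of RegSet.t * location           (* (cont F; l) *)
      | Fun of RegSet.t * ty * ty * location  (* annotated function type, keeping F *)

    let rec refer_loc = function
      | Imm _ -> RegSet.empty
      | Reg n -> RegSet.singleton n
      | Mem (_, l) -> refer_loc l

    let refer_ty = function
      | Int l -> refer_loc l
      | Cont (f, l) -> RegSet.union (refer_loc l) f
      | Fun (f, _, _, l) -> RegSet.union (refer_loc l) f

    let refer_assumption gamma =
      List.fold_left (fun acc (_, t) -> RegSet.union acc (refer_ty t))
        RegSet.empty gamma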
3.1   Typing rules
The typing rules are organized into context rules that manipulate type assumptions, value rules that provide typings for variables and constants, representation conversion rules, computation rules that deal with let expressions, and return rules that describe returning values from function invocations. Of these rules, the context rules and the conversion rules are nondeterministic; the remaining rules are tied to specific syntactic constructs, i.e., they are syntax-directed.

Context rules  Each use of a variable consumes an element of the type assumption. This convention saves us from spilling dead variables since a "good" derivation only duplicates variables that are still live. Hence there is a rule to duplicate assumptions x : τ.
(dup)
    (Γ, x : τ, x : τ), P, F, S ⊢ a : τ′; M, F′, C
    ----------------------------------------------
    (Γ, x : τ), P, F, S ⊢ a : τ′; M, F′, C
There is a dual weakening rule that drops a variable assumption. The set of fixed registers is updated accordingly. Dropping of variable assumptions starts on the left side of a type assumption to avoid problems with shadowed assumptions.
(weak)
    Γ, P, F ∩ Refer Γ, S ⊢ a : τ′; M, F′, C
    ----------------------------------------
    (x : τ, Γ), P, F, S ⊢ a : τ′; M, F′, C
Finally, there is a rule to organize access to the type assumptions. It exchanges adjacent elements of the type assumption provided that they bind different variables.

(exch)
    (Γ, y : τ2, x : τ1, Γ″), P, F, S ⊢ a : τ′; M, F′, C
    ----------------------------------------------------  x ≠ y
    (Γ, x : τ1, y : τ2, Γ″), P, F, S ⊢ a : τ′; M, F′, C
The explicit presence of these rules is reminiscent of linear type systems [8, 23].

Value rules  Here is a simple rule that consumes a variable assumption for y : τ at the price of producing one for x : τ.
(let-var)
    (Γ, x : τ), P, F, S ⊢ a : τ′; M, F′, C
    ---------------------------------------------------
    (Γ, y : τ), P, F, S ⊢ let x = y in a : τ′; M, F′, C

Application of the (let-var) rule does not imply a change in the actual location of the value. The variable x becomes an alias for y in the expression a. The
rule can be eliminated in favor of a reduction rule for expressions in restricted A-normal form: let x = y in a  →  a[x := y] (capture-avoiding substitution of y for x in a). There is no penalty for this reduction, because the system allows the conversion of each occurrence of a variable individually. So former occurrences of x can still be treated differently than former occurrences of y. A constant starts its life as an immediate value which is only present in the implementation type. The typing derivation propagates this type and value to the point where it either selects an instruction with an immediate operand or where the context forces allocation into a register.
(let-const)
    (Γ, x : (int; imm n)), P, F, S ⊢ a : τ′; M, F′, C
    --------------------------------------------------
    Γ, P, F, S ⊢ let x = n in a : τ′; M, F′, C
Conversion rules  Some primitives expect their arguments allocated in registers. As we have seen, values are usually not born into registers. So, how do they get there? The solution lies in conversion rules that transform the type assumption. These rules generate code and allocate registers. A register k is deemed available if it is neither referred to by Γ nor mentioned in P ∪ F: k ∉ Refer Γ ∪ P ∪ F. Immediate integer values generate a simple load instruction. In this case, the register selected must be suitable for an integer (k ∈ RegisterNames_int) besides being available for allocation.
(conv-imm)
    (Γ, x : (int; reg k)), P, F, S ⊢ a : τ; M, F′, C
    ------------------------------------------------------
    (Γ, x : (int; imm n)), P, F, S ⊢ a : τ; M ∪ {k}, F′, C′

    where k ∈ Regs(int) \ (Refer Γ ∪ P ∪ F)
          C′ = (Rk := n; C)
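Operationally, a conversion rule picks an available register and prepends a load in front of the code produced for the rest of the derivation. The following sketch spells out the availability check for (conv-imm); the function names and the string representation of code are our own illustration.

    module RegSet = Set.Make (Int)

    (* Choose some k ∈ candidates \ (referred ∪ preserved ∪ fixed). *)
    let pick_available ~candidates ~referred ~preserved ~fixed =
      RegSet.(choose_opt (diff candidates (union referred (union preserved fixed))))

    (* (conv-imm): emit  Rk := n  in front of the continuation code c. *)
    let conv_imm ~regs_int ~referred ~preserved ~fixed n c =
      match pick_available ~candidates:regs_int ~referred ~preserved ~fixed with
      | None -> None   (* rule not applicable: no suitable register *)
      | Some k -> Some (k, Printf.sprintf "R%d := %d" k n :: c)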
The resolution of an indirection mem(i, reg n) generates a memory load with index register Rn and offset i.
(conv-mem)
    (Γ, x : (σ; reg k)), P, F, S ⊢ a : τ; M, F′, C
    ------------------------------------------------------------
    (Γ, x : (σ; mem(i, reg n))), P, F, S ⊢ a : τ; M ∪ {k}, F′, C′

    where k ∈ Regs(σ) \ (Refer Γ ∪ P ∪ F)
          C′ = (Rk := M[Rn + i]; C)

There is also an operation that generates indirections by spilling a group of registers to memory. The register k must be suitable to hold the standard representation of a tuple (a pointer to a contiguous area of memory) as indicated by k ∈ RegisterNames_×. The (spill) rule is not applicable if there is no such register k. The rule chooses nondeterministically a set X of registers to spill which does not interfere with the fixed registers F. If preserved registers are
spilled, the corresponding reloads are scheduled in the S component.

(spill)
    Γ′, (P \ X) ∪ {k}, F, (k, i1 ... ip)·S ⊢ a : τ; M, F′, C
    ---------------------------------------------------------
    Γ, P, F, S ⊢ a : τ; (M \ X) ∪ {k}, F′, C′

    where Γ′ = Γ[reg ij := mem(j, reg k) | 1 ≤ j ≤ n]
          X = {i1, ..., in}
          X ∩ F = ∅,  X ∩ P = {i1, ..., ip},  0 ≤ p ≤ n
          C′ = (Rk := Allocate(n); M[Rk + 0] := Ri1; ... ; M[Rk + n − 1] := Rin; C)
The notation Γ[reg ij := mem(j, reg k) | 1 ≤ j ≤ n] denotes the textual replacement of all occurrences of reg ij in implementation types mentioned in Γ by mem(j, reg k), for 1 ≤ j ≤ n. The corresponding inverse rule (reload) pops one reload entry from S.

(reload)
    Γ′, (P \ {k}) ∪ {i1, ..., ip}, F, S ⊢ a : τ; M, F′, C
    ------------------------------------------------------
    Γ, P, F, (k, i1 ... ip)·S ⊢ a : τ; M, F′, C′

    where Γ′ = Γ[reg ij := mem(j, reg k) | 1 ≤ j ≤ p]
          C′ = (Ri1 := M[Rk + 0]; ... ; Rip := M[Rk + p − 1]; C)
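The code parts of (spill) and (reload) are mirror images: a spill allocates a block and stores the chosen registers at consecutive offsets, a reload restores them. A sketch of just this code emission, under the same string representation of code as above:

    (* Spill registers [i1; ...; in] through register k: allocate a block
       of size n, then store each register at offsets 0 .. n-1. *)
    let spill_code k spilled c =
      let n = List.length spilled in
      let stores =
        List.mapi (fun j i -> Printf.sprintf "M[R%d + %d] := R%d" k j i) spilled
      in
      (Printf.sprintf "R%d := Allocate(%d)" k n :: stores) @ c

    (* Reload the first p spilled registers from offsets 0 .. p-1. *)
    let reload_code k reloaded c =
      let loads =
        List.mapi (fun j i -> Printf.sprintf "R%d := M[R%d + %d]" i k j) reloaded
      in
      loads @ c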
Computation rules  The first computation rule deals with lambda abstraction. The type assumptions are divided into those for the free variables of the function, Γ, and those for the continuation, Δ. The function's body a1 is processed with Γ′ where some free variables are relocated into the closure, a set P′ of preserved registers as determined by the call sites of the function, and a set of fixed registers F′ that contains those fixed registers that are referred to from the assumption Γ. Also, the register m on the function arrow must match the register which is assumed to hold the closure while translating the body of the abstraction. It is not necessary that m = k, where k is the register where the closure is allocated. Finally, the let's body a2 is processed with Δ.

(let-abs)
    (Γ′, x2 : τ2, c : (cont F″; l)), P′, F′, ε ⊢ a1 : τ1; M′, F″, C1
    (Δ, x1 : (τ2 −(F′,P′,M′,F″,m)→cont l τ1; reg k)), P, F, S ⊢ a2 : τ0; M, F′, C2
    ------------------------------------------------------------------------------
    (Γ, Δ), P, F, S ⊢ let x1 = λx2.a1 in a2 : τ0; M ∪ {k}, F′, C′

    where F ∩ Refer Γ ⊆ F′ ⊆ Refer Γ
          k ∈ Regs(→) \ (Refer (Γ, Δ) ∪ P ∪ F)
          m ∈ Regs(→) \ (Refer Γ ∪ P′ ∪ F′)
          Γ′ = Γ[reg ij := mem(j, reg m) | 1 ≤ j ≤ n]
          {ij} = Refer Γ \ F′,  |{ij}| = n
          C′ = (Goto t2; t1 : C1; t2 : Rk := Allocate(n + 1);
                M[Rk + 0] := t1; M[Rk + 1] := Ri1; ... ; M[Rk + n] := Rin; C2)
All registers that do not become fixed in the function must be evacuated into the closure for the function, which is composed in Rk. Since the continuation (which is located in l) can be handled like any other value, we invent a continuation identifier c and bind it to the continuation. This is a drawback of A-normal form in comparison to continuation-passing style, where continuation identifiers are explicit.

Next, we consider a typical primitive operation.

(let-add)
    (Γ, x1 : (int; reg k)), P, F, S ⊢ a : τ; M, F′, C
    --------------------------------------------------------------------------
    (Γ, x2 : (int; reg i), x3 : (int; reg j)), P, F, S ⊢ a′ : τ; M ∪ {k}, F′, C′

    where k ∈ Regs(int) \ (Refer Γ ∪ P ∪ F)
          C′ = (Rk := Ri + Rj; C)
          a′ = let x1 = x2 + x3 in a
In addition, we could include a rule for constant propagation (in the case where the arguments are imm n1 and imm n2) and also rules to exploit immediate addressing modes if the processor provides for these.

Next, we consider the application of a function.

(let-app)
    (Γ, x1 : τ1), P, F″, S ⊢ a : τ; M, F′, C
    ---------------------------------------------
    Γ′, P, F, S ⊢ a′ : τ; M ∪ M′ ∪ {j, k}, F′, C′

    where P ∪ {i1, ..., ip} ⊆ P′,  F′ ⊆ F
          {i1, ..., in} = Refer Γ \ F′,  |{ij}| = n
          {j, k} ⊆ Regs(→) \ (Refer (Γ, x1 : τ1) ∪ P ∪ F),  j ≠ k
          Γ′ = (Γ, x2 : (τ2 −(F′,P′,M′,F″,i)→cont (reg j) τ1; reg i), x3 : τ2)
          a′ = let x1 = x2 @ x3 in a
          C′ = (Rj := Allocate(n − p + 1); M[Rj + 0] := t;
                M[Rj + 1] := Ri(p+1); ... ; M[Rj + n − p] := Rin;
                Rk := M[Ri + 0]; Goto Rk;
                t : Ri(p+1) := M[Rj + 1]; ... ; Rin := M[Rj + n − p]; C)

The memory allocation in this rule saves values that are accessed by the continuation a. The preservation of the remaining registers is left to the callee by placing them ({i1, ..., ip}) in the set of preserved registers P′. Rj points to the continuation closure. The sole purpose of the cont (reg j) τ1 construction lies in the transmission of the location of the continuation. The set of currently preserved registers must be a subset of the set of registers preserved by the function. Conversely, the set of currently fixed registers must contain the set of fixed registers demanded by the function. The continuation has to fix registers as indicated by the annotation F″ of the function type. The i on the function arrow indicates the register where the function body expects its closure. It must coincide with the register in which the closure actually is.
Return rules  Finally, we need to consider rules that pass a value to the continuation. The most simple rule just returns the value of a variable. Due to the conversion rules, we can rely on x : τ already being placed in the location where the continuation expects it. All return rules expect that their reload list is empty.
(ret-var)
    k ∈ Regs(→) \ (Refer (x : τ, c : (cont F; reg i)) ∪ P ∪ F)
    -----------------------------------------------------------
    (x : τ, c : (cont F; reg i)), P, F, ε ⊢ x : τ; {k}, F, C

    where C = (Rk := M[Ri + 0]; Goto Rk)
In this rule, the current continuation identifier c indicates that register i contains the continuation closure. As with any closure, its zero-th component contains the code address. The final rule specifies a tail call to another function.

(ret-app)
    P ⊆ P′,  F′ ⊆ F,  k ∈ Regs(→) \ ({j, i} ∪ Refer τ2 ∪ P ∪ F)
    ------------------------------------------------------------
    Γ, P, F, ε ⊢ x1 @ x2 : τ1; {k}, F, (Rk := M[Ri + 0]; Goto Rk)

    where Γ = (x1 : (τ2 −(F′,P′,M′,F″)→cont (reg j) τ1; reg i), x2 : τ2, c : (cont F″; reg j))

There is neither a return term nor a return rule for addition, because the allocation properties of let x = y + z in x are identical to those of y + z, if the latter were a legal return term.
4   Properties
In this section, we formalize some of the intuitive notions introduced in the preceding sections. First, we show that preserved registers really deserve their name.
Theorem 1. Suppose Γ, P, F, S ⊢ a : τ; M, F1, C and the processor is in state (C, R, M). For each register r ∈ P: Suppose c : (cont F″; reg w) ∈ Γ, y = R r, γ = R w, and (C, R, M) ↦* (C′, R′, M′). If R′ w = γ and C′ is a suffix of C such that Γ′, P′, F′, S′ ⊢ a′ : τ; M′, F1′, C′ and in the derivation steps between a and a′ the reload component always has S as a suffix, then R′ r = y.

The reference to the continuation c ensures that both machine states belong to the same procedure activation, by the uniqueness of addresses returned by Allocate(n). It provides the only link between the two machine states. If we dropped this requirement we would end up comparing machine states from different invocations of the same function and we could not prove anything. The condition on the reload component means that arbitrary spills are allowed between a and a′, but reloads are restricted not to remove the reload record that was top-level at a. In other words, S serves as a low-water mark. Our main interest will be in the case where S = S′ = ε, a is the body of a function, and a′ is a return term. In this case, the theorem says that registers mentioned in P are preserved across function calls. This theorem can be proved by induction on the number of control transfers in (C, R, M) ↦* (C′, R′, M′) and then by induction on the derivation.

Next, we want to formalize a property for F. A value stored in f ∈ F will remain there unchanged as long as the variable binding that f belongs to is in effect or reachable through closures or the continuation. As a first step, we define a correspondence between an environment Γ and a state of the CEK machine (cf. Sec. 2.2).
Definition 2. Γ ⊢ (a, ρ, κ) if

1. there exist P, F, S, M, F1, C such that Γ, P, F, S ⊢ a : τ; M, F1, C;
2. x : τ in Γ implies x ∈ dom(ρ) and ρ(x) ∈ TSem τ;
3. if c : (cont F″; reg w) in Γ then there are ρ, x, and a such that κ = (ρ, x, a, κ′); otherwise κ = ().

Unfortunately the connection between c and κ is not very deep. We cannot properly relate c to κ since the "type" of c does not refer to an environment. In fact, c and its type cannot refer to a specific environment Γ′ because a may be called from several places with different environments. Therefore, the type of the return environment of the continuation must be polymorphic.

The function TSem τ maps an implementation type to a subset of Val.

    TSem (int; l) = {Num(n) | n is an integer}
    TSem (τ2 −(F,P,M,F′)→cont l′ τ1; l) =
        {Fun(ρ′, λy.a) | ∀z ∈ TSem τ2.
            (a, ρ′[y ↦ z], ()) ↦* (x, ρ″, ()) such that ρ″(x) ∈ TSem τ1, or
            (a, ρ′[y ↦ z], ()) ↦* (x + w, ρ″, ()) and τ1 = (int; l′)}

However, to formalize reachability through closures and continuations and link this concept with the environment, we need a stronger notion than Γ ⊢ (a, ρ, κ).
What we can actually prove by inspection of the rules is a much weaker theorem.

Theorem 2. Suppose Γ, P, F, S ⊢ a : τ; M, F1, C and the processor is in state (C, R, M). For each r ∈ F: Suppose y = R r and (C, R, M) ↦* (C′, R′, M′) such that Γ′, P′, F′, S′ ⊢ a′ : τ; M′, F1′, C′ and there is no intermediate state with a corresponding derivation step. If furthermore r ∈ F′ then R′ r = y.
Finally, we establish a formal correspondence between steps of the CEK machine and steps of the translated machine program. To this end, we need a notion of compatibility between a CEK state and a machine state, Γ ⊢ (a, ρ, κ) ≈ (C, R, M).
Definition 3. Suppose Γ, P, F, S ⊢ a : τ; M, F1, C. Γ ⊢ (a, ρ, κ) ≈ (C, R, M) if for all x : τ in Γ:

- if τ = (int; imm n) then ρ(x) = Num(n) (the value is not represented in the machine state);
- if τ = (int; reg r) then there exists an integer n s.t. ρ(x) = Num(n) and R r = n;
- if τ = (int; mem(j, reg r)) then there exists n s.t. ρ(x) = Num(n) and M[R r + j] = n;
- if τ = (int; mem(jk, ... mem(j0, reg r))) then there exist n, i0, ..., ik s.t. ρ(x) = Num(n) and ik = n, iv = M[i(v−1) + jv] for 1 ≤ v ≤ k, and i0 = R r;
- if τ = (τ2 −(F′,P′,M′,F1′)→cont (reg s) τ1; reg r) then there exist ρ′, y, a′ s.t. ρ(x) = Fun(ρ′, λy.a′), and for all z ∈ TSem τ2, (a′, ρ′[y ↦ z], ()) ↦* (x′, ρ″, ()) where ρ″(x′) ∈ TSem τ1; M[R r] holds the address of C′ such that (Γ′, y : τ2, c : (cont F2; reg s)), P′, F′, S′ ⊢ a′ : τ1; M′, F1′, C′, which starts with Γ″, P″, F″, S″ ⊢ x′ : τ1; M′, F1′, C″; and for all machine states (C′, R′, M′) ↦* (C″, R″, M″) where R′ s = R″ s, we have that (Γ′, y : τ2, c : (cont F2; l)) ⊢ (a′, ρ′[y ↦ z], κ′) ≈ (C′, R′, M′) and Γ″ ⊢ (x′, ρ″, κ′) ≈ (C″, R″, M″);
- if τ = (cont F2; reg r) then κ = (ρ′, x′, a′, κ′) such that Γ′, P′, F′, S′ ⊢ a′ : τ′; M′, F1′, C′ and M[R r] holds the address of C′.
Theorem
5
Related Work
Compiling with continuations has already a long history. Steele's Rabbit compiler 21 has pioneered compilation by first transforming source programs to continuation-passing style and then transforming it until the assembly code can be read of directly. Also the Orbit compiler 13 and other successful systems 2, 3,12 follow this strategy. Recently, there has been some interest in approaches which do not quite transform the programs to continuation-passing style. The resulting intermediate language has been called nqCPS 1, A-normal form 5, monadic normal form 9, etc. These languages are still direct style languages, but have the following special features 1 This term has been coined by Peter Lee 14 but it has never appeared in a published paper.
191 1. the evaluation order (and hence the control flow) is made explicit; 2. all intermediate results are named; 3. the structure of an expression makes the places obvious where serious computations are performed (i.e., where a continuation is required in the implementation). Another related line of work is boxing analysis (e.g., 10, 18, 19, 22). Here the idea is to try to avoid using the inefficient boxed representation of values in polymorphic languages. We believe that our system is powerful enough so that (an polymorphic extension of) it can also express the necessary properties. Representation analysis 4 which is used in conjunction with region inference has one phase (their section 5 "Unboxed Values") whose concerns overlap with the goals of our system. Otherwise, their system is concerned with finding the number of times a value is put into a region, the storage mode of a value which determines whether a previous version may be overwritten, or the physical size of a region. Typed assembly language 15 is an approach to propagate polymorphic type information throughout all phases of compilation. This work defines a fully typed translation from the source language (a subset of core ML) down to assembly language in four stages: CPS conversion, closure conversion, allocation, and code generation. As it is documented, the allocation phase takes the conventional fully boxed approach to allocating closures and tuples. It remains to investigate whether the allocation phase operates on the right level of abstraction to take the decisions that we are interested in controlling with our approach. 6
Conclusion
We have investigated an approach to specify decisions taken in the code generation phase of a compiler using a non-standard type system. The system builds on simple typing and uses a restricted version of A-normal form as its source language. We have defined a translation that maps the source language into abstract assembly language and have verified some properties of the translation. One goal of the approach is the verification and specification of code generators by adding constraints to our system that make the typing judgements deterministic. In the course of the work on this system, we have learned a number of lessons on intermediate languages for compilers that want to apply advanced optimization techniques (like unboxing, lightweight closure conversion, and so on): It is essential to start from an intermediate language that clearly distinguishes serious computations (that need a continuation) from trivial ones (that yield values directly). Otherwise, rules like the (spill) rule could not be applied immediately before generating the machine code for the actual function call. It is also essential that A-normal form sequentializes the computation. For unrestricted direct-style expressions the control flow is sufficiently different from the propagation of information in typing judgements to make such a formulation awkward, at best. For example, it would be necessary to thread the typing assumptions through the derivation according to the evaluation order.
192 Also, in an unrestricted direct-style term it is sometimes necessary to perform a resource operation (for example, spilling or representation conversion) on return from the evaluation of an expression. This leads to a duplication of typing rules, one set determining the operations on control transfer into the expression and another set for the operations on return from the expression. In A-normal form, returning from one expression is always entering the next expression, so there is no restriction in having only the first set of these rules. On the negative side, we found that A-normal form does not supply sufficient information when it comes to naming continuations. In contrast to continuationpassing style, our translation has to invent continuation variables in order to make the continuation visible to the resource allocation machinery. It would be interesting to investigate a similar system of implementation types for an intermediate language in continuation-passing style and to establish a formal connection between the two. Another point in favor of continuation-passing style is the allocation of continuation closures. Presently this allocation clutters the rule for function application (let-app). A system based on continuation-passing style might be able to decompose the different tasks present in (let-app) into several rules. Finally, in places the rules are rather unwieldy so that it is debatable whether the intermediate language that we have chosen is the right level of abstraction to take the decisions that we are interested in. Imposing the system, for example, on the allocation phase of a system like that of Morrisett et al. 15 might in the end lead to a simpler system and one where the interesting properties can be proved more easily. Beyond the work reported in this paper, we have already extended the framework to conditionals, and to sum and product types. The current formulation does not allow for unboxed function closures. This drawback can be addressed at the price of a plethora of additional rules. Future work will address the incorporation of polymorphic types and investigate the possibilities of integrating our work with typed assembly language. A c k n o w l e d g m e n t Many thanks to the reviewers. Their detailed comments helped to clean up the presentation substantially.
References

1. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986.
2. Andrew W. Appel. Compiling with Continuations. Cambridge University Press, 1992.
3. Andrew W. Appel and Trevor Jim. Continuation-passing, closure-passing style. In POPL 1989 [16], pages 293-302.
4. Lars Birkedal, Mads Tofte, and Magnus Vejlstrup. From region inference to von Neumann machines via region representation inference. In Proc. 23rd Annual ACM Symposium on Principles of Programming Languages, pages 171-183, St. Petersburg, Fla., January 1996. ACM Press.
5. Cormac Flanagan, Amr Sabry, Bruce F. Duba, and Matthias Felleisen. The essence of compiling with continuations. In Proc. of the ACM SIGPLAN '93 Conference on Programming Language Design and Implementation, pages 237-247, Albuquerque, New Mexico, June 1993.
6. Christopher W. Fraser and David R. Hanson. A Retargetable C Compiler: Design and Implementation. Benjamin/Cummings, 1995.
7. David K. Gifford and John M. Lucassen. Integrating functional and imperative programming. In Proceedings of the 1986 ACM Conference on Lisp and Functional Programming, pages 28-38, 1986.
8. Jean-Yves Girard. Linear logic. Theoretical Computer Science, 50:1-102, 1987.
9. John Hatcliff and Olivier Danvy. A generic account of continuation-passing styles. In POPL 1994 [17], pages 458-471.
10. Fritz Henglein and Jesper Jørgensen. Formally optimal boxing. In POPL 1994 [17], pages 213-226.
11. Pierre Jouvelot and David K. Gifford. Algebraic reconstruction of types and effects. In Proc. 18th Annual ACM Symposium on Principles of Programming Languages, pages 303-310, Orlando, Florida, January 1991. ACM Press.
12. Richard Kelsey and Paul Hudak. Realistic compilation by program transformation. In POPL 1989 [16], pages 281-292.
13. D. Kranz, R. Kelsey, J. Rees, P. Hudak, J. Philbin, and N. Adams. ORBIT: An optimizing compiler for Scheme. SIGPLAN Notices, 21(7):219-233, July 1986. Proc. SIGPLAN '86 Symp. on Compiler Construction.
14. Peter Lee. The origin of nqCPS. Email message, March 1998.
15. Greg Morrisett, David Walker, Karl Crary, and Neal Glew. From System F to typed assembly language. In Luca Cardelli, editor, Proc. 25th Annual ACM Symposium on Principles of Programming Languages, San Diego, CA, USA, January 1998. ACM Press.
16. 16th Annual ACM Symposium on Principles of Programming Languages, Austin, Texas, January 1989. ACM Press.
17. Proc. 21st Annual ACM Symposium on Principles of Programming Languages, Portland, OR, January 1994. ACM Press.
18. Zhong Shao. Flexible representation analysis. In Mads Tofte, editor, Proc. International Conference on Functional Programming 1997, pages 85-98, Amsterdam, The Netherlands, June 1997. ACM Press, New York.
19. Zhong Shao and Andrew W. Appel. A type-based compiler for Standard ML. In Proc. of the ACM SIGPLAN '95 Conference on Programming Language Design and Implementation, La Jolla, CA, USA, June 1995. ACM Press.
20. Paul Steckler and Mitchell Wand. Lightweight closure conversion. ACM Transactions on Programming Languages and Systems, 19(1):48-86, January 1997.
21. Guy L. Steele. Rabbit: a compiler for Scheme. Technical Report AI-TR-474, MIT, Cambridge, MA, 1978.
22. Peter Thiemann. Polymorphic typing and unboxed values revisited. In Simon Peyton Jones, editor, Proc. Functional Programming Languages and Computer Architecture 1995, pages 24-35, La Jolla, CA, June 1995. ACM Press, New York.
23. Philip Wadler. Is there a use for linear logic? In Paul Hudak and Neil D. Jones, editors, Proc. ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation (PEPM '91), pages 255-273, New Haven, CT, June 1991. ACM. SIGPLAN Notices 26(9).
An Approach to Improve Locality Using Sandwich Types

Daniela Genius, Martin Trapp, and Wolf Zimmermann

Institut für Programmstrukturen und Datenorganisation, University of Karlsruhe, 76128 Karlsruhe, Germany
E-Mail: {genius|trapp|zimmer}@ipd.info.uni-karlsruhe.de
Abstract. We show how to increase the locality of object-oriented programs using several heaps. We introduce the notion of sandwich types which allow a coarser view on objects. Our idea for increasing locality is to use one heap per object of a sandwich type. Performance measurements demonstrate that the running time is improved by up to a factor of 5 using this strategy. The paper shows how to derive sandwich types from classes. Thus, it is possible to control the allocation of the different heaps using compile-time information.
1   Introduction
In object-oriented programs, the notion of an object is rather fine-grained. The objects are usually allocated on a heap and the size of these objects is small. Thus, a single heap may destroy locality. Improving locality may improve execution time due to caching and paging effects. Often, a coarser view on objects is possible. For example, a list may be considered as a collection of small objects linked in an adequate way, but it may also be considered as one object. We introduce the notion of sandwich types in order to characterize this situation. Our goal is to maintain objects of sandwich types (called sandwich objects) in one single heap (i.e., a consecutive fragment of memory) in order to increase locality. We maintain these heaps for a sandwich object by the doubling strategy well-known from the theory of algorithms and data structures (see e.g. [3]). This work was initiated by observations during experiments where lists, trees, sets etc. were implemented with flexible arrays using the doubling strategy. These implementations improved the performance considerably compared to linked implementations.

Increasing locality of reference by partitioning the heap is a well-known technique. The language EUCLID [7] introduced special collections which can be viewed as independent heaps. Dynamically allocated data structures could be assigned to a single collection. The same idea is exploited by the GNU obstack
structure [4]. This package gives the programmer control over an arbitrary number of heaps that require stack discipline for allocation and deallocation. However, the responsibility for mapping objects to heaps remains totally with the programmer, both when using EUCLID and when using obstacks. Approaches to automatically finding such mappings have been developed in the context of SMALLTALK. The Object-Oriented Zoned Environment (OOZE) locates all instances of a type in one contiguous interval of virtual addresses [8]. This increases locality of reference for objects of the same type, but is unable to deal with structures built from objects of various types. Stamos [12] also presents additional algorithms for grouping related objects with the intention of increasing locality of reference. This technique requires complete knowledge of the dynamic object graph and is used for restructuring the memory image during garbage collection. Thus, the mapping cannot be found at compile time. In [6], Hayes suggested the use of key objects as representatives for clusters. Death of a key object triggers garbage collection of the structure it represents. Again, key object candidates and the clusters they represent are identified during collection at runtime and cannot be statically determined. To the authors' knowledge, there is no work on automatic a priori mapping of dynamically allocated objects to multiple heaps for sequential object-oriented programs.

Section 2 introduces the notion of sandwich types and gives some examples. Section 3 shows the performance improvements obtained by object heaps. Section 4 shows a conservative analysis for identifying sandwich types. Section 5 concludes the paper. Appendix A defines the syntax, static semantics, and dynamic semantics of a basic object-oriented language BOOL. Every object-oriented language has at least the features of BOOL.
2   Sandwich Types
The notion of sandwich types is a generalization of balloon types [2]. All objects in a balloon can be accessed only via a distinguished balloon object. However, this excludes container types such as lists or sets. These types usually contain methods to return their elements and to insert elements. Thus, these elements can be accessed from outside, destroying the balloon type property. However, there is often an internal structure which cannot be accessed from outside. This is the reason for the term sandwich object: its internal structure can be accessed only via the sandwich object, but parts of its structure are known externally. It is the sandwich object which decides what is external. Figure 1 visualizes this idea.

Fig. 1. A Sandwich

Based on the definition of memory states in Appendix A, a state¹ of a program is a triple (OBJ, REF, ROOTS) where OBJ is a set of objects, STATE = (OBJ, REF) is a directed graph where o1 →a o2 ∈ REF iff there is an attribute a of object o1 that refers to object o2, and ROOTS ⊆ OBJ is the set of objects referred to by the environment env. In particular, obj ∈ ROOTS iff there is a frame f ∈ env such that obj = f.$2 or there is a variable x such that (x, obj) ∈ f.$1. PRED_obj and outdeg_obj denote the direct predecessors and the number of outgoing edges of obj in the graph STATE, respectively. obj1 →+ obj2 denotes² that there is a path from obj1 to obj2 in (OBJ, REF). An object o is reachable iff there is an object o′ ∈ ROOTS such that o′ →+ o. Otherwise it is unreachable. [10, 11] define an operational semantics based on this definition of states. Suppose that there is a state transition such that objects become unreachable in a state. Then, they will be unreachable forever. Therefore, we can assume w.l.o.g. that unreachable objects are removed, i.e., no state contains unreachable objects. The paper does not require any further knowledge of state transitions.

¹ We speak of states instead of memory states because the instruction pointer and the current method play no role in our discussion.
² →+ denotes that there is at least one edge in the path.

Definition 1 (Sandwich Objects and Types). Let A be a class whose
attributes are all private. Let s = (OBJ, REF, ROOTS) be a state and x be an object of class A. The set INTERNAL^(s) of objects internal to x is the smallest set satisfying

    INTERNAL^(s) = { y : x →+ y ∧ PRED_y ⊆ INTERNAL^(s) ∪ {x} }

x is a sandwich object or upper slice iff INTERNAL^(s) ≠ ∅ or outdeg_x = 0. All objects z ∈ OBJ \ ({x} ∪ INTERNAL^(s)) are external to x. If outdeg_x ≠ 0, the lower slice of x is the set of all external objects that have a predecessor y ∈ INTERNAL^(s). A sandwich is a sandwich object x together with the set of its internal objects. A is a sandwich type iff for all states s every object x of type A in s is a sandwich object.

Remark 1. If there is no lower slice, a sandwich object is also a balloon object according to [2]. Observe that the attributes of internal objects need not be private.

Example 1. Consider the following implementation of a doubly-linked list (next and previous are used to navigate in the list):

    class LIST(T) is
      private head : LIST_CELL(T);
      private end : LIST_CELL(T);
      private current : LIST_CELL(T);
      previous() is ... end;
      next() is ... end;
      insert(T) is ... end;
      delete() is ... end;
      elem() : T is ... end;
      is_empty() : BOOL is ... end;
      at_head() : BOOL is ... end;
      at_end() : BOOL is ... end;
    end

    class LIST_CELL(T) is
      elem : T;
      previous : LIST_CELL(T);
      next : LIST_CELL(T);
    end

For every type T, LIST(T) is a sandwich type. Let x : LIST(T). Then x is a sandwich object. Examples of internal objects are the head and the end of x. The elements of the list are the lower slice of the sandwich. Container types are typical examples of sandwich types.

Internal objects can be reached only via the upper slice:

Lemma 1. Let s = (OBJ, REF, ROOTS) be a state and x ∈ OBJ a sandwich object of type A. Then for every y ∈ OBJ: y ∈ INTERNAL^(s) iff for every z ∈ OBJ satisfying z →+ y one of the following conditions holds:
(i) z ∈ INTERNAL^(s);
(ii) each path from z to y contains x.

Proof. "⇒": Suppose this would not be the case, i.e., there is a z ∉ INTERNAL^(s) and a path π from z to y not containing x. Since y ∈ INTERNAL^(s) there must be u, v ∈ π such that u ∉ INTERNAL^(s), v ∈ INTERNAL^(s), and u → v ∈ REF. The definition of INTERNAL^(s) implies that u = x, contradicting our assumption that π does not contain x.
"⇐": Suppose (i) and (ii) hold, but y ∉ INTERNAL^(s). Then either x does not reach y, or there is a z ∈ PRED_y such that z ∉ INTERNAL^(s) ∪ {x}. The latter contradicts (ii). Thus x does not reach y. Consider a z ∈ OBJ such that z →+ y. Then (i) must hold, since (ii) is excluded because x does not reach y. But (i) implies x →+ z, again contradicting that x does not reach y. Thus, there is no path to y, i.e., y cannot be reached. This is what we excluded from the definition of states. □
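Definition 1 suggests a simple fixpoint computation of INTERNAL^(s) for a given candidate object x. The following sketch works over an explicit object graph; the encoding of states as successor and predecessor functions is our own, not the paper's.

    module ObjSet = Set.Make (Int)

    (* Object graph of a state: successors and predecessors per object. *)
    type graph = { succs : int -> ObjSet.t; preds : int -> ObjSet.t }

    (* Smallest set S with S = { y : x reaches y and preds(y) ⊆ S ∪ {x} },
       computed by iterating until no new object qualifies. *)
    let internal (g : graph) (x : int) : ObjSet.t =
      let rec grow set =
        (* candidate objects: successors of x or of current internal objects *)
        let frontier =
          ObjSet.fold (fun o acc -> ObjSet.union acc (g.succs o))
            (ObjSet.add x set) ObjSet.empty
        in
        let ok o =
          o <> x
          && not (ObjSet.mem o set)
          && ObjSet.subset (g.preds o) (ObjSet.add x set)
        in
        let added = ObjSet.filter ok frontier in
        if ObjSet.is_empty added then set else grow (ObjSet.union set added)
      in
      grow ObjSet.empty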
The next lemma states that sandwiches are either disjoint or nested (i.e., non-overlapping):

Lemma 2. Let s = (OBJ, REF, ROOTS) be a state, x ∈ OBJ be any sandwich object of type A and y ∈ OBJ be a sandwich object of type B. If y is internal to x, then every object of the lower slice of x is external to y.
Proof. Suppose there is a state s such that y is internal to x and there is an object u internal to y that lies in the lower slice of x. Figure 2 visualizes this situation. Since u is in the lower slice of x, it is external to x, i.e., there is a path to u from an object w external to x which does not contain x. Since u is internal to y, by Lemma 1 every path to u from an object external to y must contain y; as y is internal to x, an x-avoiding path from an object external to x cannot pass through y. Thus, w must be internal to y.
Fig. 2. Contrary of Lemma 2 (the upper and lower slices of x and of y)
Let v be an arbitrary object external to x. Since y is internal to x, every path from v to y must contain x (by Lemma 1). Since w is internal to y, every path from x to w must contain y (by Lemma 1). Hence, every path from v to w must contain x. This contradicts the fact that w is external to x. □
Example 2. Consider the class HASHTAB(T) with collision resolution by chain lists:

    class HASHTAB(T) is
      private tab : ARR(LIST(T));
      insert(x : T) is ... end;
      delete is ... end;
      member(T) is ... end;
    end

Objects of class HASHTAB(T) are sandwich objects. The collision lists in the array tab are also sandwich objects. Each lower slice of a collision list is contained in the lower slice of the hash table.
3   Performance Improvement Using Object Heaps
In general, the number of objects that will be allocated in a sandwich's heap is unknown at compile time. Thus, the heap must be able to grow (and shrink) at runtime. To achieve locality of reference for the objects in a heap, the latter must extend over a minimal number of physical memory pages, i.e., all but at most one of the pages used must be used completely for object allocation. We guarantee this by allocating a contiguous³ area of virtual memory for a heap and doubling its size whenever the heap would overflow. Since it is not possible to grow a heap in place, we copy its data to a new memory area. This works in amortized constant time, see [3] (a sketch of this scheme is given after the footnote below).

Note that all references pointing at objects in a sandwich's heap come from inside that heap or from its upper slice. We exploit this fact by using a compacting copy garbage collector [9] to move the objects from the old memory area to the new one. The root set for the collector is the singleton set containing just the sandwich object. The important advantage is that the heap of a sandwich can be copy-collected independently of all external objects. If after garbage collection less than a quarter of the heap is occupied by allocated objects, we halve its size. Whenever a sandwich object is garbage collected, its heap can be deleted at once.

For our measurements we use the small list example. The test program first creates two empty lists. Then, we alternately insert single elements into both lists until they both contain MAX elements. Afterwards, the test program iterates through each list separately ITER times. Figure 3 shows the runtime of this iterated list traversal depending on the length of the lists. We have chosen the values of MAX and ITER so that their product is constant (10⁷). In the first part of the plot, runtime decreases with the length of the list because the iteration overhead becomes less significant. Both axes of the plot are logarithmically scaled. The x-axis shows the number of elements in the list, while the y-axis shows the total runtime in seconds. The values are measured on a 200 MHz i586 Linux system. Each program is run 10 times and the smallest elapsed time is shown in Figure 3.

The curve labeled single shows run times for the usual implementation (i.e., one heap for all objects created by the program). The multi curve denotes the result of our method: as noted above, LIST(T) is a sandwich type. Both list objects are sandwich objects by themselves. Thus, they have their own heaps for their internal LIST_CELL(T) objects. There is a third heap for all other objects created by the program. As long as all list elements fit into the data cache, both variants have approximately the same running time (< 0.3 seconds). If the lists contain more than approximately 100 elements, cache misses occur in the single heap implementation. With the multi heap implementation this effect is postponed to lists
200 multi - ~ - single -+--
A,- "
/~,
o=
.c
.,p~
.'~
1
0,1
,
I
100
,
I
1000
,
I
10000 N u m b e r o f e l e m e n t s i n a list
,
,
i
100000
,
i
Ie+06
Fig. 3. Measurements: Lists
of approximately 500 elements because of the better locality. For 500 elements, the multi heap variant is almost 5 times faster than the single heap variant. Also after the sharp increase in run time of the multi heap variant due to cache misses and page faults the multi heap variant still clearly outperforms the single heap version. Less memory pages have to be accessed to traverse the lists. For very long lists results for classical allocation stay below 3 seconds, while using multiple heaps never greatly exceed 2 seconds. An overall improvement by our method of about 25 per cent emerges in this case. In general the improvement is even larger: For lists of around 10000 elements the multi heap variant is twice as fast. Although the test program on lists is artificial, it reflects a situation which occurs in practice. Consider for example hash-tables with collision resolution by chaining. Each insertion or look-up in the hash-table traverses a collision list. Furthermore, it is unlikely that the collision lists are stored in contiguous memory cells. This is precisely the situation covered by our experiment. Figure 4 demonstrates this argument. The implementation with collision lists reuses the single and multiple heap implementations for LIST. A simple modulo hashing equally distributes elements over the lists modulo table size. Once the hash value is computed, searching for a key that is not present in the hash table means that one entire collision list has to be traversed once the hash value is computed. The collision lists are treated analogously to the LIST example. MAX elements are distributed over a hash table. Again, MAX and ITER are chosen such that their product is 107. Figure 4 shows for hash table sizes 23 and 501 that the relative behaviour is similar to the list example. The sharp increase in run time due to
201 cache misses occurs later as the hash table size increases, however the single heap variant is always outperformed. 100
9 , v 9 23 s i n g l e h a s h t a b l e size 2 3 - ~ . m u l t i h a s h t a b l e s l z e 5 0 t -~-s i n g l e h a s h l a b l e size 5 0 t .-M ...
multi h a s h t a b l e size
~+..x...-x...~, , ~ - x ....
t0
,
, 100
,
,,
.
~
1000
i
,-~ 1001
,
,'~'-~"~;"
i
100000
,
,J lo+06
Number of elements
Fig. 4. Measurements: Hashtables
4
Recognition of Sandwich
Types
This section shows how sandwich types can be recognized in a program. We first derive a sufficient condition for sandwich types. This sufficient condition abstracts from the state by considering the types of the objects. In particular, we consider the type graph of an object-oriented program, i.e. a graph TG = (CLASSES, USE) where CLASSES are the classes of a program and ( A , B ) E USE iff A has an attribute of type B or has a method with parameter type B or return type B. ~ denotes the reachability relation in type graphs. A class A is recursive iff A ~ A. Now, we lift Definition 1 to types. D e f i n i t i o n 2. Let TG = (CLASSES, USE) and A E CLASSES be a nonrecursive class whose attributes are private. Classes which are parameter types or return types o methods o A are called the accessible types o A. A CCESSIBLE A denotes the set o accessible types of class A. The set INTERNALA of classes that are internal to A is the smallest set satisfying
INTERNALA = { B : A ~ B A B ~_ ACCESSIBLEA U{A}A PREDB C_INTERNALA U {A} } All classes B E CLASSES \ INTERNALA are external to A.
202
The following l e m m a relates type graphs and states. L e m m a 3. Let TG = (CLASSES, USE) be a type graph of a program and s =
( OBJ, R E F , R O O T S ) an arbitrary state. Then x -~ y implies type(x) ~ type(y) for every two objects x, y E OBJ. Proof. From Corollary 1 follows t h a t x --~ y E R E F implies (type(x), type(y)) E USE. The claim follows by induction. A class B internal to a class A can be reached only via A: L e m m a 4. Let TG = (CLASSES, USE) be a type graph of a program and A be
a non-recursive class whose attributes are private, and I N T E R N A L A ~ 0. Then, for any class C E I N T E R N A L A and every class B such that B ~ C, one of the following properties hold (i) B E I N T E R N A L A (ii) B is external to A and every path from B to C contains A. Proof. Analogous to L e m m a 1. Every object y whose type is internal to a class A is either reachable from an object z on the stack whose type is internal to A or an object internal to a sandwich object of class A: L e m m a 5. Let TG -- (CLASSES, USE) be the type graph of a program and
A be a non-recursive class whose attributes are private and I N T E R N A L A ~ 0. For every state s = ( OBJ, R E F , R O O T S ) and for every object y with type(y) E I N T E R N A L A there is an object x of type A such that y E I N T E R N A L (8) or an object z E R O O T S of type internal to A such that z 2+ y. Proof. Suppose there is a C E I N T E R N A L A and an object y of type C such t h a t it is not internal to an object x of type A. Suppose there is an object z with a p a t h r from z to y where type(z) ~. I N T E R N A L ( A ) and the type of every u E r is different from A. By L e m m a 3 there is p a t h r~ from type(z) to type(y) not containing A. This contradicts L e m m a 4(ii). Thus, type(z) E I N T E R N A L A for all objects z such t h a t z ~+ y. Since A ~_ I N T E R N A L A , by L e m m a 3 there is no object of type which is external to A t h a t can reach z or x. Since s does not contain unreachable objects, there must be an object w E R O O T S of type internal to A such t h a t w -+ y. There is a nesting property analogous to nesting of sandwiches (cf. L e m m a 2): L e m m a 6. Let TG = (CLASSES, USE) be a type graph of a program, A be
a non-recursive class whose attributes are private, and I N T E R N A L A ~ 0, and B be a non-recursive class whose attributes are private and I N T E R N A L B 0. If there is a C E ACCESSIBLEB M I N T E R N A L A , then A C C E S S I B L E A M I N T E R N A L B = 0. Proof. Analogous to L e m m a 2.
203 If a class B is recursive and reachable from a class A, all classes in the strongly connected component of TG are either external or internal to A. L e m m a 7. Let TG = ( C L A S S E S , USE) be the type graph of a program, A be a non-recursive class where all attributes are private and I N T E R N A L A ~ 0, and B be a type reachable from A, i.e. A ~ B . If B is recursive, then either all C E ~ are external to A or all C E q8 are internal to A where ~ is the strongly connected component of TG containing B .
Proof. Suppose there is a class C E fl~ internal to A and a class C ~ E ~ external to A. Since ~ is a strongly connected component and 91 is non-recursive, there is a p a t h from C ~ to C in TG not containing A. Then, by L e m m a 4, A cannot be a non-recursive class satisfying I N T E R N A L A ~ 0 whose attributes are private, i.e. the assumptions of L e m m a 7 are violated. Finally, we show t h a t all objects of classes which have internal classes are sandwich objects: T h e o r e m 1. Let TG = ( C L A S S E S , USE) and A be a non-recursive class with accessible types whose attributes are private with unaccessible types. Then, for every state s = ( OBJ, R E F , R O O T S ) , all objects x E O B J of type A are either sandwich objects or none of the attributes of x refer to objects.
Proof. Suppose that there is a state s = (OBJ, REF, ROOTS) which contains an x ∈ OBJ of type A that is not a sandwich object and one of whose attributes refers to an object y. Since the type of this attribute is not accessible, this attribute refers to an object y with type(y) ∈ INTERNAL_A. Since x is not a sandwich object, there must be an object w with a path π from w to y not containing x. By Lemma 5 there is a sandwich object z of type A on path π such that y is internal to z. But then there cannot be a reference to y from x, contradicting our assumption. □

Algorithm sandwich_types (defined below) identifies the types satisfying the sufficient condition of Theorem 1. The algorithm computes these types by maintaining a set of candidates. The invariant is that all classes which are not candidates do not satisfy the sufficient condition of Theorem 1. After the last step, every candidate satisfies the sufficient condition of Theorem 1, i.e., they are sandwich types. Furthermore, for every sandwich type A, the set of its internal types is computed. Objects of these types are allocated in the heap associated with sandwich objects of type A. The algorithm sandwich_types performs the following steps:

1. Compute the type graph TG = (CLASSES, USE) of π. Define the set Candidates of candidates to be the set of all classes that have only private attributes.
2. Compute the strongly connected components SCC of TG and the reduced graph, i.e., RUSE = (SCC, RE) where (𝔄, 𝔅) ∈ RE iff there are A ∈ 𝔄 and B ∈ 𝔅 such that (A, B) ∈ USE.
3. Let Candidates := {A ∈ Candidates : A ∈ 𝔄 for an 𝔄 ∈ SCC with |𝔄| = 1 and (A, A) ∉ USE}.
4. For every A ∈ Candidates compute its accessible types ACCESSIBLE_A. Remove all classes A from Candidates which contain an attribute whose type is accessible.
5. For every A ∈ Candidates, perform the following step (starting with I_A = ∅) until I_A does not change:

       I_A := I_A ∪ { B ∈ CLASSES : B ∉ ACCESSIBLE_A ∧ ∀C ∈ PRED_B : C ∈ I_A ∪ {A} }.

6. Remove every A from Candidates where I_A = ∅.
7. Declare every A ∈ Candidates to be a sandwich type and define INTERNAL_A = I_A.
Proof. The assumption of Theorem 1 requires that all attributes of a class are private. L e m m a 9 (Step 3). After Step 3, all classes A E OBJ \ Candidates violate the assumption of Theorem 1.
Proof. Suppose (A, A) E USE or there is a strongly connected component 2 E SCC such that A E 2. In both cases A :~ A, i.e. A is recursive. Hence, the assumptions of Theorem 1 are violated. L e m m a 10 (Step 4). After Step 4, all classes A E OBJ \ Candidates violate the assumption of Theorem 1.
Proof. Let A be a class not in Candidates after Step 4. Suppose, it is not eliminated by Steps 1 and 3. Then, it is eliminated by Step 4, i.e. it contains an attribute whose type is accessible. Thus, the assumption of Theorem 1 is violated. L e m m a 11 (Step 5). After Step 5, for all classes A E Candidates, the set of all classes internal to A.
A
contains
Proof. Step 5 is a closure algorithm in the lattice of sets (ordered by the subset relation) starting with the smallest element. Each step increases the set. Thus, by the fix-point theorem of Tarski 13, the smallest set satisfying
A : ~ A U {B 6 CLASSES: B ~ ACCESSIBLEAA VC E PREDB : B EIA U {A}}
205 is computed. It is not hard to see that IA is also the smallest set satisfying
~A = {B E CLASSES : A ~ B A B r ACCESSIBLEA A r C E PREDB : B E A U {A}}. Thus, the claim follows by Definition 2. L e m m a 12 ( S t e p 6). After Step 6, for every class A E CLASSES: A E
Candidates iff A is non-recursive and contains only private attributes with types internal to A. Proof. Before Step 6, A E Candidates iff A is non-recursive and contains only private attributes whose types are not accessible (Lemmas 8, 9, and 10). Thus, after Step 6, A E Candidates iff A is non-recursive, contains only private attributes, and flA. The claim follows from Lemma 11 since it implies INTERNALA = ~A.
Theorem 2 (Correctness of Algorithm sandwich_type). Let Ir be a program. Every class A declared by Algorithm sandwich_type to be a sandwich type is a sandwich type and INTERNALA is the set of its internal classes. Proof. Follows directly from Step 7, Lemmas 11 and 12, and Theorem 1 It remains to prove the time complexity of Algorithm sandwich_type:
Theorem 3. Algorithm sandwich_type terminates for every program ~ in time O(m 9n) where n is the size of program ~r (i.e. number of nodes in the abstract syntax tree of 7r) and m is the number of classes in the program. Proof. Obviously, the type graph TG can be constructed in time O(n) by a traversal through the abstract syntax tree of ~r. Hence, it is I USE = O(n). Step 2 can be performed in O(CLASSES I + IUSEI) = O(m + n) (see e.g. 1, Section 6.7). While computing the strongly connected components of TG it is possible to mark the classes A such that (A, A) ~ USE and {A} E SCC. Thus, Step 3 can be executed in time O(I Candidates I) = O(m). The accessible types of a class A can be computed by a traversal through the abstract syntax tree of class A. Thus, Step 4 can be executed in time O(n). If the sets A are implemented by Bit vectors over the classes, it is sufficient to set the Bit for the classes B to true in one iteration iff B r ACCESSIBLEA A VC E PREDB : C EIA. The initialization costs time O(m) and the test costs time O(I USEI) amortized over all classes. The maximum number of iterations is O(m). Hence, Step 5 can be executed in time O(m 9n). The implementation of Step 5 can be extended with additional O(m) execution time such that every class A with IA ----~ is marked. Thus, the execution time of Step 5 remains O(m 9 n) and the execution time of Step 6 is O(m). It is not hard to see that the execution time of Step 7 is O(m 2) = O(m . n), because m <_ n.
206 5
Conclusions
We introduced the notions of sandwich types and sandwich object and showed that using object heaps (i.e. one heap per sandwich object) can improve the execution time of object-oriented programs. Theorem 1 gives a sufficient condition for sandwich types. This is used to recognize sandwich types in object-oriented programs. Upon creation of a sandwich object, its heap is created and maintained independently of other heaps. Further work focuses on extending the assumptions in Theorem 1. In particular, the requirement that all attributes are private may be relaxed by defining the type of non-private attributes to be accessible. Another candidate for generalization is the notion of internal classes: Lemma 4 implies that every class can be internal to at most one class except a nesting property is satisfied (Lemma 6). This excludes, e.g. that the type LIST_CELL in our example is used for more than one sandwich type. Hence, the next step is to relax Definition 2 allowing for some classes to be internal to more than one class without nesting. If it would be allowed in general that a class B is internal to more than one class, the condition INTERNALA ~ ~ would not be sufficient to imply that A is a sandwich type: it is not excluded that an object is internal to two different sandwich objects (cf. Figure 5). The key question is: W h a t is the restriction such that INTERNALA 7t 0 implies that A is a sandwich type? Our further work will address this question.
sandwich objects
Fig. 5. Situation if a class B is internal to different classes
An alias and pointer analysis may lead to additional improvements for the implementation of object heaps. For example, stacks, heaps, and lists may be implemented even more efficiently than sketched in this paper. For example, there is no necessity to have a general garbage collection on stacks, queues, and double-ended queues. Since the elements are inserted and deleted at their ends, it is is sufficient to maintain pointers that mark the beginning and the end of the allocated part of the heap. When copying a heap of a list object, it can be linearized. This leads to an additional improvement of locality. Another issue that should be investigated the influence of sandwich types in object-oriented design. It seems natural to use sandwich types when designing object-oriented programs, because it is a way of information hiding. Furthermore, aliasing and sharing of objects can be controlled.
207 A
BOOL - A Basic
Object-Oriented
Language
We define a language which is a p r o t o t y p e of intermediate languages of m a n y object-oriented languages. It is based on 10, 11. We do not consider inheritance and basic types such as integers or booleans since these notions are not i m p o r t a n t in the discussion of sandwich types. Instead introducion parameterized classes, we assume that in a program the p a r a m e t e r s of every parameterized class are instantiated with types. We further focus on the basic features of object-oriented languages (calling methods, accessing attributes, creating objects) and add a few statements required for making BOOL Turing-complete (assignments, conditional statements, method returns). For simplicity, we consider only methods with return types (functions). Method without return types (procedures) can be defined similarly as functions. We define the abstract syntax, the static semantics (in particular typing rules), and the dynamic semantics by abstract state machines. A.1
Abstract
Syntax
A program is a collection of classes together with a designated class MAIN containing a procedure main. main is called when starting the program. Fig. 6 shows the E B N F defining the abstract syntax of BOOL. Attributes of a class A m a y be private. Private attributes of every object obj of class A can only be accessed when executing a method of obj. T1 x . . . x Tk --~ T is the signature of a m e t h o d m ( x l : T 1 , . . . ,Xk : Tk) : T . . . . The conditional statement requires further explanation. Consider the conditional statement if Des '=' E x p r then n occuring in a m e t h o d m of class A. If the condition D e s '=' E x p r is satisfied, then the n-th statement of method m of class A is executed. Otherwise, the s t a t e m e n t after the conditional statement is executed.
A.2
Static
Semantics
A BOOL-program must satisfy the following properties on classes, attributes, methods, variable names, and j u m p targets: - All class names are pairwise disjoint. - For every class, the attribute names and method names are pairwise disjoint. - For every method, the p a r a m e t e r names and the names of local variables are pairwise disjoint. - For every method m, the j u m p targets of the conditional statements of m must be smaller t h a n the number of statements of m.
Furthermore, we assume for simplicity4: - All attribute names and all methods names are different from class names. 4 This can be viewed as a result after name analysis.
208
Prog : : - - Class* Class ::-- class Name is Attr* Method*end ';' Attr ::-- private Var ':' Type ';' Method ::----Id '(' (Par';')*Par ')' ':' Type is (Par ';')*(Star ';')*end ';' Par ::= Var ':' Type Type ::----Name Star ::= AssignReturnl Assign ::----Des ' : = ' Expr Return ::= return Expr If ::= if Des '=' Expr then n Des ::= (Var '.')* Var Expr ::= DeslCalllvoidINew Call ::= Des '.'Id '(' (Des ',')*Des ')' New ::= ' # ' Type where Var is any variable name, Id is any procedure identifier, Name is any class name, and n is a natural number. Fig. 6. Abstract Syntax of BOOL
- W i t h i n every class A, for every m e t h o d m the names of p a r a m e t e r s a n d local variables are different from the a t t r i b u t e names of A, m e t h o d n a m e s of A, and from the class names. It remains to define types. In our case, it is sufficient to assume t h a t classes are types (identified by class names). For defining t y p i n g rules, the following context information is required: - /" contains all classes with their names, attributes, and m e t h o d signatures. - A class A where the attribute, m e t h o d , statement, designator, or expression to be t y p e d occurs. - A m e t h o d m where the statement, designator, or expression to be t y p e d occurs. N o t a t i o n s . F, A, m ~- e : T denotes t h a t within a given context, it can be derived t h a t designator or expression e is of t y p e T. F, A, m ~- s x / d e n o t e s t h a t s t a t e m e n t s is correctly t y p e d within a given context. A p r o g r a m is statically correct iff every s t a t e m e n t is correctly t y p e d within the context it occurs. F, A t- x : T denotes t h a t class A has an a t t r i b u t e x of t y p e T or m e t h o d of signature T. 9 Fig. 7 shows the t y p i n g rules of BOOL using the above notations.
209
Axioms : 1", A, m ~- void : T 1",A, m t - # T
for all types T
:T
Rules : 1",AF-x:T for all methods m of A 1 " , A , m ~- x : T F, A, m t- des : B F, B t- x : C if x is not private in B 1", A , m t- des.x : C F, A b m : T1 x ... x Tk --~ T 1", A, Cn F- dl : T1 . . . 1", A, rh b dk : Tk F , A , ~ n t- r e ( d 1 , . . . ,dk) : T F, B b m : T1 x . . . x Tk --~ T F, A, ~n b des : B 1", A , ~h F- dl : T1 . . . 1", A, ~ ~- dk : Tk
F , A , ~ ~ des.m(dl,... ,dk) : T 1", A, m t- des : T 1", A, m b expr : T F, A , m t- d e s : ~ e x p r x / 1", A - m : T 1 x . . . x Tk -+ T 1", A, m b expr : T 1", A, m b- return exprx/ F, A, m ~- des : T 1", A, m ~- expr : T 1", A, m t- if des=exr then n~/
Fig. 7. Typing Rules for BOOL
A.3
Abstract
State Machines
We define the operational semantics by a b s t r a c t s t a t e m a c h i n e s (ASMs). In this subsection we introduce ASMs as it is required for the operational semantics of BOOL. For the generalization, we refer the reader to 5. An ASM consists of a signature A of the state space, an interpretation of A (the initial state), a n d set of transition rules (used for changing the interpretation of A). A s t a t e is an interpretation of A. It is convenient to assume t h a t interpretations are algebras. In our examples, a t r a n s i t i o n rule has the form i f C o n d i t i o n t h e n Updates where C o n d i t i o n is a t e r m and Updates is a set of updates. An update has one of the following forms: 1. f ( t l , . . . ,t,~) := t for f E A and terms t, t l , . . . , t n . 2. e x t e n d M b y o Updates e n d where M E A is interpreted by sets, o is a new symbol, and Updates is a set of updates (here: only of form (1)) E x e c u t i o n o f update (1) means that t l , . . . , tn, and t are interpreted in the current state, and the interpretation of f is changed to t at point ( t l , . . . , tn) and
210 unchanged otherwise. E x e c u t i o n o f update (2) means t h a t the set M is extended by a new element 5 o, and after this, the updates in Updates are executed. A transition rule fires if its condition evaluates to true in the current state. In this case, its updates are executed. In our example, there is at most one transition rule t h a t fires in a given state. A state is final iff no transition rule fires. It is easy to see t h a t the transition rules define a state transition relation.
Notation: Upper case letters denote sets, lower case letters other elements of A. Sets are denoted as usual. X1 x --. x X n denotes the cartesian product of sets X 1 , . . . , Xn. Tuples are denoted as usual, x $i denotes the projection to the i-th component of tuple x. X --+ Y denotes the set of relations R C X • Y which are functions. R ( x ) denotes the unique element y such t h a t (x, y) e R. X* denotes lists with element type X . xX denotes the list obtained from list X by adding element x, ~ denotes the e m p t y list, and x l , . . . , Xk denotes the list of elements Xl,. 9 , Xk. As usual hd and tl denote the head and tail of a list, respectively. 9
A.4
Operational Semantics
We assume t h a t only static correct programs are executed. The state space of BOOL consists of the symbols defined in Table 1. A f r a m e consists of a set of bindings of local variables and p a r a m e t e r s of the m e t h o d being executed, an object where the method belongs to, and return point (specified by a m e t h o d and instruction pointer). T h e m e m o r y state space is the set M = A \ { curmethod, ip}. A m e m o r y state is an interpretation of the m e m o r y state space M .
OBJ set of objects type E O B J --+ Name dynamic type of an object R E F C OBJ x O B J x Name references between objects (via attributes) ac E O B J the accumulator. env E F R A M E * the environment. eurmethod E Id the method currently be executed
~peN where
the instruction pointer F R A M E = S E T ( B I N D I N G ) x O B J x Id x N B I N D I N G = Id x ( O B J t~ {void})
Table 1. State Space of BOOL
5 This new element is taken from an infinite universe, called Reserve.
211 The initial state is defined by the updates O B J := {system} type := {(system, MAIN)} R E F := 0 ac := system env := ({(Yl, void),... , (Yl, void)}, system, under, undef) curmethod := main ip:=0 where main contains the local variables Y l , . . . , Yk. For the definition of the operational semantics, it is convenient to assume that assignments, conditionalte statements, and return statements are decomposed into the statement sequences shown in Table 2. load expr loads the object computed by expr into the accumulator, store des stores the object contained in the accumulator to the object designated by des, and beq des n compares the object in the accumulator with the object designated by des and jumps to the n-th statement of the current procedure. The return instruction returns the object contained in the accumulator.
statement instructions ~des:=expr load expr; store des if des=expr then nlload expr; beq des nl return e load e; return Table 2. Decomposition of Statements into Instructions
N o t a t i o n s : The operational semantics uses the following abbreviations and notations. Each of these can be formally defined. Sometimes we leave the formal definition to the reader, self = hd(env) $3 denotes the object where the current method is executed. Cmd is instr denotes the instruction to be executed, i.e. if ip = i and curmethod = m, then instr is the i-th instruction of m e t h o d m in type(sell), binding = hd(env) $1 denotes the bindings of the method currently being executed, is_local(x) is true iff x is a local variable or parameter of the method currently being executed, is_attribute(x) is true iff x is an attribute of type(self). We write Ol 2+ 02 E R E F instead of (01,02,x) E R E F . 01 -+ 02 E R E F denotes that there is an attribute x such that 01 2+ 02 E R E F . succ(x, obj) denotes the object referenced by object obj via attribute x, i.e., it denotes the unique element 0 such that obj 2+ 0 E R E F if it exists (otherwise it is void). In this case ref(obj, x) denotes this reference, bind(x) denotes the object bound by binding to local variable or parameter x of the current procedure, object(des)
212 denotes the object designated by designator des, i.e. void succ(self , x)
if des = void if is_attribute(x) if is_local(x)
object(des) : lbind(x) ,succ(object(des'),x)
For a method re(x1 : T 1 ; . . . ;xn : Tn) : T . . .
if des = des~.x for a designator des ~ with local variables Y l , . . . , yk,
bindto(m, objl, . . . , obj,~) =
{(Xl,
Objl),
. . . , (Xn, objn),
(Yl, void),... , ( Y k , void)}
denotes the binding that binds obji to xi, i = 1 , . . . ,n. update_binding(x,o) updates the binding for x, i.e. binds o to x, i.e., update_binding(x, o) = binding := binding \ { (x, b i n d ( x ) ) } U {(x, o)} update(o, x, d) updates the object referenced by object o via attribute x to the object designated by desigator d, i.e. update ( ol, x, 02) = undefined R E F := R E F \ {re(ol, x)}
if Ol = void if ol ~t void and 02 = void
R E F := R E F \ {re(ol, x)} U {02 -~ 02}
otherwise
Figure 8 shows the transition rules for loading objects into accumulators using the above notations. Figure 9 shows the other transition rules. A transition rule is not applicable if it would access an attribute or call a method of void. The following theorem relates typing rules to the dynamic type of objects. The proof is a induction on the number of state transitions and left to the reader. T h e o r e m 4. Let r E BOOL be any program, q be any state of ~r reachable f r o m an initial state, A = type(self) and m = curmethod in state q. Then, the following properties hold:
(a)
If F, A, m t- des : T, then type(object(des)) = T or object(des) = void in state q. (b) If F , A , m t- expr : T and the c o m m a n d to be executed in state q is a store-instruction (resulting f r o m an assignment des := expr), a return (resulting from return expr), or a conditional s t a t e m e n t (resulting f r o m if des = expr then n), then type(ac) = T or ac = void in state q. This theorem implies directly the C o r o l l a r y 1. Let 7r E BOOL be any program and q be any state of 7r reachable from an initial state. Then, for every objects obj, obj ~ E O B J and obj -~ obj ~ in state q, the following properties are satisfied: (a) T = type(obj) contains an attribute a. (b) I f F, A ~ a : T', then type(obj') = T ' .
213 i f Cmd is load des t h e n ac : = object(des)
ip := ip + l i f Grad is load # T t h e n e x t e n d OBJ b y o ac :~ obj
type(obj) : = T end
ip := ip + l i f Grad is load re(d1,... ,d,~) t h e n env : = ( bindto(m, object(d1),... , object(&=)), self, curmethod, ip + 1)lenv curmethod : = m
ip:----O i f Grad is load des.m(dl,... ,dn) A object(des) # void then
env
:=
(bindto (m, object ( d l ) , . . . , object (d,~) ), object (des), curmethod, ip + 1)l env curmethod : = m ip:=O F i g . 8. Transition Rules for loading the A c c u m u l a t o r i f Cmd is store des A object(des) ~ void t h e n store(des, ac)
ip := ip + l where
f update_bindings (des, ac) store(des, ac) ---- ~ update(sell, des, ac) update(object(des'), x, ac) i f Cmd is return A curmethod -7/=main t h e n env := tl(env) curmethod : = hd(env) $3 /p : = hd(env) $4 i f Cmd is beq des n A object(des) = ac t h e n ip : = n i f Cmd is beq des n A object(des) # ac t h e n ip : = pc + 1 F i g . 9. O t h e r Transition Rules
if is_local(des) if is_attribute (des) if des = des'.x
214
References 1. A. V. Aho, J. E. Hopcroft, and J. D. Ullman. Data Structures and Algorithms. Addison-Wesley, 1983. 2. P. S. Almeida. Balloon types: Controlling sharing of state in data types. In ECOOP' 97 - Object-Oriented Programming, volume 1241 of Lecture Notes in Computer Science, pages 32-59, 1997. 3. T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. McGraw Hill, 1991. 4. Free Software Foundation. The GNU C library. URL: http://www.gnu.ai.mit.edu/softwaxe/libc/libc.html. 5. Y. Gurevich. Evolving Algebras: Lipari Guide. In E. BSrger, editor, Specification and Validation Methods. Oxford University Press, 1995. 6. Barry Hayes. Using Key Object Opportunism to Collect Old Objects. In Proceed-
ings of the OOPSLA '91 Conference on Object-oriented Programming Systems, Languages and Applications, pages 33-46, nov 1991. Published as ACM SIGPLAN Notices, volume 26, number 11. 7. J. J. Horning. A case study in language design: Euclid. In F. L. Bauer and M. Broy, editors, Proceedings of the International Summer School on Program Construction, volume 69 of LNCS, pages 125-132, Marktoberdorf, FRG, July-August 1978. Springer. 8. Daniel H. H. Ingalls. The smalltalk-76 programming system design and implementation. In Conference Record of the Fifth Annual A CM Symposium on Principles of Programming Languages, Tucson, Arizona, pages 9-16. ACM, January 1978. 9. Richard E. Jones. Garbage Collection: Algorithms for Automatic Dynamic Memory Management. Wiley, July 1996. With a chapter on Distributed Garbage Collection by R. Lins. Reprinted February 1997. 10. H. W. Schmidt and W. Zimmermann. A complexity calculus for object-oriented programs. Journal of Object-Oriented Systems, 1(2):117-147, 1994. 11. H. W. Schmidt and W. Zimmermann. Reasoning about complexity of objectoriented programs. In E.-R. Olderog, editor, Programming Concepts, Methods and Calculi, volume A-56 of IFIP Transactions, pages 553-572, 1994. 12. James W. Stamos. Static grouping of small objects to enhance performance of a paged virtual memory. ACM Transactions on Computer Systems, 2(2):155-180, May 1984. 13. A. Tarski. A lattice-theoretical fixpoint theorem and its application. Pacific J.Math., 5:285-309, 1955.
Garbage
Collection --
via Dynamic
A Formal
Treatment
Type
Inference
--
Haruo Hosoya and Akinori Yonezawa Department of Information Science, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan
{haruo, yoneza~a}@is, s.u-tokyo.ac, jp
Abstract. A garbage collection (GC) scheme - - what we call type inference G C - that dynamically performs type inference is studied. In contrast to conventional garbage collection that can collect only unreachable objects, this scheme can collect objects that are reachable yet semantically garbage. The idea is to exploit ML-like polymorphic types that can tell whether or not each object may be used in the rest of computation. There has been some work studying algorithms of the GC scheme. However, their descriptions had some obscurity in details of their methods, and did not give any formal correctness proof that captures the details. These facts, we believe, make their descriptions still unconvincing for implementors. This paper aims to present a trustworthy specification of the GC scheme. In this specification, we first consider an underlying language that suitably reflects implementation details, on top of which we then formulate an algorithm of type inference GC, and formally prove its correctness. A significant point in our formulation is that we specify how to deal with Hindley-Milner polymorphism. Furthermore, showing our experimental results, we discuss in what cases this GC scheme is beneficial.
1
Introduction
In a program, some objects (i.e., heap-allocated values) are never accessed after a given point of execution. We call such objects s e m a n t i c garbage at that point. These include not only unreachable objects with no pointers from the live part of the heap, but also some reachable objects whose values are not needed in the rest of the computation. For example, consider the following functional program:
let l = 1, 21, 3 g = A f . ( f l) in g length end Suppose that a garbage collection (GC) is invoked just before the application ( / l ) is evaluated. Since the elements of the list l are reachable, these elements are retained by conventional trace-based GC schemes, such as mark & sweep
216 and copying. However, they will never be accessed from then on. We could enable conventional GCs to collect such semantic garbage by inserting assignments that "nullify" variables or fields of data structures holding the garbage. However, clearly it would impose a heavy burden on programmers and considerably decrease readability of programs. In order to collect such reachable garbage, recent work has suggested a sophisticated GC scheme - - what we call type inference G C - that exploits ML-like polymorphic type systems 5, 4. Basically, the GC scheme traces objects, using their types that indicate their structures. These types are recovered partly from static types. However, due to ML-polymorphism, some static types contain type variables whose actual types are determined dynamically. The GC scheme infers such types at GC time. Not all actual types for type variables can be recovered. Surprisingly, however, objects whose types are not recovered, i.e., remain to be type variables, can safely be collected. In the above example, the global state just before ( f l) is evaluated can be represented as
letrec 11 = I 12 = 2 13 = 3 l = 11,12,13
f = length in f l where the bindings represent the heap, and the body expression represents the remaining computation, which is only the application of f to 1. In other words, the body expression represents the stack. Let the pointers f and I in the stack be given static types tl -+ t2 and tl, respectively. Suppose that the GC is triggered at this point. The GC first traces the stack, which contains two pointers f and I. Since the static type tl of I does not tell what type of object it refers to, the GC suspends to trace I. The GC then traces f with static type tl -+ t2, and performs unification between this type and the type Vt.(t list) --+ int of the object length. Since this will instantiate the static type tl of I to be t list, the GC traces l, resulting in giving type t to the pointers 11, 12, and 13 to list elements. Finally, these elements are collected as semantic garbage. It is correct intuitively because the pointers 11, 12, and Iz can be assumed to refer to objects of any type (nil, for example) for correctly performing the rest of computation. Closely viewing the algorithm, notice that we had to remember the pointer 1 that was given type variable, because the type variable might be instantiated in the rest of tracing. Goldberg and Gloger uses a mechanism called "defer-list" for this purpose 5. The GC keeps an association of addresses with types. When the GC finds pointer, it associates the pointed address with the static type given to the pointer. The GC traces an address only when it is associated with a concrete type (a type other than a type variable, such as a function type and a list type). If there are multiple pointers to an address, the GC may trace the same address more than once. When such sharing occurs, the GC unifies the types given to the pointers. For example, if the GC encounters another pointer to the list that is given type (int list) list, then it unifies t list and (int list) list.
217 From the above presentation, we can see that type inference GC is a rather complicated scheme and raises several non-trivial questions. First, it is not obvious that GC-time unifications always succeed, especially those for shared addresses. Second, it is subtle how to unify polymorphic types that are not trivially related (e.g., tl --~ t2 and Vt.(t list) --+ int). Third, we may not easily understand that the collected objects are actually garbage. Previous papers on the topic have not addressed these questions rigorously. Therefore we aim in this paper to give a trustworthy specification of type inference GC, in the sense that it specifies a GC algorithm reasonably in detail, and that it is given a robust trust by a formal proof of its correctness. For this goal, we first define an underlying language that suitably treats sharing and embeds static type information. On top of it, we construct an algorithm of type inference GC that uses information contained in the language. We then prove correctness of the algorithm. The correctness consists of termination which includes success of each GC-time unification, and soundness with respect to an operational semantics of the underlying language, which tells that collected objects are actually garbage. A significant point in our formulation is t h a t we use Hindley-Milner polymorphism, and specify how to deal with polymorphic types in the GC. Although our framework uses a strict functional language, we believe that it can be applied to languages in other evaluation orders such as a lazy functional language. In addition, our framework can be extended to include other popular features such as tuples, variants, and ML references. It should be remarked, however, that this scheme does not work for languages with operations that may inspect types of their actual arguments. The operations include polymorphic equalities, typecase constructs, and dynamic type checks. Among many questions on practicality, we are especially interested in how much semantic garbage can be collected by the type inference GC scheme in comparison with conventional GCs. We examined it by a preliminary experiment using a prototype interpreter and a type inference GC that we implemented in Standard ML. From the results, though we cannot claim that this scheme benefits for every application, we expect that it benefits particularly for programs with a program structure that we call the phase structure. The rest of this paper is organized as follows. After defining basic notation in Section 3, we present our language in Section 4. On top of this, we describe an algorithm of the GC scheme in Section 5 with its correctness. We formally proved all theorems in this paper. In Section 6, we show results of our experiment and give a discussion on them. In Section 7, we remark on cost of this scheme. Section 8 reviews related work. In Section 9, we conclude this paper as well as touch upon future directions.
2
Underlying Idea
When discussing algorithms of GC, we often consider a notion of accessibility, with which we may give a GC algorithm in such a way that it repeatedly finds
218 an accessible object and traces it. Particularly in traced-based G C schemes, all objects referred from a (live) object are accessible. For example, an object Ax.(f l) has two free variables f and 1 and the objects referred by these variables are accessible. In the type inference GC scheme, we use a type-based accessibility. The above object will have the following type judgment:
{ f : tl -'+ t2,t : tl} I'- )~x.(f t) : t3 --r t2 Usually, we would read this as "under the t y p e environment { f : tl --+ t2, I : tl}, the object Ax.(f l) has type t3 --+ t2." However, the type judgment tells more interesting information: The type environment { f : tl --+ t2,1 : tl} tells how the object Ax.(f l) accesses the free variables. Specifically, the object accesses the variable f at most as a function of type tl -+ t2, and does not access the variable I. In our formulation, we refer to such a pair of a type environment and a type as a typing. For a given typing for an object, if it gives to a free variable x a type other t h a n a type variable, then we say t h a t the object referred by x is accessible. The idea of this accessibility will be understood as follows. As in usual MLlike type systems, well-typedness p r o p e r t y of an object is preserved by any instantiation of type variables. Moreover, the well-typedness property ensures t h a t evaluation will never get stuck as far as any subexpression of ~i~c object is the redex of the evaluation. Therefore free variables t h a t are assigned t y p e variables can be assumed to refer to arbitrary objects. This intuitively means t h a t the objects referred by the free variables will never be accessed. 1
3
Basic
Definitions
Let T V a r be a set of type variables ranged over by s , t , . . . . Define monotypes and polytypes as usual: (monotypes) T ::= t int rl --~ r2 (polytypes) a ::= T I Vt.a Monotypes are either type variables, integer types, or function types. T h e t y p e ~tl... Vtn.T is abbreviated as V t l . . . tn.T or Vt.~-. Define Unq(VE.r) = T. Let there be a set of (program) variables, ranged over by w, x, y , . . . . Addresses are a subset of variables, which are ranged over by a, b , . . . and are assumed to contain a distinct address # . The address # will be used to be temporarily assigned to 1 Of course, this reasoning fails when we have some operations such as typecase that may inspect types of objects.
219 a variable. We define a type environment as a set of pairs of a variable (including # ) and a polytype in the form of x : a, with no variable x occurring twice: (type environments) F, A ::= {xl : a l , . . . , xn : an} We regard a type environment F as a finite function that maps x into a for each
x:aEF. A type substitution S is a substitution of monotypes for free type variables. F T V ( T ) and F T V ( F ) denote the sets of free type variables occurring in T and F, respectively. S F is defined by (SF)(x) -- S ( F ( x ) ) for x e Dom(F). Pairs of type environments and monotypes, written (F, T/, are called typings. The monotype r of a typing (F, 7-) is called result type. A type T' is a substitution instance or simply an instance of T via S, written T "<S T', iff ST = 7". A typing (F, T) is less instantiated than (F', ~-'), written (F, T) "<S (F', T'), iff S F C 1"' and T "<S r'. We omit the subscript S if it is not important. A monotype T is a generic instance of a polytype a = Vtl . . . t n . T ' , written a _> T, iff there is a type substitution S for t l , . . . , tn such that ST' --- T. As for polytypes, we write a >_ a' iff there is an a-variant Vsl ... sm.T' of a' such that no si (1 < i < m) occurs free in a and a > T'. When Dora(F) = Dom(F'), we write F > F ' iff F(x) > F'(x) for any x 9 Dora(F). Let f and g be finite mappings. Dora(f) and R n g ( f ) are the domain and the range of f , respectively. The function composition f o g is a function defined by ( f o g)(d) = f(g(d)) for d 9 Dora(g). When the domains of f and g are disjoint, the disjoint sum f t~ g is a function defined by ( f t~g)(d) =
f(d) (d 9 Dora(f)) g(d) (d 9 Dom(g)).
We write flA for the restriction of f to the domain A.
4
Language
As mentioned in Introduction, we consider an underlying language in which sharing and static type information are embodied. In the language, adopting ideas in 12, 11, we use explicit temporary variables, environments, stacks, and heaps.
4.1
Source Language
Before giving how to embed static type information in our language, we explain what static type information to use. Instead of type information obtained by a usual static type inference, we use finer type information, which was proposed by Fradet 4. In this method, we use the principal typing for each continuation. The motivation is as following. The usual type information monolithically gives each variable a single type. However, for a continuation when the GC is invoked, such type information may be too instantiated. For example, consider the following program:
220 let x = hd (hd l) in length l Because of the first line, l is given type (t' list) list. Suppose t h a t the GC is invoked just before length 1. Even though the elements of l will not be used, the GC cannot collect t h e m because they are given a concrete type t' list. We can (partly) overcome this problem by providing "minimal" type information individually for each continuation. Specifically, we m a y give l t y p e (t ~ list) list for let x = hd (hd/) in length l, and give l type t list for length 1. In our formulation, to express this static type information, as a source language, we use the form of expressions t h a t each intermediate result is explicitly bound to a " t e m p o r a r y " variable (also known as A-normal form), and annotate each expression with static type information. The annotation has the form (r,~)e, where (F, T) gives a typing for e. T h a t is, intuitively, F tells how e accesses its free variables, and T tells at least what type e has, as explained in Section 2. We can easily see t h a t this form is suitable for computing type information for each continuation at compile time. In the above example, we m a y give annotation typings as (rl,rl) let x = hd l in (r2,~2) let y = hd x in (ra,r3) (length l)
where Fl(l) = (g list) list and -P3(/) = t list. Formally, the syntax of our source language is summarized as follows: (values) v ::= (r,~)i I (r'~)Ax.c (codes) ~ : : = ( r , r ) l e t x : a = v i n c
I(r,~)letw:a=xyinc
I(r,~)x
Values are either integers 2, ranged over by i, or functions. A code is a sequence of let expressions t h a t terminates with a variable for result of the code. The let expression binds the result of either a value or an application to a variable. Each value or code is added an annotation typing. 4.2
Evaluation
In real implementation, of course, the GC uses the static type information as it is given at compile time. Therefore we want the description never to operate on the annotation typing at run time. For this purpose, it is not a p p r o p r i a t e to use substitutions in our operational semantics. If we used substitutions, we might have to somehow define "Fly/x" for considering ((r,~)e)y/x where y/x is substitution of y for the variable x. This would involve modification of a mapping from x to a mapping from y, and even unification between the type of x and the type of y if mappings from b o t h variables already exist. (Actually, such manipulations should be postponed until GC time.) 2 To simplify the arguments, we allocate integers to heaps, which are usually represented in actual implementation as unboxed values.
221 Instead of substitutions, we use explicit environments and stacks. An environment is a finite function from variables into addresses, as defined by: (environments) V ::-- ( x l
-: al,...,
Xn
.:
an}
We define frames as pairs of environments and codes, and objects as pairs of environments and values: 3 (frames) F ::-- ((V,c)) (objects) h ::= ((V,v)) An environment attached to a code or value m a p s variables in the code or value into addresses in heaps. Thus environments represent local variables in frames of stacks or closure records of function objects. A stack is a semicolon-separated sequence of frames where the right-most frame is "top": (stacks) C ::= F I F M ; C A stack is extended at a function call and is shrunk at a result. We assume t h a t the environment of each frame in a stack m a p s some variable x into # except for the top frame. Moreover, to treat sharing, we use explicit heaps. A heap is a finite function from addresses into objects: (heaps) H ::-- {al = h1~1,...
,an -~ h,~~}
When a value is evaluated, an object for the value is allocated in a heap. When an application is evaluated, the variable referring to a function is automatically dereferenced. Heaps can treat cycle. However, it will make sense only when our source language includes some feature to create cycles in heaps. A global state is represented as a program in the letrec form that consists of a heap and a stack. An answer is a p r o g r a m t h a t has only one frame whose code is just a variable: (programs) P ::= letrec H in C (answers) A ::= letrec H in ((V, (F'~)x)) Above, we have another type annotation a in h" and a in F ~. We call these types "declaration types" and will explain t h e m later. Before giving our operational semantics, we show an example of evaluation in our language in Figure 1. At the beginning, the p r o g r a m has the e m p t y heap and a stack with one frame t h a t has the e m p t y environment and the code cl. In the first step, we evaluate the value vl. It results in allocation of an object at a new address a in the heap. The object is formed by coupling the current environment and the value as (((}, Vl)). The address a is assigned to the t e m p o r a r y variable x in the environment, and the code is now set as c2. In the second step, we 3 We can include tuples in the same way by expressing a tuple object as ((V, (xl, x2))), though it should be represented in actual implementation as (V(xl), V(x2)).
222 letrec alloc
-~
letrec { a =
apl~ letrec
{a =
ret)
{a =
letrec
where
{} in (({}, el)) (({}, vl)>} in (({. = a}, c2)) (({}, vl)>} in <({x = a, y = # } , c3)>; (({z = 1}, e4>) (({}, v,>)} i. <<{x = a, y = 1}, ~3))
cl = ( r ' ' ) l e t x = vl in c2 c2 = (r~'~2)let y = (x 1) in c3 c3 = (r3,~a)y
vl = z
Fig. 1. Example of evaluation
evaluate the application (x 1). It dereferences the variable x, which refers to the function object (((}, vl)/ that has just been allocated in the heap. The stack is extended with a new "calee" frame that is constructed from the object, with the parameter z being assigned the actual argument 1, and the code being set as the body code c4. In the "caller" frame, # is temporarily assigned to the variable y until the calee returns. In the third step, the calee returns the value 1 that was assigned to z. We shrink the stack and reassign 1 to the variable y in the caller frame that has been assigned # . Notice that all the annotation typings on the codes and the values are never manipulated in the evaluation. Thus the static type information is available whenever GC is invoked. Our operational semantics is given by an evaluation relation > over programs, defined in Figure 2. 4 The (stack) rule says that if a program can be reduced, then a program obtained by adding a frame at the b o t t o m to the program can also be reduced. The other three rules are as explained above. The GC will use declaration types, which have two kinds. One is on objects. When a let expression let x : a = v in c of allocation is evaluated, the declared type a is attached to the allocated object, as hM. The other is on frames. When a let expression let w : a = (x y) in c of application is evaluated, the declared type a is attached to the caller frame, as F M . While the declaration type on an object is relevant to the type of the object, the declaration type on a frame is relevant to the type of # in the frame. Therefore declaration types are attached to every frame except for the top one. Declaration types are used only at GC time and are not essential for deriving type judgments. We formally define accessing of an address, and semantic garbage. D e f i n i t i o n 4.1. W e s a y P accesses a if V ( x ) = a, a E D o m ( H )
and e i t h e r
1. P = letrec H in ((V, (r'r)x)); or 4 In this formulation, an environment may contain variables not free in a code or value. This may not only make closures uselessly large but also cause unnecessary unification at GC time. It would be more close to implementation if we restricted the environment as ((Vfv(~), e)). Our experiment shown in Section 6 is based on it. Our formulation, however, does not adopt it for simplicity. Instead, we avoid unnecessary unification on the side of GC formulation. (See Section 5.)
223
letrec H in C (stack) letrec H in FIe; C
> letrec H ~ in C ~ > letrec H ~ in F~; C'
(alloc) letrec H in (iV, (r'r)let x : a = v in c)) ---+ letrec H ~ {a = ii V, v)) ~}
in (iV W { x = a}, c))
(a ~_ DomiH))
(app) letrec H in ((V, (r,~)let w : a ----(x y) in c))
(H(V(x)) = ---+ letrec H in <(V ~ {w ----#},c))~;
((V',
(r"~')Az.c'))~')
<(V' ~J {z : V(y)},c'))
(ret) letrec H in ((V ~ {w -- #}, c))~; ((V', (r'~)x)) letrec g in ((V t~ {w = Y'(x)}, c)) Fig. 2. Operational semantics
2. P = letrec H in F I l l ; . . . ; ((V, (r'T)let w = (x y) in e)). 111 is semantic garbage or P = letrec/-/1 ~JH2 in C iff there is no PI s.t. P P ' and pi accesses some a E D o m ( H 1 ) .
)*
Type System
4.3
We give the typing rules for our language in Figure 3. To m a k e the set of t y p i n g rules c o m p a c t , we define expressions as a super set of b o t h the set of values and the set of codes: (expressions) e ::= x l i I Ax.e I el e2 I let x : a = el in e2 I (r'r)e T h e typing rules derive the following j u d g m e n t s : F F F F t-
~- e : T ~- ((V, e)) : T ~- C : T t- H : F I P :T
well-typed well-typed well-typed well-typed well-typed
code or value frame or object stack heap program
A l t h o u g h these t y p e j u d g m e n t s c a r r y a l m o s t the s a m e i n f o r m a t i o n as usual Hindley-Milner t y p e systems, we can read these, as m e n t i o n e d in Section 2, t h a t the t y p e e n v i r o n m e n t indicates how the syntactic object accesses its free variables and w h a t t y p e the syntactic object itself has. T h e first five rules are the same as in usual t y p e systems. T h e (annot) rule gives conditions for static t y p e information by two premises. T h e first premise ensures t h a t a n y typing for an enclosing context m u s t be m o r e i n s t a n t i a t e d t h a n the a n n o t a t i o n typing. T h e second premise ensures t h a t the expression m u s t be well-typed u n d e r the a n n o t a t i o n typing. In Section 4.1, we m e n t i o n e d t h a t a n n o t a t i o n typings were intended to express the principal typings for expressions.
224 Expressions: r l - i : int (int) Fkel:r'
F(x) > r (vat) F F- X : T
Ft~ {x: r~} ~- e: r2 (abs) F F- Ax.e : rl "+7-2
~=Y{.r' ~nFTV(F)=O Ft~{x:a}f-e=:r Ff-letx:a=el ine~:r
r ~ el : rl -+ r2 F }- e2 : rl (app) F I- el e2 : r2
( r ' , r') -.~ (F, r)
(tet)
F ' F- e : r '
(annot)
/~ ~- (v"*')e : r
Environments, stacks, heaps, and programs: FoV>F' r'f-e:r F I- (
a'=Vt.r'
(env)
OI-H:F FF-C:r I- letrec H in C : r
(pro9)
F~{#:a'}kF:r
a•
tNFTV(F)=O F t- F~; C : r
Va e Dom(F') = Dora(H). (H(a) -- It" F t~ V' ~- h: Unq(V'(a)) F ~ H : F'
(stack)
a • F'(a)) (heap)
Fig. 3. Typing rules
However, we do not incorporate the principality in our formulation because it is not relevant to soundness of the GC. The (env) rule has two premises. Assuming a type environment F ' for local variables, the second premise ensures t h a t the expression must be well-typed under (F', r). The first premise then relates by generic instance between types of addresses given in F and types of the local variables given in F ' . This premise would intuitively be understood by regarding the environment-attached expression <<{X 1 = al,...}, e)) as a let expression let {xl = a z , . . . } in e. In order to give a typing for this expression, we would require F ( a l ) >_ F ' ( X l ) , . . . . Note t h a t this typing rule ensures t h a t x E D o r a ( V ) and V ( x ) E D o m ( T ' ) , for any x E F V ( e ) . Combining the (annot) rule and the (env) rule, we can relate the type of an address a in F and the type of a variable x in F ' t h a t refers to a, as 1-'(a) >_ 1-"'(x) and 1-'(x) -.~ F " ( x ) for some T"'. Since this relation is often used in the next section, we abbreviate it as T'(a) -.,g F ' ( x ) . Formally, we write a " ~ s a' iff a ' _> a " and a -~ a " for some a " . Similarly, we write (F, T} -~S ( F ', r ' ) iff F ~ >__F " and (F, r ) -<s (F", r'} for some F " . The (stack) rule is analogous to the (let) rule. The (heap) rule deals with potential cycles in heaps. The (stack) rule and the (heap) rule additionally require "compatibility" between declaration types and types t h a t are inferred by the rules. Polytypes al and a2 are compatible, written a 1 • 0-2, iff a3 -< al and 63 -~ ~r2, for some polytype a3- In the (stack) rule, the declaration type a on
225 F must be compatible with the inferred type a f of # . In the (heap) rule, the declaration type a on h must be compatible with the inferred type F ~(a). Although we do not describe in this paper, we can infer annotation typings from source programs using a standard algorithm of polymorphic type reconstruction 10 with a small modification. We can show type soundness of our language as the following theorem that a well-typed program terminates or proceeds without a type error. T h e o r e m 4.1 ( T y p e S o u n d n e s s ) . I f F- P : T, then either P is an answer or else there exists p i s.t. P
5 5.1
Garbage
; P~ and F- p t : T.
Collection
Overview
Let us begin with viewing the type inference GC scheme on the analogy of the trace-based GC scheme. Both schemes maintain a live set, which is memory regarded as live during the GC, A live set consists of the current stack, and a part of the current heap, called "to-heap". Then these schemes can be seen as a fixpoint algorithm to find a live set that satisfies the following conditions: the live set contains the stack and all objects accessible from the live set. Concretely, the algorithm begins with a live set consisting of the stack; it then repeatedly finds an object that is accessible from the live set and that is not yet in the live set, and adds the object to the live set; the algorithm terminates when it cannot find such an object. While the trace-based GC uses "refer-to" accessibility, the type inference GC uses a type-based accessibility, as mentioned in Section 2. To obtain an accessible object, we maintain a G C typing during GC. A GC typing gives a "typing for a live set", that is, it tells the following information: - how the stack accesses addresses - how each object in the to-heap accesses addresses An accessible object can be obtained by picking up an object whose address is given a concrete type in the GC typing. When finding an accessible object, the GC adds it to the live set. At this point, the GC typing must be updated so that it also tells how the added object accesses addresses. It is precisely for this purpose that we perform GC-time unification. In the unification, we extract the annotation typing given in the object, and unify the GC typing with the annotation typing. We call this action "tracing", which is a reminiscence of the trace-based GC. An interesting point is that the GC typing precisely corresponds to the types of pointers kept at addresses that are mentioned in Introduction. The rest of this section formally describes the above algorithm of type inference GC and proves its correctness. Section 5.2 describes our treatment of polytypes in the GC-time unification. In Section 5.3, we give a function for tracing an object or a stack, which involves unification between a GC typing and an annotation typing. Section 5.4 presents the main loop of the fixpoint algorithm.
226
5.2
P o l y t y p e Unification
It seems inevitable that the GC-time unification encounters polytypes that are not trivially related. For example, consider the program let g : Vtzt2.t2 -+ (t~ -+ tl) --+ tz = )~x.Ay.(P2'r~)(y x) in let z = (g I length)
in (P~'T~)length where Fl(length) = Vt3.t3 list --+ int and F 2 ( y ) = t2 ~ tl. After the application (g I length), the program will be
letrec {alength = (({},...))Vt3"t311st-~int} in (({..., length = ale,gth, Z = # } , (rl'~')length));
(({... ,y =
alength}, (F2,T2) (y
X)))
where alength is the address where the function length allocated. Because the variable length in the first frame and the variable y in the second frame share the same address alength, GC would have to somehow unify their types Fl(length) = Vt3.t3 list -+ int and F2(y) = t2 -~ tz. Since considering unification between arbitrary two polytypes seems difficult, we develop a unification method that works specially in our framework. First of all, the goal of unification is to obtain a GC typing (A, T) under which a newly added object or frame is well-typed. In the above case, when we trace the first frame, the GC typing should satisfy A(alength) "~ F1 (length), and when we trace the second frame, the GC typing should satisfy A(a,e,gth) -<< F2(y). It turns out technically convenient to initially give a GC typing that maps every address into the least instantiated type of the declaration type on it. In the above case, since the declaration o n alength is Vt3.t 3 list --+ int, we initially give A(alength) : Vt3.t3 list --~ t4, where t4 is a fresh type variable. Our unification method is rather simple. Suppose we have as inputs a type A(a) = az and a type F(x) = as. We first instantiate all quantified type variables in the types az and a2 with fresh type variables, and then unify the obtained monotypes. As a result, we obtain a type substitution S and update the GC typing as SA(a) = Saz. In the above example, we will obtain for the first frame a type substitution Sz = {t4 ~ int}, and update the GC typing as SiA(ajength) = Vt3.ta list --+ int. For the second frame, we will obtain a type substitution $2 = {tl ~ t~ list, t2 ~-~ int}, and update the GC typing as $2S1 A(alength) = Vt3.t 3 list-4 int (actually unchanged). To roughly explain why this method works, we have the following invariant at any time in the GC. For any address a, we have its "actual type" F0(a) that satisfies A(a) -< Fo(a), and Fl(X) -~ F0(a) for all annotation typings (F1,T1). The first condition is ensured by our initial GC typing and unification method given above, and the second condition is ensured by our type system. From these conditions, our unification can be shown to succeed with SA(a) > SFz (x), which implies the goal condition of unification: SA(a) -<
227 The following function PolyUni formally specifies our unification method where we generalize the above method to take typings as inputs:
PolyUni((rl, T1), (r2, f2)) = let E = {Unique(Fl(X)) = Unique(_P2(x)) I x E Dom(_P1)} U { f l = f2} in S = Unify(E) The function Unify is the well-known unification algorithm to compute the most general unifier of a set E of unifiable equations of monotypes 14. Unique instantiates all quantified type variables in a polytype as fresh type variables, defined as Unique(Vt.7-) = $7 where S(t) = t' (t 6 {, and t' is a fresh type variable). The following lemma summarizes the above-mentioned properties for PolyUni. L e m m a 5.1. Suppose F T V (-r'I , T1)NFTV ( F2, TZ) = 0 and Dom(F1) = D o m ( F 2 ) /f ~I,T1) -"~S1 (/r~0,T0), ('F2, T2) -~$2 (FO,TO), then PolyUni((F1,T1),(F2,T2)) s u c ceeds to produce S such that SF1 > SFz, ST1 = ST2, and S 1 = S' o(SIFTV(FI,rl)) for some S'. 5.3
Trace Function
When tracing an object or stack, we use the function Trace defined below. Trace takes a GC typing (A, T) and an object, frame or stack, and unifies the GC typing and the annotation typing in the object, frame or stack, computing a type substitution S. As a result, the object, frame or stack will be well-typed under the new GC typing (SA, ST):
Trace((A, T), ((V, (r"'>e))) = let S = PolyUni((A o VIFv(e), r), (FlFv(e) , r')) where F T V ( A, T) N F T V ( F , T') = 0 in
SIFTV(A,.r)
Trace((A, T), F M ; C) = let a' = LIT(a) S = Trace((A ~ { # : a'}, T), F) S' = Trace((SA, Unq(Sa')), C) in S' o S
The function LIT computes the least instantiated type of a polytype, as defined by LIT(Vt.T) = V{.LIT'(t, T) where
UT'({,~) =
t' ({N F T V ( T ) = O,t' is fresh) UT'(g, T1) -+ UT'(<~2) (~ = rl ~ r~) T (otherwise).
liT replaces every subterm in a polytype that contains only unquantified type variables with a fresh type variable. For example, klT(Vt.(int -+ int) --+ (t --+ t)) = Vt.t' -+ (t --+ t) where t' is a fresh type variable. In particular, if a is a monotype, liT(a) is simply a fresh type variable. For an object or frame, Trace computes a type substitution that unifies the GC typing (A, 7-) and the annotation typing (F, 7-') given to the value or code e. To explain more specifically,
228 - for each (local or free) variable x in the environment V (that is free in e,) Trace unifies the type of x given in the annotation typing (F(x)) and the type of the address assigned to x that is given in the GC typing (A(V(x))); - Trace unifies the result type of the annotation typing (T') and the result type of the GC typing (v). For a stack, Trace scans the frames in the stack in a bottom-up manner, unifying the GC typing and the annotation typing of each frame. A technical note is that when tracing a frame F , we pass a GC typing that gives # a type a', which is LIT of the declaration type a attached to F. In the tracing of F , a' is expected to be unified as Sa' with the type given to a variable assigned # in the annotation typing of F. When tracing the rest stack C, we pass a GC typing that has result type Unq(Sa'). This type is expected to be unified with the result type of the annotation typing of the next frame. We can prove the following lemma about the Trace function. L e m m a 5.2. If Fo t- C : TO and (A, T) -< (To, 7o), then S = Trace((A, T), C)
and S A ~- C : ST and So = $1 o S for some $1. 5.4
Main Loop
We describe the main part of our GC algorithm. Suppose that the GC is triggered just when a program is evaluated to P = letrec Ho in C with F- P : TO. For simplifying discussion, we consider only the case TO = int. For initialization of the GC, we first set the GC typing as (Z~0,T0) where 5
Ao(a ) = LIT(a) (V(a = k Iv) E H). We then trace the stack C using the function Trace and update the GC typing by the resulting type substitution S:
(Amit, Tinit) = (SAo, STo) where S = Trace((A0, TO), C) The rest part of our algorithm is described in terms of GC states in the form (Hf, Ht, (A, T)) where H I and Ht are heaps respectively called "from-heap" and "to-heap", and (A, v) is a GC typing. In the terminology used in Section5.1, a live set corresponds to the to-heap Ht plus the stack C. We initialize the GC state as (Ho, O, (Ai,~it, Tinit)). At this point, the live set contains only the stack C. Then the GC iterates steps described by a rewriting relation ~ over GC states, given by the following rule:
229 The GC first finds an object h that is allocated at an address a in the fromheap / I f , i.e., not yet in the live set, and that is not given a type variable in A, i.e., accessible from the live set. The GC then moves the object h to the to-heap Ht, i.e., adds the object to the live set. Next, the GC traces the object h using the Trace function and updates the GC typing CA, T) by the resulting type substitution S as (SA, ST). The GC proceeds until all addresses i n / I f are given type variables in the GC typing. The final Ht will be used as a heap for the remaining computation. We can prove correctness of the GC algorithm as the following theorem that the algorithm terminates and finds semantic garbage. Theorem
5.1 ( G C C o r r e c t n e s s ) . Let 0 ~- H0 : T'o, Fo i- C : int and Aini~ be
as defined above. Then, 1. (Ho, 0, (Ainit, int)) :===~* (Hf, Ht, (A, int)), or some HI, H~, A; and 2. H I is semantic garbage or letrec H0 in C.
6
Preliminary
Experiment
This section presents results of our preliminary experiment and gives a discussion on them. In this experiment, we focus on how much garbage the type inference GC scheme can collect in comparison with conventional trace-based GCs. We did not experiment on costs of the type inference GC scheme. We will remark on it in Section 7. Our prototype system consists of an interpreter of our language and both a type inference GC and a trace-based GC. All of these are implemented in Standard ML. We use as applications a recursive fibonacci function of 10 (fib10), a quick sort program for a list of length 10 (qsl0), a merge sort program for a list of length 10 (msl0), a 4-queens program (queen4), a program of the next permutation problem for a list of length 4 (nextperm4), and a simple register allocation program for a series of instructions of length 15. For each application, we measure the live memory size by each GC (i.e., the total size of objects that the GC retains as live) every several steps. The size of each object is calculated as follows: 0 for an integer (since it is usually unboxed6); 2 for a cons cell (a tag and a pointer to a pair); the number of elements for a tuple; and the number of the free variables plus one for code pointer for a closure. The first graph in Figure 4 indicates the number of steps where the type inference GC has gain more than zero. Gain means difference of live memory size by the type inference GC and by the trace-based GC. We have no gain for fibl0 and qsl0, and very little for queen4 and msl0, while we have some gain for nextperm4 and regl5. The second graph in the figure compares the progresses of live memory size by the two GCs during the execution of nextperm4. The third graph shows a similar comparison for regl5. In the second graph, the type To be more precise, our interpreter allocates integers in heaps. However, since integers axe usually represented as unboxed values in real implementations, we consider that these should be treated as zero-size objects when we measure live memory size.
230 inference GC collects some objects earlier than the trace-based GC does. In the third graph, the type inference GC collects dramatically (42% at the best) in the last quarter period. We observe two kinds of program structure in the applications that gained some. In one structure, a function takes a tuple as its multiple arguments and dereferences the tuple every time one of the arguments is necessary. Since the tuple is reachable, the trace-based GC cannot collect unnecessary elements of the tuple, but the type inference GC can. Another structure was observed particularly in the register allocation program. This program consists of three phases. It first computes live variables at each instruction and keeps this information in a data structure representing the instruction. Next, the program obtains register assignments for variables, and it finally generates instructions using this information. In the final phase, the information of live variables is never used but kept in reachable data structures. Therefore the trace-based GC cannot collect this data, but the type inference GC can. Generalizing the observation, we expect that the type inference GC scheme is beneficial particularly for programs that possess what we call phase structure. A program possessing this structure has multiple phases and maintains some d a t a structures during the phases. On some phase, the program keeps temporary data in specific fields of the data structures, and it never uses the data after another phase. The two benefiting programs in our experiment are typical examples. To give another example, a compiler program may keep inferred type information in specific fields of data structures representing nodes of parse tree, and never uses the information after some phase. It is interesting how these results are sensitive to other factors. If we increase the size of input, we expect that gain will become large because the size of the useless temporary data tends to depend on the input size. For another factor, if we test the live memory size much less frequently (with intervals larger than one phase), we will see in the graph gain around the average gain over the whole execution. 7
Remark
on Cost
Despite that the type inference GC aims to save memory, this scheme itself requires extra memory. First, we will need one pointer field per object for implementing defer-list, that is, a linked-list to keep objects whose traversal are suspended. Second, extra memory is necessary for the GC-time unification. 7 Not all of this memory can be allocated at compile time. The unification assumes that the given GC typing and the given annotation typing have no type variables in common (see Section 5.3). Therefore we should copy the annotation typing for each object where the contained type variables are renamed as fresh type 7 The unification may be implemented in a destructive way. That is, we may represent type variables as pointers (initially null), and when the type variables are instantiated to be some types, the pointers are assigned the types.
231
Fig. 4. Experimental results
232 variables. This allocation can be done either at run time or at GC time. Another allocation that cannot be done at compile time occurs in generating fresh type variables in the polytype unification described in Section 5.2. It seems inevitable to be done at GC time. Although it is usually said that no memory should be allocated at GC time, we believe that this scheme should allocate the extra memory (especially the second one) at GC time rather than at run time. It is because the extra memory seems to be so large that if it is allocated at run time, it would exceed the memory saving by this GC scheme. On the other hand, if the extra memory is allocated at GC time, allocated space may be momentarily high, but it is completely useless after the GC finishes and therefore can be released. It would be acceptable if the GC is rather infrequent. We can even use the conventional GC for the most time and the type inference GC very infrequently. We believe that allocating at GC time is not prohibitive in most platforms, because even if the run-time system decides to trigger GC, free memory can be expected available from the underlying operating system. (If it turned out that free memory is actually unavailable, we could safely switch to the conventional GC.) We may theoretically estimate space cost of type inference GC. In the worst case, as in the ML type inference, the size of type expression is exponential on the program size (more precisely, the number of nesting of let expressions), so is the space requirement of the GC. However, in practice, type expressions in the program will be within a reasonable size. Then, the space requirement is reasonably linear on the number of live objects.
8
Related
Work
Some efforts have been made to resolve some memory leaks in ways specific to their target languages. Wadler 17 proposed a GC technique that makes use of an execution scheme of a lazy functional language. The GC detects the run time representation of the form fst (x, y) and reduces it as x; thus collecting unused tuple elements. We believe that the type inference GC can collect such objects in most cases. Type inference GC originates in researches of tag-free GC for polymorphictyped languages. The most important motivation in this area is to eliminate tags indicating pointer or non-pointer, which incur overheads either at every arithmetic or at every pointer operation. In our description, this kind of tags is not necessary. Annotation typings attached to values of objects do not correspond to them and do not involve run time overheads. Several researchers have studied type inference GC. Goldberg and Gloger 5 discovered that semantic garbage can be detected by dynamic type inference in their research of tag-free GC. (Strictly speaking, we may not regard their scheme as type inference GC because their original goal was to traverse all reachable objects. Because of the goal, they use more unification than necessary.) Because their description and correctness argument of the GC algorithm was informal, its soundness was not clear. Fradet 4 also presented a formulation of type infer-
233 ence GC and proof of its correctness using Reynolds' abstraction/parametricity theorem 13. However, his framework was so abstract that there still remains a gap between models and implementation. Specifically, his formulation did not treat sharing, did not specify precise conditions of static type information, and did not prove that GC-time unification succeeds. Morrisett et al. 12 gave an argument on type inference GC. However, they specified conditions of a heap that a GC should find, but did not give an algorithm. Neither Fradet 4 nor Morrisett et al. 12 have dealt with polymorphism. Several papers have proposed another scheme to realize a tag-free GC in a polymorphically typed language 1, 16, 11. In this scheme, a program is transformed into a second-order program where polymorphic functions are passed as extra arguments actual types for type variables in polytypes. Using these types, GC reconstructs types of all reachable objects and traverses all these objects. While this scheme does not collect any reachable garbage, these papers reported that it can be implemented in reasonably low cost. Therefore there would be trade-off between memory saving and speed, possibly depending on applications. This should be investigated through experiments in the future. Many techniques have been proposed to statically estimate life times of objects 2, 8, 6, 15, 3. Techniques using sharing analysis 2, 8, 6 cannot collect more garbage than type inference GCs can since they use reachability-based liveness of objects. Techniques using region inference 15, 3 can collect some semantic garbage. Since we have not studied to the extent to make a precise comparison between their scheme and our scheme, we will below just give examples and show that objects collected by one scheme may not be collected by the other. In an expression fst (el, e2), our scheme can collect the second element of the tuple just after the tuple is allocated, while their scheme cannot. On the other hand, in an expression
(let y = (1, 2) in Ax.(fst (x, Aw.(fst y))) end) e their scheme can collect (1, 2) just after the evaluation of the let expression, while our scheme cannot. 9
Conclusion
and Future
Work
This paper has given a formal description of type inference GC with fully proved correctness. Our framework is sufficiently detailed yet reasonably abstract. Therefore it will serve as a specification for implementing a type inference GC and as a foundation for discussing and extending the GC scheme. We conjecture, in addition to soundness, optimality in the sense that the described algorithm collects more garbage than any other algorithm using the same type system does. We have already proved the optimality for a monomorphic type system by requiring principality of annotation typings. We could do it for polymorphic type systems, but it may be more complicated.
234 From our experimental results, we expect t h a t type inference G C benefits particularly for programs with the phase structure. In order to claim more, we need to implementation the G C scheme in more low-level language, on some existing ML system. T h e n we should measure space and time cost of the GC scheme. We should also investigate larger applications for which the GC scheme is effective. We expect t h a t one such application would be a compiler p r o g r a m t h a t would contain m a n y phase structures. Even for non-benefiting applications, the GC should be at least not worse. Optimization techniques to reduce e x t r a m e m o r y at GC-time by using statically available information as much as possible would be effective. We could even use a conventional GC or a type-passing tagfree GC for the most time, and the type inference G C very infrequently.
Acknowledgments We express our warmest thanks to Kenjiro T a u r a for encouraging us in this research and for giving us a precious advice for clarifying the motivation of our work. Comments from Benjamin Pierce, Naoki Kobayashi, Atsushi Igarashi and the T I C referees were very helpful in improving the presentation of the paper. We also give thanks for m a n y discussions to m e m b e r s of Yonezawa's Group and members of P r o g r a m m i n g Language Seminor in Indiana University.
References 1. S. Aditya, C. Flood, and J. Hicks. Garbage collection for strongly-typed languages using run-time type reconstruction. In Proceedings of Conference on LISP and Functional Programming, pages 12-23, 1994. 2. H. G. Baker. Unify and conquer (garbage, updating, aliasing, ...) in functional languages. In Proceedings of Conference on LISP and Functional Programming, pages 218-226, 1990. 3. L. Birkedal, M. Tofte, and M. Vejlstrup. From Region Inferrence to von Neumann Machines via Region Representation. In Conference record of Symposium on Principles of Programming Languages, pages 171-183, 1996. 4. P. Fradet. Collecting more garbage. In Proceedings of Conference on LISP and Functional Programming, pages 24-33, 1994. 5. B. Goldberg and M. Glogar. Polymorphic type reconstruction for garbage collection without tags. In Proceedings of Conference on LISP and Functional Programming, pages 53-65, 1992. 6. K. Inoue, H. Seki, and H. Yagi. Analysis of functional programs to detect runtime garbage cells. ACM Transactions on Programming, Languages and Systems, 10(4):555-579, 1988. 7. R. Jones. Tail recursion without space leaks. Journal of Functional Programming, 2(1):73-79, 1992. 8. S.B. Jones and D. L. M&ayer. Compile-time garbage collection by sharing analysis. In Conference Proceedings of Functional Programming Languages and Computer Architecture, pages 54-74, Imperial College, London, September 1989. 9. S. L. P. Jones. The Implementation of Functional Programming Languages. Prentice-Hall, 1987.
235 10. R. Milner. A theory of type polymorphism in programming. Journal of Computer and System Sciences, 17:348-185, 1978. 11. G. Morrisett. Compiling with Types. PhD thesis, School of Computer Science Carnegie Mellon University, 1995. 12. G. Morrisett, M. Felleisen, and R. Harper. Abstract models of memory management. In Proceedings of Functional Programming Languages and Computer Architecture, pages 66-76, 1995. 13. J. Reynolds. Types, abstraction, and parametric polymorphism. In Information Processing, volume 83, pages 513-523, 1983. 14. J. A. Robinson. A machine-oriented logic based on the resolution principle. Journal of ACM, 12, 1965. 15. M. Torte and J.-P. Talpin. Implementation of the Typed Call-by-Value A-calculus using a Stack of Regions. In Conference record of Symposium on Principles of Programming Languages, pages 188-201, 1994. 16. A. Tolmach. Tag-free garbage collection using explicit type parameters. In Proceedings of Conference on LISP and Functional Programming, pages 1-11, 1994. 17. P. Wadler. Fixing some space leaks with a garbage collector. Software Practice and Experience, 17(9):595-608, September 1987. Appendix A
Proof
Lemma
Proo. Lemma
Proo.
of Theorem
4.1
A . 1 . If (F,T) +< (F',T') and F F-
If t- P : r and P
>P', then F- P' : r.
B y induction on the derivation of P ---+ P ' . We have the following four
cases. (stack) P = letrec H in F ~ ; C ~ letrec H ' in F M ; C ' = P ' with P1 = letrec H in C > letrec H ' in C ' = P~. F r o m t- P : r , I- P1 : r ' . By the induction hypothesis, I- P~ : r ' , which implies 0 I- H ' : F ' a n d F ' I- C ' : v with F ' _~ F. Using L e m m a A.1, we can derive I- P ' : r ' . ( a l l o c ) P = letrec H in ((V,c)) with c = ( r ' , r ' ) l e t x : a = v in d. Let a = Vt.T". F r o m F- P : % 1"o F- ((V,c)) : r, which implies for some Fo, ro o V > V
(1)
r)
(2)
( F ' , r ' ) -<s (F, F'F(letx:a=vinc'):r'
(3)
We can a s s u m e t h a t Dom(S) N t = 0 by an a p p r o p r i a t e r e n a m i n g of t in a. Therefore Unq(Sa) = S t " . F r o m (2), "<s (F, S r " ) a n d from (3), F ' }- v : r " . A p p l y i n g L e m m a A.1 to these, we conclude t h a t F F- v : S r " , which implies F0 F- ((V, v)) : S t " t o g e t h e r with (1). We also know 0 F- H a :
236 F0 from F- P : T. Easily, ~ F- H t~ {a = ((V, v)) ~l } : Fo ~ {a : Sa}. Therefore it is sufficient to show Fo t~ {a : Sa} F- ((V ~ {x = a}, c')) : T. It follows from L e m m a A.1 applied to F'~J{x: a} I- c': r' by (3), (Fot~{a : Sa})o(V~J{x = a}) >_ F ~ { x : Sa} by (1), and (F't~ { x : a } , T ' ) -<s ( F ~ { x : Sa},~-) by (2). ( a p p ) P = letrec H a in ((V,c)) with c = ( r , ~ ) l e t w : a = (x y) in c'. Let a = VE.T". From F- P : T, Fo F- ((V, c)) : T, which implies for some Fo,
Fo o V > F (r', T') -~s (F, ~)
(4) (5)
F' F- (let w : a = (x y) in c'): T'
(6)
We can assume t h a t Dom(S) N t = 0 by an a p p r o p r i a t e renaming of t in a. Therefore Unq(Sa) = ST". We obtain (Fo ~ { # : Sa}) o (Y ~ {w = # } ) > F ~ {w : Sa} from (4), ( P ' ~ { w : rr},T') -~S
ro ~ { # : s~} ~ ((y ~ {w = #}, ~')): ~. T h e n it is sufficient to show Fo F- ((V"~J {z = V(y)},c')) : ST". Let H(Y(x)) = ((Y',(r",T~-~2)Az.c')), a~ = Fo(Y(x)) and ay = Fo(Y(y)). Since 0 F- H a : Fo from }- P : T, we have Fo I- H(Y(x)) : Unq(a~), which implies Fo o V " _> F " (Fit,T1 ~ T2) ~S' ( F ' " , U n q ( a ~ ) ) F " ~ {z : T1} t- C" : T2
(7)
(8) (9)
From (6), we have F'(x) > T~' -+ T" and F'(y) > v~' for some T ' . Therefore, together with (4) and (5), we obtain ax = Fo(Y(x)) > F(x) = SF'(x) > S(T~' -+ r " ) a n d ay = ro(Y(y)) >_ F(y) = SF'(y) > STy'. From ax > S(T~' --+ T") and (8), we have S"S'(T1 -+ TZ) = S(T~' --~ T") where FTV(F'",T1 -+ T2) N Dom(S") = 0. Therefore S"S'T1 = STy', implying ay > S'S~Vl, and S'S~T2 = ST". Consequently, we have ( F o o Y " ) t ~ { z : ay} > F ' " ~ { z : S"S'T1} and (F"~{Z:T1},'r2) -~s, (F"' ~ {z : S'T1}, S'T2) ~s,' (F'" ~J {z : S"S'T1}, ST"). Applying L e m m a A.1 to these and (9), we conclude t h a t Fo F- ((Y" ~J {z = Y(y)}, c")) : ST". (ret) P = ietrecH~inF~;C with F = ((V~{w=#},c)) and C = ((W,(F"T')x)). From F- P : T, Fo F- F~;C : T for some Fo, which implies Fo ~ { # : a'} t- F : T
(10)
Po F- C : T" a' = Vt.T" and t N FTV(Fo) = 0
(11) (12)
From (11), F0 o V' > F and (F',T') -~s (F,T") and F ' F- x : T', for some F and S. Therefore we can derive Fo(Y'(x)) > F(x) = SF'(x) > ST' = T".
237 From (12), Fo(V'(x)) >_ a'. Then, from (10), we have (F0 o V) t~ {w : a'} > F " ~ {w: a"} for some a". Thus (F0 o V) ~ {w: Fo(Y'(x))} > F " ~ {w: a"}. We also know from (10), F " ~ {w : a " } I- c : T. Hence, we conclude Fo I-
((v ~ {w: v'(x)}, e)):
~-.
T h e o r e m A.1 ( T y p e S o u n d n e s s ) . If t- P : T, then either P is an answer or
else there exists P' s.t. P
>P' and t- P' : T.
Proof. We can know that either P is an answer or else there exists P ' s.t. P P ' by induction on the structure of the stack of P with case analysis on its top frame. Then Lemma A.2 is sufficient to prove this theorem.
B
P r o o f of T h e o r e m 5.1
L e m m a B.1 ( U n i f i c a t i o n ) . There exists an algorithm Unify s.t. Unify(E) computes the most general unifier orE, for any set E of unifiable equations of monotypes. L e m m a B.2. Suppose FTV(F1,T1) n FTV(F2,T2) : 0 and Dora(F1) = Dora(F2). If -~$2 (F0,T0), then PolyUni((F~,T~),(F2,T2)) succeeds to produce S such that SF~ >_ SF2, Svl = Sv2, and St = S' o (SIFTV(r~,r~)) for some S'.
Proof. For x 9 Dora(F1), let Fi(x) = V~).7~ z) (i -- 0, 1, 2) and Unique(Fi(x)) = S~Z)T~z) where Dom(S~ z)) = ~ ) (i = 1,2). We assume ~z)Nt-(0z') = 0 if x ~ x'. From (F2,v2) ~ s 2 (Fo,To), there is S~z) s.t. Dom(S~ z)) = t-(0z) and S (3x ) TO (z) c _(z) From (F1,T1) .<S, (F0,T0), T0(z) - S1T~ x), implying S(Z) (z) c(z)~ _(z) 9 Therefore, defining $3 = ~JzEDom(F1)S~ x)S1S2, we 3 T6 = o 3 oa,1 conclude that $3 unifies E. Hence Unify(E) succeeds by Lemma B.1. By Lemma B.1, we have Dora(S) = FTV(S~Z)T~z), "~(z)~2T2(z),T1,T2), oo(z) (z) = ~ 2( z ) T2(z) , ST1 = ST2. Since SFI(X) > ,oo(z) o~ I Ti(z) and t~,(~) N F T V ( S F I ( X ) ) = ~(~),= 0 where ,-,(~) ~2 ~2 ~2(z) , we can derive w,-,(z) . co(z)_Cz) o/w,-'(z) .~= c(~)_(~)~ Srl(X) > ~ ~2 ,2 = ~ 2 -,2 ~ = s r 2 ( x ) . Since we already know o~ o(z) Vi(z) c~'3o2 c(z)_(x) '2 and S3T1 $3T2, by Lemma B.1, we conclude that there is S " s.t. $3 -- S " o S. From :
S3FTV(F1,T,)
:
: S1, S1 -- S It 0 SIFTV(FI,~.I).
L e m m a B.3. If Fo }- C : To, (A,T) -<So (Fo,Vo) and (Dam(So) C F T V ( A , v ) ) , then S = Trace((A, T), C), and S A }- C : S v and So = S1 0 S for some SI.
Proof. We first show that the lemma holds for C = < -~ ( S A o V, ST) and S 0 --- S 1 0 (SIFTV(A,T)) for some Sx. By Lemma A.1, S A t- C : Sv.
238 We then show the l e m m a by induction on the structure of C. C is either F or Fa; C'. We have already shown the former case above. For the latter case, from Fo t- C : TO, F0 ~ { # : a t} t- F : TO with a • a'. Let a " = LIT(a). Then a " -< a'. By applying our first argument, we conclude S ( A ~ { # : a ' } ) ~- F : ST and So = $1 o S for some $1. From Fo F- C : TO, 1"0 ~- C' : T' with a ' = V/.T' and tf~ FTV(Po) = 0. Since a " -< a ' , we can let a " = V/.T'. From Sot = t, ST" -<sl T'. Thus, we obtain (SA, ST") -<S~ (SFo, T') Cwith Dora(S1) C_ Dora(So)t_J FTV(Rng(S)) C_ F T V ( S A , ST")). Applying the induction hypothesis, we conclude S " A ~- C' : S"T" with S " = S' o S, and S1 = $2 o S I for some $2. Therefore So = S l O S = $2 o S ' . Further, S2S"t = Sot = t. Let t' = S"t. If there is t e ~ fq F T V ( S " A ) , we obtain S2t 9 Set' = t and S2t 9 F T V ( S 2 S " A ) = FTV(Fo), which contradicts with t fq FTV(Fo) = 0. Therefore t' fq F T V ( S ' A ) = 0. Consequently, since we have S"A ~ ( # : Vt'.S"T"} ~- F : S"v obtained from S ( A W { # : a " } ) ~- F : ST, S " A t- C t : S ' T ' , and a • VfI.S'T ", we conclude t h a t S ' A ~- FIll; C' : S ' T . The following well-formedness is a basic property of G C states. D e f i n i t i o n B . 1 ( W e l l - f o r m e d n e s s ) . Let 0 t- Ho : Fo and Fo F- C : ~-0. (HI,Ht, CA, r)) is well-formed w.r.t, letrec Ho in C and CFo,7"o) iff
1. Ho = H I t~ Ht 2. A F - C : T 3. A ~- gt(a) : Unq(A(a)) for z 9 Dom(Ht).
4. (A, T> -< (to, TO> A GC state always proceeds by ~ preserving the well-formedness as long as there remain live objects in the from-heap. L e m m a B . 4 ( G C P r o g r e s s ) . / f (HI,Ht,(A,T)) is well-formed w.r.t. P and
(Fo, TO), then 1. A(a) 9 T Y a r for any a 9 Dora(HI); or else 2. (HI,Ht, CA,T>) ==r (H'f,H~, ( A ' , T ' ) ) for some Hi, H~, A 1 and r', and (H'I,H't,
2. S A t- C : ST from A ~- C : T in the well-formedness. 3. We have A l- Ht(a) : Unq(A(y)) for y 9 Dom(Ht) from the well-formedness and we already know S A ~- h : ST". Thus, noticing Darn(S) C Dam(So) C_ F T V ( A , T), we conclude S A i- H ; ( a ) : Unq(SA(y)) for y 9 nom(H~). 4. From So = S10 S, we obtain CSA, ST) -<s~ CPo,TO).
239 We then introduce a notion of isolation, which is defined as a well-formed GC state with the objects in H I having types of type variables. (We call H / t h e isolated heap of Ho.) Definition B.2 (Isolation).
(Hf,Ht,(A,r))
is an isolation w.r.t. P and
(to,To) iff 1. ( H / , H t , (A,T)) is well-formed w.r.t. P and (Fo,To) 2. For any a E Dora(Hi), A(a) E T V a r . Once a heap is isolated for a program, its evaluation preserves the heap to be isolated. If ( H I , H t, (A,T)) is an isolation ~ P', then there is an isolation (HI, H~, ( A', TI )
Lemma B.5 (Isolation Preservation).
w.r.t. P and (1"o, TO), and P w.r.t. P' and (F~, To).
Proof. Analogous to the proof of Lemma A.2. L e m m a B . 6 . ff ( H / , H t , (A, int)) is an isolation w.r.t. P and (F0,int), then P
does not accesses any x E D o m ( H f ). Proof. Suppose that P accesses x and x E Dora(Hi). Then, P is either letrec H0 in (V, (r'~)z) or letrec H0 in F M ; . . . ; (V, (r'~)let w : a = (z y) in c) where V ( z ) = x. In the former case, we have (F, T) "~S (A o V, int). Since we know from x E D o m ( H f ) that A ( V ( z ) ) is a type variable t, F ( z ) must also be a type variable t' with S(t') = t. Since we can derive F ~- z : T, we have t' > T, implying t' = T. However, it contradicts with S(T) = int. In the latter case, we have (F,T) -~s (A o V, ST), A ( Y ( z ) ) = t and F ( z ) = t' s.t. S(t') = t. Since we can derive F ~- z : T1 --+ T2 for some T1, T2, we have t' > 71 --+ T2 and it is impossible. Using these lemmas, we can show that an isolated heap is semantic garbage.
(HI, Ht, (A, int)) is an isolation w.r.t. P and (1"o, int), then H I is semantic garbage for P. T h e o r e m B.1 ( I s o l a t i o n G a r b a g e ) . / f
Proof. This follows from Lemma B.5 and Lemma B.6. The proof of correctness of GC completes by the following theorem that our algorithm finds an isolation. C o r r e c t n e s s ) . Let 0 I- H0 : Fo, F0 I- C : int and Ai,~it be as defined in Section 5.4. Then, there exists (HI, Ht, (A, int)) such that (Ho, O, (Ainit, int)) ==~* (HI, Ht, (A, int)) and (HI, Ht, (,4, int)) is an isolation w.r.t, letrec Ho in C and (Io,int).
Theorem B.2 (GC
Proof. By Theorem B.4, it suffices to show that (Ho, 0, (Ainit, int)) is well-formed w.r.t, letrec Ho in C and (Fo, int). Let "4~) be as defined above. From 0 I- Ho : Fo, we have ('4~, int) -< (Fo, int). By Lemma B.3, we can obtain S A I- C : ST and ('4init, int) -~ (Fo, int), which are sufficient for the well-formedness.
Strong Normalization by Type-Directed Partial Evaluation and R u n - T i m e Code Generation Vincent Balat I and Olivier Danvy 2 1 Ddpartement d'Informatique, t~cole Normale Supdrieure de Cachan 61, avenue du Prdsident Wilson, F-94230 Cachan Cedex, France. E-mall: b a l a t ~ r i p , ens-cachan, f r 2 BRICS
Department of Computer Science, University of Aaxhus Building 540, Ny Munkegade, DK-8000 Aarhus C, Denmark E-mail: danvy@brics, dk
Abstract. We investigate the synergy between type-directed partial evaluation and run-time code generation for the Carol dialect of ML. Type-directed partial evaluation maps simply typed, closed Caml values to a representation of their long f~y-normal form. Caml uses a virtual machine and has the capability to load byte code at run time. Representing the long /~?-normal forms as byte code gives us the ability to strongly normalize higher-order values (i.e., weak head normal forms in ML), to compile the resulting strong normal forms into byte code, and to load this byte code all in one go, at run time. We conclude this note with a preview of our current work on scaling up strong normalization by run-time code generation to the Caml module language.
1 1.1
Introduction Motivation
Strong normalization: Suppose one is given a strongly normalizable (closed) Aterm. How does one normalize this term? Typically one parses it into an abstractsyntax tree, one writes a strong normalizer over abstract-syntax trees, and one translates (unparses) the resulting normal form into whichever desired format (e.g., I~TEX). A solution in ML: ML, like all functional languages, provides a convenient format for representing A-terms: as an ML expression. Suppose thus that we are given a strongly normalizable ML expression. How do we normalize it? Type-directed partial evaluation 8 offers an efficient alternative to writing a parser to represent this ML expression as an ML d a t a structure representing its abstract-syntax tree, writing a strong normalizer operating over this abstract-syntax tree, and
241 unparsing the resulting normal form into an ML expression. Instead, the ML evaluator maps this ML expression into an ML value, and the type-directed partial evaluator maps this ML value into the abstract-syntax tree of its normal form. We can then either evaluate this abstract-syntax tree (for immediate use) or unparse it (for later use).
Motivation: Type-directed partial evaluation entrusts the underlying programming language with all the mechanisms of binding and substitution that are associated with normalization. Higher-order abstract syntax 24 shares the same motivation, albeit in a Logical Framework instead of in a functional setting. Goal: Type-directed partial evaluation, as it is, maps an ML value into the text of its normal form. We want instead to map it into the corresponding ML value - - and we want to do that in a lighter way than by invoking either an interpreter or the whole compiler, after normalization. An integrated solution in Objective Carol: Objective Caml 22 is a byte-code implementation of a dialect of ML. This suggests us to represent normal forms as byte code, and to load this byte code at run time for both immediate and later use. 1.2
Contribution
We report our experiment of integrating type-directed partial evaluation within Caml, which in effect yields strong normalization by run-time code generation. We list below what we had to do to achieve this integration: we wrote several type-directed partial evaluators in Caml, in various styles and with various properties (listed below); - we wrote a dedicated translator from normal forms to byte code; - this required us to find the necessary (hidden) resources in the Caml implementation and recompile the system to make them available, in effect obtaining a more open implementation. These resources are mainly the representation of types, the representation of byte code, and the ability to load byte code at run time. 1.3
Non-contribution
Even though it is primarily inspired by theory, our work is experimental. Indeed, neither the OCaml compiler nor the OCaml virtual machine are formalized. We therefore have not formalized our byte-code translator either. As for typedirected partial evaluation, only its call-by-name version has been formalized so far 1, 2, 7. In that sense our work is experimental: we want to investigate the synergy between type-directed partial evaluation and run-time code generation for OCaml.
242
module
ChurchNumbers
= struct
let
cz
let
cs n
s z = z
let
rec
let
cn2n
s z = n
s
(s z)
n 2 c n n = if n = 0 t h e n cz else cs n = n
(fun
s ->
i+1)
(n2cn (n-l))
0
end
Fig. 1. Church numbers
1.4
A n example: C h u r c h n u m b e r s
Let us illustrate strong normalization by run-time code generation to optimize a computation over Church numbers, which we define in Figure 1. The module ChurchNumbers defines zero (r the successor function (cs), and two conversion functions to and from ML numbers and Church numbers. For example, we can convert the ML number 5 to a Church number, increment it, and convert the result back to ML as follows: # ChurchNumbers. #
:
int
cn2n(ChurchNumbers,
cs
(ChurchNumbers.n2cn
5)) ; ;
= 6
Thus equipped, let us define the function incrementing its argument with 1000: # let val
csl000
csl000
m
= ChurchNumbers.n2cn
: (('a
->
'a)
->
'a - >
I000 ~b)
->
ChurchNumbers.cs ( ~ a ->
'a)
->
m;; 'a ->
'b
= # ChurchNumbers.cn2n(csl000 #
: int
(ChurchNumbers.cz));;
= 1000
If it were not for ML's weak-normalization strategy, 1000 B-reductions could be realized at definition time. We strongly normalize the value denoted by csl00o by invoking our function nip (for "Normalize In Place") on the name of the identifier csl0O0: # nip #
"csl000";;
: unit
=
O
Now csl000 denotes the strongly normalized value, as reflected by its execution time: applying csl000 to the Church number 0 is 4800 times faster now. Depending on the version of the type-directed partial evaluator, normalization takes between 0.1 and 18 seconds. In this example, cslO00 then needs to be applied between 5 and 1000 times to amortize the cost of normalization.
243 1.5
Overview
The rest of this article is organized as follows. We first review type-directed partial evaluation (Section 2), independently of run-time code generation, and with two simple examples: the Hilbert combinators and Church numbers. We then describe run-time code generation in OCaml (Section 3). Putting them together, we report the measures we have collected (Section 4) and we assess the overall system (Section 5). The Caml implementation of modules suggests a very simple extension of our system to handling both first-order and higherorder modules, and we describe this extension in Section 6. After reviewing related work (Section 7), we conclude.
2
Type-Directed Partial Evaluation
Type-directed partial evaluation strongly normalizes closed values of parametric type, by ~wo-level W-expansion 8,14. Let us take two concrete examples, a simple one first, and a second one with Church numbers. We represent residual lambdaterms with the data type of Figure 2.
type exp = Vat of string I Lam of string * exp App
of exp * exp
Fig. 2. Abstract syntax o f t h e A - c M c u l u s
module type SK_siE
= sig val cS : ('a -> 'b -> 'c) -> ('a -> 'b) -> 'a -> 'c val cK : 'a -> ~b -> 'a end module SK : SK_sig = struct let cS f g x = f x (g x) let cK a b = a end
Fig. 3. Hilbert's Combinatory Logic basis
244 The Hilbert combinators
2.1
As is well-known, the identity combinator I can be defined with the Hilbert combinators S and K . This is often illustrated in ML with the functions cS and cK defined in Figure 3: # let v a l cI
cI
x = S K . c S SK.cK SK.cK x ; ;
: 'a ->
'a = < f u n >
# cI 42;; - : int = 42
It is the point of type-directed partial evaluation t h a t one can visualize the text of cI by two-level w-expansion. In the present case, all we need is to w-expand cI with a dynamic introduction rule (the construction of a residual lambdaabstraction) and a static elimination rule (an ML application): # let e e _ i d f = L a m ( " x " , val e e _ i d : (exp -> exp)
f (Var "x"));; -> e x p = < f u n >
# e e _ i d (SK.cS SK. cK SK.cK) ; ; - : e x p = Laln ("x", V a t "x")
#
where in the definition of ee_id, x is fresh. T h e result of applying ee_id to the M L identity function is its text in normal form.
2.2
Church numbers
Let us play the same game with Church numbers. The type of a Church number is ('a->
'a) ->
'a->
'a
Since it is composed with three arrows, w e need to w-expand it three times. Since the two outer arrows occur positively, w e w-expand a Church n u m b e r cn with two dynamic introduction rules and two static elimination rules: Lam("s", Lam("z", cn (...(Vat "s")...)
(Vat "z")))
where s and z are fresh. Since the inner arrow occurs negatively, we w-expand the corresponding variable s with one static introduction rule (an ML abstraction) and one dynamic elimination rule (the construction of a residual application): fun v -> App(Yar "s",
v)
The result reads as follows: # let e e _ c n
cn
= Lam("s", Lam("z", cn (fun v -> App(Var "s", v)) (Var "z")));; val ee_cn : ((exp -> exp) -> exp -> exp) -> exp = #
245 We are now equipped to visualize the normal form of a Church number, e.g., 2: # e e _ c n (ChurchNumbers.n2cn 2 ) ; ; - : exp = Lam("s", Lam("z", App(Var " s " , #
App(Var " s " , Vat " z " ) ) ) )
The result of applying ee_cn to the ML Church number 2 is the text of this Church number in normal form. 2.3
Summary and conclusion
We have illustrated type-directed partial evaluation in ML with two very simple examples: the Hilbert combinators and Church numbers. We have defined them in ML and we have constructed the text of their normal form, by two-level ~-expansion. Type-directed partial evaluation directly constructs two-level ~-redices, given a representation of the type of the value to normalize. It also handles more types, such as base types (in restricted position), and can interpret function types as having a computational effect (in which case it inserts a residual let expression, using continuations). Figure 4 displays our grammar of admissible types.
(type) ::= (covariant-type) (covariant-type) ::= (base-type)
I variable I (contravariant-type) "->" (covariant-type) I (covariant-type) * . . . * (covariant-type) (contravariant-type) ::= b o o l
I variable I (covariant-type) "->" (contravariant-typeI I (contravariant-type) * . . . * (contravariant-type) (I)ase-type) ::= unit I i n t float
l bool I string
Fig. 4. Abstract syntax of types
We therefore implemented several type-directed partial evaluators: - inserting or not inserting let expressions; and - in a purely functional way, i.e., implementing two-level eta-expansion directly in ML, using Andrzej Filinski and Zhe Yang's strategy, 1 or with an explicit representation of two-level terms as the abstract-syntax tree of an ML expression (which is then compiled). 1 Personal communications to the second author, spring 1995 and spring 1996 27.
246 In the following section, instead of constructing a normal form as an abstractsyntax tree, we construct byte code and load it in place, thereby obtaining the effect of strong normalization by type-directed partial evaluation and run-time code generation.
3
Run-Time
Code
Generation
We therefore have written a translator mapping a term in long fly-normal form into equivalent byte code for the OCaml virtual machine. And we load this byte code and update in place the value we have normalized.
3.1
Generating byte code
We do not generate byte code by calling the Caml compiler on the text of the normal forms. The language of normal forms is a tiny subset of ML, and therefore we represent it with a dedicated abstract syntax. Since normal forms are well typed, we also shortcut the type-checking phase of the compiler. Finally, we choose not to use the resident byte-code generator: instead, we use our own translator from normal forms to byte code.
3.2
Loading byte code
For this we need to access OCaml's byte-code loader, which required us to open its implementation. We have thus added more entry points in some of the modules that are available at the user level (i.e., Caml's toplevel). We have also made several interfaces available, by copying them in the OCaml libraries. We essentially needed access to functions for loading byte code, and access to the current environment and its associated access functions. As a side benefit, our user does not need to specify the type of the value to optimize, since we can retrieve this information in the environment.
3.3
U p d a t i n g in situ
Finally, being given the name of a variable holding a value to optimize, and being able to find its type in the environment, nothing prevents us to update the binding of this variable with the optimized value - - which we do. We illustrated the whole process in Section 1.4, by - defining a variable csl000 denoting 1000 compositions of Church's successor function, and - normalizing it in place with our function nip.
247 4
Applications
We have tested our system with traditional partial-evaluation examples, the biggest of which are definitional interpreters for programming languages. The results are consistent with the traditional results reported in the partial-evaluation literature 20: the user's mileage may vary, depending (in the present case) on how much strong normalization is hindered by ML's weak-normalization strategy. The definitional interpreters we have considered are traditional in partial evaluation: they range from a simple while language 5 to an Algol-like language with subtyping and recursion 16. Our interpreters are written in Caml. Some use continuation-passing style (CPS), and the others direct style. In the definitional interpreters, iteration and recursion are handled with fixed-point operators. All our examples clearly exhibit a speedup after normalization. The specialized version of an interpreter with respect to a program, for example, is typically 2.5 times faster after normalization. On some other examples (e.g., Section 1.4), the residual programs are several thousand times faster t h a n the (unnormalized) source program. The computational resources mobilized by type-directed partial evaluation vary wildly, depending on the source program. For example, specializing a directstyle interpreter with respect to a 10000-lines program takes 45 seconds and requires about 170 runs to be amortized. Specializing a CPS interpreter with respect to a 500-lines program, on the other hand, takes 20 minutes. We believe that this low performance is due to an inefficient handling of CPS in OCaml. Essentially the same implementation takes a handful of seconds in Chez Scheme for a 1000-lines program, with less than 0.5 seconds for type-directed partial evaluation proper, and with a fairly small difference if the interpreter is in direct style or in CPS. We also experimented with the resident OCaml byte-code generator, which is slower by a factor of at least 3 than our dedicated byte-code generator. This difference demonstrates that using a special-purpose byte-code generator for normal forms is a worthwhile optimization. 5
Assessment
Although so far we are its only users, we believe that our system works reasonably well. In fact, we are in the process of writing a users's manual. Our main problem at this point is the same as for any other partial evaluator: speedups are completely problem-dependent. In contrast with most other partial evaluators, however, we can quantify this statement: because (at least in its pure form) type-directed partial evaluation strongly normalizes its argument, we can state that it provides all the (strong) normalization steps that are hindered by ML's weak-normalization strategy. Our secondary problem is efficiency: because OCaml is a byte-code implementation, it is inherently slower than a native code implementation such as
248 Chez Scheme 18, which is our reference implementation. Therefore our benchmarks in OCaml are typically measured in dozens of seconds whereas they are measured in very few seconds in Chez Scheme. 2 Efficiency becomes even more of a problem for the continuation-based version of the type-directed partial evaluator: whereas Chez Scheme represents continuations very efficiently 19, that is not the case at all for OCaml. On the other hand, the continuation-based partial evaluator yields perceptibly better residual programs (e.g., without code duplication because of let insertion).
Caveat: If our system is given a diverging source program, it diverges as well. In that it is resource-unbounded 13, 17.
6
Towards M o d u l a r T y p e - D i r e c t e d Partial Evaluation
In a certain sense, ML's higher-order modules are essentially the simply typed lambda-calculus laid on top of first-order modules ("structures") 23. Looking under the hood, that is precisely how they are implemented. This suggests us to extend our implementation to part of the Caml module language.
Enabling technology: After type-checking, first-order modules ("structures") are handled as tuples and higher-order modules ("functors") are handled as higherorder functions. Besides, enough typing information is held in the environment to be able to reconstruct their type. P u t together, these two observations make it possible for us to reuse most of our existing implementation.
module type BCWK_sig ffi sig val cB : ( ' a - > 'b) -> ( ' c - > 'a) -> ' c - > 'b v a l cC : ( ' a - > 'b - > ' c ) - > 'b - > ' a - > 'c v a l cW : ( ' a -> ' a - > ' b ) - > ' a - > 'b v a l cK : ' a - > 'b-> 'a end module BCWK : BCWK_sig = struct open SK l e t cB f g x " cS (cK c S ) cK f g x let cC f x y = c3 (c5 (cK (cS (cK cS) cK)) cS) (cK cK) f x y let cW f x = cS cS (cK (cS cK cK)) f x let cK = cK end
Fig. 5. A
C o m b i n a t o r y L o g i c b a s i s of r e g u l a r c o m b i n a t o r s
2 For comparison, an interpreter-based and optimized implementation of type-directed partial evaluation in ML consistently performs between 1000 and 10000 times slower than the implementation in Chez Scheme 25. The point here is not byte code vs. native code, but interpreted code vs. compiled code.
249
Achievements and limitations: We handle a subset of the Caml module language, excluding polymorphism and sharing constraints. An example: typed Combinatory Logic. Let us build on the example of Section 2.1. We have located the definition of the Hilbert combinators in a module defining our standard Combinatory Logic basis (see Figure 3). We then define an alternative basis in another module, in terms of the first one (see Figure 5). Because of ML's weak-normalization strategy, using the alternative basis incurs an overhead. We can eliminate this overhead by normalizing in place the alternative basis: # nip_module "BCWK"; ; - : u n i t - - () #
What happens here is that the identifier BCWK denotes a tuple with four entries, each of which we already know how to process. Given the name of this identifier, the implementation 1. 2. 3. 4. 5. 6.
7
locates it in the Caml environment; accesses its type; constructs the simple type of a tuple of four elements; strongly normalizes it, using type-directed partial evaluation; translates it into byte code, and loads it; updates in place the environment to make the identifier BCWKdenote the generated code. Related
Work
Partial evaluation is traditionally defined as a source-to-source program transformation 6, 20. Type-directed partial evaluation departs from that tradition in that it is a compiled-to-source program transformation. Run-time code generation completes the picture by providing a source-to-compiled transformation at run time. It is thus a natural idea to compose both, and this has been done in two settings, using offiine partial-evaluation techniques: For imperative languages: the Compose research group at Rennes is doing runtime code generation for stock languages such as C, C ++, and Java 3. For functional languages: Sperber and Thiemann have paired a traditional, syntax-directed partial evaluator and a run-time code generator for a bytecode implementation of Scheme 26. Both settings use binding-time analysis. Sperber and Thiemann's work is the most closely related to ours, even though their partial evaluator is syntaxdirected instead of type-directed and though they consider an untyped and module-less language (Scheme) instead of a typed and modular one (ML). A remarkable aspect of their work, and one our implementation so far has failed to
250 achieve, is that they deforest the intermediate representation of the specialized program, i.e., their partial evaluator directly generates byte code. Alternative approaches to partial evaluation and run-time code generation include Leone and Lee's Fabius system 21, which only handles "staged" firstorder ML programs but generates actual assembly code very efficiently.
8
C o n c l u s i o n and Issues
We have obtained strong normalization in ML by pairing type-directed partial evaluation and run-time code generation. We have implemented a system in Objective Caml, whose byte code made it possible to remain portable. The system can be used in any situation where strong normalization could be of benefit. Besides the examples mentioned above, we have applied it to type specialization 9, lambda-lifting and lambda-dropping 10, formatting strings 11, higher-order abstract syntax 12, and deforestation 15. We are also considering to apply it for cut elimination in formal proofs, in a proof assistant. We are in the process of extending our implementation for a subset of the Caml module language. This extension relies on the run-time treatment of structures and of functors, which are represented as tuples and as higher-order functions. Therefore, in a pre-pass, we assemble type information about the module to normalize (be it first order or higher order), we coerce it into simply typed tuple and function constructions, and we then reuse our earlier implementation. The practical limitations are the same as for offiine type-directed partial evaluation, i.e., source programs must be explicitly factored prior to specialization. The module language, however, appears to be a pleasant support for expressing this factorization.
Acknowledgements This work is supported by BRICS (Basic Research in Computer Science, Centre of the Danish National Research Foundation; h t t p : / / ~ . b r i c s . d k ) . It was carried out at BRICS during the summer of 1997. We are grateful to Xavier Leroy for supplying us with a version of c a l l / c o for OCaml, and to the anonymous reviewers for comments.
References 1. Ulrich Berger. Program extraction from normalization proofs. In M. Bezem and J. F. Groote, editors, Typed Lambda Calculi and Applications, number 664 in Lecture Notes in Computer Science, pages 91-106, Utrecht, The Netherlands, March 1993. 2. Ulrich Berger and Helmut Schwichtenberg. An inverse of the evaluation functional for typed A-calculus. In Proceedings o/ the Sixth Annual IEEE Symposium on Logic in Computer Science, pages 203-211, Amsterdam, The Netherlands, July 1991. IEEE Computer Society Press.
251 3. The COMPOSE Project. Effective partial evaluation: Principles and applications. Technical report, IRISA (http : / / ~ . i r i s a , fr), Campus Universitaire de Beaulieu, Rennes, France, January 1996 - May 1998. A selection of representative publications. 4. Charles Consel, editor. ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation, Amsterdam, The Netherlands, June 1997. ACM Press. 5. Charles Consel and Olivier Danvy. Static and dynamic semantics processing. In Robert (Corky) Cartwright, editor, Proceedings of the Eighteenth Annual ACM Symposium on Principles o Programming Languages, pages 14-24, Orlando, Florida, January 1991. ACM Press. 6. Charles Consel and Olivier Danvy. Tutorial notes on partial evaluation. In Susan L. Graham, editor, Proceedings of the Twentieth Annual ACM Symposium on Principles of Programming Languages, pages 493-501, Charleston, South Carolina, January 1993. ACM Press. 7. Catarina Coquand. From semantics to rules: A machine assisted analysis. In Egon BSrger, Yuri Gurevich, and Karl Meinke, editors, Proceedings of CSL'93, number 832 in Lecture Notes in Computer Science. Springer-Verlag, 1993. 8. Olivier Danvy. Type-directed partial evaluation. In Guy L. Steele Jr., editor, Proceedings of the Twenty-Third Annual ACM Symposium on Principles of Programming Languages, pages 242-257, St. Petersburg Beach, Florida, January 1996. ACM Press. 9. Olivier Danvy. A simple solution to type specialization. Technical Report BRICS RS-98-1, Department of Computer Science, University of Aarhus, Aarhus, Denmark, January 1998. To appear in the proceedings of ICALP'98. 10. Olivier Danvy. An extensional characterization of lambda-lifting and lambdadropping. Technical Report BRICS RS-98-2, Department of Computer Science, University of Aarhus, Aarhus, Denmark, January 1998. 11. Olivier Danvy. Formatting strings in ML (preliminary version). Technical Report BRICS RS-98-5, Department of Computer Science, University of Aarhus, Aarhus, Denmark, March 1998. To appear in the Journal of Functional Programming. 12. Olivier Danvy. The mechanical evaluation of higher-order expressions. In Preliminary proceedings of the 14th Conference on Mathematical Foundations of Programming Semantics, London, UK, May 1998. 13. Olivier Danvy, Nevin C. Heintze, and Karoline Malmkj~er. Resource-bounded partial evaluation. ACM Computing Surveys, 28(2):329-332, June 1996. 14. Olivier Danvy, Karoline Malmkjmr, and Jens Palsberg. The essence of etaexpansion in partial evaluation. LISP and Symbolic Computation, 8(3):209-227, 1995. An earlier version appeared in the proceedings of the 1994 ACM SIGPLAN Workshop on Partial Evaluation and Semantics-Based Program Manipulation. 15. Olivier Danvy and Kristoffer H0gsbro Rose. Deforestation by strong normalization. Technical report, BRICS, University of Aarhus and LIP, ENS Lyon, April 1998. To appear. 16. Olivier Danvy and Rend Vestergaard. Semantics-based compiling: A case study in type-directed partial evaluation. In Herbert Kuchen and Doaitse Swierstra, editors, Eighth International Symposium on Programming Language Implementation and Logic Programming, number 1140 in Lecture Notes in Computer Science, pages 182-197, Aachen, Germany, September 1996. Extended version available as the technical report BRICS-RS-96-13. 17. Saumya Debray. Resource-bounded partial evaluation. In Consel 4, pages 179192.
252 18. R. Kent Dybvig. The Scheme Programming Language. Prentice-Hall, 1987. 19. Robert Hieb, R. Kent Dybvig, and Carl Bruggeman. Representing control in the presence of first-class continuations. In Bernard Lang, editor, Proceedings of the ACM SIGPLAN'90 Conference on Programming Languages Design and Implementation, SIGPLAN Notices, Vol. 25, No 6, pages 66-77, White Plains, New York, June 1990. ACM Press. 20. Neil D. Jones, Carsten K. Gomard, and Peter Sestoft. Partial Evaluation and Automatic Program Generation. Prentice Hall International Series in Computer Science. Prentice-Hall, 1993. 21. Mark Leone and Peter Lee. Lightweight run-time code generation. In Peter Sestoft and Harald Sondergaard, editors, Proceedings of the A CM SIGPLAN Workshop on Partial Evaluation and Semantics-Based Program Manipulation, Technical Report 94/9, University of Melbourne, Australia, pages 97-106, Orlando, Florida, June 1994. 22. Xavier Leroy. The Objective Caml system, release 1.05. INRIA, Rocquencourt, France, 1997. 23. David B. MacQueen. Modules for Standard ML. In Guy L. Steele Jr., editor, Conference Record of the 1984 A CM Symposium on Lisp and Functional Programming, pages 198-207, Austin, Texas, August 1984. 24. Frank Pfenning and Conal Elliott. Higher-order abstract syntax. In Mayer D. Schwartz, editor, Proceedings of the ACM SIGPLAN'88 Conference on Programming Languages Design and Implementation, SIGPLAN Notices, Vol. 23, No 7, pages 199-208, Atlanta, Georgia, June 1988. ACM Press. 25. Tim Sheard. A type-directed, on-line, partial evaluator for a polymorphic language. In Consel 4, pages 22-35. 26. Michael Sperber and Peter Thiemann. Two for the price of one: composing partial evaluation and compilation. In Ron K. Cytron, editor, Proceedings of the ACM SIGPLAN'97 Conference on Programming Languages Design and Implementation, SIGPLAN Notices, Vol. 32, No 5, pages 215-225, Las Vegas, Nevada, June 1997. ACM Press. 27. Zhe Yang. Encoding types in ML-like languages (preliminary version). Technical Report BRICS RS-98-9, Department of Computer Science, University of Aarhus, Aarhus, Denmark, April 1998.
Determination of Dynamic Method Dispatches Using Run-Time Code Generation Nobuhisa Fujinami Sony Computer Science Laboratory Inc.
A b s t r a c t . Run-time code generation (RTCG) enables program optimizations specific to values that are unknown until run time and improves the performance. This paper shows that RTCG can be used to determine dynamic method dispatches. It can produce a better result than conventional method dispatch prediction mechanisms because other run-time optimizations help the determination. Further, the determined functions can be inlined, and this may lead to other optimizations. These optimizations are implemented in the author's RTCG system. The evaluation results showed good performance improvement. 1
Introduction
Run-time code generation (RTCG) is a partial evaluation 1 performed at run time. It generates machine code specific to values which are unknown until run time and enhances the speed of a program, while preserving its generality. R T C G itself is becoming a mature technique. Description languages and systems for runtime code generators have been proposed. Systems that automatically generate run-time code generators from source programs have also been proposed. Also, much efforts has been made to improve the performance of objectoriented languages. Recent research papers have focused on using run-time type feedback or static type inference to optimize dynamic method dispatch. This paper describes an RTCG system that can optimize dynamic method dispatches of an object-oriented language. This system can produce a better results than conventional method dispatch prediction mechanisms because other run-time optimizations, such as global constant propagation/folding and complete loop unrolling, help the determination. This paper focuses on these optimizations. The basics of the RTCG system itself are described only briefly in this paper. Refer to 2, 3, and 4 for details. This system focuses on the instance variables of objects and uses the fact that objects can be regarded as closures 5. If the values of some instance variables are run-time constants, the system generates specialized code generators for methods that use them. Machine code routines optimized to their values are generated at run time. The rest of the paper is organized as follows: Section 2 overviews the R T C G system. The optimizations implemented in the system are described in Section 3. Section 4 evaluates the optimizations. Section 5 overviews related research. Finally, Section 6 provides a summary and future plans.
254 2
System
Overview
This section describes briefly the R T C G system for an object-oriented language proposed by the author. As stated in Section 1, RTCG improves the efficiency of programs by generating machine code optimized to values that are unknown until run time, e.g. intermediate results of computation and the user's inputs. If programs operating on these values are written in object-oriented languages, it is natural to define objects with instance variables that represent the values known at run time. For example, to program stream i n p u t / o u t p u t functions, the programmer may assign descriptors of files, sockets, strings, etc., to instance variables of stream objects. Stream objects may have methods for reading or writing streams, which have the descriptors as their run-time constants. Another example is the generation and rendering of a three dimensional scene. The programmer may represent the scene, which is a run-time constant during rendering, as a scene object with instance variables representing a set of graphics objects, a viewing point, light sources, etc. The scene object's methods for rendering can be optimized through RTCG. The benefits of focusing on instance variables of objects are as follows: A u t o m a t i o n o f t h e t i m i n g o f c o d e g e n e r a t i o n / i n v a l i d a t i o n : Because of the encapsulation mechanism of object-oriented languages, all the assignments to non-public instance variables (e.g. p r i v a t e data members in C + + ) can be known, except for indirect accesses through pointers, from the definition of the class and its methods. Since the system knows when to generate/invalidate code, the programmer is freed from annotating programs and from providing suitable parameters to preserve consistency between the embedded values in the code and the actual values. A u t o m a t i o n o f t h e m a n a g e m e n t o f g e n e r a t e d c o d e : Since generated machine code (a specialized method) can be viewed as a part of the instance, management of it can be left to the instance creation/destruction mechanism of a object-oriented language. Management of multiple machine code routines for the same method is trivial. The generated machine code can be automatically invoked instead of using the original method. The programmer is freed from managing memory for the code and from rewriting programs to invoke the code. The system is implemented as a preprocessor of a C + + compiler. The current implementation is for Borland C + + compilers (Version 4.0 or higher) running on 80x86 based computers with a Win32 API. The executable file name is RPCC.EXE. The reasons for choosing C + + as the source language are as follows: - Since C + + has static type declarations, it is easy to determine the types of values used in the run-time code generator. Since C + + is quite an efficient object-oriented language, the system can provide the best possible implementation of a program written in a highlevel language. -
255 The programmer directs the system to use RTCG by inserting the keyword r u n t i m e before a declaration of a member function. 1 The system assumes all the "known" data members (see the next paragraph) used but not changed in that member function to be run-time constants. The programmer can direct not to assume data members as run-time constants by putting the keyword dynamic before the definitions of the members. The "known" data members are detected as follows: In the first step of analyzing the source program, all p r i v a t e , p r o t e c t e d , or c o n s t 2 data members of the class without the keyword dynamic are marked "known". Then, if any of the member functions of the class use the n o n - c o n s t member that satisfies the following conditions, the mark for the member is cleared: - The address of the member is taken, e.g. an operand of the unary & operator or a reference parameter. - The address is passed directly or indirectly via casts or the binary + and operators to a variable, as a function parameter, or as a return value. - The type of the destination is not a pointer/reference to a c o n s t . The values of the members still marked "known" are known in the sense that only the functions that explicitly use or modify the members can use or modify them. Let F be a member function with the keyword r u n t i m e , and let X be any data member marked "known". If X is used, but not changed in F, X is treated as a run-time constant in the code generator for F. If X is a run-time constant in the code generator for F and member function G changes X, the code to invalidate the machine code for F is inserted into G. If such G's exist, and F calls other functions, then X may be modified during the execution of the generated code. In this case, a new data member is introduced to count the number of active executions of F. Code to check the counter value is inserted into G. If the value is not zero, the code warns the programmer that the insertion of the keyword r u n t i m e is inappropriate. 3 Figure 1 shows an overall organization of the system. The upper half illustrates the action at compile time, and the lower half illustrates the program execution. At compile time, C + + preprocessor directives in a source program are processed first ( C P P 3 2 . E X E in Borland C + + ) . Then R P C C . E X E analyzes the program and generates, if necessary, run-time code generators in C + + . The code generators, the code for invoking them, and the code for invoking/invalidating the generated code are embedded into the original source program. The output 1 Automatic detection of the applicability is possible but not practical, because a too aggressive application of RTCG increases the compilation time and the size of the executable file. 2 It may violate the assumption of the analysis to cast a pointer to const into a pointer to non-const. Such an attempt is considered to be illegal because it is not safe to modify an object through such a pointer. 3 Using the exception handling of C + + may lead to false warnings because the counter may not have been decreased correctly. In this case, catching exceptions in F will solve the problem.
Fig. 1. Organization of the implemented system
The output is compiled into an executable file using a normal C++ compiler (BCC32.EXE in Borland C++). The source program and its intermediate representation are manipulated only at this compile time.
At run time, code generators are invoked with run-time constants as parameters. They generate member functions optimized for the run-time constants, in machine code format. Each code generator is specific to one member function. Since the code is directly written into memory, and since neither the source program nor an intermediate representation of it is used, code generation is efficient. One code generator may generate multiple machine code routines with different run-time constant values. The generated routines, which are expected to be more efficient than statically compiled ones, are invoked instead of the original member functions.
Figure 2 shows an example of an input to RPCC.EXE. Figure 3 shows the output (comments are added for readability). Preprocessor RPCC.EXE processes member functions with the keyword runtime and generates run-time code generators in C++. Pointers to generated machine code routines are added to the class as its data members; code generators are added as its member functions. The processed member functions are replaced with code fragments to check the validity of the generated code, to invoke the code generators if necessary, and to invoke the generated code. The preprocessor also inserts code for deleting generated machine code in the destructors and in the member functions that modify the data members embedded in the generated machine code.
class A {
private:
    int x;
public:
    A(int i);
    runtime int f(int y);
};
A::A(int i): x(i) {}
int A::f(int y) { return y-x*x; }
Fig. 2. Example of an input to RPCC.EXE
3 Optimizations
The optimizations of the machine code produced by the run-time code generator fall into two categories: those detected at compile time and those detected at code generation time (i.e. run time). The former are treated in a way similar to conventional code optimizations. They include constant propagation/folding, copy propagation, strength reduction, reassociation, redundant code elimination, algebraic simplification, jump optimization, delay slot filling, and loop invariant motion. Since the latter are performed at run time, their efficiency is important. The system adopts a method that generates machine code directly. It does not manipulate any source program or intermediate representation at run time. The output from RPCC.EXE contains optimization routines specialized to the target member functions. Optimizations performed at run time include local constant propagation/folding, strength reduction, redundant code elimination, and algebraic simplification. Because of the naive implementation of RPCC.EXE, redundant optimization code may be included in the code generator, but most of it is optimized away by the C++ compiler (code generators are generated in C++; see Section 2). The rest of this section describes non-trivial optimizations performed at code generation time. These optimizations include global run-time constant propagation, complete loop unrolling, and virtual function inlining.
3.1 Intermediate Representation
This subsection describes the intermediate representation used at compile time. RPCC.EXE consists of three phases, similar to conventional compilers (see Figure 1):
1. Translator from source to intermediate representation
2. Optimizer of intermediate representation
3. Generator of run-time code generator
#include ...                       // macros and functions for RTCG

class A {
private:
    int x;
public:
    A(int i);
    int f(int y);
    ~A();                          // destructor
    char *qq_f;                    // pointer to generated code
    void qq__f() const;            // code generator
    static char *qql_f;            // address of label "generate" in f
    static char *qql__f();         // function to initialize qql_f
};
A::~A() { if(qq_f!=qql_f) delete qq_f; }
A::A(int i): x(i), qq_f(qql_f) {}
int A::f(int) {
retry:
    asm MOV ECX,this;
    asm JMP DWORD PTR [ECX].qq_f;  // jump to generated code
generate:
    qq__f();                       // invoke code generator
    goto retry;
}
char *A::qql_f=qql__f();
void A::qq__f() const {
    char *qqcode;                  // code address
    // prologue code generator (omitted)
    qqMOVdx(0,5,12);               // MOV EAX,[EBP+12] ; y
    qqSUB_I(0,(int)x*x);           // SUB EAX,x*x
    // epilogue code generator (omitted)
    *(char **)&qq_f=qqcode;        // set code address
}
Fig. 3. Example of an output from RPCC.EXE (Macro qqXX(YY) writes instruction XX with operand(s) YY into memory.)
Fig. 4. Intermediate representation of { int i,s=0; for(i=0;i<...) ... s; }
The intermediate representation format used in these phases is designed to be suitable for generating run-time code generators. The intermediate representation is a flow graph that represents the meaning of a function; it is generated for each function to be optimized or inlined. The nodes of the graph are basic blocks, represented as sequences of statements. Figure 4 shows an example of the intermediate representation. A statement is one of: assignment, function invocation, selection, switch, virtual, and return. A goto-statement is represented as an edge of the flow graph and has no special node. Some statements have expressions as their operands. Expressions are represented as directed acyclic graphs (DAGs) whose nodes are operators, identifiers, or constants. Expressions with side effects are divided into multiple statements. Conditional operators (&&, ||, and ?:) are expressed using separate basic blocks. Temporary variables may be introduced in this transformation. The reasons for using DAGs instead of a conventional flat format such as three-address code are as follows:
- It is easy to classify nodes into stages (see the next paragraph).
- It is easy to reconstruct C++ expressions to calculate run-time constants which will be embedded into run-time code generators.
- It is easy to replace variables with expressions. This operation is necessary in the optimization described in Subsection 3.2.
Leaf nodes of DAGs are identifiers or compile-time constants. Each identifier's entry in the symbol table has a flag that tells whether the identifier is a data member treated as a run-time constant. RPCC.EXE classifies nodes into three stages: "compile time", "code generation time", and "dynamic". A simple traversal of DAGs can classify internal nodes and statements into the three stages. When the run-time code generator is generated, the stage information is used as follows: If the stage of a subgraph is "compile time", its value is calculated at compile time and the result is embedded into the run-time code generator. If its stage is "code generation time", a C++ expression that calculates the run-time
constant value is embedded into the code generator. If its stage is "dynamic", a code generation routine for it is embedded.
The system inlines functions during translation to intermediate representation. Functions that perform recursive calls are not inlined to prevent infinite loops of partial evaluation.

Fig. 5. Intermediate representation before run-time constant propagation

Fig. 6. Intermediate representation after run-time constant propagation
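The three-stage classification lends itself to a simple bottom-up traversal of the DAG. The following is a minimal sketch with invented names and data structures (the paper does not give RPCC's internals), assuming a node is "compile time" for literals, "code generation time" for members marked "known", and otherwise the latest stage among its children:

#include <algorithm>

enum Stage { COMPILE_TIME = 0, CODEGEN_TIME = 1, DYNAMIC = 2 };

struct Node {
    bool isLeaf;
    bool isLiteralConst;       // compile-time constant leaf
    bool isRuntimeConstMember; // data member marked "known"
    Node *kids[2];             // children; null if absent (may be shared: DAG)
    Stage stage;
};

// Bottom-up classification; revisiting a shared node is harmless
// because the result is idempotent.
Stage classify(Node *n) {
    if (n->isLeaf) {
        n->stage = n->isLiteralConst ? COMPILE_TIME
                 : n->isRuntimeConstMember ? CODEGEN_TIME
                 : DYNAMIC;
        return n->stage;
    }
    Stage s = COMPILE_TIME;
    for (Node *k : n->kids)
        if (k) s = std::max(s, classify(k));
    n->stage = s;
    return s;
}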
3.2 Global Run-Time Constant Propagation
This optimization is processed in the second phase (optimizer of intermediate representation). Like conventional compilers, this phase performs dataflow analysis on the flow graph, propagates compile-time constants, eliminates redundant code, etc. A simple extension of compile-time constant propagation allows global run-time constant propagation. In normal constant propagation, only the values of compile-time constant expressions are propagated to the places of their use. If the right operand of an assignment operator is an arithmetic expression consisting of run-time constants, this phase also propagates the expression. Thus, all the uses of run-time constants are replaced with expressions that compute their values. In the third phase (generator of run-time code generator), the new expressions are classified as "compile time". The C++ expressions are reconstructed and embedded into the run-time code generator. This enables global run-time constant propagation. For example, if x is a run-time constant in the block:
{ y=x*x; if(y>=p && y<q) return y; }

Fig. 7. Run-time code generator (qqXX is a macro for code generation)

    MOV EAX,[EBP+12]   ; p
    CMP EAX,9
    JG  L2
    MOV ECX,[EBP+16]   ; q
    CMP ECX,9
    JG  L5
L2: ; code for other basic blocks
L5: MOV EAX,9          ; exit code

Fig. 8. Generated machine code if x=3 (translated into Intel mnemonics)
(its intermediate representation is in Figure 5), uses of y are replaced with the run-time constant expression x*x (see Figure 6). In the next phase, a code generation routine shown in Figure 7 is generated. If run-time constant x=3 is supplied to the code generator at run time, the machine code shown in Figure 8 will be generated.
Since this optimization technique is flow-sensitive, it successfully processes cases in which a variable is a run-time constant at one point in the program, but not at another point. For example, suppose p and q are not run-time constants and an assignment y=q-p; follows the code in Figure 5. In this case, the variable y does not always hold a run-time constant, but the assignment y=x*x; is successfully propagated using dataflow analysis. The assignment y=q-p; does not disturb the optimization in Figure 6.
Using Static Single Assignment form (SSA) and a flat intermediate representation format also allows flow-sensitive stage classification. This method avoids duplicated run-time constant expressions (e.g. x*x in the previous example) but requires stage annotations for all of the temporary variables. The system does not adopt it and leaves the optimization to eliminate duplicated run-time constant expressions, if any, to the C++ compiler.

int C::g(int p,int q)
{
    int s=0;
    for(i=0;i<n;i++) {
        y=a[i];
        if(y>=p && y<q) s++;
    }
    return s;
}

Fig. 9. Member function g
3.3 Complete Loop Unrolling
Unlike other run-time code generation systems (see Section 5), the author's system can unroll only simple loops. But it automatically decides whether each loop should be fully unrolled or not, depending on the upper limit of the number of iterations of the loop. In the second phase (optimizer of intermediate representation), loops are detected during dataflow analysis. The phase checks whether one of the exits of the loop is a simple comparison of the control variable with a compile- or run-time constant, and whether the initial value and the update step are compile- or run-time constants. The loop may have other exits. If a loop passes the check, the upper limit of its number of iterations is a compile- or run-time constant. The third phase (generator of run-time code generator) emits a code fragment to unroll the loop based on this information. If the upper limit of the number of iterations is a compile-time constant, either a normal code generation routine or a complete unrolling routine for the loop is generated, depending on the upper limit. If it is a run-time constant, code generators of both versions are generated, and the selection is performed at code generation time. To support both versions, the run-time constant propagator is extended. If a run-time constant expression contains a loop control variable, the propagated expression (for calculating the value of the run-time constant) is represented as a special node that also holds the original variable. An example is the loop in function g in Figure 9. If a and n are run-time constant data members of class C, the use of y is replaced with a special node that holds both y and a[i]. If the loop is not unrolled, variable y is used in the if statement. If the loop is completely unrolled, run-time constant a[i] is used in the if statement. If the loop is completely unrolled, the stage of the loop control structure is assumed to be "code generation time" in the output code generator.
qq0i=0;                     // i=0
for(;;) {
    qqMOVdx(0,5,12);        // MOV EAX,[EBP+12]  ; p
    qqCMP_I(0,a[qq0i]);     // CMP EAX,a[qq0i]   ; a[i]
    qqGenJccF(G,5);         // JG  L5
    qqMOVdx(1,5,16);        // MOV ECX,[EBP+16]  ; q
    qqCMP_I(1,a[qq0i]);     // CMP ECX,a[qq0i]   ; a[i]
    qqGenJccF(LE,5);        // JLE L5
    qqADD_I(6,1);           // INC ESI           ; s
    qqGenLbl(5);            // L5:
    qq0i=qq0i+1;            // i++
    if(!(qq0i<n)) break;
}
Fig. 10. Code generator for the loop in g (qqXX is a macro for code generation)
Fig. 11. Flow graph of virtual function invocation
The loop control variable becomes a variable of the run-time code generator and is treated as a run-time constant in the body of the loop (see Figure 10).
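The two-version strategy can be pictured with a short hypothetical sketch; the threshold and emitter names below are invented, not taken from the paper:

constexpr int kUnrollLimit = 32;            // assumed threshold

static void emitUnrolledBody(int /*i*/) {}  // stub: emit body with i constant
static void emitLoopingBody() {}            // stub: emit body with a loop

// At code generation time, choose between the fully unrolled variant
// (straight-line code, i known per copy) and the residual loop.
void generateLoop(int n) {                  // n: run-time constant bound
    if (n <= kUnrollLimit) {
        for (int i = 0; i < n; ++i)
            emitUnrolledBody(i);            // a[i] can be folded here
    } else {
        emitLoopingBody();                  // keep the loop in generated code
    }
}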
3.4 Inlining Virtual Functions
Suppose a run-time constant is a pointer to a class object. If a virtual function invocation is used with this pointer, the actual invoked function can be determined at code generation time, for the following reason: Since the pointer value is constant, the object's address cannot be changed, and the only legal way to change the class of the object at that address is to use a union. A class with virtual functions has an implicit or an explicit constructor, and a class with a constructor cannot be a member of a union. Thus the actual class of the object cannot be changed. Furthermore, if the invoked function is declared as inline, it can be inlined.4
4 Virtual functions of C++ can be declared as inline. Normal compilers inline them only if the invoked functions can be determined at compile time.
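The union restriction in this argument can be seen directly in two lines (a hypothetical illustration; V is an invented name):

struct V { virtual void m(); };  // virtual function => implicit constructor
// union U { V v; };             // ill-formed in (pre-C++11) C++:
                                 //   union members may not have constructors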
class objectTableType {
    int count;                      // graphics object counter
    objectType *table[MAXOBJECT];   // pointers to graphics objects
public:
    objectTableType(): count(0) {}
    int add(objectType *p);         // add graphics object
    runtime const objectType *intersect_all(rayType &, myfloat &);
                                    // return the first object the ray intersects
};

Fig. 12. Definition of class objectTableType

const objectType *
objectTableType::intersect_all(rayType &ray, myfloat &t)
{
    myfloat t1;
    const objectType *obj=0;
    for(int i=0;i<count;i++) {
        if(table[i]->intersect(ray,t1)) {
            if(t1>MIN_DISTANCE && (obj==0 || t1<t)) {
                obj=table[i]; t=t1;
            }
        }
    }
    return obj;
}

Fig. 13. Member function intersect_all
In the intermediate representation used in RPCC.EXE, a virtual function invocation is represented as a special node "Virtual" (see Figure 11). Each Ik represents a call of a member function of derived class Ck or an inlined image of it. RPCC.EXE embeds code to test the actual class and to generate machine code for Ik if the class is Ck. Operator typeid, which returns run-time type information, is used in the test. Other run-time optimizations, such as constant propagation and complete loop unrolling, help this optimization. For example, Figure 12 shows a class that represents a set of graphics objects. Member function intersect_all has the keyword runtime, and its implementation is given in Figure 13. Each element of table can point to any graphics object (derived class of objectType; see Figure 14). If the loop is unrolled, the invocation of virtual function intersect is determined and inlined. If table[0] points to an object of class Plane and table[1] points to an object of class discType, then the unrolled loop looks like:

Inlined image of Plane::intersect
Rest of the loop body
Inlined image of discType::intersect
Rest of the loop body
class objectType {
protected:
    const surfaceType *surface;
    const pointType center;
public:
    inline virtual int intersect(const rayType &,myfloat &t) const = 0;
    objectType(surfaceType *s, const pointType& c);
    virtual const SinfoType *getSinfo(const pointType &pos) const;
    virtual vectorType getNormal(const pointType &) const = 0;
    virtual int flat() const = 0;
};

class Plane: public objectType {
    const vectorType N;
    const myfloat d;
public:
    Plane(surfaceType *s,pointType &pos,vectorType &NN);
    inline int intersect(const rayType &,myfloat &) const;
    vectorType getNormal(const pointType &) const { return N; }
    int flat() const { return 1; }
};

int Plane::intersect(const rayType &ray,myfloat &t) const
{
    myfloat t1=inner(N,ray.v);
    if(t1==0.0) return 0;
    t=(d-inner(N,ray.p))/t1;
    if(t<=MIN_DISTANCE) return 0;
    return 1;
}

Fig. 14. Class objectType, derived class Plane, and virtual member function intersect
This is impossible for conventional techniques, such as type inference on variables or on variable occurrences at call sites. Inlined member functions are further optimized through run-time constant propagation/folding, algebraic simplification, etc. In the previous example, run-time constants table[0], table[1], ... are propagated to the this pointers of the inlined member functions Plane::intersect, discType::intersect, ..., and they are specialized with respect to const data members, if any, of the graphics objects pointed to by table[0], table[1], ....
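The class test that drives this specialization can be pictured with a small hypothetical sketch (simplified stand-ins for the paper's classes; the gen* emitter names are invented):

#include <typeinfo>

struct objectType { virtual ~objectType() {} };
struct Plane : objectType {};

static void genInlinedPlaneIntersect() {}  // emit inlined image of Plane::intersect
static void genVirtualCall() {}            // emit an ordinary indirect call

// At code generation time, test the dynamic class of the run-time-constant
// pointer and emit the matching code.
void genIntersectCall(const objectType *obj) {
    if (typeid(*obj) == typeid(Plane))
        genInlinedPlaneIntersect();
    else
        genVirtualCall();
}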
Figure 14 shows the definitions of class Plane and member function intersect. If an instance of Plane represents a plane parallel to the x-y plane, the x and y components of its normal vector N are zeros. In this case, the inlined image of function inner (inner product: three multiplications and two additions) is algebraically simplified into a single multiplication.

Program          execution time (sec)  speed ratio
original (C)     72.5                  1.00  0.59
new (C++)        42.6                  1.70  1.00
optimized (C++)  30.4                  2.38  1.40

Table 1. Evaluation results (ray tracer)

Program          execution time (sec)  speed ratio
original (C)     69.2                  1.00  1.04
original (C++)   72.3                  0.96  1.00
optimized (C++)  36.9                  1.87  1.96

Table 2. Evaluation results (puzzle solver)
4 Evaluation
This section reports the evaluation results of the implementation. The evaluation environment is as follows:
Machine: NEC PC-9821St15/L16 (Pentium Pro 150MHz, RAM: 48MBytes)
Operating System: Microsoft Windows 95
Compiler: Borland C++ Version 5.0J
Compiler Options: -6 -O2 -OS -vi (Pentium Pro, optimize for speed, instruction scheduling, enable inlining)
The first program is a ray tracer. Table 1 shows the results. The program reads a scene file at run time and displays the ray-traced image. It contains class objectTableType of Figures 12 and 13. The keyword runtime is inserted before the declaration of member function intersect_all. The ray-traced scene contains three transparent spheres, seventeen opaque spheres, two mirrors, one disc, one checked square, and one light source. The output is a 512 × 512 24-bit color image. The original program is in [6] and is written in C ("original" in Table 1). The author's part-time assistant rewrote it into C++ ("new" in Table 1). The new program runs about 1.7 times as fast as the original one,5 and the run-time-optimized one is 1.4 times as fast as the new one. Analyzing the generated code shows that the speedup is mostly due to determination of virtual member function invocations combined with inlining.
5 The original program uses a switch statement to classify graphics objects. Using virtual member functions optimized the classification.
class Piece {
private:
    int index,dirs,num;
    int offset[MaxDir][MaxSize];
public:
    Piece(int i);
    runtime void put(Box *b,int o,int i0,PieceList *l);
private:
    void set(Box* b,int o,int j);
    void reset(Box* b,int o,int j);
};

void Piece::put(Box *b,int o,int i0,PieceList *l)
{
    int j,k;
    for(j=0;j<dirs;j++) {
        ...
        if(... k!=null) goto next;
        set(b,o,j);
        l->putok(b,o,index,i0);
        reset(b,o,j);
    next:;
    }
}

Fig. 15. Class Piece
The second program is a box-packing puzzle solver. Table 2 shows the results. It reads a puzzle definition file that describes the box and the pieces and prints the solutions. Run-time optimization is applied to member function put of class Piece (see Figure 15). Its instance variables represent the shapes of the pieces and are run-time constants. The goal here is to pack 11 pieces into a size-4 cubic box. There are three solutions. The optimized program runs about twice as fast as the normal one. Here the speedup is mostly due to loop unrolling and constant folding. The author wrote the puzzle solver in both C and C++. Notice that the program in C++ is a little bit slower than that in C. In both cases, the optimized program runs faster than the original one and the one written in C. Optimized programs in C++ can run faster than their C counterparts. The cost of code generation is 625 microseconds per 7638 bytes of generated instructions, or about 12 machine cycles per byte, in the case of the ray tracer (at 150 MHz, 625 microseconds is roughly 93,750 cycles, and 93,750/7638 ≈ 12). It is low enough compared with code generation systems that manipulate intermediate representations [7], but it is still higher than the result of [8], which emits run-time code generators in machine code. The cause of this cost seems to be that the compiled code generator contains quite a few instructions operating on byte data. The Pentium Pro processor cannot execute such instructions
efficiently. Rewriting the generator of code generators will make the cost even lower.
5 Related Work
There are a number of related research papers on optimization of object-oriented languages [9, 10, 11, 12, 13, 14, 15]. These papers focus on run-time type feedback or static type inference to optimize method dispatch, and the methods are partially evaluated with respect to the inferred type. Since the author focuses on values, more aggressive optimizations can be applied, and run-time value-specific optimizations help to determine dynamic method dispatches. However, the author's system cannot optimize cases where the pointer value is not constant but the type of the pointed object is fixed. Using the results of these research papers will enable optimization of both cases. A framework named specialization classes [16] focuses on the instance variables of objects and can specialize methods with respect to their values at both compile time and run time. Its prototype was implemented for Java. Since it does not specialize methods with respect to run-time types or to constant objects,6 and since it cannot consider array variables as invariants, inlining methods combined with loop unrolling, run-time constant propagation, and algebraic simplification (see Subsection 3.4) is impossible. There is another value-specific partial evaluator for an object-oriented language [17]; however, objects are not regarded as closures in it. There are also various systems for run-time code generation. Fabius [18, 8, 19] is a compiler for a subset of ML with automatic run-time light-weight code generation. It requires the programmer to declare functions to take their arguments in curried form. The idea is very similar to that of using objects as closures. It is also very natural in functional programming languages. But in languages with assignments, it is not desirable because the actual values may be different from the values embedded in partially evaluated functions. The author's method is preferable in richer and efficient languages like C++, because the values of their instance variables can be changed after instantiation. Tempo [20, 21] is an online and offline partial evaluator of system programs in C. It automatically generates quite efficient run-time code generators. But the programmer has to invoke the code generator explicitly and has to manage the generated code. It is the programmer's responsibility to maintain consistency between the actual value and the value embedded in the generated code. DyC and its predecessor [22, 23, 24] implicitly generate machine code for program regions the programmer indicates. The indications include run-time constants for those regions. If the values of some run-time constants change, the corresponding machine code is automatically generated. The system uses a pair of dataflow analyses to identify which variables will be constant at run time. The
6 If an instance variable is of object type, it can specialize methods with respect to the specialization state (declared by specialization class name) of the object.
system, however, is error prone because it requires the programmer to insert the keyword dynamic at every use of a C pointer that refers to dynamic values. The system can manage multiple machine code routines for each region. The generated code is looked up using the values of run-time constants. The author's method uses a pointer in each object instance and is therefore more efficient. If a number of object instances share the same set of values of the instance variables, the author's method requires larger storage for generated code. Reusing code at code generation time reduces the storage requirement without a significant effect on speed. 'C [25] is a language for run-time code generation. The programmer can control run-time code generation explicitly. It may be efficient, but the programmer has to rewrite the source program using ' and $ operators.
6 Summary and Future Work
This paper showed that RTCG can be used to determine dynamic method dispatches. It can produce a better result than conventional method dispatch prediction mechanisms because other run-time optimizations, such as global run-time constant propagation and complete loop unrolling, help the determination. Further, the determined functions can be inlined. This may lead to other optimizations. These optimizations are implemented in the author's RTCG system by a simple extension of the intermediate representation optimizer and the generator of run-time code generators. The system is implemented as a preprocessor for a C++ compiler, and time-consuming operations are performed only at compile time. The evaluation results showed good performance improvement. The author plans to extend the system to optimize groups of objects. Commonly used data structures, such as linked lists and hash tables, consist of groups of objects. Operations on these data structures can be represented as code fragments held in the objects. This is similar to executable data structures, or Quajects, by Massalin [26, 27], which were implemented using some handwritten templates in an assembly language. Object-oriented languages may permit automatic application of this optimization. The author has already proposed basic ideas to optimize groups of objects [28]. The author is also going to release the preprocessor RPCC.EXE as free software (tentative name: C++ Doubler).
Acknowledgments
I would like to express my gratitude to Dr. Mario Tokoro for supervising the research. I also appreciate the many suggestions offered by Dr. Satoshi Matsuoka and Dr. Calton Pu, as well as the work of programming the ray tracer in C++ by Ms. Kayoko Sakai. Finally, I would like to thank the members of Sony CSL for their valuable advice.
References
1. Neil D. Jones. An Introduction to Partial Evaluation. ACM Computing Surveys, Vol. 28, No. 3, pp. 480-503, September 1996.
2. Nobuhisa Fujinami. Run-Time Optimization in Object-Oriented Languages. In Proceedings of 12th Conference of Japan Society for Software Science and Technology, September 1995. In Japanese. Received Takahashi Award.
3. Nobuhisa Fujinami. Automatic Run-Time Code Generation in C++. In Yutaka Ishikawa, Rodney R. Oldehoeft, John V.W. Reynders, and Marydell Tholburn, editors, LNCS 1343: Scientific Computing in Object-Oriented Parallel Environments. Proceedings, December 1997. Also appeared as Technical Report SCSL-TR-97-006 of Sony Computer Science Laboratory Inc.
4. Nobuhisa Fujinami. Automatic and Efficient Run-Time Code Generation Using Object-Oriented Languages. To appear in Computer Software, Japan Society for Software Science and Technology, 1998. Also appeared as Technical Report SCSL-TR-98-001 of Sony Computer Science Laboratory Inc. (In Japanese).
5. Uday S. Reddy. Objects As Closures: Abstract Semantics of Object Oriented Languages. In Proceedings of the ACM Conference on Lisp and Functional Programming. ACM Press, July 1988.
6. Peter Holst Andersen. Partial Evaluation Applied to Ray Tracing. Student Report, DIKU, University of Copenhagen, 1993.
7. Dawson R. Engler and Todd A. Proebsting. DCG: An Efficient, Retargetable Dynamic Code Generation System. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 263-272. ACM Press, October 1994. Also appeared in SIGPLAN Notices, Vol. 29, No. 10.
8. Peter Lee and Mark Leone. Optimizing ML with Run-Time Code Generation. In Proceedings of the SIGPLAN '96 Conference on Programming Language Design and Implementation, pp. 137-148, May 1996.
9. Jeffrey Dean, Craig Chambers, and David Grove. Identifying Profitable Specialization in Object-Oriented Languages. Technical Report 94-02-05, Department of Computer Science and Engineering, University of Washington, 1994.
10. Jeffrey Dean, David Grove, and Craig Chambers. Optimization of Object-Oriented Programs Using Static Class Hierarchy Analysis. In Walter Olthoff, editor, LNCS 952, Object-Oriented Programming, Proceedings of ECOOP'95, August 1995. Also appeared as Technical Report 94-12-01, Department of Computer Science and Engineering, University of Washington.
11. Jeffrey Dean, Greg DeFouw, David Grove, Vassily Litvinov, and Craig Chambers. Vortex: An Optimizing Compiler for Object-Oriented Languages. In Proceedings of Object-Oriented Programming Systems, Languages and Applications in 1996. ACM Press, October 1996.
12. David F. Bacon and Peter F. Sweeney. Fast Static Analysis of C++ Virtual Function Calls. In Proceedings of Object-Oriented Programming Systems, Languages and Applications in 1996. ACM Press, October 1996.
13. Urs Hölzle and David Ungar. Optimizing Dynamically-Dispatched Calls with Run-Time Type Feedback. In Proceedings of the SIGPLAN '94 Conference on Programming Language Design and Implementation, pp. 326-336, 1994.
14. Gerald Aigner and Urs Hölzle. Eliminating Virtual Function Calls in C++ Programs. In Proceedings of ECOOP'96, June 1996.
15. Jan Vitek, R. Nigel Horspool, and James S. Uhl. Compile-Time Analysis of Object-Oriented Programs. In U. Kastens and P. Pfahler, editors, LNCS 641, Compiler Construction, 4th International Conference, CC '92, pp. 236-250, October 1992.
16. Eugen N. Volanshi, Charles Consel, Gilles Muller, and Crispin Cowan. Declarative Specialization of Object-Oriented Programs. In Proceedings of Object-Oriented Programming Systems, Languages and Applications in 1997. ACM Press, October 1997.
17. Morten Marquard and Bjarne Steensgaard. Partial Evaluation of an Object-Oriented Imperative Language. Master's thesis, University of Copenhagen, April 1992.
18. Mark Leone and Peter Lee. Lightweight Run-Time Code Generation. In Proceedings of the 1994 ACM SIGPLAN Workshop on Partial Evaluation and Semantics-Based Program Manipulation, pp. 97-106. ACM Press, June 1994.
19. Mark Leone and Peter Lee. A Declarative Approach to Run-Time Code Generation. In Workshop Record of WCSSS'96: The Inaugural Workshop on Compiler Support for System Software, pp. 8-17, February 1996.
20. Charles Consel, Luke Hornof, François Noël, and Nicolae Volanshi. A Uniform Approach for Compile-time and Run-time Specialization. Technical Report No. 2775, INRIA, January 1996.
21. Eugen-Nicolae Volanshi, Gilles Muller, Charles Consel, Luke Hornof, Jacques Noyé, and Calton Pu. A Uniform and Automatic Approach to Copy Elimination in System Extensions via Program Specialization. Technical Report No. 1021, IRISA, June 1996.
22. Joel Auslander, Matthai Philipose, Craig Chambers, Susan J. Eggers, and Brian N. Bershad. Fast, Effective Dynamic Compilation. In Proceedings of the SIGPLAN '96 Conference on Programming Language Design and Implementation, pp. 149-159, May 1996.
23. Brian Grant, Markus Mock, Matthai Philipose, Craig Chambers, and Susan J. Eggers. Annotation-Directed Run-Time Specialization in C. In Proceedings of Workshop on Partial Evaluation and Semantics-Based Program Manipulation (PEPM'97), June 1997.
24. Brian Grant, Markus Mock, Matthai Philipose, Craig Chambers, and Susan J. Eggers. DyC: An Expressive Annotation-Directed Dynamic Compiler for C. Technical Report 97-03-03, Department of Computer Science and Engineering, University of Washington, 1997.
25. Dawson R. Engler, Wilson C. Hsieh, and M. Frans Kaashoek. 'C: A Language for High-Level, Efficient, and Machine-Independent Dynamic Code Generation. In Conference Record of POPL '96: The 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 258-270, January 1996.
26. Calton Pu, Henry Massalin, and John Ioannidis. The Synthesis kernel. Computing Systems, Vol. 1, No. 1, pp. 11-32, Winter 1988.
27. Henry Massalin. Synthesis: An Efficient Implementation of Fundamental Operating System Services. PhD thesis, Graduate School of Arts and Sciences, Columbia University, April 1992.
28. Nobuhisa Fujinami. Run-Time Optimization of Groups of Objects. In Proceedings of 14th Conference of Japan Society for Software Science and Technology, September 1997. Also appeared as Technical Memo SCSL-TM-97-007 of Sony Computer Science Laboratory Inc. (In Japanese).
Type-Based Analysis of Concurrent Programs
Naoki Kobayashi
Department of Information Science, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
[email protected]
Analysis and compilation of concurrent programs are challenging tasks: since concurrency primitives for thread creation and communication have a much more dynamic nature than sequential primitives like function creation and application, it is difficult to reason about program behavior, for both programmers and compilers. For example, unlike in sequential programming languages, it is not easy to know which part of a program is executed first -- consider scheduling of a process that tries to receive a value from a communication channel: if a value is available, the process is executed immediately, while, if not, the process is suspended and another process must be scheduled. This kind of dynamic program behavior complicates not just a programmer's debugging but also a compiler's efficient code generation. In order to deal with the above problems, several type systems and program analyses have been studied through process calculi. In this talk, we focus on Kobayashi, Pierce, and Turner's type system for linear (use-once) channels [3] and its extensions [1, 2]. The main idea of those type systems is to augment ordinary types with information on how often and in which order each communication channel can be used. With such extra information, we can ensure that a certain part of a concurrent program is confluent and/or deadlock-free. After giving an overview of the type systems, we show how such type information can be used for reasoning about program behavior and program optimizations.1
References
1. Atsushi Igarashi and Naoki Kobayashi. Type-based analysis of usage of communication channels for concurrent programming languages. In Proceedings of International Static Analysis Symposium (SAS'97), Lecture Notes in Computer Science, Vol. 1302. Springer-Verlag, Berlin Heidelberg New York (1997) 187-201.
2. Naoki Kobayashi. A partially deadlock-free typed process calculus. To appear in ACM Transactions on Programming Languages and Systems, ACM, New York (1998). A preliminary summary appeared in Proceedings of LICS'97, (1997) 128-139.
3. Naoki Kobayashi, Benjamin C. Pierce, and David N. Turner. Linearity and the pi-calculus. In Proceedings of ACM SIGPLAN/SIGACT Symposium on Principles of Programming Languages, ACM, New York (1996) 358-371.
1 An electronic copy of the slides is available through http://www.yl.is.s.u-tokyo.ac.jp/members/koba/publications.html.
A Type-Based Semantics for User-Defined Marshalling in Polymorphic Languages
Dominic Duggan
Department of Computer Science, Stevens Institute of Technology, Castle Point on the Hudson, Hoboken, New Jersey 07030.
[email protected]
Abstract. Marshalling is an important aspect of distributed programming, particularly in typed programming languages. A semantics is provided for user-defined marshalling operations in polymorphic languages such as ML. The semantics of this are expressed in an internal language with recursion and dynamic type dispatch at both the term and type levels. User-defined marshalling amounts to reifying dynamic type dispatch to the programmer level in an ML-like language. An "external" language XMLdyn is provided with explicit pickle types and operations for building and deconstructing pickles with user-defined marshalling.
1 Introduction
In distributed programming environments, where programs operate in separate address spaces, there must be some way of converting values from their internal format to an external "wire" format for communication to other programs. This conversion process is referred to as marshalling, and its converse as unmarshalling. User-defined marshalling is now widely recognized as essential for monomorphic distributed programming languages. For example Birrell et al. [5] report:

It is difficult to provide fully general marshalling code in a satisfactory way. Existing systems fail in one or more of the following ways. Some apply restrictions to the types that can be marshalled, typically prohibiting linked, cyclic or graph-structured values. Some generate elaborate code for almost any data type, but the resulting stub modules are excessively large. Some handle a lot of data types, but the marshalling code is excessively inefficient.

The Modula-3 Pickle module allows user-defined type-specific pickling and unpickling routines (called "specials") to be registered with the pickler. Such a facility is also found in languages designed for distributed programming; for example the Argus distributed programming language [26], and Concurrent CLU, developed for use in the Cambridge Distributed Computer System [4], allow ADT implementations to export programmer-defined marshalling code, to support efficient remote procedure call. Allowing a user-definable pickling operation essentially provides a mechanism for reflection in distributed programming [18].
Our intent in this paper is to provide a semantics for user-defined marshalling in polymorphic languages, based on the use of run-time type information to guide user-defined marshalling operations that recurse over type descriptions. Recent work has suggested type-based transformations as a general framework for program optimization for polymorphic languages [20, 30]. Our semantics is couched in terms of this framework. We extend this framework with refinement kinds, that allow the exhaustiveness of dynamic type dispatching to be checked statically. Refinement kinds play a crucial rôle in typing our semantics.
Although not essential to our work, we attach our semantics to explicit pickles in the language. Explicit pickle types are found in many distributed languages. For example the OMG CORBA provides for a type ANY, the type of pickles that can be transmitted between address spaces. Abadi et al. [2] suggested a similar mechanism for adding dynamic typing to statically typed languages, the type dynamic. This mechanism incorporated an operation dynamic for bundling a value with its type, and a typecase construct for examining the value:

fun print (x:dynamic) =
  typecase x of
    int(xi) => output (toString xi)
  | string(xs) => output xs

print (if true then dynamic 3 else dynamic "hello")

We attach our semantics for user-defined marshalling to dynamics; however this semantics could just as well be attached to the message-passing operations themselves. The usefulness of a facility such as dynamics for distributed programming has been echoed by practical experience. For example, Krumvieda reports from his implementation of a distributed dialect of Standard ML that:

The lack of a dynamic type or some other method of implicitly attaching marshalling functions to SML types hampered much of DML's interface development and complicated its signature. Although DML was originally intended to support dynamic types, the necessary work never materialized and group type objects have proliferated and propagated through its implementation and coding examples [23].

We introduce a new construct for dynamics that allows user-definable marshalling routines to be attached to the dynamic construct. Our semantics for dynamics is particularly aimed at polymorphic languages, such as ML. We make use of a new approach to computing with dynamics, based on dynamic type dispatch, that fixes some problems with the use of dynamics in polymorphic languages. User-definable marshalling is based fundamentally on allowing user-specified type-based transformations to be reflected in a semantics based on dynamic type dispatch.
Languages such as Modula-3 allow many parts of the run-time to be implemented in the language itself. For example, most of the threads and garbage collection code, and all of the marshalling code, for Modula-3 is implemented in Modula-3 itself [28]. For a "high-level" language such as ML, it should be possible to define a "safe" subset of ML in which low-level operations such as marshalling can be implemented. As examples of this endeavour, the SML/NJ compiler generates ML code for polymorphic equality (including reference equality), while Cardelli considers a subset of Quest in which a garbage collector for Quest can be implemented [8]. If there is any intrinsic reason that a safe subset of ML cannot be defined in which marshalling can be implemented, then the semantics presented here should be considered as being for a hypothetical language that is not so deficient.
Sect. 2 reviews our approach to dynamic type dispatch with refinement kinds, reviewing the language XMLdyn originally introduced by Duggan [12]. In this approach, dynamic type dispatch is refined so that the programmer can control where run-time failures happen due to the use of dynamic type dispatch. Sect. 3 introduces our operations for user-definable marshalling, including their static semantics. We call the language introduced in this section XMLdyn_Π, since it extends XMLdyn. In Sect. 4 we give a translation semantics from XMLdyn_Π into XMLdyn_μ, an extension of XMLdyn with iteration at the type level. In Sect. 5 we give an alternative semantics for user-definable marshalling. This uses a simpler version of the static semantics for XMLdyn_Π, but requires a somewhat more complicated "internal language" for its operational semantics.
2 Dynamic Type Dispatch With Refinement Kinds
In this section we describe XMLdyn, the kernel language that is at the heart of our approach. XMLdyn combines dynamic type dispatch with "refinement kinds" that ensure the absence of run-time type failures. Type failure is isolated to a particular construct. XMLdyn was originally introduced by Duggan [12]. Types in our approach are stratified into simple types τ and polymorphic types σ:
σ ::= τ | σ1 → σ2 | ∀α<:τ.σ | ∀α<:χ.σ | ∀κ<:χ.σ
Our type system is based on the two-level stratified type system used by Harper and Mitchell [19] to explain ML's polymorphic type discipline. In this approach we have the usual collection of monomorphic types or monotypes (closed under the → type constructor and any other type constructors), and a second level of polymorphic types or polytypes, based on the closure of the collection of monotypes under the universal type quantifier ∀. Type constructors t denote both base types, such as int and real, as well as type constructors such as list and the product and function type constructors (τ × τ' and τ → τ', respectively). Type variables range over both types and type constructors, so type expressions include both list(int) and α(int) (the latter being the application of the type constructor variable α to int). We sometimes use the syntax t(τ1,...,τn) to denote (t τ1 ... τn). Polymorphic types abstract over both type variables α and kind variables κ. Kinds are regular tree expressions denoting (possibly infinite) sets of types. The syntax of kinds is given by:
χ ::= p | χ1 → χ2
p ::= ⊥ | T | κ | t(p1,...,pn) | p1 ∪ p2 | μκ.p
where p denotes kinds, and κ denotes kind variables. χ is used to denote arities for type variables ranging over type constructors. For example T denotes the arity of all types, while T → T denotes the arity of unary type constructors (for example, list). Kinds p denote refinement kinds, that refine the arity T denoting the set of all ground monotypes. The kind operator ∪ denotes union, and μ is the fixed point operator. Kinds intuitively form a lattice of sets of types, with ⊥ and T as the bottom and top of the lattice, respectively. Each type constructor t has an associated kind constructor t of the same arity; a kind expression t(p1,...,pn) denotes the set of types with outermost type constructor t applied to types τ1 ∈ p1, ..., τn ∈ pn. Then for example the kind int ∪ real denotes the set of types {int, real}, while the recursive kind μκ.int ∪ (κ list) denotes the infinite set of types {int, int list, int list list, ...}. We let τ <: p denote that τ is contained in the set of types denoted by p, and we refer to this as the containment relation. So for example int <: (int ∪ real). The subset relation between the interpretation of kinds induces a subkinding relationship p <: p' between kinds. For example we have
list(list(int)) <: μκ.int ∪ list(κ)
Inclusion between kinds p induces a subtype inclusion between the arities of type constructors. For example we have (χ1 → χ2) <: (χ1' → χ2') if χ1' <: χ1 and χ2 <: χ2'. The function arity constructor should not be confused with the kind constructor →, which describes the set of types with outermost type constructor →, and where τ1 → τ2 <: p1 → p2 if τ1 <: p1 and τ2 <: p2. Our type system involves constraints of the form τ <: χ (with in particular τ <: p denoting that τ is included in the kind p), χ1 <: χ2 (with in particular p1 <: p2 denoting that the set of types described by p1 is included in the corresponding set for p2), and σ1 <: σ2 (denoting that σ1 is a subtype of σ2). We use γ to denote both type variables α, β and kind variables κ. Furthermore we use ξ to denote both types σ and kinds p. Then ξ1 <: ξ2 stands generically for any of the above three forms of constraints. We have the following judgement forms:

Judgement        Meaning
Γ ⊢ χ1 = χ2      Kind equality
Γ ⊢ χ1 <: χ2     Kind containment
Γ ⊢ τ <: χ       Kind membership
Γ ⊢ σ1 <: σ2     Subtyping
where Γ is a context of constraints on type and kind variables:

Γ ::= {} | {α <: χ} | {κ <: χ} | Γ1 ∪ Γ2
I
I r ur2
This allows us to assume that kinds have
p::=}c I T J /llc.tl(~-T)U-..Utn(~-n) We furthermore require that kinds are discriminative: in a union kind, the outermost type constructors t 1,..., tn are required to be distinct.
The abstract syntax for the core language of XMLdyn is given by:

e ::= x | λx:σ.e | (e1 e2) | let x:σ = e1 in e2 | rec_{σ1→σ2} e | Λγ<:ξ.e | e[ξ]
The construct for dynamic type dispatch in XMLdyn is provided by the typerec construct:

e ::= ... | typerec f:σ of t1(ᾱ1) => e1 | ... | tk(ᾱk) => ek
The novelty of this construct is that it defines a function that recurses over a type rather than over a value. The fixed point of this function is given by the variable f that is introduced by the typerec. Such a function specifies a form of type-safe dynamic type dispatch, wherein a polymorphic function dispatches at run time on the basis of type arguments. The type rule for this construct is provided by the TYREC type rule:
σ = (∀α<:p.σ')    p = μκ.t1(p̄1) ∪ ... ∪ tk(p̄k)    ti ≠ tj if i ≠ j
Γ, ᾱi <: {p/κ}p̄i; Δ, f:σ ⊢ ei : {ti(ᾱi)/α}σ'   (for i = 1,...,k)
------------------------------------------------------------------ (TYREC)
Γ;Δ ⊢ (typerec f:σ of t1(ᾱ1) => e1 | ... | tk(ᾱk) => ek) : σ
The rule demonstrates that the typerec defines a polymorphic function of type ∀α <: p.σ'. The kind constraint on α restricts the domain of applicability of this function, to types for which the cases in dynamic type dispatch are defined. Since the typerec in general defines a recursive function, this domain kind constraint must also be recursive. Besides the obvious computation rules for the other constructs, the typerec has this computation rule:

e[ti(τ̄)] → {τ̄/ᾱi, e/f}ei    where e = (typerec f:σ of t1(ᾱ1) => e1 | ... | tk(ᾱk) => ek)

For example, the following defines a function that can be applied to integers, references (no matter their element type), lists of integers, lists of references, lists of lists of integers, and so on:
typerec f : (∀α <: (μκ.int ∪ ref(T) ∪ list(κ)). α → α) of
    int     => λx:int. intPlus(x,x)
  | ref(α)  => λx:ref(α). x
  | list(α) => λxs:list(α). map (f[α]) xs

The first clause defines a function of type int → int. The second clause defines a function of type ∀α <: T. ref(α) → ref(α), where the type variable α is unconstrained because no operations are defined on the element type of the reference cell. The third clause defines a function of type ∀α <: (μκ.int ∪ ref(T) ∪ list(κ)). list(α) → list(α). In this third clause, the fixed point f of the typerec is applied to the list element type, and hence the
element type α in this clause is constrained by the declared domain kind of f. The domain of the typerec is then int ∪ ref(T) ∪ list(μκ.int ∪ ref(T) ∪ list(κ)). By the fixed point unrolling rule for kinds (μκ.p = {(μκ.p)/κ}p), this is equal to μκ.int ∪ ref(T) ∪ list(κ).
Harper and Morrisett [20] and Morrisett [27] present a calculus, λML_i, that includes a typerec construct for dynamic type dispatch by recursing over type descriptions. The motivation for their framework is in using dynamic type dispatch for type-based compilation based on transformations of data representations. The most important difference in the approaches is our provision of "refinement kinds" that refine the structure of T, the set of all simple monotypes. Harper and Morrisett assume that all uses of dynamic type dispatch are total (defined for all types). Morrisett [27] describes an approach where a "characteristic function" F can be defined for the domain of an operation that uses dynamic type dispatch, using Typerec. F(τ) = void, the empty type, if τ contains any type constructor outside the domain of the operation, and F(τ) = τ otherwise. Beyond the fact that type inference is hard or impossible with the Typerec construct, there is also the problem that this approach does not prevent instantiation of an operator with a type outside its domain kind. For the example above, the function would be instantiated to type void → void when applied to the string type, under the approach described by Morrisett. This is not as precise as preventing the erroneous instantiation of the function in the first place.
Dubois et al. [11] have considered another approach to dynamic type dispatch, with different guarantees of type correctness relative to this and other work. Essentially they use dynamic type dispatch to provide unsafe operations such as a C-like printf function and variable-arity procedures in ML, as an alternative to Haskell-style parametric overloading. Their type system only distinguishes between "static" and "dynamic" type variables, the latter being variables which need to be instantiated with run-time type arguments. They also provide a static check for the exhaustiveness of the dynamic type dispatching code. However this check is not formalized in a type system. Furthermore it requires abstract interpretation of the entire program, and so is inapplicable for separate compilation of reusable software components; type checking of uses of overloaded operations is done at link-time.
Once we have bounded universal types ∀α <: p.σ, an obvious next step is to consider bounded existential types ∃α <: p.τ. These are useful in the sequel, so we add them here:
e ::= ... | pack_{∃α<:p.τ}(τ', e)
          | open e1 as pack_{∃α<:p.τ}(α, x) in e2
          | narrow_{∃α<:p.τ, ∃α<:p'.τ}(e)
τ ::= ... | ∃α<:p.τ
Kind inclusion induces a type widening rule for existentials: (∃α <: p.τ) <: (∃α <: p'.τ) if p <: p'. The narrow construct allows us to narrow a value of existential type to a more specific type. All type failure in our framework is isolated to the narrow construct.
279
3
Primitives for Marshalling
We now consider how to extend the language introduced in the previous section with primitives for marshalling and unmarshalling data. We name the language introduced in this section XML~yn. An obvious first choice for marshalling operations is: extern intern
: V~.(x*port -~ u n i t : V~.port -+ (x
There are some problems with this approach. These operations are not total (for example, in general it is not possible to marshall native-code functions in a heterogeneous environment), but this partiality is not captured by the above types. Invoking marshalling may therefore lead to a run-time type failure that should have been caught at compile time. One approach to this problem, in a language such as Haskell with parametric overloading, is to define a type class for marshalling:

class Extern(α) where
  extern : α * port → unit
  intern : port → α
This is similar to the approach taken with Java [17], where only objects that implement the Serializable interface can be marshalled. There are several advantages to this approach. Applications of extern to types for which no marshalling operation is available are detected statically in the Haskell type system. This framework allows the programmer to define her own marshalling operations for a type, as instances of this class. Finally the compiler can automatically generate specialized marshalling operations based on combining instances of these operations.
Rather than using type classes to restrict the domain of marshalling operations and dispatch the operations, we instead rely on the approach to dynamic type dispatch summarized in the previous section. The precise relationship between this approach and type classes is developed in another paper [14]. We choose this course because the framework of dynamic type dispatch is the basis for an approach to computing with dynamically typed values, that overcomes several problems with the traditional approach to computing with dynamically typed values in polymorphic languages. This is explained more fully by Duggan [12]. Nevertheless if the marshalling primitives are expressed using (an extended version of) type classes, it is possible to adapt our semantics based on dynamic type dispatch to this situation.
Our second reason for deviating from the type class approach is that the Extern class above does not ensure that an instance of the intern operation is in agreement with the corresponding extern operation on what should be the external representation "on the wire" of the type. This is also an issue with the Serializable interface in Java. Our approach is to define marshalling operations as pickling and extraction operations that map to and from an external representation type. Marshalling a data value consists of first transforming it to the corresponding external type, then using the built-in marshaller to pickle the value. Unmarshalling a data value consists of first unpickling a value from external storage, then using the extraction function to obtain a copy of the original pickled value. An attempt to express this in the parlance of type classes is given by:
class Pickle(α) where
  pickle  : α → pickleType(α)
  extract : pickleType(α) → α
extern and intern can then be represented as ordinary (non-overloaded) functions, of type ∀α.Pickle(α) => α * port → unit and ∀α.Pickle(α) => port → α. The intention is that pickleType be a function that maps from a type to the corresponding representation type. Each instance for the Pickle class should then specify a case in this type function for transforming the instance type. In general this type function must be applied recursively to the element types for a collection type. Essentially we need a construct analogous to the Typerec construct introduced by Harper and Morrisett [20] and Morrisett [27]. In general implementing type inference in the presence of Typerec appears difficult or impossible. However for the special case of user-definable marshalling, we can make use of a construct similar to Typerec internally, while providing type inference in the external language.
In order to introduce our semantics independent of any particular message-passing operations, we make pickles explicit in the language as dynamics [2, 1, 25, 24, 7]. A dynamic is a bundling of a value and a type descriptor for that value, into a single value of type dynamic. A typecase construct allows the tag in a dynamic to be examined, and the bundled value to be extracted in a type-safe way.
O = VK <: DOMAIN(FI).V(x<: ~r --+ Dynamic(K) H;F;A F dynamic : O r F p <: p' r I'- Dynamic (p) <: Dynamic (pl)
H; F;A F e :Dynamic (p) F F p' <: p II;F;A F narrowp,p,(e) :Dynamic (pr)
n;r;AFe~:(v~<:p.(t~x) (~r H;F;A F e2 : Dynamic(p)
n;r;A
F typecase(el,e 2) :T
(DINTRO) (DWID) (DNAR)
(DELIM)
~t2.(t I F-+t2) 6 rl H; F, z <: Z,~. <: Z, ~ <: ( r --1' T);A, ( f : V a <: K.a--r p(a)) F e l : t l (~;,) -~ t2(}(~,,)) ll; r, r <: T, ~'-.n<: R,~ <: (T --)' T);A, (g : V~ <: K.I~(o0 ~ ft.) F e2 : t2(~(~n)) "-'1"t! (~.) I-IU{tl ~-~t2};F;A F e : x H;F;A I- (defdynamic tl =~ t2 with (f.el,g.e2) in e) : (DDEF)
Fig. 1. Type Rules for Dynamics in XMLayn
A dynamic is essentially a data algebra inhabiting an existential type 3o~ <: T.tx. The refinement kinds introduced in the previous section motivate an obvious general-
281 ization of dynamics to safe dynamics, originally introduced by Duggan 12. A safe dynamic type Dynamic (p) exports a refinement kind revealing the structure of the encapsulated type; semantically it is a bounded existential type 3tx <: p.tx. The constructs for safe dynamics are given by: x :: . . . .
I Dynamic(p)
e:: . . . .
I dynamic
narrowp,p,(e)
I typecase(el,e2)
narrow is the only operation where type failure can arise; it allows us to refine the kind of a safe dynamic (for example, from T to i n t t_Jr e a l ) . In the expression t y p e c a s e ( e l , e 2 ) , e2 is a safe dynamic of type Dynamic(p), while el is a polymorphic function with type Voc <: p'.~. In the semantics of the t y p e c a s e , the dynamic value e2 is unbundled and the polymorphic function el applied to both the type and value components of the dynamic. Provided p <: p' (which can be checked statically), this application of dynamic type dispatch is guaranteed not to encounter run-time type failure. The type rules for safe dynamics are provided in Fig. 1. The dynamic operation creates a dynamic value. There are several possible typings for this operation:
dynamic dynamic dynamic dynamic
: : : :
V(x.~-+ Dynamic(T) V~ <: p.~--r Dynamic(T) V~ <: p.o~--+ Dynamic(p) V~c<: p.V~ <: ~.~ --+ Dynamic(~)
The latter three of these types define dynamic as a polymorphic operation whose argument type variable is constrained by a kind describing the set of types for which dynamic is defined. The second type is sufficient if we are not concerned with safe dynamics (only full dynamics). The fourth rule gives the most precise typing if we are interested in using safe dynamics. For example, suppose dynamic is only defined for integers and reals. Then the following type-checks:
(dynamic int
int
3) : Dynamic(intJstring)
This following point is worth emphasizing: Our semantics for user-definable marshalling can be adapted to work with any of the above possible typings for dynamic, and with or without safe dynamics. Although we use safe dynamics and the fourth typing for dynamics, our semantics for user-definable marshalling can be adapted fairly easily to the following types for the dynamic operations:
dynamic~ : ~ -~ dynamic typecasez : dynamic-~ Refinement kinds and safe dynamics ensure static checking of the uses of dynamic type dispatch. Using the latter form of dynamic operations amounts to foresaking this static checking in the programming language. However such static checking might still be used internally, in an analogous manner to the use of refinement types and soft types 16, 32.
282 The construct for attaching user-defined marshalling and unmarshalling operations to the operations for building pickles is given by: e :: . . . .
I defdynam•
I; 1 =r
with
(f.el,g.e2)
in e
The type rule for this construct is given by the DDEF rule in Fig. 1. This construct has all of the elements of user-defined marshalling operations that were discussed earlier in this section, t2 denotes the external representation for the type t l . A use of this construct must specify a clause in the definition of the external type representation function, p i c k l o T y p e . The following clause is added to the final definition of this function:
pickleType(tl (cq,..., ctn)) = t2(pickleType(oq ),. . . ,pickleType(otn) ) The static semantics of XML dyn uses type judgements of the form H;F;A ~- e : ~. The environment FI carries information about clauses in the type pickle function that have been contributed by uses of the defdynamic. I1 contains pairs of the form t l ~-+ 1;2, representing clauses in the type pickle function. H is used to define the domain of the dynamic operation, used in the DINTRO rule:
DOMAIN(H)=/aK.t I (~)U---m t I (~,) where II = {t{ ~+ t ~ l i = l , . . . , k } el is the pickling operation for the t l type constructor; el has type <: ~.tl (~-#) ~ t2 (pickleType(o~,)), where ~: is a local rigid "kind" variable in the scope of the definition of el and e2. e2 is the corresponding extraction operation, of type <: ~:.t2 (picklerype(ctn) ) --~ t 1(-~n). In defining el, the programmer in general will need to transform values of type cti to type pickleType(cti). This is provided by the local variable f introduced by the construct, bound to a function of type Vct<: K.ct ~ pickleType(o O. Note that within the definition of el, each ~i is constrained from above by ~; this restricts the application of f to values of type tx1,..., tx,. A similar explanation is given for the definition of e2. f (in the definition of el) and g (in the definition of e2) represent the final fixed points of the pickling and extraction functions that are built up using the clef dynamic construct. In typing el and e2, reference must be made to the final fixed point of the type representation function pickleType. This reference is represented by the type constructor variable 3 that is introduced locally by the d e f d y n a m i c construct. At the use site for the dynamic operation, the clauses contributed by the d e , d y n a m i c construct are joined to form the type representation function pickleType. The pickle and extraction operations are also formed by joining the instances contributed by def dynami c, to define the operations of type: Vtx <: p.ct --+ pickleType(o 0 Vtx <: p.pickleType(o 0 ~ o~ where p is the domain of tiynam• described in the next section.
These operations are used to build a pickle, as
283
4
Semantics for User-Definable Marshalling
In this section we consider the operational semantics for user-definable marshalling. The language X M L ~ yn introduced in the previous section was an extension of X M L dyn. In this section we introduce another extension of X M L dyn, and then map the extensions dyn 9 o f X M L n into this latter language. We name the new language XML~ yn. This language adds one new construct to X.M~ dyn, a type-level T y p e r e c construct: "c :: . . . .
I T y p e r e c p ~ z tl(~)
::::~'171 I''" I tk(~k'k) ::::~'lTk
This is essentially the T y p e r e c introduced by Harper and Morrisett 20. We use a slightly simpler form of it, supporting iteration rather than recursion (this is sufficient for our purposes). We keep X M L dyn and XMI_~yn separate because type inference is possible with an implicitly-typed version of X M L dyn, whereas type inference appears difficult or impossible with the T y p e r e c . X M I ~ yn is only intended to be used as an "internal" language, into which source programs (in X M L ~ yn) are translated. The type rule for the T y p e r e c is given by: p=
K.tl (FT) u - . . u tk(
)
r , ~ i <: {p/lc}pi F "ci <: % for i = 1,... ,k
(TYREc)
The type-level computation rule for the T y p e r e c is given by: e(ti(e,,...,en,))
, {e(e~)l~,...,e(eo,)l~,~,}x,
where: "~ = ( T y p e r e c p ~ z tl(O~l) m 1:11"'" I tk(O~knk) =:~ 'Ok) We can define the notions of canonical f o r m s for terms and types (v and a), respectively), and evaluation contexts for terms and types (E and T , respectively), in a fairly standard manner. A term e is f a i l e d if e is not a (term) value, and e - Ee' where e' - narro~rp,p, (pack(Y, v)) for some v and ~, and e' is not a redex. A term e is f a u l t y if e is not a value, e is not failed, and e = Ee' or e = Ex where e', x are not redices. A type x is f a u l t y if x is not a (type) value, and x = Tx' or x -- Te where e, x' are not redices. A term or type is closed if it has no free variables, e ~" denotes that the evaluation of e loops infinitely i.e. there is an infinite sequence e ----+ .-~ ei ~ ei+l > "" ".
Theorem 1 (Semantic Soundness). 1. I f F ; A F e : x, then e ~, o r e > e' where e' is f a i l e d or e' is some value v. 2. I f F F x <: X then x ~, or x ----> "of o r some value ~).
To define the translation from X M L ~ n to XMI_~yn, we start with the following translation of types in XML~Yn:
284 r~; ~A u ENV(I I) I-- e : ('fir <: p.V(z <: ~'.~ -1. Dyn~tmt c (K')
(DINTRO)
where: xpkx = PICKLE(H) and p = DOMAIN(H) opkl =Vet <: p.~ ---~X~kl(~t) and 6,xt = Vet <: P.'~pkl(~) --",~t
epk~ = typ,r,c f : op~ o( t (~) ~ (AI P ~P~ f ~) I... I t~(n) ~ (A~ P x~w1f ~) eoxt = typerec f: O.xt of t I (~) ~ (gt P Xpklg ~1) l"" It~(~) =~ (gt~ O Xpkl g ~) e = AK.Ac(~it.Lx.pack((~it,pack(Zpk1(c(ult), (epkl Jet.It (x),e.xtc(~it)))
q; A~ UENV(II)F el : Vc( <: p.o' q; ~A UENV(I1)F e2 : Dynamic (p) FJ;HueNv(n)
F ~ : ~q
(DELIM)
where: e = (open e2
pack(ffgkl,pack(~,(x, extract))) in el lit} (extract x))
as
i( <: T , ~
<: i(, l} <: T -+ T } ; IA, ( f : VGI <: l(:.a e 13(~()) U ENV(I1) l-
~r, ic <: T , ~
<: K, 13<: m -4, T } ; ~A, (g: V~ <: ic.l}(~ ) e ot) U ENV(I1) I-
r,
I;AUENV(IIU{tl ~ t2} ) F e: 1: I q ; IAl ~ e' : q (DDEF) where: e ' = l e t ft~ =
AK.A~.kf.A~,.el in l o t gtt = AIcA~.~.g.AEffn.e2 in e Fig. 2. Translation of Dynamics in XMLd
t(z,,...,z,))
= t(H,...,H)
~,~ ~ (*2 = ~ d ~ ~2lI Ivy<: v.~ = v r < : vJ.E~ The last case in this definition is the real point of this translation, i.e. dynamic types are translated as existential types. The data algebra for a dynamic now has two witness types: ~ is the external type o f the value that has been bundled in a dynamic, with the external kind constraint p which reveals some of the structure of the type. 13 is the internal pickle type, encapsulated by the existential type quantifier, and with no structure revealed by the kind witness constraint. The dynamic contains two values: the pickled copy of the value that has been bundled in the dynamic, and an extraction operation for converting from the pickle value back to the original value. This extraction operation is bundled in the dynamic when it is created, and is invoked when the dynamic
285 is unbundled. As such we refer to these as self-extracting dynamics. The translation of kinds is simply the identity: p = p. We also have:
q
= (Y <: Iv I (Y <: v) E r}
,4
= {(x: ff)
(x: O) E A}
The interpretation of X1VILdyn is Xl~'lqyn is defined by induction on type environ. XML dyn ments in the former. A type derivation for the judgement H;F;A F e : ff in n is used to construct a program e' in XMI-~yn, with correctness given by the following: 9
dyn
Theorem 2. IfH;F;A F- e : t~ m XML n , with e' the program in XlVIqyn constructed
based on this type derivation, then 11; A U ENV(I-I) F e' : tY. ENV(H) denotes the types of the pickling and extraction function fragments defined by uses of degdynam• c. This metafunction is defined by:
PTYPE(t I, t 2 ) = VK <: T.VI} <: T --~ T.(Vo~ <: I<.e --~ 13(1~)) --~ (V~-~n<: ~.tl (~-~n) --~ t2(l~(e,,)))
ETYPE(tl,t2) = Vl~ <: T.V~ <: T -e T.(Vet <: K.~(0~) -~ Or) --> (V-~n <: K.t2(~(0t.)) -4 t l ( ~ ) )
ENV(H) = {ft, : PTYPE(t!,t2),g~, :ETYPE(tl,t2) I (tl ~ t 2 ) E I-l} Each use of defdynam• defines two functions, ftl and gtl, that represent clauses in the definition of the pickling and extraction functions. We maintain these functions in the environment, extracting them from the environment when they are needed. PTYPE(t 1, t2) denotes the type of a clause for the pickling function, for the case when values of type tl(~ ) are pickled to type pickleType(tl(~)) = t2(pickleType(x)). The function ftl abstracts over the domain kind K of the final pickling function, the final definition 13of the pickle type function, and the fixed point of the pickling function itself9 A similar explanation can be given for ETYPE(t 1, t2). The metafunction ENV(I-I) generates a type environment for those instances of these functions that have been defined9 Fig. 2 gives the cases for the translation of the dynaraic constructs. The case for the defdynam• DDEF, is fairly uninteresting9 This simply builds the functions ftl and gt2. The main work is done by uses of dynamic, given by the DINTRO rule9 At the use site for dynamic, the final definition of the pickle type function is constructed, defined by:
PICKLE(H) =
... I t (e) where II = {t~ ~ t& l i = 1,...,k} Typerec
t l ( a ) =#
The pickling function is constructed by using the t y p e r e c construct to assemble the various clauses defined using defdynamic. Each such clause, say of type PTYPE(tl,t2), is applied to p = DOMAIN(H) and Xpk~ = PICKLE(H), giving a function of type
(A, P X:pkl) E (V~ <: ID.~~ 1:pkl(~)) ~ (V-~-nn<: Pnn.tl(~n) ~ t2(~pkl(O(n)))
286 Let f be the fixed point of the function defined by the t y p e r e c , then we have:
(AI P "Cpklf ) E (V'&'~n<: ~nn.tl(~nn) % t2('~pkl(13~n))) This clause of the t y p e r e c is defined to be:
(tl(~nn) ~ ft I P "~pklf ~---~n) E (gl(~nn) ~ t2(~pkl(O~n))) By the definition of Xpkl, this latter type is equal to (t 1(~--~) --~ Xvkl(t 1(~nn))). Therefore by the TYREC rule, the function epk~ has type Vcc <: p.cc --r Xvkl(CO). The translation of dynamic is a function that takes a type 0~it and a value x of type co,it. The pickling of this value consists of creating the pickled value (epkZ C~it X), of type Xpkl(O~it) (the external representation type). The extraction function eo.t, of type Vc~ <: p.Xpkl(C~) ~ Ct is constructed in a manner similar to epkl. The expression (e,xt oc~t) gives the extraction function specialized to the type ofx. The resulting pair of type
T~pk:l.(l~it) * (T'pk.(lff~it)~(~it) denoting the pickled value, and an operation for extracting the original value from this pickle value, is then encapsulated in the data algebra for the dynamic that is constructed, with existential type Dynamic(p) : qcc <: p.313 <: T.13* (13-+ oc).We call this value a self-extracting dynamic. The translation of the t y p e c a s e is reasonably obvious: the data algebra for the dynamic is opened, the bundled extraction function is applied to the pickle value, and the typecase function is then applied to the resulting extracted value.
5
An Alternative Semantics for User-Definable Marshalling
F;A F dynamic :A(dynamlc) A(dynamic) = VI~ <: (#uK.p).Voc<: It.Co~ Dynamlc(l~)
(DINTRO)
tl ~ tc(p)
r,l~ <: T , ~ <: 1~,1 <: (T -, T)};A, (./: Vot <: l~.ft --~ I~(Ct)) F el : t l ( ~ ) --r t:2(~(ct.)) I',1r <: , ~ <: Ir <: ('I -4 T);A,(R : Vet <: K.p(~) ~ ct) F e2 : t2(P(ct,)) -4 ti(~;;) r;a, (dynamic : (Vt: <: (plc.pU t l (~)).VCt <: K.0t--~ Dynamic(1r F e:x U;A I- (dofdynamic t I =r t 2 with (f.el,g.e2) in e) : (DDEF)
Fig. 3. Type Rules for Dynamics in XMLdY_ n
The approach to user-defined marshalling provided in the previous two sections was facilitated by the FI environment in the static semantics for XML dyn. The disadvantage of this environment in the semantics is that the clauses of the external representation
287 type (the p i c k l e T y p e function) are exposed in the environment. In this section we consider the repercussions of abstracting over this external representation type function. In this section we demonstrate how this may be done. With this alternative approach, the semantics no longer requires the H environment recording external representations. Instead the type of dynamic is recorded as a type of the form VK <: p.Vot <: r . ~ -+ Dynamic (K) in the type environment, with the domain p providing the only information about the abstract type representation function. This approach admittedly brings with it some complexity in the internal language. In particular our internal language requires both coproducts and general recursion at the type level in order to type our semantics. To ensure equational consistency, we require that type functions are strict 6. Fig. 3 gives the type rules of XML dyn that are modified with this alternative ap9 9 dyn proach. We name this varmnon XMLn_. t c denotes the outermost type constructors of a kind: t c(t_l (P-T)O.--tO t__m(~mm))= {t 1,..., tin}. We concentrate in the sequel on the translation semantics for this language. The basis for our semantics is a new language, XML A. XML A is formed by taking XML dyn as defined in Sect. 2, omitting the t y p e r o c construct, and adding the following constructs: e:: . . . . a:: ....
abortx t(~n:pnn)==~e A~x<:p.x
el@e2
cl(e)
tyrec(va<:p.a ) e
The type rules for these constructs are provided in Sect9 5. The construct t(cxl : Pl,... ,cx, : p,) ~ e is used to define the individual clauses in the definition of a typecase. The construct el 9 e2 is used to combine these clauses. The resulting combination is a polymorphic function with non-T domain kind, that uses run-time type discrimination with respect to its single type argument, tyrec(va<:p.a) e denotes the fixed point of a recursively defined polymorphic function. This fixed point operator is necessary because the typecase defines a function that computes by recursing over its type argument. The special type Ac~ <: p.x is used to type the composition of a collection of clauses that make up a typecase. The operation cl(e) closes up a typecase definition to an ordinary polymorphic function9 The fixed point operator for kinds/.nc.p is used to define recursive domain kinds, while the fixed point operator for polymorphic functions t y r e c a e is used to define recursive dynamic type dispatch. To see how the t y p e r e c can be translated into this language, consider that the t y p e r e c construct in XML dyn has the form: t y p e r e c f : o of t l (~i) :=~ el I... I tk(~-k) =r ek where o = V~x <: p.o ~ and p = M1r O~_k(P'k). Assuming e~ is the translation of ei from XML~Y_n into XML A, then the translation of the clauses of the t y p e r e c is given by:
cl((tl(~i- : {O/~:}Pl) ~ e~) ~ . . . @ ('ck(~-~ : {P/K}Pk) ===>e~)) 9 (V(x <: (t_l({p/~}pl) U... U~({p/K}pk)).E(r') = (V(~ <: p.o') Abstracting over the fixed point, and then using the fixed point operator for polymorphic functions, gives the translation of the t y p e r e c : t y r e c ( ~ , f , cZ((1B1(~~ : {p/K:}pl) ===>el) ~ . . . (D (tk(~-# : {P/~}Pk) ===>e~)))
288 XML dyn incorporates a monolithic t y p e r e c construct that is the basis for defining dynamic type dispatch. All clauses in a t y p e r e c are defined at once. In XML 6, by contrast, the clauses in a function using dynamic type dispatch are defined as independent program fragments, of the form t (~ : ~) ===~e. The @ operation combines these clauses, and the cl(e) operation forms this collection of clauses into a polymorphic function. The reason for taking this approach is that the clauses of the pickling and extraction operations in the implementation of the dynamic operation are contributed by independent uses of the d e f d y n a m i c construct. There needs to be some way of combining these clauses at the use sites for the dynamS.c operation. The approach pursued in the previous section was to carry the individual clauses as polymorphic functions in the environment, and then use the t y p e r e c construct to combine them at the use site. The problem with this approach, with the static semantics of XML~IY_ n, is that it does not help us with the combination of the clauses of the pickle type function, that are also contributed by uses of the d e , d y n a m i c construct. The approach we adopt is to extend the approach for dynamic type dispatch in the term language of XML ~, to the type level. In other words, we now allow type functions at the type level, which discriminate based on their type argument, giving a form of z y p e c a s e for types. We also add a recursion operator at the type level to define the fixed points of such recursive type functions. We extend the syntax of types and kinds in XML 6 with:
(el,e2) I hi(e) I ~2(e)
e:: . . . . Xl~'C2
abort
a::=...
I o*cr
~:: . . . .
I ,r
cl('t)
I tyreczl,X2('c )
I 3~<:p.~
Figure 6 in App. A provides the kind rules for functions, recursion and dynamic type dispatch at the type level. These essentially repeat the corresponding type rules for programs. The TYABS and TYCASE rules include the proviso that the variables introduced by L-abstraction and type-casing occur in the body of the type operator (so type operators are a variation of the ~,l-calculus). We provide the kind rules as congruence rules for the equality relation on type operators, omitting the obvious reflexivity, symmetry and transitivity rules. Figure 6 in App. A provides the conversion rules for type operators. The TYCASECOP rule characterizes union kinds as coproducts, and allows reasoning-by-cases when verifying the equality of type operators defined over types of union kind. The TYFIXBETA rule allows folding and unfolding of fixed points at the type level. These rules are necessary in Theorem 3 when verifying that the translation of programs in XMLdY_n preserves well-typedness in XML A. The El-calculus restriction in the TYABS and TYCASE rules is necessary in order to preserve the equational consistency of the type system 6. Proposition 1. The equality theory for types in XML a is consistent. PROOF SKETCH: We give an interpretation for types using Scott domains, where union kinds are interpreted as separated sums and type operators are interpreted as strict continuous maps 6.
289 The computation rules for terms include the computation rules: tyrec(vc~<:p.(~) e
c1(....
(t(ctl :
: pn)
> e (Act <: p.(tyrec(vc(<:p.o) e) ct)
el) e . . . )
>
The reduction rules for types are obtained by orienting the TYFUNBETA, TYCASEBETA and TYFIXBETA rules in Fig. 6 from left to right as rewrite rules. We do not include the extensionality rules TYFUNETA and TYCASECOP in this rewrite system. In recent work on rewrite systems, these extensionality rules are oriented from right to left, as expansion rules rather than as reduction rules 9, 10. Expansion rules, and their complications,do not appear appropriate in a run-time evaluator for a programming language. The translation of types ~, kinds p and kind environments F of XMLdY.n into XML A is similar to the translation of XML dyn into XML dyn. For type environments we have: .4
= {(x: ~0) I (x: O) E A, x # dynamic}
u{(fdyn : (Vl(.~ctpkl< : Ppkl.%in* ~out)) (dynamic :(~dyn)9 A} where Odyn : V~: < : (M~.p).Vct < : ~.ct - ~ O y n a m i c ( ~ ) , and
Ppkl = (1r --~ T) --> Act <: p.T 1:in = V~ < : K --> T.(VCt < : 1C.ct --). ~(ct)) --).
(act <: p.ct --). tycase(O(i,kl ~)(ct)) 1:o,, = V~ < : ~ --)' T.(Vct < : 1c.~(ct) --). ct) --).
(act <:
IS)(ct)
ct))
Essentially the dynamic operation consists of three parts: a type function ctvklmapping from external types to internal pickle types, a pickling function pickle mapping from values of an external type to the corresponding internal pickle type, and an extraction function extract mapping from values of an internal pickle type to values of the corresponding external type. pickle constructs a pickled value, while extract recovers the original value from a pickled value. Therefore the dynamic operation is represented in the environment as a data algebra inhabiting an existential type. This existential type is parameterized by 1c, the fixed point of the kind constraining the domain of the pickling operation. The pickle type function is parameterized by its fixed point (of kind ~c-e T). The pickling functions are parameterized by the fixed point of the pickle type function, and the fixed point of the pickling operation; similarly for the extraction operation. These fixed points are left open to further extension by the defdynami c construct. Theorem 3. If F;A t- e : o, then F; A t- e' : o, extracted using the algorithm in Fig. 4.
where e' is the translated program
290 A(dynamic) = Vif <: p.Vo~<: If.Ix ~ Dynamic(if) "Cpkl
TYCLOS(o~,kx) e.~t = CLOS(extract Xpkx) o~it
=
e~k~ = CLOS(pickle XpkX) ~ i t
e~n = ~,.ck(0~,~,.ck(x~kx(a.~),(e~l(~),e..~))) r; A 1 I- e' : A(dynamic) (DINTRO) where;
e'--'-(AIf.Aff~it.~.opon
fdyn~UIf.p as pack(ff~kl,(pickle, extract)) in edyn) H ; | A ~- e, :Va <: 0.Ia' F;A I-"e2 : Dynnmic (p)
~-(open e2 as paek(o~kl,paek(~,(x, extract))) in el a (extract x)) : r i (DELIM)
';A
A(dynamic) = Vif <: (/Jif.p).V~ <: If,~ --~ Dynamic(if) q , If <: T,~n <: If, ~ <: T -r T}; ,4, ( f : Va <: If.a --~ ~(a)) ~- el : t l (~nn) -~ t2(~(a)) q , If <: T, ~nn <: If, ~ <: T --~ T}; ,4, (g : Vet <: If.~(a) --r a) ~- e2 : g2(~(ct)) -~ tl (~nn) epkl = A~.).f.(pickle ~ f) ~ (g! (~n) ~ el) e.xt = A~.~g.(extract ~ g) ~ (t2(~nn) ==r e2) %kl = Xl3.Ct(~) = . clC~,k113)(t(~))) ~ (t, C~) = ~ t~Cl~Ca))) edr~ = p~ck(xpkl, (epkx,e..~)) |1"1; ,4, (dynnmic : (VIf <: (~u~c,pU t i (Y)).VIx <: If.or --~ Dynamic(If))) t- e : r l r ; i a i F e' : |~ (DDEF) where:
e ' - ( l e t fd~= (Ale.open fdynIf as pack(ff~kl,(pickle, extract)) in edyn) in e) 9
.
9
dyn
Fig. 4. Translation of Dynamics m XMLn_
Proof By induction on the translation of the program e, which in turn is defined by induction on type derivations in XML~Y_n. The cases for DINTRO and DDEF are nontrivial, because the type pickle function is encapsulated in the existential type for the d y n a m i c implementation. We therefore rely on certain equality rules, the TYCASECOP and TYFIXBETA rules, in order to reason about the encapsulated type transformation. We consider the case for DINTRO first of all. The translation for the d y n a m i c operation constructs a polymorphic function from the data algebra for d y n a m i c in the environment. This function abstracts over K and Ix~it, the kind and type arguments in any application of dynamic. The following metafunctions are used to build the body of the polymorphic function:
TYCLOS('c) = t y r e e (kix.cl('c Ix)) CLOS(e) = t y r e c ( k f . c l ( e f ) )
291 The witness type 0q,kl in the data algebra for the implementation of dynamics has kind PpkZ = (0 C --~ T) ~ A(X <: p.T). Instantiating ~c with the domain kind MK.p, and closing the (type-level) typecase in this type function, produces a functional ~.Cl(0~pkl 0{) of type
((MI~.p) -+ T) ~ (0ug:.p) ~ T) and the fixed point operator for types, applied to this, gives the transformation function Xpkl that maps external types (in the domain of the dynamic operation) to internal pickle types, o~it is the type of the value being bundled (a use-site type argument to dynamic). Then Xpkl(0~it) denotes the type of the argument to dynamic once it is pickled. The data algebra also contains a pickling operation pickle of type xi~. This polymorphic function is parameterized by the fixed point ~ of the witness type function (Xpkl. Instantiating K with Mr.p, and instantiating ~ with ~kl, and closing up the type case for the body of the pickling function, produces a functional )~f.cl(pickle ~kl f ) of type:
(v~ <: (~K.p).~ ~ xp~(~)) ~ ( w <: (~.p).~ ~ ~i(%~ xp~)(~)) Using the TYFIXBETA rule we have the equivalence:
(CI(%kZ (tyrec(Zl3.cl(~k~ 13))))) < > (tyrec(Zl3.c1(%kz 13))) Then using the congruence rules in ~,f.cl(pickle XpkZ f ) has type:
Fig. 6 in App. A,
we have that
(vo~ <: (n~.p).~ ~ "r.r,kl (oc)) -+ (voc <: (~.p).~ ~ Xpk~.(O0) Taking the fixed point of this functional gives a function epkl with type:
V(~ <: 0L/~-p).0{~ Xpkl(Cf,) which is the expected type of the pickling operation at the use sites for the dynamic operation. The data algebra also contains an extraction operation extract of type Xo~t. Applying the same specializations as with the pickling operation, we obtain a function e,~t of type: W <: (~.p).Xpk~(~) ~ which is the expected type of the extraction operation at the use sites for the dynamic operation. We now consider the case for the DDEF rule. We need to verify that the extension of a dynamic implementation is well-typed. In the context of the definition of the extended type function, we have
0~pkZ<: ((K -+ T) ~ A~ <: p.T), 13<: K ~ T Therefore we have:
(t(N) ~
c1((~pkl l~)(t(~))) <: (A0( <: p.T)
292
(tl (~nn) =
I;1(3(00)) <: (Aot <: t l ( ~ ) . T )
Therefore using TYJOIN and TYABS we have:
Xpkl <: ((K: --> T) --> Ao~ <: p U t(~).T) Now consider the well-typedness of the definition of the extended pickling function. The environment contains the constraints and types: %kl <: ((~ --> T) -+ A a <: p . T ) , 13 <: K -+ T
pickle
: xin, f : (Vtz <: 1~.~ --+ 3(t~)
Then we have
(pickle 13
f ) : (At~ <: p.tx --+ e l ( % k l 3)(iX))
We have cl(Xpkz 13) <: (p --+ T) where p : mr(K). By the TYFUNBETA and TYCASEBETA rules, we have:
cl(1;pkz 13)(t(~)) < > c1(%k1 13)(t(~)) <: T for all t 9 {~}. By congruence we have:
cl(t(~) ~
cl(1;pkl I~)(t(~)))(O 0 (
> c l ( t ( ~ ) ===~ cl(l:~k 1 13)(t(~)))(O 0
for ot <: p. Therefore by TYCASECOP we have:
cl(~l
I~)(~) < > ~ l ( ~
I~)(~)
Thus we have
(pickle 13
f ) : A(z <: p.o~ --->cl(xpkz 3)((z)
We also have the series of type judgements: el : tl(~-'-n) --->t2(~(~n))
el : t ~ ( ~ ) ~ c1(%1 I3)(t1(~)) (t~(~) ~
e0 : ~Xa <: t l ( ~ ) . a - + cl(zpkl
~)(a)
where the step from the first to the second judgement follows from the definition of Xpk~, and using TYCASEBETA. Then combining these with TMJOIN, we have:
((pickle $
f) G (tl(~)
---4- el)) :
The new pickling function is then formed by abstracting over f and ~:. The welltypedness of the extension of the extraction operation is similar.
293 The major point of difference between XML a and the framework of Harper and Morrisett 20 is the formulation of the ~,-calculus at the type level. The latter use a simple typed ~,-calculus extended with bounded recursion over the free algebra generated by the type constructors. They are able to verify strong normalization and confluence for their calculus, and are then able to perform type-checking on the intermediate language of their compiler. Strong normalization also provides a definitional notion of equality for their calculus. By contrast, and as already noted, the abstraction of the pickle type function in our semantics means that we require a stronger equality theory for types in order to type our semantics: specifically, the TYFIXBETA rule for the DINTRO rule, and the TYCASECOP rule for the DDEF rule. Because our type calculus combines fixed points and coproducts, we add the ~d-restriction (requiring that all type operators are strict) to ensure equational consistency. Without the fixed point operator, Dougherty 10 has verified strong normalization for a ~,-calculus similar to our calculus of types, with the TYCASECOP and TYFUNETA rules oriented as expansion rules. Dougherty also verifies confluence for base types, and these properties still hold with bounded recursion. These results suggest that, if desired, we can at least expect to have a type-checking algorithm for our internal language that is practically useful, even if theoretically incomplete. The main problem is with the unrolling rule for fixed points, TYFIXBETA. However there are two places in the translation of programs into XML dyn where we expect to use this unrolling rule: first, where a type transformation is applied to a type (but all type transformations are restricted by the d e f d y n a m i c to be primitive recursive); and secondly in the typing of the translation of the dynamic construct (but in this case the recursion in the type is regular, and there are by now well-known algorithms for checking the equivalence of regular recursive types 3).
6
Related Work
Dynamics have received some attention in the literature 2, 24, 25, 1. More recently Duggan 12 has considered an approach to computing with dynamics, based on dynamic type dispatch, that overcomes some problems with the traditional t y p e c a s e construct in polymorphic languages. Duggan 12 introduces safe dynamics, which are superficially related to partial dynamics 7, although in fact quite different. The difference is that safe dynamics incorporate recursion at the kind level, not the type level (as with some extensions of partial dynamics). As an example of the difference, the following can be type-checked with recursion at the type level: f u n Ap ( f , x ) = typecase ( f , x) of ((~--+~)(f),~(x)) ~ d y n a m i c ( f x) However type-checking is undecidable with safe dynamics extended with this form of multi-parameter typecase patterns 15. It remains to be seen if a restricted version of these patterns could be added to safe dynamics. Herlihy and Liskov 21 propose an approach to adding user-definable marshalling code to a distributed programming language. In this case, the language is CLU, and
294 the approach is based on defining overloaded marshall and unmarshall operations that are exported by a CLU cluster. For parameterized types, Herlihy and Liskov use Adastyle constrained genericity to parameterize the overloaded marshalling operations by marshalling operations for the element types. Herlihy and Liskov do not consider a formal semantics for their approach. They also do not consider type inference. Although we have omitted details from the current presentation for lack of space, our semantics can be used in an implicitly typed language with type inference 13. There is a relationship between our type system and parametric overloading, explored in more detail in other papers 12, 14. Essentially our approach is based on a "closed world assumption," as opposed to the "open world assumption" underlying parametric overloading. This allows us to make direct use of dynamic type dispatch for our semantics, whereas the semantics of parametric overloading is based on call-site closure construction 14. Knabe 22 has considered the problem of preventing the marshalling of nativecode-implemented functions, in the Facile distributed programming environment 31. Knabe introduces x f u n and x f n constructs, analogous to the f u n and f n constructs in Standard ML, for building transmissible closures (referred to by him as "potentially transmissible functions"). Potentially transmissible functions require two representations, one as a transmissible representation, the other as machine code. When a transmissible function is received at a remote site, it is typed as an ordinary function, with the run-time system implicitly compiling it to native code as it is used. However Knabe does not formalize potential transmissibility in the type system. Instead the compiler attempts to marshall potentially transmissible functions at compile-time, with the marshaller raising an exception if there is an attempt to marshal a function without a transmissible representation 22, Page 62. Our type system can be extended to distinguish transmissible closures from other forms of closures: dynami e can be given the domain kind p~.(... U (T --~ T) U...) where values of type xl 2+ x2 are transmissible closures. Ohori and Kato 29 give a semantics for marshalling in polymorphic languages. In their approach functions are not transmitted, instead proxy functions are transmitted that invoke the function at the sender site when invoked on the receiver site. Ohori and Kato do not consider the issue of providing user-specified marshalling code in polymorphic languages.
7
Conclusions
We have considered a semantics for dynamic typing for distributed programming in polymorphic languages, that allows the addition of user-defined marshalling operations. The semantics of this are expressed in an internal language with recursion and dynamic type dispatch at both the term and type levels. User-defined marshalling amounts to reifying dynamic type dispatch to the programmer level in an ML-like language. In practice there is an obvious inefficiency in having the marshaller make a copy of a data structure before transmitting it. This is somewhat orthogonal to the concern of the current paper. It should be possible to apply "deforestation" optimizations to build transformed data structures "on the wire" This remains an important topic for further work.
Bibliography 1
2
3 4
5 6 7 8 9
10
11
12
13 14 15 16
Martin Abadi, Luca Cardeli, Benjamin Pierce, and Didier Remy. Dynamic typing in polymorphic languages. In Peter Lee, editor, Proceedings of the ACM SIGPLAN Workshop on ML and its Applications, San Francisco, California, 1992. CarnegieMellon University Technical Report CMU-CS-93-105. Martin Abadi, Luca Cardelli, Benjamin Pierce, and Gordon Plotkin. Dynamic typing in a statically typed language. ACM Transactions on Programming Languages and Systems, 13(2):237-268, 1991. Roberto Amadio and Luca Cardelli. Subtyping recursive types. ACM Transactions on Programming Languages and Systems, 15(4):575--631, 1993. J. Bacon and K. G. Hamilton. Distributed computing with RPC: The Cambridge approach. In Proceedings of lFIP Conference on Distributed Computing, Amsterdam, 1987. North-Holland. A. Birrell, G. Nelson, S. Owicki, and E. Wobber. Network objects. Technical report, DEC Systems Research Center, Palo Alto, California, 1993. Val Breazu-Tannen, Thierry Coquand, Carl Gunter, and Andre Scedrov. Inheritance as implicit coercion. Information and Computation, 93(1):172-221, 1991. Peter Buneman and Atsushi Ohori. Polymorphism and type inference in database programming. ACM Transactions on Database Systems, 1996. To appear. Luca Cardelli. Typeful programming. Technical report, DEC Systems Research Center, 1989. Roberto Di Cosmo and Delia Kesner. A confluent reduction for the extensional typed ~,-calculus with pairs, sums, recursion and terminal object. In Proceedings of the International Conference on Automata, Languages and Programming, volume 700 of Lecture Notes in Computer Science, pages 645--656. Springer-Verlag, 1993. Daniel Dougherty. Some lambda calculi with categorical sums and products. In Rewriting Techniques and Applications, Lecture Notes in Computer Science. Springer-Verlag, 1993. Catherine Dubois, Francois Rouaix, and Pierre Weis. Extensional polymorphism. In Proceedings of ACM Symposium on Principles of Programming Languages, San Francisco, California, 1995. ACM Press. Dominic Duggan. Dynamic typing for distributed programming in polymorphic languages. To appear in Transactions on Programming Languages and Systems, 1998. Dominic Duggan. Finite subtype inference with explicit polymorphism. Submitted for publication, 1998. Dominic Duggan and John Ophel. Scoped parametric overloading. Submitted for publication, 1997. Dominic Duggan and John Ophel. Type-checking multi-parameter type classes. Submitted for publication, 1997. Tim Freeman and Frank Pfenning. Refinement types for ML. In Proceedings of A CM S1GPLAN Conference on Programming Language Design and Implementation, pages 268-277. ACM Press, 1991.
296 James Gosling, Bill Joy, and Guy Steele. The Java Language Specification. The Java Series. Addison-Wesley, 1997. 18 Graham Hamilton, Michael L. Powell, and James J. Mitchell. Subcontract: A flexible base for distributed programming. In Symposium on Operating Systems Principles, pages 69-79. ACM Press, 1993. 19 Robert Harper and John C. Mitchell. On the type structure of Standard ML. ACM Transactions on Programming Languages and Systems, 15(2):211-252, 1993. 20 Robert Harper and Gregory Morrisett. Compiling polymorphism using intensional type analysis. In Proceedings of ACM Symposium on Principles of Programming Languages, San Francisco, California, 1995. ACM Press. 21 Maurice Herlihy and Barbara Liskov. A value transmission method for abstract data types. ACM Transactions on Programming Languages and Systems, 4(4): 527-551, 1982. 22 Frederick Knabe. Language Support for Mobile Agents. PhD thesis, Carnegie Mellon University, 1995. 23 Clifford Krumvieda. Distributed ML: Abstractionfor Efficient and Fault-Tolerant Programming. PhD thesis, Cornell University, Ithaca, New York, 1993. 24 Xavier Leroy and Michel Mauny. Dynamics in ML. Journal of Functional Programming, 3(4):431-463, 1993. 25 Xavier Leroy and Pierre Weiss. Dynamics in ML. In Proceedings of ACM Symposium on Functional Programming and Computer Architecture, 1991. 26 Barbara Liskov. Distributed programming in Argus. Communications of the ACM, 31(3), 1988. 27 J. Gregory Morrisett. Compiling With Types. PhD thesis, Carnegie-Mellon University, 1995. 28 Greg Nelson. Systems Programming in Modula-3. Prentice-Hall Series in Innovative Technology. Prentice-Hall, 1991. 29 Atsushi Ohori and Kazuhiko Kato. Semantics for communication primitives in a polymorphic language. In Proceedings of ACM Symposium on Principles of Programming Languages, pages 99-112. ACM Press, 1993. 30 David Tarditi, Greg Morrisett, Perry Cheng, Christopher Stone, Robert Harper, and Peter Lee. TIL: A type-directed optimizing compiler for ML. In Proceedings of A CM SIGPLAN Conference on Programming Language Design and Implementation, Philadelphia, Pennsylvania, 1996. ACM Press. 31 Bent Thomsen, Lone Leth, Sanjiva Prasad, Tsung-Min Kuo, Andre Kramer, Fritz Knabe, and Alessandro Giacalone. Facile Antigua release programming guide. Technical Report ECRC-93-20, European Computer-Industry Research Centre, Munich, Germany, 1993. 32 Andrew Wright and Robert Cartwright. A practical soft type system for Scheme. In Proceedings of ACM Symposium on Lisp and Functional Programming, pages 250-262, Orland, Florida, 1994. ACM Press. 17
297
A
K i n d Rules for X M L ~ F,~'-s <: ~,};A F e : {t (~-~,)/ct}t F;A F (t(ctl : Pl,.-.,ct, :P,) ==~ e) : (Act <: t_(pl,...,p,).x)
(TMCASE (TMABORT
F;A F abort : (Act <: l.x) F;A F el : (Act <: Pl A')
F;A F e2 : (Act <: P2.X)
tc(pl)ntc(p2) = {}
F;A k el ~e2 : (Act <: Pl U p2,z) (TMJOIN
F;A F e: (Act<: p.~) r;A F el(e) : (Vet<: p.x)
(TMCLOS)
Fig. 5. Type Rules for Dynamic Type Dispatch with XML A
t has arity T" --+ T FFt(
(TYCON
~ t <: pl -~ ... -+ p, -~ t(pl .... ,p,)
(TYABS) F , ~ . <: N ; z F 1:' (
r'F (t(ct,,...,ct,) ~z)
>Z<:
{~}C_FV(z)f3FV('~')
) (t(ctl,...,ct,)~
1;')<: (Act <: t(pl,...,pn).X) (TYCASE)
rF~l(
~<:(Act<:pi.x)
FFt2(
FFII~I:2(
> ~2 <: (Act <: p2.z)
tc(pl)F~tc(p2) ----{}
>z~()~<:(Act<:p!Up2.z) (TYJOIN)
FF~(
~ ~' <: (Act <: p.x)
I',Ct<:x'F~<: X r~- ((;~ct.~)r ~
FF~'<: X'
{~'Ict}~ <: x
(TYCLOS)
(TYFUNBETA)
FF z<:X-~ Z'
F;F,-~,<:~F'~i<:pfori=l,...,n r ; r ~ (~z(t.(~) ~
rF~<:p-~x
F;FF~ <:~-~
~) (tj(?))) ( > {?/~}~s <: P
rF~'<:p
p=U~
FF (I:I:')~-d cl(t(~) =:} I;(t(~)))(I:')<: )(,
(TYFUNETA)
(TYCASEBETA)
(TYCASECOP)
rFt<: (Xl -+X2)-+ (Xl oX2) F F tyrec(~) ( ) (~ (~ct.tyrec(1:)(ct)))<: (XI -~ X2)
Fig. 6. Kind and Conversion Rules for Type Operators in XMLA
(TYFIXBETA)
Stack-Based Typed Assembly Language? Greg Morrisett, Karl Crary, Neal Glew, and David Walker Cornell University
Abstract. In previous work, we presented a Typed Assembly Language (TAL). TAL is sufficiently expressive to serve as a target language for compilers of high-level languages such as ML. This work assumed such a compiler would perform a continuation-passing style transform and eliminate the control stack by heap-allocating activation records. However, most compilers are based on stack allocation. This paper presents STAL, an extension of TAL with stack constructs and stack types to support the stack allocation style. We show that STAL is sufficiently expressive to support languages such as Java, Pascal, and ML; constructs such as exceptions and displays; and optimizations such as tail call elimination and callee-saves registers. This paper also formalizes the typing connection between CPS-based compilation and stack-based compilation and illustrates how STAL can formally model calling conventions by specifying them as formal translations of source function types to STAL types.
1
Introduction and Motivation
Statically typed source languages have efficiency and software engineering advantages over their dynamically typed counterparts. Modern type-directed compilers [19,25,7,32,20,29,12] exploit the properties of typed languages more extensively than their predecessors by preserving type information computed in the front end through a series of typed intermediate languages. These compilers use types to direct sophisticated transformations such as closure conversion [18,31,17,1,21], region inference [8], subsumption elimination [9,11], and unboxing [19,22,28]. Without types these transformations are, in many cases, less effective or impossible. Furthermore, the type translation partially specifies the corresponding term translation and often captures the critical concerns in an elegant and succinct fashion. Strong type systems not only describe but also enforce many important invariants. Consequently, developers of type-based compilers may invoke a type-checker after each code transformation, and if the output fails to type-check, the developer knows that the compiler contains an internal error. Although type-checkers for decidable type systems will not catch ?
This material is based on work supported in part by the AFOSR grant F49620-971-0013, ARPA/RADC grant F30602-96-1-0317, ARPA/AF grant F30602-95-1-0047, and AASERT grant N00014-95-1-0985. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not reflect the views of these agencies.
X. Leroy and A. Ohori (Eds.): TIC’98, LNCS 1473, pp. 28–52, 1998. c Springer-Verlag Berlin Heidelberg 1998
Stack-Based Typed Assembly Language
29
all compiler errors, they have proven themselves valuable debugging tools in practice [24]. Despite the numerous advantages of compiling with types, until recently, no compiler propagated type information through the final stages of code generation. The TIL/ML compiler, for instance, preserves types through approximately 80% of compilation but leaves the remaining 20% untyped. Many of the complex tasks of code generation including register allocation and instruction scheduling are left unchecked; types are not used to specify or explain these low-level code transformations. These observations motivated our exploration of very low-level type systems and corresponding compiler technology. In Morrisett et al. [23], we presented a typed assembly language (TAL) and proved that its type system was sound with respect to an operational semantics. We demonstrated the expressiveness of this type system by sketching a type-preserving compiler from an ML-like language to TAL. The compiler ensured that well-typed source programs were always mapped to well-typed assembly language programs and that they preserved source level abstractions such as user-defined abstract data types and closures. Furthermore, we claimed that the type system of TAL did not interfere with many traditional compiler optimizations including inlining, loop-unrolling, register allocation, instruction selection, and instruction scheduling. However, the compiler we presented was critically based on a continuationpassing style (CPS) transform, which eliminated the need for a control stack. In particular, activation records were represented by heap-allocated closures as in the SML of New Jersey compiler (SML/NJ) [2,3]. For example, Figure 1 shows the TAL code our heap-based compiler would produce for the recursive factorial computation. Each function takes an additional argument which represents the control stack as a continuation closure. Instead of “returning” to the caller, a function invokes its continuation closure by jumping directly to the code of the closure, passing the environment of the closure and the result in registers. Allocating continuation closures on the heap has many advantages over a conventional stack-based implementation. First, it is straightforward to implement control primitives such as exceptions, first-class continuations, or user-level lightweight coroutine threads [3,31,34]. Second, Appel and Shao [5] have shown that heap allocation of closures can have better space properties, primarily because it is easier to share environments. Third, there is a unified memory management mechanism (namely the garbage collector) for allocating and collecting all kinds of objects, including activation frames. Finally, Appel and Shao [5] have argued that, at least for SML/NJ, the locality lost by heap-allocating activation frames is negligible. Nevertheless, there are also compelling reasons for providing support for stacks. First, Appel and Shao’s work did not consider imperative languages, such as Java, where the ability to share environments is greatly reduced, nor did it consider languages that do not require garbage collection. Second, Tarditi and Diwan [14,13] have shown that with some cache architectures, heap allocation of continuations (as in SML/NJ) can have substantial overhead due to a loss of
30
Greg Morrisett et al.
locality. Third, stack-based activation records can have a smaller memory footprint than heap-based activation records. Finally, many machine architectures have hardware mechanisms that expect programs to behave in a stack-like fashion. For example, the Pentium Pro processor has an internal stack that it uses to predict return addresses for procedures so that instruction pre-fetching will not be stalled [16]. The internal stack is guided by the use of call/return primitives which use the standard control stack. Clearly, compiler writers must weigh a complex set of factors before choosing stack allocation, heap allocation, or both. The target language must not constrain these design decisions. In this paper, we explore the addition of a stack to our typed assembly language in order to give compiler writers the flexibility they need. Our stack typing discipline is remarkably simple, but powerful enough to compile languages such as Pascal, Java, or ML without adding high-level primitives to the assembly language. More specifically, the typing discipline supports stack allocation of temporary variables and values that do not escape, stack allocation of procedure activation frames, exception handlers, and displays, as well as optimizations such as callee-saves registers. Unlike the JVM architecture [20], our system does not constrain the stack to have the same size at each control-flow point, nor does it require new high-level primitives for procedure call/return. Instead, our assembly language continues to have low-level RISC-like primitives such as loads, stores, and jumps. However, source-level stack allocation, general source-level stack pointers, general pointers into either the stack or heap, and some advanced optimizations cannot be typed. A key contribution of the type structure is that it provides a unifying declarative framework for specifying procedure calling conventions regardless of the allocation strategy. In addition, the framework further elucidates the connection between a heap-based continuation-passing style compiler, and a conventional stack-based compiler. In particular, this type structure makes explicit the notion that the only differences between the two styles are that, instead of passing the continuation as a boxed, heap-allocated tuple, a stack-based compiler passes the continuation unboxed in registers and the environments for continuations are allocated on the stack. The general framework makes it easy to transfer transformations developed for one style to the other. For instance, we can easily explain the callee-saves registers of SML/NJ [2,3,4] and the callee-saves registers of a stack-based compiler as instances of a more general CPS transformation that is independent of the continuation representation.
2
Overview of TAL and CPS-Based Compilation
In this section, we briefly review our original proposal for typed assembly language (TAL) and sketch how a polymorphic functional language, such as ML, can be compiled to TAL in a continuation-passing style, where continuations are heap-allocated. Figure 2 gives the syntax for TAL. Programs (P ) are triples consisting of a heap, register file, and instruction sequence. Heaps map labels to heap values
Stack-Based Typed Assembly Language
31
(H, {}, I) where H = l fact: code[ ]{r1:hi,r2:int,r3:τk }. bneq r2,l nonzero unpack [α,r3],r3 % zero branch: call k (in r3) with 1 ld r4,r3(0) % project k code ld r1,r3(1) % project k environment mov r2,1 jmp r4 % jump to k l nonzero: code[ ]{r1:hi,r2:int,r3:τk }. sub r4,r2,1 % n−1 malloc r5[int, τk ] % create environment for cont in r5 st r5(0),r2 % store n into environment st r5(1),r3 % store k into environment malloc r3 [∀[ ].{r1:hint 1 , τk1 i,r2:int}, hint 1 , τk1 i] % create cont closure mov r2,l cont st r3(0),r2 % store cont code st r3(1),r5 % store environment hn, ki mov r2,r4 % arg := n − 1 mov r3,pack [hint 1 , τk1 i,r3] as τk % abstract environment type jmp l fact % recursive call l cont: code[ ]{r1:hint 1 , τk1 i,r2:int}. % r2 contains (n − 1)! ld r3,r1(0) % retrieve n ld r4,r1(1) % retrieve k mul r2,r3,r2 % n × (n − 1)! unpack [α,r4],r4 % unpack k ld r3,r4(0) % project k code ld r1,r4(1) % project k environment jmp r3 % jump to k l halt: code[ ]{r1:hi,r2:int}. mov r1,r2 halt[int] % halt with result in r1 and I =
malloc r1[ ] malloc r2[ ] malloc r3[∀[ ].{r1:hi,r2:int}, hi] mov r4,l halt st r3(0),r4 st r3(1),r2 mov r2,6 mov r3,pack [hi,r3] as τk jmp l fact
and τk = ∃α.h∀[ ].{r1:α,r2:int}1 , α1i
% create empty environment (hi) % create empty environment % create halt closure in r3 % % % % % %
store cont code store environment hi load argument (6) abstract environment type begin fact with {r1 = hi, r2 = 6, r3 = haltcont}
Fig. 1. Typed Assembly Code for Factorial
32
Greg Morrisett et al.
types initialization flags label assignments type assignments register assignments
τ ::= α | int | ∀[∆].Γ | hτ1ϕ1 , . . . , τnϕn i | ∃α.τ ϕ ::= 0 | 1 Ψ ::= {`1 :τ1 , . . . , `n :τn} ∆ ::= · | α, ∆ Γ ::= {r1 :τ1 , . . . , rn :τn}
registers word values small values heap values heaps register files
r w v h H R
instructions arithmetic ops branch ops instruction sequences programs
::= r1 | · · · | rk ::= ` | i | ?τ | w[τ ] | pack [τ, w] as τ 0 ::= r | w | v[τ ] | pack [τ, v] as τ 0 ::= hw1 , . . . , wn i | code[∆]Γ.I ::= {`1 7→ h1 , . . . , `n 7→ hn } ::= {r1 7→ w1, . . . , rn 7→ wn }
ι ::= aop rd , rs , v | bop r, v | ld rd , rs (i) | malloc r[~τ ] | mov rd , v | st rd (i), rs | unpack [α, rd ], v | aop ::= add | sub | mul bop ::= beq | bneq | bgt | blt | bgte | blte I ::= ι; I | jmp v | halt [τ ] P ::= (H, R, I)
Fig. 2. Syntax of TAL
which are either tuples of word-sized values or code sequences. Register files map registers to word-sized values. Instruction sequences are lists of instructions terminated by either a jmp or halt instruction. The context ∆ binds the free type variables of Γ in ∀[∆].Γ , and of both Γ and I in code[∆]Γ.I. The instruction unpack [α, r], v binds α in the following instructions. We consider syntactic objects to be equivalent up to alpha-conversion, and consider label assignments, register assignments, heaps, and register files equivalent up to reordering of labels ~ denotes a and registers. Register names do not alpha-convert. The notation X sequence of zero or more Xs, and | · | denotes the length of a sequence. The instruction set consists mostly of conventional RISC-style assembly operations, including arithmetic, branches, loads, and stores. One exception, the unpack instruction, strips the quantifier from the type of an existentially typed value and introduces a new type variable into scope. On an untyped machine, this is implemented by an ordinary move. The other non-standard instruction is malloc, which is explained below. Evaluation is specified as a deterministic rewriting system that takes programs to programs (see Morrisett et al. [23] for details). The types for TAL consist of type variables, integers, tuple types, existential types, and polymorphic code types. Tuple types contain initialization flags (either 0 or 1) that indicate whether or not components have been initialized. For example, if register r has type hint 0 , int 1 i, then it contains a label bound in the heap to a pair that can contain integers, where the first component may not have been initialized, but the second component has. In this context, the type system
Stack-Based Typed Assembly Language
33
allows the second component to be loaded, but not the first. If an integer value is stored into r(0) then afterwards r has the type hint 1 , int 1 i, reflecting the fact that the first component is now initialized. The instruction malloc r[τ1 , . . . , τn ] heap-allocates a new tuple with uninitialized fields and places its label in register r. TAL code types (∀[α1 , . . . , αn ].Γ ) describe code blocks (code[α1 , . . . , αn ]Γ.I), which are instruction sequences, that expect a register file of type Γ and in which the type variables α1 , . . . , αn are held abstract. In other words, Γ serves as a register file pre-condition that must hold before control may be transferred to the code block. Code blocks have no post-condition because control is either terminated via a halt instruction or transferred to another code block. The type variables that are abstracted in a code block provide a means to write polymorphic code sequences. For example, the polymorphic code block code[α]{r1:α, r2:∀[].{r1:hα1 , α1i}}. malloc r3[α, α] st r3(0), r1 st r3(1), r1 mov r1, r3 jmp r2 roughly corresponds to a CPS version of the ML function fn (x:α) => (x, x). The block expects upon entry that register r1 contains a value of the abstract type α, and r2 contains a return address (or continuation label) of type ∀[].{r1 : hα1 , α1 i}. In other words, the return address requires register r1 to contain an initialized pair of values of type α before control can be returned to this address. The instructions of the code block allocate a tuple, store into the tuple two copies of the value in r1, move the pointer to the tuple into r1 and then jump to the return address in order to “return” the tuple to the caller. If the code block is bound to a label `, then it may be invoked by simultaneously instantiating the type variable and jumping to the label (e.g., jmp `[int ]). Source languages like ML have nested higher-order functions that might contain free variables and thus require closures to represent functions. At the TAL level, we represent closures as a pair consisting of a code block label and a pointer to an environment data structure. The type of the environment must be held abstract in order to avoid typing difficulties [21], and thus we pack the type of the environment and the pair to form an existential type. All functions, including continuation functions introduced during CPS conversion, are thus represented as existentials. For example, once CPS converted, a source function of type int →hi has type (int , (hi→void ))→void.1 After closures are introduced, the code will have type: ∃α1 .h(α1 , int, ∃α2 .h(α2 , hi) → void, α2 i) → void , α1i 1
¹ The void return types are intended to suggest the non-returning aspect of CPS code.
Finally, at the TAL level the function will be represented by a value with the type:

    ∃α1.⟨∀[].{r1:α1, r2:int, r3:∃α2.⟨∀[].{r1:α2, r2:⟨⟩}^1, α2^1⟩}^1, α1^1⟩

Here, α1 is the abstracted type of the closure’s environment. The code for the closure requires that the environment be passed in register r1, the integer argument in r2, and the continuation in r3. The continuation is itself a closure where α2 is the abstracted type of its environment. The code for the continuation closure requires that the environment be passed in r1 and the unit result of the computation in r2.

To apply a closure at the TAL level, we first use the unpack operation to open the existential package. Then the code and the environment of the closure pair are loaded into appropriate registers, along with the argument to the function. Finally, we use a jump instruction to transfer control to the closure’s code. Figure 1 gives the CPS-based TAL code for the following ML expression, which computes six factorial:

    let fun fact n = if n = 0 then 1 else n * (fact(n - 1))
    in fact 6 end
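For comparison with the TAL code of Figure 1, the following Standard ML sketch (our own illustration, not code from the paper) shows fact after source-level CPS conversion. Each TAL code block corresponds to one of these functions or continuations; TAL additionally packs each continuation’s environment existentially as described above.

    (* CPS-converted factorial: instead of returning, each function
       "jumps" to its continuation k, just as every TAL code block
       ends in a jmp. *)
    fun fact (n : int, k : int -> unit) : unit =
      if n = 0 then k 1
      else fact (n - 1, fn r => k (n * r))

    (* fact 6, with an initial continuation playing the role of halt *)
    val () = fact (6, fn r => print (Int.toString r ^ "\n"))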
3 Adding Stacks to TAL
In this section, we show how to extend TAL to achieve a Stack-based Typed Assembly Language (STAL). Figure 3 defines the new syntactic constructs for the language. In what follows, we informally discuss the dynamic and static semantics for the modified language, leaving formal treatment to Appendix A.
types                 τ ::= · · · | ns
stack types           σ ::= ρ | nil | τ::σ
type assignments      ∆ ::= · · · | ρ, ∆
register assignments  Γ ::= {r1:τ1, . . . , rn:τn, sp:σ}
word values           w ::= · · · | w[σ] | ns
small values          v ::= · · · | v[σ]
register files        R ::= {r1 ↦ w1, . . . , rn ↦ wn, sp ↦ S}
stacks                S ::= nil | w::S
instructions          ι ::= · · · | salloc n | sfree n | sld rd, sp(i) | sst sp(i), rs
Fig. 3. Additions to TAL for Simple Stacks
Operationally, we model stacks (S) as lists of word-sized values. Uninitialized stack slots are filled with nonsense (ns). Register files now include a distinguished
register, sp, which represents the current stack. There are four new instructions that manipulate the stack. The salloc n instruction places n words of nonsense on the top of the stack. In a conventional machine, assuming stacks grow towards lower addresses, an salloc instruction would correspond to subtracting n from the current value of the stack pointer. The sfree n instruction removes the top n words from the stack, and corresponds to adding n to the current stack pointer. The sld r, sp(i) instruction loads the ith word of the stack into register r, whereas the sst sp(i), r instruction stores register r into the ith word. Note that the instructions ld and st cannot be used with the stack pointer.

A program becomes stuck if it attempts to execute:

– sfree n and the stack does not contain at least n words,
– sld r, sp(i) and the stack does not contain at least i + 1 words, or else the ith word of the stack is ns, or
– sst sp(i), r and the stack does not contain at least i + 1 words.

As in the original TAL, the typing rules for the modified language prevent well-formed programs from becoming stuck. Stacks are described by stack types (σ), which include nil and τ::σ. The latter represents a stack of the form w::S where w has type τ and S has type σ. Stack slots filled with nonsense have type ns. Stack types also include stack type variables (ρ), which may be used to abstract the tail of a stack type. The ability to abstract stacks is critical for supporting procedure calls and is discussed in detail later.

As before, the register file for the abstract machine is described by a register file type (Γ) mapping registers to types. However, Γ also maps the distinguished register sp to a stack type σ. Finally, code blocks and code types support polymorphic abstraction over both types and stack types.

One of the uses of the stack is to save temporary values during a computation. The general problem is to save on the stack n registers, say r1 through rn, of types τ1 through τn, perform some computation e, and then restore the temporary values to their respective registers. This would be accomplished by the following instruction sequence, where the comments (delimited by %) show the stack’s type at each step of the computation.
                          % σ
    salloc n              % ns::ns:: · · · ::ns::σ
    sst sp(0), r1         % τ1::ns:: · · · ::ns::σ
      ...
    sst sp(n − 1), rn     % τ1::τ2:: · · · ::τn::σ
    code for e            % τ1::τ2:: · · · ::τn::σ
    sld r1, sp(0)         % τ1::τ2:: · · · ::τn::σ
      ...
    sld rn, sp(n − 1)     % τ1::τ2:: · · · ::τn::σ
    sfree n               % σ
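As an aside, the behavior of the four stack instructions is easy to prototype on lists. In the following Standard ML sketch (our own model, not part of STAL), NONE plays the role of nonsense, and the stuck conditions listed earlier in this section surface as Basis-library exceptions:

    type 'a stack = 'a option list   (* NONE models ns *)

    fun salloc (n, s : 'a stack) = List.tabulate (n, fn _ => NONE) @ s
    fun sfree  (n, s : 'a stack) = List.drop (s, n)          (* Subscript if stack too short *)
    fun sld    (i, s : 'a stack) = valOf (List.nth (s, i))   (* Option if the slot holds ns *)
    fun sst    (i, w, s : 'a stack) =
      List.take (s, i) @ SOME w :: List.drop (s, i + 1)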
If, upon entry, ri has type τi and the stack is described by σ, and if the code for e leaves the state of the stack unchanged, then this code sequence is well-typed. Furthermore, the typing discipline does not place constraints on the order in which the stores or loads are performed.

It is straightforward to model higher-level primitives, such as push and pop. The former can be seen as simply salloc 1 followed by a store to sp(0), whereas the latter is a load from sp(0) followed by sfree 1. Also, a “jump-and-link” or “call” instruction, which automatically moves the return address into a register or onto the stack, can be synthesized from our primitives. To simplify the presentation, we did not include these instructions in STAL; a practical implementation, however, would need a full set of instructions appropriate to the architecture.

The stack is commonly used to save the current return address and temporary values across procedure calls. Which registers to save, and in what order, is usually specified by a compiler-specific calling convention. Here we consider a simple calling convention where it is assumed there is one integer argument and one unit result, both of which are passed in register r1, and the return address is passed in the register ra. When invoked, a procedure may choose to place temporaries on the stack as shown above, but when it jumps to the return address, the stack should be in the same state as it was upon entry. Naively, we might expect the code for a function obeying this calling convention to have the following STAL type:

    ∀[].{r1:int, sp:σ, ra:∀[].{r1:⟨⟩, sp:σ}}

Notice that the type of the return address is constrained so that the stack must have the same shape upon return as it had upon entry. Hence, if the procedure pushes any arguments onto the stack, it must pop them off.

However, this typing is unsatisfactory for two reasons. The first problem is that nothing prevents the procedure from popping values off the stack and then pushing new values (of the appropriate type) onto the stack. In other words, the caller’s stack frame is not protected from the function’s code. The second problem is much worse: such a function can only be invoked from states where the stack is exactly described by σ. This effectively prevents invocation of the procedure from two different points in the program. For example, there is no way for the procedure to push its return address on the stack and jump to itself.

The solution to both problems is to abstract the type of the stack using a stack type variable:

    ∀[ρ].{r1:int, sp:ρ, ra:∀[].{r1:⟨⟩, sp:ρ}}

To invoke a function with this type, the caller must instantiate the bound stack type variable ρ with the current type of the stack. As before, the function can only jump to the return address when the stack is in the same state as it was upon entry. However, the first problem above is addressed because the type checker treats ρ as an abstract stack type while checking the body of the code. Hence, the code cannot perform an sfree, sld, or sst on the stack. It must
first allocate its own space on the stack; only this space may be accessed by the function, and the space must be freed before returning to the caller.² The second problem is solved because the stack type variable may be instantiated in different ways. Hence multiple call sites with different stack states, including recursive calls, may now invoke the function. In fact, a recursive call will usually instantiate the stack variable with a different type than the original call because, unless it is a tail call, it will need to store its return address on the stack.
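The protection provided by the abstract ρ is essentially a parametricity property (cf. footnote 2). Transliterated to Standard ML (our own illustration), a callee whose “stack tail” has abstract type can do nothing with it except hand it back unchanged:

    (* 'rest stands for the caller's stack described by rho: because
       its type is abstract, the body cannot inspect, shrink, or grow
       it; it can only thread it through to the return continuation. *)
    fun callee (arg : int, rest : 'rest, ret : int * 'rest -> unit) : unit =
      ret (arg + 1, rest)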
(H, {sp ↦ nil}, I)

where

H = l_fact:
      code[ρ]{r1:⟨⟩, r2:int, sp:ρ, ra:τρ}.
        bneq r2, l_nonzero[ρ]      % if n = 0 continue
        mov r1, 1                  % result is 1
        jmp ra                     % return
    l_nonzero:
      code[ρ]{r1:⟨⟩, r2:int, sp:ρ, ra:τρ}.
        sub r3, r2, 1              % n − 1
        salloc 2                   % save n and return address to stack
        sst sp(0), r2
        sst sp(1), ra
        mov r2, r3                 % recursive call fact(n − 1)
        mov ra, l_cont[ρ]
        jmp l_fact[int::τρ::ρ]
    l_cont:
      code[ρ]{r1:int, sp:int::τρ::ρ}.
        sld r2, sp(0)              % restore n and return address
        sld ra, sp(1)
        sfree 2
        mul r1, r2, r1             % result is n × fact(n − 1)
        jmp ra                     % return
    l_halt:
      code[]{r1:int, sp:nil}.
        halt[int]

and I = malloc r1[]                % environment
        mov r2, 6                  % argument
        mov ra, l_halt             % return address for initial call
        jmp l_fact[nil]

and τρ = ∀[].{r1:int, sp:ρ}

Fig. 4. STAL Factorial Example
² Some intuition on this topic may be obtained from Reynolds’ theorem on parametric polymorphism [26], but a formal proof is difficult.
Figure 4 gives stack-based code for the factorial example of the previous section. The function is invoked by moving its environment (an empty tuple) into r1, the argument into r2, and the return address label into ra, and jumping to the label l_fact. Notice that the nonzero branch must save the argument and current return address on the stack before jumping to the l_fact label in a recursive call.

It is interesting to note that the stack-based code is quite similar to the heap-based code of Figure 1. Indeed, the code remains in a continuation-passing style, but instead of passing the continuation as a heap-allocated tuple, the environment of the continuation is passed in the stack pointer and the code of the continuation is passed in the return address register. To more fully appreciate the correspondence, consider the type of the TAL version of l_fact from Figure 1:

    ∀[].{r1:⟨⟩, r2:int, r3:∃α.⟨∀[].{r1:α, r2:int}^1, α^1⟩}

We could have used an alternative approach where we pass the components of the continuation closure unboxed in separate registers. To do so, the caller must unpack the continuation, and the function must abstract the type of the continuation’s environment, resulting in a quantifier rotation:

    ∀[α].{r1:⟨⟩, r2:int, r3:∀[].{r1:α, r2:int}, r4:α}

Now, it is clear that the STAL code, which has type

    ∀[ρ].{r1:⟨⟩, r2:int, ra:∀[].{sp:ρ, r1:int}, sp:ρ}

is essentially the same! Indeed, the only difference between a CPS-based compiler, such as SML/NJ, and a conventional stack-based compiler is that for the latter, continuation environments are allocated on a stack. Our type system describes this well-known connection elegantly.

Our techniques can be applied to other calling conventions and do not appear to inhibit most optimizations. For instance, tail calls can be eliminated in CPS simply by forwarding a continuation closure to the next function. If continuations are allocated on the stack, we have the mechanisms to pop the current activation frame off the stack and to push any arguments before performing the tail call. Furthermore, the type system is expressive enough to type this resetting and adjusting for any kind of tail call, not just a self tail call. As another example, some CISC-style conventions push the arguments, the environment, and then the return address on the stack, and return the result on the stack. With this convention, the factorial code would have type:

    ∀[ρ].{sp:∀[].{sp:int::ρ}::⟨⟩::int::ρ}

Callee-saves registers (registers whose values must be preserved across function calls) can be handled in the same fashion as the stack pointer. In particular, the function holds abstract the type of the callee-saves register and requires that the register have the same type upon return. For instance, if we wish to preserve register r3 across a call to factorial, we would use the type:

    ∀[ρ, α].{r1:⟨⟩, r2:int, r3:α, ra:∀[].{sp:ρ, r1:int, r3:α}, sp:ρ}
Translating this type back into a boxed, heap-allocated closure, we obtain:

    ∀[α].{r1:⟨⟩, r2:int, r3:α, ra:∃β.⟨∀[].{r1:β, r2:int, r3:α}^1, β^1⟩}

This is the type of the callee-saves approach of Appel and Shao [4]. Thus we see how our correspondence enables transformations developed for heap-based compilers to be used in traditional stack-based compilers and vice versa. The generalization to multiple callee-saves registers and other calling conventions should be clear. Indeed, we have found that the type system of STAL provides a concise way to declaratively specify a variety of calling conventions.
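A Standard ML rendering of this idea (our own sketch, after Appel and Shao [4]) threads the callee-saves value through the computation at an abstract type, so the code can preserve it but never examine it:

    (* saved : 'a plays the role of the callee-saves register r3 of
       abstract type alpha: fact_cs must return it to its continuation
       intact, because no operation other than passing it along is
       well-typed. *)
    fun fact_cs (n : int, saved : 'a, k : int * 'a -> unit) : unit =
      if n = 0 then k (1, saved)
      else fact_cs (n - 1, saved, fn (r, s) => k (n * r, s))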
4 Exceptions
We now consider how to implement exceptions in STAL. We will find that a calling convention for function calls in the presence of exceptions may be derived from the heap-based CPS calling convention, just as was the case without exceptions. However, implementing this calling convention will require that the type system be made more expressive by adding compound stack types. This additional expressiveness will turn out to have uses beyond exceptions, allowing most compiler-introduced uses of pointers into the midst of the stack.

4.1 Exception Calling Conventions
In a heap-based CPS framework, exceptions are implemented by passing two continuations: the usual continuation and an exception continuation. Code raises an exception by jumping to the latter. For an integer-to-unit function, this calling convention is expressed as the following TAL type (ignoring the outer closure and environment):

    ∀[].{r1:int, ra:∃α1.⟨∀[].{r1:α1, r2:⟨⟩}^1, α1^1⟩, re:∃α2.⟨∀[].{r1:α2, r2:exn}^1, α2^1⟩}

Again, the caller might unpack the continuations:

    ∀[α1, α2].{r1:int, ra:∀[].{r1:α1, r2:⟨⟩}, ra′:α1, re:∀[].{r1:α2, r2:exn}, re′:α2}

Then the caller might (erroneously) attempt to place the continuation environments on stacks, as before:

    ∀[ρ1, ρ2].{r1:int, ra:∀[].{sp:ρ1, r1:⟨⟩}, sp:ρ1, re:∀[].{sp:ρ2, r1:exn}, sp′:ρ2}

Unfortunately, this calling convention uses two stack pointers, and STAL has only one stack.³ Observe, though, that the exception continuation’s stack is necessarily a tail of the ordinary continuation’s stack. This observation leads to the following calling convention for exceptions with stacks:

    ∀[ρ1, ρ2].{sp:ρ1 ◦ ρ2, r1:int, ra:∀[].{sp:ρ1 ◦ ρ2, r1:⟨⟩}, re′:ptr(ρ2), re:∀[].{sp:ρ2, r1:exn}}
³ Some language implementations use a separate exception stack; with some minor modifications, this calling convention would be satisfactory for such implementations.
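At the source level, the two-continuation convention at the start of this section looks as follows in Standard ML (our own sketch); raising an exception is just a jump to the second continuation:

    (* Every function receives an ordinary continuation k and an
       exception continuation kexn; Div is the standard Basis
       division exception. *)
    fun safeDiv (n : int, d : int,
                 k : int -> unit, kexn : exn -> unit) : unit =
      if d = 0 then kexn Div      (* "raise": jump to the handler *)
      else k (n div d)            (* normal return *)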
This type uses two new constructs that we now add to STAL (see Figure 5). When σ1 and σ2 are stack types, the stack type σ1 ◦ σ2 is the result of appending the two types. Thus, in the above type, the function is presented with a stack of type ρ1 ◦ ρ2, all of which is expected by the regular continuation, but only a tail of which (ρ2) is expected by the exception continuation. Since ρ1 and ρ2 are quantified, the function may still be used for any stack, so long as the exception continuation accepts some tail of that stack.

To raise an exception, the exception is placed in r1 and control is transferred to the exception continuation. This requires cutting the actual stack down to just that expected by the exception continuation. Since the length of ρ1 is unknown, this cannot be done by sfree. Instead, a pointer to the desired position in the stack is supplied in re′ and is moved into sp. The type ptr(σ) is the type of pointers into the stack at a position where the stack has type σ. Such pointers are obtained simply by moving sp into a register.

4.2 Compound Stacks
The additional syntax to support exceptions is summarized in Figure 5. The new type constructors were discussed above. The word value ptr (i) is used by the operational semantics to represent pointers into the stack; the element pointed to is i words from the bottom of the stack. (See Figure 7 for details.) Of course, on a real machine, these would be implemented by actual pointers. The instructions mov rd , sp and mov sp, rs save and restore the stack pointer, and the instructions sld rd , rs (i) and sst rd (i), rs allow for loading from and storing to pointers.
types         τ ::= · · · | ptr(σ)
stack types   σ ::= · · · | σ1 ◦ σ2
word values   w ::= · · · | ptr(i)
instructions  ι ::= · · · | mov rd, sp | mov sp, rs | sld rd, rs(i) | sst rd(i), rs
Fig. 5. Additions to TAL for Compound Stacks
The introduction of pointers into the stack raises a delicate issue for the typing system. When the stack pointer is copied into a register, changes to the stack are not reflected in the type of the copy, and can invalidate a pointer. Consider the following incorrect code:

    % begin with sp : τ::σ, sp ↦ w::S   (τ ≠ ns)
    mov r1, sp       % r1 : ptr(τ::σ)
    sfree 1          % sp : σ, sp ↦ S
    salloc 1         % sp : ns::σ, sp ↦ ns::S
    sld r2, r1(0)    % r2 : τ  but r2 ↦ ns
When execution reaches the final line, r1 still has type ptr(τ::σ), but this type is no longer consistent with the state of the stack; the pointer in r1 points to ns. To prohibit erroneous loads of this sort, the type system requires that the pointer rs be valid in the instructions sld rd, rs(i), sst rd(i), rs, and mov sp, rs. An invariant of our system is that the type of sp always describes the current stack, so using a pointer into the stack will be sound if that pointer’s type is consistent with sp’s type. Suppose sp has type σ1 and r has type ptr(σ2); then r is valid if σ2 is a tail of σ1 (formally, if there exists some σ′ such that σ1 = σ′ ◦ σ2). If a pointer is invalid, it may be neither loaded from nor moved into the stack pointer. In the above example the load will be rejected because r1’s type τ::σ is not a tail of sp’s type, ns::σ.
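The validity check itself is a simple structural test. The following Standard ML sketch (our own datatypes and names, assuming compound types σ1 ◦ σ2 have already been normalized away) decides whether σ2 is a tail of σ1:

    datatype ty = Int | Ns | Ptr of stackty
    and stackty = Nil
                | Rho of string          (* stack type variable *)
                | Cons of ty * stackty   (* tau :: sigma *)

    (* sigma2 is a tail of sigma1 iff sigma1 = sigma' o sigma2
       for some sigma'. *)
    fun isTail (s1 : stackty, s2 : stackty) : bool =
      s1 = s2 orelse
      (case s1 of Cons (_, rest) => isTail (rest, s2) | _ => false)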
4.3 Using Compound Stacks
Recall the type for a function in the presence of exceptions:

    ∀[ρ1, ρ2].{sp:ρ1 ◦ ρ2, r1:int, ra:∀[].{sp:ρ1 ◦ ρ2, r1:⟨⟩}, re′:ptr(ρ2), re:∀[].{sp:ρ2, r1:exn}}

An exception may be raised within the body of such a function by restoring the handler’s stack from re′ and jumping to the handler. A new exception handler may be installed by copying the stack pointer to re′ and making forthcoming function calls with the stack type variables instantiated to nil and ρ1 ◦ ρ2. Calls that do not install new exception handlers would attach their frames to ρ1 and pass on ρ2 unchanged.

Since exceptions are probably raised infrequently, an implementation could save a register by storing the exception continuation’s code pointer on the stack instead of in its own register. If this convention were used, functions would expect stacks with the type ρ1 ◦ (τhandler::ρ2) and exception pointers with the type ptr(τhandler::ρ2), where τhandler = ∀[].{sp:ρ2, r1:exn}.

This last convention illustrates a use for compound stacks that goes beyond implementing exceptions. We have a general tool for locating data of type τ amidst the stack by using the calling convention:

    ∀[ρ1, ρ2].{sp:ρ1 ◦ (τ::ρ2), r1:ptr(τ::ρ2), . . .}

One application of this tool would be for implementing Pascal with displays. The primary limitation of this tool is that if more than one piece of data is stored amidst the stack, then although quantification may be used to avoid specifying the precise locations of that data, function calling conventions would have to specify in what order the data appears on the stack. It appears that this limitation could be removed by introducing a limited form of intersection type, but we have not yet explored the ramifications of this enhancement.
5 Related and Future Work
Our work is partially inspired by Reynolds [27], which uses functor categories to “replace continuations by instruction sequences and store shapes by descriptions
of the structure of the run-time stack.” However, Reynolds was primarily concerned with using functors to express an intermediate language of a semantics-based compiler for Algol, whereas we are primarily concerned with type structure for general-purpose target languages.

Stata and Abadi [30] formalize the Java bytecode verifier’s treatment of subroutines by giving a type system for a subset of the Java Virtual Machine language. In particular, their type system ensures that for any program control point, the Java stack is of the same size each time that control point is reached during execution. Consequently, procedure call must be a primitive construct (which it is in JVML). In contrast, our treatment supports polymorphic stack recursion, and hence procedure calls can be encoded with existing assembly-language primitives.

Tofte and others [8,33] have developed an allocation strategy involving regions. Regions are lexically scoped containers that have a LIFO ordering on their lifetimes, much like the values on a stack. As in our approach, polymorphic recursion on abstracted region variables plays a critical role. However, unlike the objects in our stacks, regions are variable-sized, and objects need not be allocated into the region which was most recently created. Furthermore, there is only one allocation mechanism in Tofte’s system (the stack of regions) and no need for a garbage collector. In contrast, STAL only allows allocation at the top of the stack and assumes a garbage collector for heap-allocated values. However, the type system for STAL is considerably simpler than the type system of Tofte et al., as it requires no effect information in types.

Bailey and Davidson [6] also describe a specification language for modeling procedure calling conventions and checking that implementations respect these conventions. They are able to specify features, such as a variable number of arguments, that our formalism does not address. However, their model is explicitly tied to a stack-based calling convention and does not address features such as exception handlers. Furthermore, their approach does not integrate the specification of calling conventions with a general-purpose type system.

Although our type system is sufficiently expressive for compilation of a number of source languages, it falls short in several areas. First, it cannot support general pointers into the stack because of the ordering requirements; nor can stack and heap pointers be unified so that a function taking a tuple argument can be passed either a heap-allocated or a stack-allocated tuple. Second, threads and advanced mechanisms for implementing first-class continuations, such as the work by Hieb et al. [15], cannot be modeled in this system without adding new primitives.

However, we claim that the framework presented here is a practical approach to compilation. To substantiate this claim, we are constructing a compiler called TALC that maps the KML language [10] to a variant of STAL described here, suitably adapted for the Intel IA32 architecture. We have found it straightforward to enrich the target language type system to include support for other type constructors, such as references, higher-order constructors, and recursive types. The compiler uses an unboxed stack allocation style of continuation passing.
Although we have discussed mechanisms for typing stacks at the assembly language level, our techniques generalize to other languages. The same mechanisms, including the use of polymorphic recursion to abstract the tail of a stack, can be used to introduce explicit stacks in higher level calculi. An intermediate language with explicit stacks would allow control over allocation at a point where more information is available to guide allocation decisions.
6 Summary
We have given a type system for a typed assembly language with both a heap and a stack. Our language is flexible enough to support the following compilation techniques: CPS using both heap allocation and stack allocation, a variety of procedure calling conventions, displays, exceptions, tail call elimination, and callee-saves registers. A key contribution of the type system is that it makes procedure calling conventions explicit and provides a means of specifying and checking calling conventions that is grounded in language theory. The type system also makes clear the relationship between heap allocation and stack allocation of continuation closures, capturing both allocation strategies in one calculus.
References

1. Andrew W. Appel and Trevor Jim. Continuation-passing, closure-passing style. In Sixteenth ACM Symposium on Principles of Programming Languages, pages 293–302, Austin, January 1989.
2. Andrew W. Appel and David B. MacQueen. Standard ML of New Jersey. In Martin Wirsing, editor, Third International Symposium on Programming Language Implementation and Logic Programming, pages 1–13, New York, August 1991. Springer-Verlag. Volume 528 of Lecture Notes in Computer Science.
3. Andrew W. Appel. Compiling with Continuations. Cambridge University Press, 1992.
4. Andrew Appel and Zhong Shao. Callee-saves registers in continuation-passing style. Lisp and Symbolic Computation, 5:189–219, 1992.
5. Andrew Appel and Zhong Shao. An empirical and analytic study of stack vs. heap cost for languages with closures. Journal of Functional Programming, 1(1), January 1993.
6. Mark Bailey and Jack Davidson. A formal model of procedure calling conventions. In Twenty-Second ACM Symposium on Principles of Programming Languages, pages 298–310, San Francisco, January 1995.
7. Lars Birkedal, Nick Rothwell, Mads Tofte, and David N. Turner. The ML Kit (version 1). Technical Report 93/14, Department of Computer Science, University of Copenhagen, 1993.
8. Lars Birkedal, Mads Tofte, and Magnus Vejlstrup. From region inference to von Neumann machines via region representation inference. In Twenty-Third ACM Symposium on Principles of Programming Languages, pages 171–183, St. Petersburg, January 1996.
9. Val Breazu-Tannen, Thierry Coquand, Carl A. Gunter, and Andre Scedrov. Inheritance as implicit coercion. Information and Computation, 93:172–221, 1991.
10. Karl Crary. KML Reference Manual. Department of Computer Science, Cornell University, 1996.
11. Karl Crary. Foundations for the implementation of higher-order subtyping. In ACM SIGPLAN International Conference on Functional Programming, pages 125–135, Amsterdam, June 1997.
12. Allyn Dimock, Robert Muller, Franklyn Turbak, and J. B. Wells. Strongly typed flow-directed representation transformations. In ACM SIGPLAN International Conference on Functional Programming, pages 11–24, Amsterdam, June 1997.
13. Amer Diwan, David Tarditi, and Eliot Moss. Memory subsystem performance of programs using copying garbage collection. In Twenty-First ACM Symposium on Principles of Programming Languages, pages 1–14, January 1994.
14. Amer Diwan, David Tarditi, and Eliot Moss. Memory system performance of programs with intensive heap allocation. ACM Transactions on Computer Systems, 13(3):244–273, August 1995.
15. Robert Hieb, R. Kent Dybvig, and Carl Bruggeman. Representing control in the presence of first-class continuations. In ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 66–77, June 1990. Published as SIGPLAN Notices, 25(6).
16. Intel Corporation. Intel Architecture Optimization Manual. Intel Corporation, P.O. Box 7641, Mt. Prospect, IL, 60056-7641, 1997.
17. David Kranz, R. Kelsey, J. Rees, P. R. Hudak, J. Philbin, and N. Adams. ORBIT: An optimizing compiler for Scheme. In Proceedings of the ACM SIGPLAN ’86 Symposium on Compiler Construction, pages 219–233, June 1986.
18. P. J. Landin. The mechanical evaluation of expressions. Computer J., 6(4):308–320, 1964.
19. Xavier Leroy. Unboxed objects and polymorphic typing. In Nineteenth ACM Symposium on Principles of Programming Languages, pages 177–188, Albuquerque, January 1992.
20. Tim Lindholm and Frank Yellin. The Java Virtual Machine Specification. Addison-Wesley, 1996.
21. Y. Minamide, G. Morrisett, and R. Harper. Typed closure conversion. In Twenty-Third ACM Symposium on Principles of Programming Languages, pages 271–283, St. Petersburg, January 1996.
22. Gregory Morrisett. Compiling with Types. PhD thesis, Carnegie Mellon University, 1995. Published as CMU Technical Report CMU-CS-95-226.
23. Greg Morrisett, David Walker, Karl Crary, and Neal Glew. From System F to typed assembly language. In Twenty-Fifth ACM Symposium on Principles of Programming Languages, San Diego, January 1998. Extended version published as Cornell University technical report TR97-1651, November 1997.
24. G. Morrisett, D. Tarditi, P. Cheng, C. Stone, R. Harper, and P. Lee. The TIL/ML compiler: Performance and safety through types. In Workshop on Compiler Support for Systems Software, Tucson, February 1996.
25. Simon L. Peyton Jones, Cordelia V. Hall, Kevin Hammond, Will Partain, and Philip Wadler. The Glasgow Haskell compiler: a technical overview. In Proc. UK Joint Framework for Information Technology (JFIT) Technical Conference, July 1993.
26. John C. Reynolds. Types, abstraction and parametric polymorphism. In Information Processing ’83, pages 513–523. North-Holland, 1983. Proceedings of the IFIP 9th World Computer Congress.
27. John Reynolds. Using functor categories to generate intermediate code. In Twenty-Second ACM Symposium on Principles of Programming Languages, pages 25–36, San Francisco, January 1995.
28. Zhong Shao. Flexible representation analysis. In ACM SIGPLAN International Conference on Functional Programming, pages 85–98, Amsterdam, June 1997.
29. Z. Shao. An overview of the FLINT/ML compiler. In Workshop on Types in Compilation, Amsterdam, June 1997. ACM SIGPLAN. Published as Boston College Computer Science Dept. Technical Report BCCS-97-03.
30. Raymie Stata and Martín Abadi. A type system for Java bytecode subroutines. In Twenty-Fifth ACM Symposium on Principles of Programming Languages, San Diego, January 1998.
31. Guy L. Steele Jr. Rabbit: A compiler for Scheme. Master’s thesis, MIT, 1978.
32. D. Tarditi, G. Morrisett, P. Cheng, C. Stone, R. Harper, and P. Lee. TIL: A type-directed optimizing compiler for ML. In ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 181–192, Philadelphia, May 1996.
33. Mads Tofte and Jean-Pierre Talpin. Implementation of the typed call-by-value λ-calculus using a stack of regions. In Twenty-First ACM Symposium on Principles of Programming Languages, pages 188–201, January 1994.
34. Mitchell Wand. Continuation-based multiprocessing. In Proceedings of the 1980 LISP Conference, pages 19–28, August 1980.
A Formal STAL Semantics
This appendix contains a complete technical description of our calculus, STAL. The STAL abstract machine is very similar to the TAL abstract machine (described in detail in Morrisett et al. [23]). The syntax appears in Figure 6. The operational semantics is given as a deterministic rewriting system in Figure 7. The notation a[b/c] denotes capture-avoiding substitution of b for c in a. The notation a{b ↦ c} represents map update:

    {b1 ↦ c1, b2 ↦ c2, . . . , bn ↦ cn}{b ↦ c} =
        {b ↦ c, b1 ↦ c1, . . . , bn ↦ cn},    if b ∉ {b1, . . . , bn}
        {b1 ↦ c, b2 ↦ c2, . . . , bn ↦ cn},   if b = b1

To make the presentation simpler for the branching rules, some extra notation is used for expressing sequences of type and stack type instantiations. We introduce a new syntactic class (ψ) for type sequences:

    ψ ::= · | τ, ψ | σ, ψ

The notation w[ψ] stands for the obvious iteration of instantiations; the substitution notation I[ψ/∆] is defined by:

    I[·/·] = I
    I[τ, ψ/α, ∆] = I[τ/α][ψ/∆]
    I[σ, ψ/ρ, ∆] = I[σ/ρ][ψ/∆]
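As a quick illustration of the map-update operation (our own, not part of the formal development), it behaves like an association-list update in Standard ML:

    (* a{b |-> c} on a finite map represented as an association list:
       replace the binding for b if present, otherwise add one. *)
    fun update ([], b, c) = [(b, c)]
      | update ((b', c') :: rest, b, c) =
          if b = b' then (b, c) :: rest
          else (b', c') :: update (rest, b, c)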
The static semantics is similar to TAL’s but requires extra judgments for definitional equality of types, stack types, and register file types, and uses a more compositional style for instructions. Definitional equality is needed because two stack types (such as (int::nil) ◦ (int::nil) and int::int::nil) may be syntactically different but represent the same type. The judgments are summarized in Figure 8, the rules for type judgments appear in Figure 9, and the rules for term judgments appear in Figures 10 and 11. The notation ∆′, ∆ denotes appending ∆′ to the front of ∆, that is:

    ·, ∆ = ∆
    (α, ∆′), ∆ = α, (∆′, ∆)
    (ρ, ∆′), ∆ = ρ, (∆′, ∆)

As with TAL, STAL is type sound:

Proposition A1 (Type Soundness). If ⊢ P and P ⟼* P′ then P′ is not stuck.

This proposition is proved using the following two lemmas.

Lemma 1 (Subject Reduction). If ⊢ P and P ⟼ P′ then ⊢ P′.

A well-formed terminal state has the form (H, R{r1 ↦ w}, halt[τ]) where there exists a Ψ such that ⊢ H : Ψ and Ψ; · ⊢ w : τ wval.

Lemma 2 (Progress). If ⊢ P then either P is a well-formed terminal state or there exists P′ such that P ⟼ P′.
types                  τ ::= α | int | ns | ∀[∆].Γ | ⟨τ1^φ1, . . . , τn^φn⟩ | ∃α.τ | ptr(σ)
stack types            σ ::= ρ | nil | τ::σ | σ1 ◦ σ2
initialization flags   φ ::= 0 | 1
label assignments      Ψ ::= {ℓ1:τ1, . . . , ℓn:τn}
type assignments       ∆ ::= · | α, ∆ | ρ, ∆
register assignments   Γ ::= {r1:τ1, . . . , rn:τn, sp:σ}

registers              r ::= r1 | · · · | rk
word values            w ::= ℓ | i | ns | ?τ | w[τ] | w[σ] | pack[τ, w] as τ′ | ptr(i)
small values           v ::= r | w | v[τ] | v[σ] | pack[τ, v] as τ′
heap values            h ::= ⟨w1, . . . , wn⟩ | code[∆]Γ.I
heaps                  H ::= {ℓ1 ↦ h1, . . . , ℓn ↦ hn}
register files         R ::= {r1 ↦ w1, . . . , rn ↦ wn, sp ↦ S}
stacks                 S ::= nil | w::S

instructions           ι ::= aop rd, rs, v | bop r, v | ld rd, rs(i) | malloc r[τ̃] | mov rd, v
                             | mov sp, rs | mov rd, sp | salloc n | sfree n | sld rd, sp(i)
                             | sld rd, rs(i) | sst sp(i), rs | sst rd(i), rs | st rd(i), rs
                             | unpack[α, rd], v
arithmetic ops         aop ::= add | sub | mul
branch ops             bop ::= beq | bneq | bgt | blt | bgte | blte
instruction sequences  I ::= ι; I | jmp v | halt[τ]
programs               P ::= (H, R, I)
Fig. 6. Syntax of STAL
(H, R, I) ⟼ P, where:

if I =                             then P =

add rd, rs, v; I′                  (H, R{rd ↦ R(rs) + R̂(v)}, I′)
                                     and similarly for mul and sub
beq r, v; I′                       (H, R, I′) when R(r) ≠ 0
                                     and similarly for bneq, blt, etc.
beq r, v; I′                       (H, R, I″[ψ/∆]) when R(r) = 0,
                                     where R̂(v) = ℓ[ψ] and H(ℓ) = code[∆]Γ.I″,
                                     and similarly for bneq, blt, etc.
jmp v                              (H, R, I′[ψ/∆])
                                     where R̂(v) = ℓ[ψ] and H(ℓ) = code[∆]Γ.I′
ld rd, rs(i); I′                   (H, R{rd ↦ wi}, I′)
                                     where R(rs) = ℓ and H(ℓ) = ⟨w0, . . . , wn−1⟩ and 0 ≤ i < n
malloc rd[τ1, . . . , τn]; I′      (H{ℓ ↦ ⟨?τ1, . . . , ?τn⟩}, R{rd ↦ ℓ}, I′)
                                     where ℓ ∉ H
mov rd, v; I′                      (H, R{rd ↦ R̂(v)}, I′)
mov rd, sp; I′                     (H, R{rd ↦ ptr(|S|)}, I′) where R(sp) = S
mov sp, rs; I′                     (H, R{sp ↦ wj:: · · · ::w1::nil}, I′)
                                     where R(sp) = wn:: · · · ::w1::nil and R(rs) = ptr(j) with 0 ≤ j ≤ n
salloc n; I′                       (H, R{sp ↦ ns:: · · · ::ns::R(sp)}, I′)   (n nonsense words)
sfree n; I′                        (H, R{sp ↦ S}, I′) where R(sp) = w1:: · · · ::wn::S
sld rd, sp(i); I′                  (H, R{rd ↦ wi}, I′)
                                     where R(sp) = w0:: · · · ::wn−1::nil and 0 ≤ i < n
sld rd, rs(i); I′                  (H, R{rd ↦ wj−i}, I′)
                                     where R(rs) = ptr(j) and R(sp) = wn:: · · · ::w1::nil and 0 ≤ i < j ≤ n
sst sp(i), rs; I′                  (H, R{sp ↦ w0:: · · · ::wi−1::R(rs)::S}, I′)
                                     where R(sp) = w0:: · · · ::wi::S and 0 ≤ i
sst rd(i), rs; I′                  (H, R{sp ↦ wn:: · · · ::wj−i+1::R(rs)::wj−i−1:: · · · ::w1::nil}, I′)
                                     where R(rd) = ptr(j) and R(sp) = wn:: · · · ::w1::nil and 0 ≤ i < j ≤ n
st rd(i), rs; I′                   (H{ℓ ↦ ⟨w0, . . . , wi−1, R(rs), wi+1, . . . , wn−1⟩}, R, I′)
                                     where R(rd) = ℓ and H(ℓ) = ⟨w0, . . . , wn−1⟩ and 0 ≤ i < n
unpack[α, rd], v; I′               (H, R{rd ↦ w}, I′[τ/α]) where R̂(v) = pack[τ, w] as τ′

where R̂(v) = R(r)                    when v = r
              w                      when v = w
              R̂(v′)[τ]               when v = v′[τ]
              pack[τ, R̂(v′)] as τ′   when v = pack[τ, v′] as τ′
Fig. 7. Operational Semantics of STAL
Judgement               Meaning
∆ ⊢ τ                   τ is a valid type
∆ ⊢ σ                   σ is a valid stack type
⊢ Ψ                     Ψ is a valid heap type (no context is used because heap types must be closed)
∆ ⊢ Γ                   Γ is a valid register file type
∆ ⊢ τ1 = τ2             τ1 and τ2 are equal types
∆ ⊢ σ1 = σ2             σ1 and σ2 are equal stack types
∆ ⊢ Γ1 = Γ2             Γ1 and Γ2 are equal register file types
∆ ⊢ τ1 ≤ τ2             τ1 is a subtype of τ2
∆ ⊢ Γ1 ≤ Γ2             Γ1 is a register file subtype of Γ2
⊢ H : Ψ                 the heap H has type Ψ
Ψ ⊢ S : σ               the stack S has type σ
Ψ ⊢ R : Γ               the register file R has type Γ
Ψ ⊢ h : τ hval          the heap value h has type τ
Ψ; ∆ ⊢ w : τ wval       the word value w has type τ
Ψ; ∆ ⊢ w : τ^φ fwval    the word value w has flagged type τ^φ (i.e., w has type τ, or w is ?τ and φ is 0)
Ψ; ∆; Γ ⊢ v : τ         the small value v has type τ
Ψ; ∆; Γ ⊢ ι ⇒ ∆′; Γ′    given a context of type Ψ; ∆; Γ, ι is a well-formed instruction and produces a context of type Ψ; ∆′; Γ′
Ψ; ∆; Γ ⊢ I             I is a valid sequence of instructions
⊢ P                     P is a valid program
Fig. 8. Static Semantics of STAL (judgments)
(Rules are written premises ⟹ conclusion, with side conditions in parentheses.)

Well-formedness (∆ ⊢ τ, ∆ ⊢ σ, ⊢ Ψ, ∆ ⊢ Γ):

  ∆ ⊢ τ = τ ⟹ ∆ ⊢ τ        ∆ ⊢ σ = σ ⟹ ∆ ⊢ σ        ∆ ⊢ Γ = Γ ⟹ ∆ ⊢ Γ
  · ⊢ τi (for each i) ⟹ ⊢ {ℓ1:τ1, . . . , ℓn:τn}

Type equality (∆ ⊢ τ1 = τ2):

  ∆ ⊢ τ2 = τ1 ⟹ ∆ ⊢ τ1 = τ2
  ∆ ⊢ τ1 = τ2 and ∆ ⊢ τ2 = τ3 ⟹ ∆ ⊢ τ1 = τ3
  ∆ ⊢ α = α   (α ∈ ∆)        ∆ ⊢ int = int        ∆ ⊢ ns = ns
  ∆′, ∆ ⊢ Γ1 = Γ2 ⟹ ∆ ⊢ ∀[∆′].Γ1 = ∀[∆′].Γ2
  ∆ ⊢ τi = τi′ (for each i) ⟹ ∆ ⊢ ⟨τ1^φ1, . . . , τn^φn⟩ = ⟨τ1′^φ1, . . . , τn′^φn⟩
  α, ∆ ⊢ τ1 = τ2 ⟹ ∆ ⊢ ∃α.τ1 = ∃α.τ2
  ∆ ⊢ σ1 = σ2 ⟹ ∆ ⊢ ptr(σ1) = ptr(σ2)

Stack type equality (∆ ⊢ σ1 = σ2):

  ∆ ⊢ σ2 = σ1 ⟹ ∆ ⊢ σ1 = σ2
  ∆ ⊢ σ1 = σ2 and ∆ ⊢ σ2 = σ3 ⟹ ∆ ⊢ σ1 = σ3
  ∆ ⊢ ρ = ρ   (ρ ∈ ∆)        ∆ ⊢ nil = nil
  ∆ ⊢ τ1 = τ2 and ∆ ⊢ σ1 = σ2 ⟹ ∆ ⊢ τ1::σ1 = τ2::σ2
  ∆ ⊢ σ1 = σ1′ and ∆ ⊢ σ2 = σ2′ ⟹ ∆ ⊢ σ1 ◦ σ2 = σ1′ ◦ σ2′
  ∆ ⊢ σ ⟹ ∆ ⊢ nil ◦ σ = σ        ∆ ⊢ σ ⟹ ∆ ⊢ σ ◦ nil = σ
  ∆ ⊢ τ, ∆ ⊢ σ1, and ∆ ⊢ σ2 ⟹ ∆ ⊢ (τ::σ1) ◦ σ2 = τ::(σ1 ◦ σ2)
  ∆ ⊢ σ1, ∆ ⊢ σ2, and ∆ ⊢ σ3 ⟹ ∆ ⊢ (σ1 ◦ σ2) ◦ σ3 = σ1 ◦ (σ2 ◦ σ3)

Register file equality (∆ ⊢ Γ1 = Γ2):

  ∆ ⊢ σ = σ′ and ∆ ⊢ τi = τi′ (for each i) ⟹
    ∆ ⊢ {sp:σ, r1:τ1, . . . , rn:τn} = {sp:σ′, r1:τ1′, . . . , rn:τn′}

Subtyping (∆ ⊢ τ1 ≤ τ2 and ∆ ⊢ Γ1 ≤ Γ2):

  ∆ ⊢ τ1 = τ2 ⟹ ∆ ⊢ τ1 ≤ τ2
  ∆ ⊢ τ1 ≤ τ2 and ∆ ⊢ τ2 ≤ τ3 ⟹ ∆ ⊢ τ1 ≤ τ3
  ∆ ⊢ τi ⟹ ∆ ⊢ ⟨τ1^φ1, . . . , τi^1, . . . , τn^φn⟩ ≤ ⟨τ1^φ1, . . . , τi^0, . . . , τn^φn⟩
  ∆ ⊢ σ = σ′, ∆ ⊢ τi = τi′ (for 1 ≤ i ≤ n), and ∆ ⊢ τi (for n < i ≤ m) ⟹
    ∆ ⊢ {sp:σ, r1:τ1, . . . , rm:τm} ≤ {sp:σ′, r1:τ1′, . . . , rn:τn′}   (m ≥ n)

Fig. 9. Static Semantics of STAL, Judgments for Types
Programs, heaps, stacks, and register files (⊢ P, ⊢ H : Ψ, Ψ ⊢ S : σ, Ψ ⊢ R : Γ):

  ⊢ H : Ψ, Ψ ⊢ R : Γ, and Ψ; ·; Γ ⊢ I ⟹ ⊢ (H, R, I)
  ⊢ Ψ and Ψ ⊢ hi : τi hval (for each i) ⟹
    ⊢ {ℓ1 ↦ h1, . . . , ℓn ↦ hn} : Ψ   (Ψ = {ℓ1:τ1, . . . , ℓn:τn})
  Ψ ⊢ nil : nil
  Ψ; · ⊢ w : τ wval and Ψ ⊢ S : σ ⟹ Ψ ⊢ w::S : τ::σ
  Ψ ⊢ S : σ and Ψ; · ⊢ wi : τi wval (for 1 ≤ i ≤ n) ⟹
    Ψ ⊢ {sp ↦ S, r1 ↦ w1, . . . , rm ↦ wm} : {sp:σ, r1:τ1, . . . , rn:τn}   (m ≥ n)

Heap values (Ψ ⊢ h : τ hval):

  Ψ; · ⊢ wi : τi^φi fwval (for each i) ⟹ Ψ ⊢ ⟨w1, . . . , wn⟩ : ⟨τ1^φ1, . . . , τn^φn⟩ hval
  ∆ ⊢ Γ and Ψ; ∆; Γ ⊢ I ⟹ Ψ ⊢ code[∆]Γ.I : ∀[∆].Γ hval

Word values (Ψ; ∆ ⊢ w : τ wval and Ψ; ∆ ⊢ w : τ^φ fwval):

  ∆ ⊢ τ1 ≤ τ2 ⟹ Ψ; ∆ ⊢ ℓ : τ2 wval   (Ψ(ℓ) = τ1)
  Ψ; ∆ ⊢ i : int wval        Ψ; ∆ ⊢ ns : ns wval
  Ψ; ∆ ⊢ w : ∀[α, ∆′].Γ wval and ∆ ⊢ τ ⟹ Ψ; ∆ ⊢ w[τ] : ∀[∆′].Γ[τ/α] wval
  Ψ; ∆ ⊢ w : ∀[ρ, ∆′].Γ wval and ∆ ⊢ σ ⟹ Ψ; ∆ ⊢ w[σ] : ∀[∆′].Γ[σ/ρ] wval
  ∆ ⊢ τ and Ψ; ∆ ⊢ w : τ′[τ/α] wval ⟹ Ψ; ∆ ⊢ pack[τ, w] as ∃α.τ′ : ∃α.τ′ wval
  ∆ ⊢ σ ⟹ Ψ; ∆ ⊢ ptr(i) : ptr(σ) wval   (|σ| = i)
  Ψ; ∆ ⊢ w : τ wval ⟹ Ψ; ∆ ⊢ w : τ^φ fwval
  ∆ ⊢ τ ⟹ Ψ; ∆ ⊢ ?τ : τ^0 fwval

Small values (Ψ; ∆; Γ ⊢ v : τ):

  Ψ; ∆; Γ ⊢ r : τ   (Γ(r) = τ)
  Ψ; ∆ ⊢ w : τ wval ⟹ Ψ; ∆; Γ ⊢ w : τ
  Ψ; ∆; Γ ⊢ v : ∀[α, ∆′].Γ′ and ∆ ⊢ τ ⟹ Ψ; ∆; Γ ⊢ v[τ] : ∀[∆′].Γ′[τ/α]
  Ψ; ∆; Γ ⊢ v : ∀[ρ, ∆′].Γ′ and ∆ ⊢ σ ⟹ Ψ; ∆; Γ ⊢ v[σ] : ∀[∆′].Γ′[σ/ρ]
  ∆ ⊢ τ and Ψ; ∆; Γ ⊢ v : τ′[τ/α] ⟹ Ψ; ∆; Γ ⊢ pack[τ, v] as ∃α.τ′ : ∃α.τ′

Equality:

  · ⊢ τ1 = τ2 and Ψ ⊢ h : τ2 hval ⟹ Ψ ⊢ h : τ1 hval
  ∆ ⊢ τ1 = τ2 and Ψ; ∆ ⊢ w : τ2 wval ⟹ Ψ; ∆ ⊢ w : τ1 wval
  ∆ ⊢ τ1 = τ2 and Ψ; ∆; Γ ⊢ v : τ2 ⟹ Ψ; ∆; Γ ⊢ v : τ1

Instruction sequences (Ψ; ∆; Γ ⊢ I):

  Ψ; ∆; Γ ⊢ ι ⇒ ∆′; Γ′ and Ψ; ∆′; Γ′ ⊢ I ⟹ Ψ; ∆; Γ ⊢ ι; I
  ∆ ⊢ Γ1 ≤ Γ2 and Ψ; ∆; Γ1 ⊢ v : ∀[].Γ2 ⟹ Ψ; ∆; Γ1 ⊢ jmp v
  ∆ ⊢ τ and Ψ; ∆; Γ ⊢ r1 : τ ⟹ Ψ; ∆; Γ ⊢ halt[τ]

Fig. 10. STAL Static Semantics, Term Constructs except Instructions
Instructions (Ψ; ∆; Γ ⊢ ι ⇒ ∆′; Γ′):

  Ψ; ∆; Γ ⊢ rs : int and Ψ; ∆; Γ ⊢ v : int ⟹
    Ψ; ∆; Γ ⊢ aop rd, rs, v ⇒ ∆; Γ{rd:int}
  Ψ; ∆; Γ1 ⊢ r : int, Ψ; ∆; Γ1 ⊢ v : ∀[].Γ2, and ∆ ⊢ Γ1 ≤ Γ2 ⟹
    Ψ; ∆; Γ1 ⊢ bop r, v ⇒ ∆; Γ1
  Ψ; ∆; Γ ⊢ rs : ⟨τ0^φ0, . . . , τn−1^φn−1⟩ ⟹
    Ψ; ∆; Γ ⊢ ld rd, rs(i) ⇒ ∆; Γ{rd:τi}   (φi = 1 ∧ 0 ≤ i < n)
  ∆ ⊢ τi (for each i) ⟹
    Ψ; ∆; Γ ⊢ malloc r[τ1, . . . , τn] ⇒ ∆; Γ{r:⟨τ1^0, . . . , τn^0⟩}
  Ψ; ∆; Γ ⊢ v : τ ⟹ Ψ; ∆; Γ ⊢ mov rd, v ⇒ ∆; Γ{rd:τ}
  Ψ; ∆; Γ ⊢ mov rd, sp ⇒ ∆; Γ{rd:ptr(σ)}   (Γ(sp) = σ)
  Ψ; ∆; Γ ⊢ rs : ptr(σ2) and ∆ ⊢ σ1 = σ3 ◦ σ2 ⟹
    Ψ; ∆; Γ ⊢ mov sp, rs ⇒ ∆; Γ{sp:σ2}   (Γ(sp) = σ1)
  Ψ; ∆; Γ ⊢ salloc n ⇒ ∆; Γ{sp:ns:: · · · ::ns::σ} (n nonsense slots)   (Γ(sp) = σ)
  ∆ ⊢ σ1 = τ0:: · · · ::τn−1::σ2 ⟹
    Ψ; ∆; Γ ⊢ sfree n ⇒ ∆; Γ{sp:σ2}   (Γ(sp) = σ1)
  ∆ ⊢ σ1 = τ0:: · · · ::τi::σ2 ⟹
    Ψ; ∆; Γ ⊢ sld rd, sp(i) ⇒ ∆; Γ{rd:τi}   (Γ(sp) = σ1 ∧ 0 ≤ i)
  Ψ; ∆; Γ ⊢ rs : ptr(σ3), ∆ ⊢ σ1 = σ2 ◦ σ3, and ∆ ⊢ σ3 = τ0:: · · · ::τi::σ4 ⟹
    Ψ; ∆; Γ ⊢ sld rd, rs(i) ⇒ ∆; Γ{rd:τi}   (Γ(sp) = σ1 ∧ 0 ≤ i)
  ∆ ⊢ σ1 = τ0:: · · · ::τi::σ2 and Ψ; ∆; Γ ⊢ rs : τ ⟹
    Ψ; ∆; Γ ⊢ sst sp(i), rs ⇒ ∆; Γ{sp:τ0:: · · · ::τi−1::τ::σ2}   (Γ(sp) = σ1 ∧ 0 ≤ i)
  Ψ; ∆; Γ ⊢ rd : ptr(σ3), Ψ; ∆; Γ ⊢ rs : τ, ∆ ⊢ σ1 = σ2 ◦ σ3,
  ∆ ⊢ σ3 = τ0:: · · · ::τi::σ4, and ∆ ⊢ σ5 = τ0:: · · · ::τi−1::τ::σ4 ⟹
    Ψ; ∆; Γ ⊢ sst rd(i), rs ⇒ ∆; Γ{sp:σ2 ◦ σ5, rd:ptr(σ5)}   (Γ(sp) = σ1 ∧ 0 ≤ i)
  Ψ; ∆; Γ ⊢ rd : ⟨τ0^φ0, . . . , τn−1^φn−1⟩ and Ψ; ∆; Γ ⊢ rs : τi ⟹
    Ψ; ∆; Γ ⊢ st rd(i), rs ⇒ ∆; Γ{rd:⟨τ0^φ0, . . . , τi−1^φi−1, τi^1, τi+1^φi+1, . . . , τn−1^φn−1⟩}   (0 ≤ i < n)
  Ψ; ∆; Γ ⊢ v : ∃α.τ ⟹
    Ψ; ∆; Γ ⊢ unpack[α, rd], v ⇒ α, ∆; Γ{rd:τ}   (α ∉ ∆)

Fig. 11. STAL Static Semantics, Instructions
How Generic Is a Generic Back End? Using MLRISC as a Back End for the TIL Compiler⋆ (Preliminary Report)

Andrew Bernard⋆⋆, Robert Harper, and Peter Lee

School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
Abstract. We describe the integration of MLRISC, a “generic” compiler back end, with TIL, a type-directed compiler for Standard ML. The TIL run-time system uses a form of type information to enable partially tag-free garbage collection. We show how we propagate this information through the final phases of the compiler, even though the back end is unaware of the existence of this information. Additionally, we identify the characteristics of MLRISC that enable us to use it with TIL and suggest ways in which it might better support our compiler. Preliminary performance measurements show that we pay a significant cost for using MLRISC, relative to a custom back end.
1 Introduction
We describe how we integrated MLRISC, a “generic” compiler back end, with TIL, a type-directed compiler for the Standard ML (SML) programming language. A type-directed compiler uses variable type information to guide successive translations between intermediate languages [13]. Type-directed compilers rely on complete variable type information for most or all phases of compilation—thus, types are preserved by intermediate code transformations during each phase. A generic compiler back end translates a low-level intermediate language

⋆ This research was sponsored in part by the Advanced Research Projects Agency CSTO under the title “The Fox Project: Advanced Languages for Systems Software”, ARPA Order No. C533, issued by ESC/ENS under Contract No. F19628-95-C-0050, and in part by the National Science Foundation under Grant No. CCR-9502674. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing official policies, either expressed or implied, of the Defense Advanced Research Projects Agency, the National Science Foundation, or the U.S. Government.
⋆⋆ This material is based on work supported under a National Science Foundation Graduate Fellowship. Any opinions, findings, conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
into machine code—the intention is that the back end does not depend on a particular source language or front-end implementation technology. A compiler back end could itself be type directed—both Typed Assembly Language [12] and Proof-Carrying Code [14] [15] encode variable type information at the assembly-language level—although this is not common practice, and, as a matter of fact, MLRISC is not type directed.

TIL translates a source program through a succession of typed intermediate languages until it arrives at conventional assembly code. The typed intermediate languages used in TIL specify types explicitly at all variable binding sites. Thus, TIL can always determine the type of a variable without having to resort to type inference. The universal availability of type information permits TIL to check types after any phase of compilation—this helps to ensure the correctness of compiler optimizations during development. Type information also allows TIL to perform additional optimizations that are not directly available to conventional compilers [13] [10].

A principal benefit of type-directed compilation is that it facilitates sound tag-free garbage collection [18]. Sound garbage collection requires that the heap pointers used by an executing program be identified unambiguously. Tag-free garbage collection permits these heap pointers to be identified without perturbing the run-time representations of values. In TIL’s run-time model, word-sized values (e.g., integers and heap pointers) are not tagged, whereas composite values contain tags for their constituent locations. The location of a word-sized value is tagged instead of the value itself—this “out-of-band” tagging scheme allows a single location tag, or trace value, to identify many different run-time values. A trace table is a static encoding of trace values either for a given procedure activation or for a set of static storage locations. A procedure activation requires trace values for the machine registers used by the procedure as well as its stack frame slots. A complete set of trace tables can be synthesized at compile time for a program because all the possible procedure activation shapes can be statically determined.

RTL (Register Transfer Language) is the lowest-level intermediate language used in TIL. A back end for TIL thus translates RTL to assembly language. RTL is an imperative language that resembles the instruction set of a RISC processor, but it also provides complex primitives that are tailored to SML. An RTL pseudo register identifies a procedure-local, word-sized storage location: pseudo registers are mapped to machine registers and stack slots by the back end. RTL is not a typed intermediate language, but RTL does annotate pseudo registers with trace values; these are similar to, but distinct from, run-time trace values (the latter are a translation of the former).

There are actually two versions of the TIL compiler: TIL1 [17] [13] is the first-generation compiler; its successor is called TILT. We will refer to the TIL compiler when the discussion applies to either compiler interchangeably. Both compilers share a common RTL language: the back end of TIL1 translates RTL directly to assembly language. This back end operates on RTL itself and explicitly propagates trace values through the spilling and register allocation phases.
As only the trace value component of this back end was customized for TIL1, it largely duplicates other work. MLRISC¹, on the other hand, is a compiler back end implemented in SML that transforms an abstract intermediate language into the assembly language for a particular processor architecture. Taking RTL, MLRISC, and the TIL run-time system as given, our task is to transform RTL code into MLRISC code, and to make the object code generated by MLRISC compatible with the TIL run-time system. We impose an additional constraint on our implementation: we may not customize the interface of MLRISC specifically for TIL, as this would make it necessary to track such customizations across new versions of MLRISC.

¹ We chose to use MLRISC as a back end for our compiler because our research does not directly address back end implementation technology. With MLRISC, we hope to leverage the work of a larger group of researchers.

As MLRISC does not propagate trace information, we cannot use it as a “drop-in” replacement for the TIL1 back end—we must have an additional mechanism that derives run-time trace values from RTL trace values. Run-time trace values are encoded with references to machine register numbers and stack slots. This means that our mechanism must operate in concert with MLRISC, because the global register allocation and spilling phases of MLRISC assign these locations. This is the principal difficulty addressed by our work: how do we translate abstract trace values to concrete trace values “in parallel” with the abstract-to-concrete code translation performed by MLRISC?

Note that significant correctness questions are inevitably raised by the specification of such a translation, because trace values represent invariants—much in the same way that types represent invariants—that may be perturbed by the back end. Trace tables betray an implicit expectation by the run-time system that the object code produced by the back end will (loosely) reflect the original type structure of the program. As the back end is presented with no explicit type structure, how can we expect its code transformations to respect such a type structure? Another contribution of our work is that we describe the implicit constraints that we expect MLRISC to satisfy to ensure the soundness of the trace value translation.

In the remainder of this paper, we focus on how TIL communicates trace information to the run-time system by way of MLRISC. In Section 2 we detail the compiler and run-time system, whereas in Section 3 we present MLRISC. We discuss in Section 4 the techniques we use to marry MLRISC to RTL and the TIL run-time system, and Section 5 is an assessment of our experience. In Section 6 we propose improvements to the current implementation, and in Section 7 we draw together summarizing conclusions.
2 TIL
The TIL compiler is characterized by its aggressive use of type information. TIL compiles programs written in the Standard ML ’97 programming language to DEC Alpha assembly language. All transformations (i.e., compiler phases)
in TIL are based on explicitly typed intermediate languages. However, RTL, the lowest-level intermediate language used in TIL, is not typed in the same sense that the other intermediate languages are typed. RTL pseudo registers are tagged with trace values that represent a degenerated form of type information that is tailored to the run-time system. Figure 1 is a depiction of the intermediate languages used in TILT; see Morrisett [13] for a description of the intermediate languages used in TIL1.

    SML → Abstract Syntax → HIL → MIL → RTL → MLRISC → Assembly Language

Fig. 1. Intermediate Languages in TILT
2.1 HIL and MIL
The TILT elaborator translates programs from abstract syntax to HIL (High-level Intermediate Language). HIL is an explicitly-typed refinement of the SML programming language, including the module system; a detailed discussion of HIL is beyond the scope of this paper (see Harper and Stone [11] for further details). HIL is translated to MIL (Mid-level Intermediate Language) by the phase splitter, which is responsible for eliminating modules and breaking abstraction barriers. MIL is a lower-level, explicitly-typed, polymorphic intermediate language that does not provide modules.

2.2 RTL
TILT translates MIL to RTL (Register Transfer Language) after performing closure conversion, determining data representations, and making heap allocation explicit, among other things. RTL resembles an abstract assembly language in which there is an unbounded supply of local pseudo registers for each procedure. Pseudo registers are identified by positive integers and are automatically mapped to machine registers and stack slots by the back end. Each pseudo register is annotated with a trace value that classifies the kinds of values that the register can contain—one can think of trace values as degenerated type information that is present only for the benefit of the run-time system. RTL trace values are derived directly from MIL variable types.

Figure 3 contains one possible RTL translation² of the SML function in Figure 2³. Pseudo registers are given names (e.g., x) in this example to clarify

² This is not the actual RTL code currently produced by TILT for this function: it has been simplified by hand to clarify its correspondence to the original SML code. This correspondence is obscured by the poor RTL code that TILT currently generates.
³ This function was not written to compute anything interesting. Rather, the particular intermediate code it translates to helps to illustrate points later in this paper.
the presentation; in an actual RTL program, pseudo registers are identified by positive integers. Variables introduced by the compiler are prefixed by an underscore. Pseudo register trace values are written following the pseudo-register name in parentheses—trace values encode the “traceability” of a pseudo register, perhaps by projecting the contents of another pseudo register (Table 1). Trace values identify pointers into the heap to the run-time system—we explain the role of run-time trace values in Section 2.3. Section 2.3 also contains a discussion of type environments, of which _tenv1(trace) is one example.

    fun f(x, n: int, l: int list, l2: int list) =
      g(x, n, if length l > 0 then hd l else 1)

Fig. 2. An SML Function to Take the Head of a List
trace          A pointer into the heap
notrace int    An integer
notrace code   A pointer to machine code
notrace real   A floating-point number
label          A pointer to data, but not into the heap
compute path   May be a pointer into the heap: path is an expression that evaluates (at run time) to the actual trace value
unset          Uninitialized
locative       A pointer into the middle of an item in the heap (cannot be traced)

Table 1. Trace Values for an RTL Pseudo Register
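One plausible Standard ML rendering of these annotations is sketched below; the constructor names and the shape of path are our own guesses at a natural encoding, not TIL’s actual datatype:

    datatype path = Preg of int          (* a pseudo register *)
                  | Proj of path * int   (* projection, e.g. _tenv1.0 *)

    datatype trace = Trace           (* a pointer into the heap *)
                   | NotraceInt      (* an integer *)
                   | NotraceCode     (* a pointer to machine code *)
                   | NotraceReal     (* a floating-point number *)
                   | Label           (* data pointer, not into the heap *)
                   | Compute of path (* trace value computed at run time *)
                   | Unset           (* uninitialized *)
                   | Locative        (* pointer into the middle of a heap item *)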
On entry to the body of the procedure, the argument pseudo registers listed in the procedure header contain the values of the actual arguments passed to the procedure. Similarly, on exit from the body, the result pseudo register listed in the procedure header will be copied into the actual machine-level result register, if necessary. The arguments and results of each procedure call are simply listed in order, as RTL uses an implicit calling convention. When a given pseudo register needs to be moved to/from a specific argument/result machine register according to a particular calling convention, this code is generated as part of the call/entry/exit sequence.

2.3 The Run-Time System
Run-Time Type Information. Certain benefits arise from the use of typed intermediate languages: for example, types can be checked after compiler passes to help ensure correctness. Additionally, type-based information can be used at
    procedure f:
      arguments = [_tenv1(trace), x(compute _tenv1(trace).0),
                   n(notrace_int), l(trace), l2(trace)]     ; arguments to f
      results   = [_t3(notrace_int)]                        ; result of f
    {
      call "length"
        arguments = [l(trace)]
        results   = [_t1(notrace_int)]      ; _t1 <- length(l)   (call #1)
      bcndi2 le, _t1(notrace_int), 0, _L1   ; if _t1 <= 0 goto _L1
      call "hd"
        arguments = [l(trace)]
        results   = [_t2(notrace_int)]      ; _t2 <- hd(l)       (call #2)
      br _L2                                ; goto _L2
    _L1:
      mv 1, _t2(notrace_int)                ; _t2 <- 1
    _L2:
      call "g"
        arguments = [_tenv1(trace), x(compute _tenv1(trace).0),
                     n(notrace_int), _t2(notrace_int)]
        results   = [_t3(notrace_int)]      ; _t3 <- g(x, n, _t2) (call #3)
    }

Fig. 3. A Translation of the code in Figure 2 to RTL
run time for dynamic type dispatch and tag-free garbage collection [13]. In TIL, the main ramification for the back end is that the run-time system uses a simple form of type information to reclaim storage.

The TIL run-time system uses a tracing, copying garbage collector to reclaim unused values in the heap. When the garbage collector is invoked, it must determine the locations of all the heap pointers that are in use so that it does not reclaim accessible memory. The garbage collector is said to trace these pointers to determine the layout of the objects they address, and to copy these objects to new locations. Traceable pointers may reside in registers, on the stack, in the data segment, or in the heap. However, these locations may also contain a variety of non-pointer values (e.g., word-sized integers) that cannot be safely traced. The “traceability” of a given location is determined by its type: trace values, which are derived directly from types, specify to the garbage collector which locations should be traced. Machine registers and stack frame slots are tagged with trace values according to a static table that is indexed by the return address of an active procedure. Static (i.e., global) storage locations are tagged by a corresponding set of static tables. The header of a heap value contains trace values for its slots. Tagging locations is potentially more efficient than tagging values because a single location tag can be shared for many values.

Note that the usual run-time model for garbage collection is not tag free. A typical implementation of tagged garbage collection makes the representations of heap pointers and non-heap pointers disjoint by encoding “tags” in the low-order bits of each word. This approach introduces extra overhead, constrains the range of representable values, and complicates interoperability with other languages. Part of the purpose of TIL is to determine whether these pitfalls can be avoided in a garbage-collected programming language implementation.

Representing Type Information. At run time, vestigial type information is represented as type environments and trace tables. Type environments supply type information for variables whose types cannot be resolved at compile time (e.g., polymorphic variables). For example, in Figure 2, the type of x is polymorphic and thus cannot be statically determined by the compiler—x could take the value 3, "three", [1, 1, 1], or any of a number of values that have distinct representations at run time. A type environment for f has the caller pass an explicit representation of x’s type so that both the run-time system and f can operate on it [18] [10]. Type environments are needed only for functions such as f where complete type information is not available at compile time. A type environment is a record of values that encode properties of types that are important to the run-time system. Note that type environments are unlike the explicit value descriptors used in many other language implementations, in that type environments are constructed only in contexts where they are specifically needed, as opposed to being an integral part of every value.

Trace tables map machine registers, stack slots, and static locations to trace values. A trace table gives a value of yes to those locations that are known to contain pointers into the heap. Other trace values allow the status of a location
to depend on a type environment or on the dynamic caller’s trace table: the trace values that can be attached to a storage location by a trace table are documented in Table 2. This table resembles Table 1 because run-time trace values are derived from the corresponding RTL trace values. By contrast, objects in the heap have special headers that specify trace values for locations in the object.

yes                   Contains a pointer into the heap
no                    Does not contain a pointer into the heap
callee id             Contains the saved value of a callee-save register: id identifies a machine register in the dynamic caller’s activation whose trace value should be used for this location.
stack offset, index   A polymorphic location: offset is the offset in the current stack frame of a pointer to a type environment that contains the trace value of this location at index index.
global label, index   A polymorphic location: label is the label of a type environment in the data segment that contains the trace value of this location at index index.
unset                 Uninitialized
impossible            Contains a heap pointer, but cannot be traced

Table 2. Trace Values for a Trace Table Storage Location
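Table 2 transcribes naturally into a datatype. The following SML sketch is illustrative only: the paper does not show TIL's actual representation, so the names and field types here are assumptions.

(* Hypothetical transcription of Table 2; not TIL's real code. *)
datatype traceValue =
    Yes                                        (* contains a heap pointer *)
  | No                                         (* no heap pointer *)
  | Callee of int                              (* register in the dynamic caller whose
                                                  trace value applies to this location *)
  | Stack of {offset : int, index : int}       (* type environment in the current frame;
                                                  trace value found at the given index *)
  | Global of {label : string, index : int}    (* type environment in the data segment *)
  | Unset                                      (* uninitialized *)
  | Impossible                                 (* heap pointer, but cannot be traced *)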
Locating Pointers. Trace tables are consulted by the run-time system only when the garbage collector is invoked. This invocation takes the form of a library call from an active procedure that is unable to allocate storage. At this time, the collector must locate and trace all pointers into the heap: this process is nontrivial because pointers can potentially reside in any machine register, stack location, or static variable, and because the pointers themselves contain no identifying information (e.g., tags). For static variables, we create a table in a known location that points to the trace tables for all static regions. Tracing the stack is more difficult because the stack depends on the dynamic behavior of the program. However, there are only a statically computable number of distinct activations, each of which can be determined by the compiler. Thus, we index a static table for each activation record according to the return address of a call site in the corresponding procedure. Because the collector is invoked via a procedure call, we simply use the return address in its activation record to locate the trace table of the most recent stack frame. Each trace table includes the size of its stack frame, so we can use these offsets to “walk up” the stack. This means that we need to generate a trace table for each direct call to the collector and for each call to another procedure that might indirectly call the collector.4

At collection time, a callee-save register initialized by an active procedure may have had its value saved on the stack by another procedure, or the original
4 In practice, we simply assume that all procedures might indirectly call the collector.
value may be left intact. The trace table of the most recent procedure activation holds the correct trace values for the machine register file at the time the collector is invoked. If any callee-save registers are not allocated by the most recently called procedure, then their trace values are determined according to the trace table of the next most recently called procedure: this is the function of the callee trace value. A callee trace value is also possible when a callee-save register is saved to the stack (and the register is presumably overwritten)—in this case, the proper stack location is given the callee trace value and the trace table of the next most recent activation record is consulted to determine the status of the stack location. This process can continue as long as the trace value of a location is specified as callee.

Example. In Figure 4, we show a possible DEC Alpha Assembly Language translation of the example function of Figure 2, whereas in Figure 5 we document the registers used in Figure 4. $_tenv, $x, $n, $l, and $l2 are assigned to callee-save temporaries in this procedure (e.g., $11, $12, etc. according to the standard calling convention). $_tenv contains the type environment for the function. The procedure begins by allocating a stack frame of size 32 and saving the return address and the callee-save registers on the stack. The arguments to the procedure are then moved from registers defined by the calling convention into the corresponding callee-save temporaries. Next, a call is made to length with the value of l as an argument (the standard calling convention requires all calls to jump through $pv). The result of the length call is then compared against zero and a branch is taken to _L1 if it is not strictly positive. Assuming the branch is not taken, a call is made to hd and the result is saved in $t2; otherwise $t2 gets the value 1. The two control paths next converge at a call to g with arguments x, n, and $t2—the result of this call becomes the result of f after it restores the caller’s register file from the stack frame. The ldgp instructions are a peculiarity of the Alpha standard calling convention.

The important things to notice in Figure 4 are the three call sites (commented (call #n)) for which we must construct trace tables. These trace tables must correctly identify the trace status of values in the register file and the local stack frame at the time of the corresponding call. In Figure 6, we show a trace table for call #2—notice that the trace values of stack slots saving callee-save registers depend on the dynamic caller’s trace table (these slots are given trace status callee n).
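The stack walk described under Locating Pointers can be made concrete with a hedged SML sketch. lookupTable, traceFrame, and readRetAddr are assumed names standing in for the run-time system's actual operations, which the paper does not show.

(* Sketch only: walk the stack frame by frame.  Each return address
   indexes a static table yielding that frame's trace table, which in
   turn gives the frame size needed to step up to the caller. *)
fun walkStack (lookupTable : word -> {frameSize : word, traceFrame : word -> unit})
              (readRetAddr : word -> word)   (* return address saved in a frame *)
              (sp : word, ra : word) : unit =
  let
    val {frameSize, traceFrame} = lookupTable ra
  in
    traceFrame sp;                     (* trace the pointers in this frame *)
    if frameSize = 0w0 then ()         (* assumed sentinel for the base frame *)
    else walkStack lookupTable readRetAddr (sp + frameSize, readRetAddr sp)
  end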
3 MLRISC
MLRISC [8] is a generic compiler back end developed by Lal George at Bell Laboratories. MLRISC is “generic” in the sense that it can be used to compile many different programming languages. The interface language to MLRISC, also called “MLRISC”, is essentially an architecture-independent assembly language: MLRISC is thus suited to compiling programming languages for which a translation to assembly language is feasible; to date, MLRISC has been used to compile SML
f:      ldgp   $gp, 0($pv)        ; set global pointer
        subl   $sp, 32, $sp       ; alloc frame
        stl    $ra, 0($sp)        ; save return address
        stl    $_tenv, 8($sp)     ; save callee save
        stl    $x, 12($sp)
        stl    $n, 16($sp)
        stl    $l, 20($sp)
        stl    $l2, 24($sp)
        mov    $arg0, $_tenv      ; get arguments
        mov    $arg1, $x
        mov    $arg2, $n
        mov    $arg3, $l
        mov    $arg4, $l2
        stl    $_tenv, 28($sp)    ; save type environment
        mov    $l, $arg0          ; $t1 <- length(l)
        lda    $pv, length
        jsr    $ra, ($pv)         ; (call #1)
        ldgp   $gp, 0($ra)        ; set global pointer
        cmple  $res, 0, $t0       ; if $t1<=0 goto _L1
        bne    $t0, _L1
        mov    $l, $arg0          ; $t2 <- hd(l)
        lda    $pv, hd
        jsr    $ra, ($pv)         ; (call #2)
        ldgp   $gp, 0($ra)        ; set global pointer
        mov    $res, $t2
        br     $zero, _L2         ; goto _L2
_L1:    lda    $t2, 1             ; $t2 <- 1
_L2:    mov    $_tenv, $arg0      ; $t3 <- g(x, n, $t2)
        mov    $x, $arg1
        mov    $n, $arg2
        mov    $t2, $arg3
        lda    $pv, g
        jsr    $ra, ($pv)         ; (call #3)
        ldgp   $gp, 0($ra)        ; set global pointer
        ldl    $ra, 0($sp)        ; restore return address
        ldl    $_tenv, 8($sp)     ; restore callee save
        ldl    $x, 12($sp)
        ldl    $n, 16($sp)
        ldl    $l, 20($sp)
        ldl    $l2, 24($sp)
        addl   $sp, 32, $sp       ; dealloc frame
        jmp    $zero, ($ra)       ; return
Fig. 4. A Translation of the SML function in Figure 2 to DEC Alpha Assembly Language
$argn   Argument n
$res    Result
$tn     Caller-save temporary n
$zero   Always zero
$sp     Stack pointer
$pv     Call address
$ra     Return address

Fig. 5. Registers Used in Figure 4

$_tenv    yes             Always trace
$x        stack 28, 0     Trace according to $_tenv
$n        no              Never trace
8($sp)    callee $_tenv   Use dynamic caller’s trace value for $_tenv
12($sp)   callee $x       Use dynamic caller’s trace value for $x
16($sp)   callee $n       Use dynamic caller’s trace value for $n
20($sp)   callee $l       Use dynamic caller’s trace value for $l
24($sp)   callee $l2      Use dynamic caller’s trace value for $l2
28($sp)   yes             Always trace

Fig. 6. A Trace Table for Call #2 of Figure 4
and Tiger [3]. Our compiler differs from other compilers using MLRISC [4] [2] [3], however, in that TIL does not use dynamic tag bits to distinguish heap pointers from other word-sized values.

In MLRISC, as in RTL, local storage locations are identified by numbered pseudo registers. Pseudo registers are transparently mapped to machine registers or spilled to the stack by MLRISC. Pseudo registers in MLRISC, however, carry no trace values or other type information; there are distinct classes of integer, floating-point, and condition-code pseudo registers, but an integer pseudo register that happens to be used as a heap pointer is not distinguished in any way. The principal challenge in integrating MLRISC with TIL, then, is to propagate pseudo-register trace values to the run-time system in the form of run-time trace values. Because pseudo registers are transformed into machine registers and stack slots, trace values for these locations will be based on the code transformations performed by MLRISC (e.g., register allocation, spilling).

In Figure 7, we show the SML function in Figure 2 as it might be translated to MLRISC. Pseudo registers are given names (e.g., x) in this example to clarify the presentation; in an actual MLRISC program, pseudo registers are identified by positive integers (e.g., 500). Machine registers are referred to by a small positive integer (e.g., 16) and can be used interchangeably with pseudo registers in MLRISC code. Names generated by the compiler are prefixed by an underscore; _tenv1 contains the type environment (see Section 2.3) for the function and cs1 through cs5 are used to hold the saved values of the callee-save registers. Following the Alpha standard calling convention, this code uses machine registers 16 through 20 to hold arguments and register 0 to hold the result; in MLRISC, unlike RTL, calling conventions are explicitly specified in terms of primitive operations.
An MLRISC procedure is a sequence of imperative statements, each of which may refer to applicative expressions; the terms “statement” and “expression” have the normal connotations of programming language terminology. In Table 3 and Table 4, we document the MLRISC constructs used in this example. Expressions can be nested to an arbitrary depth, so, in general, a single statement can generate many assembly language instructions.

bcc cexp, label      Branch to label label if the result of evaluating conditional expression cexp is true.
call addr            Call the procedure at the address formed by evaluating expression addr.
copy dst, src        Copy the registers listed in src into the corresponding registers listed in dst; this is a “parallel” operation: no register can appear more than once in the union of src and dst. copy statements are coalesced [9] by MLRISC whenever possible.
mv dst, exp          Move the result of evaluating expression exp into register dst.
jmp addr             Jump to the code at the address formed by evaluating expression addr.
ret                  Return from the current procedure.
store32 addr, exp    Store the result of evaluating expression exp as a 32-bit value at the address formed by evaluating expression addr.

Table 3. Selected MLRISC Statements
4 Techniques
This section discusses the translation techniques we use to integrate MLRISC with TIL. In Section 4.1 we touch on the technology that translates RTL code to MLRISC code, whereas in Section 4.2 we outline how we construct trace tables for MLRISC from RTL trace values. Finally, in Section 4.3 we justify the correctness of trace values for translated code.

4.1 From RTL to MLRISC
Translating RTL “instructions” to MLRISC “statements” is relatively straightforward—the principal difficulty lies in generating efficient code for conditional branches. RTL provides two forms of conditional branch instruction: one that compares a pseudo register against zero, and one that compares two pseudo registers. The current translation from MIL to RTL favors the former kind of branch, even for comparisons between two pseudo registers. It does this by storing the boolean result of each comparison in a third pseudo register and then testing the third pseudo register against zero. Although this idiom matches the use of conditionals in certain RISC architectures (e.g., the Alpha), it cannot
f:      mv      gp, reg pv                                       ; set global pointer
        mv      sp, sub(reg sp, const frame)                     ; alloc frame
        store32 add(reg sp, li 0), reg ra                        ; save return address
        copy    [cs1, cs2, cs3, cs4, cs5], [11, 12, 13, 14, 15]  ; save callee save
        copy    [_tenv1, x, n, l, l2], [16, 17, 18, 19, 20]      ; get arguments
        store32 add(reg sp, const _tenv1_offset), reg _tenv1     ; save type environment
        copy    [16], [l]                                        ; _t1 <- length(l)
        mv      pv, label "length"
        call    reg pv                                           ; (call #1)
        mv      gp, reg pv                                       ; set global pointer
        copy    [_t1], [0]
        bcc     cmp(le, reg _t1, li 0), _L1                      ; if _t1<=0 goto _L1
        copy    [16], [l]                                        ; _t2 <- hd(l)
        mv      pv, label "hd"
        call    reg pv                                           ; (call #2)
        mv      gp, reg pv                                       ; set global pointer
        copy    [_t2], [0]
        jmp     label _L2                                        ; goto _L2
_L1:    mv      _t2, li 1                                        ; _t2 <- 1
_L2:    copy    [16, 17, 18, 19], [_tenv1, x, n, _t2]            ; _t3 <- g(x, n, _t2)
        mv      pv, label "g"
        call    reg pv                                           ; (call #3)
        mv      gp, reg pv                                       ; set global pointer
        copy    [_t3], [0]
        copy    [0], [_t3]
        mv      ra, load32(add(reg sp, li 0))                    ; restore return address
        copy    [11, 12, 13, 14, 15], [cs1, cs2, cs3, cs4, cs5]  ; restore callee save
        mv      sp, add(reg sp, const frame)                     ; dealloc frame
        ret                                                      ; return

Fig. 7. A Translation of the SML function in Figure 2 to MLRISC
add(exp1, exp2)       Evaluates to the result of adding the results of evaluating exp1 and exp2.
cmp(cmp, exp1, exp2)  Evaluates to true if the results of evaluating exp1 and exp2 are ordered according to comparison cmp. This expression evaluates to a condition code, as opposed to an integer.
const fn              Evaluates to the result of calling the function fn during the final code generation phase. const allows constants in the final assembly language program to depend on the results of earlier phases (e.g., spilling).
label string          Evaluates to the address of label string.
li n                  Evaluates to the integer n.
load32 addr           Evaluates to the result of loading a 32-bit value from the address formed by evaluating expression addr.
reg id                Evaluates to the contents of pseudo register id.
sub(exp1, exp2)       Evaluates to the result of subtracting the result of evaluating exp2 from the result of evaluating exp1.

Table 4. Selected MLRISC Expressions
be expressed efficiently in MLRISC, because there is no statement to move the result of a comparison directly into an integer pseudo register. We address this problem by “preprocessing” the RTL code into two-operand conditional branch form whenever the result of a compare instruction is used by an immediately following branch instruction, and the boolean result is not used anywhere else in the procedure.

Although RTL and MLRISC treat pseudo registers in much the same way (i.e., as local storage locations for a procedure), one cannot interchange the two notions. For example, the MLRISC translation of an RTL instruction that refers to pseudo register 500 cannot simply refer to pseudo register 500, because pseudo registers in MLRISC code must be allocated explicitly through MLRISC. To overcome this difficulty, we maintain a mapping from RTL pseudo registers to MLRISC pseudo registers and allocate a new pseudo register from MLRISC whenever we see an RTL pseudo register that does not have an existing mapping. As MLRISC pseudo registers, unlike RTL pseudo registers, carry no explicit trace values, we construct a separate mapping from MLRISC pseudo registers to run-time trace values. Storing the trace values “off to the side” allows us to forget about RTL pseudo registers entirely for the later phases of the translation.

Our translation “forces” certain pseudo registers to spill by manually replacing them with memory accesses; this transformation is accomplished as a separate pass over the MLRISC code just before we pass it to MLRISC. Pseudo registers that are forced to spill include those holding type environments referred to by trace values, those saving callee-save registers in the presence of exception handlers, as well as any global registers that do not fit in the machine register file. Because the trace value of a given pseudo register can refer to a type environment on the stack to resolve its status (e.g., stack in Table 2), we must ensure
that these type environments are in fact on the stack and not being held in a machine register.

The callee-save registers must be restored to their former values at the end of an exception handler, so we force the pseudo registers that are used to save these registers to be spilled to the stack so that we can later restore them.

Finally, for performance, TIL reserves a small number of machine registers to hold global values that are used by most procedures (e.g., the current heap and limit pointers). Unfortunately, certain machine architectures (most notably, the Intel x86) do not have enough registers for this scheme to be feasible, so we rewrite code using these registers with references to global memory locations.
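Returning to the pseudo-register mapping described above, a minimal SML sketch follows. The representation (association lists, a mutable counter) is an assumption made for illustration and stands in for TILT's actual data structures; 'tv is the trace-value type.

(* Sketch: translate RTL pseudo registers to MLRISC pseudo registers,
   allocating a fresh MLRISC register on first use and recording the
   run-time trace value "off to the side". *)
type 'tv state = {
  regMap   : (int * int) list ref,   (* RTL reg -> MLRISC reg *)
  traceMap : (int * 'tv) list ref,   (* MLRISC reg -> trace value *)
  next     : int ref                 (* next unused MLRISC register *)
}

fun lookup (assoc : (int * 'a) list ref, k : int) : 'a option =
  Option.map #2 (List.find (fn (k', _) => k' = k) (!assoc))

fun xlateReg (s : 'tv state) (rtlReg : int, tv : 'tv) : int =
  case lookup (#regMap s, rtlReg) of
      SOME mr => mr
    | NONE =>
        let val mr = !(#next s)
        in
          #next s := mr + 1;
          #regMap s := (rtlReg, mr) :: !(#regMap s);
          #traceMap s := (mr, tv) :: !(#traceMap s);
          mr
        end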
4.2 Constructing Trace Tables
The most interesting part of the translation from RTL to MLRISC is constructing trace tables for call sites. As MLRISC does not explicitly propagate type information, we construct trace tables by passing trace values “around” MLRISC’s code generator. Trace tables are represented as data pseudo operations that are compiled into the data segment of the program. Because trace tables are encoded in terms of machine register numbers and stack offsets, and because trace values are attached indirectly to pseudo registers, we must account for the results of spilling and register allocation during trace table generation. For example, if pseudo register 500 has the run-time trace value yes and is mapped to machine register 12, then a trace table should contain a yes entry for machine register 12. This implies that we must generate code and trace table data in separate phases—first we translate the code to obtain a pseudo-register mapping, then we generate trace table data based on this mapping.

We must also generate a single trace table for all the static locations in a module: this is accomplished by mapping the RTL label for each static location to a corresponding MLRISC label. In Figure 8 we illustrate how an RTL module containing procedures and static variables is transformed into MLRISC code statements and data directives. Note that for reasons of expediency, trace tables are generated first in terms of RTL data directives which are then translated to MLRISC data directives. This allows us to reuse the trace table module from the TIL1 back end.

[Figure: data flows from “Procedures (RTL Code)”, “Register Map (MLRISC)”, and “Globals (RTL Data)” into “Text (MLRISC Code)”, “Trace Tables (RTL Data)”, and “Data (MLRISC Data)”.]

Fig. 8. Generating MLRISC Code and Trace Tables from RTL
The results of register allocation and spilling are not difficult to obtain from MLRISC. The mapping from pseudo registers to machine registers is exported as a data structure by the MLRISC interface. We can construct the mapping from spilled pseudo registers to stack offsets because MLRISC spills pseudo registers via a “call back” to our code. It is important to understand that the mappings used by these phases must be accessible for our technique to work—for a back end that does not export this information, we cannot determine how the pseudo registers are represented at run time, and therefore cannot construct trace tables from pseudo-register trace values. This problem is explored further in Section 5.2.

4.3 Register Allocation
Suppose that in Figure 7, pseudo registers l2 (trace value yes) and _t2 (trace value no) have both been mapped to machine register 4 by MLRISC’s register allocator. Which trace value should we give machine register 4 when constructing a trace table? Obviously, we cannot resolve this conflict with just the pseudo-register mapping—we should look at the code to determine which pseudo register was defined most recently on the control path to the call site in question. However, because the code may contain arbitrary branches and loops, a linear scan will not resolve this ambiguity in general. Notice that pseudo registers can be mapped to the same machine register only if they have non-overlapping live ranges: otherwise, definitions of the pseudo registers would interfere with each other. Thus, for a given call site we can resolve a conflicting register assignment by choosing the trace value of the pseudo register that is live across the call site [7], as there can be only one.

Figure 7 additionally illustrates a deeper problem, in that the correct trace value for machine register 4 at call #3 depends on the run-time contents of pseudo register _t1: if _t1 is greater than zero, then machine register 4 will be overwritten by the result of call #2. This example suggests that there are cases where the code generation transformations induced by the back end will make it impossible to give a fixed trace value to a particular machine register. Fortunately, such unpredictable trace values can only arise when none of the pseudo registers in question are live across the call site—otherwise, the generated code would be incorrect, because the definition of one pseudo register could interfere with the later use of another. Returning to our example, we know that neither l2 nor _t2 can be used after call #3, because such a use might read the value of the wrong pseudo register.

This observation suggests that we must take into account the next use of a pseudo register after the call site as well as its definition before the call site—it is not sufficient to simply note the trace value at the most recent definition. We can give a machine register the trace value no if the contents of that register will not be used after the corresponding call site: because the register will not be used, its contents need not be retained by the garbage collector. This happy coincidence allows us to give the trace value no to machine register 4 for our example.
Thus, to construct a trace table for a given call site, we map the pseudo registers live across the site through the pseudo-register mapping and pair the resulting machine registers with the trace values for the corresponding pseudo registers. All other machine registers are given the trace value no. Liveness analysis has the added benefit of minimizing storage retained during garbage collection. This, in turn, enhances performance by reducing the load on the collector and also enables certain programs to terminate that would not otherwise [16]. Our liveness analysis is based on well-understood data-flow techniques [1]. Note that the call-site liveness analysis must be at least as precise as the register-allocation liveness analysis for this technique to work.

We give trace values to stack slots holding spilled pseudo registers with an analogous technique—in Figure 9 we show the construction of a trace table for call #2 of Figure 7 from the live pseudo-register set, a run-time trace value mapping, and a sample MLRISC register mapping.
Live Set: _tenv1, x, n

MLRISC Register Map    RTL Trace Map
_tenv1 ↦ 11            _tenv1 ↦ yes
x ↦ 12                 x ↦ compute _tenv1.0
n ↦ 13                 n ↦ no
...                    ...

        ↓

Trace Table
11 ↦ yes
12 ↦ stack 12, 0
13 ↦ no

Fig. 9. Constructing a Trace Table
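Putting the pieces of Figure 9 together, trace-table construction at one call site reduces to a map over the live set. The SML sketch below uses assumed names rather than TILT's real interfaces; 'tv is the trace-value type.

(* Sketch: pair each pseudo register live across the call site with
   its post-allocation location and its trace value.  Machine
   registers not mentioned are implicitly "no". *)
datatype location = Machine of int | StackSlot of int

fun traceTableAt (liveSet : int list)          (* live across the call site *)
                 (regMap : int -> location)    (* result of allocation/spilling *)
                 (traceMap : int -> 'tv)       (* trace value per pseudo register *)
    : (location * 'tv) list =
  map (fn p => (regMap p, traceMap p)) liveSet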
5 Assessment

5.1 TIL
RTL is not well-suited as a source language for MLRISC because the languages are similar enough that the translation between them is essentially wasted work. The principal difference between RTL and MLRISC is that RTL pseudo registers are annotated with trace information; as we describe in Section 4.1, it is not difficult to simulate this capability for MLRISC pseudo registers, so there is no compelling reason to use RTL as a separate translation step. We thus see the use of RTL as an intermediate language as vestigial: we plan to translate directly from the MIL intermediate language in the future. We originally decided to translate from RTL to expedite development of the compiler, as MIL was in a fluid state of development at the time.
5.2 MLRISC
This section seeks to identify the specific features of MLRISC that made it possible for us to integrate it with TIL. We also suggest additional features not found in MLRISC that would have made our job easier, or would have resulted in more efficient code generation. We speculate that our experience may be of use to designers of other generic back ends [5] [6]. We divide the relevant features into two classes: those that are essential to the propagation of type information, and those that can enhance performance or simplify translation when using type-directed techniques. An underlying theme of our characterization is that the back end needs to do more than simply emit assembly code on behalf of the client—it should also return information about how the translation was accomplished.

We first present a brief summary of our conclusions. These are the key features of MLRISC that enable the type-directed translation techniques of our compiler:

– A visible pseudo-register-to-machine-register mapping
– A machine-register mapping that is unique for a given pseudo register
– A visible pseudo-register-to-stack-slot mapping
– A spilled pseudo register will not also be mapped to a machine register
These are features not found in MLRISC that might enhance the performance of our translation:

– Visible liveness information
– An extensible spill mechanism

Essential Features. As the TIL run-time system uses trace tables that contain machine register numbers, a client-accessible pseudo-register mapping is needed to propagate trace values. Input trace values are attached to pseudo registers, so we must be able to uncover which pseudo registers are mapped to which machine registers if we are to encode trace value mappings for the latter. Although it might be possible to deduce the pseudo-register mapping by comparing the output object code with the input pseudo code, this is likely to be difficult for a back end that performs aggressive optimizations (e.g., global instruction scheduling).

The shape of the mapping to machine registers can also present problems to the implementor. MLRISC maps each pseudo register to at most one machine register [9]; thus, when an unspilled pseudo register is live across a call site, we can always precisely identify which machine register it is mapped to. If a single pseudo register might be mapped to one of several machine registers at different points in the code, then MLRISC would need to tell us the mapping ranges for each register. Again, it might be possible to deduce this information from the object code—or even from the pseudo code, if we know the back end’s register allocation algorithm—but such a deduction algorithm is likely to be complex and correspondingly inefficient.
Given the mapping to machine registers, we must have a similar mapping to stack slots for pseudo registers that have been spilled transparently by the back end. For MLRISC, there is no special interface to this information, but because we implement the spill mechanism ourselves5, we can easily reconstruct it. If a back end does not provide a customizable spill mechanism, it must allow the client to query the spill status and location of a given pseudo register so that trace value mappings can be constructed for the stack.

Note that to simplify trace table generation, we ensure that a spilled pseudo register will never be mapped to a machine register (i.e., a spilled pseudo register is always on the stack for its entire lifetime). This is accomplished by allocating a new temporary and rewriting the instruction referring to the spilled pseudo register with a reference to the temporary instead; a store instruction (or load in the case of a reload) is then appended (or prepended) to the rewritten instruction. Because the lifetime of the temporary is only between the rewritten instruction and the store (or load), we can assume that it will never be live across a call site [7], and thus need not be traced. Although such a temporary will never be traced, the stack slot containing the original value still might be traced if it is live across the call site in question.

Making the “spilled to the stack” and “mapped to a machine register” states exclusive for a given pseudo register simplifies the process of constructing trace tables—if we could not assume this, then we would have to track where the pseudo register is moved to or from the stack and which location(s) (stack slot or register) its next use(s) expect it to be in. As this information is not exported by MLRISC, it is not clear how we would recover it.

Our assumption about the lifetimes of spill temporaries may not hold in the presence of global instruction scheduling, but as MLRISC does not currently perform global instruction scheduling, our implementation is sound for the moment. To implement our translation in the presence of global instruction scheduling, we would need access to the liveness analysis of the back end, or we would need to be able to constrain the scheduling of instructions referring to potentially traceable values. Note that the former solution has the added benefit of reducing the overhead incurred by our translation.

Desirable Features. It is unfortunate that we must perform a data-flow analysis to determine liveness across call sites, because this work will be largely duplicated by MLRISC’s register allocator. It would be more efficient if we could derive call site liveness from the register allocator’s own liveness information. This might be accomplished in a hypothetical version of MLRISC by returning the pseudo registers that are live into and out of each basic block as part of the translation to assembly language. Additionally, each call site must be isolated in its own basic block: this could be done by TIL if MLRISC were to provide a way to explicitly delimit basic blocks. Note that because spilling is performed in a pass prior to register allocation, liveness information may be lost for spilled
5 When MLRISC decides to spill a pseudo register, it calls a client-supplied function to return an architecture-specific code sequence for the spill.
pseudo registers when they are replaced by memory accesses—unless MLRISC retains information showing that they cannot be aliased, it will have to assume that they are always live after the first definition. It might be possible to deduce the liveness of spilled pseudo registers by analyzing the spill and reload patterns via another data-flow analysis, but this seems counterproductive. It is not correct to simply assume that spilled pseudo registers are always live, because the contents of a given stack slot may not be initialized until after the first call site.

Our translation to MLRISC includes a “forced spill” pass (see Section 4.1) that replaces certain pseudo registers with memory accesses. As MLRISC implements a similar spilling pass, it would save implementation time if MLRISC were to allow the client to provide an additional spill set as a part of code generation. This would avoid the extra spill pass that currently handles our special spill cases.
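The forced-spill rewrite described in Section 5.2 can be pictured with a hedged SML sketch. The instruction representation and helper names below are invented for illustration; the real pass works over MLRISC statements.

(* Sketch: a reference to a spilled pseudo register becomes a fresh
   short-lived temporary, with a load prepended before a use and a
   store appended after a definition, so the spilled register lives
   only in its stack slot. *)
datatype instr =
    Load  of {dst : int, slot : int}   (* temp <- stack[slot] *)
  | Store of {src : int, slot : int}   (* stack[slot] <- temp *)
  | Op    of {defs : int list, uses : int list}

fun rewrite (isSpilled : int -> int option)   (* pseudo reg -> its stack slot *)
            (newTemp : unit -> int)
            (i : instr) : instr list =
  case i of
      Op {defs, uses} =>
        let
          fun subst regs =
            ListPair.unzip
              (map (fn r =>
                      case isSpilled r of
                          NONE => (r, NONE)
                        | SOME slot =>
                            let val t = newTemp ()
                            in (t, SOME (t, slot)) end)
                   regs)
          val (uses', loads)  = subst uses
          val (defs', stores) = subst defs
          fun emit f = List.mapPartial (Option.map f)
        in
          emit (fn (t, s) => Load {dst = t, slot = s}) loads
          @ [Op {defs = defs', uses = uses'}]
          @ emit (fn (t, s) => Store {src = t, slot = s}) stores
        end
    | other => [other]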
5.3 Performance
Benchmarks. Because the optimizer in the TILT compiler is still under construction, we cannot yet take meaningful performance measurements of the object code produced by MLRISC in conjunction with TILT. However, because TILT and TIL1 use the same RTL intermediate language, we can use MLRISC as a back end for the TIL1 compiler: in Table 5, we present the relative execution times of some of the benchmark programs from Tarditi et al. [17]. These measurements show that by using MLRISC as a back end for TIL1, we introduce a significant amount of overhead into the generated code. We believe that this overhead is due to complications in the translation of RTL code to MLRISC code, and is not due to MLRISC itself.
Program        TIL1    TIL1-MLRISC   TIL1-MLRISC/TIL1
FFT            2.02    2.49          1.23
Knuth-Bendix   2.28    2.70          1.18
Lexgen         2.66    3.09          1.16
Life           2.07    2.51          1.21
Matmult        2.66    2.61          0.98
Simple         11.91   14.03         1.18

Table 5. TIL1-MLRISC Execution Time Relative to TIL1
The execution times in Table 5 are the time in seconds required to execute the programs on a DEC Alpha 3000/600 workstation with 96 MB of RAM. This workstation has a 175 MHz Alpha 21064 processor with 8 KB primary instruction and data caches and a 2 MB unified secondary cache. Each figure is the arithmetic mean of ten consecutive runs of the corresponding program. See Tarditi et al. [17] for descriptions of the benchmark programs.
We made one change to MLRISC for the purpose of benchmarking: MLRISC ordinarily generates floating-point arithmetic instructions with the sud flags set in the instruction word. Because these instructions are emulated in software on our workstation, we replaced them with the equivalent “garden-variety” instructions (e.g., addt instead of addt/sud). The sud flags control the precise semantics of floating-point operations—see the Alpha Architecture Handbook for more information. As use of the sud flags makes the FFT benchmark about 300 times slower on our workstation, it is not meaningful to take performance measurements with them set.

We used a calling convention without integer callee-save registers for these benchmarks because, when used with TIL1, MLRISC often allocates pseudo registers to callee-save registers in a way that violates the constraints of our trace table encoding. In particular, our encoding requires that the contents of a callee-save register either be saved on the stack or be left in the original machine register during the activation of a procedure. When used with TIL1, however, MLRISC often allocates the pseudo registers used to save the callee-save registers to other (different) callee-save registers—this is not expressible in our trace table encoding. We have encountered this problem with much less frequency when using MLRISC as a back end for TILT, but it remains unresolved.

Target Code. The principal techniques outlined in this paper for interfacing MLRISC to TIL operate only on type information, and therefore should not have a direct effect on object code quality. However, there are sources of inefficiency in the code transformations performed by our translation. Additionally, the limitations of these techniques may introduce performance-limiting constraints when used with other back ends.

To elaborate on the former point, the details of the translation from RTL to MLRISC have a significant effect on the ultimate quality of the object code, as is indicated by the discussion of conditional branch translation in Section 4.1. It is clear, however, that this particular difficulty arises as an artifact of an unfortunate mismatch between the semantics of conditional values in RTL and MLRISC, and does not represent a general problem with the interaction between TIL and MLRISC.

Another valid question might arise about whether the “forced spill” phase outlined in Section 5 will introduce so many new spills as to significantly degrade performance. Although it seems unlikely that the indiscriminate spilling of type environments will have a measurable effect on performance, one cannot so easily dismiss the spilling of the callee-save registers and the pervasive global registers. Note, however, that in each of these cases, spilling is introduced as a consequence of constraints imposed by the run-time system, and not as a consequence of a poor interaction between the compiler and the back end. Thus, the forced spill phase is really a function of the run-time architecture used with TIL and will be required in some form whether or not the object code is generated by MLRISC. A general discussion of the performance of type-directed run-time architectures is beyond the scope of this paper, but see Tarditi et al. [17] and Morrisett [13].
A potential performance problem that is directly related to the use of MLRISC as a back end for TIL concerns the constraints that our techniques impose on a back end to simplify trace table generation. These restrictions are discussed in Section 5.2, and although none of them appear to be especially restrictive, it will be difficult to demonstrate this without measurements. Unfortunately, this is particularly awkward to do for our technology, as only one of these constraints (spilled pseudo registers) can be alleviated in MLRISC. Even if we were to remove this limitation, however, we would not be able to execute the resulting code because of the absence of trace tables. It might be more productive to examine individual measurements of these code generation features on other compiler platforms and then use the results as a guide to forming conclusions about the potential drawbacks of our techniques.

Compilation Speed. A final performance consideration relates to how the use of our techniques affects the speed of compilation. Because we perform an extra liveness analysis before code generation (see Section 4.3), there is a potential for inefficiency here. Preliminary measurements show that our use of MLRISC has a significant performance cost: the combined RTL-to-MLRISC translation and the subsequent MLRISC code generation phases together perform at less than half the speed of the TIL1 back end when used with TILT. These same measurements also indicate that the bulk of the time is being spent in translation code external to MLRISC; MLRISC on its own is usually faster than the TIL1 back end. Unfortunately, we have not yet isolated the source of this inefficiency: the extra liveness analysis by itself only accounts for a fraction of the translation overhead—it typically consumes less than 5 percent of the total compilation time. It is certainly possible that most of the translation overhead is caused by unoptimized code on our end. In our opinion, it is too early to draw meaningful conclusions about the performance of the translation itself, as there may still be room for substantial optimization.
6 Future Work
In this section, we discuss features of MLRISC that are currently underutilized.

MLRISC is able to perform inter-procedural register allocation on procedures in the same call graph. We do not currently take advantage of this feature, but hope to utilize it once we fully understand the complications with regard to trace table construction. Because ML programs typically use function calls for looping, the performance benefits of this optimization may be significant when the compiler has not entirely optimized away procedure calls.

MLRISC does not currently perform any global instruction scheduling, but we expect that it will eventually do so. We anticipate that this optimization will introduce complications into our call-site liveness analysis because the live ranges of pseudo registers will be perturbed across basic blocks. For example, if the first definition of a traceable pseudo register is moved forward past a call site, then the garbage collector will trace an uninitialized value at that call site
if no corrective action is taken. Because basic blocks in ML programs tend to be so small that local instruction scheduling has little benefit, we think it will be important to find a solution to this problem that does not unduly constrain the back end. This topic is also discussed in Section 5.2.

MLRISC provides condition-code pseudo registers in addition to integer and floating-point pseudo registers. We currently do not use these pseudo registers because RTL does not distinguish condition codes from integers. A direct translation from MIL might make it easier to take advantage of these registers and also to correct some additional inefficiencies in the translation of conditional branches.

Finally, we hope to isolate the source of the current translation inefficiency so that using MLRISC with TILT is not significantly slower than using the TIL1 back end. We also hope to improve the performance of the object code generated by MLRISC once implementation of the TILT optimizer is complete.
7 Conclusion
We have presented our approach to integrating MLRISC, a generic back end, with TIL, a type-directed compiler. Our work is a solution to a specific instance of a more general problem: how can abstract trace information be mapped to concrete trace information, given that the correct mapping is a function of a parallel code translation performed by the back end? Register allocation and spilling are the critical code translations that must be reproduced to translate trace information. MLRISC exports its register and spill mappings: it is this property of MLRISC that makes it possible to use it with our compiler.

As important parts of TILT are still being developed, we cannot draw definitive conclusions yet about the merits of our approach. It is currently unclear if the use of MLRISC will give us a significant improvement in object code quality. It is also unclear whether the “scaffolding” we have constructed around MLRISC can be made efficient enough not to seriously degrade compilation time.

It is reasonable to object to our use of RTL as an intermediate language between MIL and MLRISC, because RTL serves essentially the same purpose as MLRISC. We chose to retain RTL from TIL1 only to better compartmentalize our development effort. One could argue that some of the problems we have encountered are due more to the use of RTL than to the use of MLRISC. In particular, we expect that in a hypothetical translation from MIL to MLRISC, a redundant liveness analysis on MIL code would be less onerous due to its more structured control flow. This would appear to undermine our contention that the back end should export the results of its liveness analysis for use by the rest of the compiler. However, we do not believe that simply performing the call-site liveness analysis on MIL code is an adequate long-term solution, because it is not clear that liveness of variables in MIL code necessarily corresponds to liveness of machine registers and stack slots in machine code—we even know that this correspondence will not hold in the presence of global instruction scheduling. For
this reason, we think that the availability of liveness information from MLRISC will be crucial to the long-term success of this effort.

Our work attests that MLRISC is “generic enough” to be reused as the back end of our compiler, even though TIL is substantially different from Standard ML of New Jersey [4], the compiler for which MLRISC was originally developed. Reuse has attendant costs, however, and the most significant of these appear to be related to the speed of compilation. We suggest that generic compiler technology is a valuable asset, but that more developers will benefit from it if interfaces are made flexible enough to encompass dissimilar compilation strategies.
Acknowledgements

Special thanks to Lal George for his helpful insights during the integration of MLRISC and TIL.
References

1. A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986.
2. Andrew W. Appel. A runtime system. Lisp and Symbolic Computation, 3:343–380, 1990.
3. Andrew W. Appel. Modern Compiler Implementation in ML. Cambridge University Press, 1998.
4. Andrew W. Appel and David B. MacQueen. Standard ML of New Jersey. In Third International Symposium on Programming Language Implementation and Logic Programming, pages 1–13. Springer-Verlag, August 1991.
5. Andrew W. Appel et al. The National Compiler Infrastructure project.
6. Robert P. Wilson et al. SUIF: An infrastructure for research on parallelizing and optimizing compilers. Technical report, Computer Systems Laboratory, Stanford University.
7. Lal George. Personal communication.
8. Lal George. MLRISC: Customizable and reusable code generators. Technical report, Bell Labs, December 1996. Submitted to PLDI.
9. Lal George and Andrew W. Appel. Iterated register coalescing. ACM Transactions on Programming Languages and Systems, 18(3):300–324, May 1996.
10. Robert Harper and Greg Morrisett. Compiling polymorphism using intensional type analysis. In Conference Record of the 22nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 130–141. ACM, January 1995.
11. Robert Harper and Chris Stone. A type-theoretic interpretation of Standard ML. Technical report, Carnegie Mellon University, 1997. Submitted for publication.
12. Greg Morrisett, David Walker, Karl Crary, and Neal Glew. From System F to typed assembly language. In Conference Record of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 85–97. ACM, January 1998.
13. J. Gregory Morrisett. Compiling with Types. PhD thesis, Carnegie Mellon University, December 1995. Published as CMU Technical Report CMU-CS-95-226.
14. George C. Necula. Proof-carrying code. In Conference Record of the 24th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. ACM, January 1997.
15. George C. Necula and Peter Lee. The design and implementation of a certifying compiler. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, New York, 1998. ACM Press.
16. Zhong Shao and Andrew W. Appel. Space-efficient closure representations. In Proceedings of the ACM Conference on Lisp and Functional Programming, June 1994.
17. D. Tarditi, G. Morrisett, P. Cheng, C. Stone, R. Harper, and P. Lee. TIL: A type-directed optimizing compiler for ML. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 181–192, New York, May 21–24, 1996. ACM Press.
18. Andrew Tolmach. Tag-free garbage collection using explicit type parameters. In Proceedings of the 1994 ACM Conference on Lisp and Functional Programming, June 1994.
A Toolkit for Constructing Type- and Constraint-Based Program Analyses

Alexander Aiken, Manuel Fähndrich, Jeffrey S. Foster, and Zhendong Su

Electrical Engineering and Computer Science Department*
University of California, Berkeley
387 Soda Hall #1776, Berkeley, CA 94720-1776
{aiken,manuel,jfoster,zhendong}@cs.berkeley.edu
Abstract. BANE (the Berkeley Analysis Engine) is a publicly available toolkit for constructing type- and constraint-based program analyses.1 We describe the goals of the project, the rationale for BANE’s overall design, some examples coded in BANE, and briefly compare BANE with other program analysis frameworks.
1 Introduction
Automatic program analysis is central to contemporary compilers and software engineering tools. Program analyses are also arguably the most difficult components of such systems to develop, as significant theoretical and practical issues must be addressed in even relatively straightforward analyses.

Program analysis poses difficult semantic problems, and considerable effort has been devoted to understanding what it means for an analysis to be correct [CC77]. However, designing a theoretically well-founded analysis is necessary but not sufficient for obtaining a useful analysis. Demonstrating utility requires implementation and experimentation, preferably with large programs. Many plausible analyses are not beneficial in practice, and others require substantial modification and tuning before producing useful information at reasonable cost. It is important to prototype and realistically test analysis ideas, usually in several variations, to judge the cost/performance trade-offs of multiple design points. We know of no practical analytical method for showing utility, because the set of programs that occur in practice is a very special, and not easily modeled, subset of all programs. Unfortunately, experiments are relatively rare because of the substantial effort involved.

BANE (for the Berkeley ANalysis Engine) is a toolkit for constructing type- and constraint-based program analyses. A goal of the project is to dramatically lower the barriers to experimentation and to make it relatively easy for researchers to realistically prototype and test new program analysis ideas (at
* Supported in part by an NDSEG fellowship, NSF National Young Investigator Award CCR-9457812, NSF Grant CCR-9416973, and gifts from Microsoft and Rockwell.
1 The distribution may be obtained from the BANE homepage at http://bane.cs.berkeley.edu.
least type- and constraint-based ideas). To this end, in addition to providing constraint specification and resolution components, the BANE distribution also provides parsers and interfaces for popular languages (currently C and ML) as well as test suites of programs ranging in size from a few hundred to tens of thousands of lines of code.

BANE has been used to implement several realistic program analyses, including an uncaught exception inference system for ML programs [FA97,FFA98], points-to analyses for C [FFA97,FFSA98], and a race condition analysis for a factory control language [AFS98]. Each of these analyses also scales to large programs—respectively at least 20,000 lines of ML, 100,000 lines of C, and production factory control programs. These are the largest programs we have available (the peculiar syntax of the control language precludes counting lines of code).
2 System Architecture
Constraint-based analysis is appealing because elaborate analyses can be expressed with a concise and simple set of constraint generation rules. These rules separate analysis specification (constraint generation) from implementation (constraint resolution). Implementing an analysis using BANE involves only writing code to (1) generate the appropriate constraints from the program text and (2) interpret the solutions of the constraints. Part (1) is usually a simple recursive walk of the abstract-syntax tree, and part (2) is usually testing for straightforward properties of the constraint solutions. The system takes care of constraint representation, resolution, and transformation. Thus, BANE frees the analysis designer from writing a constraint solver, usually the most difficult portion of a constraint-based analysis to design and engineer.

In designing a program analysis toolkit one soon realizes that no single formalism covers both a large fraction of interesting analyses and provides uniformly good performance in an implementation. BANE provides a number of different constraint sorts: constraint languages and associated resolution engines that can be reused as appropriate for different applications. Each sort is characterized by a language of expressions, a constraint relation, a solution space, and an implementation strategy. In some cases BANE provides multiple implementations of the same constraint language as distinct sorts because the different implementations provide different engineering trade-offs to the user. Extending BANE with new sorts is straightforward.

An innovation in BANE is support for mixed constraints: the use of multiple sorts of constraints in a single application [FA97]. In addition to supporting naturally multi-sorted applications, we believe the ability to change constraint languages allows analysis designers to explore fine-grain engineering decisions, targeting subproblems of an analysis with the constraint system that gives the best efficiency/precision properties for the task at hand. Section 3 provides an example of successively refining an analysis through mixed constraints.
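As an illustration of part (1), constraint generation for a toy lambda calculus might look like the following SML sketch. The AST, emit, and fresh are invented for the example and are not BANE's actual interface.

(* Sketch: constraint generation as a recursive AST walk.  Each
   subterm gets a set expression; abstractions and applications emit
   inclusion constraints relating them. *)
datatype ast = Var of string | Lam of string * ast | App of ast * ast

datatype sexp = SVar of int | Arrow of sexp * sexp

fun analyze (emit : sexp * sexp -> unit)   (* record a constraint lhs ⊆ rhs *)
            (fresh : unit -> sexp)         (* new set variable *)
            (env : (string * sexp) list) (e : ast) : sexp =
  case e of
      Var x =>
        (case List.find (fn (y, _) => y = x) env of
             SOME (_, v) => v
           | NONE => fresh ())                      (* free variable *)
    | Lam (x, body) =>
        let
          val vx = fresh ()
          val vb = analyze emit fresh ((x, vx) :: env) body
          val vf = fresh ()
        in emit (Arrow (vx, vb), vf); vf end        (* the function flows to vf *)
    | App (f, a) =>
        let
          val vf = analyze emit fresh env f
          val va = analyze emit fresh env a
          val vr = fresh ()
        in emit (vf, Arrow (va, vr)); vr end        (* f must accept va, yield vr *)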
∪ : Set Set → Set
∩ : Set Set → Set
¬{c1, . . . , cn} : Set    for any set of Set-constructors ci ∈ ΣSet
0 : Set
1 : Set

Fig. 1. Operations in the sort Set.

Mixed constraint systems are formalized using a many-sorted algebra of expressions. Each sort s includes a set of variables Vs, a set of constructors Σs, and possibly some other operations. Each sort has a constraint relation ⊆s. Constraints and resolution rules observe sorts; that is, a constraint X ⊆s Y implies X and Y are s-expressions. The user selects the appropriate mixture of constraints by providing constructor signatures. If S is the set of sorts, each n-ary constructor c is given a signature c : ι1 . . . ιn → S where ιi is s or s̄ for some s ∈ S. Overlined sorts mark contravariant arguments of c; the rest are covariant arguments.

For example, let sort Term be a set of constructors ΣTerm and variables VTerm with no additional operations. Pure terms over ΣTerm and VTerm are defined by giving constructor signatures

    c : Term . . . Term → Term    (arity(c) occurrences of Term),    c ∈ ΣTerm
As another example, let Set be a sort with the set operators in Figure 1 (the set operations plus least and greatest sets). Pure set expressions are defined by the signatures

    c : Set . . . Set → Set    (arity(c) occurrences of Set),    c ∈ ΣSet
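One illustrative way to render such a many-sorted expression language in SML is a datatype that carries an explicit sort tag. This is a sketch under assumed names, not BANE's real representation.

(* Sketch: expressions of the three sorts discussed in this section. *)
datatype sort = Term | FlowTerm | Set

datatype exp =
    Var of sort * string
  | Con of {name : string, args : exp list, result : sort}
  | Union of exp * exp      (* Set only *)
  | Inter of exp * exp      (* Set only *)
  | Compl of string list    (* ¬{c1, ..., cn}, Set only *)
  | Zero | One              (* Set only *)

type constr = {sort : sort, lhs : exp, rhs : exp}   (* lhs ⊆sort rhs *)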
There are many examples of program analyses based on equations between Terms (e.g., [DM82,Hen92,Ste96]) and based on inclusion constraints between Set expressions (e.g., [And94,AWL94,EST95,FFK+96,Hei94]). The literature also has natural examples of mixed constraint systems, although they have not been recognized previously as a distinct category. For example, many effect systems [GJSO92] use a function space constructor

    · → · : Term Set Term → Term

(the Set argument is the annotation on the arrow) where the Set expressions are used only to carry the set of latent effects of the function. These three examples—terms, set expressions, and a mixed language with set and term components—illustrate that by altering the signatures of constructors
a range of analysis domains can be realized. For example, a flow-based analysis using set expressions can be coarsened to a unification-based analysis using terms. Similarly, a term-based analysis can be refined to an effect analysis by adding a Set component to the → constructor.

2.1 The Framework
From the user’s point of view, our framework consists of a number of sorts of expressions together with resolution rules for constraints over those expressions. In addition, the user must provide constructor signatures specifying how the different sorts are combined. In this section we focus on the three sorts Term, FlowTerm, and Set. The distributed implementation also supports a Row sort [Rém89] for modeling records. Besides constructors and variables a sort may have arbitrary operations peculiar to that sort; for example, sort Set includes set operations.

Each sort s has a constraint relation ⊆s and resolution rules. Constraints and resolution rules preserve sorts, so that X ⊆s Y implies X and Y are s-expressions. For example, for the Term sort, the constraint relation ⊆Term is equality, and the resolution rules implement term unification for constructors with signatures Term . . . Term → Term. For clarity we write the constraint relation of term unification as “=t” instead of ⊆Term. The resolution rules in Figure 2 are read as left-to-right rewrite rules. The left- and right-hand sides of rules are conjunctions of constraints.

Sort FlowTerm has the expressions of sort Term but a different set of resolution rules (see Figure 2b). FlowTerm uses inclusion instead of equality constraints. The inclusion constraints are more precise, but also more expensive to resolve, requiring exponential time in the worst case. For certain applications, however, FlowTerm is very efficient [HM97]. We write ⊆ft for the FlowTerm constraint relation.

The constructor rules connect constraints of different sorts. For example, in sort FlowTerm the rule

    S ∧ c(T1, . . . , Tn) ⊆ft c(T1′, . . . , Tn′) ≡ S ∧ T1 ⊆ι1 T1′ ∧ · · · ∧ Tn ⊆ιn Tn′    if c : ι1 · · · ιn → FlowTerm

says constraints propagate structurally to constructor arguments; this is where FlowTerm has a precision advantage over Term (see below). Note this rule preserves sorts. The rule for constructors of sort Term (Figure 2a) is slightly different because ⊆Term is equality, a symmetric relation. Thus, constraints on constructor arguments are also symmetric:

    S ∧ f(T1, . . . , Tn) =t f(T1′, . . . , Tn′) ≡ S ∧ T1 ⊆ι1 T1′ ∧ T1′ ⊆ι1 T1 ∧ · · · ∧ Tn ⊆ιn Tn′ ∧ Tn′ ⊆ιn Tn    if f : ι1 · · · ιn → Term

Figure 2c shows the rules for the Set sort. In addition to the standard rules [AW93], Set includes special rules for set complement, which is problematic in the presence of contravariant constructors. We deal with set complement using
(a) Resolution rules for sort Term:

  S ∧ f(T1, ..., Tn) =t f(T1′, ..., Tn′) ≡ S ∧ T1 ⊆ι1 T1′ ∧ T1′ ⊆ι1 T1 ∧ ··· ∧ Tn ⊆ιn Tn′ ∧ Tn′ ⊆ιn Tn   if f : ι1 ··· ιn → Term
  S ∧ f(...) =t g(...) ≡ inconsistent   if f ≠ g

(b) Resolution rules for sort FlowTerm:

  S ∧ c(T1, ..., Tn) ⊆ft c(T1′, ..., Tn′) ≡ S ∧ T1 ⊆ι1 T1′ ∧ ··· ∧ Tn ⊆ιn Tn′   if c : ι1 ··· ιn → FlowTerm
  S ∧ c(...) ⊆ft d(...) ≡ inconsistent   if c ≠ d
  S ∧ α ⊆ft c(T1, ..., Tn) ≡ S ∧ α = c(α1, ..., αn) ∧ α1 ⊆ι1 T1 ∧ ··· ∧ αn ⊆ιn Tn   (αi fresh, c : ι1 ··· ιn → FlowTerm)
  S ∧ c(T1, ..., Tn) ⊆ft α ≡ S ∧ α = c(α1, ..., αn) ∧ T1 ⊆ι1 α1 ∧ ··· ∧ Tn ⊆ιn αn   (αi fresh, c : ι1 ··· ιn → FlowTerm)

(c) Resolution rules for sort Set:

  S ∧ 0 ⊆s T ≡ S
  S ∧ T ⊆s 1 ≡ S
  S ∧ c(T1, ..., Tn) ⊆s c(T1′, ..., Tn′) ≡ S ∧ T1 ⊆ι1 T1′ ∧ ··· ∧ Tn ⊆ιn Tn′   if c : ι1 ··· ιn → Set
  S ∧ c(...) ⊆s d(...) ≡ inconsistent   if c ≠ d
  S ∧ T1 ∪ T2 ⊆s T ≡ S ∧ T1 ⊆s T ∧ T2 ⊆s T
  S ∧ T ⊆s T1 ∩ T2 ≡ S ∧ T ⊆s T1 ∧ T ⊆s T2
  S ∧ α ⊆s α ≡ S
  S ∧ α ∩ T ⊆s α ≡ S
  S ∧ T1 ⊆s Pat(T2, T3) ≡ S ∧ T1 ∩ T3 ⊆s T2
  S ∧ α ∩ T1 ⊆s T2 ≡ S ∧ α ⊆s Pat(T2, T1)
  S ∧ ¬{c1, ..., cn} ⊆s ¬{d1, ..., dm} ≡ S   if {d1, ..., dm} ⊆ {c1, ..., cn}
  S ∧ c(...) ⊆s ¬{d1, ..., dm} ≡ S   if c ∉ {d1, ..., dm}

(d) General rules (we write ⊆ῑ for a constraint arising at a contravariant argument position):

  S ∧ X ⊆ι α ∧ α ⊆ι Y ≡ S ∧ X ⊆ι α ∧ α ⊆ι Y ∧ X ⊆ι Y
  S ∧ T1 ⊆ῑ T2 ≡ S ∧ T2 ⊆ι T1

Fig. 2. Resolution rules for constraints.
two mechanisms. First, explicit complements have the form ¬{c1, ..., cn}, which denotes all values of sort Set except those with head constructor c1, ..., cn. Second, more general complements are represented implicitly. Define ¬R to be the set such that R ∩ ¬R = 0 and R ∪ ¬R = 1 (in all solutions). Now define

  Pat(T, R) = (T ∩ R) ∪ ¬R

The operator Pat (the name stands for "pattern," because it is used most often to express pattern matching) encapsulates a disjoint union involving a complement. Pat is equivalent in power to disjoint union, but constraint resolution involving Pat does not require computing complements. Of course, wherever Pat(T, R) is used the set ¬R must exist; this is an obligation of the analysis designer (see [FA97] for details). Given the definitions of Pat and ¬{c1, ..., cn}, basic set theory shows the rules in Figure 2c are sound. Our specification of sort Set is incomplete. We have omitted some rules for simplifying intersections and some restrictions on the form of solvable constraints. The details may be found in [AW93,FA97].

Figure 2d gives two general rules that apply to all sorts. The first rule expresses that ⊆ι is transitive. The second flips constraints that arise from contravariant constructor arguments.

We now present a small example of a mixed constraint system. Consider an effect system where each function type carries a set of atomic effects (e.g., the set of globally visible variables that may be modified by invoking the function). Let the constructors have signatures
  · →^· · : FlowTerm Set FlowTerm → FlowTerm
  int : FlowTerm
  a1, ..., an : Set    (the atomic effects)

The following constraint (writing the Set argument as an annotation on the arrow)

  α →^{a1 ∪ a2} β ⊆ft int →^γ int

is resolved as follows:

  α →^{a1 ∪ a2} β ⊆ft int →^γ int
  ⇒ int ⊆ft α ∧ a1 ∪ a2 ⊆s γ ∧ β ⊆ft int    (the domain is contravariant, so its constraint is flipped)
  ⇒ α = int ∧ a1 ∪ a2 ⊆s γ ∧ β = int
Thus in all solutions α and β are both int and γ is a superset of a1 ∪ a2.

2.2 Scalability
The main technical challenge in BANE is to develop methods for scaling constraint-based analyses to large programs. Designing for scalability has led
to a system with a significantly different organization from other program analysis systems [Hei94,AWL94]. To handle very large programs it is essential that the implementation be structured so that independent program components can be analyzed separately first and the results combined later. Consider the following generic inference rule, where expressions are assigned types under some set of assumptions A and constraints C:

  A, C ⊢ e1 : τ1    A, C ⊢ e2 : τ2
  ---------------------------------
       A, C ⊢ E[e1, e2] : τ

where E[e1, e2] is a compound expression with subexpressions e1 and e2. In all other implementations we know of, such inference systems are realized by accumulating a set of global constraints C. In BANE one can write rules as above, but the following alternative is also provided:

  A, C1 ⊢ e1 : τ1    A, C2 ⊢ e2 : τ2
  -----------------------------------
      A, C1 ∧ C2 ⊢ E[e1, e2] : τ

Here C1 contains only the constraints required to type e1 (similarly for C2 and e2). This structure has advantages. First, separate analysis of program components is trivial by design rather than added as an afterthought. Second, the running time of algorithms that examine the constraints (e.g., constraint simplification, which replaces constraint systems by equivalent, and smaller, systems) is guaranteed to be a function only of the expression being analyzed; in particular, the running time is independent of the rest of the program.

Note that this design changes the primitive operation for accumulating constraints from adding individual constraints to a global system to combining independent constraint systems. Because this latter operation is more expensive, BANE applications tend to use a mixture of the two forms of rules to obtain good overall performance and scalability. Many other aspects of the BANE architecture have been engineered primarily for scalability [FA96]. The emphasis on scalability, plus the overhead of supporting general user-specified constructor signatures, has a cost in runtime performance, but this cost appears to be small. For example, a BANE implementation of the type inference system for core Standard ML performs within a factor of two of the hand-written implementation in the SML/NJ compiler. In other cases a well-engineered constraint library can substantially outperform hand-written implementations: BANE implementations of a class of cubic-time flow analyses can be orders of magnitude faster than special-purpose systems because of optimizations implemented in the solver for BANE's set constraint sort [FFSA98].
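The shift in primitive operation can be pictured with a deliberately tiny model. The structure below is our own illustration, not BANE code (BANE's real interface appears in Sect. 3): style 1 grows a single global system one constraint at a time, while style 2 builds a system per subexpression and merges whole systems.

structure ToyConstraints = struct
  type constr = string * string                  (* a toy constraint, e.g. ("X","Y") for X ⊆ Y *)
  type system = constr list                      (* a toy constraint system *)
  val empty : system = []
  (* style 1 primitive: add one constraint to a (global) system *)
  fun add (s : system, c : constr) : system = c :: s
  (* style 2 primitive: merge two independently built systems *)
  fun combine (s1 : system, s2 : system) : system = s1 @ s2
end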
3 The BANE Interface by Example
This section presents a simple analysis written in BANE. We show by example how an analysis can be successively refined using mixed constraints. BANE is
a library written in Standard ML of New Jersey [MTH90]. Writing a program analysis using BANE requires ML code to traverse abstract syntax while generating constraints and ML code to extract the desired information from the solutions of the constraints. For reasons of efficiency, BANE's implementation is stateful. BANE provides the notion of a current constraint system (CCS) into which all constraints are added. Functionality to create new constraint systems and to change the CCS is provided, so one is not limited to a single global constraint system. For simplicity, the examples in this section use only a single constraint system.

3.1 A Trivial Example: Simple Type Inference for a Lambda Calculus
This example infers types for a lambda calculus with the following abstract syntax:

datatype ast = Var of string
             | Int of int
             | Fn of {formal:string, body:ast}
             | App of {function:ast, argument:ast}
The syntax includes identifiers (strings), primitive integers, abstraction, and application. The language of types consists of the primitive type int, a function type →, as well as type variables v:

  τ ::= v | int | τ → τ

The first choice is the sort of expressions and constraints to use for the type inference problem. All that is needed in this case are terms and term equality; the appropriate sort is Term (structure Bane.Term). To make the code more readable, we rebind this structure as structure TypeSort.

structure TypeSort = Bane.Term
BANE uses distinct ML types for expressions of distinct sort. In this case, type expressions have ML type

type ty = TypeSort.T Bane.expr
Next, we need the type constructors for integers and functions. The integer type constructor can be formed using a constant signature, and a standard function type constructor is predefined.

val int_tycon = Cons.new {name="int", signa=TypeSort.constSig}
val fun_tycon = TypeSort.funCon
The constant integer type is created by applying the integer constructor to an empty list of arguments. We also define a function to apply the function type constructor to the domain and range, using the generic function Bane.Common.cons : ’a constructor * genE list -> ’a expr that applies a constructor of sort
  [VAR]                A ⊢ x : A[x]

  [INT]                A ⊢ i : int

  [ABS]   α fresh    A[x ↦ α] ⊢ e : τ
          ----------------------------
               A ⊢ λx.e : α → τ

  [APP]   A ⊢ e1 : τ1    A ⊢ e2 : τ2    α fresh    τ1 = τ2 → α
          -----------------------------------------------------
                           A ⊢ e1 e2 : α

Fig. 3. Type inference rules for example lambda calculus
'a to a list of arguments. In general, constructor arguments can have a variety of distinct sorts with distinct ML types. Since ML only allows homogeneously typed lists, BANE uses an ML type genE for expressions of any sort. The lack of subtyping in ML forces us to use conversion functions (TypeSort.toGenE) to convert the domain and range from TypeSort.T Bane.expr to Bane.genE.

val intTy = Bane.Common.cons (int_tycon, [])
fun funTy (domain, range) =
  Common.cons (fun_tycon, [TypeSort.toGenE domain, TypeSort.toGenE range])
Finally, we define a function for creating fresh type variables by specializing the generic function Bane.Var.freshVar : 'a Bane.sort -> 'a Bane.expr. We also bind the operator == to the equality constraint of TypeSort.

fun freshTyVar () = Bane.Var.freshVar TypeSort.sort
infix ==
val op == = TypeSort.unify
With these auxiliary bindings, the standard type inference rules in Figure 3 are translated directly into a case analysis on the abstract syntax. Type environments are provided by a module with the following signature:

signature ENV = sig
  type name = string
  type 'a env
  val empty  : 'a env
  val insert : 'a env * name * 'a -> 'a env
  val find   : 'a env * name -> 'a option
end
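One minimal implementation of this signature, sketched here for completeness using association lists (any finite-map structure with the same interface would do equally well):

structure Env : ENV = struct
  type name = string
  type 'a env = (name * 'a) list   (* newest binding first, so insert shadows older ones *)
  val empty = []
  fun insert (env, n, v) = (n, v) :: env
  fun find (env, n) =
    case List.find (fn (m, _) => m = n) env of
      SOME (_, v) => SOME v
    | NONE => NONE
end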
The type of identifiers is simply looked up in the environment. If the environment contains no assumption for an identifier, an error is reported.

fun elaborate env ast =
  case ast of
    Var x => (case Env.find (env, x) of
                SOME ty => ty
                (* unbound identifier: report an error (any reporting mechanism will do) *)
              | NONE => raise Fail ("unbound identifier " ^ x))
The integer case is even simpler: | Int i => intTy
Abstractions are typed by creating a fresh unconstrained type variable for the lambda-bound formal, extending the environment with a binding for the formal, and typing the body in the extended environment.

  | Fn {formal, body} =>
      let val v = freshTyVar ()
          val env' = Env.insert (env, formal, v)
          val body_ty = elaborate env' body
      in funTy (v, body_ty) end
For applications we obtain the function type ty1 and the argument type ty2 via recursive calls. A fresh type variable result stands for the result of the application. Type ty1 must be equal to a function type with domain ty2 and range result. The handler around the equality constraint catches inconsistent constraints in the case where ty1 is not a function, or the domain and argument don't agree.

  | App {function, argument} =>
      let val ty1 = elaborate env function
          val ty2 = elaborate env argument
          val result = freshTyVar ()
          val fty = funTy (ty2, result)
      in
        (ty1 == fty)
          (* inconsistent constraints: report the type error *)
          handle exn => raise Fail "type error in application";
        result
      end
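Putting the pieces together, the analysis is invoked by handing elaborate an environment and a term; for instance, with the association-list Env sketched above (a hypothetical driver, not code from the BANE distribution):

val identTy = elaborate Env.empty (Fn {formal = "x", body = Var "x"})
(* identTy is the BANE expression for α → α, with α a fresh type variable *)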
We haven’t specified whether our type language for lambda terms includes recursive types. The Term sort allows recursive solutions by default. If only nonrecursive solutions are desired, an occurs check can be enabled via a BANE option: Bane.Flags.set (SOME TypeSort.sort) "occursCheck";
As an example, consider the Y combinator

  Y = λf.(λx.f (x x)) (λx.f (x x))

Its inferred type is (α → α) → α, where the type variable α is unconstrained. With the occurs check enabled, type inference for Y fails.
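Concretely, Y can be written down as an ast value and fed to elaborate (this encoding is our own illustration):

(* λx. f (x x), the shared subterm of Y *)
val selfApp =
  Fn {formal = "x",
      body = App {function = Var "f",
                  argument = App {function = Var "x", argument = Var "x"}}}

(* Y = λf.(λx.f (x x)) (λx.f (x x)) *)
val yComb =
  Fn {formal = "f", body = App {function = selfApp, argument = selfApp}}

(* With recursive types (the default), elaborate Env.empty yComb yields (α → α) → α;
   with the occurs check enabled, the unification fails. *)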
3.2 Type Inference with Flow Information
The simple type inference described above yields type information for each lambda term or fails if the equality constraints have no solution. Suppose we want to augment type inference to gather information about the set of lambda abstractions to which each lambda expression may evaluate. We assume the abstract syntax is modified so that lambda abstractions are labeled: | Fn of {formal:string, body:ast, label:string}
Our goal is to refine function types to include a label-set, so that the type of a lambda term not only describes the domain and the range, but also an approximation of the set of syntactic abstractions to which it may evaluate. The function type constructor thus becomes a ternary constructor fun(dom, rng, labels). The resulting analysis is similar to the flow analysis described in [Mos96]. The natural choice of constraint language for label-sets is obviously set constraints, and we bind the structure LabelSet to one particular implementation of set constraints: structure LabelSet = Bane.SetIF
We define the new function type constructor, containing an extra field for the label-set, by building a signature with three argument sorts, the first two being TypeSort sorts and the last being a LabelSet sort. Note how the variance of each constructor argument is specified in the signature through the use of the functions TypeSort.ctv_arg (contravariance) and TypeSort.cov_arg (covariance). Resolution of equality constraints itself does not require variance annotations, but other aspects of BANE do.

val funSig = TypeSort.newSig
  {args = [TypeSort.ctv_arg TypeSort.genSort,
           TypeSort.cov_arg TypeSort.genSort,
           TypeSort.cov_arg LabelSet.genSort],
   attributes = []}
val fun_tycon = Bane.Cons.new {name="fun", signa=funSig}
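The funTy helper from Sect. 3.1 must also grow a third argument to carry the label-set. A plausible redefinition, assuming (by analogy with TypeSort.toGenE) a conversion LabelSet.toGenE for Set expressions:

fun funTy (dom, rng, ls) =
  Common.cons (fun_tycon,
               [TypeSort.toGenE dom, TypeSort.toGenE rng, LabelSet.toGenE ls])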
We are now using a mixed constraint language: types are terms with embedded label-sets. Constraints between types are still equality constraints, and as a result, induced constraints between label-sets are also equalities. The type rules for abstraction and application are easily modified to include label information (we write φ for a label-set variable):

  [ABS]   α, φ fresh    A[x ↦ α] ⊢ e : τ    {l} ⊆ φ
          ------------------------------------------
                  A ⊢ λ^l x.e : fun(α, τ, φ)

  [APP]   A ⊢ e1 : τ1    A ⊢ e2 : τ2    α, φ fresh    τ1 = fun(τ2, α, φ)
          ----------------------------------------------------------------
                                 A ⊢ e1 e2 : α
Because Term constraints generate equality constraints on the embedded Sets, the label-sets of distinct abstractions may be equated during type inference. As a result, the [ABS] rule introduces a fresh label-set variable φ along with a constraint {l} ⊆ φ to correctly model that the lambda abstraction evaluates to
itself. (Note that this inclusion constraint is between Set expressions.) Using a constrained variable rather than a constant set {l} allows the label-set to be merged with other sets through equality constraints. The handling of arrow effects in region inference is similar [TT94]. The label-set variable introduced by each use of the [APP] rule stands for the set of abstractions potentially flowing to that application site. The code changes required to accommodate the new rules are minimal. For abstractions, the label is converted into a constant set constructor with the same name through Cons.new. A constant set expression is then built from the constructor and used to constrain the fresh label-set variable labelvar. Finally, the label-set variable is used along with the domain and range to build the function type of the abstraction.

  | Fn {formal, body, label} =>
      let val v = freshTyVar ()
          val env' = Env.insert (env, formal, v)
          val body_ty = elaborate env' body
          (* create a new constant constructor *)
          val c = Cons.new {name=label, signa=LabelSet.constSig}
          val lab = Common.cons (c, [])
          val labelvar = freshLabelVar ()
      in
        (lab <= labelvar);
        funTy (v, body_ty, labelvar)
      end
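The fragment above also uses freshLabelVar and an infix <= for Set inclusion, which the text does not define. By analogy with freshTyVar and ==, plausible bindings are as follows (LabelSet.inclusion is our guess at the name of the Set sort's constraint operation):

fun freshLabelVar () = Bane.Var.freshVar LabelSet.sort
infix <=
val op <= = LabelSet.inclusion   (* hypothetical name for the Set ⊆ constraint *)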
The changes to the implementation of [APP] are even simpler, requiring only the introduction of a fresh label-set variable. The label-set variable may be stored in a map for later inspection of the set of abstractions flowing to particular application sites.

  | App {function, argument} =>
      let val ty1 = elaborate env function
          val ty2 = elaborate env argument
          val result = freshTyVar ()
          val labels = freshLabelVar ()
          val fty = funTy (ty2, result, labels)
      in
        (ty1 == fty)
          (* inconsistent constraints: report the type error *)
          handle exn => raise Fail "type error in application";
        result
      end
We now provide a number of examples showing the information gathered by the flow analysis. Consider the standard lambda encodings for values true, false,
nil, and cons, and their inferred types (we write ℓi for the label-set variables, shown as annotations on the arrows):

  true  = λ^true x. λ^true1 y. x    :  α →^ℓ1 β →^ℓ2 α  \  true ⊆ ℓ1 ∧ true1 ⊆ ℓ2
  false = λ^false x. λ^false1 y. y  :  α →^ℓ1 β →^ℓ2 β  \  false ⊆ ℓ1 ∧ false1 ⊆ ℓ2
  nil   = λ^nil x. λ^nil1 y. x      :  α →^ℓ1 β →^ℓ2 α  \  nil ⊆ ℓ1 ∧ nil1 ⊆ ℓ2
  cons  = λ^cons hd. λ^c1 tl. λ^c2 x. λ^c3 y. y hd tl
        :  α →^ℓ1 β →^ℓ2 γ →^ℓ3 (α →^ℓ4 β →^ℓ5 δ) →^ℓ6 δ  \  cons ⊆ ℓ1 ∧ c1 ⊆ ℓ2 ∧ c2 ⊆ ℓ3 ∧ c3 ⊆ ℓ6
The analysis yields constrained types τ \ C, where the constraints C describe the label-set variables embedded in type τ. (To improve the readability of types, function types are written using the standard infix form with label-sets on the arrow.) For example, the type of nil

  α →^ℓ1 β →^ℓ2 α  \  nil ⊆ ℓ1 ∧ nil1 ⊆ ℓ2
has the label-set ℓ1 on the first arrow, and associated constraint nil ⊆ ℓ1. The label-set is extracted from the final type using the following BANE code fragment:

val ty = elaborate baseEnv e
val labels =
  case Common.deCons (fun_tycon, ty) of
    SOME [dom, rng, lab] => LabelSet.tlb (LabelSet.fromGenE lab)
  | NONE => []
The function Common.deCons is used to decompose constructed expressions. In this case we match the final type expression against the pattern fun(dom, rng, lab). If the match succeeds, deCons returns the list of arguments to the constructor. In this case we are interested in the least solution of the label component lab. We obtain this information via the function LabelSet.tlb, which returns the transitive lower-bound (TLB) of a given expression. The TLB is a list of constructed expressions c(...), in our case a list of constants corresponding to abstraction labels.

A slightly more complex example using the lambda expressions defined above is

  head = λ^head l. l nil (λ^head1 x. λ^head2 y. x)
       : ((α →^ℓ1 ι1 →^ℓ2 α) →^ℓ3 (β →^ℓ4 ι2 →^ℓ5 β) →^ℓ6 γ) →^ℓ7 γ
         \  head ⊆ ℓ7 ∧ nil ⊆ ℓ1 ∧ nil1 ⊆ ℓ2 ∧ head1 ⊆ ℓ4 ∧ head2 ⊆ ℓ5

  head (cons true nil) : α →^ℓ1 β →^ℓ2 α  \  true ⊆ ℓ1 ∧ true1 ⊆ ℓ2

The expression head (cons true nil) takes the head of the list containing true. Even though the function head is defined to return nil if the argument is the empty list, the flow analysis correctly infers that the result in this case is true.
The use of equality constraints may cause undesired approximations in the flow information. Consider an example taken from Section 3.1 of Mossin's thesis [Mos96]:

  select = λ^select x. λ^sel1 y. λ^sel2 f. if x then f x else f y

The select function takes three arguments, x, y, and f, and depending on the truth value of x, returns the result of applying f to either x or y. The abbreviation if p then e1 else e2 stands for the application p e1 e2. The type constraints for the two applications of f cause the flow information of x and y to be merged. As a result, the application select true false (λz.z) does not resolve the condition of the if-then-else to true. To observe the approximation directly in the result type, we modify the example slightly:

  select' = λ^select x. λ^sel1 y. λ^sel2 f. if x then f x x else f y x

Now f is applied to two arguments, the first being either x or y, the second being x in both cases. We modify the example use of select such that f now ignores its first argument and simply returns the second, i.e. x. The expression thus evaluates to true.

  select' true false (λz.λw.w)

The inferred type for this application is

  τ  \  τ = τ →^ℓ1 τ →^ℓ2 τ  ∧  true ∪ false ⊆ ℓ1  ∧  true1 ∪ false1 ⊆ ℓ2
where the label-set ℓ1 on the function type indicates that the result can be either true or false. This approximation can be overcome through the use of subtyping.

3.3 Type Inference with Flow Information and Subtyping
The inclusion relation on label-sets embedded within types can be lifted to a natural subtyping relation on structural types. This idea has been described in the context of control-flow analysis in [HM97], for a more general flow analysis in [Mos96], and for more general set expressions in [FA97]. A subtype-based analysis where sets are embedded within terms can be realized in BANE through the use of the FlowTerm sort. The FlowTerm sort provides inclusion constraints instead of equality for the same language and solution space as the Term sort. To take advantage of the extra precision of subtype inference in our example, we first change the TypeSort structure to use the FlowTerm sort. structure TypeSort = Bane.FlowTerm
The definition of the function type constructor with labels remains the same, although the domain and range are now of sort FlowTerm.

val funSig = TypeSort.newSig
  {args = [TypeSort.ctv_arg TypeSort.genSort,
           TypeSort.cov_arg TypeSort.genSort,
           TypeSort.cov_arg LabelSet.genSort],
   attributes = []}
val fun_tycon = Bane.Cons.new {name="fun", signa=funSig}
The inference rules for abstraction and application change slightly. In the [ABS] rule, it is no longer necessary to introduce a fresh label-set variable, since label-sets are no longer merged in the subtype approach. Instead the singleton set can be directly embedded within the function type. In the [APP] rule, we simply replace the equality constraint with an inclusion.

  [ABS]   α fresh    A[x ↦ α] ⊢ e : τ
          ------------------------------
          A ⊢ λ^l x.e : fun(α, τ, {l})

  [APP]   A ⊢ e1 : τ1    A ⊢ e2 : τ2    α, φ fresh    τ1 ⊆ fun(τ2, α, φ)
          ----------------------------------------------------------------
                                 A ⊢ e1 e2 : α
Note that the inclusion constraint in the [APP] rule allows subsumption not only on the label-set of the function, but also on the domain and the range, since

  fun(dom, range, labels) ⊆ fun(τ2, α, φ)  ⇔  τ2 ⊆ dom ∧ range ⊆ α ∧ labels ⊆ φ

We return to the example of the previous section where flow information was merged:

  select' true false (λz.λw.w)

Using subtype inference, the type of this expression is

  τ  \  τ = τ →^ℓ1 τ →^ℓ2 τ  ∧  true ⊆ ℓ1  ∧  true1 ⊆ ℓ2
The flow information now precisely models the fact that only true is passed as the second argument to λz.λw.w.
4 Analysis Frameworks
We conclude by comparing BANE with other program analysis frameworks. There have been many such frameworks in the past; see for example [ATGL96,AM95,Ass96,CDG96,DC96,HMCCR93,TH92,Ven89,YH93]. Most frameworks are based on standard dataflow analysis, as first proposed by Cocke [Coc70] and developed by Kildall [Kil73] and Kam and Ullman [KU76], while others are based on more general forms of abstract interpretation [Ven89,YH93].
In previous frameworks the user specifies a lattice and a set of transfer functions, either in a specialized language [AM95], in a Yacc-like system [TH92], or as a module conforming to a certain interface [ATGL96,CDG96,DC96,HMCCR93]. The framework traverses a program representation (usually a control flow graph) either forwards or backwards, calling user-defined transfer functions until the analysis reaches a fixed point.

A fundamental distinction between BANE and these frameworks is the interface with a client analysis. In BANE, the interface is a system of constraints, which is an explicit data structure that the framework understands and can inspect and transform for best effect. In other frameworks the interface is the transfer and lattice functions, all of which are defined by the client. These functions are opaque (their effect is unknown to the framework), which in general means that the dataflow frameworks have less structure that can be exploited by the implementation. For example, reasoning about termination of the framework is impossible without knowledge of the client. Additionally, using transfer functions implies that information can flow conveniently only in one direction, which gives rise to the restriction in dataflow frameworks that analyses are either forwards or backwards. An analysis that is neither forwards nor backwards (e.g., most forms of type inference) is at best awkward to code in this model.

On the other hand, dataflow frameworks provide more support for the task of implementing traditional dataflow analyses than BANE, since they typically manage the control flow graph and its traversal as well as the computation of abstract values. With BANE the user must write any needed traversal of the program structure, although this is usually a simple recursive walk of the abstract syntax tree. Since BANE has no knowledge of the program from which constraints are generated, BANE cannot directly exploit any special properties of program structure that might make constraint solving more efficient.

While there is very little experimental evidence on which to base any conclusion, it is our impression that an analysis implemented using the more general frameworks with user-defined transfer functions suffers a significant performance penalty (perhaps an order of magnitude) compared with a special-purpose implementation of the same analysis. Note that the dataflow frameworks target a different class of applications than BANE, and we do not claim that BANE is particularly useful for traditional dataflow problems. However, as discussed in Section 2.2, we do believe that for problems with a natural type or constraint formulation BANE provides users with significant benefits in development time together with good scalability and good to excellent performance compared with hand-written implementations of the same analyses.
5 Conclusions
BANE is a toolkit for constructing type- and constraint-based program analyses. An explicit goal of the project is to make realistic experimentation with program analysis ideas much easier than is now the case. We hope that other researchers
find BANE useful in this way. The BANE distribution is available on the World Wide Web from http://bane.cs.berkeley.edu.
References

AFS98. A. Aiken, M. Fähndrich, and Z. Su. Detecting Races in Relay Ladder Logic Programs. In Tools and Algorithms for the Construction and Analysis of Systems, 4th International Conference, TACAS'98, volume 1384 of LNCS, pages 184–200, Lisbon, Portugal, 1998. Springer.
AM95. M. Alt and F. Martin. Generation of Efficient Interprocedural Analyzers with PAG. Lecture Notes in Computer Science, 983:33–50, 1995.
And94. L. Andersen. Program Analysis and Specialization for the C Programming Language. PhD thesis, DIKU, University of Copenhagen, May 1994.
Ass96. U. Assmann. How to Uniformly Specify Program Analysis and Transformation with Graph Rewrite Systems. In Proceedings of the Sixth International Conference on Compiler Construction (CC '96), pages 121–135. Springer-Verlag, April 1996.
ATGL96. A. Adl-Tabatabai, T. Gross, and G. Lueh. Code Reuse in an Optimizing Compiler. In Proceedings of the ACM Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA '96), pages 51–68, October 1996.
AW93. A. Aiken and E. Wimmers. Type Inclusion Constraints and Type Inference. In Proceedings of the 1993 Conference on Functional Programming Languages and Computer Architecture, pages 31–41, Copenhagen, Denmark, June 1993.
AWL94. A. Aiken, E. Wimmers, and T. K. Lakshman. Soft Typing with Conditional Types. In Twenty-First Annual ACM Symposium on Principles of Programming Languages, pages 163–173, January 1994.
CC77. P. Cousot and R. Cousot. Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixed Points. In Fourth Annual ACM Symposium on Principles of Programming Languages, pages 238–252, January 1977.
CDG96. C. Chambers, J. Dean, and D. Grove. Frameworks for Intra- and Interprocedural Dataflow Analysis. Technical Report 96-11-02, Department of Computer Science and Engineering, University of Washington, November 1996.
Coc70. J. Cocke. Global Common Subexpression Elimination. ACM SIGPLAN Notices, 5(7):20–24, July 1970.
DC96. M. Dwyer and L. Clarke. A Flexible Architecture for Building Data Flow Analyzers. In Proceedings of the 18th International Conference on Software Engineering (ICSE-18), Berlin, Germany, March 1996.
DM82. L. Damas and R. Milner. Principal Type-Schemes for Functional Programs. In Ninth Annual ACM Symposium on Principles of Programming Languages, pages 207–212, January 1982.
EST95. J. Eifrig, S. Smith, and V. Trifonov. Sound Polymorphic Type Inference for Objects. In OOPSLA '95, pages 169–184, 1995.
FA96. M. Fähndrich and A. Aiken. Making Set-Constraint Based Program Analyses Scale. In First Workshop on Set Constraints at CP'96, Cambridge, MA, August 1996. Available as Technical Report CSD-TR-96-917, University of California at Berkeley.
FA97. M. Fähndrich and A. Aiken. Program Analysis Using Mixed Term and Set Constraints. In Proceedings of the 4th International Static Analysis Symposium, pages 114–126, 1997.
FFA97. J. Foster, M. Fähndrich, and A. Aiken. Flow-Insensitive Points-to Analysis with Term and Set Constraints. Technical Report UCB//CSD-97-964, University of California, Berkeley, July 1997.
FFA98. M. Fähndrich, J. Foster, and A. Aiken. Tracking down Exceptions in Standard ML Programs. Technical Report UCB/CSD-98-996, EECS Department, UC Berkeley, February 1998.
FFK+96. C. Flanagan, M. Flatt, S. Krishnamurthi, S. Weirich, and M. Felleisen. Catching Bugs in the Web of Program Invariants. In Proceedings of the 1996 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 23–32, May 1996.
FFSA98. M. Fähndrich, J. Foster, Z. Su, and A. Aiken. Partial Online Cycle Elimination in Inclusion Constraint Graphs. In Proceedings of the ACM SIGPLAN '98 Conference on Programming Language Design and Implementation, 1998.
GJSO92. D. Gifford, P. Jouvelot, M. Sheldon, and J. O'Toole. Report on the FX-91 Programming Language. Technical Report MIT/LCS/TR-531, Massachusetts Institute of Technology, February 1992.
Hei94. N. Heintze. Set Based Analysis of ML Programs. In Proceedings of the 1994 ACM Conference on LISP and Functional Programming, pages 306–317, June 1994.
Hen92. F. Henglein. Global Tagging Optimization by Type Inference. In Proceedings of the 1992 ACM Conference on Lisp and Functional Programming, pages 205–215, July 1992.
HM97. N. Heintze and D. McAllester. Linear-Time Subtransitive Control Flow Analysis. In Proceedings of the 1997 ACM SIGPLAN Conference on Programming Language Design and Implementation, June 1997.
HMCCR93. M. Hall, J. Mellor-Crummey, A. Carle, and R. Rodríguez. FIAT: A Framework for Interprocedural Analysis and Transformation. In U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, editors, Proceedings of the 6th International Workshop on Parallel Languages and Compilers, pages 522–545, Portland, Oregon, August 1993. Springer-Verlag.
Kil73. G. A. Kildall. A Unified Approach to Global Program Optimization. In ACM Symposium on Principles of Programming Languages, pages 194–206, Boston, MA, October 1973. ACM.
KU76. J. Kam and J. Ullman. Global Data Flow Analysis and Iterative Algorithms. Journal of the ACM, 23(1):158–171, January 1976.
Mos96. Christian Mossin. Flow Analysis of Typed Higher-Order Programs. PhD thesis, DIKU, Department of Computer Science, University of Copenhagen, 1996.
MTH90. Robin Milner, Mads Tofte, and Robert Harper. The Definition of Standard ML. MIT Press, 1990.
Rém89. D. Rémy. Typechecking Records and Variants in a Natural Extension of ML. In Conference Record of the Sixteenth Annual ACM Symposium on Principles of Programming Languages, Austin, Texas, pages 60–76, January 1989.
Ste96. B. Steensgaard. Points-to Analysis in Almost Linear Time. In Proceedings of the 23rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 32–41, January 1996.
TH92. S. Tjiang and J. Hennessy. Sharlit – A Tool for Building Optimizers. In Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, pages 82–93, July 1992.
TT94. M. Tofte and J.-P. Talpin. Implementation of the Typed Call-by-Value λ-Calculus using a Stack of Regions. In Twenty-First Annual ACM Symposium on Principles of Programming Languages, pages 188–201, 1994.
Ven89. G. A. Venkatesh. A Framework for Construction and Evaluation of High-Level Specifications for Program Analysis Techniques. In Proceedings of the ACM SIGPLAN '89 Conference on Programming Language Design and Implementation, pages 1–12, 1989.
YH93. K. Yi and W. Harrison, III. Automatic Generation and Management of Interprocedural Program Analyses. In Proceedings of the Twentieth Annual ACM Symposium on Principles of Programming Languages, pages 246–259, January 1993.
Optimizing ML Using a Hierarchy of Monadic Types

Andrew Tolmach*

Pacific Software Research Center
Portland State University & Oregon Graduate Institute
Dept. of Computer Science, P.S.U., P.O. Box 751, Portland, OR 97207, USA
[email protected]

* Supported, in part, by the US Air Force Materiel Command under contract F19628-93-C-0069 and by the National Science Foundation under grant CCR-9503383.
Abstract. We describe a type system and typed semantics that use a hierarchy of monads to describe and delimit a variety of effects, including non-termination, exceptions, and state, in a call-by-value functional language. The type system and semantics can be used to organize and justify a variety of optimizing transformations in the presence of effects. In addition, we describe a simple monad inferencing algorithm that computes the minimum effect for each subexpression of a program, and provides more accurate effects information than local syntactic methods.
1 Introduction
Optimizers are often implemented as engines that repeatedly apply improving transformations to programs. Among the most important transformations are propagation of values from their defining site to their use site, and hoisting of invariant computations out of loops. If we use a pure (side-effect-free) language based on the lambda calculus as our compiler intermediate language, these transformations can be neatly described by the simple equation for beta-reduction

(Beta)       let x = e1 in e2 = e2[e1/x]

and by the equations for the exchange and hoisting of bindings

(Exchange)   let x1 = e1 in (let x2 = e2 in e3) = let x2 = e2 in (let x1 = e1 in e3)
             (x1 ∉ FV(e2); x2 ∉ FV(e1))

(RecHoist)   letrec f x = (let y = e1 in e2) in e3 = let y = e1 in (letrec f x = e2 in e3)
             (x, f ∉ FV(e1); y ∉ FV(e3))
where FV(e) is the set of free variables of e. The side conditions nicely express the data dependence conditions under which the equations are valid. Either orientation of the equation generates a valid transformation.¹ Effective compilers for pure, lazy functional languages (e.g., [14]) have been conceived and built on the basis of such transformations, with considerable advantages for modularity and correctness.

It would be nice to apply similar methods to the optimization of languages like ML, which have side effects such as I/O, mutable state, and exceptions. Unfortunately, these "rearranging" transformations are not generally valid for such languages. For example, if we apply (Beta) (oriented left-to-right) in a situation where evaluating e1 performs output and x is mentioned twice in e2, evaluating the resulting expression might produce the output twice. In fact, once an eager evaluation order is fixed, even non-termination becomes a "side effect." For example, (RecHoist) is not valid unless e1 is known to be terminating (and free of other effects too, of course).

A similar challenge long faced lazy functional languages at the source level: how could one obtain the power of side-effecting operations without invalidating simple "equational reasoning" based on (Beta) and similar rules? The effective solution discovered in that context is to use monads [9,13]. An obvious idea, therefore, is to use monads in an internal representation (IR) for compilers of call-by-value languages. Some initial steps in this direction were recently taken by Peyton Jones, Launchbury, Shields, and Tolmach [11]. The aim of that work was to design an IR suitable for both eager and lazy source languages. In this paper we pursue the use of monads with particular reference to eager languages (only), and address the question of how to discover and record several different sorts of effects in a single, unified monadic type system. We introduce a hierarchy of monads, ordered by increasing "strength of effect," and an inference algorithm for annotating source program subexpressions with their minimal effect.

Past approaches to coping with effects have fallen into two main camps. One approach (used, e.g., by SML of New Jersey [1] and the TIL compiler [17]) is to fall back on a weaker form of (Beta), called (Betav), which is valid in eager settings. (Betav) restricts the bound expression e to variables, constants, and λ-abstractions; since "evaluating" these expressions never actually causes any computation, they can be moved and substituted with impunity. To augment this rule, these compilers use local syntactic analysis to discover expressions that are demonstrably pure and terminating. Local syntactic analysis must assume that calls to unknown functions may be impure and non-terminating. Still, this form of analysis can be quite effective, particularly if the compiler inlines functions enthusiastically. The other approach (used, e.g., by the ML Kit compiler [4]) uses a sophisticated effect inference system [15] to track the latent effects of functions on a very detailed basis. The goals of this school are typically more far-reaching; the aim is to use effects information to provide more generous
¹ Of course, the fact that a transformation is valid doesn't mean that applying it will necessarily improve the program. For example, (Beta) (oriented left-to-right) is not an improving transformation if e1 is expensive to compute and x appears many times in e2; similarly, (RecHoist) (oriented left-to-right) is not improving if f is not applied in e3.
polymorphic generalization rules (e.g., as in [21,16]), or to perform significantly more sophisticated optimizations, such as automatic parallelization [6] or stack allocation of heap-like data [18]. In support of these goals, effect inference has generally been used to track store effects at a fine-grained level.

Our approach is essentially a simple monomorphic variant of effect inference applied to a wider variety of effects (including non-termination, exceptions, and IO), cast in monadic form, and intended to support transformational code-motion optimizations. We infer information about latent effects, but we do not attempt to calculate effects at a very fine level of granularity. In return, our inference system is particularly simple to state and implement. However, there is nothing fundamentally new about our system as compared with that of Talpin and Jouvelot [15], except our decision to use a monadic syntax and validate it using a typed monadic semantics. A practical advantage of the monadic syntax is that it makes it easy to reflect the results of the effect inference in the program itself, where they can be easily consulted (and kept up to date) by subsequent optimizations, rather than in an auxiliary data structure. An advantage of the monadic semantics is that it provides a natural foundation for probing and proving the correctness of transformations in the presence of a variety of effects.

In related work, Wadler [20] has recently and independently shown that Talpin and Jouvelot's effect inference system can be applied in a monadic framework; he uses an untyped semantics, and considers only store effects. In another independent project, Benton and Kennedy are prototyping an ML compiler with an IR that describes effects using a monadic encoding similar to ours [3].
2 Source Language
This section briefly describes an ML-like source language we use to explain our approach. The call-by-value source language is presented in Fig. 1. It is a simple, monomorphic variant of ML, expressed in A-normal form [5], which names the result of each computation and makes evaluation order completely explicit. The class const includes primitive functions as well as constants. The Let construct is monomorphic; that is, Let(x,e1,e2 ) has the same semantics and typing properties as would App(Abs(x,e2),e1 ) (were this legal A-normal form). The restriction to a monomorphic language is not essential (see Sect. 5). All functions are unary; primitives like Plus take a two-element tuple as argument. For simplicity of presentation, we restrict Letrec to single functions. The types of constants are given in Fig. 2. Exceptions carry values of type Exn, which are nullary exception constructors. Raise takes an exception constructor; rather than providing a means for declaring such constructors, we assume an arbitrary pool of constructor constants. Handle catches all exceptions that are raised while evaluating its first argument and passes the associated exception value to its second argument, which must be a handler function expecting an Exn. The body of the handler function may or may not choose to reraise the exception depending on its value, which may be tested using EqExn.
datatype typ = Int | Bool | Exn | Tup of typ list | -> of typ * typ

datatype const = Integer of int | True | False | DivByZero | ...
               | Plus | Minus | Times | Divide
               | EqInt | LtInt | EqBool | EqExn | WriteInt | ...

type varty = var * typ

datatype value = Var of var | Const of const

datatype exp = Val of value
             | Abs of varty * exp
             | App of value * value
             | If of value * exp * exp
             | Let of varty * exp * exp
             | Letrec of varty * varty * exp * exp
             | Tuple of value list
             | Project of int * value
             | Raise of value
             | Handle of exp * value

Fig. 1. Abstract syntax for source language (presented as ML datatype)
  Integer                    : Int
  True, False                : Bool
  DivByZero                  : Exn
  Plus, Minus, Times, Divide : Tup[Int, Int] -> Int
  EqInt, LtInt               : Tup[Int, Int] -> Bool
  EqBool                     : Tup[Bool, Bool] -> Bool
  EqExn                      : Tup[Exn, Exn] -> Bool
  WriteInt                   : Int -> Tup[]

Fig. 2. Typings for constants in initial environment.
The primitive function Divide has the potential to raise a particular exception DivByZero. We supply WriteInt as a paradigmatic state-altering primitive; internal side-effects such as ML reference manipulations would be handled similarly. All other primitives are pure and guaranteed to terminate. The semantics of the remainder of the language are completely ordinary.
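As a small worked illustration of the A-normal form (our own example, assuming var = string, since the figure leaves the type var unspecified), a program that divides two constants and prints the quotient must name every intermediate value:

(* let z = (7, 2) in let q = Divide z in WriteInt q *)
val example : exp =
  Let (("z", Tup [Int, Int]),
       Tuple [Const (Integer 7), Const (Integer 2)],
       Let (("q", Int),
            App (Const Divide, Var "z"),
            App (Const WriteInt, Var "q")))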
3 Intermediate Representation with Monadic Types
Figure 3 shows the abstract syntax of our monadic intermediate representation (IR). (For an example of the code, look ahead to Fig. 11.) For the most part, terms are the same as in the source language, but with the addition of monad annotations on Let and Handle constructs and a new Up construct; these are described in detail below.
datatype monad = ID | LIFT | EXN | ST

datatype mtyp = M of monad * vtyp
and vtyp = Int | Bool | Exn | Tup of vtyp list | -> of vtyp * mtyp

type varty = var * vtyp

datatype value = Var of var | Const of const

datatype exp = Val of value
             | Abs of varty * exp
             | App of value * value
             | If of value * exp * exp
             | Let of monad * monad * varty * exp * exp
             | Letrec of varty * varty * exp * exp
             | Tuple of value list
             | Project of int * value
             | Raise of mtyp * value
             | Handle of monad * exp * value
             | Up of monad * monad * exp

Fig. 3. Abstract syntax for monadic typed intermediate representation
  Integer            : Int
  True, False        : Bool
  DivByZero          : Exn
  Plus, Minus, Times : Tup[Int, Int] -> M(ID,Int)
  Divide             : Tup[Int, Int] -> M(EXN,Int)
  EqInt, LtInt       : Tup[Int, Int] -> M(ID,Bool)
  EqBool             : Tup[Bool, Bool] -> M(ID,Bool)
  EqExn              : Tup[Exn, Exn] -> M(ID,Bool)
  WriteInt           : Int -> M(ST,Tup[])

Fig. 4. Monadic typings for constants in initial environment.
Values have ordinary value types (vtyps); expressions have monadic types (mtyps), which incorporate a vtyp and a monad (possibly the identity monad, ID). Since this is a call-by-value language, the domain of each arrow type is a vtyp, but the codomain is an arbitrary mtyp. The monadic types for the constants are specified in Fig. 4. The typing rules are given in Fig. 5. In this figure, and throughout our discussion, t ranges over value types, m over monads, v over values, c over constants, x, y, z, f over variables, and e over expressions. For this presentation, we use four monads arranged in a simple linear order. In order of "increasing effect," these are:

– ID, the identity monad, which describes pure, terminating computations.
– LIFT, the lifting monad, which describes pure but potentially non-terminating computations.
– EXN, the monad of exceptions and lifting, which describes computations that may raise an (uncaught) exception, and are potentially non-terminating.
– ST, the monad of state, exceptions, and lifting, which describes computations that may write to the "outside world," may raise an exception, and are potentially non-terminating.

We write m1 < m2 iff m1 precedes m2 on this list. Intuitively, m1 < m2 implies that computations in m2 are "more effectful" than those in m1; they can provoke any of the effects in m1 and then some. This particular hierarchy captures a number of distinctions that are useful for transforming ML programs. We discuss the extension of our approach to more elaborately stratified monadic structures in Sect. 6. More formally, suppose for each monad m we are given the standard operations unit_m, which turns values into null computations in m, and bind_m, which composes computations in m, and that the usual monad laws hold:
(Left)    bind_m (unit_m x) k = k x

(Right)   bind_m e unit_m = e

(Assoc)   bind_m e (λx. bind_m (k x) h) = bind_m (bind_m e k) h
Moreover, suppose that for each value type t and monad m, M[[m]](T[[t]]) gives the domain of values of type M(m,t). Then m1 < m2 implies that there exists a unique embedding up_{m1→m2} which, for every value type t, maps M[[m1]](T[[t]]) to M[[m2]](T[[t]]). The up functions, sometimes called monad morphisms or lifting functions [10], obey these laws:
(Unit)    up_{m1→m2} ∘ unit_m1 = unit_m2

(Bind)    up_{m1→m2} (bind_m1 e k) = bind_m2 (up_{m1→m2} e) (up_{m1→m2} ∘ k)
The up functions can also be viewed as generalizations of unit operations, since, by (Unit), up_{ID→m} = unit_m. Fig. 6 gives semantic interpretations for types as
  E(v) = t
  ----------------
  E ⊢v Var v : t

  Typeof(c) = t
  ------------------
  E ⊢v Const c : t

  E ⊢v v : t
  ----------------------
  E ⊢ Val v : M(ID,t)

  E + {x : t1} ⊢ e : M(m2,t2)
  ---------------------------------------------
  E ⊢ Abs(x : t1, e) : M(ID, t1 -> M(m2,t2))

  E ⊢v v1 : t1 -> M(m2,t2)    E ⊢v v2 : t1
  -----------------------------------------
  E ⊢ App(v1,v2) : M(m2,t2)

  E ⊢v v : Bool    E ⊢ e1 : M(m,t)    E ⊢ e2 : M(m,t)
  ----------------------------------------------------
  E ⊢ If(v,e1,e2) : M(m,t)

  E ⊢ e1 : M(m1,t1)    E + {x : t1} ⊢ e2 : M(m2,t2)
  --------------------------------------------------    (m1 ≤ m2)
  E ⊢ Let(m1,m2,x : t1,e1,e2) : M(m2,t2)

  E + {f : t0 -> M(m1,t1), x : t0} ⊢ e1 : M(m1,t1)    E + {f : t0 -> M(m1,t1)} ⊢ e2 : M(m2,t2)
  ----------------------------------------------------------------------------------------------    (LIFT ≤ m1)
  E ⊢ Letrec(f : t0 -> M(m1,t1), x : t0, e1, e2) : M(m2,t2)

  E ⊢v v1 : t1   ...   E ⊢v vn : tn
  ----------------------------------------------
  E ⊢ Tuple(v1,...,vn) : M(ID, Tup[t1,...,tn])

  E ⊢v v : Tup[t1,...,tn]
  ------------------------------    (1 ≤ i ≤ n)
  E ⊢ Project(i,v) : M(ID,ti)

  E ⊢v v : Exn
  ------------------------------------
  E ⊢ Raise(M(EXN,t),v) : M(EXN,t)

  E ⊢ e : M(m,t)    E ⊢v v : Exn -> M(m,t)
  ------------------------------------------    (EXN ≤ m)
  E ⊢ Handle(m,e,v) : M(m,t)

  E ⊢ e : M(m1,t)
  ------------------------------    (m1 ≤ m2)
  E ⊢ Up(m1,m2,e) : M(m2,t)

Fig. 5. Typing rules for intermediate language
complete partial orders (CPOs), and for our monads, together with the associated up and bind functions. Note that the following laws hold under these semantics:
(Id)        up_{m→m} = id

(Compose)   up_{m0→m2} = up_{m1→m2} ∘ up_{m0→m1}    (m0 ≤ m1 ≤ m2)
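As a quick sanity check (our own), instantiate (Compose) with m0 = ID, m1 = LIFT, m2 = EXN and unfold the definitions in Fig. 6:

  up_{LIFT→EXN} (up_{ID→LIFT} x) = up_{LIFT→EXN} (x⊥) = Ok(x)⊥ = up_{ID→EXN} x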
A typed semantics for terms is given in Figs. 7 and 8. Environments ρ map identifiers to values. This semantics is largely predictable. However, the Let construct now serves to make the composition of monadic computations explicit, and the Up construct makes monadic coercions explicit. Intuitively, Let(m1,m2,(x,t1),e1,e2) evaluates e1, which has monadic type M(m1,t1), performing any associated effects, binds the resulting value to x : t1, and then evaluates e2, which has monadic type M(m2,t2). Thus, it essentially plays the role of the usual monadic bind operation; in particular, if m1 = m2, the semantic interpretation of the above expression in environment ρ is just

  bind_m1 (E[[e1]]ρ) (λy. E[[e2]]ρ[x := y])

However, our typing rules (Fig. 5) require only that m2 ≥ m1; i.e., e2 may be in a more effectful monad than e1. The semantics of a general "mixed-monad" Let is

  bind_m2 (up_{m1→m2} (E[[e1]]ρ)) (λy. E[[e2]]ρ[x := y])

The term Let(m2,m2,(x,t1),Up(m1,m2,e1),e2) has the same semantics, so the more general form of Let is strictly redundant. But this form is useful, because it makes it easier to state (and recognize left-hand sides for) many interesting transformations involving Let whose validity depends on the monad m1 rather than on m2. For example, a "non-monadic" Let, for which (Beta) is always valid, is simply one in which m1 = ID. Further examples will be shown in Sect. 4.

The semantics of the "non-proper morphism" Handle(e,v) deserves special attention. Expression e may be in either EXN or ST, and the meaning of Handle depends on which; the ST version must manipulate the state component. Note that there are two plausible ways to combine state with exceptions. In the semantics we have given (as in ML), handling an exception does not alter the state, but it would be equally reasonable to revert the state on handle. Incidentally, we don't have to give a semantics when e is in ID or LIFT, because the typing rule for Handle disallows these cases. Of course, such cases might appear in source code; to generate monadic IR for them, e can be coerced into EXN with an explicit Up, or the Handle can be omitted altogether in favor of e, which by its type cannot raise an exception! A Raise expression is handled similarly; the typing rules force it into monad EXN, so semantics need only be given for that case, but the whole expression may be coerced into ST by an explicit Up if necessary.
T : vtyp → CPO
  T[[Int]]             = Z
  T[[Bool]]            = Z    (0 represents false; n > 0 represents true)
  T[[Exn]]             = Z
  T[[Tup[t1,...,tn]]]  = T[[t1]] × ... × T[[tn]]
  T[[Tup[]]]           = 1
  T[[t1 -> M(m2,t2)]]  = T[[t1]] → M[[m2]](T[[t2]])

M : monad → CPO → CPO
  M[[ID]]c    = c
  M[[LIFT]]c  = c⊥
  M[[EXN]]c   = (Ok(c) + Fail(Z))⊥
  M[[ST]]c    = State → ((Ok(c) + Fail(Z)) × State)⊥

  bind_ID x k    = k x
  bind_LIFT x k  = k a               if x = a⊥
                 = ⊥                 if x = ⊥
  bind_EXN x k   = k a               if x = Ok(a)⊥
                 = Fail(b)⊥          if x = Fail(b)⊥
                 = ⊥                 if x = ⊥
  bind_ST x k s  = k a s'            if x s = (Ok(a), s')⊥
                 = (Fail(b), s')⊥    if x s = (Fail(b), s')⊥
                 = ⊥                 if x s = ⊥

  up_{m→m} x        = x
  up_{ID→LIFT} x    = x⊥
  up_{ID→EXN} x     = Ok(x)⊥
  up_{ID→ST} x s    = (Ok(x), s)⊥
  up_{LIFT→EXN} x   = Ok(a)⊥          if x = a⊥
                    = ⊥               if x = ⊥
  up_{LIFT→ST} x s  = (Ok(a), s)⊥     if x = a⊥
                    = ⊥               if x = ⊥
  up_{EXN→ST} x s   = (Ok(a), s)⊥     if x = Ok(a)⊥
                    = (Fail(b), s)⊥   if x = Fail(b)⊥
                    = ⊥               if x = ⊥

Fig. 6. Semantics of types and monads
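The bind and up definitions above transliterate almost directly into ML, which makes a convenient executable sanity check. The sketch below is our own and is only approximate: it models the lifted domain's ⊥ by NONE (even though genuine non-termination is not an observable value) and covers just the LIFT and EXN layers.

datatype 'a exnres = Ok of 'a | Fail of int

(* LIFT c modeled as 'a option, with NONE standing in for ⊥ *)
fun bindLIFT (x : 'a option) (k : 'a -> 'b option) : 'b option =
  case x of SOME a => k a | NONE => NONE

(* EXN c modeled as 'a exnres option *)
fun bindEXN (x : 'a exnres option) (k : 'a -> 'b exnres option) : 'b exnres option =
  case x of
    SOME (Ok a)   => k a
  | SOME (Fail b) => SOME (Fail b)
  | NONE          => NONE

fun upLIFTtoEXN (x : 'a option) : 'a exnres option =
  case x of SOME a => SOME (Ok a) | NONE => NONE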
V : (value : t) → Env → T[[t]]
  V[[Var v]]ρ              = ρ(v)
  V[[Const (Integer i)]]ρ  = i
  V[[Const True]]ρ         = 1
  V[[Const False]]ρ        = 0
  V[[Const Plus]]ρ         = plus      (and similarly for the other arithmetic and comparison constants)
  V[[Const Divide]]ρ       = divideby
  V[[Const WriteInt]]ρ     = writeint
  V[[Const DivByZero]]ρ    = divby0

  plus (a1, a2)      = a1 + a2
  divideby (a1, a2)  = Ok(a1/a2)⊥       if a2 ≠ 0
                     = Fail(divby0)⊥    if a2 = 0

  State = [Z]    (sequence written out so far)
  writeint a s = (Ok(), append(s, [a]))⊥
  divby0 = 42    (arbitrary fixed integer)

Fig. 7. Semantics of values
E : (exp : M(m,t)) → Env → M[[m]](T[[t]])
  E[[Val v]]ρ               = V[[v]]ρ
  E[[Abs(x,e)]]ρ            = λy. E[[e]]ρ[x := y]
  E[[App(v1,v2)]]ρ          = (V[[v1]]ρ) (V[[v2]]ρ)
  E[[If(v,e1,e2)]]ρ         = if (V[[v]]ρ) (E[[e1]]ρ) (E[[e2]]ρ)
  E[[Letrec(f,x,e1,e2)]]ρ   = E[[e2]](ρ[f := fix(λf'. λv. E[[e1]](ρ[f := f', x := v]))])
  E[[Tuple(v1,...,vn)]]ρ    = (V[[v1]]ρ, ..., V[[vn]]ρ)
  E[[Project(i,v)]]ρ        = proj_i (V[[v]]ρ)
  E[[Raise(M(EXN,t),v)]]ρ   = (Fail(V[[v]]ρ))⊥
  E[[Handle(m,e,v)]]ρ       = handle_m (E[[e]]ρ) (V[[v]]ρ)
  E[[Let(m1,m2,x,e1,e2)]]ρ  = bind_m2 (up_{m1→m2} (E[[e1]]ρ)) (λy. E[[e2]]ρ[x := y])
  E[[Up(m1,m2,e)]]ρ         = up_{m1→m2} (E[[e]]ρ)

  if v at af = at    if v ≠ 0
             = af    if v = 0
  proj_i (v1,...,vn) = vi
  handle_EXN x h   = Ok(a)⊥          if x = Ok(a)⊥
                   = h a             if x = Fail(a)⊥
                   = ⊥               if x = ⊥
  handle_ST x h s  = (Ok(a), s')⊥    if x s = (Ok(a), s')⊥
                   = h a s'          if x s = (Fail(a), s')⊥
                   = ⊥               if x s = ⊥

Fig. 8. Semantics of expressions
(LetLeft)     Let(m2,m3,x,Up(m1,m2,e1),e2) = Let(m1,m3,x,e1,e2)
              (m1 ≤ m2 ≤ m3)

(LetRight)    Let(m1,m2,x,e,Up(ID,m2,Val(Var x))) = Up(m1,m2,e)
              (m1 ≤ m2)

(LetAssoc)    Let(m2,m3,x,Let(m1,m2,y,e1,e2),e3) = Let(m1,m3,y,e1,Let(m2,m3,x,e2,e3))
              (m1 ≤ m2 ≤ m3; y ∉ FV(e3))

(IdentUp)     Up(m,m,e) = e

(ComposeUp)   Up(m1,m3,e) = Up(m2,m3,Up(m1,m2,e))
              (m1 ≤ m2 ≤ m3)

(LetUp)       Up(m2,m4,Let(m1,m2,x,e1,e2)) = Let(m3,m4,x,Up(m1,m3,e1),Up(m2,m4,e2))
              (m1 ≤ m2; m3 ≤ m4)

Fig. 9. Generalized monad laws
4 Transformation Rules
In this section we attempt to motivate our IR, and in particular our choice of monads, by presenting a number of useful transformation laws. These laws can be proved correct with respect to the denotational semantics of Sect. 3. The proofs are straightforward but tedious, so are omitted here. Of course, this is by no means a complete set of rules needed by an optimizer; there are many others, both general-purpose and specific to particular operators. Also, as noted earlier, not all valid transformations are improvements.

Figure 9 gives general rules for manipulating monadic expressions. (LetLeft), (LetRight), and (LetAssoc) are generalizations of the usual (Left), (Right), and (Assoc) laws for a single monad, which can be recovered from these rules by setting m1 = ID and m2 = m3 in (LetLeft), setting m1 = m2 in (LetRight), and setting m1 = m2 = m3 in (LetAssoc). (IdentUp) and (ComposeUp) are just the (Id) and (Compose) laws stated in IR syntax; they let us do housekeeping on coercions. Law (Unit) is the special case of (ComposeUp) obtained by setting m1 = ID. (LetUp) permits us to move expressions with suitably weak effects in and out of coercions; (Bind) is the special case of (LetUp) obtained by setting m1 = m2 and m3 = m4. All these laws have variants involving Letrec, in which Letrec(f,x,e1,e2) : M(m,t) behaves just like Let(ID,m,f,Abs(x,e1),e2); we omit the details of these.

Figure 10 lists some valid laws for altering execution order. We have full beta reduction for variables bound in the ID monad (BetaID). In general, the order of two bindings can be exchanged if there is no data dependence between them, and if either of them is in the ID monad (ExchangeID) or both are in or below the LIFT monad (ExchangeLIFT). The intuition for the latter rule is that
it is harmless to reorder two expressions even if one or both may not terminate, because we cannot detect which one causes the non-termination. On the other hand, there is no similar rule for the EXN monad, because we can distinguish different raised exceptions according to the constructor value they carry. This is the principal difference between LIFT and EXN for the purposes of code motion.

Rule (RecHoistID) states that it is always valid to lift a pure expression out of a Letrec (if no data dependence is violated). (RecHoistEXN) reflects a much stronger property: it is valid to lift a non-terminating or exception-raising expression out of a Letrec if the recursive function is guaranteed to be executed at least once. This is the principal advantage of distinguishing EXN from the more general ST monad, for which the transform is not valid. Although the left-hand side of (RecHoistEXN) may seem a crude way to characterize functions guaranteed to be called at least once, and unlikely to appear in practice, it arises naturally if we systematically introduce loop headers for recursions [2], according to the following law:
(Hdr)   Letrec(f,x,e1,e2) : M(m,t) = Let(ID,m,f,Abs(z,Letrec(f',x,e1[f'/f],App(f',z))),e2)
        (f' ∉ FV(e1); f' ≠ z)
(HandleHoistEXN) says that an expression that cannot raise an exception can always be hoisted out of a Handle. Finally, (IfHoistID), (ThenHoistID), and (AbsHoistID) show the flexibility with which ID expressions can be manipulated; these are more likely to be useful when oriented right-to-left ("hoisting down" into conditionally executed code). As before, all these rules have variants involving Letrec in place of Let(ID,...), which we omit here.

As a (rather artificial) example of the power of these transformations, consider the code in Fig. 11. The computation of w is invariant, so we would like to hoist it above the recursive function r. Because the binding for w is marked as pure and terminating, it can be lifted out of the if using (IfHoistID), and can then be exchanged with the pure bindings for s and t using (ExchangeID). This positions it to be lifted out of r using (RecHoistID). Note that the monad annotations tell us that w is pure and terminating even though it invokes the unknown function g, which is actually bound to h.

The example also exposes the limitations of monomorphic effects: if f were also applied to an impure function, then g and hence w would be marked as impure, and the binding for w would not be hoistable. In practice, it might be desirable to clone separate copies of f, specialized according to the effectfulness of their g argument. Worse yet, consider a function that is naturally parametric in its effect, such as map. Such a function will always be pessimistically annotated with an effect reflecting the most-effectful function passed to it within the program. The obvious solution is to give functions like map a generic type abstracted over a monad variable, analogous to an effect variable in the system of Talpin and Jouvelot [15]. We believe our system can be extended to handle such generic types, but we have not examined the semantic issues involved in detail.
Optimizing ML Using a Hierarchy of Monadic Types
Let(ID,m,x,e1 ,e2 ) = e2 [e1 /x]
(BetaID)
Let(m1 ,m3 ,x1 ,e1 ,Let(m2 ,m3 ,x2 ,e2 ,e3 )) = Let(m2 ,m3 ,x2 ,e2 ,Let(m1 ,m3 ,x1 ,e1 ,e3 )) ∈ F V (e2 ); x2 ∈ F V (e1 )) (m1 = IDor m2 = ID; x1
(ExchangeID)
Let(m1 ,m3 ,x1 ,e1 ,Let(m2 ,m3 ,x2 ,e2 ,e3 )) = Let(m2 ,m3 ,x2 ,e2 ,Let(m1 ,m3 ,x1 ,e1 ,e3 )) ∈ F V (e2 ); x2 ∈ F V (e1 )) (m1 , m2 ≤ LIFT; x1
(ExchangeLIFT)
(RecHoistID)
Letrec(f ,x,Let(ID,m2 ,y,e1 ,e2 ),e3 ):M(m3 ,t) = Let(ID,m3 ,y,e1 ,Letrec(f ,x,e2 ,e3 )) ∈ F V (e3 )) (f, x ∈ F V (e1 ); y
(RecHoistEXN)
Letrec(f ,x,Let(m1 ,m2 ,y,e1 ,e2 ),App(f ,v)) = Let(m1 ,m2 ,y,e1 ,Letrec(f ,x,e2 ,App(f ,v))) ∈ F V (e1 ); y = v) (m1 ≤ EXN; f, x
(HandleHoistEXN)
Handle(m2 ,Let(m1 ,m2 ,x,e1 ,e2 ),v) = Let(m1 ,m2 ,x,e1 ,Handle(m2 ,e2 ,v)) = v) (m1 ≤ EXN; x
(IfHoistID)
If(v,Let(ID,m,x,e1 ,e2 ),e3 ) = Let(ID,m,x,e1 ,If(v,e2 ,e3 )) = v) (x ∈ F V (e3 ); x
(ThenHoistID)
If(v,e1 ,Let(ID,m,x,e2 ,e3 )) = Let(ID,m,x,e2 ,If(v,e1 ,e3 )) = v) (x ∈ F V (e1 ); x
(AbsHoistID)
Abs(x : t,Let(ID,m,y,e1 ,e2 )) = Let(ID,ID,y,e1 ,Abs(x : t,e2 )) = x) (x ∈ F V (e1 ); y
Fig. 10. Code motion laws for monadic expressions
109
110
Andrew Tolmach
let f:(Int -> M(ID,Int * Int)) -> M(ST,Int) = fn (g:Int->M(ID,Int * Int)) => letrec r (x:Int) : M(ST,Int) = letID t:Int * Int = (x,1) in letID s:Bool = EqInt(t) in if s then Up(ID,ST,0) else letID w:Int * Int = g(3) in letID y:Int = Plus(w) in letID z:Int * int = (x,y) in letEXN x’:Int = Divide(z) in letST dummy:() = WriteInt(x’) in r(x’) in r(10) in let h:Int->M(ID,Int * Int) = fn (p:Int) => (p,p) in f(h)
Fig. 11. Example of intermediate code, presented in an obvious concrete analogue of the abstract syntax
5
Monad Inference
It would be possible to translate source programs into type-correct IR programs by simply assuming that every expression falls into the maximally-effectful monad (ST in our case). Every source Let would become a LetST, every variable and constant would be coerced into ST, and every primitive would return a value in ST. Peyton Jones et al. [11] suggest performing such a translation, and then using the monad laws (analogous to those in Fig. 9) and the worker-wrapper transform [12] to simplify the result, hopefully resulting in some less-effectful expression bindings. The main objection to this approach is that it doesn’t allow calls to unknown functions (for which worker-wrapper doesn’t apply) to return non-ST results. For example, in the code of Fig. 11, no local syntactic analysis could discover that argument function g is pure and terminating. To obtain better control over effects, we have developed an inference algorithm for computing the minimal monadic effect of each subexpression in a program. Pure, provably terminating expressions are placed in ID, pure but potentially non-terminating expressions in LIFT, and so forth. The algorithm deals with the latent monadic effects in functions, by recording them in the result types. As an example, it produces the annotations shown in Fig. 11. The input to the algorithm is an typed program in the source language; the output is a program in the monadically typed IR. The term translation is essentially trivial, since the source and target have identical term structure, except for the possible need for Up terms in the target. Consider, for example, the source term If(x,Val y,Raise z). Since Val y is a value, its translation is in the ID monad, whereas the translation of Raise z must be in the EXN or ST
Optimizing ML Using a Hierarchy of Monadic Types E v v : Bool
E e1 ⇒ e1 : M(m1 ,t)
E If(v,e1 ,e2 ):t ⇒ E e1 ⇒ e1 : M(m1 ,t1 )
E e2 ⇒ e2 : M(m1 ,t)
Up(m1 ,m2 ,If(v,e1 ,e2 ))
111
(m1 ≤ m2 )
: M(m2 ,t)
E + {x : t1 } e2 ⇒ e2 : M(m2 ,t2 )
(m1 ≤ m2 ≤ m3 )
E Let(x : t1 ,e1 ,e2 ):t2 ⇒ Up(m2 ,m3 ,Let(m1 ,m2 ,x : t1 ,e1 ,e2 )) : M(m3 ,t2 ) E v v : Exn (EXN ≤ m) E Raise(t,v):t ⇒ Up(EXN,m,Raise(M(EXN,t),v)) : M(m,t)
Fig. 12. Selected translation rules
monad. To glue together these subterm translations we must insert a coercion around the translation of the Val term. Up terms serve exactly this purpose; they add the necessary flexibility to the system to permit all monad constraints to be met. Such a coercion is potentially needed around each subterm in the program. To develop a deterministic, syntax-directed, translation, we turn each typing rule in Fig. 5 (except Up) into a translation rule, simply by recording the inferred type and monad information in the appropriate annotation slots of the output, combining the translations of subterms in the obvious manner, and wrapping an Up term around the result. As examples, Fig. 12 shows the translation rules corresponding to the typing rules for If, Let, and Raise. Each free type and monad in the translated typed term is initially set to a fresh variable; the translation algorithm generates a set of constraints relating these variables just as in an ordinary type inference algorithm. We discuss the solution of these constraints below. As specified here, the translation is profligate in its introduction of Up coercion terms, most of which will prove (after constraint resolution) to be unnecessary identity coercions. We use a postprocessing step to remove unneeded coercions using the (IdentUp) rule. The translation algorithm generates constraints between types and between monads. Type constraints can be solved using ordinary unification, except that unifying the codomain mtyps of two arrow types requires that their monad components be equated as well as their vtyp components. The interesting question is how to record and resolve constraints on the monad variables. Such constraints are introduced explicitly by the side conditions in the Let, Letrec, and Up rules, implicitly by the equating of monads from subexpressions in the If and Handle rules, and (even more) implicitly as a result of ordinary unification of arrow types, which mention monads in their codomains. The side-condition constraints are all inequalities of the form m1 ≥ m2 , where m1 is a monad variable and m2 is a variable or an explicit monad. The implicit constraints are all equalities m1 = m2 ; for uniformity, we replace these by a pair of inequalities: m1 ≥ m2 and m2 ≥ m1 . We collect constraints as a side-effect of the translation process, simply by adding them to a global list. It is very common for there to be circularities among the monad constraints. To solve the constraint system, we view it as a directed graph with a node for each
112
Andrew Tolmach
monad and monad variable, and an edge from m1 to m2 for each constraint m1 ≥ m2 . We then partition the graph into its strongly connected components, and sort the components into reverse topological order. We process one component at a time, in this order. Since ≥ is anti-symmetric, all the nodes in a given component must be assigned the same monad; once this has been determined, it is assigned to all the variables in the component before proceeding to the next component. To determine the minimum possible correct assignment for a component, we consult all the edges from nodes in that component to nodes outside the component; because of the order of processing, these nodes must already have received a monad assignment. The maximum of these assignments is the minimum correct assignment for this component. If there are no such edges, the minimum correct assignment is ID. This algorithm is linear in the number of constraints, and hence in the size of the source program. To summarize, we perform monad inference by first translating the source program into a form padded with coercion operators and annotated with monad variables, meanwhile collecting constraints on these variables, and then solving the resulting constraint system to fill in the variables in the translated program. The resulting program will contain many null coercions of the form Up(m,m,e); these can be removed by a single postprocessing pass. Our algorithm is very similar to a that of Talpin and Jouvelot [15], restricted to a monomorphic source language. Both algorithms generate essentially the same sets of constraints. Talpin and Jouvelot solve the effect constraints using an extended form of unification rather than by a separate mechanism. It would be natural to extend our algorithm to handle Hindley-Milner polymorphism for both types and monads in the Talpin-Jouvelot style. The idea is to generalize all free type and effect variables in let definitions and allow different uses of the bound identifier to instantiate these in different ways. In particular, parametric functions like map could be used with many different monads, without one use “polluting” the others. Functions not wholly parametric in their effects would place a minimum effect bound on permissible instantiations for monad variables. Supporting this form of monad polymorphism seems desirable even if there is no type polymorphism (e.g., because the program has already been explicitly monomorphized [19]). In whole-program compilation of a monad-polymorphic program, the complete set of effect instantiations for each polymorphic definition would be known. This set could be used to put an upper effect bound on monad variables within the definition body and hence determine what transformations are legal there. Alternatively, it could be used to guide the generation of effect-specific clones as suggested in the previous section. In a separate-compilation setting, monad polymorphism in a library definition would still be useful for client code, but not for the library code: in the absence of complete information about uses of a definition, any variable monad in the body of the definition would need to be treated as ST, the most “effectful” monad, for the purposes of performing transformations within the body.
Optimizing ML Using a Hierarchy of Monadic Types
6
113
Extending the Monad Hierarchy
Our basic approach is not restricted to the linearly-ordered set of monads presented in Sect. 3. It extends naturally to any collection of monads and up embedding operations that form a lattice, with ID as the lattice bottom element. It is clearly reasonable to require a partial order; this is equivalent to requiring that (Ident) and (Compose) hold. From the partial order requirement, the distinguished role for ID, and the assumption that each monad obeys (Left), (Right), and (Assoc), and each up operation obeys (Unit) and (Bind), we can prove the laws of Fig. 9. (The validity of the laws in Fig. 10 naturally depends on the specific semantics of the monads involved.) By also insisting that any two monads in the collection have a least upper bound under embedding, we guarantee that any two arbitrary expressions (e.g., the two arms of an if) can be coerced into a (unique) common monad, and hence that the monad inference mechanism of Sect. 5 will work. One might be tempted to describe such a lattice by specifying a set of “primitive” monads encapsulating individual effects, and then assuming the existence of arbitrary “union” monads representing combinations of effects. As the Handle discussion in Sect. 3 indicates, however, there is often more than one way to combine two effects, so it makes no sense to talk in a general way about the “union” of two monads. Instead, it appears necessary to specify explicitly, for every monad m in the lattice, – – – –
a semantic interpretation for m; a definition for bindm ; a definition of upm→m for each m ≤ m ;2 for each non-proper morphism NP introduced in m, a definition of npm for every m ≥ m.
The lack of a generic mechanism for combining monads is rather unfortunate, since it turns the proofs of many transformation laws into lengthy case analyses. We conjecture that restricting attention to up operations that represent natural monad transformers [10] might help organize such proofs into simpler form.
7
Status and Conclusions
We believe our approach to inferring and recording effects shows promise in its simplicity and its semantic clarity. It remains to be seen whether effects information of the kind described here can be used to improve the performance of ML code in any significant way. To answer this question, we have extended the IR described here to a version that supports full Standard ML; we have implemented the monad inference algorithm for this version, and are currently measuring its effectiveness using the backend of our RML compiler system [19]. 2
Since the (Ident) and (Compose) laws must hold in a partial order, it suffices to define upm→m for just enough choices of m, m to guarantee the existence of least upper bounds, since these definitions will imply the definition for other pairs of monads.
114
Andrew Tolmach
Acknowledgements We have benefitted from conversations with John Launchbury and Dick Kieburtz, and from exposure to the ideas in their unpublished papers [7,8]. The comments of the anonymous referees also motivated us to clarify the relationship of our algorithm with the existing work of Talpin and Jouvelot. Phil Wadler made helpful commments on an earlier draft.
References 1. A. Appel. Compiling with Continuations. Cambridge University Press, 1992. 2. A. Appel. Loop headers in λ-calculus or CPS. Lisp and Symbolic Computation, 7(4):337–343, 1994. 3. N. Benton, July 1997. Personal communication. 4. L. Birkedal, M. Tofte, and M. Vejlstrup. From region inference to von Neumann machines via region representation inference. In 23rd ACM Symposium on Principles of Programming Languages (POPL’96), pages 171–183. ACM Press, 1996. 5. C. Flanagan, A. Sabry, B. F. Duba, and M. Felleisen. The essence of compiling with continuations. Proc. SIGPLAN Conference on Programming Language Design and Implementation, 28(6):237–247, June 1993. 6. D. Gifford, P. Jouvelot, J. Lucassen, and M. Sheldon. FX-87 REFERENCE MANUAL. Technical Report MIT-LCS//MIT/LCS/TR-407, Massachusetts Institute of Technology, Laboratory for Computer Science, Sept. 1987. 7. R. Kieburtz and J. Launchbury. Encapsulated effects. (unpublished manuscript), Oct. 1995. 8. R. Kieburtz and J. Launchbury. Towards algebras of encapsulated effects. (unpublished manuscript), 1997. 9. J. Launchbury and S. Peyton Jones. State in Haskell. Lisp and Symbolic Computation, pages 293–351, Dec. 1995. 10. S. Liang, P. Hudak, and M. Jones. Monad transformers and modular interpreters. In 22nd ACM Symposium on Principles of Programming Languages (POPL ’95), Jan. 1995. 11. S. Peyton Jones, J. Launchbury, M. Shields, and A. Tolmach. Bridging the gulf: a common intermediate language for ml and haskel. In 25th ACM Symposium on Principles of Programming Languages (POPL’98), pages 49–61, San Diego, Jan 1998. 12. S. Peyton Jones and J. Launchbury. Unboxed values as first class citizens. In Proc. Functional Programming Languages and Computer Architecture (FPCA ’91), pages 636–666, Sept. 191. 13. S. Peyton Jones and P. Wadler. Imperative functional programming. In 20th ACM Symposium on Principles of Programming Languages (POPL’93), pages 71– 84, Jan. 1993. 14. S. Peyton Jones. Compiling Haskell by program transformation: A report from the trenches. In Proceedings of ESOP’96, volume 1058 of Lecture Notes in Computer Science, pages 18–44. Springer Verlag, 1996. 15. J.-P. Talpin and P. Jouvelot. Polymorphic type, region and effect inference. Journal of Functional Programming, 2:245–271, 1992. 16. J.-P. Talpin and P. Jouvelot. The type and effect discipline. Information and Computation, 111(2):245–296, June 1994.
Optimizing ML Using a Hierarchy of Monadic Types
115
17. D. Tarditi. Design and Implementation of Code Optimizations for a Type-Directed Compiler for Standard ML. PhD thesis, Carnegie Mellon University, Dec. 1996. Technical Report CMU-CS-97-108. 18. M. Tofte and J.-P. Talpin. Region-based memory management. Information and Computation, 132(2):109–176, 1 Feb. 1997. 19. A. Tolmach and D. Oliva. From ML to Ada: Strongly-typed language interoperability via source trans lation. Journal of Functional Programming, 1998. (to appear). 20. P. Wadler. The marriage of effects and monads. (unpublished manuscript), Mar. 1998. 21. A. Wright. Typing references by effect inference. In Proc. 4th European Symposium on Programming (ESOP ’92), volume 582 of Lecture Notes in Computer Science, Feb. 1992.
Type-Directed Continuation Allocation? Zhong Shao and Valery Trifonov Dept. of Computer Science Yale University New Haven, CT 06520-8285 {shao,trifonov}@cs.yale.edu
Abstract. Suppose we translate two different source languages, L1 and L2 , into the same intermediate language; can they safely interoperate in the same address space and under the same runtime system? If L1 supports first-class continuations (call/cc) and L2 does not, can L2 programs call arbitrary L1 functions? Would the fact of possibly calling L1 impose restrictions on the implementation strategy of L2 ? Can we compile L1 functions that do not invoke call/cc using more efficient techniques borrowed from the L2 implementation? Our view is that the implementation of a common intermediate language ought to support the so-called pay-as-you-go efficiency: first-order monomorphic functions should be compiled as efficiently as in C and assembly languages, even though they may be passed to arbitrary polymorphic functions that support advanced control primitives (e.g. call/cc). In this paper, we present a typed intermediate language with effect and resource annotations, ensuring the safety of inter-language calls while allowing the compiler to choose continuation allocation strategies.
1 Introduction Safe interoperability requires resolving a host of issues including mixed data representations, multiple function calling conventions, and different implementation protocols. Existing approaches to language interoperability either separate code written in different languages into different address spaces or have the unsafe, ad hoc and insecure foreign function call interface. We position our further discussion of language interoperability in the context of a system hosting multiple languages, each safe in isolation. The supported languages may range from first-order monomorphic (e.g. a safe subset of C, or safe-C for short) to higher-order languages with advanced control, e.g. ML with first-class continuations. We assume that all languages have type systems which ensure runtime safety of accepted programs. In other words, in this paper we do not attempt to solve the problem of cooperating safely with programs written in unsafe languages, which in general can ?
This research was sponsored in part by the DARPA ITO under the title “Software Evolution using HOT Language Technology”, DARPA Order No. D888, issued under Contract No. F30602-96-2-0232, and in part by an NSF CAREER Award CCR-9501624, and NSF Grant CCR-9633390. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the U.S. Government.
X. Leroy and A. Ohori (Eds.): TIC’98, LNCS 1473, pp. 116–135, 1998. c Springer-Verlag Berlin Heidelberg 1998
Type-Directed Continuation Allocation
117
only be achieved at the expense of “sandboxing” the unsafe calls or complex and incomplete analyses of the unsafe code. We believe that interoperability requires a serious and more formal treatment. As a first step, this paper describes a novel type-based technique to support principled language interoperation among languages with different protocols for allocation of activation records. Our framework allows programs written in multiple languages with overlapping features to interact with each other safely and reliably, yet without restricting the expressiveness of each language. An interoperability scheme for activation record allocation should be – safe: it should not be possible to violate the runtime safety of a language by calling a foreign function; – expressive: the scheme should allow inter-language function calls; – efficient: a language implementation should not be forced to use suboptimal methods for its own features in order to provide support for other languages’ features. For instance a language that does not use call/cc should not have to be implemented using heap-based allocation of activation records. Our solution is to ensure safety by using a common typed intermediate language [22] into which all of the source languages are translated. To maintain safety in an expressive interoperability scheme the type system is extended with annotations of the effects of the evaluation of a term, e.g. an invocation of call/cc, and polymorphic types with effect variables, allowing a higher-order function to be invoked with arguments coming from languages with different sets of effects. The central novelty of our approach is the introduction of annotations of the resources necessary for the realization of the effects of an evaluation; for instance a continuation heap may be required when invoking call/cc. Thus our type system can be used to support implementation efficiency by keeping track of the available language-dependent resources, and safety by allowing semantically correct inter-language function calls but banning semantically incorrect ones. In addition to providing safety, making resource handling explicit also opens new opportunities for code optimization beyond what a foreign function call mechanism can offer. A common intermediate language like FLINT [21,22] will likely support a very rich set of features to accommodate multiple source languages. Some of these features may impose implementation restrictions; for example, a practical implementation of first-class continuations (as in SML/NJ or Scheme) often requires the use of advanced stack representations [8] or heap-based activation records [20]. However in some cases stack-based allocation may be more efficient, and ideally we would like to have a compiler that can take advantage of it as long as this does not interfere with the semantic correctness of first-class continuations. Similarly, when compiling a simple safe-C-like language with no advanced control primitives (e.g., call/cc) into FLINT, we may prefer to compile it to code that uses the simple sequential stack of standard C; programs written in ML or Scheme using these safe-C functions must then follow the same allocation strategy when invoking them. This corresponds to the typical case of writing low-level systems modules in C and providing for their use in other languages, therefore we assume this model in the sequel, but the dual problem of compiling safe-C functions
118
Zhong Shao and Valery Trifonov
calling arbitrary ML functions by selectively imposing heap allocation on safe-C is similarly represented and solved within our system. Thus our goal is efficient and expressive interoperability between code fragments written in languages using possibly different allocation disciplines for activation records, for instance, ML with heap allocation and safe-C with stack allocation. The following properties of the interoperability framework are essential for achieving this goal: – ML and safe-C code should interoperate safely with each other within the same address space. – All invocations of safe-C functions in ML functions should be allowed (provided they are otherwise type-correct). – Only the invocations of ML functions that do not capture continuations should be allowed in safe-C functions. – Any activation record that can potentially be captured as part of a first-class continuation should always be allocated on the heap (or using some fancy stack-chunkbased representations [8]). – It should be possible to use stack allocation for activation records of ML functions when they are guaranteed not to be captured with a first-class continuation. – The selection of allocation strategy should be decoupled from the actual function call. The last property gives the compiler the freedom to switch allocation strategies more efficiently, instead of following a fixed foreign function interface mechanism. For example, an implementation of ML may use heap allocation of activation records by default to provide support for continuation capture. However, in cases when the compiler can prove that a function’s activation record is not going to be accessible from any captured continuation, its allocation discipline is ambiguous; stack allocation may be preferred if the function invokes, or is invoked by, safe-C functions which use stack allocation. This specialization of code to a different allocation strategy effectively creates regions of ML code compiled in “safe-C mode” with the aim of avoiding the switch between heap and stack allocation on every cross-language call. In general, the separation of the selection of allocation strategy from the call allows its treatment as a commodity primitive operation and subjects it to other code-motion optimizations, e.g. hoisting it out of loops. The proposed method can be applied to achieving more efficient interoperability with existing foreign code as well, although obviously in this case the usual friction between safety and efficiency can only be eased but not removed. In particular the possibility to select the allocation strategy switch point remains, thus higher efficiency can still be achieved while satisfying a given safety policy by specializing safe code to “unsafe mode” (e.g. for running with stack allocation within a sand-box).
2 A Resourceful Intermediate Language To satisfy the requirements for efficient interoperability, outlined in the previous section, we define an A-normal-form-based typed intermediate language RL (Figure 1) with types having effect and resource annotations. Intuitively, an effect annotation such
Type-Directed Continuation Allocation
119
as CC indicates that a computation may capture a continuation by performing call/cc; a resource annotation such as H (continuation heap) or S (continuation stack) means that the corresponding runtime resource must be available to the computation.1 Nontrivial effects can be primitive, effect variables, or unions of effects; commutativity and associativity of the union with ∅ as a unit are consistent with the typing rules and we assume them for brevity of notation. Each effect can only occur when the proper resources are available, e.g. CC would require the use of heap-based activation record allocation. Both the effect and resource usage annotations are inferred during the translation from the source language to the intermediate language, and can be used to assist code generation and to check the validity of cross-language function calls.
R ESOURCES
r ::= S | H
E FFECTS
T YPES
µ ::= | | |
stack continuation allocation heap continuation allocation
∅ CC t µ∨µ
Typ 3 σ ::= β
where β ∈ BasicTyp r
| σ− →σ µ r
VALUES AND T ERMS
none call with current continuation effect variable, t ∈ EffVar union of effects
resource/effect-annotated function type
| σ cont
resource-annotated continuation type
| ∀t ≤ r. σ
bounded effect-polymorphic type
v ::= | | | | e ::= | | | | |
c constant c ∈ Const x variable x ∈ Var λr x : σ. e resource-annotated abstraction Λt ≤ r. v bounded effect abstraction x[µ] effect application letr x = e in e resource-annotated binding hvir resource-annotated value heiµ adding spurious effects user (e) resource selection @x x application callcc x | throw[σ] x x first class continuations
Fig. 1. Syntax of a resource-aware intermediate language RL
The resources required and effects produced by a function are made explicit in its type. A continuation can potentially produce all effects possible with the set of resources available at the point of its capture; for that reason continuation types only have a resource annotation. 1
In this paper, we focus on application of this system to interoperability issues related to continuation allocation, but more diverse sets of resources will be necessary in a realistic language.
120
Zhong Shao and Valery Trifonov
Function abstractions are annotated with the resources they may require and will maintain. In a higher-order language the effect of the evaluation of a function application may depend on the effects of its functional arguments; this dependence is expressed by means of effect polymorphism. Polymorphic abstractions introduce variables ranging over the set of possible effects of the term. Since the possible effects are determined by the available resources, we have bounded effect polymorphism; the relation µ ≤ r (defined in the context of an effect environment in Figure 3) reflects the dependence between effects and resources, e.g. that callcc can only be performed if continuations are heap-allocated. The effect application x[µ] instantiates the body of the polymorphic abstraction to which x is bound. The language construct user (e) serves to mark the point where a change in the allocation strategy for activation records is required. Instead of having effect subsumption the language is equipped with a construct heiµ for explicitly increasing the set of effects of e to include µ. Example 1. The use of resource annotations to select allocation strategies is shown in the RL code below which includes extra type annotations for clarity. letH applyToInt
H
= hΛt≤ H. λH f : Int − → Int. @ f 42iH t H
add1 CC
H
: ∀t ≤ H. (Int − → Int) − → Int t t = hλH x : Int. letH c = hλH k : Int H cont. letH z = @ succ x in throw[Int] k ziH in callcc ciH H
: Int − → Int CC
add1 Pure
= hλS x : Int. @ succ xiH S
: Int − → Int ∅
H
add1 Wrapped = hλ x : Int. useS (@ add1 Pure x)iH H
: Int − → Int ∅
in @ (applyToInt[CC]) add1 CC ; @ (applyToInt[∅]) add1 Wrapped The function applyToInt is polymorphic in the effect of its parameter, but the parameter’s resource requirements are fixed – it must use heap allocation. We consider two applications of applyToInt. The argument in the first, add1 CC, is a function invoking callcc, which consequently uses heap allocation; on the other hand the argument in the second application, add1 Pure, is pure and uses stack allocation. It is therefore incorrect to apply applyToInt to add1 Pure. We use a wrapper to coerce it to the proper type:
Type-Directed Continuation Allocation
121
we apply applyToInt to add1 Wrapped whose activation record is heap-allocated, and whose function is to switch to stack allocation (via useS ) before calling add1 Pure. Heap allocation is resumed upon return from add1 Pure.
3 Two Source Languages To further illustrate the advantages of this system we consider the problem of translating into RL two source languages (Figure 2): a language HL with control operators (callcc and throw), implemented using heap-based allocation of activation records, and a language SL which always uses stack allocation. HL also allows declaring at the top of a program the identifiers of entities imported from SL code. The type systems of these languages are assumed monomorphic for simplicity, since polymorphism in types is largely orthogonal to the effect polymorphism of RL. SL T YPES SL T ERMS HL T YPES HL T ERMS HL P ROGRAMS
τSL ::= eSL ::= τHL ::= eHL ::= | pHL ::=
β | τSL → τSL c | x | λx : τSL . eSL | eSL eSL | let x = eSL in eSL β | τHL → τHL | τHL cont c | x | λx : τHL . eHL | eHL eHL | let x = eHL in eHL callcc eHL | throw[τHL ] eHL eHL eHL | external(SL) x : τSL in pHL
Fig. 2. Syntax of the source languages SL and HL
The resource annotations in RL provide information about handling of the stack and heap resources, necessary in the following situations: – when calling from HL a function written in SL, which may require switching from heap allocation of activation records to allocation on the stack used by SL; the heap resource must be preserved for use upon return from SL code. – when calling an HL function from SL code, which is only semantically sound when the evaluation of the function does not capture a continuation, since part of the continuation data is stack-allocated; the type system maintains information about the possible effects of the evaluation, in this case whether callcc might be invoked. – when selecting an allocation strategy for HL functions called (directly or indirectly) from within SL code; either their activation records must be allocated on the SL stack, or the latter must be preserved and restored upon return to SL. – when selecting an allocation strategy for HL code invoking SL functions but not callcc, in order to optimize resource handling. Example 2. Consider a program consisting of a main fragment in HL invoking the external SL function applyToInt with the HL function add1 as an argument; the call is meaningful because add1 does not invoke callcc. Only the SL type of the external function is given to the HL program which is separately compiled without access to the detailed effect annotations inferred from the code of the SL fragment.
122
Zhong Shao and Valery Trifonov
SL fragment applyToInt: λf : Int → Int. succ (f 42) The result of its separate compilation into RL, which uses stack allocation (for details of the translation we refer the reader to Section 5) is S
→ Int. letS x = @ f 42 in @ succ x applyToInt = Λt ≤ S. λS f : Int − t S
S
→ Int) − → Int : ∀t ≤ S. (Int − t t HL fragment main: external(SL) applyToInt : (Int → Int) → Int in let add1 = λx : Int. succ x in applyToInt add1 The result of its separate compilation into RL is S
S
→ Int) − → Int. main = λH applyToInt : ∀t≤ S. (Int − ∅ t H let applyToInt H = hΛt ≤ S. H
→ Int. λH f : Int − t letH f S = hλS x : Int. useH (@ f x)iH in useS (@ (applyToInt[t]) f S)iH H
add1
H
: ∀t≤ S. (Int − → Int) − → Int ∅ t = hλH x: Int. @ succ xiH H
: Int − → Int ∅
:
in @ applyToInt H[∅] add1 S S H → Int) − → Int − → Int ∀t≤ S. (Int − ∅ ∅ t
The translation infers polymorphic effect types using a simplified version2 of standard effect inference [23]. The resource annotations are fixed by the source language; the type of an external SL function in an HL program is annotated with the SL resources. In the code produced after translation the external functions are coerced to match the resources of HL using automatically generated wrappers. In the above code, the parameter f of applyToInt H is wrapped to f S before passing it to applyToInt; the function of the wrapper is to switch from the stack allocation discipline used by SL to heap allocation before invoking the code for f, and resume stack allocation upon return. Dually, the call to applyToInt itself is wrapped to enable stack allocation inside HL code. 2
As presented here our system does not keep track of regions associated with effects.
Type-Directed Continuation Allocation
123
Since the full RL type of the SL fragment is not available to it, the effect inference must conseratively approximate the effects of the SL functions. It treats the external applyToInt in the HL fragment as an effect-polymorphic parameter in order to allow its invocations with arguments with different effects. The price we pay for inference with this polymorphism in the case of separate compilation is that we assume that the effects of these invocations are the maximal allowed with the resources shared between the languages (in Example 2 we lose no precision since SL has no effects, but the approximation is reflected in the effect annotation ∅ of the type of the parameter of main). The following code, constructed mechanically given the inferred and expected types of applyToInt, coerces the actual type of applyToInt to the approximation used in the typing of main and performs the top-level application, thus linking the modules. letH
S
applyToInt Glue = hΛt ≤ S. λS f : Int − → Int. h@ applyToInt[t] fi∅ iH t S
S
: ∀t≤ S. (Int − → Int) − → Int ∅ t in @ main applyToInt Glue More precise inference of the resulting effects is possible when the external function is a pre-compiled library routine whose RL type (with its precise effect annotations) is available when compiling main. In those cases we can take advantage of the letpolymorphism in inferring a type of main (in a setting similar to that of Example 1). However even the approximated effects obtained during separate compilation carry information that can be exploited for the optimization of inter-language calls, observing that the range of effects of a function is limited by the resources of its source language. In Example 2, after inlining and applying results of Section 4.4 (Theorem 2), the code for main can be optimized to eliminate the unnecessary switch to heap allocation in the instance of f S. This yields S
S
→ Int) − → Int. main = hλH applyToInt : ∀t≤ S. (Int − ∅ t letH add1 = hλH x : Int. @ succ xiH
(* now dead code *)
add1 S = hλS x: Int. @ succ xiH in useS (@ (applyToInt[∅]) add1 S)iH Thus the HL function add1 has been effectively specialized for the stack allocation strategy used by SL. Example 3. Another optimization is merging of regions with the same resource requirements, illustrated on the following HL code fragment. external(SL) intFn : Int → Int in intFn (intFn 42) which is naively translated to the RL function (shown after inlining of the parameter wrapper)
124
Zhong Shao and Valery Trifonov S
Λt≤ S. λH intFn : Int − → Int. t letH x = huseS (@ intFn 42)iH in useS (@ intFn x) After combining the two useS (·) constructs the equivalent RL term is S
→ Int. Λt≤ S. λH intFn : Int − t S S use ( let x = h@ intFn 42iS in @ intFn x) A generalization of this transformation makes possible lifting of user (·) constructs out of a loop when the resources r are sufficient for all effects of the loop. Since in general a resource wrapper must restore resources upon return, a tail call moved into its scope effectively becomes non-tail; thus lifting a wrapper’s scope over a recursive tail call is only useful when the wrapper is lifted out of the enclosing function as well, i.e. out of the loop.
4 Semantics of RL 4.1 Static Semantics Correctness of resource use is ensured by the type system shown in Figure 3, which keeps track of the resources necessary for the evaluation of a term and a conservative estimate of the effects of the evaluation. An effect environment ∆ specifies the resource bounds of effect variables introduced by effect abstractions and effect-polymorphic types. The rules for effect sequents reflect the dependence of effects on resources (in this language this boils down to the dependence of the call/cc effect CC on the heap allocation resource H) and form the basis of effect polymorphism. The function MaxEff yields the maximal effect possible with a given resource; in this system we have MaxEff (S) = ∅ and MaxEff (H) = CC. Rule (Eff-max) effectively states that the resource r 0 can be used instead of resource r if r 0 provides for all effects possible under r. In the sequents assigning types to values and terms the type environment Γ maps free variables to types. Type judgments for values associate with a value v and a pair of environments ∆ and Γ only a type σ, since values have no effects and therefore their evaluation requires no resources of the kind we control. The function θ maps constants to their predefined types. Sequents for terms have the form r; ∆; Γ `e e : µ σ, where r represents the available allocation resource, σ is the type of e, and µ represents the effects of its evaluation. Rules (Exp-let) and (Exp-val) establish the correspondence between the resource annotations in these constructs and the currently available allocation resource; the effect of lifting a value to a term is none, while the effect of sequencing two computations via let is the union of their effects. Any effect allowed with the current resource may be added to the effects of a term using rule (Exp-spurious). 0 The central novelty is the user (·) construct for resource manipulation; its typing rule (Exp-use) imposes the crucial restriction that the effect µ of the term e must be
Type-Directed Continuation Allocation
E FFECT E NVIRONMENT F ORMATION (Env-eff-empty) `∆ ∅
(Env-eff-ext) `∆ ∆ ∆ ` ∆t , t ≤ r
T YPE E NVIRONMENT F ORMATION (Env-typ-empty) `∆ ∆ ∆ `Γ ∅
(Env-typ-ext) ∆ `Γ Γ ∆ `σ σ ∆ `Γ Γx , x : σ
E FFECTS (Eff-empty) `∆ ∆ ∆ `µ ∅ ≤ r
(Eff-CC) `∆ ∆ µ ∆ ` CC ≤ H
(Eff-var) `∆ ∆ ∆(t) = r ∆ `µ t ≤ r
(Eff-combine) `µ µ0 ≤ r, µ00 ≤ r ∆ `µ µ0 ∨ µ00 ≤ r
(Eff-max) ∆ `µ µ ≤ r ∆ `µ MaxEff (r) ≤ r0 ∆ `µ µ ≤ r0
(Val-const) ∆ `Γ Γ ∆; Γ `v c : θ(c)
(Val-var) ∆ `Γ Γ Γ (x) = σ ∆; Γ `v x : σ
(Val-abs) ∆ `Γ Γ ∆ `σ σ r; ∆; Γx , x : σ `e e :
T YPES (Typ-fun) ∆ `µ µ ≤ r ∆ `σ σ, σ0 r ∆ `σ σ − → σ0
(Typ-basic) `∆ ∆ ∆ `σ β
µ
(Typ-cont) ∆ `σ σ ∅ `µ CC ≤ r ∆ `σ σ r cont (Typ-poly) `∆ ∆ ∆t , t ≤ r `σ σ ∆ `σ ∀t ≤ r. σ T ERMS (Exp-let) r; ∆; Γ `e e :
µ
σ r; ∆; Γx , x : σ `e e0 :
r; ∆; Γ `e letr x = e in e0 :
σ0
∆; Γx `v λr x : σ. e : σ − → σ0 µ
(Val-poly) ∆ `Γ Γ ∆t , t ≤ r; Γ `v v : σ ∆; Γ `v Λt ≤ r. v : ∀t ≤ r. σ (Val-tapp) Γ (x) = ∀t ≤ r. σ ∆ `µ µ ≤ r ∆; Γ `v x[µ] : [µ/t]σ
σ0
σ0
∅
(Exp-spurious) r; ∆; Γ `e e : σ ∆ `µ µ0 ≤ r r; ∆; Γ `e heiµ0 : (Exp-use) r0 ; ∆; Γ `e e :
µ
µ∨µ0
σ
σ ∆ `µ µ ≤ r 0
r; ∆; Γ `e user (e) : µ r
µ∨µ0
µ0
(Exp-val) ∆; Γ `v v : σ r; ∆; Γ `e hvir : σ
µ
VALUES
125
µ
σ
(Exp-app) r ∆ `Γ Γ Γ (x) = σ0 − → σ Γ (x0 ) = σ0 µ
r; ∆; Γ `e @ x x0 :
µ
σ
(Exp-callcc) r ∆ `Γ Γ Γ (x) = σ r cont − →σ µ
r; ∆; Γ `e callcc x :
µ∨CC
σ
(Exp-throw) ∆ `Γ Γ ∆ `σ σ0 Γ (x) = σ r cont Γ (x0 ) = σ r; ∆; Γ `e throw[σ0 ] x x0 : σ0 MaxEff (r)
Fig. 3. The RL type system
126
Zhong Shao and Valery Trifonov
supported by the resource r available before the alternative resource r 0 is selected. This 0 ensures the correctness of the propagation of µ outside the scope of the user (·). The rules for application and callcc set the correspondence between the available resource and the resource required by the invoked function. In addition, (Exp-callcc) and (Exp-throw) specify that the continuation type is annotated with the same resource, which is needed by the context captured in the continuation and therefore must be matched when it is reactivated. The effect of evaluating a callcc includes CC, while the effect of a throw is that of the rest of the computation, which we estimate as the maximal possible with the current resource. By induction on the structure of a typing derivation it follows that if a term has a type in a given environment, it has exactly one type, and the presence of type annotations allows its effective computation, i.e. there exists a function EffTypeOf such that EffTypeOf (r, ∆, Γ, e) = hµ, σi if and only if r; ∆; Γ `e e : σ. µ
We will also use the function TypeOf with the same arguments, returning the type σ only. 4.2 Dynamic Semantics The operational semantics of RL (Figure 4) is defined by means of a variant of the tail-call-safe Ca EK machine (Flanagan et al. [4]). The machine configuration is a tuple he, E, O, ρi where e is the current term to be evaluated, E is the environment mapping variables to machine values, O is a heap of objects (closures), and ρ is a tuple of machine resources. Depending on the allocation strategy used, ρ is either a continuation stack S, recording (as in the original CaEK machine) the context of the evaluation as a sequence of activation records, or a pair of a current continuation k and a continuation heap K. In the latter form k is a continuation handle and K is a mapping from ContHandles to activation records which offers non-sequential access. In neither case does a function application (app) perform additional allocations of activation records, so both strategies are tail-call safe. Machine values are either small constants or pointers into other structures where larger objects are allocated. All closures are allocated on the heap (the function γ at the bottom of the figure shows the details). The activation records created when evaluating a letr -expression may be allocated either on the continuation heap K (transition rule (letH )) or on the continuation stack S (rule (letS )). An activation record represents a continuation, and in our small language there are only three possibilities: the computation either halts or continues by binding a variable to a computed value or by restoring a resource. Rules (valH ) and (valS ) perform the binding, depending on the allocation mode. The evaluation of user (e) selects the activation record allocation strategy for e, e.g. useS (e) selects stack-based allocation for e (transition rule (useS )). When the current allocation resource is already r we define user (·) as a no-op; if a change of resource is performed, an activation record is pushed on (the top of) the new allocation resource. Correspondingly, heap-based allocation is restored by transition rule (resumeH ) after the evaluation of e.
Type-Directed Continuation Allocation
S EMANTIC D OMAINS MachineVal 3 w ::= Const c | Ptr h | Cont k E ∈ Var → MachineVal h ∈ HeapLocs Object 3 o ::= Closure hx, e, Ei | TyAbs ht, r, vi O ∈ HeapLocs → Object k ∈ ContHandles ActRcd 3 a ::= Bind hx, e, E, ki | Resume S | Halt K ∈ ContHandles → ActRcd S ::= Bind hx, e, E, Si | Resume hk, Ki | Halt
127
machine values environment heap locations closures (objects) object heap continuation handles activation records activation record heap activation record stack
T RANSITION RULES h@ x1 x2 , E, O, ρi 7→1 he0 , E 0 [x0 7→ E(x2 )], O, ρi where E(x1 ) = Ptr h, O(h) = Closure hx0 , e0 , E 0 i
(app) FOR
H EAP -A LLOCATED ACTIVATION R ECORDS
(letH )
hletH x = e1 in e2 , E, H, hk, Kii 7→1 he1 , E, H, hk0 , K[k0 7→ Bind hx, e2 , E|F V (e2 )−x , ki]ii
(valH )
hhviH , E, H, hk, Kii 7→1 he0 , E 0 [x0 7→ w], O0 , hk0 , Kii where K(k) = Bind hx0, e0 , E 0 , k0 i, hw, O0 i = γ (v, E, O)
(callcc)
hcallcc x, E, H, hk, Kii 7→1 he0 , E 0 [x0 7→ Cont k], O, hk, Kii where E(x) = Ptr h, O(h) = Closure hx0 , e0 , E 0 i
(throw)
hthrow[σ] x1 x2 , E, H, hk, Kii 7→1 he0 , E 0 [x0 7→ E(x2 )], O, hk0 , Kii where E(x1 ) = Cont k1 , K(k1 ) = Bind hx0 , e0 , E 0 , k0 i
(useS )
huseS (e), E, H, hk, Kii 7→1 he, E, H, hResume hk, Kiii
(resumeS ) FOR
hhviH , E, H, hk, Kii 7→1 hhviS , E, H, hSii where K(k) = Resume S
S TACK -A LLOCATED ACTIVATION R ECORDS
(letS )
hletS x = e1 in e2 , E, H, hSii 7→1 he1 , E, H, hBind hx, e2 , E|F V (e2 )−x , Siii
(valS )
hhviS , E, H, hBind hx0 , e0 , E 0 , Siii 7→1 he0 , E 0 [x0 → 7 w], O0 , hSii where hw, H 0 i = γ (v, E, O)
(useH ) (resumeH )
huseH (e), E, H, hSii 7→1 he, E, H, hk, [k 7→ Resume S]ii hhviS , E, H, hResume hk, Kiii 7→1 hhviH , E, H, hk, Kii
R EPRESENTATION OF VALUES γ (c, E, O)=hConst c, Oi γ (λr x : σ. e, E, O)=hPtr h, O[h 7→ Closure hx, e, E|F V (e)−x i]i γ (x, E, O)=hE(x), Oi
γ (Λt ≤ r. v, E, O)=hPtr h, O[h 7→ TyAbs ht, r, vi]i where h ∈ / Dom (O) 0
0
γ (x[µ], E, O) = γ ([µ/t]v, E, O) if E(x) = Ptr h , O(h ) = TyAbs ht, r, vi, and `µ µ ≤ r
Fig. 4. Semantics of RL
128
Zhong Shao and Valery Trifonov
Another no-op is the increase of effect sets h·iµ which only serves type-checking purposes. 4.3 Soundness of the Type System The type system maintains the property that the effects of well-typed programs are possible with their available resources, formalized in the following statement, proved by induction on the typing derivation. Lemma 1. If r; ∆; Γ `e e : µ σ is a valid typing judgment, then ∆ `µ µ ≤ r. Semantically this behavior of well-typed programs is expressed as soundness with respect to resource use, extending the standard soundness for safety of the type system, in the following theorem. Theorem 1. If r; ∅; ∅ `e e : µ σ, then the configuration he, ∅, ∅, Halt r i either diverges or evaluates to the configuration hhvir , E, O, hHalt r ii (for some v, E and O), where 4
4
Halt S = hHalt i, and Halt H = hk, Ki for some k and K such that K(k) = Halt . This result is a corollary of the standard properties of progress and subject reduction of the system, the proofs of which we sketch below. To simplify the proofs, we introduce a type-annotated version of the semantics, which maintains type information embedded in the runtime representation. Thus the representation of an abstraction in the typeannotated version is γ (λr x : σ. e, E, O) = hPtr h, O[h 7→ Closure0 hr, x, σ, e, E|F V (e)−x i]i In addition, the runtime environment E is extended to keep the type of each value in its codomain; the value component of E is denoted by V E and the type component by TE. The following definitions are helpful in defining typability of configurations. Definition 1. The bottom bot (ρ) of an allocation resource ρ is defined as follows: 1. if ρ = hSi, then bot (ρ) = bot (S 0 ), if S = Bind hx0 , e0 , E 0 , S 0 i, and bot (ρ) = S otherwise; 2. if ρ = hk, Ki, then bot (ρ) = bot (hk 0 , Ki), if K(k) = Bind hx0 , e0 , E 0 , k 0 i, and bot (ρ) = K(k) otherwise. Definition 2. The outermost continuation heap outerCont(ρ) reachable from allocation resource ρ is 1. 2. 3. 4.
K if ρ = hk, Ki and bot (ρ) = Halt; outerCont(hSi) if ρ = hk, Ki and bot (ρ) = Resume S; ∅, if ρ = hSi and bot (ρ) = Halt; outerCont(hk, Ki) if ρ = hSi and bot (ρ) = Resume hk, Ki.
Type-Directed Continuation Allocation
129
Definition 3. A configuration closed in type environment Γ is typable under resource r with a result type σ and an effect µ, written r; Γ `c he, E, O, ρi : µ σ, if for some σ 0 , µ0 1. Dom (Γ ) ∩ Dom (E) = ∅; and 2. r; ∅; Γ, TE `e e : µ0 σ 0 ; and r
→0 σ; and 3. Γ `ρ hρ, E, Oi ∈ σ 0 ,− µ
4. for each x ∈ Dom (E), (a) if V E(x) = Const c, then TE(x) = θ(c); (b) if V E(x) = Ptr h and O(h) = Closure0 hr1 , x1 , σ1 , e1 , E1 i, then ∅;T E1 `v λr1 x1 : σ1. e1 : TE(x), and similarly for type abstractions; (c) if V E(x) = Cont k, then TE(x) = σ1 r1 cont and r1
→ σ10 Γ `ρ hk, outerCont(ρ)i, E, O ∈ σ1 ,− µ1
and µ = µ1 ∨ µ01 , for some σ10 and µ01 , r
→ σ if and Γ `ρ hρ, E, Oi ∈ σ 0 ,− µ
1. r = S and ρ = hHalti (i.e. an empty stack) and σ = σ 0 and µ = ∅; or 2. r = S and ρ = hBind hx1 , e1 , E1 , S1 ii and S; Γ, x1 : σ 0 `c he1 , E1 , O, S1 i : µ σ; or H → σ, 3. r = S and ρ = hResume hk 0 , K 0 ii and Γ `ρ hhk 0 , K 0 i, E, Oi ∈ σ 0 ,− µ
and similarly for r = H. Note that the environment may contain reachable variables bound to continuations even when the current allocation resource is a stack. Type correctness of these continuations cannot be verified with the stack resource, instead we have to find the corresponding continuation heap. However in this case the type system guarantees that the only continuation heap to which there are references in the environment is the outermost continuation heap, if such exists. The reason is that although it is possible to switch to heap allocation after executing in stack allocation mode, there are no invocations of callcc allowed since they would introduce the CC effect, which is not possible under the stack resource (cf. typing rule (Exp-use) in Figure 3). We can now formulate the progress and subject reduction properties. Lemma 2 (Progress). If r; ∅ `c he, E, O, ρi : µ σ where r corresponds to ρ (i.e. r = S if ρ = hSi, r = H if ρ = hk, Ki), and ρ 6= Halt r , then there exists C such that he, E, O, ρi 7→1 C. Lemma 3 (Subject Reduction). If C = he, E, O, ρi and r; ∅ `c C : µ σ where r corresponds to ρ, and C 7→1 C 0 = he0 , E 0 , O 0 , ρ0 i, then r 0 ; ∅ `c C 0 : µ0 σ where r 0 corresponds to ρ0 , µ = µ0 ∨ µ01 , and the rule for this transition is (callcc) only if µ = CC ∨ µ00 , for some µ01 and µ00 .
130
Zhong Shao and Valery Trifonov
In brief, in the case when e 6= hvir , the proofs proceed by examining the structure of the typing derivation for r; ∅; Γ, TE `e e : µ0 σ 0 ; together with condition 4 of Definition 3 this yields that the values in the environment and on the heaps have the correct shape for the appropriate transition rule. In the case when e has the form hvir the proofs r → σ, which parallels the inspect the structure of the derivation of Γ `ρ hρ, E, Oi ∈ σ 0 ,− µ
decision tree for the transition rules (val) and (resume) and the halting state. 4.4 Resource Transformations Effect inference and type correctness with respect to resource use allow the compiler to modify the continuation allocation strategy of a program fragment and preserve its meaning. The following definitions adapt the standard notions of ordering and observational equivalence of open terms to the resource-based system. Definition 4. A context C is a term with a hole •; the result of placing a term e in the hole of C is denoted by C[e] and may result in capturing effect and lambda variables free in e. The hole of a context C is of type (r, ∆, Γ ) ⇒ µ σ if C[e] is typeable whenever r; ∆; Γ `e e : µ σ. Definition 5. S; ∆; Γ `e e v e0 : µ σ if for all contexts C with hole of type (r, ∆, Γ ) ⇒ µ σ, all typed environments E closing C[e] and heaps O closing E, and continuation stacks S, the configuration hC[e0 ], E, O, hSii converges if hC[e], E, O, hSii converges. Furthermore, S; ∆; Γ ` e e ≈ e0 : µ σ if S; ∆; Γ ` e e v e0 : µ σ and S; ∆; Γ `e e0 v e : µ σ. One possible optimization is the conversion of heap-allocating code to stack-based strategy provided the code does not invoke callcc or throw, as per the following theorem. Theorem 2. If H; ∆; Γ `e e : ∅ σ, then S; ∆; Γ `e useH (e) ≈ StkCont ∆ (e; Γ ) : ∅ σ, where StkCont is the transformation defined as follows. StkCont ∆ (hviH ; StkCont ∆ (heiµ ; StkCont ∆ (useH (e); StkCont ∆ (useS (e); StkCont ∆ (@ x1 x2 ;
Γ ) = hviS Γ ) = hStkCont ∆ (e; Γ )iµ Γ ) = StkCont ∆ (e; Γ ) Γ) = e Γ ) = letS x01 = hλS x02 : Γ (x2). useH (@ x1 x02 )iS in @ x01 x2 H StkCont ∆ (let x = e1 in e2 ; Γ ) = letS x = StkCont ∆ (e1 ; Γ ) in StkCont ∆ (e2 ; Γx , x : TypeOf (H, ∆, Γ, e2))
5 Translation from HL to RL Programs in language L ∈ {HL, SL} are translated into RL by an algorithm shown in Figure 5. The algorithm infers the effect and resource annotations of a term using fairly
Type-Directed Continuation Allocation
131
standard techniques. It is presented in the form of an inference system for judgments of the form ∆; Γ `L eHL ⇒ ∆0 ` e : µ σ, where eHL , ∆, and Γ are inputs corresponding respectively to the L term to translate (also overloaded to HL top-level programs) and the inherited effect and type environments, initially empty. The outputs of the translation are e, ∆0 , µ, and σ, which stand for the translated term, the inferred effect environment, and the effect and type of e in environments ∆0 and Γ ; thus the output of the algorithm satisfies H; ∆0 ; Γ `e e : µ σ. The function R maps a language name to the resources available to a program in this language: R(HL) = H and R(SL) = S. Several auxiliary functions are shown in the figure, and the definitions of several simpler functions are as follows. The lub of two resources is defined by r t r = r and S t H = H. The function u for merging two effect environments is defined as (∆1 u ∆2 )(t) = ∆1 (t) t ∆2(t) if t ∈ Dom (∆1 ) ∩ Dom (∆2 ), and (∆1 u ∆2)(t) = ∆i (t) on the rest of Dom (∆1 ) ∪ Dom (∆2 ). The free effect variables of a type σ are denoted by fev(σ); the function Close(σ, ∆, Γ ) returns the pair h∀ti ≤ ∆(ti ). σ, ∆\{ti} i, where {ti } = fev (σ) − fev (Γ ), and similarly we have CloseAll (σ, r) = ∀ti ≤ r. σ where {ti } = fev (σ). Separately compiled external functions are treated as parameters of the compiled HL fragment and are wrapped to convert the HL resources (continuation heap) to SL resources (continuation stack). The wrapping is performed by an auxiliary function 0 invoked as Wrap rr (C, x, σ), which produces a term coercing x from type σ to type 0 ConvertType rr (σ) with resource annotations r 0 in place of r, and places it in context C. When compiling separately, the effects of an external function are approximated conservatively by applying Max r to the effect-annotated declared type of the function; r r → σ2 , and σ otherwise. by definition Max r (σ) is σ1 −−−−−→ Max r (σ2 ) when σ = σ1 − MaxEff (r)
µ
This allows the view of external functions as effect-polymorphic without restricting their actual implementations.
6 Related Work and Conclusions The work presented in this paper is mainly inspired by recent research on effect inference [5,10,11,23,24], efficient implementation of first-class continuations [2,8,20,1], monads and modular interpreters [30,12,29,13], typed intermediate languages [7,25], [21,17,16,3], and foreign function call interface [9,18]. In the following, we briefly explain the relationship of these work with our resource-based approach. – Effect systems. The idea of using effect-based type systems to support language interoperation was first proposed by Gifford and Lucassen [6,5]. Along this direction, many researchers have worked on various kinds of effect systems and effect inference algorithms [10,11,23,24,28]. The main novelty of our effect system is that we imposed a “resource-based” upper-bound to the effect variables. Effect variables in all previous effect systems are always universally quantified without any upper bounds, so they can be instantiated into any effect expressions. Our system limits the quantification over a finite set of resources—this allows us to take advantage of the effect-resource relationship to support advanced compilation strategies.
[The two-dimensional inference rules of Figure 5 did not survive text extraction. The figure defines the translation rules (Translate-external), (Translate-app), (Translate-let), (Translate-abs), and (Translate-callcc), together with the auxiliary annotation function Annotate_r, the coercion generator Wrap_r^r′, the type-matching relation ∆; S ⊢ σ ∼ σ′, and the function MinEnv. Among the recoverable details: (Translate-external) wraps a separately compiled SL function using Wrap_S^H and closes its type with CloseAll; (Translate-let) closes the type of the bound expression with Close before translating the body; (Translate-abs) annotates the bound type with Annotate_H and produces a heap-allocated abstraction ⟨λH x : σ. e′⟩H; (Translate-callcc) let-binds the translated argument and adds the effect CC, giving overall effect μ ∨ μ′ ∨ CC. The recoverable clauses of MinEnv are MinEnv(t ≤ r) = t ≤ r, MinEnv(μ1 ∨ μ2 ≤ r) = MinEnv(μ1 ≤ r) ⊓ MinEnv(μ2 ≤ r), MinEnv(∅ ≤ r) = ∅, and MinEnv(CC ≤ H) = ∅.]

Fig. 5. Typed translation from HL to RL
– Efficient call/cc. Many people have worked on strategies for efficient implementation of first-class continuations [2,8,20,1]. To support a reasonably efficient call/cc, compilers today mostly use "stack chunks" (a linked list of smaller stacks) [2,8], or they simply heap-allocate all activation records [20]. Both of these representations are incompatible with those used by traditional languages such as C and C++, where activation records are allocated on a sequential stack. First-class continuations thus always impose restrictions and interoperability challenges on the underlying compiler. In fact, many existing compilers choose not to support call/cc simply because call/cc is not compatible with standard C calling conventions. The techniques presented in this paper provide opportunities to support both efficient call/cc and interoperability with code that uses sequential stacks.
– Threads. Implementing threads does not necessarily require first-class continuations, but only an equivalent of one-shot continuations [1]. A finer distinction between these classes of continuations is useful; however, the issues of incorporating linearity into the type system to ensure safety in the presence of one-shot continuations are beyond the scope of this paper.
– Monads and modular interpreters. The idea of using resources and effects to characterize the run-time configuration of a function is inspired by recent work on monad-based interactions and modular interpreters [30,12,29,13]. Unlike in the monadic approach, our system provides a way of switching the runtime context "horizontally" from one resource to another via the use_r(e) construct.
– Typed intermediate languages. Typed intermediate languages have received much attention lately, especially in the HOT (i.e., higher-order and typed) language community. However, recent work [7,15,22,17,3,16,14] has mostly focused on theoretical foundations and general language design issues. The type system in this paper focuses on the problem of compiling multiple source languages into a common typed intermediate format. We plan to incorporate the resource and effect annotations into our FLINT intermediate language [22].
– Foreign function call interface. The interoperability problem addressed in this paper has much in common with frameworks for multi-lingual programming, such as ILU, CORBA [27], and Microsoft's COM [19]. It also relates to the foreign function call interfaces in most existing compilers [9,18]. Although these systems do address many of the low-level problems, such as converting data representations between languages or passing information to remote processes, their implementations do not provide safety guarantees (or, if they do, they require external programs to run in a separate address space). The work presented in this paper focuses on interfacing programs running in a single address space, with much higher performance requirements. We emphasize building a safe, efficient, and robust interface across multiple HOT languages.

We believe what we have presented in this paper is a good first step toward a fully formal investigation of safe, fine-grained language interoperation. We have concentrated on first-class continuations in this paper, but the framework presented here should also apply to other language features such as state, exceptions, and non-termination. The effect system described in this paper is also very
general and useful for static program analysis: because it supports effect polymorphism, effect information is accurately propagated through higher-order functions. This is clearly much more informative than the one-bit (or N-bit) information available in a simple monad-based calculus [16,26]. Many hard problems remain to be solved in order to support safe and fine-grained interoperation between ML and safe C, for instance, the interactions between garbage collection and explicit memory allocation, and between type-safe and unsafe language features. We plan to pursue these problems in the future.

Acknowledgment. We are grateful to the anonymous referees for their valuable comments.
References

1. C. Bruggeman, O. Waddell, and K. Dybvig. Representing control in the presence of one-shot continuations. In Proc. ACM SIGPLAN '96 Conf. on Prog. Lang. Design and Implementation, pages 99–107, New York, June 1996. ACM Press.
2. W. D. Clinger, A. H. Hartheimer, and E. M. Ost. Implementation strategies for continuations. In 1988 ACM Conference on Lisp and Functional Programming, pages 124–131, New York, June 1988. ACM Press.
3. A. Dimock, R. Muller, F. Turbak, and J. B. Wells. Strongly typed flow-directed representation transformations. In Proc. 1997 ACM SIGPLAN International Conference on Functional Programming (ICFP'97), pages 11–24. ACM Press, June 1997.
4. C. Flanagan, A. Sabry, B. F. Duba, and M. Felleisen. The essence of compiling with continuations. In Proc. ACM SIGPLAN '93 Conf. on Prog. Lang. Design and Implementation, pages 237–247, New York, June 1993. ACM Press.
5. D. K. Gifford et al. FX-87 reference manual. Technical Report MIT/LCS/TR-407, M.I.T. Laboratory for Computer Science, September 1987.
6. D. Gifford and J. Lucassen. Integrating functional and imperative programming. In 1986 ACM Conference on Lisp and Functional Programming, New York, August 1986. ACM Press.
7. R. Harper and G. Morrisett. Compiling polymorphism using intensional type analysis. In Twenty-second Annual ACM Symp. on Principles of Prog. Languages, pages 130–141, New York, Jan 1995. ACM Press.
8. R. Hieb, R. K. Dybvig, and C. Bruggeman. Representing control in the presence of first-class continuations. In Proc. ACM SIGPLAN '90 Conf. on Prog. Lang. Design and Implementation, pages 66–77, New York, 1990. ACM Press.
9. L. Huelsbergen. A portable C interface for Standard ML of New Jersey. Technical memorandum, AT&T Bell Laboratories, Murray Hill, NJ, January 1996.
10. P. Jouvelot and D. K. Gifford. Reasoning about continuations with control effects. In Proc. ACM SIGPLAN '89 Conf. on Prog. Lang. Design and Implementation, pages 218–226. ACM Press, 1989.
11. P. Jouvelot and D. K. Gifford. Algebraic reconstruction of types and effects. In Eighteenth Annual ACM Symp. on Principles of Prog. Languages, pages 303–310, New York, Jan 1991. ACM Press.
12. J. Launchbury and S. Peyton Jones. Lazy functional state threads. In Proc. ACM SIGPLAN '94 Conf. on Prog. Lang. Design and Implementation, pages 24–35, New York, June 1994. ACM Press.
13. S. Liang, P. Hudak, and M. Jones. Monad transformers and modular interpreters. In Proc. 22nd Annual ACM SIGPLAN-SIGACT Symp. on Principles of Programming Languages, pages 333–343. ACM Press, 1995.
14. G. Morrisett, D. Walker, K. Crary, and N. Glew. From System F to typed assembly language. In Proc. 25th Annual ACM SIGPLAN-SIGACT Symp. on Principles of Programming Languages, pages (to appear). ACM Press, 1998.
15. G. Morrisett. Compiling with Types. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, December 1995. Tech Report CMU-CS-95-226.
16. S. Peyton Jones, J. Launchbury, M. Shields, and A. Tolmach. Bridging the gulf: a common intermediate language for ML and Haskell. In Proc. 25th Annual ACM SIGPLAN-SIGACT Symp. on Principles of Programming Languages, pages (to appear). ACM Press, 1998.
17. S. Peyton Jones and E. Meijer. Henk: a typed intermediate language. In Proc. 1997 ACM SIGPLAN Workshop on Types in Compilation, June 1997.
18. S. Peyton Jones, T. Nordin, and A. Reid. Green Card: a foreign-language interface for Haskell. Available at http://www.dcs.gla.ac.uk:80/~simonpj/green-card.ps.gz, 1997.
19. D. Rogerson. Inside COM: Microsoft's Component Object Model. Microsoft Press, 1997.
20. Z. Shao and A. W. Appel. Space-efficient closure representations. In 1994 ACM Conference on Lisp and Functional Programming, pages 150–161, New York, June 1994. ACM Press.
21. Z. Shao. An overview of the FLINT/ML compiler. In Proc. 1997 ACM SIGPLAN Workshop on Types in Compilation, June 1997.
22. Z. Shao. Typed common intermediate format. In Proc. 1997 USENIX Conference on Domain-Specific Languages, pages 89–102, October 1997.
23. J.-P. Talpin and P. Jouvelot. Polymorphic type, region, and effect inference. Journal of Functional Programming, 2(3), 1992.
24. J.-P. Talpin and P. Jouvelot. The type and effect discipline. Information and Computation, 111(2):245–296, June 1994.
25. D. Tarditi, G. Morrisett, P. Cheng, C. Stone, R. Harper, and P. Lee. TIL: A type-directed optimizing compiler for ML. In Proc. ACM SIGPLAN '96 Conf. on Prog. Lang. Design and Implementation, pages 181–192. ACM Press, 1996.
26. D. Tarditi. Design and Implementation of Code Optimizations for a Type-Directed Compiler for Standard ML. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, December 1996. Tech Report CMU-CS-97-108.
27. The Object Management Group. The common object request broker: Architecture and specification (CORBA). Revision 1.2, Object Management Group (OMG), Framingham, MA, December 1993.
28. M. Tofte and J.-P. Talpin. Implementation of the typed call-by-value λ-calculus using a stack of regions. In Proc. 21st Annual ACM SIGPLAN-SIGACT Symp. on Principles of Programming Languages, pages 188–201. ACM Press, 1994.
29. P. Wadler. The essence of functional programming (invited talk). In Nineteenth Annual ACM Symp. on Principles of Prog. Languages, New York, Jan 1992. ACM Press.
30. P. Wadler. How to declare an imperative (invited talk). In International Logic Programming Symposium, Portland, Oregon, December 1995. MIT Press.
Polymorphic Equality – No Tags Required

Martin Elsman

Department of Computer Science, University of Copenhagen
Universitetsparken 1, DK-2100 Copenhagen Ø, Denmark
[email protected]
Abstract. Polymorphic equality is a controversial language construct. While convenient for the programmer, it has been argued that polymorphic equality (1) invites violation of software engineering principles, (2) lays a serious burden on the language implementor, and (3) enforces a runtime overhead due to the necessity of tagging values at runtime. We show that neither (2) nor (3) is inherent to polymorphic equality, by showing that one can compile programs with polymorphic equality into programs without it, in such a way that there is no need for tagging or for runtime type analysis. Moreover, the translation is the identity on programs that do not use polymorphic equality. Experimental results indicate that even for programs that use polymorphic equality, the translation gives good results.
1 Introduction
Often, statically typed languages, like ML, provide the programmer with a generic function for checking structural equality of two values of the same type. To avoid the possibility of testing functional values for equality, the type system of Standard ML [11] distinguishes between ordinary type variables, which may be instantiated to any type, and equality type variables, which may be instantiated only to types that admit equality (i.e., types not containing ordinary type variables or function types).

In this paper, we show how polymorphic equality may be eliminated entirely in the front-end of a compiler by a type-based translation called equality elimination. The translation is possible for expressions that are typable according to the Standard ML type discipline [11]. We make three main contributions:

1. Identification and application of equality elimination in a call-by-value language without garbage collection, including treatment of parametric datatypes and side effects. Equality elimination eliminates the last obligation for tagging values in Standard ML and opens the way for efficient data representations and easier foreign-language interfacing.
2. Measurements of the effect of equality elimination in the ML Kit with Regions [23,3,5] (from here on just the Kit), and a discussion of the possibilities for data representation made possible by the translation.
3. Demonstration of semantic correctness of the translation. It has been considered non-trivial to demonstrate semantic correctness for type classes in Haskell [13, Sect. 4].
As an example of equality elimination, consider the following ML program, which declares the function member, using polymorphic equality to test whether a given value is among the elements of a list:

  let fun member y [] = false
        | member y (x::xs) = (y = x) orelse member y xs
  in (member 5 [3,5], member true [false])
  end

The function member gets type scheme ∀ε.ε → ε list → bool, where ε is an equality type variable (i.e., a type variable that ranges over equality types). In the example, the function member is used with instances int and bool, which both admit equality. On the other hand, the function map, presented below, gets type scheme ∀αβ.(α → β) → α list → β list, where α and β are ordinary type variables and hence may be instantiated to any particular type.

  fun map f [] = []
    | map f (x::xs) = f x :: map f xs

To eliminate polymorphic equality, it is possible to pass extra arguments to equality polymorphic functions such as member above – one for each abstracted equality type variable in the type scheme for the function. Using type information, the example is translated into the program

  let fun member eq y [] = false
        | member eq y (x::xs) = eq (y,x) orelse member eq y xs
  in (member eq_int 5 [3,5], member eq_bool true [false])
  end

For each use of an equality polymorphic function, appropriate instances of the equality primitive are passed as arguments. In the translated program above, eq_int and eq_bool denote primitive equality functions for testing integers and booleans for equality. These primitives are functions on base types and can be implemented efficiently by the backend of a compiler without requiring that values be tagged.

An important property of the translation is that it is the identity on expressions that do not use polymorphic equality. Thus, one pays for polymorphic equality only when it is used. In particular, the translation is the identity on the map function.

In the next section, we give an overview of related work. The language that we consider is described in the sections to follow. We then proceed to present a translation for eliminating polymorphic equality. In Sect. 7 and Sect. 8, we demonstrate type correctness and semantic soundness of the translation. In Sect. 9 and Sect. 10, we show how the approach is extended to full ML and how it is implemented in the Kit. We then present experimental results. Finally, we conclude.
2 Related Work
A type-based dictionary transformation similar to equality elimination allows type classes in Haskell to be eliminated at compile time [25,13,16]. However, the motivation for equality elimination is different from the motivation behind the dictionary transformation, which is to separate dictionary operations from values at runtime. In lazy languages such as Haskell, tagging cannot be eliminated even if tag-free garbage collection is used. A more aggressive elimination of dictionaries is possible by generating specialised versions of overloaded functions [8]. This technique does not work well with separate compilation and may lead to unnecessary code duplication. No work on dictionary transformations demonstrates semantic soundness.

Harper and Stone present an alternative semantics for Standard ML in terms of a translation into an intermediate typed language [6]. Similarly to the translation we present here, polymorphic equality is eliminated during the translation. However, because the semantics of their source language is given by the translation, they cannot show correctness of the translation.

Ohori demonstrates how Standard ML may be extended with polymorphic record operations in such a way that these operations can be translated into efficient indexing operations [14]. His translation is very similar to equality elimination in that record indices are passed to instantiations of functions that use record operations polymorphically; Ohori demonstrates both type correctness and semantic soundness for the approach.

The TIL compiler, developed at Carnegie Mellon University, uses intensional polymorphism and nearly tag-free garbage collection to allow tag-free representations of values at runtime [20]. An intermediate language of TIL allows a function to take types as arguments, which can then be inspected by the function. This means that polymorphic equality can be encoded in the intermediate language, thus eliminating the primitive notion of polymorphic equality. However, for nearly tag-free garbage collection, records and other objects stored in the heap are still tagged in order for the garbage collector to trace pointers. It has been reported, however, that the nearly tag-free scheme can be extended to an entirely tag-free scheme [21].
3 Language and Types
We consider a typed lambda calculus extended with pairs, conditionals, a polymorphic equality primitive, and a let-construct to allow polymorphic bindings. First, we introduce some terminology. A finite map is a map with a finite domain; if f and g are such maps, we denote by Dom(f) the domain of f and by Ran(f) the range of f. Further, we write f + g to mean the modification of f by g, with domain Dom(f) ∪ Dom(g) and values (f + g)(x) = if x ∈ Dom(g) then g(x) else f(x). We assume a denumerably infinite set of equality type variables, ranged over by ε, and a denumerably infinite set of ordinary type variables, ranged over by
α. Types, ranged over by τ, and type schemes, ranged over by σ, are defined as follows:

  τ ::= ε | α | τ1 → τ2 | τ1 × τ2 | bool
  σ ::= ∀ε1 · · · εn α1 · · · αm .τ

A type τ admits equality if either τ = bool, or τ = ε, or τ = τ1 × τ2 where τ1 and τ2 admit equality.

3.1 Substitutions
A substitution S is a pair (S^ε, S^α), where S^ε is a finite map from equality type variables to types such that every τ ∈ Ran(S^ε) admits equality, and S^α is a finite map from ordinary type variables to types. When A is any object and S = (S^ε, S^α) is a substitution, we write S(A) to mean simultaneous capture-free substitution of S^ε and S^α on A.

For any type scheme σ = ∀ε1 · · · εn α1 · · · αm .τ and type τ′, we say that τ′ is an instance of σ (via S), written σ ≥ τ′, if there exists a substitution S = ({ε1 ↦ τ1, . . . , εn ↦ τn}, {α1 ↦ τ1′, . . . , αm ↦ τm′}) such that S(τ) = τ′. The instance list of S, written il(S), is the pair ([τ1, . . . , τn], [τ1′, . . . , τm′]). Pairs of this form are referred to as instance lists, and we use il to range over them.

When A is any object, we denote by ftv(A) a pair of a set of equality type variables free in A and a set of ordinary type variables free in A. Further, we denote by fetv(A) the set of equality type variables that occur free in A.

3.2 Typed Expressions
In the following, we use x and y to range over a denumerably infinite set of lambda variables. The grammar for typed expressions is as follows:

  e ::= λx : τ.e | e1 e2 | (e1, e2) | πi e | x_il | let x : σ = e1 in e2
      | true | false | if e then e1 else e2 | eq_τ

We sometimes abbreviate x_([],[]) as x.
4 Static Semantics
The static semantics for the language is described by a set of inference rules. Each of the rules allows inferences among sentences of the form ∆, TE ⊢ e : τ, where ∆ is a set of equality type variables, TE is a type environment mapping lambda variables to type schemes, e is a typed lambda expression, and τ is a type. Sentences of this form are read "under assumptions (∆, TE), the expression e has type τ."

A type τ is well-formed with respect to a set of equality type variables ∆, written ∆ ⊢ τ, if ∆ ⊇ fetv(τ). Moreover, an instance list il = ([τ1, . . . , τn], [τ1′, . . . , τm′]) is well-formed with respect to a set of equality type variables ∆, written ∆ ⊢ il, if ∆ ⊢ τi, i = 1..n and ∆ ⊢ τi′, i = 1..m.
Expressions    ∆, TE ⊢ e : τ

  ∆, TE + {x ↦ τ} ⊢ e : τ′
  ────────────────────────────── (1)
  ∆, TE ⊢ λx : τ.e : τ → τ′

  ∆, TE ⊢ e1 : τ1 → τ2    ∆, TE ⊢ e2 : τ1
  ────────────────────────────────────────── (2)
  ∆, TE ⊢ e1 e2 : τ2

  ∆, TE ⊢ e1 : τ1    ∆, TE ⊢ e2 : τ2
  ───────────────────────────────────── (3)
  ∆, TE ⊢ (e1, e2) : τ1 × τ2

  i ∈ {1, 2}    ∆, TE ⊢ e : τ1 × τ2
  ──────────────────────────────────── (4)
  ∆, TE ⊢ πi e : τi

  TE(x) ≥ τ via S    ∆ ⊢ il(S)
  ────────────────────────────── (5)
  ∆, TE ⊢ x_il(S) : τ

  σ = ∀ε1 · · · εn α1 · · · αm .τ    ftv(ε1 · · · εn α1 · · · αm) ∩ ftv(∆, TE) = ∅
  ∆ ∪ {ε1, . . . , εn}, TE ⊢ e1 : τ    ∆, TE + {x ↦ σ} ⊢ e2 : τ′
  ───────────────────────────────────────────────────────────────── (6)
  ∆, TE ⊢ let x : σ = e1 in e2 : τ′

  ∆, TE ⊢ true : bool (7)        ∆, TE ⊢ false : bool (9)

  ∆, TE ⊢ e : bool    ∆, TE ⊢ e1 : τ    ∆, TE ⊢ e2 : τ
  ─────────────────────────────────────────────────────── (8)
  ∆, TE ⊢ if e then e1 else e2 : τ

  ∆ ⊢ τ    τ admits equality
  ──────────────────────────── (10)
  ∆, TE ⊢ eq_τ : τ × τ → bool
There are only a few comments to make about the rules. In the rule for applying the equality primitive to values of a particular type, we require the type to be well-formed with respect to the quantified equality type variables. Similarly, in the variable rule, we require the instance list to be well-formed with respect to the quantified equality type variables.

To simplify the type system in languages with imperative updates and polymorphism, there is a tendency to restrict polymorphism to bindings of non-side-effecting, terminating expressions. This tendency is known as the value restriction, which is enforced by both the Objective Caml system [10] and the Standard ML language [11]. To simplify the presentation, we do not enforce the value restriction in rule 6. We return to this issue later, in Sect. 8.
5 Dynamic Semantics
The dynamic semantics for the language is, like the static semantics, described by a set of inference rules. An untyped expression may be obtained from a typed expression by eliminating all type information. In the rest of this section, we use e to range over untyped expressions.
A dynamic environment, E, maps lambda variables to values, which are in turn defined by the grammar:

  v ::= clos(λx.e, E) | true | false | (v1, v2) | eq

The rules of the dynamic semantics allow inferences among sentences of the forms E ⊢ e ⇓ v and eq (v1, v2) ⇓ v, where E is a dynamic environment, e is an untyped expression, and v, v1, and v2 are values. Sentences of the former form are read "under assumptions E, the expression e evaluates to v." Sentences of the latter form are read "equality of values v1 and v2 is v."
Expressions    E ⊢ e ⇓ v

  E(x) = v
  ─────────── (11)
  E ⊢ x ⇓ v

  E ⊢ λx.e ⇓ clos(λx.e, E) (12)

  E ⊢ e1 ⇓ clos(λx.e, E0)    E ⊢ e2 ⇓ v′    E0 + {x ↦ v′} ⊢ e ⇓ v
  ───────────────────────────────────────────────────────────────── (13)
  E ⊢ e1 e2 ⇓ v

  E ⊢ e1 ⇓ eq    E ⊢ e2 ⇓ v′    eq v′ ⇓ v
  ───────────────────────────────────────── (14)
  E ⊢ e1 e2 ⇓ v

  E ⊢ true ⇓ true (15)        E ⊢ false ⇓ false (16)        E ⊢ eq ⇓ eq (17)

  E ⊢ e1 ⇓ v1    E ⊢ e2 ⇓ v2
  ───────────────────────────── (18)
  E ⊢ (e1, e2) ⇓ (v1, v2)

  i ∈ {1, 2}    E ⊢ e ⇓ (v1, v2)
  ──────────────────────────────── (19)
  E ⊢ πi e ⇓ vi

  E ⊢ e ⇓ true    E ⊢ e1 ⇓ v
  ──────────────────────────────── (20)
  E ⊢ if e then e1 else e2 ⇓ v

  E ⊢ e ⇓ false    E ⊢ e2 ⇓ v
  ──────────────────────────────── (21)
  E ⊢ if e then e1 else e2 ⇓ v

  E ⊢ e1 ⇓ v1    E + {x ↦ v1} ⊢ e2 ⇓ v2
  ──────────────────────────────────────── (22)
  E ⊢ let x = e1 in e2 ⇓ v2
Equality of Values    eq (v1, v2) ⇓ v

  v1 = v2    v1, v2 ∈ {true, false}
  ─────────────────────────────────── (23)
  eq (v1, v2) ⇓ true

  eq (v11, v21) ⇓ false
  ─────────────────────────────────────── (24)
  eq ((v11, v12), (v21, v22)) ⇓ false

  v1 ≠ v2    v1, v2 ∈ {true, false}
  ─────────────────────────────────── (25)
  eq (v1, v2) ⇓ false

  eq (v11, v21) ⇓ true    eq (v12, v22) ⇓ v
  ─────────────────────────────────────────── (26)
  eq ((v11, v12), (v21, v22)) ⇓ v
The dynamic semantics includes rules for the polymorphic equality primitive (rules 23 through 26). If the equality primitive is only ever applied to values of type bool (i.e., if rules 24 and 26 are never used), the primitive need not distinguish booleans from values of pair type, and no runtime tagging is required.
6 Equality Elimination
In this section, we present inference rules for translating typable expressions into typable expressions in which the equality primitive is used only at instance bool. The translation is the identity on typable expressions that do not use polymorphic equality.

A translation environment, ℰ, is a finite map from equality type variables to lambda variables. We occasionally need to construct a function for checking structural equality on a pair of values of the same type. We define a relation that allows inferences among sentences of the form ℰ ⊢ eq τ ⇒ e, where ℰ is a translation environment, τ is a type, and e is an expression. Sentences of this form are read "under the assumptions ℰ, e is an equality function for values of type τ."

Equality Function Construction    ℰ ⊢ eq τ ⇒ e

  ℰ(ε) = x
  ─────────────── (27)
  ℰ ⊢ eq ε ⇒ x

  ℰ ⊢ eq bool ⇒ eq_bool (28)

  ℰ ⊢ eq τ1 ⇒ e1    ℰ ⊢ eq τ2 ⇒ e2    x fresh
  e′ = e2 (π2(π1 x), π2(π2 x))
  e = if e1 (π1(π1 x), π1(π2 x)) then e′ else false
  ──────────────────────────────────────────────────── (29)
  ℰ ⊢ eq τ1 × τ2 ⇒ λx : (τ1 × τ2) × (τ1 × τ2).e
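To make rule (29) concrete, the following source-level SML sketch shows the shape of the function it constructs for the type int × bool; the names eq_int, eq_bool, and eq_int_x_bool are hypothetical stand-ins for the primitive equality functions and the generated function:

  val eq_int  : int * int -> bool   = (op =)
  val eq_bool : bool * bool -> bool = (op =)

  (* The function rule (29) builds for int * bool: compare the first
     components; only if they are equal, compare the second components. *)
  fun eq_int_x_bool (p : (int * bool) * (int * bool)) =
      if eq_int (#1 (#1 p), #1 (#2 p))
      then eq_bool (#2 (#1 p), #2 (#2 p))
      else false

Note how the generated function short-circuits exactly as rule (29) prescribes: the second components are compared only when the first components are equal.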
Each rule for the translation of expressions allows inferences among sentences of the form ℰ ⊢ e ⇒ e′, where e and e′ are expressions and ℰ is a translation environment. Sentences of this form are read "under the assumptions ℰ, e translates to e′."

Expressions    ℰ ⊢ e ⇒ e′

  ℰ ⊢ e ⇒ e′
  ───────────────────────────── (30)
  ℰ ⊢ λx : τ.e ⇒ λx : τ.e′

  ℰ ⊢ e1 ⇒ e1′    ℰ ⊢ e2 ⇒ e2′
  ──────────────────────────────── (31)
  ℰ ⊢ e1 e2 ⇒ e1′ e2′

  ℰ ⊢ e1 ⇒ e1′    ℰ ⊢ e2 ⇒ e2′
  ──────────────────────────────── (32)
  ℰ ⊢ (e1, e2) ⇒ (e1′, e2′)

  ℰ ⊢ e ⇒ e′
  ────────────────────── (33)
  ℰ ⊢ πi e ⇒ πi e′

  il = ([τ1, . . . , τn], [. . .])    n ≥ 0    ℰ ⊢ eq τi ⇒ ei, i = 1..n
  ───────────────────────────────────────────────────────────────────── (34)
  ℰ ⊢ x_il ⇒ (· · · (x_il e1) · · · en)

  ℰ ⊢ eq τ ⇒ e
  ──────────────── (35)
  ℰ ⊢ eq_τ ⇒ e

  σ = ∀ε1 · · · εn α1 · · · αm .τ    y1 · · · yn fresh    n ≥ 0
  ℰ + {ε1 ↦ y1, . . . , εn ↦ yn} ⊢ e1 ⇒ e1′
  τi′ = εi × εi → bool, i = 1..n    e1′′ = λy1 : τ1′. · · · .λyn : τn′.e1′
  ℰ ⊢ e2 ⇒ e2′    σ′ = ∀ε1 · · · εn α1 · · · αm .τ1′ → · · · → τn′ → τ
  ──────────────────────────────────────────────────────────────────── (36)
  ℰ ⊢ let x : σ = e1 in e2 ⇒ let x : σ′ = e1′′ in e2′

  ℰ ⊢ true ⇒ true (37)        ℰ ⊢ false ⇒ false (38)

  ℰ ⊢ e ⇒ e′    ℰ ⊢ e1 ⇒ e1′    ℰ ⊢ e2 ⇒ e2′
  ───────────────────────────────────────────────────── (39)
  ℰ ⊢ if e then e1 else e2 ⇒ if e′ then e1′ else e2′

In the translation rule for the let-construct, we generate abstractions over equality functions for each bound equality type variable in the type scheme for the let-bound variable. Accordingly, in the rule for variable occurrences, appropriate equality functions are applied, according to the type instances for the abstracted equality type variables. In rule 35, we generate a function for checking equality of values of type τ.
7 Type Correctness
In this section, we demonstrate that the translation preserves types and that all typable expressions may be translated.

7.1 Type Preservation
We first give a few definitions relating type environments and translation environments.

Definition 1. (Extension) A type scheme σ′ = ∀ε1 · · · εn α1 · · · αk .τ′ extends a type scheme σ = ∀ε1 · · · εn α1 · · · αk .τ with the same bound variables, written σ′ ≽ σ, if τ′ = (ε1 × ε1 → bool) → · · · → (εn × εn → bool) → τ. A type environment TE′ extends another type environment TE, written TE′ ≽ TE, if Dom(TE′) ⊇ Dom(TE) and TE′(x) ≽ TE(x) for all x ∈ Dom(TE).

Definition 2. (Environment Matching) A translation environment ℰ matches a type environment TE, written ℰ ⊨ TE, if TE(ℰ(ε)) = ε × ε → bool for all ε ∈ Dom(ℰ). We write ℰ ⊨ TE′ ≽ TE when both ℰ ⊨ TE′ and TE′ ≽ TE hold.

The following proposition states that the equality function generated for a specific type that admits equality has the expected type.

Proposition 1. If ℰ ⊢ eq τ ⇒ e and ℰ ⊨ TE and ∆ ⊢ τ, then ∆, TE ⊢ e : τ × τ → bool.

Proof. By induction over the structure of τ.
We can now state a proposition saying that the translation preserves types.

Proposition 2. (Type Preservation) If ∆, TE ⊢ e : τ and ℰ ⊢ e ⇒ e′ and ℰ ⊨ TE′ ≽ TE, then ∆, TE′ ⊢ e′ : τ.

Proof. By induction over the structure of e. We show the three interesting cases.

Case e = x_il. From (5), we have TE(x) = σ, σ = ∀ε1 · · · εn α1 · · · αm .τ0, σ ≥ τ via S, ∆ ⊢ il, and ∆, TE ⊢ x_il : τ, where il = il(S). From (34), we have il = ([τ1, . . . , τn], [. . .]), ℰ ⊢ eq τi ⇒ ei, i = 1..n, and ℰ ⊢ x_il ⇒ (· · · (x_il e1) · · · en). Because ∆ ⊢ il, we have ∆ ⊢ τi, i = 1..n, and because ℰ ⊨ TE′ follows from the assumptions, we have by Proposition 1 that ∆, TE′ ⊢ ei : τi × τi → bool, i = 1..n. Because TE′ ≽ TE follows from the assumptions, we have TE′(x) = σ′, where σ′ = ∀ε1 · · · εn α1 · · · αm .(ε1 × ε1 → bool) → · · · → (εn × εn → bool) → τ0, and because σ ≥ τ via S and S(εi) = τi, i = 1..n (which follows from il(S) = ([τ1, . . . , τn], [. . .])), we have σ′ ≥ τ′ via S, where τ′ = (τ1 × τ1 → bool) → · · · → (τn × τn → bool) → τ. Because ∆ ⊢ il, we can now apply (5) to get ∆, TE′ ⊢ x_il : τ′.
Now, because ∆, TE′ ⊢ ei : τi × τi → bool, i = 1..n, we can apply (2) n times to get ∆, TE′ ⊢ (· · · (x_il e1) · · · en) : τ, as required.

Case e = let x : σ = e1 in e2. From (6), we have σ = ∀ε1 · · · εn α1 · · · αm .τ0, ftv(ε1 · · · εn α1 · · · αm) ∩ ftv(∆, TE) = ∅, ∆ ∪ {ε1, . . . , εn}, TE ⊢ e1 : τ0, ∆, TE + {x ↦ σ} ⊢ e2 : τ, and ∆, TE ⊢ e : τ. Further, from (36), we have y1 · · · yn fresh, ℰ + {ε1 ↦ y1, . . . , εn ↦ yn} ⊢ e1 ⇒ e1′, τi′ = εi × εi → bool, i = 1..n, e1′′ = λy1 : τ1′. · · · .λyn : τn′.e1′, ℰ ⊢ e2 ⇒ e2′, σ′ = ∀ε1 · · · εn α1 · · · αm .τ1′ → · · · → τn′ → τ0, and ℰ ⊢ e ⇒ let x : σ′ = e1′′ in e2′.

It now follows from the assumptions and from the definitions of extension and matching that ℰ + {ε1 ↦ y1, . . . , εn ↦ yn} ⊨ TE′′ ≽ TE, where TE′′ = TE′ + {y1 ↦ τ1′, . . . , yn ↦ τn′}, because Dom(TE′) ∩ {y1, . . . , yn} = ∅ and Dom(ℰ) ∩ {ε1, . . . , εn} = ∅ can be assumed by appropriate renaming of the bound type variables of σ. We can now apply induction to get ∆ ∪ {ε1, . . . , εn}, TE′′ ⊢ e1′ : τ0. By applying (1) n times, we get ∆ ∪ {ε1, . . . , εn}, TE′ ⊢ e1′′ : τ1′ → · · · → τn′ → τ0.

To apply induction a second time, we observe that ℰ ⊨ TE′ + {x ↦ σ′} ≽ TE + {x ↦ σ} by the assumptions and the definitions of matching and extension, and because Dom(TE′) ∩ {x} = ∅ can be assumed by appropriate renaming of x in e. By induction, we have ∆, TE′ + {x ↦ σ′} ⊢ e2′ : τ.

Because we can assume ftv(ε1 · · · εn α1 · · · αm) ∩ ftv(TE′) = ∅ by appropriate renaming of the bound equality type variables and type variables in σ′, we can apply (6) to get ∆, TE′ ⊢ let x : σ′ = e1′′ in e2′ : τ, as required.

Case e = eq_τ. From (10), we have ∆ ⊢ τ, τ admits equality, and ∆, TE ⊢ eq_τ : τ × τ → bool. From (35), we have ℰ ⊢ eq τ ⇒ e′ and ℰ ⊢ eq_τ ⇒ e′. By the assumptions, we have ℰ ⊨ TE′, and because ∆ ⊢ τ, we can apply Proposition 1 to get ∆, TE′ ⊢ e′ : τ × τ → bool, as required.
7.2 Typable Expressions are Translatable
We now demonstrate that all typable expressions may indeed be translated by the translation rules. The following proposition states that for a type that admits equality, it is possible to construct a function that checks for equality on pairs of values of this type.

Proposition 3. If τ admits equality and fetv(τ) ⊆ Dom(ℰ), then there exists an expression e such that ℰ ⊢ eq τ ⇒ e.

Proof. By induction over the structure of τ.

The following proposition states that all typable expressions may be translated.

Proposition 4. (Typable Expressions are Translatable) If ∆, TE ⊢ e : τ and ∆ = Dom(ℰ) and Dom(TE) ∩ Ran(ℰ) = ∅, then there exists e′ such that ℰ ⊢ e ⇒ e′.
Proof. By induction over the structure of e. We show the three interesting cases.

Case e = x_il. From (5), we have TE(x) = σ, σ ≥ τ via S, ∆ ⊢ il(S), and ∆, TE ⊢ x_il(S) : τ. Let il(S) be written as ([τ1, . . . , τn], [. . .]). Because ∆ ⊢ il(S), we have ∆ ⊢ τi, i = 1..n; hence fetv(τi) ⊆ Dom(ℰ), i = 1..n. Further, from the definition of substitution, we have that τi admits equality, i = 1..n. We can now apply Proposition 3 to get that there exists an expression ei such that ℰ ⊢ eq τi ⇒ ei, i = 1..n. By applying (34), we have ℰ ⊢ x_il(S) ⇒ (· · · (x_il(S) e1) · · · en), as required.

Case e = let x : σ = e1 in e2. From (6), we have σ = ∀ε1 · · · εn α1 · · · αm .τ0, ftv(ε1 · · · εn α1 · · · αm) ∩ ftv(∆, TE) = ∅, ∆ ∪ {ε1, . . . , εn}, TE ⊢ e1 : τ0, ∆, TE + {x ↦ σ} ⊢ e2 : τ, and ∆, TE ⊢ e : τ. Let y1 · · · yn be fresh, and let ℰ′ = ℰ + {ε1 ↦ y1, . . . , εn ↦ yn}. By the assumptions, we have ∆ ∪ {ε1, . . . , εn} = Dom(ℰ′) and Dom(TE) ∩ Ran(ℰ′) = ∅. We can now apply induction to get that there exists an expression e1′ such that ℰ′ ⊢ e1 ⇒ e1′. Also, let e1′′ = λy1 : τ1′. · · · .λyn : τn′.e1′, where τi′ = εi × εi → bool, i = 1..n. By the assumptions, and by appropriate renaming of x in e, we have Dom(TE + {x ↦ σ}) ∩ Ran(ℰ) = ∅; hence we can apply induction to get that there exists e2′ such that ℰ ⊢ e2 ⇒ e2′. Letting σ′ = ∀ε1 · · · εn α1 · · · αm .τ1′ → · · · → τn′ → τ0, we can apply (36) to get ℰ ⊢ e ⇒ let x : σ′ = e1′′ in e2′, as required.

Case e = eq_τ. From (10), we have ∆ ⊢ τ, τ admits equality, and ∆, TE ⊢ eq_τ : τ × τ → bool. Because ∆ = Dom(ℰ) follows from the assumptions, and ∆ ⊢ τ, we have fetv(τ) ⊆ Dom(ℰ); hence, from Proposition 3, there exists an expression e′ such that ℰ ⊢ eq τ ⇒ e′. From (35), we now have ℰ ⊢ e ⇒ e′, as required.
8 Semantic Soundness
In this section, we demonstrate semantic soundness of the translation, inspired by other proofs of semantic soundness of type systems [9,22]. Because equality functions are represented differently in the original program and the translated program, the operational semantics may assign different values to them. For this reason, we define a notion of semantic equivalence between values corresponding to the original program and values corresponding to the translated program. We write it Γ ⊨ v : τ ≈ v′. The type is needed to correctly interpret the values and to ensure well-foundedness of the definition. The environment Γ is formally a pair (Γ^ε, Γ^α) providing interpretations of the equality type variables and ordinary type variables in τ. Interpretations are non-empty sets V of pairs (v1, v2) of values. We often abbreviate projections from Γ and injections into Γ; for instance, when Γ = (Γ^ε, Γ^α), we write Γ(ε) to mean Γ^ε(ε), and Γ + {α ↦ V} to mean (Γ^ε, Γ^α + {α ↦ V}), for any ε, α, and V.

– Γ ⊨ true : bool ≈ true
– Γ ⊨ false : bool ≈ false
– Γ ⊨ (v1, v2) : τ1 × τ2 ≈ (v1′, v2′) iff Γ ⊨ v1 : τ1 ≈ v1′ and Γ ⊨ v2 : τ2 ≈ v2′
– Γ ⊨ eq : bool × bool → bool ≈ eq
– Γ ⊨ eq : τ × τ → bool ≈ clos(λx.e′, E′) iff for all values v1, v2, v1′ such that Γ ⊨ v1 : τ × τ ≈ v1′ and eq v1 ⇓ v2, we have E′ + {x ↦ v1′} ⊢ e′ ⇓ v2
– Γ ⊨ clos(λx.e, E) : τ1 → τ2 ≈ clos(λx.e′, E′) iff for all values v1, v2, v1′ such that Γ ⊨ v1 : τ1 ≈ v1′ and E + {x ↦ v1} ⊢ e ⇓ v2, there exists a value v2′ such that E′ + {x ↦ v1′} ⊢ e′ ⇓ v2′ and Γ ⊨ v2 : τ2 ≈ v2′
– Γ ⊨ v : α ≈ v′ iff (v, v′) ∈ Γ(α)
– Γ ⊨ v : ε ≈ v′ iff (v, v′) ∈ Γ(ε)
The semantic equivalence relation extends to type schemes and environments:

– Γ ⊨ v : ∀α1 · · · αm .τ ≈ v′ iff for all interpretations V1^α · · · Vm^α, we have Γ + {α1 ↦ V1^α, . . . , αm ↦ Vm^α} ⊨ v : τ ≈ v′
– Γ ⊨ v : ∀ε1 · · · εn α1 · · · αm .τ ≈ clos(λy1. · · · .λyn.e′, E′) iff for all interpretations V1^ε · · · Vn^ε, V1^α · · · Vm^α, values v1 · · · vn, and semantic environments Γ′ such that Γ′ ⊨ eq : εi × εi → bool ≈ vi, i = 1..n, and Γ′ = Γ + {ε1 ↦ V1^ε, . . . , εn ↦ Vn^ε, α1 ↦ V1^α, . . . , αm ↦ Vm^α}, there exists a value v′ such that Γ′ ⊨ v : τ ≈ v′ and E′ + {y1 ↦ v1, . . . , yn ↦ vn} ⊢ e′ ⇓ v′
– Γ ⊨ E : TE ≈ℰ E′ iff Dom(E) = Dom(TE), Dom(E) ⊆ Dom(E′), and for all x ∈ Dom(E) we have Γ ⊨ E(x) : TE(x) ≈ E′(x); further, for all ε ∈ Dom(ℰ) we have Γ ⊨ eq : ε × ε → bool ≈ E′(ℰ(ε))
The following proposition states that a generated equality function for a given type has the expected semantics. We leave elimination of type information from typed expressions implicit.

Proposition 5. If ℰ ⊢ eq τ ⇒ e and for all ε ∈ Dom(ℰ) we have Γ ⊨ eq : ε × ε → bool ≈ E(ℰ(ε)), then there exists a value v such that E ⊢ e ⇓ v and Γ ⊨ eq : τ × τ → bool ≈ v.

Proof. By induction over the structure of τ.
The semantic equivalence relation is closed with respect to substitution.

Proposition 6. Let S be a substitution ({ε1 ↦ τ1, . . . , εn ↦ τn}, {α1 ↦ τ1′, . . . , αm ↦ τm′}). Define Vi^ε = {(v, v′) | Γ ⊨ v : τi ≈ v′}, i = 1..n, and Vi^α = {(v, v′) | Γ ⊨ v : τi′ ≈ v′}, i = 1..m. Then Γ + {ε1 ↦ V1^ε, . . . , εn ↦ Vn^ε, α1 ↦ V1^α, . . . , αm ↦ Vm^α} ⊨ v : τ ≈ v′ iff Γ ⊨ v : S(τ) ≈ v′.

Proof. By induction over the structure of τ.
We can now state a semantic soundness proposition for the translation.

Proposition 7. (Semantic Soundness) If ∆, TE ⊢ e : τ and ℰ ⊢ e ⇒ e′ and Γ ⊨ E : TE ≈ℰ E′ and E ⊢ e ⇓ v, then there exists a value v′ such that E′ ⊢ e′ ⇓ v′ and Γ ⊨ v : τ ≈ v′.
Proof. By induction over the structure of e. We show the three interesting cases.

Case e = x_il, il = ([τ1, . . . , τn], [τ1′, . . . , τm′]), n ≥ 1. From the assumptions, (11), (5), the definition of semantic equivalence, and the definition of instantiation, we have Γ ⊨ v : σ ≈ v′, v′ = E′(x), σ = ∀ε1 · · · εn α1 · · · αm .τ0, TE(x) = σ, and S = ({ε1 ↦ τ1, . . . , εn ↦ τn}, {α1 ↦ τ1′, . . . , αm ↦ τm′}). Because n ≥ 1, we have v′ = clos(λy1. · · · .λyn.e′′, E′′), for some lambda variables y1 · · · yn, expression e′′, and dynamic environment E′′. From the assumptions and (34), we have Γ ⊨ E : TE ≈ℰ E′ and ℰ ⊢ eq τi ⇒ ei, i = 1..n; hence we can apply Proposition 5 n times to get that there exist values vi′, i = 1..n, such that Γ ⊨ eq : τi × τi → bool ≈ vi′ and E′ ⊢ ei ⇓ vi′, i = 1..n.

Letting Vi^ε = {(v, v′) | Γ ⊨ v : τi ≈ v′}, i = 1..n, Vi^α = {(v, v′) | Γ ⊨ v : τi′ ≈ v′}, i = 1..m, and Γ′ = Γ + {ε1 ↦ V1^ε, . . . , εn ↦ Vn^ε, α1 ↦ V1^α, . . . , αm ↦ Vm^α}, we can apply Proposition 6 to get Γ′ ⊨ eq : εi × εi → bool ≈ vi′, i = 1..n. From the definition of semantic equivalence, we now have that there exists a value v′′ such that Γ′ ⊨ v : τ0 ≈ v′′ and E′′ + {y1 ↦ v1′, . . . , yn ↦ vn′} ⊢ e′′ ⇓ v′′. Now, because v′ = E′(x) and E′ ⊢ ei ⇓ vi′, i = 1..n, we can derive E′ ⊢ (· · · (x e1) · · · en) ⇓ v′′ from (13), (11), and (12). By applying Proposition 6 again, we get Γ ⊨ v : τ ≈ v′′, as required.
Case e = eq_τ. From the assumptions, (17), (35), and the definition of semantic equivalence, we have from Proposition 5 that there exists a value v′ such that E′ ⊢ e′ ⇓ v′ and Γ ⊨ eq : τ × τ → bool ≈ v′, as required.

Case e = let x : σ = e1 in e2, σ = ∀ε1 · · · εn α1 · · · αm .τ0, n ≥ 1. Let V1^ε · · · Vn^ε, V1^α · · · Vm^α be interpretations, let v1^eq · · · vn^eq be values, and let Γ′ be a semantic environment such that Γ′ = Γ + {ε1 ↦ V1^ε, . . . , εn ↦ Vn^ε, α1 ↦ V1^α, . . . , αm ↦ Vm^α} and Γ′ ⊨ eq : εi × εi → bool ≈ vi^eq, i = 1..n. From the assumptions and from (36), we have that y1 · · · yn are chosen fresh, ℰ′ ⊢ e1 ⇒ e1′, and e1′′ = λy1 : τ1′. · · · .λyn : τn′.e1′, where τi′ = εi × εi → bool, i = 1..n, and ℰ′ = ℰ + {ε1 ↦ y1, . . . , εn ↦ yn}. From the definition of semantic equivalence, we can now establish Γ′ ⊨ eq : ε × ε → bool ≈ E′′(ℰ′(ε)) for all ε ∈ Dom(ℰ′), where E′′ = E′ + {y1 ↦ v1^eq, . . . , yn ↦ vn^eq}, and hence Γ′ ⊨ E : TE ≈ℰ′ E′′.

From the assumptions, (6), and (22), we have ∆ ∪ fetv(ε1 · · · εn), TE ⊢ e1 : τ0 and E ⊢ e1 ⇓ v1, and because ℰ′ ⊢ e1 ⇒ e1′, we can apply induction to get that there exists a value v1′ such that E′′ ⊢ e1′ ⇓ v1′ and Γ′ ⊨ v1 : τ0 ≈ v1′. Letting v1′′ = clos(λy1. · · · .λyn.e1′, E′), we have from the definition of semantic equivalence that Γ ⊨ v1 : σ ≈ v1′′ and Γ ⊨ E + {x ↦ v1} : TE + {x ↦ σ} ≈ℰ E′ + {x ↦ v1′′}. From the assumptions and from (22), (6), and (36), we have ∆, TE + {x ↦ σ} ⊢ e2 : τ, E + {x ↦ v1} ⊢ e2 ⇓ v2, and ℰ ⊢ e2 ⇒ e2′; hence we can apply induction a second time to get that there exists a value v2′ such that E′ + {x ↦ v1′′} ⊢ e2′ ⇓ v2′ and Γ ⊨ v2 : τ ≈ v2′. We can now apply (12) to get E′ ⊢ e1′′ ⇓ v1′′; hence we can apply (22) to get E′ ⊢ e′ ⇓ v2′, as required.

We now return to the value restriction issue. The translation rule for the let-construct does not preserve semantics unless (1) e1 is known to terminate
and not to have side effects, or (2) no equality type variables are generalised. In the language we consider, (1) is always satisfied. For Standard ML, the value restriction always enforces either (1) or (2). However, the restriction is enforced by limiting generalisation to so-called non-expansive expressions, which exclude function applications. Adding such a requirement to the typing rule for the let-construct makes too few programs typable; to demonstrate type correctness for the translation, applications of functions to generated equality functions must also be considered non-expansive.
9 Extension to Full ML
It is straightforward to extend equality elimination to allow imperative features and to allow a letrec-construct for declaration of recursive functions. We now demonstrate how the approach is extended to deal with parametric datatypes and modules.

9.1 Datatype Declarations
In Standard ML, lists may be implemented by a datatype declaration

  datatype α list = :: of α × α list | nil

Because lists are declared to be parametric in the type of the elements, it is possible to write polymorphic functions that manipulate the elements of any list. In general, datatype declarations may be parametric in any number of type variables, and they may even be declared mutually recursive with other datatype declarations. The datatype declaration for lists elaborates to the type environment

  {list ↦ (t, {:: ↦ ∀α.α × α t → α t, nil ↦ ∀α.α t})}

where t is a fresh type name [11]. Every type name t possesses a boolean attribute that denotes whether t admits equality. In the example, t will indeed be inferred to admit equality. This property of the type name t allows values of type τ t to be checked for equality if τ admits equality.

When a datatype declaration elaborates to a type environment, an equality function is generated for every fresh type name t in the type environment such that t admits equality. For a parametric datatype declaration, such as the list declaration above, the generated equality function is parametric in equality functions for the parameters of the datatype, as illustrated by the sketch at the end of this section.

The Kit does not allow all valid ML programs to be compiled using equality elimination. Consider the datatype declaration

  datatype α t = A of (α × α) t | B of α

Datatypes of the above form are called non-uniform datatypes [15, page 86]. It is possible to declare non-uniform datatypes in ML, but they are of limited
use, because ML does not support polymorphic recursive functions. In particular, it is not possible to declare a function in ML that checks values of non-uniform datatypes for structural equality. However, the problem is not inherent to equality elimination: adding support for polymorphic recursion in the intermediate language would solve it. Other compilation techniques also have trouble dealing with non-uniform datatypes. The TIL compiler, developed at Carnegie Mellon University, does not support non-uniform datatypes due to problems with compiling constructors of such datatypes in the framework of intensional polymorphism [12, page 166].
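To illustrate, the equality function generated for the list datatype could take roughly the following shape (a source-level SML sketch; the names eq_list and eq_elem are hypothetical, and the Kit actually produces intermediate-language code):

  (* Equality on lists, parametric in an equality function for the
     element type, mirroring how the generated function receives one
     equality argument per datatype parameter. *)
  fun eq_list eq_elem (nil, nil) = true
    | eq_list eq_elem (x :: xs, y :: ys) =
        eq_elem (x, y) andalso eq_list eq_elem (xs, ys)
    | eq_list eq_elem _ = false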
9.2 Modules
The translation extends to Standard ML Modules [11]. However, to compile functors separately, structures must contain equality functions for each type name that admits equality and occurs free in the structure. Moreover, when constraining a structure to a signature, it is necessary to enforce that the implementation of a function follows its type, by generating appropriate stub code. The body of a functor may then uniformly extract equality functions from the formal argument structure.
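As a rough illustration (hypothetical names; the Kit performs this at the intermediate-language level), a structure exporting a type that admits equality can be thought of as also exporting an equality function for it, which a separately compiled functor body extracts uniformly:

  signature ELEM = sig
    type elem
    val eq_elem : elem * elem -> bool   (* equality function carried by the structure *)
  end

  structure IntElem : ELEM = struct
    type elem = int
    val eq_elem : int * int -> bool = (op =)
  end

  (* A functor body can use the carried equality function without
     knowing the concrete element type. *)
  functor MemberFn (E : ELEM) = struct
    fun member y nil = false
      | member y (x :: xs) = E.eq_elem (y, x) orelse member y xs
  end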
10 Implementation
The Kit compiles the Standard ML Core language by first elaborating and translating programs into an intermediate typed lambda language. At this point, polymorphic equality is eliminated. Then, a simple optimiser performs various optimisations inspired by [1], and small recursive functions are specialised as suggested in [17]. The remaining phases of the compiler are based on region inference [24]. Each value generated by the program resides in a region, and region inference is the task of determining when to allocate and deallocate regions. Various analyses determine how to represent different regions at runtime [3]. Some regions can be determined to only ever contain word-sized unboxed values, such as integers and booleans; such regions need never be allocated. Other regions can be determined to only ever hold one value at runtime; such regions may be implemented on the stack. The remaining regions are implemented using a stack of linked pages. The backend of the Kit implements a simple graph-coloring technique for register allocation and emits code for the HP PA-RISC architecture [5].

10.1 Datatype Representation
The Kit supports different schemes for representing datatypes at runtime. The simplest scheme implements all constructed values (except integers and booleans) as boxed objects at runtime. Using this scheme, the list [1, 2], for instance, is represented as shown in Fig. 1.
Fig. 1. Boxed representation of the list [1, 2] with untagged integers.

The Standard ML of New Jersey compiler version 110 (SML/NJ) implements lists as shown in Fig. 2, using the observation that pointers are four-aligned on most modern architectures [1]. In this way, the two least significant bits of pointers to constructed values may be used to represent the constructor. However, because SML/NJ implements polymorphic equality and garbage collection by following pointers, only one bit remains to distinguish constructed values.
Fig. 2. Unboxed representation of the list [1, 2] with tagged tuples and tagged integers.

Utilising the two least significant bits of pointers to constructed values, we say that a type name associated with a datatype declaration is unboxed if the datatype binding declares at most three unary constructors (and any number of nullary constructors) and, for every argument type τ of a unary constructor, τ is not a type variable and τ is not unboxed (for recursion, we initially assume that the declared type names of the declaration are unboxed); see the example declarations at the end of this section. A type τ is unboxed if it is of the form (τ1, . . . , τn) t and t is unboxed. The Kit treats all values of unboxed types as word-sized unboxed objects. Using this scheme, lists are represented uniformly at runtime as shown in Fig. 3. Efficient unboxed representations of many tree structures are also obtained using this scheme.

In the context of separate compilation of functors, as implemented in Standard ML of New Jersey, version 0.93, problems arise when a unique representation of datatypes is not used [2]. If instead functors are specialised for each application, no restrictions are enforced on datatype representations, and no representation overhead is introduced by programming with Modules. Current research addresses this idea.
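For instance, under the unboxedness criterion above (a hypothetical pair of declarations for illustration):

  (* t1 is unboxed: it has two unary constructors, and neither carries a
     bare type variable or a value of an unboxed type.  t2 is boxed:
     constructor B carries a bare type variable. *)
  datatype 'a t1 = Leaf of 'a * 'a | Node of 'a * 'a t1 | Empty
  datatype 'a t2 = A | B of 'a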
Fig. 3. Unboxed representation of the list [1, 2] with untagged tuples and untagged integers.

11 Experimental Results

In this section, we present some experimental results obtained with the Kit and the Standard ML of New Jersey compiler version 110 (SML/NJ). The purposes of the experiments are (1) to assess the feasibility of eliminating polymorphic
equality, (2) to assess the importance of efficient datatype representations, and (3) to compare the code generated by the Kit with that generated by SML/NJ. All tests are run on an HP PA-RISC 9000s700 computer. For SML/NJ, executables are generated using the exportFn built-in function. We use KitT to mean the Kit with a tagging approach to implementing polymorphic equality. Further, KitE is the Kit with equality elimination enabled; in KitE, tagging of values is disabled, as no operations need tags at runtime. Finally, KitEE is KitE with efficient representation of datatypes enabled. All versions of the Kit generate efficient equality checks for values that are known to be of base type (e.g., int or real).

Measurements are shown for eight benchmark programs. Four of these are non-trivial programs based on the SML/NJ distribution benchmarks (life, mandelbrot, knuth-bendix, and simple). The program fib35 is the simple Fibonacci program, and mergesort is a program for sorting 200,000 pseudo-random integers. The programs life and knuth-bendix use polymorphic equality extensively. The program lifem is a monomorphic version of life, in which polymorphic functions are made monomorphic by insertion of type constraints. The program sieve computes all prime numbers in the range from 1 to 2000, using the Sieve of Eratosthenes.

Running times for all benchmarks are shown in Fig. 4. Equality elimination, and thus elimination of tags, appears to have a positive effect on the running time of most programs. In particular, the life benchmark runs 48 percent faster under KitE than under KitT. However, programs do exist for which equality elimination has a negative effect on running time. There are potentially two reasons for a slowdown. First, extra function parameters to equality polymorphic functions may lead to less efficient programs. Second, the functions generated by KitE and KitEE for checking two structural values for equality do not check whether the values are located at the same address; this check is performed by the polymorphic equality primitive of KitT. In principle, such a check could also be performed by the equality functions generated by KitE and KitEE. The knuth-bendix benchmark runs slightly slower under KitE than under KitT.

Not surprisingly, efficient representation of datatypes improves the running time of most programs – by up to 40 percent for the sieve benchmark. The Kit does not implement the minimum typing derivation technique for decreasing the degree of polymorphism [4]. Decreasing the degree of polymorphism has been reported to have a great effect on performance; it makes it possible to transform slow polymorphic equality tests into fast monomorphic ones [19,18]. Due to the decrease in polymorphism, the lifem benchmark is 12 percent faster than the life benchmark (under KitEE).
Polymorphic Equality – No Tags Required Running time fib35 sieve life lifem mergesort mandelbrot knuth-bendix simple
KitT 10.9 9.01 35.4 35.2 12.9 35.4 26.4 47.1
153
KitE KitEE SML/NJ 10.2 10.1 18.5 6.18 3.71 9.24 18.5 18.1 5.28 16.2 16.0 5.25 11.9 9.25 15.9 32.3 31.9 7.17 26.7 23.3 17.7 40.7 40.6 15.5
Fig. 4. Running times in seconds for code generated by three versions of the Kit and SML/NJ, measured using the UNIX time program.

Space usage for the different benchmarks is shown in Fig. 5. No benchmark program uses more space due to elimination of equality. For programs allocating a large amount of memory, equality elimination, and thus elimination of tags, reduces memory significantly – by up to 31 percent for the simple program. Efficient datatype representation reduces space usage further – by up to 33 percent for the mergesort program.

Space usage    KitT     KitE     KitEE    SML/NJ
fib35          108      108      108      1,380
sieve          1,248    1,052    736      6,180
life           428      376      272      1,408
lifem          428      376      272      1,420
mergesort      16,000   13,000   8,728    18,000
mandelbrot     304      296      296      712
knuth-bendix   4,280    3,620    2,568    2,724
simple         1,388    960      748      2,396
Fig. 5. Space used for code generated by the three versions of the Kit and SML/NJ. All numbers are in kilobytes and indicate maximum resident memory used, measured using the UNIX top program.
Sizes of executables for all benchmarks are shown in Fig. 6. Equality elimination does not seem to have a dramatic effect on the sizes of the executables. Efficient datatype representation reduces the sizes of executables by up to 22 percent, for the life benchmark.

The Kit and SML/NJ are two very different compilers, and there can be dramatic differences between using region inference and reference-tracing garbage collection; thus, the numbers presented here should be read with caution. The Kit currently only allows an argument to a function to be passed in one register. Moreover, the Kit does not allocate floating point numbers in registers; instead, floating point numbers are always boxed. The benchmark programs mandelbrot and simple use floating point operations extensively.
Program size   KitT   KitE   KitEE   SML/NJ
fib35          0      0      0       0
sieve          16     20     16      29
life           92     92     72      17
lifem          92     92     72      17
mergesort      20     24     20      40
mandelbrot     8      12     12      37
knuth-bendix   160    168    140     71
simple         356    352    328     199
Fig. 6. Sizes of executables (with the size of the empty program subtracted) for code generated by three versions of the Kit and SML/NJ. All numbers are in kilobytes.

No doubt, efficient calling conventions and register allocation of floating point numbers will improve the quality of the code generated by the Kit.
12 Conclusion
The translation suggested in this paper makes it possible to eliminate polymorphic equality completely in the front-end of a compiler. Experimental results show that equality elimination can lead to important space and time savings, even for programs that use polymorphic equality. Although tags may be needed at runtime to implement reference-tracing garbage collection, it is attractive to eliminate polymorphic equality at an early stage of compilation. Various optimisations, such as boxing analysis [9,7], must otherwise treat polymorphic equality distinctly from other primitive operations: checking two arbitrary values for equality may cause both values to be traversed to any depth, in contrast to how other polymorphic functions behave. Further, no special demands are placed on the implementor of the runtime system and the backend of the compiler. For instance, there is no need to flush values held in registers into the heap prior to testing two values for equality.

Acknowledgements. I would like to thank Lars Birkedal, Niels Hallenberg, Fritz Henglein, Tommy Højfeld Olesen, Peter Sestoft, and Mads Tofte for valuable comments and suggestions.
References

1. Andrew Appel. Compiling With Continuations. Cambridge University Press, 1992.
2. Andrew Appel. A critique of Standard ML. Journal of Functional Programming, 3(4):391–429, October 1993.
3. Lars Birkedal, Mads Tofte, and Magnus Vejlstrup. From region inference to von Neumann machines via region representation inference. In 23rd ACM Symposium on Principles of Programming Languages, January 1996.
4. Nikolaj Bjørner. Minimal typing derivations. In ACM Workshop on Standard ML and its Applications, June 1994.
5. Martin Elsman and Niels Hallenberg. An optimizing backend for the ML Kit using a stack of regions. Student Project, July 1995.
6. Robert Harper and Chris Stone. An interpretation of Standard ML in type theory. Technical Report CMU-CS-97-147, Carnegie Mellon University, June 1997.
7. Fritz Henglein and Jesper Jørgensen. Formally optimal boxing. In 21st ACM Symposium on Principles of Programming Languages, pages 213–226, January 1994.
8. Mark Jones. Dictionary-free overloading by partial evaluation. In ACM Workshop on Partial Evaluation and Semantics-Based Program Manipulation, Orlando, Florida, June 1994.
9. Xavier Leroy. Unboxed objects and polymorphic typing. In 19th ACM Symposium on Principles of Programming Languages, pages 177–188, 1992.
10. Xavier Leroy. The Objective Caml system. Software and documentation available on the Web, 1996.
11. Robin Milner, Mads Tofte, Robert Harper, and David MacQueen. The Definition of Standard ML (Revised). MIT Press, 1997.
12. Greg Morrisett. Compiling with Types. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, December 1995.
13. Martin Odersky, Philip Wadler, and Martin Wehr. A second look at overloading. In 7th International Conference on Functional Programming and Computer Architecture, June 1995.
14. Atsushi Ohori. A polymorphic record calculus and its compilation. ACM Transactions on Programming Languages and Systems, 17(6), November 1995.
15. Chris Okasaki. Purely Functional Data Structures. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, September 1996.
16. John Peterson and Mark Jones. Implementing type classes. In ACM Symposium on Programming Language Design and Implementation, June 1993.
17. Manuel Serrano and Pierre Weis. Bigloo: a portable and optimizing compiler for strict functional languages. In Second International Symposium on Static Analysis, pages 366–381, September 1995.
18. Zhong Shao. Typed common intermediate format. In 1997 USENIX Conference on Domain-Specific Languages, Santa Barbara, CA, October 1997.
19. Zhong Shao and Andrew Appel. A type-based compiler for Standard ML. Technical report, Yale University and Princeton University, November 1994.
20. David Tarditi, Greg Morrisett, Perry Cheng, Chris Stone, Robert Harper, and Peter Lee. TIL: A type-directed optimizing compiler for ML. In ACM Symposium on Programming Language Design and Implementation, 1996.
21. David Tarditi, Greg Morrisett, Perry Cheng, Chris Stone, Robert Harper, and Peter Lee. The TIL/ML compiler: Performance and safety through types. In Workshop on Compiler Support for Systems Software, 1996.
22. Mads Tofte. Type inference for polymorphic references. Information and Computation, 89(1), November 1990.
23. Mads Tofte, Lars Birkedal, Martin Elsman, Niels Hallenberg, Tommy Højfeld Olesen, Peter Sestoft, and Peter Bertelsen. Programming with regions in the ML Kit. Technical report, Department of Computer Science, University of Copenhagen, April 1997.
24. Mads Tofte and Jean-Pierre Talpin. Region-based memory management. Information and Computation, 132(2):109–176, 1997.
25. Philip Wadler and Stephen Blott. How to make ad-hoc polymorphism less ad hoc. In 16th ACM Symposium on Principles of Programming Languages, January 1989.
Optimal Type Lifting

Bratin Saha and Zhong Shao

Dept. of Computer Science, Yale University
New Haven, CT 06520-8285
{saha,shao}@cs.yale.edu
Abstract. Modern compilers for ML-like polymorphic languages have used explicit run-time type passing to support advanced optimizations such as intensional type analysis, representation analysis, and tagless garbage collection. Unfortunately, maintaining type information at run time can add a large overhead to the time and space usage of a program. In this paper, we present an optimal type-lifting algorithm that lifts all type applications in a program to the top level. Our algorithm eliminates all run-time type constructions within any core-language functions; in fact, it guarantees that the number of types built at run time is a static constant. We present our algorithm as a type-preserving source-to-source transformation and show how to extend it to handle the entire SML'97 language with higher-order modules.
1 Introduction
Modern compilers for ML-like polymorphic languages [16,17] usually use variants of the Girard-Reynolds polymorphic λ-calculus [5,26] as their intermediate language (IL). Implementation of these ILs often involves passing types explicitly as parameters [32,31,28] at runtime: each polymorphic type variable gets instantiated to the actual type through run-time type application. Maintaining type information in this manner helps to ensure the correctness of a compiler. More importantly, it also enables many interesting optimizations and applications. For example, both pretty-printing and debugging on polymorphic values require complete type information at runtime. Intensional type analysis [7,31,27], which is used by some compilers [31,28] to support efficient data representation, also requires the propagation of type information into the target code. Runtime type information is also crucial to the implementation of tag-less garbage collection [32], pickling, and type dynamic [15].
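To make this concrete, the following is a small, hypothetical SML sketch (ours, not from the paper) of how run-time type information supports a polymorphic printer: values are paired with an explicit representation of their type, and the printer dispatches on that representation, in the spirit of intensional type analysis.

(* hypothetical type and value representations; not the paper's encoding *)
datatype tyrep  = TInt | TPair of tyrep * tyrep
datatype uvalue = UInt of int | UPair of uvalue * uvalue

(* a "polymorphic" printer driven by the run-time type representation *)
fun show (TInt, UInt n) = Int.toString n
  | show (TPair (t1, t2), UPair (v1, v2)) =
      "(" ^ show (t1, v1) ^ ", " ^ show (t2, v2) ^ ")"
  | show _ = raise Fail "type/value mismatch"

(* e.g. show (TPair (TInt, TInt), UPair (UInt 1, UInt 2)) = "(1, 2)" *)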
This research was sponsored in part by the DARPA ITO under the title “Software Evolution using HOT Language Technology”, DARPA Order No. D888, issued under Contract No. F30602-96-2-0232, and in part by an NSF CAREER Award CCR-9501624 and NSF Grant CCR-9633390. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the U.S. Government.
However, the advantages of runtime type passing do not come for free. Depending on the sophistication of the type representation, run-time type passing can add a significant overhead to the time and space usage of a program. For example, Tolmach [32] implemented a tag-free garbage collector via explicit type passing; he reported that the memory allocated for type information sometimes exceeded the memory saved by the tag-free approach. Clearly, it is desirable to optimize the run-time type passing in polymorphic code [18]. In fact, a better goal would be to guarantee that explicit type passing never blows up the execution cost of a program. Consider the sample code below – we took some liberties with the syntax by using an explicitly typed variant of Core-ML. Here Λ denotes type abstraction, λ denotes value abstraction, x[α] denotes type application, and x(e) denotes term application.

pair = Λs.λx:s∗s.
         let f = Λt.λy:t. ... (x, y)
         in ... f[s∗s](x) ...
......
main = Λα.λa:α.
         let doit = λi:Int.
                      let elem = Array.sub[α∗α](a,i)
                      in ... pair[α](elem) ...
             loop = λn1:Int.λn2:Int.λg:Int→Unit.
                      if n1 <= n2 then (g(n1); loop(n1+1,n2,g)) else ()
         in loop(1,n,doit)
Here, f is a polymorphic function defined inside function pair; it refers to the parameter x of pair, so f cannot be easily lifted outside pair. Function main executes a loop: in each iteration, it selects an element elem of the array a and then performs some computation (i.e., pair) on it. Executing the function doit results in three type applications arising from the Array.sub function, pair, and f. In each iteration, sub and pair are applied to types α ∗ α and α respectively. A clever compiler may do loop-invariant removal [1] to avoid the repeated type construction (e.g., α ∗ α) and application (e.g., pair[α]). But optimizing type applications such as f[s∗s] is less obvious; f is nested inside pair, and its repeated type applications are not apparent in the doit function. We may type-specialize f to get rid of the type application, but in general this may lead to substantial code duplication. Every time doit is called, pair[α] gets executed, and then every time pair is called, f[s∗s] will be executed. Since loop calls doit repeatedly and each such call generates type applications of pair and f, we are forced to incur the overhead of repeated type construction and application. If the type representation is complicated, this is clearly expensive.
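For reference, here is a rough source-level SML rendering of the example (our approximation, not the paper's code); f is nested inside pair, so every call of doit re-enters pair and, in a type-passing implementation, re-performs the type applications discussed above.

fun pair (x : 'a * 'a) =
  let fun f y = (x, y)     (* polymorphic helper, closed over x *)
  in f x end

fun main (a : ('a * 'a) array, n : int) =
  let fun doit i = ignore (pair (Array.sub (a, i)))
      fun loop (n1, n2, g) =
        if n1 <= n2 then (g n1; loop (n1 + 1, n2, g)) else ()
  in loop (1, n, doit) end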
In this paper, we present an algorithm that minimizes the cost of run-time type passing. More specifically, the optimization eliminates all type applications inside any core-language function: it guarantees that the amount of type information constructed at runtime is a static constant. This guarantee is important because it allows us to use more sophisticated representations for run-time types without having to worry about the run-time cost of doing so.

The basic idea is as follows. We lift all polymorphic function definitions and type applications in a program to the “top” level. By top level, we mean “outside any core-language function.” Intuitively, no type application is nested inside any function abstraction (λ); they are nested only inside type abstractions (Λ). All type applications are now performed once and for all at the beginning of execution of each compilation unit. In essence, the code after our type lifting would perform all of its type applications at “link” time.1 In fact, the number of type applications performed and the amount of type information constructed can be determined statically.

This leads us to a natural question: why do we restrict the transformation to type applications alone? Obviously the transformation could be carried out on value computations as well, but what makes type computations more amenable to this transformation is the guarantee that all type applications can be lifted to the top level. Moreover, while the transformation is also intended to increase runtime efficiency, a more important goal is to ensure that type passing in itself is not costly. This in turn will allow us to use a more sophisticated runtime type representation and make greater use of type information at runtime.

We describe the algorithm in later sections and also prove that it is both type-preserving and semantically sound. We have implemented it in the FLINT/ML compiler [28] and tested it on a few benchmarks. We provide the implementation results at the end of this paper.

1 We are not referring to “link time” in the traditional sense. Rather, we are referring to the run time spent on module initialization and module linkage (e.g., functor application) in an ML-style module language.
2 The Lifting Algorithm for Core-ML
This section presents our optimal type lifting algorithm. We use an explicitly typed variant of the Core-ML calculus [6] (Figure 1) as the source and target languages. The type lifting algorithm (Figure 2) is expressed as a type-directed program transformation that lifts all type applications to the top level.

2.1 The Language
We use an explicitly typed variant of the Core-ML calculus [6] as our source and target languages. The syntax is shown in Figure 1. The static and dynamic semantics are standard, and are given in the Appendix (Section 7). Here, terms e consist of identifiers (x), integer constants (i), function abstractions, function applications, and let expressions. We differentiate between monomorphic and polymorphic let expressions in our language. We use ti (and µi) to denote a sequence of type variables t1, ..., tn (and types), so ∀ti. µ is equivalent to ∀t1. . . . ∀tn.µ. The vterms (ev) denote values – terms that are free of side effects.

(con's)  µ  ::= t | Int | µ1 → µ2
(types)  σ  ::= µ | ∀ti. µ
(terms)  e  ::= i | x | λx : µ.e | @x1 x2 | let x = e in e | let x = Λti. ev in e | x[µi]
(vterms) ev ::= i | x | λx : µ.e | let x = ev in ev | let x = Λti. ev in ev | x[µi]

Fig. 1. An explicit Core-ML calculus

There are several aspects of this calculus that are worth noting. First, we restrict polymorphic definitions to value expressions (ev) only, so that moving type applications and polymorphic definitions is semantically sound [33]. Variables introduced by normal λ-abstraction are always monomorphic, and polymorphic functions are introduced only by the let construct. In our calculus, type applications of polymorphic functions are never curried; therefore, in the algorithm in Figure 2, the exp rule assumes that the variable is monomorphic. The tapp rule also assumes that the type application is not curried, so the newly introduced variable v (bound to the lifted type application) is monomorphic and is not applied further to types. Finally, following SML [17,16], polymorphic functions are not recursive.2 This restriction is crucial to proving that all type applications can be lifted to the top level.

Throughout the paper we take a few liberties with the syntax: we allow ourselves infix operators, multiple definitions in a single let expression to abbreviate a sequence of nested let expressions, and term applications that are at times not in A-normal form [4]. We also use indentation to indicate nesting.

2 Our current calculus does not support recursive functions, but they can be easily added. As in SML, recursive functions are always monomorphic.
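In SML'97 terms, the value restriction that underlies this design can be seen as follows (our illustration, not the paper's):

val ok = fn x => x          (* a syntactic value: generalized to 'a -> 'a *)

(* val bad = (print "effect"; fn x => x) *)
(* The right-hand side above is not a syntactic value, so SML'97 does not
   generalize it; consequently, a transformation that hoists polymorphic
   bindings to the top level can never duplicate or reorder a side effect. *)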
2.2 Informal Description
Before we move on to the formal description of the algorithm, we will present the basic ideas informally. Define the depth of a term in a program as the number of λ (value) abstractions within which it is nested. Consider the terms outside all value abstractions to be at depth zero. Obviously, terms at depth zero occur outside all loops in the program. In a strict language like ML, all these terms are evaluated once and for all at the beginning of program execution. To avoid repeated type applications, the algorithm therefore tries to lift all of them to depth zero. But since we want to lift type applications, we must also lift the polymorphic functions to depth zero. The algorithm scans the input program, collects all the type applications and polymorphic functions occurring at depths greater than zero, and adds them to a list H. (In the algorithm given in Figure 2, the depth is implicitly assumed to be greater than zero.) When the algorithm returns to the top level of the program, it dumps the expressions contained in the list.
We will illustrate the algorithm on the sample code given in Section 1. In the example code, f[s∗s] is at depth 1 since it occurs inside the λx; Array.sub[α∗α] and pair[α] are at depth 2 since they occur inside the λa and λi. We want to lift all of these type applications to depth zero. Translating main first, the resulting code becomes –
pair = Λs.λx:s∗s.
         let f = Λt.λy:t. ... (x, y)
         in ... f[s∗s](x) ...
......
main = Λα.
         let v1 = Array.sub[α∗α]
             v2 = pair[α]
         in λa:α.
              let doit = λi:Int.
                           let elem = v1(a,i)
                           in ... v2(elem) ...
                  loop = λn1:Int.λn2:Int.λg:Int→Unit.
                           if n1 <= n2 then (g(n1); loop(n1+1,n2,g)) else ()
              in loop(1,n,doit)
We then lift the type application of f (inside pair). This requires us to lift f's definition by abstracting over its free variables. In the resulting code, all type applications occur at depth zero. Therefore when main is called at the beginning of execution, v1, v2 and v3 get evaluated. During execution, when the function loop runs through the array and repeatedly calls function doit, none of the type applications need to be performed – the type specialised functions v1, v2 and v3 can be used instead.
pair = Λs.
         let f = Λt.λx:s∗s.λy:t. ... (x, y)
             v3 = f[s∗s]
         in λx:s∗s. ... (v3(x))(x) ...
......
main = Λα.
         let v1 = Array.sub[α∗α]
             v2 = pair[α]
         in λa:α.
              let doit = λi:Int.
                           let elem = v1(a,i)
                           in ... v2(elem) ...
                  loop = λn1:Int.λn2:Int.λg:Int→Unit.
                           if n1 <= n2 then (g(n1); loop(n1+1,n2,g)) else ()
              in loop(1,n,doit)
2.3 Formalization
Figure 2 shows the type-directed lifting algorithm. The translation is defined as a relation of the form Γ ⊢ e : µ ⇒ e′; H; F, which carries the meaning that Γ ⊢ e : µ is a derivable typing in the input program, the translation of the input term e is the term e′, and F is the set of free variables of e′. (The set F is restricted to the monomorphically typed free variables of e′.) The header H contains the polymorphic functions and type applications occurring in e at depths greater than zero. The final result of lifting a closed term e of type µ is LET(H, e′), where the algorithm infers ∅ ⊢ e : µ ⇒ e′; H; ∅. The function LET(H, e) expands a list of bindings H = [⟨x1, e1⟩, . . . , ⟨xn, en⟩] and a term e into the resulting term let x1 = e1 in . . . let xn = en in e.

The environment Γ maps a variable to its type and to a list of the free variables in its definition. In the algorithm, we use standard notation for lists and operations on lists; in addition, the functions List and Set convert between lists and sets of variables using a canonical ordering. The functions λ∗ and @∗ are defined so that λ∗L.e and @∗vL reduce to λx1 : µ1. . . . λxn : µn.e and @(. . . (@v x1) . . .) xn, respectively, where L = [x1 : µ1, . . . , xn : µn].

Rules (exp) and (app) are just the identity transformations. Rule (fn) deals with abstractions: we translate the body of the abstraction and return a header H containing all the type applications and type functions in the term e. The translation of monomorphic let expressions is similar. We translate each of the subexpressions, replacing the old terms with the translated terms, and return this as the result of the translation.
(exp)    Γ(x) = (µ, _)
         ─────────────────────────────────────
         Γ ⊢ x : µ ⇒ x; ∅; {x : µ}

         Γ ⊢ i : Int ⇒ i; ∅; ∅

(app)    Γ(x1) = (µ1 → µ2, _)    Γ(x2) = (µ1, _)
         ─────────────────────────────────────
         Γ ⊢ @x1 x2 : µ2 ⇒ @x1 x2; ∅; {x1 : µ1 → µ2, x2 : µ1}

(fn)     Γ[x → (µ, _)] ⊢ e : µ′ ⇒ e′; H; F
         ─────────────────────────────────────
         Γ ⊢ λx : µ.e : µ → µ′ ⇒ λx : µ.e′; H; F \ {x : µ}

(let)    Γ ⊢ e1 : µ1 ⇒ e1′; H1; F1    Γ[x → (µ1, _)] ⊢ e2 : µ2 ⇒ e2′; H2; F2
         ─────────────────────────────────────
         Γ ⊢ let x = e1 in e2 : µ2 ⇒ let x = e1′ in e2′; H1||H2; F1 ∪ (F2 \ {x : µ1})

(tfn)    Γ ⊢ e1 : µ1 ⇒ e1′; H1; F1    L = List(F1)
         Γ[x → (∀ti.µ1, L)] ⊢ e2 : µ2 ⇒ e2′; H2; F2
         ─────────────────────────────────────
         Γ ⊢ let x = Λti.e1 in e2 : µ2 ⇒ e2′; Hr; F2
                  where Hr = ⟨x, Λti. LET(H1, λ∗L.e1′)⟩ :: H2

(tapp)   Γ(x) = (∀ti.µ, L)    v a fresh variable
         ─────────────────────────────────────
         Γ ⊢ x[µi] : [µi/ti]µ ⇒ @∗vL; Hr; Set(L)
                  where Hr = [⟨v, x[µi]⟩]

Fig. 2. The Lifting Translation
The header H of the translation is the concatenation of the headers H1 and H2 from the translation of the subexpressions.

The real work is done in the last two rules, which deal with type expressions. In rule (tfn), we first translate the body of the polymorphic function definition. H1 now contains all the type expressions that were in e1, and F1 is the set of free variables of e1′. We then translate the body of the let expression (e2). The result of the translation is only e2′; the polymorphic function introduced by the let is added to the result header Hr so that it is lifted to the top level. The polymorphic function body (in Hr) is closed by abstracting over its free variables F1, while the header H1 is dumped right after the type abstractions. Note that since Hr will be lifted to the top level, the expressions in H1 will also get lifted to the top level.

The (tapp) rule replaces the type application by an application of the newly introduced variable (v) to the free variables (L) of the corresponding function definition. The type application is added to the header and lifted to the top level, where it gets bound to v. Note that the free variables of the translated term do not include the newly introduced variable v. This is because when the header is written out at the top level, the translated expression remains in the scope of the dumped header.
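A minimal sketch of the bookkeeping behind LET(H, e) and the (tapp) rule, assuming a first-order term representation; all datatype and function names here are ours, not the paper's implementation:

datatype ty = TyVar of string | TyInt | TyArrow of ty * ty

datatype exp
  = Var of string
  | Num of int
  | Lam of string * ty * exp
  | App of exp * exp
  | Let of string * exp * exp
  | TLam of string list * exp            (* Λti. e  *)
  | TApp of string * ty list             (* x[µi]   *)

(* a header is a list of bindings H = [<x1,e1>, ..., <xn,en>] *)
type header = (string * exp) list

(* LET(H, e) = let x1 = e1 in ... let xn = en in e *)
fun expandLET (h : header, e : exp) : exp =
  List.foldr (fn ((x, ex), body) => Let (x, ex, body)) e h

(* the (tapp) case: x[µi] is replaced by @* v L, and <v, x[µi]> is
   emitted into the header, to be bound at the top level *)
fun liftTApp (fresh : unit -> string)
             (x : string, tys : ty list, l : string list) =
  let val v = fresh ()
      val call = List.foldl (fn (y, f) => App (f, Var y)) (Var v) l
  in (call, [(v, TApp (x, tys))] : header) end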
Proposition 1. Suppose Γ ⊢ e : µ ⇒ e′; H; F. Then in the expression LET(H, e′), the term e′ does not contain any type application, and H does not contain any type application nested inside a value (λ) abstraction.

This proposition can be proved by a simple structural induction on the structure of the source term e.

Theorem 1 (Full Lifting). Suppose Γ ⊢ e : µ ⇒ e′; H; F. Then the expression LET(H, e′) does not have any type application nested inside a value abstraction.

The theorem follows from Proposition 1. In the Appendix, we prove the type preservation and the semantic soundness theorems.
2.4 A Closer Look
There are two transformations taking place simultaneously: the lifting of type applications and the lifting of polymorphic function definitions. At first glance, the lifting of function definitions may seem similar to lambda lifting [10]. However, the lifting in the two cases is different: lambda lifting converts a program with local function definitions into a program with global function definitions, whereas the lifting shown here preserves the nesting structure of the program.

The lifting of type applications is similar in spirit to the hoisting of loop-invariant expressions outside a loop. It could be considered a special case of a fully lazy transformation [9,24] with the maximal free subexpressions restricted to be type applications. However, the fully-lazy transformation as described in Peyton Jones [24] will not lift all type applications to the top level. Specifically, type applications of a polymorphic function that is defined inside other functions will not be lifted to the top level.

Minamide [18] uses a different approach to solve this problem. He lifts the construction of type parameters from within a polymorphic function to the call sites of the function. This lifting is recursively propagated to the call sites at the top level. At runtime, type construction is replaced by projection from type parameters. His method eliminates the runtime construction of types and replaces it by projection from type records. The transformation also does not rely on the value restriction for polymorphic definitions. However, he requires a more sophisticated type system to type-check his transformation: he uses a type system based on the qualified type system of Jones [12] and the implementation calculus for the compilation of polymorphic records of Ohori [21]. Our algorithm, on the other hand, is a source-to-source transformation. Moreover, Minamide's algorithm deals only with the Core-ML calculus, whereas we have implemented our algorithm on the entire SML'97 language with higher-order modules.

Jones [11] has also worked on a similar problem related to dictionary passing in Haskell and Gofer. Type classes in these languages are implemented by passing
dictionaries at runtime. Dictionaries are tuples of functions that implement the methods defined in a type class. Consider the following Haskell [8] example:

f :: Eq a => a -> a -> Bool
f x y = ([x] == [y]) && ([y] == [x])
The actual type of f is Eq [a] ⇒ a → a → Bool; context reduction leads to the type specified in the example. Here [a] means a list of elements of type a, and Eq a means that the type a must be an instance of the equality class. In a naive implementation, this function would be passed a dictionary for Eq a, and the dictionary for Eq [a] would be constructed inside the function. Jones optimises this by constructing a dictionary for Eq [a] at the call site of f and passing it in as a parameter. This is repeated for all overloaded functions so that all dictionaries are constructed statically. But this approach does not work with separately compiled modules, since f's type in other modules does not specify the dictionaries that are constructed inside it.

In Gofer [11], instance declarations are not used to simplify the context. Therefore the type of f in the above example would be Eq [a] ⇒ a → a → Bool. Jones' optimisation of dictionary passing can now be performed in the presence of separately compiled modules. However, we now require a more complicated type system to typecheck the code. Assume two functions f and g have the same type (µ → µ′). Both f and g can be passed as a parameter to h in (h = λx : µ → µ′.e). However, f and g could, in general, be using different dictionaries (df and dg). This implies that after the transformation, the two functions will have different types – df ⇒ µ → µ′ and dg ⇒ µ → µ′. Therefore, we can no longer use f and g interchangeably.
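The same situation can be rendered in SML with explicit dictionaries (our encoding, for illustration only):

type 'a eq = 'a * 'a -> bool

(* the dictionary constructor corresponding to the Eq [a] instance *)
fun eqList (eqa : 'a eq) : 'a list eq =
  fn (xs, ys) => length xs = length ys andalso ListPair.allEq eqa (xs, ys)

(* naive translation: the list dictionary is rebuilt inside f on every call *)
fun fNaive (eqa : 'a eq) (x : 'a, y : 'a) =
  let val eql = eqList eqa
  in eql ([x], [y]) andalso eql ([y], [x]) end

(* Jones's optimisation: the dictionary is built once, at the call site *)
fun fOpt (eql : 'a list eq) (x : 'a, y : 'a) =
  eql ([x], [y]) andalso eql ([y], [x])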
3 The Lifting Algorithm for FLINT
Till now, we have considered only the Core-ML calculus while discussing the algorithm. But what happens when we take the module language into account as well? To handle the full ML language, we compile the source code into the FLINT intermediate language. The details of the translation are given in [29]. FLINT is based upon a predicative variant of the Girard-Reynolds polymorphic λ-calculus [5,26], with the term language written in A-normal form [4]. It contains the following four syntactic classes: kinds (κ), constructors (µ), types (σ), and terms (e), as shown in Figure 3. Here, kinds classify constructors, and types classify terms. Constructors of kind Ω name monotypes. The monotypes are generated from variables, from Int, and through the → constructor. The application and abstraction constructors correspond to the function kind κ1 → κ2. Types in Core-FLINT include the monotypes, and are closed under function spaces and polymorphic quantification. We use T(µ) to denote the type corresponding to the constructor µ (when µ is of kind Ω). The terms are an explicitly
typed λ-calculus (but in A-normal form) with explicit constructor abstraction and application.

(kinds)  κ  ::= Ω | κ1 → κ2
(cons)   µ  ::= t | Int | µ1 → µ2 | λt :: κ.µ | µ1[µ2]
(types)  σ  ::= T(µ) | σ1 → σ2 | ∀t :: κ.σ
(terms)  e  ::= i | x | let x = e1 in e2 | @x1 x2 | λc x : T(µ).e | λm x : σ.e | let x = Λti :: κi.ev in e2 | x[µi]
(values) ev ::= i | x | let x = ev in ev | λc x : T(µ).e | λm x : σ.e | let x = Λti :: κi.ev in ev | x[µi]

Fig. 3. Syntax of the Core-FLINT calculus
In ML, structures are the basic module unit and functors abstract over structures. Polymorphic functions may now escape as part of structures and get initialized later at a functor application site. In the FLINT translation [29], functors are represented as a polymorphic definition combined with a polymorphic abstraction (fct = Λti :: κi.λm x : σ.e). The variable x in the functor definition is polymorphic, since the parameterised structure may contain polymorphic components. In the functor body e, the polymorphic components of x are instantiated by type application. Functor applications are a combination of a type application and a term application, with the type application instantiating the type parameters (the ti's). Though abstractions model both functors and functions, the translation allows us to distinguish between them: in the FLINT calculus, λc x : T(µ).e denotes functions, whereas λm x : σ.e denotes functors. The rest of the term calculus is standard.

This calculus complicates the lifting, since type applications arising from an abstracted variable (the variable x in fct above) cannot be lifted to the top level. It also differs from the Core-ML calculus in that type applications may now be curried to model escaping polymorphic functions. However, the module calculus obeys some nice properties. Functors in a program always occur outside any Core-ML functions. Type applications arising out of functor parameters (when the input structure contains a polymorphic component) can therefore be lifted outside all functions. Escaping polymorphic functions occur outside Core-ML functions, so the corresponding curried type application is not nested inside Core-ML functions. A FLINT source program can therefore be converted into a well-formed program satisfying the following constraints –
– All functor abstractions (λm) occur outside function abstractions (λc).
– No partial type application occurs inside a function abstraction.

We now redefine the depth of a term in a program as the number of function abstractions within which it is nested, with depth 0 terms occurring outside all
function abstractions only. Note that depth 0 terms may not occur outside all abstractions, since they may be nested inside functor abstractions. We then perform type lifting as in Figure 2 for terms at depth greater than zero and lift the polymorphic definitions and type applications to depth 0. For terms already at depth zero, the translation is just the identity function and the header returned is empty.

We illustrate the algorithm on the example code in Figure 4. The syntax is not totally faithful to the FLINT syntax in Figure 3, but it makes the code easier to understand. In the code in Figure 4, F is a functor which takes the structure X as a parameter. The type S denotes a structure type. Assume the first component of X (#1(X)) is a polymorphic function which gets instantiated in the functor body (v2). f is a locally defined function in the functor body.

F = Λt0.λm X:S.
      f = λc v.
            let id = Λt1.λc x2.x2
                .....
                v1 = ... id[Int](3) ....
            in v1
      v2 = (#1(X))[t0]
      ....

Fig. 4. Example FLINT code

According to the definition of depth above, f and v2 are at depth 0 even though they are nested inside the functor abstraction (λX). Moreover, the type application (#1(X))[t0] is also at depth 0 and will therefore not be lifted. It is only inside the function f that the depth increases, which implies that the type application id[Int] occurs at d > 0. The algorithm will lift this type application to just outside the function abstraction (λv); it is not lifted outside the functor abstraction (λX). The resulting code is shown in Figure 5.

Is the reformulation merely an artifice to get around the problems posed by FLINT? No, the main aim of the type lifting transformation is to perform all the type applications during “link” time (when the top-level code is being executed) and eliminate runtime type construction inside functions. Functors are top-level code and are applied at “link” time. Moreover, they are non-recursive. Therefore having type applications nested only inside functors results in the type applications being performed once and for all at the beginning of program execution. As a result, we still eliminate runtime type passing inside functions.

To summarize, we note that depth 0 in Core-ML (according to the definition above) coincides with the top level of the program, since Core-ML does not have functors; the Core-ML translation is therefore merely a special case of the translation for FLINT.
F = Λt0.λm X:S.
      f = let id = Λt1.λc x2.x2
              z1 = id[Int]
              .. (other type expressions in f's body) ..
          in λc v.
               let ..... (type-lifted body of f)
                   v1 = ... z1(3) ....
               in v1
      v2 = (#1(X))[t0]
      ....

Fig. 5. FLINT code after type lifting
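For orientation, the module-level situation corresponds roughly to the following source-level SML (our approximation; SML functors carry no explicit type parameters, so t0 remains implicit, and the structure component poly plays the role of #1(X)):

signature S = sig val poly : 'a -> 'a end

functor F (X : S) =
struct
  fun f v =
    let fun id x = x      (* polymorphic, nested inside the function f *)
        val v1 = id 3     (* the instantiation id[Int] is what gets lifted *)
    in v1 + v end
  val v2 = X.poly 0       (* instantiating X's polymorphic component: depth 0 *)
end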
4 Implementation
We have implemented the type-lifting algorithm in the FLINT/ML compiler version 1.0 and the experimental version of SML/NJ v109.32. All the tests were performed on a Pentium Pro 200 Linux workstation with 64M physical RAM. Figure 6 shows CPU times for executing the Standard ML benchmark suite with type lifting turned on and turned off. The third column (New Time) indicates the execution time with lifting turned on and the next column (Old Time) indicates the execution time with lifting turned off. The last column gives the ratio of the new time to the old time.

Benchmark   Description                     New Time  Old Time  Ratio
Simple      A fluid-dynamics program            7.04      9.78   0.72
Vliw        A VLIW instruction scheduler        4.22      4.31   0.98
lexgen      lexical-analyzer generator          2.38      2.36   1.01
ML-Yacc     The ML-yacc                         1.05      1.11   0.95
Mandelbrot  Mandelbrot curve construction       4.62      4.62   1.0
Kb-comp     Knuth-Bendix Algorithm              2.98      3.11   0.96
Ray         A ray-tracer                       10.68     10.66   1.01
Life        The Life Simulation                 2.80      2.80   1.0
Boyer       A simple theorem prover             0.49      0.52   0.96

Fig. 6. Type Lifting Results
The current FLINT/ML and SML/NJ compilers maintain a very minimal set of type information. Types are represented by integers since the compiler only
needs to distinguish primitive types (e.g., int, real) and special record types. As a result, runtime type construction and type application are not expensive. The test results therefore yield a moderate speedup for most of the benchmarks and a good speedup for one benchmark: an average of about 5% for the polymorphic benchmarks. Simple has a lot of polymorphic function calls occurring inside loops and therefore benefits greatly from lifting. Boyer and mandelbrot are monomorphic benchmarks (involving large lists) and predictably do not benefit from the optimization.

Our algorithm makes the simultaneous uncurrying of both value and type applications difficult. Therefore at runtime, a type application will result in the formation of a closure. However, these closures are created only once at link time and do not represent a significant penalty.

We also need to consider the closure size of the lifted functions. The (tapp) rule in Figure 2 introduces new variables (the set L) which may increase the number of free variables of a function. Moreover, after type applications are lifted, the type-specialised functions become free variables of the function body. On the other hand, since all type applications are lifted, we no longer need to include the free type variables in the closure, which decreases the closure size. We believe therefore that the increase in closure size, if any, does not incur a significant penalty. This is borne out by the results on the benchmark suite – none of the benchmarks slows down significantly.

The creation of closures makes function application more expensive, since it involves the extraction of the environment and the code. However, in most cases, the selection of the code and the environment will be a loop invariant and can therefore be optimised.

The algorithm is implemented in a single pass by a bottom-up traversal of the syntax tree. The (tfn) rule shown in Figure 2 simplifies the implementation considerably by reducing the type information to be adjusted. In the given rule, all the expressions in H1 are dumped right in front of the type abstraction. Note, however, that we need to dump only those terms (in H1) which contain any of the ti's as free type variables. The advantage of dumping all the expressions is that the de Bruijn depth of the terms in H1 remains the same even after lifting. The algorithm needs to adjust the type information only while abstracting the free variables of a polymorphic definition. (The types of the abstracted variables have to be adjusted.) The implementation also optimises the number of variables abstracted while lifting a definition – it remembers the depth at which a variable is defined so that variables that will still remain in scope after the lifting are not abstracted.
5 Related Work and Conclusions
Tolmach [32] has worked on a similar problem and proposed a method based on lazy substitution on types. He used the method in the implementation of a tag-free garbage collector. Minamide [18] proposes a refinement of Tolmach's method to eliminate the runtime construction of type parameters. The speedups
obtained in our method are comparable to the ones reported in his paper. Mark P. Jones [11] has worked on the related problem of optimising dictionary passing in the implementation of type classes.

In their study of the type theory of Standard ML, Harper and Mitchell [6] argued that an explicitly typed interpretation of ML polymorphism has better semantic properties and scales more easily to cover the full language. The idea of passing types to polymorphic functions is exploited by Morrison et al. [19] in the implementation of Napier. The work of Ohori on compiling record operations [21] is similarly based on a type-passing interpretation of polymorphism. Jones [12] has proposed evidence passing, a general framework for passing data derived from types to “qualified” polymorphic operations. Harper and Morrisett [7] proposed an alternative approach for compiling polymorphism, where types are passed as arguments to polymorphic routines in order to determine the representation of an object. The boxing interpretation of polymorphism, which applies the appropriate coercions based on the type of an object, was studied by Leroy [14] and Shao [27]. Many modern compilers, like the FLINT/ML compiler [28], TIL [31], and the Glasgow Haskell compiler [22], use an explicitly typed language as the intermediate language for compilation.

Lambda lifting and full laziness are part of the folklore of functional programming. Hughes [9] showed that by doing lambda lifting in a particular way, full laziness can be preserved. Johnsson [10] describes different forms of lambda lifting and the pros and cons of each. Peyton Jones [25,23,24] also described a number of optimizations which are similar in spirit but have totally different aims. Appel [2] describes let hoisting in the context of ML. In general, using correctness-preserving transformations as a compiler optimization [1,2] is a well-established technique and has received quite a bit of attention in the functional programming area.

We have proposed a method for minimizing the cost of runtime type passing. Our algorithm lifts all type applications out of functions and therefore eliminates the runtime construction of types inside functions. The amount of type information constructed at run time is a static constant. We can guarantee that in Core-ML programs, all type applications will be lifted to the top level. We are now working on making the type representation in FLINT more comprehensive so that we can maintain complete type information at runtime.
6 Acknowledgements
We would like to thank Valery Trifonov, Chris League, and Stefan Monnier for many useful discussions and comments about earlier drafts of this paper. We also thank the anonymous referees who suggested various ways of improving the presentation.
References

1. A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, MA, 1986.
2. A. W. Appel. Compiling with Continuations. Cambridge University Press, 1992.
3. N. de Bruijn. A survey of the project AUTOMATH. In To H. B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism, pages 579–606. Edited by J. P. Seldin and J. R. Hindley, Academic Press, 1980.
4. C. Flanagan, A. Sabry, B. F. Duba, and M. Felleisen. The essence of compiling with continuations. In Proc. ACM SIGPLAN '93 Conf. on Prog. Lang. Design and Implementation, pages 237–247, New York, June 1993. ACM Press.
5. J. Y. Girard. Interpretation Fonctionnelle et Elimination des Coupures dans l'Arithmetique d'Ordre Superieur. PhD thesis, University of Paris VII, 1972.
6. R. Harper and J. C. Mitchell. On the type structure of Standard ML. ACM Trans. Prog. Lang. Syst., 15(2):211–252, April 1993.
7. R. Harper and G. Morrisett. Compiling polymorphism using intensional type analysis. In Twenty-second Annual ACM Symp. on Principles of Prog. Languages, pages 130–141, New York, Jan 1995. ACM Press.
8. P. Hudak, S. P. Jones, P. Wadler, et al. Report on the programming language Haskell, a non-strict, purely functional language, version 1.2. SIGPLAN Notices, 21(5), May 1992.
9. R. Hughes. The design and implementation of programming languages. PhD thesis, Programming Research Group, Oxford University, Oxford, UK, 1983.
10. T. Johnsson. Lambda lifting: Transforming programs to recursive equations. In The Second International Conference on Functional Programming Languages and Computer Architecture, pages 190–203, New York, September 1985. Springer-Verlag.
11. M. P. Jones. Qualified Types: Theory and Practice. PhD thesis, Oxford University Computing Laboratory, Oxford, July 1992. Technical Monograph PRG-106.
12. M. P. Jones. A theory of qualified types. In The 4th European Symposium on Programming, pages 287–306, Berlin, February 1992. Springer-Verlag.
13. M. P. Jones. Dictionary-free overloading by partial evaluation. In Proceedings of the ACM SIGPLAN Workshop on Partial Evaluation and Semantics-Based Program Manipulation, pages 107–117. University of Melbourne TR 94/9, June 1994.
14. X. Leroy. Unboxed objects and polymorphic typing. In Nineteenth Annual ACM Symp. on Principles of Prog. Languages, pages 177–188, New York, Jan 1992. ACM Press. Longer version available as an INRIA Tech Report.
15. X. Leroy and M. Mauny. Dynamics in ML. In The Fifth International Conference on Functional Programming Languages and Computer Architecture, pages 406–426, New York, August 1991. Springer-Verlag.
16. R. Milner, M. Tofte, and R. Harper. The Definition of Standard ML. MIT Press, Cambridge, Massachusetts, 1990.
17. R. Milner, M. Tofte, R. Harper, and D. MacQueen. The Definition of Standard ML (Revised). MIT Press, Cambridge, Massachusetts, 1997.
18. Y. Minamide. Full lifting of type parameters. Technical report, RIMS, Kyoto University, 1997.
19. R. Morrison, A. Dearle, R. C. H. Connor, and A. L. Brown. An ad hoc approach to the implementation of polymorphism. ACM Trans. Prog. Lang. Syst., 13(3), July 1991.
20. G. Nadathur. A notation for lambda terms II: Refinements and applications. Technical Report CS-1994-01, Duke University, Durham, NC, January 1994.
21. A. Ohori. A compilation method for ML-style polymorphic record calculi. In Nineteenth Annual ACM Symp. on Principles of Prog. Languages, New York, Jan 1992. ACM Press.
22. S. Peyton Jones. Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine. Journal of Functional Programming, 2(2):127–202, April 1992.
23. S. Peyton Jones. Compiling Haskell by program transformation: a report from the trenches. In Proceedings of the European Symposium on Programming, Linkoping, April 1996.
24. S. Peyton Jones and D. Lester. A modular fully-lazy lambda lifter in Haskell. Software – Practice and Experience, 21:479–506, 1991.
25. S. Peyton Jones, W. Partain, and A. Santos. Let-floating: moving bindings to give faster programs. In Proc. International Conference on Functional Programming (ICFP'96), New York, June 1996. ACM Press.
26. J. C. Reynolds. Towards a theory of type structure. In Proceedings, Colloque sur la Programmation, Lecture Notes in Computer Science, volume 19, pages 408–425. Springer-Verlag, Berlin, 1974.
27. Z. Shao. Flexible representation analysis. In Proc. 1997 ACM SIGPLAN International Conference on Functional Programming (ICFP'97), pages 85–98. ACM Press, June 1997.
28. Z. Shao. An overview of the FLINT/ML compiler. In Proc. 1997 ACM SIGPLAN Workshop on Types in Compilation, June 1997.
29. Z. Shao. Typed cross-module compilation. Technical Report YALEU/DCS/RR-1126, Dept. of Computer Science, Yale University, New Haven, CT, November 1997.
30. Z. Shao and A. W. Appel. A type-based compiler for Standard ML. In Proc. ACM SIGPLAN '95 Conf. on Prog. Lang. Design and Implementation, pages 116–129. ACM Press, 1995.
31. D. Tarditi, G. Morrisett, P. Cheng, C. Stone, R. Harper, and P. Lee. TIL: A type-directed optimizing compiler for ML. In Proc. ACM SIGPLAN '96 Conf. on Prog. Lang. Design and Implementation, pages 181–192. ACM Press, 1996.
32. A. Tolmach. Tag-free garbage collection using explicit type parameters. In Proc. 1994 ACM Conf. on Lisp and Functional Programming, pages 1–11, New York, June 1994. ACM Press.
33. A. K. Wright. Polymorphism for imperative languages without imperative types. Technical Report TR 93-200, Dept. of Computer Science, Rice University, Houston, Texas, February 1993.
7 Appendix
In this section, we give the proofs of the type preservation theorem and the semantic soundness theorem. Figure 7 gives the typing rules. Figure 8 gives a slightly modified version of the translation algorithm. The type environment Γm binds monomorphic variables while the environment Γp binds polymorphic variables.

(const/var)  Γ ⊢ i : Int        Γ ⊢ x : Γ(x)

(fn)         Γ{x : µ1} ⊢ e : µ2
             ─────────────────────────────
             Γ ⊢ λx : µ1.e : µ1 → µ2

(app)        Γ ⊢ x1 : µ → µ′    Γ ⊢ x2 : µ
             ─────────────────────────────
             Γ ⊢ @x1 x2 : µ′

(let)        Γ ⊢ e1 : µ1    Γ{x : µ1} ⊢ e2 : µ2
             ─────────────────────────────
             Γ ⊢ let x = e1 in e2 : µ2

(tfn)        Γ ⊢ ev : µ1    Γ{x : ∀ti.µ1} ⊢ e : µ2
             ─────────────────────────────
             Γ ⊢ let x = Λti.ev in e : µ2

(tapp)       Γ ⊢ x : ∀ti.µ
             ─────────────────────────────
             Γ ⊢ x[µi] : [µi/ti]µ

Fig. 7. Static Semantics

Notation 1 (λ∗F.e and @∗zF). We use λ∗F.e and @∗zF to denote repeated abstractions and applications, respectively. If F = {x1, ..., xn}, then λ∗F.e reduces to λx1 : µ1.(...(λxn : µn.e)..) where µ1, ..., µn are the types of x1, ..., xn in Γm. Similarly, @∗zF reduces to @(..(@z x1)..)xn.

Notation 2 (T(L)). If L is a set of variables, then T(L) refers to the types of the variables in L in the environment Γm. If L = {x1, x2, ..., xn} and the types of the variables are respectively µ1, ..., µn, then T(L) → τ is shorthand for µ1 → (... → (µn → τ)..).

Throughout this section, we assume unique variable bindings – variables are never redefined in the program.
7.1 Type Preservation
Before we prove the type soundness of the translation, we will define a couple of predicates on the header: ΓH and well-typedness of H. Intuitively, ΓH denotes the type that we annotate with each expression in H during the translation, and well-typedness ensures that the type we annotate is the correct type. Together these two ensure that the header formed is well typed.

Definition 1 (The Header Type Environment – ΓH). If H = (h0 . . . hn), then ΓH = Γh0 . . . Γhn. If hi ::= (x = e, τ), then Γhi := x → τ.

Definition 2 (Let H in e). If H = h0 . . . hn, then Let H in e is shorthand for let h0 in . . . let hn in e. The typing rule is as follows: Γm ⊢ Let H in e : µ iff Γm; ΓH ⊢ e : µ.
(exp)    Γm(x) = µ
         ─────────────────────────────────────
         Γm; Γp; H ⊢ x : µ ⇒ x; ∅; {x}

         Γm; Γp; H ⊢ i : Int ⇒ i; ∅; ∅

(app)    Γm(x1) = µ1 → µ2    Γm(x2) = µ1
         ─────────────────────────────────────
         Γm; Γp; H ⊢ @x1 x2 : µ2 ⇒ @x1 x2; ∅; {x1, x2}

(fn)     Γm[x → µ]; Γp; H ⊢ e : µ′ ⇒ e′; H1; F
         ─────────────────────────────────────
         Γm; Γp; H ⊢ λx : µ.e : µ → µ′ ⇒ λx : µ.e′; H1; F \ {x : µ}

(let)    Γm; Γp; H ⊢ e1 : µ1 ⇒ e1′; H1; F1    Γm[x → µ1]; Γp; H ⊢ e2 : µ2 ⇒ e2′; H2; F2
         ─────────────────────────────────────
         Γm; Γp; H ⊢ let x = e1 in e2 : µ2 ⇒ let x = e1′ in e2′; H1 + H2; F1 ∪ (F2 \ {x})

(tfn)    Γm; Γp; H ⊢ e1 : µ1 ⇒ e1′; H1; F1
         H1′ = ⟨x = Λti.Let H1 in λ∗F1.e1′, ∀ti.T(F1) → µ1⟩
         Γm; Γp[x → (∀ti.µ1, F1)]; H + H1′ ⊢ e2 : µ2 ⇒ e2′; H2; F2
         ─────────────────────────────────────
         Γm; Γp; H ⊢ let x = Λti.e1 in e2 : µ2 ⇒ e2′; H1′ + H2; F2

(tapp)   Γp(x) = (∀ti.µ, F)    ΓH(x) = ∀ti.T(F) → µ    z a fresh variable
         ─────────────────────────────────────
         Γm; Γp; H ⊢ x[µi] : [µi/ti]µ ⇒ @∗zF; ⟨z = x[µi], T(F) → [µi/ti]µ⟩; F

Fig. 8. The Lifting Translation
Definition 3 (H Is Well Typed). H is well typed if h0 . . . hn are well typed. hi is well typed if h0 . . . hi−1 are well typed and:
– if hi ::= (x = Λti.Let H1 in e, ∀ti.µ), then Γh0..hi−1 ⊢ Let H1 in e : µ;
– if hi ::= (z = x[µi], [µi/ti]µ), then Γh0...hi ⊢ z : [µi/ti]µ.
Lemma 1. Suppose Γm; Γp; H ⊢ e ⇒ e′; H′; F. If x ∈ Γm and x does not occur free in H, then x does not occur free in H + H′.

Proof. This is proved by induction on the structure of e.

Theorem 2 (Type Preservation). Suppose Γm; Γp; H ⊢ e : µ ⇒ e′; H1; F. If H is well typed, then H + H1 is well typed, and if Γm; Γp ⊢ e : µ then Γm; ΓH ⊢ Let H1 in e′ : µ.

Proof. The proof is by induction on the structure of e. We will consider only tfn and tapp.
Case tapp. To prove: if H is well-typed, then H + H′, where H′ = ⟨z = x[µi], T(F) → [µi/ti]µ⟩, is also well-typed, and Γm; ΓH ⊢ Let H′ in @∗zF : [µi/ti]µ.

Since we assume H is well typed, we need to prove H′ is well typed. By the precondition on the translation, ΓH ⊢ x : ∀ti.T(F) → µ. Since F consists of the free variables of x, T(F) cannot have any of the ti's as a free type variable. Therefore ΓH+H′ ⊢ z : T(F) → [µi/ti]µ, which proves that H′ is well-typed. This also leads to Γm; ΓH+H′ ⊢ @∗zF : [µi/ti]µ.

Case tfn. To prove: given H is well-typed, H + H1′ + H2 is also well-typed and Γm; ΓH ⊢ Let H1′ + H2 in e2′ : µ2.

By the inductive assumption on the translation of e1, H + H1 is well-typed and Γm; ΓH ⊢ Let H1 in e1′ : µ1. Since the variables in F1 are bound in Γm (and not in H1), this implies that Γm; ΓH ⊢ Let H1 in λ∗F1.e1′ : T(F1) → µ1. Since λ∗F1.e1′ is closed with respect to monomorphic variables, we no longer require the environment Γm. Therefore ΓH ⊢ Let H1 in λ∗F1.e1′ : T(F1) → µ1. This implies H1′ is well-typed. Again by induction, if H + H1′ is well-typed, then H + H1′ + H2 is well-typed and Γm; ΓH+H1′ ⊢ Let H2 in e2′ : µ2. This implies that Γm; ΓH+H1′+H2 ⊢ e2′ : µ2, which leads to the type preservation theorem.
∎
7.2 Semantic Soundness
The operational semantics is shown in Figure 9. There are only three kinds of values: integers, function closures, and type function closures.

(values) v ::= i | Clos⟨xµ, e, a⟩ | Clost⟨ti, e, a⟩

Definition 4 (Type of a Value).
– Γ ⊢ i : int;
– if Γ ⊢ λx : µ.e : µ → µ′, then Γ ⊢ Clos⟨xµ, e, a⟩ : µ → µ′;
– if Γ ⊢ Λti.ev : ∀ti.µ′, then Γ ⊢ Clost⟨ti, ev, a⟩ : ∀ti.µ′.

Notation 3. The notation a : Γ ⊢ e → v means that in a value environment a respecting Γ, e evaluates to v. If a respects Γ, then a(x) = v and Γ(x) = µ implies Γ ⊢ v : µ.

Notation 4. The notation a(x → v) means that in the environment a, x has the value v, whereas a[x → v] means that the environment a is augmented with the given binding.
(const/var)  a ⊢ i → i        a ⊢ x → a(x)

(fn)         a ⊢ λx : µ.e → Clos⟨xµ, e, a⟩

(app)        a ⊢ x1 → Clos⟨xµ, e, a′⟩    a ⊢ x2 → v    a′ + ⟨x → v⟩ ⊢ e → v′
             ─────────────────────────────
             a ⊢ @x1 x2 → v′

(tfn)        a ⊢ Λti.ev → Clost⟨ti, ev, a⟩

(let)        a ⊢ e1 → v1    a + ⟨x → v1⟩ ⊢ e2 → v
             ─────────────────────────────
             a ⊢ let x = e1 in e2 → v

(tapp)       a ⊢ x → Clost⟨ti, ev, a′⟩    a′ ⊢ ev[µi/ti] → v
             ─────────────────────────────
             a ⊢ x[µi] → v

Fig. 9. Operational Semantics
We need to define the notion of equivalence of values before we can prove that two terms are semantically equivalent.

Definition 5 (Equivalence of Values).
– Equivalence of Int: i ≈ i′ iff
  • Γ ⊢ i : int and Γ′ ⊢ i′ : int and i = i′.
– Equivalence of Closures: Clos⟨xµ, e, a⟩ ≈ Clos⟨xµ, e′, a′⟩ iff
  • Γ ⊢ Clos⟨xµ, e, a⟩ : µ → µ′ and Γ′ ⊢ Clos⟨xµ, e′, a′⟩ : µ → µ′;
  • ∀v1, v1′ such that Γ ⊢ v1 : µ and Γ′ ⊢ v1′ : µ and v1 ≈ v1′,
  • a : Γ + ⟨x → v1⟩ ⊢ e → v and a′ : Γ′ + ⟨x → v1′⟩ ⊢ e′ → v′ and v ≈ v′.
– Equivalence of Type Closures: Clost⟨ti, ev, a⟩ ≈ Clost⟨ti, ev′, a′⟩ iff
  • Γ ⊢ Clost⟨ti, ev, a⟩ : ∀ti.µ and Γ′ ⊢ Clost⟨ti, ev′, a′⟩ : ∀ti.µ, and
  • a : Γ ⊢ ev[µi/ti] → v and a′ : Γ′ ⊢ ev′[µi/ti] → v′ and v ≈ v′.

Definition 6 (Equivalence of Terms). Suppose a : Γ ⊢ e → v and a′ : Γ′ ⊢ e′ → v′. Then the terms e and e′ are semantically equivalent iff v ≈ v′. We denote this as a : Γ ⊢ e ≈ a′ : Γ′ ⊢ e′.

Before we get into the proof, we want to define a couple of predicates on the header: aH and well-formedness of H. Intuitively, aH represents the addition of new bindings in the environment as the header gets evaluated. Well-formedness of the header ensures that the lifting of polymorphic functions and type applications is semantically sound.
Definition 7 (The Header Value Environment – aH). aH is equal to ah0 . . . ahn, where:
– if hj ::= (x = Λti.e, τ) then ahj := x → Clost⟨ti, e, ah0...hj−1⟩;
– if hk ::= (z = x[µi], τ) then ahk := z → v, where hj ::= x → Clost⟨ti, e, ah⟩ for some j < k and ah : Γh ⊢ e[µi/ti] → v.

Definition 8 (Let H in e). Suppose H = h1 . . . hn. Then Let H in e is shorthand for let h1 in . . . let hn in e. If hj ::= (x = e, τ), then let hj is shorthand for let x = e. From the operational semantics we get am : Γm ⊢ Let H in e ≈ am : Γm; aH : ΓH ⊢ e.

Definition 9 (H Is Well-Formed w.r.t. am : Γm; ap : Γp). H is well-formed w.r.t. am : Γm; ap : Γp if h0, . . . , hn are well-formed. A header entry hj is well-formed if all its predecessors h0, . . . , hj−1 are well-formed and:
– if hj ::= (x = Λti.e, τ) and Γp(x) = (∀ti.µ, F), then am : Γm; ap : Γp ⊢ x[µi] ≈ am : Γm; ah0...hj : Γh0...hj ⊢ let z = x[µi] in @∗zF;
– if hj ::= (z = x[µi], τ), then hj is well-formed.

“H is well-formed w.r.t. am : Γm; ap : Γp” will be abbreviated in this section to “H is well-formed”.

Theorem 3 (Semantic Soundness). Suppose Γm; Γp; H ⊢ e : µ ⇒ e′; H1; F. If am : Γm; ap : Γp ⊢ e → v and H is well-formed w.r.t. am : Γm; ap : Γp, then am : Γm; aH : ΓH ⊢ Let H1 in e′ → v′ and v ≈ v′.

Proof. The proof is by induction on the structure of e. We will consider the tapp and tfn cases here.

Case tapp. To prove: if H is well-formed, then

am : Γm; ap : Γp ⊢ x[µi] ≈ am : Γm; aH : ΓH ⊢ Let H1 in @∗zF.

Substituting Let H1 in the above equation leads to

am : Γm; ap : Γp ⊢ x[µi] ≈ am : Γm; aH : ΓH ⊢ let z = x[µi] in @∗zF.

By the precondition on the translation rule, Γp(x) = (∀ti.µ, F), and there exists some hj ∈ H such that hj ::= (x = Λti.e, τ). Since H is well-formed, hj is well-formed as well, and therefore by definition

am : Γm; ap : Γp ⊢ x[µi] ≈ am : Γm; ah0...hj : Γh0...hj ⊢ let z = x[µi] in @∗zF.

But since we assume unique variable bindings, no hk for k > j rebinds x. This leads to –
am : Γm; ap : Γp ⊢ x[µi] ≈ am : Γm; aH : ΓH ⊢ let z = x[µi] in @∗zF,

which is what we want to prove.

Case tfn. To prove: given H is well-formed,

am : Γm; ap : Γp ⊢ let x = Λti.e1 in e2 ≈ am : Γm; aH : ΓH ⊢ Let H1′ + H2 in e2′,

which means we must prove that if am : Γm; ap[x → Clost⟨ti, e1, am + ap⟩] : Γp[x → (∀ti.µ1, F)] ⊢ e2 → v and am : Γm; aH+H1′ : ΓH+H1′ ⊢ Let H2 in e2′ → v′, then v ≈ v′. Assume for the time being that H + H1′ is well-formed. Then the inductive hypothesis on the translation of e2 leads to the above condition. We are therefore left with proving that H + H1′ is well-formed. By assumption, H is well-formed; therefore we must prove that H1′ is well-formed. According to the definition, we need to prove that

am : Γm; ap : Γp ⊢ x[µi] ≈ am : Γm; aH+H1′ : ΓH+H1′ ⊢ let z = x[µi] in @∗zF.

In the above equation, aH1′ := x → Clost⟨ti, Let H1 in λ∗F.e1′, aH⟩; therefore the operational semantics leads to

z → Clos⟨FT(F), e1′[µi/ti], aH + aH1′[µi/ti]⟩.

This implies that we must prove –

am : Γm; ap : Γp ⊢ x[µi] ≈ am(F) : Γm; aH : ΓH + aH1′[µi/ti] : ΓH1′[µi/ti] ⊢ e1′[µi/ti].

In the source term, x → Clost⟨ti, e1, am + ap⟩, which implies that

am : Γm; ap : Γp ⊢ x[µi] ≈ am : Γm; ap : Γp ⊢ e1[µi/ti].

Therefore we need to prove that –

am : Γm; ap : Γp ⊢ e1[µi/ti] ≈ am(F) : Γm; aH : ΓH + aH1′[µi/ti] : ΓH1′[µi/ti] ⊢ e1′[µi/ti]   (1)

But am(F) agrees with am on F, since variables are bound only once, and F consists of all the free variables of e1 that are bound in am. Hence evaluating e1 in am(F) is equivalent to evaluating it in am. So proving Eqn. 1 reduces to proving

am : Γm; ap : Γp ⊢ e1[µi/ti] ≈ am : Γm; aH : ΓH + aH1′[µi/ti] : ΓH1′[µi/ti] ⊢ e1′[µi/ti],

which follows from the inductive assumption on the translation of e1.
∎
Formalizing Resource Allocation in a Compiler

Peter Thiemann

Department of Computer Science, University of Nottingham
Nottingham NG7 2RD, England
[email protected]
Abstract. On the basis of an A-normal form intermediate language we formally specify resource allocation in a compiler for a strict functional language. Here, resource is to be understood in the most general sense: registers, temporaries, data representations, etc. All these should be (and can be, but have never been) specified formally. Our approach employs a non-standard annotated type system for the formalization. Although A-normal form turns out not to be the ideal vehicle for this investigation, we can prove some basic properties using the formalization.
1 Resource Allocation
Resource allocation in the back end of a compiler is often poorly specified. More often than not, register allocation, administration of temporaries, and representation conversions are only specified procedurally [1,6]. Code generators based on such algorithmic specifications can be hard to maintain or prove correct. Even the authors of such code generators are sometimes not aware of all the invariants that must be preserved.

Therefore, we investigate a declarative approach to resource allocation in the back end of a compiler. The approach is based on an annotated type system of implementation types that makes resource allocation and conversion explicit. The use of type conversion rules enables us to defer memory and register allocation until the context of use forces an allocation. For example, a constant initially leads to an annotation of the type of the variable holding the constant, without generating any code. The annotation holds the “immediate value” of the constant. There are type conversion rules that change the annotation from “immediate value” to “value in register k” and generate a corresponding piece of code if the context of use requires the value of the variable in a register. Further conversion rules create or remove indirection. The indirection-introducing rules move a value to memory and change the annotation to “value in memory at address R[k] + i”, where R[k] is an address in register k and i is an offset. The indirection-removing rules work the other way round. The indirection rules usually apply to the arguments of function calls or to values that are put into data structures. Spilling the contents of registers is another application of the last kind of conversion rules. Other rules may make direct use of immediate values, for example when generating instructions with immediate operands.
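As a hedged sketch of the kind of data such conversion rules manipulate (the datatype and instruction names below are ours, not the paper's):

(* locations, following the annotations described above *)
datatype loc = Eps | Imm of int | Reg of int | Mem of int * loc

datatype instr
  = LoadImm of int * int      (* LoadImm (k, n):  R[k] := n         *)
  | Load of int * int * int   (* Load (k, j, i):  R[k] := M[R[j]+i] *)
  | Store of int * int * int  (* Store (j, i, k): M[R[j]+i] := R[k] *)

(* "immediate value n" forced into register k by a context of use:
   emit one instruction and update the annotation *)
fun immToReg (n : int, k : int) : instr list * loc = ([LoadImm (k, n)], Reg k)

(* introduce one level of indirection (e.g. for spilling):
   store register k at offset i from the address in register j *)
fun regToMem (k : int, j : int, i : int) : instr list * loc =
  ([Store (j, i, k)], Mem (i, Reg j))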
The resulting high degree of flexibility allows for arbitrary intra-module calling conventions. Since the calling convention is part of every function's type, each function “negotiates” its convention with all its call sites. Contrast this with the algorithm used in the SML/NJ compiler [2], where the first call encountered by the code generator determines the calling convention for a procedure. Obviously, this is one pragmatic way of negotiating, but surely not a declarative one (nor a democratic one). External functions can have arbitrary calling conventions, too, as long as their implementation type is known. If the external functions are unknown, any standard calling convention (including caller-saves/callee-saves registers) can be enforced just by imposing a suitable implementation type. The same holds for exported functions, where the only requirement is that their implementation type is also exported, for example, in an interface file.

Implementation types can also model some other worthwhile optimizations. For example, a lightweight closure does not contain all free variables of a function. It can only be used correctly if all variables that it is not closed over are available at all call sites. Implementation types can guarantee the correctness of a variant of lightweight closure conversion (cf. [20]). In our case, this conversion does not take place at the level of the source language; rather, it happens while translating to actual machine code. The translation ensures that the values that are not put into the closure are available at all call sites.
1.1 Overview
In the next section, we define the source language, its operational semantics, the implementation type language, and the target language. The introduction and discussion of the typing rules is the subject of Section 3. Section 4 documents some properties of the system. Finally, we discuss related work (Sec. 5) and draw conclusions (Sec. 6).
2 Language
We have chosen a simply typed lambda calculus in A-normal form, a typical intermediate language used in compilers, as the starting point of our investigation. Compiling with A-normal forms [5] is said to yield the principal benefits of compiling with continuations (explicit control flow, naming of intermediate results, making continuations explicit) without incurring the overhead of actually transforming the program to continuation-passing style and without complicating the types in the intermediate language.
2.1 Terms
We strengthen our requirements somewhat with respect to the usual definition of A-normal form. Figure 1 defines the terms of restricted A-normal form. There are computation terms a and value terms v. Value terms are integer constants
180
Peter Thiemann
a ::= let x = v in a | let x = x @ x in a | let x = x + x in a | x|x@x v ::= n | x | λx.a
Fig. 1. Restricted A-normal form: terms

n, variables x, or lambda abstractions λx.a. Computation terms either sequentialize computations (let x = … in a) or they return a result, which can either be the value of a variable or a tail call to some function. Usually, A-normal form [5] only requires the arguments of applications and primitives to be values v. Restricted A-normal form requires variables x in all these places. With this restriction, no resource allocation occurs “inside” of a term, and resource conversions can be restricted to occur between some let and its body, without loss of generality.
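To fix intuitions, restricted A-normal form can be transcribed as a small datatype. The following is a sketch in OCaml; the constructor names are our own, not the paper’s.

  (* Restricted A-normal form (Fig. 1); a sketch with invented names. *)
  type var = string

  type value =
    | Num of int           (* integer constant n *)
    | Var of var           (* variable x *)
    | Lam of var * term    (* lambda abstraction λx.a *)

  and term =
    | LetVal  of var * value * term      (* let x = v in a *)
    | LetApp  of var * var * var * term  (* let x = x1 @ x2 in a *)
    | LetAdd  of var * var * var * term  (* let x = x1 + x2 in a *)
    | Ret     of var                     (* return the value of a variable *)
    | TailApp of var * var               (* tail call x1 @ x2 *)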
2.2 Operational Semantics
The semantics is defined by a fairly conventional CEK machine (see Fig. 2). A machine state is a triple (a, ρ, κ) where
– a ∈ Term is a term in A-normal form,
– ρ ∈ Env = Var ⇀ Val is an environment, and
– κ ∈ K is a continuation, where K = Void + Env × Var × Term × K.
Here, partial functions are denoted by ⇀, ρ|F restricts the domain of ρ to F, Void is a one-element set, and + denotes disjoint union of sets. A value ∈ Val is either Num(n) or Fun(ρ, λy.a) where ρ ∈ Env and λy.a ∈ Term. Inspection of the last rule reveals that the semantics enforces proper tail recursion, because the function call in tail position does not create a new continuation. The transitions that state additional constraints are undefined if the constraints are not met.
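A one-step transition function for this machine might look as follows. This is a sketch in OCaml that reuses the term datatype above; environments are association lists, Halt stands for the element of Void, and the ρ|FV restriction on closure environments is omitted.

  (* A sketch of the CEK machine of Fig. 2 (below). *)
  type sem_value =
    | NumV of int
    | FunV of env * var * term              (* Fun(ρ, λy.a) *)
  and env = (var * sem_value) list

  type cont =
    | Halt                                  (* the Void continuation *)
    | Frame of env * var * term * cont      (* ⟨ρ, x, a, κ⟩ *)

  let step (state : term * env * cont) =
    match state with
    | LetVal (x, Num n, a), rho, k -> (a, (x, NumV n) :: rho, k)
    | LetVal (x, Var y, a), rho, k -> (a, (x, List.assoc y rho) :: rho, k)
    | LetVal (x, Lam (y, b), a), rho, k ->
        (a, (x, FunV (rho, y, b)) :: rho, k)   (* ρ|FV(λy.b) omitted *)
    | LetApp (x, w, z, a), rho, k ->
        (match List.assoc w rho with
         | FunV (rho', y, b) ->
             (b, (y, List.assoc z rho) :: rho', Frame (rho, x, a, k))
         | _ -> failwith "stuck")
    | LetAdd (x, w, z, a), rho, k ->
        (match List.assoc w rho, List.assoc z rho with
         | NumV m, NumV n -> (a, (x, NumV (m + n)) :: rho, k)
         | _ -> failwith "stuck")
    | Ret z, rho', Frame (rho, x, a, k) ->
        (a, (x, List.assoc z rho') :: rho, k)
    | TailApp (w, z), rho, k ->
        (match List.assoc w rho with
         | FunV (rho', y, b) -> (b, (y, List.assoc z rho) :: rho', k)
         | _ -> failwith "stuck")
    | _ -> failwith "stuck"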
2.3 Types
Figure 3 defines implementation types, which form an extension of simple types. In an implementation type, each type constructor carries a location l. If the location is ε (“not allocated”) then the information corresponding to the type constructor is only present in the type. For example (going beyond the fragment considered in this paper), if the product type constructor × carries the annotation ε then only its components are allocated as prescribed by their locations; the pair itself is not physically represented. If the location is imm n (“immediate value n”) then the value corresponding to the type carrying this location is known to be the integer n. The name comes from the immediate addressing mode that is present in many architectures, and
(let x = n in a, ρ, κ)      → (a, ρ[x ↦ Num(n)], κ)
(let x = y in a, ρ, κ)      → (a, ρ[x ↦ ρ(y)], κ)
(let x = λy.a′ in a, ρ, κ)  → (a, ρ[x ↦ Fun(ρ|FV(λy.a′), λy.a′)], κ)
(let x = w @ z in a, ρ, κ)  → (a′, ρ′[y ↦ ρ(z)], ⟨ρ|FV(let x=w @ z in a), x, a, κ⟩)
                                  if ρ(w) = Fun(ρ′, λy.a′)
(let x = w + z in a, ρ, κ)  → (a, ρ[x ↦ Num(m + n)], κ)
                                  if ρ(w) = Num(m) and ρ(z) = Num(n)
(z, ρ′, ⟨ρ, x, a, κ⟩)       → (a, ρ[x ↦ ρ′(z)], κ)
(w + z, ρ′, ⟨ρ, x, a, κ⟩)   → (a, ρ[x ↦ Num(m + n)], κ)
                                  if ρ′(w) = Num(m) and ρ′(z) = Num(n)
(w @ z, ρ, κ)               → (a′, ρ′[y ↦ ρ(z)], κ)
                                  if ρ(w) = Fun(ρ′, λy.a′)
Fig. 2. Operational semantics

τ ::= (σ; l)
σ ::= int | τ −(F,P,M,F′,k)→ cont l τ | cont F
l ::= ε | imm n | A[reg n]
A ::= [ ] | mem⟨i, A⟩

Fig. 3. Syntax of implementation types
immediate values are expected to take part in generating instructions using immediate addressing. If the location is reg k then the value of that type is resident in register k. In addition, the register might hold an indirection, i.e., the address of a block of memory where the value is stored at some offset i: mem⟨i, reg k⟩. In general, this indirection step may be repeated an arbitrary number of times, which is expressed by mem⟨i, A⟩[reg k]. There are two syntactic categories for types. τ ranges over implementation types, i.e., τ is a pair of a “stripped” implementation type σ and the location of its top-level type constructor. For this paper, σ ranges over int, the type of integers, and τ₂ −(F,P,M,F′,k)→ cont l τ₁, the type of functions that map objects of type τ₂ to objects of type τ₁ involving a continuation closure at location l, and cont F, the type of a continuation. The annotation F, P, M, F′, k on the function arrow is reminiscent of effect systems [7, 11]. It determines the latent resource usage of the function, which becomes effective when the function is called. It is explained in Section 3 together with the judgements of implementation typing. The last alternative, σ ≡ cont F, is the type of a continuation identifier. This type carries the location information of the current continuation, which would otherwise be lost (see Sec. 3).
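The implementation-type syntax can likewise be transcribed into a datatype. The following OCaml sketch names the components of the arrow annotation F, P, M, F′, k; all constructor and field names are ours.

  (* Implementation types (Fig. 3); registers are numbered by ints. *)
  type loc =
    | NotAlloc                   (* ε: present only in the type *)
    | Imm of int                 (* imm n *)
    | InCtx of ctx * int         (* A[reg k] *)
  and ctx =
    | Hole                       (* [ ] *)
    | Mem of int * ctx           (* mem⟨i, A⟩ *)

  type ity = stripped * loc                  (* τ ::= (σ; l) *)
  and stripped =
    | Int                                    (* int *)
    | Arrow of ity * ann * loc * ity         (* τ2 −(F,P,M,F′,k)→ cont l τ1 *)
    | Cont of int list                       (* cont F *)
  and ann = { fixd : int list;     (* F:  fixed registers *)
              pres : int list;     (* P:  preserved registers *)
              modif : int list;    (* M:  possibly modified registers *)
              fixd' : int list;    (* F′: fixed registers of the context *)
              closreg : int }      (* k:  register holding the closure *)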
2.4 Additional Conventions
The architecture of a real processor places certain limits on the use of registers. For example, processors may have
– dedicated floating-point registers;
– special address registers (“pointers” to closures and tuples);
– special register(s) for continuations;
– special register(s) for condition codes.
In addition, the number of such registers is limited. These restrictions are modeled by a function Regs : TypeConstructor → P(RegisterNames) that maps a type constructor to a set of register names (which might be represented by integers). Occasionally, we apply Regs to a stripped implementation type σ when it should be applied to the top-level type constructor of σ. We do not define Regs here since it depends on the particular architecture that we want to generate code for.
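For instance, a Regs function for a hypothetical architecture might look as follows; the concrete register numbers are invented for illustration only.

  (* A hypothetical Regs: integers may live in registers 0–7,
     closures and tuples (pointers) in registers 8–11. *)
  type type_constructor = TCInt | TCArrow | TCProd

  let regs = function
    | TCInt -> [0; 1; 2; 3; 4; 5; 6; 7]
    | TCArrow | TCProd -> [8; 9; 10; 11]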
2.5 Target Code
The target code of the translation is an assembly language for an abstract RISC processor. It has the following commands, expressed in a rather suggestive way, with R[k] denoting a register reference and M[a] denoting a memory reference. Here, k, j ∈ RegisterNames, a, i are memory addresses for data, t is a symbolic label for a code address, and n is an integer.

t :                      label declaration
R[k] := n                load numeric constant
R[k] := t                load address constant
R[k] := R[i] + R[j]      arithmetic operation
R[k] := n + R[j]         arithmetic operation
R[k] := M[i + R[j]]      load indirect with offset
M[i + R[j]] := R[k]      store indirect with offset
R[k] := Allocate(n)      memory allocation
Goto t                   unconditional jump
Goto R[i]                unconditional indirect jump

The infix operator “;” performs concatenation of code sequences. We identify singleton code sequences with single instructions. For simplicity, we assume that all data objects have a standard representation of the same size (which might be a pointer). The state of the abstract processor is a triple ⟨C, R, M⟩ where C is a code sequence, R is the register bank (a mapping from a finite set of register names to data), and M is the memory (a mapping from an infinite set of addresses to data). The program store, which maps labels (code addresses) to code sequences, is left
implicit. The instruction Allocate(n) returns the address of a contiguous block of memory of size n. It guarantees that there is no overlap with previously allocated blocks, i.e., it never returns the same address twice. Some of our proofs exploit this guarantee by relying on the uniqueness of data addresses for identification. In practice there will be a garbage collector that maps the infinite address space into a finite one, which removes old unreachable addresses from the system.
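The instruction set can be captured by a datatype over which the C components of the typing rules below can be built. Again a sketch; constructor names and argument order are our own.

  (* The abstract RISC target (Sec. 2.5); a code sequence is an instr list. *)
  type reg = int
  type instr =
    | Label     of string                (* t :                  *)
    | LoadConst of reg * int             (* R[k] := n            *)
    | LoadAddr  of reg * string          (* R[k] := t            *)
    | AddRR     of reg * reg * reg       (* R[k] := R[i] + R[j]  *)
    | AddNR     of reg * int * reg       (* R[k] := n + R[j]     *)
    | Load      of reg * int * reg       (* Load (k, i, j):  R[k] := M[i + R[j]] *)
    | Store     of reg * int * reg       (* Store (k, i, j): M[i + R[j]] := R[k] *)
    | Alloc     of reg * int             (* R[k] := Allocate(n)  *)
    | Goto      of string                (* Goto t               *)
    | GotoReg   of reg                   (* Goto R[i]            *)

  type code = instr list                 (* “;” is list concatenation *)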
3
Typing
The typing judgement is reminiscent of that of an effect system [7, 11]. The typing process determines a translation to abstract assembly code as defined above. Therefore, we use a translation judgement Γ, P, F, S ⊢ a : τ; M, F′, C to describe both together. In every judgement,
– Γ is a type assumption, i.e., a list of pairs x : τ. By convention, type assumptions are extended by appending a new pair on the right, as in Γ, x : τ. The same notation Γ, x : τ also serves to extract the rightmost pair from a type assumption.
– P is a set of preserved registers. The translation guarantees that all registers k ∈ P hold the same value after evaluation of a as before, but during evaluation of a these values may be spilled and register k may hold different values temporarily. Members of P correspond to callee-saves registers.
– F, F′ are sets of fixed registers. The translation guarantees that a register k ∈ F is not used as long as there is some reference to it in the type assumption or in the context modeled by F′. Furthermore, it expects that the context of a handles the registers mentioned in F′ in the same way. Members of F must not be spilled. However, if there is no reference remaining to some k ∈ F then k may be removed from F. The main use of F and F′ is lightweight closure conversion and avoiding the allocation of closures altogether. In both cases, the type assumption contains a variable w of type τ₂ −(F₁,P₁,M₁,F₁′,k₁)→ cont l τ₁ where F₁ describes the components of the closure that have not been allocated (i.e., they reside in registers drawn from F₁). Consequently, F₁ ⊆ F must hold at a call site a ≡ let x = w @ z in a′ so that all registers in F₁ contain the correct values.
– S is a list of reloads of the form ⟨k, i₁ … iₚ⟩ … where register k points to a spill area and i₁ through iₚ are the spilled registers. The notation ε is used for the empty list of reloads, i.e., when all values are either implicit or reside in registers: S = ε means that nothing is currently spilled.
– M is a set of registers that are possibly modified while evaluating a.
– C is code of the target machine (see Sec. 2.5).
Before we start discussing the typing rules proper, we need to define the set of registers referenced from a type assumption. Definition 1. The reference set of a location, type, or type assumption is the set of registers that the location, type, or type assumption refers to.
– Refer ε = ∅, Refer (imm n) = ∅, Refer (A[reg n]) = {n};
– Refer (int; l) = Refer l;
– Refer (τ₂ −(F,P,M,F′,k)→ cont l τ₁; l) = Refer l ∪ F;
– Refer (cont F; l) = Refer l ∪ F;
– Refer Γ = ⋃ x:τ ∈ Γ Refer τ.
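Over the implementation-type datatype sketched earlier, Refer can be written down directly; sets are represented as lists here.

  (* A sketch of Refer (Definition 1). *)
  let refer_loc = function
    | NotAlloc | Imm _ -> []
    | InCtx (_, n) -> [n]

  let refer_ity ((sigma, l) : ity) : int list =
    match sigma with
    | Int -> refer_loc l
    | Arrow (_, ann, _, _) -> refer_loc l @ ann.fixd
    | Cont f -> refer_loc l @ f

  (* Refer Γ is the union over all assumptions x : τ in Γ. *)
  let refer_env (gamma : (string * ity) list) : int list =
    List.concat_map (fun (_, t) -> refer_ity t) gamma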
3.1 Typing Rules
The typing rules are organized into context rules that manipulate type assumptions, value rules that provide typings for variables and constants, representation conversion rules, computation rules that deal with let expressions, and return rules that describe returning values from function invocations. Of these rules, the context rules and the conversion rules are nondeterministic; the remaining rules are tied to specific syntactic constructs, i.e., they are syntax-directed.

Context Rules. Each use of a variable consumes an element of the type assumption. This convention saves us from spilling dead variables, since a “good” derivation only duplicates variables that are still live. Hence there is a rule to duplicate assumptions x : τ.

(dup)
    (Γ, x : τ, x : τ), P, F, S ⊢ a : τ′; M, F′, C
    ─────────────────────────────────────────────
    (Γ, x : τ), P, F, S ⊢ a : τ′; M, F′, C
There is a dual weakening rule that drops a variable assumption. The set of fixed registers is updated accordingly. Dropping of variable assumptions starts on the left side of a type assumption to avoid problems with shadowed assumptions.

(weak)
    Γ, P, F ∩ Refer Γ, S ⊢ a : τ′; M, F′, C
    ────────────────────────────────────────
    (x : τ, Γ), P, F, S ⊢ a : τ′; M, F′, C
Finally, there is a rule to organize access to the type assumptions. It exchanges adjacent elements of the type assumption provided that they bind different variables.

(exch)
    Γ, y : τ₂, x : τ₁, Γ′, P, F, S ⊢ a : τ′; M, F′, C
    ──────────────────────────────────────────────────  x ≢ y
    Γ, x : τ₁, y : τ₂, Γ′, P, F, S ⊢ a : τ′; M, F′, C

The explicit presence of these rules is reminiscent of linear type systems [8, 23].

Value Rules. And here is a simple rule that consumes a variable assumption for y : τ at the price of producing one for x : τ.

(let-var)
    (Γ, x : τ), P, F, S ⊢ a : τ′; M, F′, C
    ───────────────────────────────────────────────────
    (Γ, y : τ), P, F, S ⊢ let x = y in a : τ′; M, F′, C
Application of the (let-var) rule does not imply a change in the actual location of the value. The variable x becomes an alias for y in the expression a. The
rule can be eliminated in favor of a reduction rule for expressions in restricted A-normal form: let x = y in a → a[x := y] (capture-avoiding substitution of y for x in a). There is no penalty for this reduction, because the system allows the conversion of each occurrence of a variable individually. So former occurrences of x can still be treated differently than former occurrences of y. A constant starts its life as an immediate value which is only present in the implementation type. The typing derivation propagates this type and value to the point where it either selects an instruction with an immediate operand or where the context forces allocation into a register.
(let-const)
    (Γ, x : (int; imm n)), P, F, S ⊢ a : τ′; M, F′, C
    ─────────────────────────────────────────────────
    Γ, P, F, S ⊢ let x = n in a : τ′; M, F′, C
Conversion Rules. Some primitives expect their arguments allocated in registers. As we have seen, values are usually not born into registers. So, how do they get there? The solution lies in conversion rules that transform the type assumption. These rules generate code and allocate registers. A register k is deemed available if it is neither referred to by Γ nor mentioned in P ∪ F: k ∉ Refer Γ ∪ P ∪ F. Immediate integer values generate a simple load instruction. In this case, the register selected must be suitable for an integer (k ∈ Regs(int)) besides being available for allocation.
(conv-imm)
    (Γ, x : (int; reg k)), P, F, S ⊢ a : τ′; M, F′, C
    ────────────────────────────────────────────────────────
    (Γ, x : (int; imm n)), P, F, S ⊢ a : τ′; M ∪ {k}, F′, C′

where k ∈ Regs(int) \ (Refer Γ ∪ P ∪ F)
      C′ = (R[k] := n; C)
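In terms of the instr type of Sec. 2.5, the code emitted by (conv-imm) is just a load prefixed to the continuation code; a one-line sketch:

  (* Code emission for (conv-imm): prefix R[k] := n to the code C. *)
  let conv_imm_code (k : reg) (n : int) (c : code) : code =
    LoadConst (k, n) :: c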
The resolution of an indirection mem⟨i, reg n⟩ generates a memory load with index register R[n] and offset i.

(conv-mem)
    (Γ, x : (σ; A[reg k])), P, F, S ⊢ a : τ′; M, F′, C
    ────────────────────────────────────────────────────────────────
    (Γ, x : (σ; A[mem⟨i, reg n⟩])), P, F, S ⊢ a : τ′; M ∪ {k}, F′, C′

where k ∈ Regs(σ) \ (Refer Γ ∪ P ∪ F)
      C′ = (R[k] := M[R[n] + i]; C)

There is also an operation that generates indirections by spilling a group of registers to memory. The register k must be suitable to hold the standard representation of a tuple (a pointer to a contiguous area of memory), as indicated by k ∈ Regs(×). The (spill) rule is not applicable if there is no such register k. The rule nondeterministically chooses a set X of registers to spill which does not interfere with the fixed registers F. If preserved registers are
spilled, the corresponding reloads are scheduled in the S component.
(spill)
    Γ̃, (P \ X) ∪ {k}, F, ⟨k, i₁ … iₚ⟩ S ⊢ a : τ′; M, F′, C
    ───────────────────────────────────────────────────────
    Γ, P, F, S ⊢ a : τ′; (M \ X) ∪ {k}, F′, C′

where Γ̃ = Γ[reg i_j := mem⟨j, reg k⟩ | 1 ≤ j ≤ n]
      X = {i₁, …, iₙ}
      X ∩ F = ∅
      X ∩ P = {i₁, …, iₚ}, 0 ≤ p ≤ n
      k ∈ Regs(×) \ (Refer Γ ∪ P ∪ F)
      C′ = (R[k] := Allocate(|X|); M[R[k] + 0] := R[i₁]; …; M[R[k] + n − 1] := R[iₙ]; C)

The notation Γ[reg i_j := mem⟨j, reg k⟩ | 1 ≤ j ≤ n] denotes the textual replacement of all occurrences of reg i_j in implementation types mentioned in Γ by mem⟨j, reg k⟩, for 1 ≤ j ≤ n. The corresponding inverse rule (reload) pops one reload entry from S.
(reload)
    Γ, (P \ {k}) ∪ {i₁, …, iₚ}, F, S ⊢ a : τ′; M, F′, C
    ────────────────────────────────────────────────────
    Γ̃, P, F, ⟨k, i₁ … iₚ⟩ S ⊢ a : τ′; M, F′, C′

where Γ̃ = Γ[reg i_j := mem⟨j, reg k⟩ | 1 ≤ j ≤ p]
      C′ = (R[i₁] := M[R[k] + 0]; …; R[iₚ] := M[R[k] + p − 1]; C)
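The code components of (spill) and (reload) can be emitted mechanically. The following sketch, over the instr type of Sec. 2.5, takes the spilled registers as a list and stores the register at position j of the list at offset j.

  (* Code emission for (spill) and (reload); k addresses the spill area. *)
  let spill_code (k : reg) (xs : reg list) (c : code) : code =
    Alloc (k, List.length xs)
    :: List.mapi (fun j i -> Store (i, j, k)) xs   (* M[R[k]+j] := R[i] *)
    @ c

  let reload_code (k : reg) (xs : reg list) (c : code) : code =
    List.mapi (fun j i -> Load (i, j, k)) xs       (* R[i] := M[R[k]+j] *)
    @ c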
where Γ˜ = Γ [reg ij := memj, reg k | 1 ≤ j ≤ p] C = (R[i1 ] := M [R[k] + 0]; . . . ; R[ip ] := M [R[k] + p − 1]; C) Computation Rules. The first computation rule deals with lambda abstraction. The type assumptions are divided in those for the free variables of the function Γ and those for the continuation ∆. The function’s body a1 is processed with Γ˜ where some free variables are relocated into the closure, a set P of preserved registers as determined by the call sites of the function, and a set of fixed variables F that contains those fixed registers that are referred to from the assumption Γ . Also, the register m on the function arrow must match the register which is assumed to hold the closure while translating the body of the abstraction. It is not necessary that m = k, where k is the register where the closure is allocated. Finally, the let’s body a2 is processed with ∆. (Γ˜ , x2 : τ2 , c : (cont F ; l)), P , F , ε a1 : τ1 ; M , F , C1
(let-abs)
    (Γ̃, x₂ : τ₂, c : (cont F′; l)), P′, F′, ε ⊢ a₁ : τ₁; M′, F′′, C₁
    (∆, x₁ : (τ₂ −(F′,P′,M′,F′′,m)→ cont l τ₁; reg k)), P, F, S ⊢ a₂ : τ₀; M, F′′′, C₂
    ──────────────────────────────────────────────────────────────────
    (Γ, ∆), P, F, S ⊢ let x₁ = λx₂.a₁ in a₂ : τ₀; M ∪ {k}, F′′′, C

where F ∩ Refer Γ ⊆ F′ ⊆ Refer Γ
      k ∈ Regs(→) \ (Refer(Γ, ∆) ∪ P ∪ F)
      m ∈ Regs(→) \ (Refer Γ̃ ∪ P′ ∪ F′)
      Γ̃ = Γ[reg i_j := mem⟨j, reg m⟩ | 1 ≤ j ≤ n]
      {i_j} = Refer Γ \ F′, |{i_j}| = n
      C = (Goto t₂; t₁ : C₁; t₂ : R[k] := Allocate(n + 1); M[R[k]] := t₁;
           M[R[k] + 1] := R[i₁]; …; M[R[k] + n] := R[iₙ]; C₂)
All registers that do not become fixed in the function must be evacuated into the closure for the function, which is composed in R[k]. Since the continuation (which is located at l) can be handled like any other value, we invent a continuation identifier c and bind it to the continuation. This is a drawback of A-normal form in comparison to continuation-passing style, where continuation identifiers are explicit. Next, we consider a typical primitive operation.
(let-add)
    (Γ, x₁ : (int; reg k)), P, F, S ⊢ a : τ′; M, F′, C
    ──────────────────────────────────────────────────────────────────────
    (Γ, x₂ : (int; reg i), x₃ : (int; reg j)), P, F, S ⊢ a′ : τ′; M ∪ {k}, F′, C′

where k ∈ Regs(int) \ (Refer Γ ∪ P ∪ F)
      C′ = (R[k] := R[i] + R[j]; C)
      a′ = let x₁ = x₂ + x₃ in a

In addition, we could include a rule for constant propagation (in the case where the arguments are imm n₁ and imm n₂) and also rules to exploit immediate addressing modes if the processor provides for these. Next, we consider the application of a function.
(let-app)
    (Γ, x₁ : τ₁), P′, F′, S ⊢ a : τ; M, F′′, C
    ───────────────────────────────────────────────
    Γ′, P, F, S ⊢ a′ : τ; M ∪ M′ ∪ {j, k}, F′′, C′

where P ∪ {i₁, …, iₚ} ⊆ P′, F′ ⊆ F
      {i₁, …, iₙ} = Refer Γ \ F, |{i_j}| = n
      {j, k} ⊆ Regs(→) \ (Refer(Γ, x₁ : τ₁) ∪ P ∪ F), j ≠ k
      Γ′ = (Γ, x₂ : (τ₂ −(F′,P′,M′,F′′,i)→ cont (reg j) τ₁; reg i), x₃ : τ₂)
      a′ = let x₁ = x₂ @ x₃ in a
      C′ = (R[j] := Allocate(n − p + 1); M[R[j] + 0] := t; M[R[j] + 1] := R[iₚ₊₁]; …;
            M[R[j] + n − p] := R[iₙ]; R[k] := M[R[i] + 0]; Goto R[k];
            t : R[iₚ₊₁] := M[R[j] + 1]; …; R[iₙ] := M[R[j] + n − p]; C)

The memory allocation in this rule saves values that are accessed by the continuation a. The preservation of the remaining registers {i₁, …, iₚ} is left to the callee by placing them in the set of preserved registers P′. R[j] points to the continuation closure. The sole purpose of the cont (reg j) τ₁ construction lies in the transmission of the location of the continuation. The set of currently preserved registers must be a subset of the set of registers preserved by the function. Conversely, the set of currently fixed registers must contain the set of fixed registers demanded by the function. The continuation has to fix registers as indicated by the annotation F′′ of the function type. The i on the function arrow indicates the register where the function body expects its closure. It must coincide with the register in which the closure actually is.
Return Rules. Finally, we need to consider rules that pass a value to the continuation. The simplest rule just returns the value of a variable. Due to the conversion rules, we can rely on x : τ already being placed in the location where the continuation expects it. All return rules expect that their reload list is empty.
(ret-var)
    k ∈ Regs(→) \ (Refer(x : τ, c : (cont F; reg i)) ∪ P ∪ F)
    ──────────────────────────────────────────────────────────
    (x : τ, c : (cont F; reg i)), P, F, ε ⊢ x : τ; {k}, F, C

where C = (R[k] := M[R[i] + 0]; Goto R[k])

In this rule, the current continuation identifier c indicates that register i contains the continuation closure. As with any closure, its zeroth component contains the code address. The final rule specifies a tail call to another function.
(ret-app)
    P ⊆ P′, F′ ⊆ F, k ∈ Regs(→) \ ({j, i} ∪ Refer τ₂ ∪ P ∪ F)
    ───────────────────────────────────────────────────────────────
    Γ, P, F, ε ⊢ x₁ @ x₂ : τ₁; {k}, F, (R[k] := M[R[i] + 0]; Goto R[k])

where Γ = (x₁ : (τ₂ −(F′,P′,M′,F′′)→ cont (reg j) τ₁; reg i), x₂ : τ₂, c : (cont F′′; reg j))

There is neither a return term nor a return rule for addition, because the allocation properties of let x = y + z in x are identical to those of y + z, were the latter a legal return term.
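Both return rules share the same jump sequence: fetch the code address from the closure’s zeroth slot and jump. As a sketch over the instr type of Sec. 2.5:

  (* The return sequence R[k] := M[R[i] + 0]; Goto R[k]. *)
  let enter_closure (k : reg) (i : reg) : code =
    [ Load (k, 0, i); GotoReg k ]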
4
Properties
In this section, we formalize some of the intuitive notions introduced in the preceding sections. First, we show that preserved registers really deserve their name.

Theorem 1. Suppose Γ, P, F, S ⊢ a : τ; M, F₁, C and the processor is in state ⟨C, R, M⟩. For each register r ∈ P: Suppose c : (cont F′; reg w) ∈ Γ, y = R[r], c = R[w], and ⟨C, R, M⟩ →* ⟨C′, R′, M′⟩. If R′[w] = c and C′ is a suffix of C such that Γ′, P′, F′′, S′ ⊢ a′ : τ′; M′, F₁′, C′ and in the derivation steps between a and a′ the reload component always has S as a suffix, then R′[r] = y.

The reference to the continuation c ensures that both machine states belong to the same procedure activation, by the uniqueness of addresses returned by Allocate(n). It provides the only link between the two machine states. If we dropped this requirement, we would end up comparing machine states from different invocations of the same function and we could not prove anything. The condition on the reload component means that arbitrary spills are allowed between p and p′, but reloads are restricted not to remove the reload record that
was top-level at p. In other words, S serves as a low-water mark. Our main interest will be in the case where S = S′ = ε, a is the body of a function, and a′ is a return term. In this case, the theorem says that registers mentioned in P are preserved across function calls. This theorem can be proved by induction on the number of control transfers in ⟨C, R, M⟩ →* ⟨C′, R′, M′⟩ and then by induction on the derivation.

Next, we want to formalize a property for F. A value stored in f ∈ F will remain there unchanged as long as the variable binding that f belongs to is in effect or reachable through closures or the continuation. As a first step, we define a correspondence between an environment Γ and a state of the CEK machine (cf. Sec. 2.2).

Definition 2. Γ ⊨ (a, ρ, κ) if
1. there exist P, F, S, M, F₁, C such that Γ, P, F, S ⊢ a : τ; M, F₁, C;
2. x : τ in Γ implies x ∈ dom(ρ) and ρ(x) ∈ TSem τ;
3. if c : (cont F′; reg w) in Γ then there are ρ′, x′, and a′ such that κ = ⟨ρ′, x′, a′, κ′⟩; otherwise κ = • (the element of Void).

Unfortunately the connection between c and κ is not very deep. We cannot properly relate c to κ since the “type” of c does not refer to an environment. In fact, c and its type cannot refer to a specific environment Γ′ because a may be called from several places with different environments. Therefore, the type of the return environment of the continuation must be polymorphic. The function TSem τ maps an implementation type to a subset of Val.

TSem (int; l) = {Num(n) | n is an integer}
TSem (τ₂ −(F,P,M,F′)→ cont l τ₁; l) = {Fun(ρ′, λy.a) | ∀z ∈ TSem τ₂.
    (a, ρ′[y ↦ z], •) →* (x, ρ′′, •) such that ρ′′(x) ∈ TSem τ₁, or
    (a, ρ′[y ↦ z], •) →* (x + w, ρ′′, •) and τ₁ = (int; l′)}

However, to formalize reachability through closures and continuations and link this concept with the environment, we need a stronger notion than Γ ⊨ (a, ρ, κ). What we can actually prove by inspection of the rules is a much weaker theorem.

Theorem 2. Suppose Γ, P, F, S ⊢ a : τ; M, F₁, C and the processor is in state ⟨C, R, M⟩. For each r ∈ F: Suppose y = R[r] and ⟨C, R, M⟩ →* ⟨C′, R′, M′⟩ such that Γ′, P′, F′, S′ ⊢ a′ : τ′; M′′, F₁′, C′ and there is no intermediate state with a corresponding derivation step. If furthermore r ∈ F′ then R′[r] = y.

Finally, we establish a formal correspondence between steps of the CEK machine and steps of the translated machine program. To this end, we need a notion of compatibility between a CEK state and a machine state, Γ ⊨ (a, ρ, κ) ≅ ⟨C, R, M⟩.
Definition 3. Suppose Γ, P, F, S ⊢ a : τ; M, F₁, C. Γ ⊨ (a, ρ, κ) ≅ ⟨C, R, M⟩ if for all x : τ in Γ:
– if τ = (int; imm n) then ρ(x) = Num(n) (the value is not represented in the machine state);
– if τ = (int; reg r) then there exists an integer n s.t. ρ(x) = Num(n) and R[r] = n;
– if τ = (int; mem⟨j, reg r⟩) then there exists n s.t. ρ(x) = Num(n) and M[R[r] + j] = n;
– if τ = (int; mem⟨jₖ, … mem⟨j₀, reg r⟩ …⟩) then there exist n, i₀, …, iₖ s.t. ρ(x) = Num(n) and iₖ = n, i_ν = M[i_{ν−1} + j_ν] for 1 ≤ ν ≤ k, and i₀ = R[r];
– if τ = (τ₂ −(F′,P′,M′,F₁′)→ cont (reg s) τ₁; reg r) then there exist ρ′, y, a′ s.t. ρ(x) = Fun(ρ′, λy.a′) and for all z ∈ TSem τ₂, (a′, ρ′[y ↦ z], •) →* (x′, ρ′′, •) where ρ′′(x′) ∈ TSem τ₁; M[R[r]] holds the address of C′ such that (Γ′, y : τ₂, c : (cont F₂; reg s)), P′, F′, S′ ⊢ a′ : τ₁; M′, F₁′, C′, which starts with Γ′′, P′′, F′′, S′′ ⊢ x′ : τ₁; M′′, F₁′′, C′′; and for each machine state ⟨C′, R′, M′⟩ →* ⟨C′′, R′′, M′′⟩ where R′′[s] = R′[s] we have that (Γ′, y : τ₂, c : (cont F₂; l)) ⊨ (a′, ρ′[y ↦ z], κ′) ≅ ⟨C′, R′, M′⟩ and Γ′′ ⊨ (x′, ρ′′, κ′′) ≅ ⟨C′′, R′′, M′′⟩;
– if τ = (cont F₂; reg r) then κ = ⟨ρ′, x′, a′, κ′⟩ such that Γ′, P′, F′, S′ ⊢ a′ : τ′; M′, F₁′, C′ and M[R[r]] holds the address of C′.

Theorem 3. Suppose Γ, P, F, S ⊢ a : τ; M, F₁, C. If Γ ⊨ (a, ρ, κ) ≅ ⟨C, R, M⟩ and (a, ρ, κ) →* (a′, ρ′, κ′) where κ′ is a suffix of κ, then ⟨C, R, M⟩ →* ⟨C′, R′, M′⟩ and there exist Γ′, P′, F′, S′, τ′, M′′, and F₁′ such that Γ′, P′, F′, S′ ⊢ a′ : τ′; M′′, F₁′, C′ and Γ′ ⊨ (a′, ρ′, κ′) ≅ ⟨C′, R′, M′⟩.
5
Related Work
Compiling with continuations already has a long history. Steele’s Rabbit compiler [21] pioneered compilation by first transforming source programs to continuation-passing style and then transforming them until the assembly code can be read off directly. Also the Orbit compiler [13] and other successful systems [12, 3, 2] follow this strategy. Recently, there has been some interest in approaches which do not quite transform the programs to continuation-passing style. The resulting intermediate language has been called nqCPS¹, A-normal form [5], monadic normal form [9], etc. These languages are still direct-style languages, but have the following special features:
¹ This term was coined by Peter Lee [14] but it has never appeared in a published paper.
1. the evaluation order (and hence the control flow) is made explicit;
2. all intermediate results are named;
3. the structure of an expression makes the places obvious where serious computations are performed (i.e., where a continuation is required in the implementation).

Another related line of work is boxing analysis (e.g., [10, 19, 22, 18]). Here the idea is to try to avoid using the inefficient boxed representation of values in polymorphic languages. We believe that our system is powerful enough that (a polymorphic extension of) it can also express the necessary properties. Representation analysis [4], which is used in conjunction with region inference, has one phase (their Section 5, “Unboxed Values”) whose concerns overlap with the goals of our system. Otherwise, their system is concerned with finding the number of times a value is put into a region, the storage mode of a value (which determines whether a previous version may be overwritten), or the physical size of a region. Typed assembly language [15] is an approach to propagating polymorphic type information throughout all phases of compilation. This work defines a fully typed translation from the source language (a subset of core ML) down to assembly language in four stages: CPS conversion, closure conversion, allocation, and code generation. As documented, the allocation phase takes the conventional fully boxed approach to allocating closures and tuples. It remains to investigate whether the allocation phase operates on the right level of abstraction to take the decisions that we are interested in controlling with our approach.
6
Conclusion
We have investigated an approach to specifying decisions taken in the code generation phase of a compiler using a non-standard type system. The system builds on simple typing and uses a restricted version of A-normal form as its source language. We have defined a translation that maps the source language into abstract assembly language and have verified some properties of the translation. One goal of the approach is the verification and specification of code generators by adding constraints to our system that make the typing judgements deterministic. In the course of the work on this system, we have learned a number of lessons on intermediate languages for compilers that want to apply advanced optimization techniques (like unboxing, lightweight closure conversion, and so on): It is essential to start from an intermediate language that clearly distinguishes serious computations (that need a continuation) from trivial ones (that yield values directly). Otherwise, rules like the (spill) rule could not be applied immediately before generating the machine code for the actual function call. It is also essential that A-normal form sequentializes the computation. For unrestricted direct-style expressions the control flow is sufficiently different from the propagation of information in typing judgements to make such a formulation awkward, at best. For example, it would be necessary to thread the typing assumptions through the derivation according to the evaluation order.
Also, in an unrestricted direct-style term it is sometimes necessary to perform a resource operation (for example, spilling or representation conversion) on return from the evaluation of an expression. This leads to a duplication of typing rules: one set determining the operations on control transfer into the expression and another set for the operations on return from the expression. In A-normal form, returning from one expression is always entering the next expression, so there is no restriction in having only the first set of these rules. On the negative side, we found that A-normal form does not supply sufficient information when it comes to naming continuations. In contrast to continuation-passing style, our translation has to invent continuation variables in order to make the continuation visible to the resource allocation machinery. It would be interesting to investigate a similar system of implementation types for an intermediate language in continuation-passing style and to establish a formal connection between the two. Another point in favor of continuation-passing style is the allocation of continuation closures. Presently this allocation clutters the rule for function application (let-app). A system based on continuation-passing style might be able to decompose the different tasks present in (let-app) into several rules. Finally, in places the rules are rather unwieldy, so that it is debatable whether the intermediate language that we have chosen is the right level of abstraction to take the decisions that we are interested in. Imposing the system, for example, on the allocation phase of a system like that of Morrisett et al. [15] might in the end lead to a simpler system and one where the interesting properties can be proved more easily. Beyond the work reported in this paper, we have already extended the framework to conditionals, and to sum and product types. The current formulation does not allow for unboxed function closures. This drawback can be addressed at the price of a plethora of additional rules. Future work will address the incorporation of polymorphic types and investigate the possibilities of integrating our work with typed assembly language. Acknowledgment. Many thanks to the reviewers. Their detailed comments helped to clean up the presentation substantially.
References

1. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986. 178
2. Andrew W. Appel. Compiling with Continuations. Cambridge University Press, 1992. 179, 190
3. Andrew W. Appel and Trevor Jim. Continuation-passing, closure-passing style. In POPL 1989 [16], pages 293–302. 190
4. Lars Birkedal, Mads Tofte, and Magnus Vejlstrup. From region inference to von Neumann machines via region representation inference. In Proc. 23rd Annual ACM Symposium on Principles of Programming Languages, pages 171–183, St. Petersburg, Fla., January 1996. ACM Press. 191
5. Cormac Flanagan, Amr Sabry, Bruce F. Duba, and Matthias Felleisen. The essence of compiling with continuations. In Proc. of the ACM SIGPLAN ’93 Conference on Programming Language Design and Implementation, pages 237–247, Albuquerque, New Mexico, June 1993. 179, 180, 190
6. Christopher W. Fraser and David R. Hanson. A Retargetable C Compiler: Design and Implementation. Benjamin/Cummings, 1995. 178
7. David K. Gifford and John M. Lucassen. Integrating functional and imperative programming. In Proceedings of the 1986 ACM Conference on Lisp and Functional Programming, pages 28–38, 1986. 181, 183
8. Jean-Yves Girard. Linear logic. Theoretical Computer Science, 50:1–102, 1987. 184
9. John Hatcliff and Olivier Danvy. A generic account of continuation-passing styles. In POPL 1994 [17], pages 458–471. 190
10. Fritz Henglein and Jesper Jørgensen. Formally optimal boxing. In POPL 1994 [17], pages 213–226. 191
11. Pierre Jouvelot and David K. Gifford. Algebraic reconstruction of types and effects. In Proc. 18th Annual ACM Symposium on Principles of Programming Languages, pages 303–310, Orlando, Florida, January 1991. ACM Press. 181, 183
12. Richard Kelsey and Paul Hudak. Realistic compilation by program transformation. In POPL 1989 [16], pages 281–292. 190
13. D. Kranz, R. Kelsey, J. Rees, P. Hudak, J. Philbin, and N. Adams. ORBIT: An optimizing compiler for Scheme. SIGPLAN Notices, 21(7):219–233, July 1986. Proc. Sigplan ’86 Symp. on Compiler Construction. 190
14. Peter Lee. The origin of nqCPS. Email message, March 1998. 190
15. Greg Morrisett, David Walker, Karl Crary, and Neal Glew. From System F to typed assembly language. In Luca Cardelli, editor, Proc. 25th Annual ACM Symposium on Principles of Programming Languages, San Diego, CA, USA, January 1998. ACM Press. 191, 192
16. 16th Annual ACM Symposium on Principles of Programming Languages, Austin, Texas, January 1989. ACM Press. 192, 193
17. Proc. 21st Annual ACM Symposium on Principles of Programming Languages, Portland, OR, January 1994. ACM Press. 193, 193
18. Zhong Shao. Flexible representation analysis. In Mads Tofte, editor, Proc. International Conference on Functional Programming 1997, pages 85–98, Amsterdam, The Netherlands, June 1997. ACM Press, New York. 191
19. Zhong Shao and Andrew W. Appel. A type-based compiler for Standard ML. In Proc. of the ACM SIGPLAN ’95 Conference on Programming Language Design and Implementation, La Jolla, CA, USA, June 1995. ACM Press. 191
20. Paul Steckler and Mitchell Wand. Lightweight closure conversion. ACM Transactions on Programming Languages and Systems, 19(1):48–86, January 1997. 179
21. Guy L. Steele. Rabbit: a compiler for Scheme. Technical Report AI-TR-474, MIT, Cambridge, MA, 1978. 190
22. Peter Thiemann. Polymorphic typing and unboxed values revisited. In Simon Peyton Jones, editor, Proc. Functional Programming Languages and Computer Architecture 1995, pages 24–35, La Jolla, CA, June 1995. ACM Press, New York. 191
23. Philip Wadler. Is there a use for linear logic? In Paul Hudak and Neil D. Jones, editors, Proc. ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation PEPM ’91, pages 255–273, New Haven, CT, June 1991. ACM. SIGPLAN Notices 26(9). 184
An Approach to Improve Locality Using Sandwich Types
Daniela Genius, Martin Trapp, and Wolf Zimmermann
Institut für Programmstrukturen und Datenorganisation
University of Karlsruhe, 76128 Karlsruhe, Germany
{genius|trapp|zimmer}@ipd.info.uni-karlsruhe.de
Abstract. We show how to increase locality of object-oriented programs using several heaps. We introduce the notion of sandwich types, which allow a coarser view of objects. Our idea for increasing locality is to use one heap per object of a sandwich type. Performance measurements demonstrate that the running time is improved by up to a factor of 5 using this strategy. The paper shows how to derive sandwich types from classes. Thus, it is possible to control the allocation of the different heaps using compile-time information.
1
Introduction
In object-oriented programs, the notion of an object is rather fine-grained. The objects are usually allocated on a heap and the size of these objects is small. Thus, a single heap may destroy locality. Improving locality may improve execution time due to caching and paging effects. Often, a coarser view of objects is possible. For example, a list may be considered as a collection of small objects linked in an adequate way, but it may also be considered as one object. We introduce the notion of sandwich types in order to characterize this situation. Our goal is to maintain objects of sandwich types (called sandwich objects) in one single heap (i.e. a consecutive fragment of memory) in order to increase locality. We maintain these heaps for a sandwich object by the doubling strategy well known from the theory of algorithms and data structures (see e.g. [3]). This work was initiated by observations during experiments where lists, trees, sets, etc. were implemented with flexible arrays using the doubling strategy. These implementations improved the performance considerably compared to linked implementations. Increasing locality of reference by partitioning the heap is a well-known technique. The language Euclid [7] introduced special collections which can be viewed as independent heaps. Dynamically allocated data structures could be assigned to a single collection. The same idea is exploited by the Gnu obstack structure [4]. This package gives the programmer control over an arbitrary number of heaps that require stack discipline for allocation and deallocation. However, the responsibility for mapping objects to heaps remains totally with the programmer, both when using Euclid and when using obstacks.
Approaches to automatically finding such mappings have been developed in the context of Smalltalk. The Object-Oriented Zoned Environment (OOZE) locates all instances of a type in one contiguous interval of virtual addresses [8]. This increases locality of reference for objects of the same type, but is unable to deal with structures built from objects of various types. Stamos [12] also presents algorithms for grouping related objects with the intention of increasing locality of reference. This technique requires complete knowledge of the dynamic object graph and is used for restructuring the memory image during garbage collection. Thus, the mapping cannot be found at compile time. In [6], Hayes suggested the use of key objects as representatives for clusters. Death of a key object triggers garbage collection of the structure it represents. Again, key object candidates and the clusters they represent are identified during collection at runtime and cannot be statically determined. To the authors’ knowledge, there is no work on automatic a priori mapping of dynamically allocated objects to multiple heaps for sequential object-oriented programs. Section 2 introduces the notion of sandwich types and gives some examples. Section 3 shows the performance improvements obtained by object heaps. Section 4 shows a conservative analysis for identifying sandwich types, and Section 5 concludes the paper. Appendix A defines syntax, static and dynamic semantics of a basic object-oriented language Bool. Every object-oriented language has at least the features of Bool.
2
Sandwich Types
The notion of sandwich types is a generalization of balloon types [2]. All objects in a balloon can be accessed only via a distinguished balloon object. However, this excludes container types such as e.g. lists or sets. These types usually contain methods to return their elements and to insert elements. Thus, these elements can be accessed from outside, destroying the balloon type property. However, there is often an internal structure which cannot be accessed from outside. This is the reason for the term sandwich object: its internal structure can be accessed only via the sandwich object, but parts of its structure are known externally. It is the sandwich object which decides what is external. Figure 1 visualizes this idea. Based on the definition of memory states in Appendix A, a state¹ of a program is a triple (OBJ, REF, ROOTS) where OBJ is a set of objects, STATE = (OBJ, REF) is a directed graph where o₁ →ᵃ o₂ ∈ REF iff there is an attribute a of object o₁ that refers to object o₂, and ROOTS ⊆ OBJ is the set of objects referred to by the environment env. In particular, obj ∈ ROOTS iff there is a frame f ∈ env such that obj = f↓₂ or there is a variable x such that (x, obj) ∈ f↓₁. PRED obj and outdeg obj denote the direct predecessors and the number of outgoing edges of obj in the graph STATE, respectively. obj₁ →* obj₂ denotes²
¹ We speak of states instead of memory states because the instruction pointer and the current method play no role in our discussion.
² →⁺ denotes that there is at least one edge in the path.
Fig. 1. A sandwich, consisting of an upper slice and a lower slice
that there is a path from obj₁ to obj₂ in (OBJ, REF). An object o is reachable iff there is an object o₀ ∈ ROOTS such that o₀ →* o. Otherwise it is unreachable. [10,11] defines an operational semantics based on this definition of states. Suppose that there is a state transition such that objects become unreachable in a state. Then, they will be unreachable forever. Therefore, we can assume w.l.o.g. that unreachable objects are removed, i.e. no state contains unreachable objects. The paper does not require any further knowledge of state transitions.

Definition 1 (Sandwich Objects and Types). Let A be a class whose attributes are all private. Let s = (OBJ, REF, ROOTS) be a state and x be an object of class A. The set INTERNAL_x^(s) of objects internal to x is the smallest set satisfying
INTERNAL_x^(s) = {y : x →⁺ y ∧ PRED y ⊆ INTERNAL_x^(s) ∪ {x}}
x is a sandwich object or upper slice iff INTERNAL_x^(s) ≠ ∅ or outdeg x = 0. All objects z ∈ OBJ \ ({x} ∪ INTERNAL_x^(s)) are external to x. If outdeg x ≠ 0, the lower slice of x is the set of all external objects that have a predecessor y ∈ INTERNAL_x^(s). A sandwich is a sandwich object x together with the set of its internal objects. A is a sandwich type iff for all states s every object x of type A in s is a sandwich object.

Remark 1. If there is no lower slice, a sandwich object is also a balloon object according to [2]. Observe that the attributes of internal objects need not be private.

Example 1. Consider the following implementation of a doubly-linked list (next and previous are used to navigate in the list):

class LIST (T ) is
  private head : LIST CELL(T );
  private end : LIST CELL(T );
  private current : LIST CELL(T );
An Approach to Improve Locality Using Sandwich Types
197
  previous() is · · · end;
  next() is · · · end;
  insert(T ) is · · · end;
  delete() is · · · end;
  elem() : T is · · · end;
  is empty() : BOOL is · · · end;
  at head() : BOOL is · · · end;
  at end() : BOOL is · · · end;
end

class LIST CELL(T ) is
  elem : T ;
  previous : LIST CELL(T );
  next : LIST CELL(T );
end

For every type T, LIST(T) is a sandwich type. Let x : LIST(T). Then x is a sandwich object. Examples of internal objects are the head and the end of x. The elements of the list are the lower slice of the sandwich. Container types are typical examples of sandwich types.

Internal objects can be reached only via the upper slice, i.e.:

Lemma 1. Let s = (OBJ, REF, ROOTS) be a state and x ∈ OBJ a sandwich object of type A. Then for every y ∈ OBJ: y ∈ INTERNAL_x^(s) iff for every z ∈ OBJ satisfying z →* y one of the following conditions holds:
(i) z ∈ INTERNAL_x^(s).
(ii) Each path from z to y contains x.

Proof. “⇒”: Suppose this were not the case, i.e. there is a z ∉ INTERNAL_x^(s) and a path π from z to y not containing x. Since y ∈ INTERNAL_x^(s) there must be u, v ∈ π such that u ∉ INTERNAL_x^(s), v ∈ INTERNAL_x^(s), and u → v ∈ REF. The definition of INTERNAL_x^(s) implies that u = x, contradicting our assumption that π does not contain x.
“⇐”: Suppose (i) and (ii) hold, but y ∉ INTERNAL_x^(s). Then either x ↛⁺ y or there is a z ∈ PRED y such that z ∉ INTERNAL_x^(s) ∪ {x}. The latter contradicts (ii). Thus x ↛⁺ y. Consider a z ∈ OBJ such that z →⁺ y. Then (i) must hold, since (ii) is excluded by x ↛⁺ y. But (i) implies x →⁺ z and hence x →⁺ y, contradicting x ↛⁺ y. Thus, there is no path to y, i.e. y cannot be reached. This is what we excluded from the definition of states.

The next lemma states that sandwiches are either disjoint or nested (i.e. non-overlapping):

Lemma 2. Let s = (OBJ, REF, ROOTS) be a state, x ∈ OBJ be any sandwich object of type A and y ∈ OBJ be a sandwich object of type B. If y is internal to x, then every object of the lower slice of x is external to y.
Proof. Suppose there is a state s such that y ∈ INTERNAL_x^(s) and there is an object u ∈ INTERNAL_y^(s) in the lower slice of x. Figure 2 visualizes this situation. Since u is in the lower slice of x, it is external to x, i.e., there is a path to u from an object w external to x which does not contain x. Since u ∈ INTERNAL_y^(s), by Lemma 1, every path from an object external to y to u must contain y. Thus, w must be internal to y.
Fig. 2. Contrary of Lemma 2 (objects w, x, y, u, v and the upper and lower slices of x and y)
Let v be an arbitrary object external to x. Since y is internal to x, every path from v to y must contain x (by Lemma 1). Since w is internal to y, every path from x to w must contain y (by Lemma 1). Hence, every path from v to w must contain x. This contradicts the fact that w is external to x.

Example 2. Consider the class HASHTAB(T) with collision resolution by chained lists:

class HASHTAB (T ) is
  private tab : ARR[n](LIST (T ));
  insert(x : T ) is · · · end;
  delete is · · · end;
  member (T ) is · · · end;
end

Objects of class HASHTAB(T) are sandwich objects. The collision lists in the array tab are also sandwich objects. Each lower slice of a collision list is contained in the lower slice of the hash table.
3
Performance Improvement Using Object Heaps
In general, the number of objects that will be allocated in a sandwich’s heap is unknown at compile time. Thus, the heap must be able to grow (and shrink)
at runtime. To achieve locality of reference for the objects in a heap, the latter must extend over a minimal number of physical memory pages, i.e., all but at most one of the pages used must be completely filled with allocated objects. We guarantee this by allocating a contiguous³ area of virtual memory for a heap and doubling its size whenever the heap would overflow. Since it is not possible to grow a heap in place, we copy its data to a new memory area. This works in amortized constant time, see [3]. Note that all references pointing at objects in a sandwich’s heap come from inside that heap or from its upper slice. We exploit this fact by using a compacting copy garbage collector [9] to move the objects from the old memory area to the new one. The root set for the collector is the singleton set containing just the sandwich object. The important advantage is that the heap of a sandwich can be copy collected independently of all external objects. If after garbage collection less than a quarter of the heap is occupied by allocated objects, we halve its size. Whenever a sandwich object is garbage collected, its heap can be deleted at once. (A code sketch of this heap organisation appears at the end of this section.) For our measurements we use the small list example. The test program first creates two empty lists. Then, we alternately insert single elements into both lists until they both contain MAX elements. Afterwards, the test program iterates through each list separately ITER times. Figure 3 shows the runtime of these iterated list traversals depending on the length of the lists. We have chosen the values of MAX and ITER so that their product is constant (10⁷). In the first part of the plot, runtime decreases with the length of the list because the iteration overhead becomes less significant. Both axes of the plot are logarithmically scaled. The x-axis shows the number of elements in the list, while the y-axis shows the total runtime in seconds. The values are measured on a 200 MHz i586 Linux system. Each program is run 10 times and the smallest elapsed time is shown in Figure 3. The curve labeled single shows run times for the usual implementation (i.e. one heap for all objects created by the program). The multi curve denotes the result of our method: As noted above, LIST(T) is a sandwich type. Both list objects are sandwich objects by themselves. Thus, they have their own heaps for their internal LIST CELL(T) objects. There is a third heap for all other objects created by the program. As long as all list elements fit into the data cache, both variants have approximately the same running time (< 0.3 seconds). If the lists contain more than approximately 100 elements, cache misses occur in the single heap implementation. With the multi heap implementation this effect is postponed to lists of approximately 500 elements because of the better locality. For 500 elements, the multi heap variant is almost 5 times faster than the single heap variant. Also after the sharp increase in run time of the multi heap variant due to cache misses and page faults, the multi heap variant still clearly outperforms the single
³ If it is guaranteed that the sizes of all internal objects evenly divide the physical page size, it would not matter whether the pages are contiguous in virtual memory.
Fig. 3. Measurements: lists. Log-log plot of the total runtime in seconds (y-axis, 0.1 to 10) against the number of elements in a list (x-axis, 100 to 10⁶), for the multi and single heap variants.
heap version. Fewer memory pages have to be accessed to traverse the lists. For very long lists, results for classical allocation stay below 3 seconds, while those for multiple heaps never greatly exceed 2 seconds. An overall improvement by our method of about 25 per cent emerges in this case. In general the improvement is even larger: for lists of around 10000 elements the multi heap variant is twice as fast. Although the test program on lists is artificial, it reflects a situation which occurs in practice. Consider for example hash tables with collision resolution by chaining. Each insertion or look-up in the hash table traverses a collision list. Furthermore, it is unlikely that the collision lists are stored in contiguous memory cells. This is precisely the situation covered by our experiment. Figure 4 demonstrates this argument. The implementation with collision lists reuses the single and multiple heap implementations for LIST. A simple modulo hashing distributes elements equally over the lists (keys are taken modulo the table size). Searching for a key that is not present in the hash table means that one entire collision list has to be traversed once the hash value is computed. The collision lists are treated analogously to the LIST example. MAX elements are distributed over a hash table. Again, MAX and ITER are chosen such that their product is 10⁷. Figure 4 shows for hash table sizes 23 and 501 that the relative behavior is similar to the list example. The sharp increase in run time due to cache misses occurs later as the hash table size increases; however, the single heap variant is always outperformed.
Fig. 4. Measurements: hashtables. Log-log plot of the total runtime in seconds (1 to 100) against the number of elements (100 to 10⁶), for hashtable sizes 23 and 501, each with multi and single heap variants.
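As a rough illustration of the heap organisation described above, the following OCaml sketch keeps one growable array per sandwich and doubles it on overflow. It ignores the copying collector and the halving on underflow, and all names are ours.

  (* A per-sandwich heap with the doubling strategy: grow by copying into
     a twice-as-large area when full (amortized O(1) per allocation, [3]). *)
  type 'a heap = { mutable slots : 'a option array; mutable used : int }

  let create_heap () = { slots = Array.make 8 None; used = 0 }

  let alloc (h : 'a heap) (v : 'a) : int =
    if h.used = Array.length h.slots then begin
      let bigger = Array.make (2 * Array.length h.slots) None in
      Array.blit h.slots 0 bigger 0 h.used;   (* copy live objects *)
      h.slots <- bigger
    end;
    h.slots.(h.used) <- Some v;
    h.used <- h.used + 1;
    h.used - 1                                (* slot index as “address” *)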
4
Recognition of Sandwich Types
This section shows how sandwich types can be recognized in a program. We first derive a sufficient condition for sandwich types. This sufficient condition abstracts from the state by considering the types of the objects. In particular, we consider the type graph of an object-oriented program, i.e. a graph TG = (CLASSES, USE) where CLASSES are the classes of a program and (A, B) ∈ USE iff A has an attribute of type B or has a method with parameter type B or return type B. ⇒* denotes the reachability relation in type graphs. A class A is recursive iff A ⇒⁺ A. Now, we lift Definition 1 to types.

Definition 2. Let TG = (CLASSES, USE) and A ∈ CLASSES be a non-recursive class whose attributes are private. Classes which are parameter types or return types of methods of A are called the accessible types of A. ACCESSIBLE_A denotes the set of accessible types of class A. The set INTERNAL_A of classes that are internal to A is the smallest set satisfying
INTERNAL_A = {B : A ⇒⁺ B ∧ B ∉ ACCESSIBLE_A ∪ {A} ∧ PRED B ⊆ INTERNAL_A ∪ {A}}

All classes B ∈ CLASSES \ INTERNAL_A are external to A. The following lemma relates type graphs and states.

Lemma 3. Let TG = (CLASSES, USE) be a type graph of a program and s = (OBJ, REF, ROOTS) an arbitrary state. Then x →* y implies type(x) ⇒* type(y) for every two objects x, y ∈ OBJ.
Proof. From Corollary 1 it follows that x → y ∈ REF implies (type(x), type(y)) ∈ USE. The claim follows by induction.

A class B internal to a class A can be reached only via A:

Lemma 4. Let TG = (CLASSES, USE) be a type graph of a program and A be a non-recursive class whose attributes are private, and INTERNAL_A ≠ ∅. Then, for any class C ∈ INTERNAL_A and every class B such that B ⇒* C, one of the following properties holds:
(i) B ∈ INTERNAL_A;
(ii) B is external to A and every path from B to C contains A.

Proof. Analogous to Lemma 1.

Every object y whose type is internal to a class A is either reachable from an object z on the stack whose type is internal to A, or internal to a sandwich object of class A:

Lemma 5. Let TG = (CLASSES, USE) be the type graph of a program and A be a non-recursive class whose attributes are private and INTERNAL_A ≠ ∅. For every state s = (OBJ, REF, ROOTS) and for every object y with type(y) ∈ INTERNAL_A there is an object x of type A such that y ∈ INTERNAL_x^(s) or an object z ∈ ROOTS of type internal to A such that z →* y.

Proof. Suppose there is a C ∈ INTERNAL_A and an object y of type C such that y is not internal to an object x of type A. Suppose there is an object z with a path π from z to y where type(z) ∉ INTERNAL_A and the type of every u ∈ π is different from A. By Lemma 3 there is a path π′ from type(z) to type(y) not containing A. This contradicts Lemma 4(ii). Thus, type(z) ∈ INTERNAL_A for all objects z such that z →* y. Since A ∉ INTERNAL_A, by Lemma 3 there is no object of a type external to A that can reach z or x. Since s does not contain unreachable objects, there must be an object w ∈ ROOTS of type internal to A such that w →* y.

There is a nesting property analogous to the nesting of sandwiches (cf. Lemma 2):

Lemma 6. Let TG = (CLASSES, USE) be a type graph of a program, A be a non-recursive class whose attributes are private and INTERNAL_A ≠ ∅, and B be a non-recursive class whose attributes are private and INTERNAL_B ≠ ∅. If there is a C ∈ ACCESSIBLE_B ∩ INTERNAL_A, then ACCESSIBLE_A ∩ INTERNAL_B = ∅.

Proof. Analogous to Lemma 2.

If a class B is recursive and reachable from a class A, all classes in the strongly connected component of TG are either external or internal to A.
Lemma 7. Let TG = (CLASSES, USE) be the type graph of a program, A be a non-recursive class where all attributes are private and INTERNAL_A ≠ ∅, and B be a type reachable from A, i.e. A ⇒⁺ B. If B is recursive, then either all C ∈ ℬ are external to A or all C ∈ ℬ are internal to A, where ℬ is the strongly connected component of TG containing B.

Proof. Suppose there is a class C ∈ ℬ internal to A and a class C′ ∈ ℬ external to A. Since ℬ is a strongly connected component and A is non-recursive, there is a path from C′ to C in TG not containing A. Then, by Lemma 4, A cannot be a non-recursive class satisfying INTERNAL_A ≠ ∅ whose attributes are private, i.e. the assumptions of Lemma 7 are violated.

Finally, we show that all objects of classes which have internal classes are sandwich objects:

Theorem 1. Let TG = (CLASSES, USE) and A be a non-recursive class whose attributes are private and of unaccessible types. Then, for every state s = (OBJ, REF, ROOTS), all objects x ∈ OBJ of type A are either sandwich objects or none of the attributes of x refer to objects.

Proof. Suppose that there is a state s = (OBJ, REF, ROOTS) which contains an x ∈ OBJ of type A that is not a sandwich object and one of whose attributes refers to an object y. Since the type of this attribute is not accessible, this attribute refers to an object y with type(y) ∈ INTERNAL_A. Since x is not a sandwich object, there must be an object w with a path π from w to y not containing x. By Lemma 5 there is a sandwich object z of type A in path π such that y is internal to z. But then there cannot be a reference to y from x, contradicting our assumption.

Algorithm sandwich types (defined below) identifies the types satisfying the sufficient condition of Theorem 1. The algorithm computes these types by maintaining a set of candidates. The invariant is that all classes which are not candidates do not satisfy the sufficient condition of Theorem 1. After the last step, every candidate satisfies the sufficient condition of Theorem 1, i.e. they are sandwich types. Furthermore, for every sandwich type A, the set of its internal types is computed. Objects of these types are allocated in the heap associated with sandwich objects of type A. The algorithm sandwich types performs the following steps (a code sketch follows after the complexity analysis in Theorem 3):

1. Compute the type graph TG = (CLASSES, USE) of π. Define the set Candidates of candidates to be the set of all classes that have only private attributes.
2. Compute the strongly connected components SCC of TG and the reduced graph, i.e. RUSE = (SCC, RE) where (𝒜, ℬ) ∈ RE iff there are A ∈ 𝒜 and B ∈ ℬ such that (A, B) ∈ USE.
3. Let Candidates := {A ∈ Candidates : {A} ∈ SCC ∧ (A, A) ∉ USE}.
4. For every A ∈ Candidates compute its accessible types ACCESSIBLE_A. Remove all classes A from Candidates which contain an attribute whose type is accessible.
5. For every A ∈ Candidates define and perform the following step (starting with I_A = ∅) until I_A does not change:

   I_A := I_A ∪ {B ∈ CLASSES : B ∉ ACCESSIBLE_A ∧ ∀C ∈ PRED_B : C ∈ I_A ∪ {A}}.

6. Remove every A from Candidates where I_A = ∅.
7. Declare every A ∈ Candidates to be a sandwich type and define INTERNAL_A = I_A.

Remark 2. Algorithm sandwich types recognizes all examples of Section 2.

The following lemmas explain algorithm sandwich types:

Lemma 8 (Step 1). After Step 1, all classes A ∈ CLASSES \ Candidates violate the assumption of Theorem 1.

Proof. The assumption of Theorem 1 requires that all attributes of a class are private.

Lemma 9 (Step 3). After Step 3, all classes A ∈ CLASSES \ Candidates violate the assumption of Theorem 1.

Proof. Suppose (A, A) ∈ USE or there is a strongly connected component 𝒜 ∈ SCC with |𝒜| > 1 such that A ∈ 𝒜. In both cases A ⇒+ A, i.e. A is recursive. Hence, the assumptions of Theorem 1 are violated.

Lemma 10 (Step 4). After Step 4, all classes A ∈ CLASSES \ Candidates violate the assumption of Theorem 1.

Proof. Let A be a class not in Candidates after Step 4. Suppose it is not eliminated by Steps 1 and 3. Then it is eliminated by Step 4, i.e. it contains an attribute whose type is accessible. Thus, the assumption of Theorem 1 is violated.

Lemma 11 (Step 5). After Step 5, for all classes A ∈ Candidates, I_A contains the set of all classes internal to A.

Proof. Step 5 is a closure algorithm in the lattice of sets (ordered by the subset relation) starting with the smallest element. Each step increases the set. Thus, by the fixpoint theorem of Tarski [13], the smallest set satisfying

   I_A = I_A ∪ {B ∈ CLASSES : B ∉ ACCESSIBLE_A ∧ ∀C ∈ PRED_B : C ∈ I_A ∪ {A}}
is computed. It is not hard to see that I_A is also the smallest set satisfying

   I_A = {B ∈ CLASSES : A ⇒+ B ∧ B ∉ ACCESSIBLE_A ∧ ∀C ∈ PRED_B : C ∈ I_A ∪ {A}}.

Thus, the claim follows by Definition 2.

Lemma 12 (Step 6). After Step 6, for every class A ∈ CLASSES: A ∈ Candidates iff A is non-recursive and contains only private attributes with types internal to A.

Proof. Before Step 6, A ∈ Candidates iff A is non-recursive and contains only private attributes whose types are not accessible (Lemmas 8, 9, and 10). Thus, after Step 6, A ∈ Candidates iff A is non-recursive, contains only private attributes, and I_A ≠ ∅. The claim follows from Lemma 11 since it implies INTERNAL_A = I_A.

Theorem 2 (Correctness of Algorithm sandwich types). Let π be a program. Every class A declared by Algorithm sandwich types to be a sandwich type is a sandwich type, and INTERNAL_A is the set of its internal classes.

Proof. Follows directly from Step 7, Lemmas 11 and 12, and Theorem 1.

It remains to prove the time complexity of Algorithm sandwich types:

Theorem 3. Algorithm sandwich types terminates for every program π in time O(m · n), where n is the size of program π (i.e. the number of nodes in the abstract syntax tree of π) and m is the number of classes in the program.

Proof. Obviously, the type graph TG can be constructed in time O(n) by a traversal of the abstract syntax tree of π. Hence, |USE| = O(n). Step 2 can be performed in O(|CLASSES| + |USE|) = O(m + n) (see e.g. [1, Section 6.7]). While computing the strongly connected components of TG it is possible to mark the classes A such that (A, A) ∉ USE and {A} ∈ SCC. Thus, Step 3 can be executed in time O(|Candidates|) = O(m). The accessible types of a class A can be computed by a traversal of the abstract syntax tree of class A. Thus, Step 4 can be executed in time O(n). If the sets I_A are implemented by bit vectors over the classes, it is sufficient to set the bit for a class B to true in one iteration iff B ∉ ACCESSIBLE_A ∧ ∀C ∈ PRED_B : C ∈ I_A ∪ {A}. The initialization costs time O(m) and the test costs time O(|USE|) amortized over all classes. The maximum number of iterations is O(m). Hence, Step 5 can be executed in time O(m · n). The implementation of Step 5 can be extended with additional O(m) execution time such that every class A with I_A = ∅ is marked. Thus, the execution time of Step 5 remains O(m · n) and the execution time of Step 6 is O(m). It is not hard to see that the execution time of Step 7 is O(m²) = O(m · n), because m ≤ n.
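To make the shape of the algorithm concrete, the following OCaml sketch implements the fixpoint of Step 5 for a single candidate class A. The representation of the type graph, the accessibility test, and all names are our own assumptions, not code from the paper.

  (* Sketch of Step 5: compute the least set I_A with
     B in I_A  iff  B not in ACCESSIBLE_A and every predecessor of B
     lies in I_A ∪ {A}. The type graph is a list of USE edges
     (user, used); [accessible a b] stands in for the result of Step 4.
     A itself is excluded, since A ∉ INTERNAL_A. *)

  module S = Set.Make (String)

  let preds use b =
    List.filter_map (fun (c, d) -> if d = b then Some c else None) use

  let internal_of a classes use accessible =
    let step ia =
      List.fold_left
        (fun acc b ->
          if b <> a
             && not (accessible a b)
             && List.for_all (fun c -> c = a || S.mem c acc) (preds use b)
          then S.add b acc
          else acc)
        ia classes
    in
    let rec fix ia =
      let ia' = step ia in
      if S.equal ia ia' then ia else fix ia'
    in
    fix S.empty

Each iteration only adds classes, so with m classes the loop runs at most m times, which matches the O(m · n) bound of Theorem 3.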
5 Conclusions
We introduced the notions of sandwich types and sandwich objects and showed that using object heaps (i.e. one heap per sandwich object) can improve the execution time of object-oriented programs. Theorem 1 gives a sufficient condition for sandwich types, which is used to recognize sandwich types in object-oriented programs. Upon creation of a sandwich object, its heap is created and maintained independently of other heaps. Further work focuses on relaxing the assumptions of Theorem 1. In particular, the requirement that all attributes are private may be relaxed by defining the type of non-private attributes to be accessible. Another candidate for generalization is the notion of internal classes: Lemma 4 implies that every class can be internal to at most one class unless a nesting property is satisfied (Lemma 6). This excludes, e.g., that the type LIST_CELL in our example is used for more than one sandwich type. Hence, the next step is to relax Definition 2, allowing some classes to be internal to more than one class without nesting. If it were allowed in general that a class B is internal to more than one class, the condition INTERNAL_A ≠ ∅ would not be sufficient to imply that A is a sandwich type: it is not excluded that an object is internal to two different sandwich objects (cf. Figure 5). The key question is: what is the restriction such that INTERNAL_A ≠ ∅ implies that A is a sandwich type? Our further work will address this question.
[Figure 5: two sandwich objects, x : A and x : C, both contain references reaching a shared object z : B.]

Fig. 5. Situation if a class B is internal to different classes
An alias and pointer analysis may lead to additional improvements in the implementation of object heaps. For example, stacks, heaps, and lists may be implemented even more efficiently than sketched in this paper: there is no need for a general garbage collection on stacks, queues, and double-ended queues. Since their elements are inserted and deleted at the ends, it is sufficient to maintain pointers that mark the beginning and the end of the allocated part of the heap. When the heap of a list object is copied, it can be linearized, which further improves locality. Another issue that should be investigated is the influence of sandwich types on object-oriented design. It seems natural to use sandwich types when designing object-oriented programs, because they are a form of information hiding. Furthermore, aliasing and sharing of objects can be controlled.
A Bool – A Basic Object-Oriented Language
We define a language which is a prototype of the intermediate languages of many object-oriented languages. It is based on [10,11]. We do not consider inheritance or basic types such as integers and booleans, since these notions are not important for the discussion of sandwich types. Instead of introducing parameterized classes, we assume that in a program the parameters of every parameterized class are instantiated with types. We focus on the basic features of object-oriented languages (calling methods, accessing attributes, creating objects) and add a few statements required for making Bool Turing-complete (assignments, conditional statements, method returns). For simplicity, we consider only methods with return types (functions). Methods without return types (procedures) can be defined analogously. We define the abstract syntax, the static semantics (in particular the typing rules), and the dynamic semantics by abstract state machines.

A.1 Abstract Syntax
A program is a collection of classes together with a designated class MAIN containing a procedure main; main is called when the program is started. Fig. 6 shows the EBNF defining the abstract syntax of Bool. Attributes of a class A may be private. A private attribute of an object obj of class A can only be accessed while a method of obj is being executed. T1 × · · · × Tk → T is the signature of a method m(x1 : T1, . . . , xk : Tk) : T. The conditional statement requires further explanation. Consider a conditional statement if Des '=' Expr then n occurring in a method m of class A. If the condition Des '=' Expr is satisfied, the n-th statement of method m of class A is executed. Otherwise, the statement after the conditional statement is executed.
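For concreteness, the grammar of Fig. 6 can be transcribed into algebraic data types. The following OCaml sketch is our own rendering; the constructor and field names are ours, not the paper's.

  (* The abstract syntax of Bool (Fig. 6) as OCaml types; a sketch. *)
  type name = string and var = string and id = string
  type typ = name                          (* Type ::= Name *)
  type des = var list                      (* Des ::= (Var '.')* Var *)

  type expr =
    | Des of des
    | Call of des option * id * des list   (* [Des '.'] Id '(' Des, ... ')' *)
    | Void
    | New of typ                           (* '#' Type *)

  type stat =
    | Assign of des * expr                 (* Des ':=' Expr *)
    | Return of expr                       (* return Expr *)
    | If of des * expr * int               (* if Des '=' Expr then n *)

  type attr = { priv : bool; aname : var; atype : typ }
  type meth = { mname : id; params : (var * typ) list; rtype : typ;
                locals : (var * typ) list; body : stat list }
  type cls = { cname : name; attrs : attr list; methods : meth list }
  type prog = cls list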
A.2 Static Semantics
A Bool program must satisfy the following properties on classes, attributes, methods, variable names, and jump targets:

– All class names are pairwise distinct.
– For every class, the attribute names and method names are pairwise distinct.
– For every method, the parameter names and the names of local variables are pairwise distinct.
– For every method m, the jump targets of the conditional statements of m must be smaller than the number of statements of m.

Furthermore, we assume for simplicity⁴:

– All attribute names and all method names are different from class names.
⁴ This can be viewed as a result after name analysis.
Prog   ::= Class*
Class  ::= class Name is Attr* Method* end ';'
Attr   ::= [private] Var ':' Type ';'
Method ::= Id '(' [(Par ';')* Par] ')' ':' Type is (Par ';')* (Stat ';')* end ';'
Par    ::= Var ':' Type
Type   ::= Name
Stat   ::= Assign | Return | If
Assign ::= Des ':=' Expr
Return ::= return Expr
If     ::= if Des '=' Expr then n
Des    ::= (Var '.')* Var
Expr   ::= Des | Call | void | New
Call   ::= [Des '.'] Id '(' [(Des ',')* Des] ')'
New    ::= '#' Type

where Var is any variable name, Id is any procedure identifier, Name is any class name, and n is a natural number.
Fig. 6. Abstract Syntax of Bool
– Within every class A, for every method m, the names of parameters and local variables are different from the attribute names of A, the method names of A, and the class names.

It remains to define types. In our case, it is sufficient to assume that classes are types (identified by class names). For defining the typing rules, the following context information is required:

– Γ contains all classes with their names, attributes, and method signatures.
– A class A in which the attribute, method, statement, designator, or expression to be typed occurs.
– A method m in which the statement, designator, or expression to be typed occurs.

Notation: Γ, A, m ⊢ e : T denotes that within a given context, it can be derived that designator or expression e is of type T. Γ, A, m ⊢ s √ denotes that statement s is correctly typed within a given context. A program is statically correct iff every statement is correctly typed within the context in which it occurs. Γ, A ⊢ x : T denotes that class A has an attribute x of type T or a method x of signature T. Fig. 7 shows the typing rules of Bool using the above notations.
Axioms:

  Γ, A, m ⊢ void : T    (for all types T)

  Γ, A, m ⊢ #T : T

Rules:

  Γ, A ⊢ x : T
  -----------------    (for all methods m of A)
  Γ, A, m ⊢ x : T

  Γ, A, m ⊢ des : B    Γ, B ⊢ x : C
  ----------------------------------    (if x is not private in B)
  Γ, A, m ⊢ des.x : C

  Γ, A ⊢ m : T1 × · · · × Tk → T    Γ, A, m̄ ⊢ d1 : T1  · · ·  Γ, A, m̄ ⊢ dk : Tk
  ------------------------------------------------------------------------------
  Γ, A, m̄ ⊢ m(d1, . . . , dk) : T

  Γ, B ⊢ m : T1 × · · · × Tk → T    Γ, A, m̄ ⊢ des : B    Γ, A, m̄ ⊢ d1 : T1  · · ·  Γ, A, m̄ ⊢ dk : Tk
  ---------------------------------------------------------------------------------------------------
  Γ, A, m̄ ⊢ des.m(d1, . . . , dk) : T

  Γ, A, m ⊢ des : T    Γ, A, m ⊢ expr : T
  ----------------------------------------
  Γ, A, m ⊢ des := expr √

  Γ, A ⊢ m : T1 × · · · × Tk → T    Γ, A, m ⊢ expr : T
  -----------------------------------------------------
  Γ, A, m ⊢ return expr √

  Γ, A, m ⊢ des : T    Γ, A, m ⊢ expr : T
  ----------------------------------------
  Γ, A, m ⊢ if des = expr then n √
Fig. 7. Typing Rules for Bool
A.3 Abstract State Machines
We define the operational semantics by abstract state machines (ASMs). In this subsection we introduce ASMs as far as required for the operational semantics of Bool; for the general framework, we refer the reader to [5]. An ASM consists of a signature ∆ of the state space, an interpretation of ∆ (the initial state), and a set of transition rules (used for changing the interpretation of ∆). A state is an interpretation of ∆. It is convenient to assume that interpretations are algebras. In our examples, a transition rule has the form

  if Condition then Updates

where Condition is a term and Updates is a set of updates. An update has one of the following forms:

1. f(t1, . . . , tn) := t, for f ∈ ∆ and terms t, t1, . . . , tn.
2. extend M by o Updates end, where M ∈ ∆ is interpreted by sets, o is a new symbol, and Updates is a set of updates (here: only of form (1)).

Execution of update (1) means that t1, . . . , tn, and t are interpreted in the current state, and the interpretation of f is changed to t at the point (t1, . . . , tn) and
unchanged otherwise. Execution of update (2) means that the set M is extended by a new element⁵ o, and after this, the updates in Updates are executed. A transition rule fires if its condition evaluates to true in the current state; in this case, its updates are executed. In our example, at most one transition rule fires in any given state. A state is final iff no transition rule fires. It is easy to see that the transition rules define a state transition relation.

Notation: Upper case letters denote sets, lower case letters denote other elements of ∆. Sets are denoted as usual. X1 × · · · × Xn denotes the Cartesian product of the sets X1, . . . , Xn. Tuples are denoted as usual; x↓i denotes the projection to the i-th component of tuple x. X → Y denotes the set of relations R ⊆ X × Y which are functions; R(x) denotes the unique element y such that (x, y) ∈ R. X* denotes lists with element type X. [x|X] denotes the list obtained from list X by adding element x, [] denotes the empty list, and [x1, . . . , xk] denotes the list of elements x1, . . . , xk. As usual, hd and tl denote the head and tail of a list, respectively.
A.4 Operational Semantics
We assume that only statically correct programs are executed. The state space of Bool consists of the symbols defined in Table 1. A frame consists of a set of bindings for the local variables and parameters of the method being executed, the object to which the method belongs, and a return point (specified by a method and an instruction pointer). The memory state space is the set M = ∆ \ {curmethod, ip}. A memory state is an interpretation of the memory state space M.
OBJ                          set of objects
type ∈ OBJ → Name            dynamic type of an object
REF ⊆ OBJ × OBJ × Name       references between objects (via attributes)
ac ∈ OBJ                     the accumulator
env ∈ FRAME*                 the environment
curmethod ∈ Id               the method currently being executed
ip ∈ ℕ                       the instruction pointer

where  FRAME = SET(BINDING) × OBJ × Id × ℕ
       BINDING = Id × (OBJ ⊎ {void})

Table 1. State Space of Bool
⁵ This new element is taken from an infinite universe, called Reserve.
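Read operationally, Table 1 is just a record of dynamic functions. As a rough illustration (our encoding, not the paper's), the state space could be written as:

  (* Sketch of the Bool state space of Table 1; a frame is
     SET(BINDING) × OBJ × Id × ℕ, and void is modelled by [None]. *)
  type obj = int
  type frame = {
    bindings : (string * obj option) list;  (* BINDING = Id × (OBJ ⊎ {void}) *)
    self : obj;                             (* the object of the method *)
    ret_method : string;                    (* return point: method ... *)
    ret_ip : int;                           (* ... and instruction pointer *)
  }
  type state = {
    objs : obj list;                        (* OBJ *)
    types : (obj * string) list;            (* type ∈ OBJ → Name *)
    refs : (obj * obj * string) list;       (* REF ⊆ OBJ × OBJ × Name *)
    ac : obj;                               (* the accumulator *)
    env : frame list;                       (* env ∈ FRAME*, head = top frame *)
    curmethod : string;
    ip : int;                               (* the instruction pointer *)
  }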
The initial state is defined by the updates

  OBJ       := {system}
  type      := {(system, MAIN)}
  REF       := ∅
  ac        := system
  env       := [({(y1, void), . . . , (yk, void)}, system, undef, undef)]
  curmethod := main
  ip        := 0

where main contains the local variables y1, . . . , yk. For the definition of the operational semantics, it is convenient to assume that assignments, conditional statements, and return statements are decomposed into the instruction sequences shown in Table 2. load expr loads the object computed by expr into the accumulator, store des stores the object contained in the accumulator into the object designated by des, and beq des n compares the object in the accumulator with the object designated by des and, if they are equal, jumps to the n-th statement of the current method. The return instruction returns the object contained in the accumulator.

statement               instructions
des := expr             load expr; store des
if des = expr then n    load expr; beq des n
return e                load e; return

Table 2. Decomposition of Statements into Instructions
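The decomposition of Table 2 is a one-line translation per statement; the following OCaml sketch (our names, reusing the shape of the AST sketch in Appendix A.1) makes it explicit:

  (* Sketch of Table 2: statements decompose into accumulator instructions. *)
  type des = string list
  type expr = Des of des | Void
  type stat = Assign of des * expr | Return of expr | If of des * expr * int
  type instr = Load of expr | Store of des | Beq of des * int | Ret

  let decompose : stat -> instr list = function
    | Assign (d, e) -> [ Load e; Store d ]    (* des := expr *)
    | If (d, e, n)  -> [ Load e; Beq (d, n) ] (* if des = expr then n *)
    | Return e      -> [ Load e; Ret ]        (* return expr *)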
Notation: The operational semantics uses the following abbreviations and notations. Each of these can be formally defined; sometimes we leave the formal definition to the reader. self = hd(env)↓2 denotes the object in which the current method is executed. Cmd is instr denotes that instr is the instruction to be executed, i.e. if ip = i and curmethod = m, then instr is the i-th instruction of method m in type(self). binding = hd(env)↓1 denotes the bindings of the method currently being executed. is_local(x) is true iff x is a local variable or parameter of the method currently being executed. is_attribute(x) is true iff x is an attribute of type(self). We write o1 →ˣ o2 ∈ REF instead of (o1, o2, x) ∈ REF; o1 → o2 ∈ REF denotes that there is an attribute x such that o1 →ˣ o2 ∈ REF. succ(x, obj) denotes the object referenced by object obj via attribute x, i.e. the unique element o such that obj →ˣ o ∈ REF if it exists (otherwise it is void). In this case ref(obj, x) denotes this reference. bind(x) denotes the object bound by binding to local variable or parameter x of the current method.
object(des) denotes the object designated by designator des, i.e.

  object(des) = void                     if des = void
              = succ(x, self)            if des = x and is_attribute(x)
              = bind(x)                  if des = x and is_local(x)
              = succ(x, object(des′))    if des = des′.x for a designator des′

For a method m(x1 : T1; . . . ; xn : Tn) : T with local variables y1, . . . , yk,

  bindto(m, obj1, . . . , objn) = {(x1, obj1), . . . , (xn, objn), (y1, void), . . . , (yk, void)}

denotes the binding that binds obji to xi, i = 1, . . . , n. update_binding(x, o) updates the binding for x, i.e. binds o to x:

  update_binding(x, o) = binding := binding \ {(x, bind(x))} ∪ {(x, o)}

update(o1, x, o2) updates the object referenced by object o1 via attribute x to the object o2, i.e.

  update(o1, x, o2) = undefined                                  if o1 = void
                    = REF := REF \ {ref(o1, x)}                  if o1 ≠ void and o2 = void
                    = REF := REF \ {ref(o1, x)} ∪ {o1 →ˣ o2}     otherwise
Figure 8 shows the transition rules for loading objects into the accumulator, using the above notations. Figure 9 shows the other transition rules. A transition rule is not applicable if it would access an attribute or call a method of void. The following theorem relates the typing rules to the dynamic types of objects. The proof is an induction on the number of state transitions and is left to the reader.

Theorem 4. Let π ∈ Bool be any program, q be any state of π reachable from an initial state, A = type(self) and m = curmethod in state q. Then, the following properties hold:
(a) If Γ, A, m ⊢ des : T, then type(object(des)) = T or object(des) = void in state q.
(b) If Γ, A, m ⊢ expr : T and the command to be executed in state q is a store instruction (resulting from an assignment des := expr), a return (resulting from return expr), or a conditional statement (resulting from if des = expr then n), then type(ac) = T or ac = void in state q.

This theorem directly implies:

Corollary 1. Let π ∈ Bool be any program and q be any state of π reachable from an initial state. Then, for all objects obj, obj′ ∈ OBJ with obj →ᵃ obj′ in state q, the following properties are satisfied:
(a) T = type(obj) contains an attribute a.
(b) If Γ, T ⊢ a : T′, then type(obj′) = T′.
if Cmd is load des then
  ac := object(des)
  ip := ip + 1

if Cmd is load #T then
  extend OBJ by o
    ac := o
    type(o) := T
  end
  ip := ip + 1

if Cmd is load m(d1, . . . , dn) then
  env := [(bindto(m, object(d1), . . . , object(dn)), self, curmethod, ip + 1) | env]
  curmethod := m
  ip := 0

if Cmd is load des.m(d1, . . . , dn) ∧ object(des) ≠ void then
  env := [(bindto(m, object(d1), . . . , object(dn)), object(des), curmethod, ip + 1) | env]
  curmethod := m
  ip := 0

Fig. 8. Transition Rules for Loading the Accumulator

if Cmd is store des ∧ object(des) ≠ void then
  store(des, ac)
  ip := ip + 1
where
  store(des, o) = update_binding(des, o)      if is_local(des)
                = update(self, des, o)        if is_attribute(des)
                = update(object(des′), x, o)  if des = des′.x

if Cmd is return ∧ curmethod ≠ main then
  env := tl(env)
  curmethod := hd(env)↓3
  ip := hd(env)↓4

if Cmd is beq des n ∧ object(des) = ac then
  ip := n

if Cmd is beq des n ∧ object(des) ≠ ac then
  ip := ip + 1

Fig. 9. Other Transition Rules
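As a small illustration of Fig. 9, the two beq transitions can be written as one step function over the state record sketched earlier; object_of is an assumed helper that resolves a designator, and the code is ours, not the paper's.

  (* Sketch of the beq rules of Fig. 9: compare the accumulator with the
     designated object and either jump to statement n or fall through. *)
  let step_beq object_of state des n =
    if object_of state des = state.ac
    then { state with ip = n }             (* beq taken *)
    else { state with ip = state.ip + 1 }  (* beq not taken *)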
References

1. A. V. Aho, J. E. Hopcroft, and J. D. Ullman. Data Structures and Algorithms. Addison-Wesley, 1983.
2. P. S. Almeida. Balloon types: Controlling sharing of state in data types. In ECOOP '97 – Object-Oriented Programming, volume 1241 of Lecture Notes in Computer Science, pages 32–59, 1997.
3. T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. McGraw-Hill, 1991.
4. Free Software Foundation. The GNU C library. URL: http://www.gnu.ai.mit.edu/software/libc/libc.html.
5. Y. Gurevich. Evolving Algebras: Lipari Guide. In E. Börger, editor, Specification and Validation Methods. Oxford University Press, 1995.
6. Barry Hayes. Using key object opportunism to collect old objects. In Proceedings of the OOPSLA '91 Conference on Object-Oriented Programming Systems, Languages and Applications, pages 33–46, November 1991. Published as ACM SIGPLAN Notices, volume 26, number 11.
7. J. J. Horning. A case study in language design: Euclid. In F. L. Bauer and M. Broy, editors, Proceedings of the International Summer School on Program Construction, volume 69 of LNCS, pages 125–132, Marktoberdorf, FRG, July–August 1978. Springer.
8. Daniel H. H. Ingalls. The Smalltalk-76 programming system design and implementation. In Conference Record of the Fifth Annual ACM Symposium on Principles of Programming Languages, Tucson, Arizona, pages 9–16. ACM, January 1978.
9. Richard E. Jones. Garbage Collection: Algorithms for Automatic Dynamic Memory Management. Wiley, July 1996. With a chapter on distributed garbage collection by R. Lins. Reprinted February 1997.
10. H. W. Schmidt and W. Zimmermann. A complexity calculus for object-oriented programs. Journal of Object-Oriented Systems, 1(2):117–147, 1994.
11. H. W. Schmidt and W. Zimmermann. Reasoning about complexity of object-oriented programs. In E.-R. Olderog, editor, Programming Concepts, Methods and Calculi, volume A–56 of IFIP Transactions, pages 553–572, 1994.
12. James W. Stamos. Static grouping of small objects to enhance performance of a paged virtual memory. ACM Transactions on Computer Systems, 2(2):155–180, May 1984.
13. A. Tarski. A lattice-theoretical fixpoint theorem and its applications. Pacific J. Math., 5:285–309, 1955.
Garbage Collection via Dynamic Type Inference — A Formal Treatment —

Haruo Hosoya and Akinori Yonezawa

Department of Information Science, The University of Tokyo
Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033, Japan
{haruo,yonezawa}@is.s.u-tokyo.ac.jp
Abstract. We study a garbage collection (GC) scheme — what we call type inference GC — that dynamically performs type inference. In contrast to conventional garbage collection, which can collect only unreachable objects, this scheme can collect objects that are reachable yet semantically garbage. The idea is to exploit ML-like polymorphic types, which can tell whether or not each object may be used in the rest of the computation. Some previous work has studied algorithms for this GC scheme. However, the descriptions left details of the methods obscure and gave no formal correctness proof capturing those details; this, we believe, makes the descriptions unconvincing for implementors. This paper aims to present a trustworthy specification of the GC scheme. We first define an underlying language that suitably reflects implementation details, on top of which we formulate an algorithm of type inference GC, and we formally prove its correctness. A significant point in our formulation is that we specify how to deal with Hindley-Milner polymorphism. Furthermore, we present experimental results and discuss in which cases this GC scheme is beneficial.
1 Introduction
In a program, some objects (i.e., heap-allocated values) are never accessed after a given point of execution. We call such objects semantic garbage at that point. These include not only unreachable objects, with no pointers from the live part of the heap, but also some reachable objects whose values are not needed in the rest of the computation. For example, consider the following functional program:

  let l = [[1], [2], [3]]
      g = λf. (f l)
  in g length end

Suppose that a garbage collection (GC) is invoked just before the application (f l) is evaluated. Since the elements of the list l are reachable, these elements are retained by conventional trace-based GC schemes, such as mark & sweep and copying. However, they will never be accessed from then on. We could enable conventional GCs to collect such semantic garbage by inserting assignments
that "nullify" variables or fields of data structures holding the garbage. Clearly, however, this would impose a heavy burden on programmers and considerably decrease the readability of programs.

In order to collect such reachable garbage, recent work has suggested a sophisticated GC scheme — what we call type inference GC — that exploits ML-like polymorphic type systems [5,4]. Basically, the GC scheme traces objects using their types, which indicate their structures. These types are recovered partly from static types. However, due to ML polymorphism, some static types contain type variables whose actual types are determined dynamically. The GC scheme infers such types at GC time. Not all actual types for type variables can be recovered. Surprisingly, however, objects whose types are not recovered, i.e., remain type variables, can safely be collected. In the above example, the global state just before (f l) is evaluated can be represented as

  letrec l1 = [1]
         l2 = [2]
         l3 = [3]
         l  = [l1, l2, l3]
         f  = length
  in f l

where the bindings represent the heap, and the body expression represents the remaining computation, which is only the application of f to l. In other words, the body expression represents the stack. Let the pointers f and l in the stack be given the static types t1 → t2 and t1, respectively. Suppose that the GC is triggered at this point. The GC first traces the stack, which contains the two pointers f and l. Since the static type t1 of l does not tell what type of object it refers to, the GC suspends the tracing of l. The GC then traces f with static type t1 → t2, and performs unification between this type and the type ∀t.(t list) → int of the object length. Since this instantiates the static type t1 of l to t list, the GC traces l, giving type t to the pointers l1, l2, and l3 to the list elements. Finally, these elements are collected as semantic garbage. This is intuitively correct because the pointers l1, l2, and l3 can be assumed to refer to objects of any type (nil, for example) for the rest of the computation to be performed correctly.

Looking at the algorithm more closely, notice that we had to remember the pointer l that was given a type variable, because the type variable might be instantiated in the rest of the tracing. Goldberg and Gloger use a mechanism called a "defer-list" for this purpose [5]. The GC keeps an association of addresses with types. When the GC finds a pointer, it associates the pointed address with the static type given to the pointer. The GC traces an address only when it is associated with a concrete type (a type other than a type variable, such as a function type or a list type). If there are multiple pointers to an address, the GC may trace the same address more than once. When such sharing occurs, the GC unifies the types given to the pointers. For example, if the GC encounters another pointer to the list that is given type (int list) list, then it unifies t list and (int list) list.

From the above presentation, we can see that type inference GC is a rather complicated scheme and raises several non-trivial questions. First, it is not
obvious that the GC-time unifications always succeed, especially those for shared addresses. Second, it is subtle how to unify polymorphic types that are not trivially related (e.g., t1 → t2 and ∀t.(t list) → int). Third, it is not easy to see that the collected objects are actually garbage. Previous papers on the topic have not addressed these questions rigorously.

We therefore aim in this paper to give a trustworthy specification of type inference GC, in the sense that it specifies a GC algorithm in reasonable detail, and that it is given robust trust by a formal proof of its correctness. To this end, we first define an underlying language that suitably treats sharing and embeds static type information. On top of it, we construct an algorithm of type inference GC that uses the information contained in the language. We then prove the correctness of the algorithm. The correctness consists of termination, which includes the success of each GC-time unification, and soundness with respect to an operational semantics of the underlying language, which tells us that collected objects are actually garbage. A significant point in our formulation is that we use Hindley-Milner polymorphism, and specify how to deal with polymorphic types in the GC. Although our framework uses a strict functional language, we believe that it can be applied to languages with other evaluation orders, such as lazy functional languages. In addition, our framework can be extended with other popular features such as tuples, variants, and ML references. It should be remarked, however, that this scheme does not work for languages with operations that may inspect the types of their actual arguments, such as polymorphic equality, typecase constructs, and dynamic type checks.

Among the many questions on practicality, we are especially interested in how much semantic garbage can be collected by the type inference GC scheme in comparison with conventional GCs. We examined this in a preliminary experiment using a prototype interpreter and a type inference GC that we implemented in Standard ML. From the results, though we cannot claim that the scheme benefits every application, we expect that it is particularly beneficial for programs with a structure that we call the phase structure.

The rest of this paper is organized as follows. After defining basic notation in Section 3, we present our language in Section 4. On top of this, we describe an algorithm of the GC scheme in Section 5, together with its correctness. We formally proved all theorems in this paper. In Section 6, we show the results of our experiment and discuss them. In Section 7, we remark on the cost of this scheme. Section 8 reviews related work. In Section 9, we conclude and touch upon future directions.
2 Underlying Idea
When discussing GC algorithms, we often consider a notion of accessibility, with which we may formulate a GC algorithm so that it repeatedly finds an accessible object and traces it. In trace-based GC schemes in particular, all objects referred to from a (live) object are accessible. For example, an object
λx.(f l) has two free variables f and l, and the objects referred to by these variables are accessible. In the type inference GC scheme, we use a type-based accessibility. The above object has the following type judgment:

  {f : t1 → t2, l : t1} ⊢ λx.(f l) : t3 → t2

Usually, we would read this as "under the type environment {f : t1 → t2, l : t1}, the object λx.(f l) has type t3 → t2." However, the type judgment carries more interesting information: the type environment {f : t1 → t2, l : t1} tells how the object λx.(f l) accesses its free variables. Specifically, the object accesses the variable f at most as a function of type t1 → t2, and does not access the variable l. In our formulation, we refer to such a pair of a type environment and a type as a typing. For a given typing for an object, if it gives a free variable x a type other than a type variable, then we say that the object referred to by x is accessible.

This notion of accessibility can be understood as follows. As in the usual ML-like type systems, the well-typedness of an object is preserved by any instantiation of type variables. Moreover, well-typedness ensures that evaluation never gets stuck as long as any subexpression of the object is the redex of the evaluation. Therefore free variables that are assigned type variables can be assumed to refer to arbitrary objects. This intuitively means that the objects referred to by these free variables will never be accessed.¹
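The accessibility test itself is tiny; under the monotypes defined in the next section, it is just a pattern match. The following is a sketch with our own names, not the paper's code.

  (* Sketch: a free variable's referent is accessible iff the typing gives
     it something other than a bare type variable. *)
  type ty = TVar of int | TInt | TArrow of ty * ty

  let accessible (gamma : (string * ty) list) (x : string) : bool =
    match List.assoc_opt x gamma with
    | Some (TVar _) | None -> false
    | Some _ -> true

  (* For {f : t1 → t2, l : t1}: f is accessible, l is not. *)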
3 Basic Definitions
Let TVar be a set of type variables, ranged over by s, t, . . .. Define monotypes and polytypes as usual:

  (monotypes) τ ::= t | int | τ1 → τ2
  (polytypes) σ ::= τ | ∀t.σ

Monotypes are either type variables, the integer type, or function types. The type ∀t1. . . . ∀tn.τ is abbreviated as ∀t1 . . . tn.τ or ∀t̄.τ. Define Unq(∀t̄.τ) = τ. Let there be a set of (program) variables, ranged over by w, x, y, . . .. Addresses are a subset of the variables, ranged over by a, b, . . ., and are assumed to contain a distinguished address #. The address # will be used to be temporarily assigned to a variable. We define a type environment as a set of pairs of a variable (including #) and a polytype, in the form x : σ, with no variable x occurring twice:

  (type environments) Γ, ∆ ::= {x1 : σ1, . . . , xn : σn}
¹ Of course, this reasoning fails when we have some operations, such as typecase, that may inspect the types of objects.
We regard a type environment Γ as a finite function that maps x to σ for each x : σ ∈ Γ. A type substitution S is a substitution of monotypes for free type variables. FTV(τ) and FTV(Γ) denote the sets of free type variables occurring in τ and Γ, respectively. SΓ is defined by (SΓ)(x) = S(Γ(x)) for x ∈ Dom(Γ). Pairs of type environments and monotypes, written ⟨Γ, τ⟩, are called typings. The monotype τ of a typing ⟨Γ, τ⟩ is called its result type. A type τ′ is a substitution instance, or simply an instance, of τ via S, written τ ≺S τ′, iff Sτ = τ′. A typing ⟨Γ, τ⟩ is less instantiated than ⟨Γ′, τ′⟩, written ⟨Γ, τ⟩ ≺S ⟨Γ′, τ′⟩, iff SΓ ⊆ Γ′ and τ ≺S τ′. We omit the subscript S when it is not important. A monotype τ is a generic instance of a polytype σ = ∀t1 . . . tn.τ′, written σ ≥ τ, iff there is a type substitution S for t1, . . . , tn such that Sτ′ = τ. As for polytypes, we write σ ≥ σ′ iff there is an α-variant ∀s1 . . . sm.τ′ of σ′ such that no si (1 ≤ i ≤ m) occurs free in σ and σ ≥ τ′. When Dom(Γ) = Dom(Γ′), we write Γ ≥ Γ′ iff Γ(x) ≥ Γ′(x) for every x ∈ Dom(Γ).

Let f and g be finite mappings. Dom(f) and Rng(f) are the domain and the range of f, respectively. The function composition f ∘ g is defined by (f ∘ g)(d) = f(g(d)) for d ∈ Dom(g). When the domains of f and g are disjoint, the disjoint sum f ⊎ g is defined by (f ⊎ g)(d) = f(d) for d ∈ Dom(f) and (f ⊎ g)(d) = g(d) for d ∈ Dom(g). We write f|A for the restriction of f to the domain A.
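Since the GC-time unification of Section 5 works over exactly these monotypes, it may help to fix a concrete picture of unification here. The following OCaml sketch is a standard textbook unifier in our own encoding, not the paper's code.

  (* Sketch: most-general unification on monotypes t | int | τ → τ.
     A substitution is a list of (variable, type) bindings, resolved lazily. *)
  type ty = TVar of int | TInt | TArrow of ty * ty

  let rec resolve s = function
    | TVar a ->
        (match List.assoc_opt a s with Some t -> resolve s t | None -> TVar a)
    | t -> t

  let rec occurs s a t =
    match resolve s t with
    | TVar b -> a = b
    | TInt -> false
    | TArrow (t1, t2) -> occurs s a t1 || occurs s a t2

  let rec unify s t1 t2 =
    match resolve s t1, resolve s t2 with
    | TVar a, TVar b when a = b -> s
    | TVar a, t | t, TVar a ->
        if occurs s a t then failwith "occurs check" else (a, t) :: s
    | TInt, TInt -> s
    | TArrow (a1, b1), TArrow (a2, b2) -> unify (unify s a1 a2) b1 b2
    | _ -> failwith "constructor clash"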
4 Language
As mentioned in the Introduction, we consider an underlying language in which sharing and static type information are embodied. In the language, adopting ideas from [12,11], we use explicit temporary variables, environments, stacks, and heaps.

4.1 Source Language
Before describing how static type information is embedded in our language, we explain what static type information we use. Instead of the type information obtained by the usual static type inference, we use finer type information, proposed by Fradet [4]. In this method, we use the principal typing for each continuation. The motivation is as follows. The usual type information monolithically gives each variable a single type. However, for the continuation at the point where the GC is invoked, such type information may be too instantiated. For example, consider the following program:

  let x = hd (hd l) in length l
Because of the first line, l is given the type (t0 list) list. Suppose that the GC is invoked just before length l. Even though the elements of l will not be used, the GC cannot collect them because they are given the concrete type t0 list. We can (partly) overcome this problem by providing "minimal" type information individually for each continuation. Specifically, we may give l type (t0 list) list for let x = hd (hd l) in length l, and give l type t list for length l.

In our formulation, to express this static type information, we use as a source language a form of expressions in which each intermediate result is explicitly bound to a "temporary" variable (also known as A-normal form), and we annotate each expression with static type information. The annotation has the form ⟨Γ,τ⟩e, where ⟨Γ, τ⟩ gives a typing for e. That is, intuitively, Γ tells how e accesses its free variables, and τ tells at least what type e has, as explained in Section 2. One can easily see that this form is suitable for computing type information for each continuation at compile time. In the above example, we may give annotation typings as

  ⟨Γ1,τ1⟩ let x = hd l in
  ⟨Γ2,τ2⟩ let y = hd x in
  ⟨Γ3,τ3⟩ (length l)
where Γ1(l) = (t0 list) list and Γ3(l) = t list. Formally, the syntax of our source language is summarized as follows:

  (values) v ::= ⟨Γ,τ⟩i | ⟨Γ,τ⟩λx.c
  (codes)  c ::= ⟨Γ,τ⟩let x : σ = v in c | ⟨Γ,τ⟩let w : σ = x y in c | ⟨Γ,τ⟩x

Values are either integers², ranged over by i, or functions. A code is a sequence of let expressions terminated by a variable holding the result of the code. A let expression binds the result of either a value or an application to a variable. Every value and code carries an annotation typing.
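The annotated A-normal form has a direct rendering as mutually recursive datatypes. The sketch below is our own encoding and, for brevity, restricts annotations to monotypes.

  (* Sketch of annotated values and codes of Section 4.1. *)
  type ty = TVar of int | TInt | TArrow of ty * ty
  type typing = (string * ty) list * ty          (* ⟨Γ, τ⟩ *)

  type value =
    | Int of typing * int                        (* ⟨Γ,τ⟩ i *)
    | Lam of typing * string * code              (* ⟨Γ,τ⟩ λx.c *)
  and code =
    | LetVal of typing * string * value * code   (* ⟨Γ,τ⟩ let x : σ = v in c *)
    | LetApp of typing * string * string * string * code
                                                 (* ⟨Γ,τ⟩ let w : σ = x y in c *)
    | Var of typing * string                     (* ⟨Γ,τ⟩ x *)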
4.2 Evaluation
In a real implementation, the GC of course uses the static type information exactly as it is given at compile time. Therefore our description should never operate on the annotation typings at run time. For this reason, it is not appropriate to use substitutions in our operational semantics. If we used substitutions, we might have to somehow define "Γ[y/x]" in order to handle (⟨Γ,τ⟩e)[y/x], where [y/x] is the substitution of y for the variable x. This would involve modifying a mapping from x into a mapping from y, and even unification between the type of x and the type of y if mappings for both variables already exist. (Actually, such manipulations should be postponed until GC time.) Instead of substitutions, we use explicit environments and stacks. An environment is a finite function from variables to addresses:

  (environments) V ::= {x1 = a1, . . . , xn = an}
² To simplify the arguments, we allocate integers in the heap, although in an actual implementation they are usually represented as unboxed values.
We define frames as pairs of environments and codes, and objects as pairs of environments and values:³

  (frames)  F ::= ⟨⟨V, c⟩⟩
  (objects) h ::= ⟨⟨V, v⟩⟩

An environment attached to a code or value maps the variables in the code or value to addresses in the heap. Thus environments represent the local variables in stack frames and the closure records of function objects. A stack is a semicolon-separated sequence of frames in which the right-most frame is the top:

  (stacks) C ::= F | F^[σ]; C

A stack is extended at a function call and shrunk at a return. We assume that the environment of each frame in a stack except the top frame maps some variable x to #. Moreover, to treat sharing, we use explicit heaps. A heap is a finite function from addresses to objects:

  (heaps) H ::= {a1 = h1^[σ1], . . . , an = hn^[σn]}

When a value is evaluated, an object for the value is allocated in the heap. When an application is evaluated, the variable referring to the function is automatically dereferenced. Heaps can contain cycles, although this matters only if the source language includes some feature for creating cycles in heaps. A global state is represented as a program in letrec form, consisting of a heap and a stack. An answer is a program that has only one frame, whose code is just a variable:

  (programs) P ::= letrec H in C
  (answers)  A ::= letrec H in ⟨⟨V, ⟨Γ,τ⟩x⟩⟩

Above, we have further type annotations: σ in h^[σ] and σ in F^[σ]. We call these "declaration types" and will explain them later.

Before giving our operational semantics, we show an example of evaluation in our language in Figure 1. At the beginning, the program has the empty heap and a stack with one frame, which has the empty environment and the code c1. In the first step, we evaluate the value v1. This allocates an object at a new address a in the heap. The object is formed by coupling the current environment and the value, as ⟨⟨{}, v1⟩⟩. The address a is assigned to the temporary variable x in the environment, and the code is now set to c2. In the second step, we evaluate the application (x 1). It dereferences the variable x, which refers to the function object ⟨⟨{}, v1⟩⟩ that has just been allocated in the heap. The stack is extended with a new "callee" frame that is constructed from the object, with the parameter z assigned the actual argument 1, and the code set to the body code c4. In the "caller" frame, # is temporarily assigned to the variable y
³ We can include tuples in the same way by expressing a tuple object as ⟨⟨V, ⟨x1, x2⟩⟩⟩, though it should be represented in an actual implementation as ⟨V(x1), V(x2)⟩.
letrec {} in ⟨⟨{}, c1⟩⟩
  −→ (alloc) letrec {a = ⟨⟨{}, v1⟩⟩} in ⟨⟨{x = a}, c2⟩⟩
  −→ (app)   letrec {a = ⟨⟨{}, v1⟩⟩} in ⟨⟨{x = a, y = #}, c3⟩⟩; ⟨⟨{z = 1}, c4⟩⟩
  −→ (ret)   letrec {a = ⟨⟨{}, v1⟩⟩} in ⟨⟨{x = a, y = 1}, c3⟩⟩

where
  c1 = ⟨Γ1,τ1⟩ let x = v1 in c2        v1 = ⟨Γ5,τ5⟩ λz.c4
  c2 = ⟨Γ2,τ2⟩ let y = (x 1) in c3     c4 = ⟨Γ4,τ4⟩ z
  c3 = ⟨Γ3,τ3⟩ y

Fig. 1. Example of evaluation
until the callee returns. In the third step, the callee returns the value 1 that was assigned to z. We shrink the stack and reassign 1 to the variable y in the caller frame, which had been assigned #. Notice that none of the annotation typings on the codes and values are manipulated in the evaluation. Thus the static type information is available whenever the GC is invoked.

Our operational semantics is given by an evaluation relation −→ over programs, defined in Figure 2.⁴ The (stack) rule says that if a program can be reduced, then the program obtained by adding a frame at the bottom can also be reduced. The other three rules are as explained above.
(stack)
  letrec H in C −→ letrec H′ in C′
  --------------------------------------------------
  letrec H in F^[σ]; C −→ letrec H′ in F^[σ]; C′

(alloc)
  letrec H in ⟨⟨V, ⟨Γ,τ⟩let x : σ = v in c⟩⟩
  −→ letrec H ⊎ {a = ⟨⟨V, v⟩⟩^[σ]} in ⟨⟨V ⊎ {x = a}, c⟩⟩        (a ∉ Dom(H))

(app)
  letrec H in ⟨⟨V, ⟨Γ,τ⟩let w : σ = (x y) in c⟩⟩
  −→ letrec H in ⟨⟨V ⊎ {w = #}, c⟩⟩^[σ]; ⟨⟨V′ ⊎ {z = V(y)}, c′⟩⟩
                                  (H(V(x)) = ⟨⟨V′, ⟨Γ′,τ′⟩λz.c′⟩⟩^[σ′])

(ret)
  letrec H in ⟨⟨V ⊎ {w = #}, c⟩⟩^[σ]; ⟨⟨V′, ⟨Γ,τ⟩x⟩⟩
  −→ letrec H in ⟨⟨V ⊎ {w = V′(x)}, c⟩⟩

Fig. 2. Operational semantics
223
The GC will use declaration types, which have two kinds. One is on objects. When a let expression let x : σ = v in c of allocation is evaluated, the declared type σ is attached to the allocated object, as h[σ] . The other is on frames. When a let expression let w : σ = (x y) in c of application is evaluated, the declared type σ is attached to the caller frame, as F [σ] . While the declaration type on an object is relevant to the type of the object, the declaration type on a frame is relevant to the type of # in the frame. Therefore declaration types are attached to every frame except for the top one. Declaration types are used only at GC time and are not essential for deriving type judgments. We formally define accessing of an address, and semantic garbage. Definition 4.1. We say P accesses a iff V (x) = a, a ∈ Dom(H) and either 1. P = letrec H in hhV, hΓ,τ i xii; or 2. P = letrec H in F [σ] ; . . . ; hhV, hΓ,τ i let w = (x y) in cii. H1 is semantic garbage for P = letrec H1 ] H2 in C iff there is no P 0 s.t. P −→∗ P 0 and P 0 accesses some a ∈ Dom(H1 ). 4.3
Type System
We give the typing rules for our language in Figure 3. To make the set of typing rules compact, we define expressions as a super set of both the set of values and the set of codes: (expressions) e ::= x | i | λx.e | e1 e2 | let x : σ = e1 in e2 | hΓ,τ i e The typing rules derive the following judgments: Γ `e:τ Γ ` hhV, eii : τ Γ `C:τ Γ ` H : Γ0 `P :τ
well-typed well-typed well-typed well-typed well-typed
code or value frame or object stack heap program
Although these type judgments carry almost the same information as usual Hindley-Milner type systems, we can read these, as mentioned in Section 2, that the type environment indicates how the syntactic object accesses its free variables and what type the syntactic object itself has. The first five rules are the same as in usual type systems. The (annot) rule gives conditions for static type information by two premises. The first premise ensures that any typing for an enclosing context must be more instantiated than the annotation typing. The second premise ensures that the expression must be well-typed under the annotation typing. In Section 4.1, we mentioned that annotation typings were intended to express the principal typings for expressions. However, we do not incorporate the principality in our formulation because it is not relevant to soundness of the GC. The (env) rule has two premises. Assuming a type environment Γ 0 for local variables, the second premise ensures that the expression must be well-typed
224
Haruo Hosoya and Akinori Yonezawa
Expressions: Γ ` i : int Γ ` e1 : τ 0
(int)
Γ ] {x : τ1 } ` e : τ2 (abs) Γ ` λx.e : τ1 → τ2
Γ (x) ≥ τ (var) Γ `x:τ
σ = ∀t¯.τ 0 ¯ t ∩ F T V (Γ ) = ∅ Γ ] {x : σ} ` e2 : τ (let) Γ ` let x : σ = e1 in e2 : τ
Γ ` e1 : τ1 → τ2 Γ ` e 2 : τ1 (app) Γ ` e1 e2 : τ2
hΓ 0 , τ 0 i ≺ hΓ, τ i Γ `
hΓ 0 ,τ 0 i
Γ0 ` e : τ0 e:τ
(annot)
Environments, stacks, heaps, and programs: Γ ◦ V ≥ Γ0 Γ0 ` e : τ (env) Γ ` hhV, eii : τ Γ ` C : τ0
σ0 = ∀t¯.τ 0
∅`H:Γ Γ `C :τ (prog) ` letrec H in C : τ
¯ t ∩ F T V (Γ ) = ∅
Γ ] {# : σ0 } ` F : τ
Γ ` F [σ] ; C : τ
σ σ0
(stack)
∀a ∈ Dom(Γ 0 ) = Dom(H). H(a) = h[σ] Γ ] Γ 0 ` h : Unq(Γ 0 (a)) σ Γ 0 (a) Γ ` H : Γ0
(heap)
Fig. 3. Typing rules
under ⟨Γ′, τ⟩. The first premise then relates, by generic instance, the types of addresses given in Γ and the types of the local variables given in Γ′. This premise can be understood intuitively by regarding the environment-attached expression ⟨⟨{x1 = a1, . . .}, e⟩⟩ as a let expression let {x1 = a1, . . .} in e; to give a typing for this expression, we would require Γ(a1) ≥ Γ′(x1), . . .. Note that this typing rule ensures that x ∈ Dom(V) and V(x) ∈ Dom(Γ) for any x ∈ FV(e). Combining the (annot) rule and the (env) rule, we can relate the type of an address a in Γ and the type of a variable x in Γ′ that refers to a, as Γ(a) ≥ Γ″(x) and Γ′(x) ≺ Γ″(x) for some Γ″. Since this relation is used often in the next section, we abbreviate it as Γ(a) ≺≺ Γ′(x). Formally, we write σ ≺≺S σ′ iff σ ≥ σ″ and σ″ ≺S σ′ for some σ″. Similarly, we write ⟨Γ, τ⟩ ≺≺S ⟨Γ′, τ′⟩ iff Γ′ ≥ Γ″ and ⟨Γ, τ⟩ ≺S ⟨Γ″, τ′⟩ for some Γ″.

The (stack) rule is analogous to the (let) rule. The (heap) rule deals with potential cycles in heaps. The (stack) rule and the (heap) rule additionally require "compatibility" between declaration types and the types inferred by the rules. Polytypes σ1 and σ2 are compatible, written σ1 ∼ σ2, iff σ3 ≺ σ1 and σ3 ≺ σ2 for some polytype σ3. In the (stack) rule, the declaration type σ on F must be compatible with the inferred type σ0 of #. In the (heap) rule, the declaration type σ on h must be compatible with the inferred type Γ′(a).
Although we do not describe it in this paper, annotation typings can be inferred from source programs using a standard algorithm of polymorphic type reconstruction [10] with a small modification. We can show type soundness of our language in the form of the following theorem: a well-typed program terminates or proceeds without a type error.

Theorem 4.1 (Type Soundness). If ⊢ P : τ, then either P is an answer or else there exists P′ such that P −→ P′ and ⊢ P′ : τ.
5 Garbage Collection

5.1 Overview
Let us begin by viewing the type inference GC scheme by analogy with trace-based GC schemes. Both schemes maintain a live set, the memory regarded as live during the GC. A live set consists of the current stack and a part of the current heap, called the "to-heap". These schemes can then be seen as fixpoint algorithms that find a live set satisfying the following conditions: the live set contains the stack and all objects accessible from the live set. Concretely, the algorithm begins with a live set consisting of the stack; it then repeatedly finds an object that is accessible from the live set and not yet in the live set, and adds the object to the live set; the algorithm terminates when no such object can be found. While trace-based GC uses "refers-to" accessibility, type inference GC uses a type-based accessibility, as explained in Section 2.

To obtain accessible objects, we maintain a GC typing during the GC. A GC typing gives a "typing for the live set"; that is, it tells the following information:

– how the stack accesses addresses
– how each object in the to-heap accesses addresses

An accessible object can be obtained by picking an object whose address is given a concrete type in the GC typing. When it finds an accessible object, the GC adds it to the live set. At this point, the GC typing must be updated so that it also tells how the added object accesses addresses. It is precisely for this purpose that we perform GC-time unification. In the unification, we extract the annotation typing given in the object, and unify the GC typing with the annotation typing. We call this action "tracing", reminiscent of trace-based GC. An interesting point is that the GC typing corresponds precisely to the types of pointers kept at addresses, as mentioned in the Introduction.

The rest of this section formally describes the above algorithm of type inference GC and proves its correctness. Section 5.2 describes our treatment of polytypes in the GC-time unification. In Section 5.3, we give a function for tracing an object or a stack, which involves unification between a GC typing and an annotation typing. Section 5.4 presents the main loop of the fixpoint algorithm.
5.2 Polytype Unification
It seems inevitable that GC-time unification encounters polytypes that are not trivially related. For example, consider the program

  let g : ∀t1 t2. t2 → (t2 → t1) → t1 = λx.λy.⟨Γ2,τ2⟩(y x)
  in let z = (g l length)
  in ⟨Γ1,τ1⟩length

where Γ1(length) = ∀t3. t3 list → int and Γ2(y) = t2 → t1. After the application (g l length), the program will be

  letrec alength = ⟨⟨{}, . . .⟩⟩^[∀t3. t3 list → int]
         . . .
  in ⟨⟨{. . . , length = alength, z = #}, ⟨Γ1,τ1⟩length⟩⟩; ⟨⟨{. . . , y = alength}, ⟨Γ2,τ2⟩(y x)⟩⟩

where alength is the address at which the function length is allocated. Because the variable length in the first frame and the variable y in the second frame share the same address alength, the GC would somehow have to unify their types Γ1(length) = ∀t3. t3 list → int and Γ2(y) = t2 → t1.

Since unification between two arbitrary polytypes seems difficult, we develop a unification method that works specifically in our framework. First of all, the goal of the unification is to obtain a GC typing ⟨∆, τ⟩ under which a newly added object or frame is well-typed. In the above case, when we trace the first frame, the GC typing should satisfy ∆(alength) ≺≺ Γ1(length), and when we trace the second frame, the GC typing should satisfy ∆(alength) ≺≺ Γ2(y).

It turns out to be technically convenient to initially give a GC typing that maps every address to the least instantiated type of the declaration type on it. In the above case, since the declaration type on alength is ∀t3. t3 list → int, we initially give ∆(alength) = ∀t3. t3 list → t4, where t4 is a fresh type variable.

Our unification method is rather simple. Suppose we have as inputs a type ∆(a) = σ1 and a type Γ(x) = σ2. We first instantiate all quantified type variables in σ1 and σ2 with fresh type variables, and then unify the resulting monotypes. As a result, we obtain a type substitution S and update the GC typing to S∆(a) = Sσ1. In the above example, we obtain for the first frame a type substitution S1 = {t4 ↦ int}, and update the GC typing to S1∆(alength) = ∀t3. t3 list → int. For the second frame, we obtain a type substitution S2 = {t1 ↦ t′3 list, t2 ↦ int}, and update the GC typing to S2S1∆(alength) = ∀t3. t3 list → int (actually unchanged).

To explain roughly why this method works, the following invariant holds at any time during the GC: for any address a, there is an "actual type" Γ0(a) that satisfies ∆(a) ≺ Γ0(a), and Γ1(x) ≺≺ Γ0(a) for all annotation typings ⟨Γ1, τ1⟩. The first condition is ensured by our initial GC typing and the unification method given above, and the second condition is ensured by our type system. From these conditions, our unification can be shown to succeed with S∆(a) ≥ SΓ1(x), which implies the goal condition of the unification: S∆(a) ≺≺ Γ1(x).
The following function PolyUni formally specifies our unification method, generalized to take typings as inputs:

  PolyUni(⟨Γ1, τ1⟩, ⟨Γ2, τ2⟩) =
    let E = {Unique(Γ1(x)) = Unique(Γ2(x)) | x ∈ Dom(Γ1)} ∪ {τ1 = τ2}
    in S = Unify(E)

The function Unify is the well-known unification algorithm computing the most general unifier of a set E of unifiable equations between monotypes [14]. Unique instantiates all quantified type variables in a polytype with fresh type variables, defined as Unique(∀t̄.τ) = Sτ where S(t) = t′ (t ∈ t̄, and t′ is a fresh type variable). The following lemma summarizes the above-mentioned properties of PolyUni.

Lemma 5.1. Suppose FTV(Γ1, τ1) ∩ FTV(Γ2, τ2) = ∅ and Dom(Γ1) = Dom(Γ2). If ⟨Γ1, τ1⟩ ≺S1 ⟨Γ0, τ0⟩ and ⟨Γ2, τ2⟩ ≺≺S2 ⟨Γ0, τ0⟩, then PolyUni(⟨Γ1, τ1⟩, ⟨Γ2, τ2⟩) succeeds and produces S such that SΓ1 ≥ SΓ2, Sτ1 = Sτ2, and S1 = S′ ∘ (S|FTV(Γ1,τ1)) for some S′.
5.3 Trace Function
When tracing an object or a stack, we use the function Trace defined below. Trace takes a GC typing ⟨∆, τ⟩ and an object, frame, or stack, and unifies the GC typing with the annotation typing in the object, frame, or stack, computing a type substitution S. As a result, the object, frame, or stack is well-typed under the new GC typing ⟨S∆, Sτ⟩:
Trace(⟨∆, τ⟩, ⟨⟨V, ⟨Γ,τ′⟩e⟩⟩) =
  let S = PolyUni(⟨∆ ∘ V|FV(e), τ⟩, ⟨Γ|FV(e), τ′⟩)    where FTV(∆, τ) ∩ FTV(Γ, τ′) = ∅
  in S|FTV(∆,τ)

Trace(⟨∆, τ⟩, F^[σ]; C) =
  let σ′ = LIT(σ)
      S  = Trace(⟨∆ ⊎ {# : σ′}, τ⟩, F)
      S′ = Trace(⟨S∆, Unq(Sσ′)⟩, C)
  in S′ ∘ S

The function LIT computes the least instantiated type of a polytype, defined by LIT(∀t̄.τ) = ∀t̄.LIT′(t̄, τ) where

  LIT′(t̄, τ) = t′                            if t̄ ∩ FTV(τ) = ∅ (t′ fresh)
             = LIT′(t̄, τ1) → LIT′(t̄, τ2)     if τ = τ1 → τ2
             = τ                             otherwise

LIT replaces every subterm of a polytype that contains only unquantified type variables with a fresh type variable. For example, LIT(∀t.(int → int) → (t → t)) = ∀t. t′ → (t → t), where t′ is a fresh type variable. In particular, if σ is a monotype, LIT(σ) is simply a fresh type variable.

For an object or frame, Trace computes a type substitution that unifies the GC typing ⟨∆, τ⟩ and the annotation typing ⟨Γ, τ′⟩ given to the value or code e. To explain more specifically,
– for each (local or free) variable x in the environment V that is free in e, Trace unifies the type of x given in the annotation typing (Γ(x)) with the type of the address assigned to x in the GC typing (∆(V(x)));
– Trace unifies the result type of the annotation typing (τ′) with the result type of the GC typing (τ).

For a stack, Trace scans the frames in a bottom-up manner, unifying the GC typing with the annotation typing of each frame. A technical note is that, when tracing a frame F, we pass a GC typing that gives # the type σ′, which is LIT of the declaration type σ attached to F. In the tracing of F, σ′ is expected to be unified, as Sσ′, with the type given in the annotation typing of F to the variable to which # is assigned. When tracing the rest of the stack C, we pass a GC typing whose result type is Unq(Sσ′). This type is expected to be unified with the result type of the annotation typing of the next frame. We can prove the following lemma about the Trace function.

Lemma 5.2. If Γ0 ⊢ C : τ0 and ⟨∆, τ⟩ ≺S0 ⟨Γ0, τ0⟩, then S = Trace(⟨∆, τ⟩, C) is defined, S∆ ⊢ C : Sτ, and S0 = S1 ◦ S for some S1.
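Continuing the OCaml sketch from Section 5.2, LIT can be rendered as follows; the representation is again ours, reusing the datatypes of the previous sketch.

(* LIT on the representation from the earlier sketch: a subterm mentioning
   no quantified variable collapses to a fresh variable; arrows that do
   mention quantified variables are traversed; everything else is kept. *)
let rec ftv = function
  | TVar v -> [v]
  | TInt -> []
  | TList t -> ftv t
  | TArrow (a, b) -> ftv a @ ftv b

let lit (Poly (qs, body)) =
  let mentions_quantified t = List.exists (fun v -> List.mem v qs) (ftv t) in
  let rec go t =
    if not (mentions_quantified t) then fresh ()
    else match t with
      | TArrow (a, b) -> TArrow (go a, go b)
      | t -> t
  in
  Poly (qs, go body)

For example, lit (Poly ([1], TArrow (TArrow (TInt, TInt), TArrow (TVar 1, TVar 1)))) yields ∀t1. t′ → (t1 → t1) for a fresh t′, matching the example in the text.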
5.4 Main Loop
We now describe the main part of our GC algorithm. Suppose that the GC is triggered just when a program has been evaluated to P = letrec H0 in C with ⊢ P : τ0. To simplify the discussion, we consider only the case τ0 = int. To initialize the GC, we first set the GC typing to ⟨∆0, τ0⟩ where⁵

∆0(a) = LIT(σ)  (for all (a = h[σ]) ∈ H0).

We then trace the stack C using the function Trace and update the GC typing by the resulting type substitution S:

⟨∆init, τinit⟩ = ⟨S∆0, Sτ0⟩  where S = Trace(⟨∆0, τ0⟩, C)

The rest of our algorithm is described in terms of GC states of the form ⟨Hf, Ht, ⟨∆, τ⟩⟩, where Hf and Ht are heaps, respectively called the “from-heap” and the “to-heap”, and ⟨∆, τ⟩ is a GC typing. In the terminology of Section 5.1, a live set corresponds to the to-heap Ht plus the stack C. We initialize the GC state to ⟨H0, ∅, ⟨∆init, τinit⟩⟩. At this point, the live set contains only the stack C. The GC then iterates steps described by a rewriting relation =⇒ over GC states, given by the following rule:

⟨Hf ⊎ {a = h[σ]}, Ht, ⟨∆, τ⟩⟩ =⇒ ⟨Hf, Ht ⊎ {a = h[σ]}, ⟨S∆, Sτ⟩⟩
  where ∆(a) ∉ TVar and S = Trace(⟨∆, Unq(∆(a))⟩, h)
⁵ In our formulation, polymorphic objects can never be reclaimed even if they are unreachable. An easy way to resolve this problem is to keep track of the set of free addresses in a to-heap and to trace only the addresses in that set. However, reachable yet inaccessible polymorphic objects still could not be reclaimed. This is left for future work.
The GC first finds an object h that is allocated at an address a in the from-heap Hf, i.e., not yet in the live set, and that is not given a type variable in ∆, i.e., is accessible from the live set. The GC then moves the object h to the to-heap Ht, i.e., adds it to the live set. Next, the GC traces the object h using the Trace function and updates the GC typing ⟨∆, τ⟩ by the resulting type substitution S, obtaining ⟨S∆, Sτ⟩. The GC proceeds until all addresses in Hf are given type variables in the GC typing. The final Ht is used as the heap for the remaining computation. We can prove the correctness of the GC algorithm as the following theorem, stating that the algorithm terminates and finds semantic garbage.

Theorem 5.1 (GC Correctness). Let ∅ ⊢ H0 : Γ0, Γ0 ⊢ C : int, and ∆init be as defined above. Then,
1. ⟨H0, ∅, ⟨∆init, int⟩⟩ =⇒* ⟨Hf, Ht, ⟨∆, int⟩⟩, for some Hf, Ht, ∆; and
2. Hf is semantic garbage for letrec H0 in C.
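A schematic rendering of the loop just described is given below, assuming the Trace machinery from the earlier sketches; heaps and the GC typing are association lists, and the details of objects are elided. All names are ours.

(* Schematic main loop over GC states (from-heap, to-heap, GC typing).
   'trace' stands for the Trace function, returning a substitution to
   apply to the GC typing; 'is_tyvar' tests whether an address's GC type
   is still a bare type variable, i.e., not yet found accessible. *)
let rec gc_loop trace is_tyvar (from_heap, to_heap, delta) =
  let movable (a, _) = not (is_tyvar (List.assoc a delta)) in
  match List.find_opt movable from_heap with
  | None -> (from_heap, to_heap, delta)   (* what remains is semantic garbage *)
  | Some ((a, obj) as cell) ->
      let subst = trace delta a obj in    (* unify with obj's annotation typing *)
      let delta' = List.map (fun (b, t) -> (b, subst t)) delta in
      gc_loop trace is_tyvar
        (List.filter (fun c -> c != cell) from_heap, cell :: to_heap, delta')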
6 Preliminary Experiment
This section presents the results of our preliminary experiment and discusses them. In this experiment, we focus on how much garbage the type inference GC scheme can collect in comparison with conventional trace-based GCs. We did not measure the costs of the type inference GC scheme; we remark on them in Section 7. Our prototype system consists of an interpreter of our language together with both a type inference GC and a trace-based GC, all implemented in Standard ML. As applications we use a recursive Fibonacci function applied to 10 (fib10), a quick sort program for a list of length 10 (qs10), a merge sort program for a list of length 10 (ms10), a 4-queens program (queen4), a program for the next-permutation problem on a list of length 4 (nextperm4), and a simple register allocation program for a sequence of 15 instructions (reg15). For each application, we measure the live memory size under each GC (i.e., the total size of objects that the GC retains as live) every several steps. The size of each object is calculated as follows: 0 for an integer (since it is usually unboxed⁶); 2 for a cons cell (a tag and a pointer to a pair); the number of elements for a tuple; and the number of free variables plus one for the code pointer for a closure.
⁶ To be more precise, our interpreter allocates integers in heaps. However, since integers are usually represented as unboxed values in real implementations, we treat them as zero-size objects when measuring live memory size.
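The size rules above amount to the following small function; the object representation is invented for this sketch.

(* Object sizes as measured in the experiment; the representation of
   objects is our own invention. *)
type obj =
  | Int of int                 (* counted as 0: usually unboxed *)
  | Cons of addr * addr        (* 2: a tag and a pointer to a pair *)
  | Tuple of addr list         (* the number of elements *)
  | Closure of addr list       (* free variables plus one code pointer *)
and addr = int

let size = function
  | Int _ -> 0
  | Cons _ -> 2
  | Tuple elems -> List.length elems
  | Closure fvs -> List.length fvs + 1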
The first graph in Figure 4 indicates, for each application, the proportion of execution steps at which the type inference GC has a gain greater than zero, where gain means the difference in live memory size between the trace-based GC and the type inference GC. We have no gain for fib10 and qs10, and very little for queen4 and ms10, while we have some gain for nextperm4 and reg15. The second graph in the figure compares the progress of live memory size under the two GCs during the execution of nextperm4; the third graph shows a similar comparison for reg15. In the second graph, the type inference GC collects some objects earlier than the trace-based GC does. In the third graph, the type inference GC collects dramatically more (42% at best) in the last quarter of the run. We observe two kinds of program structure in the applications that showed some gain. In one structure, a function takes a tuple as its multiple arguments and dereferences the tuple every time one of the arguments is needed. Since the tuple is reachable, the trace-based GC cannot collect the unnecessary elements of the tuple, but the type inference GC can. The other structure was observed in the register allocation program in particular. This program consists of three phases. It first computes the live variables at each instruction and keeps this information in a data structure representing the instruction. Next, the program obtains register assignments for the variables, and finally it generates instructions using this information. In the final phase, the liveness information is never used but is kept in reachable data structures. Therefore the trace-based GC cannot collect this data, but the type inference GC can. Generalizing this observation, we expect the type inference GC scheme to be beneficial particularly for programs that possess what we call phase structure. A program possessing this structure has multiple phases and maintains some data structures across the phases. In some phase, the program keeps temporary data in specific fields of these data structures, and it never uses the data after a later phase begins. The two benefiting programs in our experiment are typical examples. To give another example, a compiler may keep inferred type information in specific fields of the data structures representing parse-tree nodes, and never use the information after some phase. It would be interesting to know how sensitive these results are to other factors. If we increase the size of the input, we expect the gain to grow, because the size of the useless temporary data tends to depend on the input size. As another factor, if we sample the live memory size much less frequently (with intervals larger than one phase), the graph will show a gain close to the average gain over the whole execution.
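The register-allocation structure just described can be pictured in miniature as follows; the program is made up for this sketch, not taken from the benchmark.

(* A miniature phase structure. After assign_registers has run, no later
   phase reads live_vars, yet every instr record kept for code generation
   still points to that list, so a reachability-based GC retains it while
   a liveness-by-type analysis need not. *)
type instr = { op : string; mutable live_vars : string list }

let compute_liveness prog =                     (* phase 1: dummy liveness *)
  List.iter (fun i -> i.live_vars <- [i.op]) prog

let assign_registers prog =                     (* phase 2: the last reader *)
  List.map (fun i -> (i.op, List.length i.live_vars)) prog

let generate_code prog assignment =             (* phase 3: never reads live_vars *)
  List.map (fun i -> i.op ^ "/r" ^ string_of_int (List.assoc i.op assignment)) prog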
7 Remark on Cost
Although the type inference GC aims to save memory, the scheme itself requires extra memory. First, we need one pointer field per object to implement the defer-list, that is, a linked list keeping the objects whose traversal is suspended. Second, extra memory is necessary for the GC-time unification.⁷ Not all of this memory can be allocated at compile time. The unification assumes that the given GC typing and the given annotation typing have no type variables in common (see Section 5.3). Therefore we must copy the annotation typing for each object, renaming the contained type variables to fresh type variables. This allocation can be done either at run time or at GC time.
⁷ The unification may be implemented destructively. That is, we may represent type variables as pointers (initially null); when a type variable is instantiated to some type, the pointer is assigned that type.
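The destructive scheme of this footnote looks roughly as follows in OCaml; this is a standalone sketch of ours (the occurs check is omitted for brevity).

(* Type variables as mutable cells, initially unbound; instantiation
   assigns them in place. *)
type ty = Var of ty option ref | Int | Arrow of ty * ty

let rec repr = function
  | Var { contents = Some t } -> repr t   (* follow assigned variables *)
  | t -> t

let rec unify a b =
  match repr a, repr b with
  | (Var r as v), t | t, (Var r as v) ->
      if t != v then r := Some t          (* instantiate in place *)
  | Int, Int -> ()
  | Arrow (a1, a2), Arrow (b1, b2) -> unify a1 b1; unify a2 b2
  | _ -> failwith "clash"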
[Figure 4: three graphs — (1) the percentage of execution time with and without gain for fib10, qs10, ms10, queen4, nextperm4, and reg15; (2) the progress of live size in nextperm4 (tracing vs. inference); (3) the progress of live size in reg15 (tracing vs. inference).]

Fig. 4. Experimental results
Another allocation that cannot be done at compile time occurs in generating fresh type variables in the polytype unification described in Section 5.2. It seems inevitable that this is done at GC time. Although it is usually said that no memory should be allocated at GC time, we believe that this scheme should allocate the extra memory (especially the second kind) at GC time rather than at run time. This is because the extra memory seems to be so large that, if it were allocated at run time, it would exceed the memory saved by this GC scheme. On the other hand, if the extra memory is allocated at GC time, the allocated space may be momentarily high, but it is completely useless after the GC finishes and can therefore be released. This would be acceptable if the GC is rather infrequent. We can even use a conventional GC most of the time and the type inference GC very infrequently. We believe that allocating at GC time is not prohibitive on most platforms, because even if the run-time system decides to trigger a GC, free memory can be expected to be available from the underlying operating system. (If it turned out that free memory was actually unavailable, we could safely switch to the conventional GC.) We can theoretically estimate the space cost of type inference GC. In the worst case, as in ML type inference, the size of a type expression is exponential in the program size (more precisely, in the nesting depth of let expressions), and so is the space requirement of the GC. However, in practice, type expressions in the program will be of reasonable size; then the space requirement is linear in the number of live objects.
8 Related Work
Some efforts have been made to resolve certain memory leaks in ways specific to particular target languages. Wadler [17] proposed a GC technique that makes use of the execution scheme of a lazy functional language. The GC detects the run-time representation of the form fst ⟨x, y⟩ and reduces it to x, thus collecting unused tuple elements. We believe that the type inference GC can collect such objects in most cases. Type inference GC originates in research on tag-free GC for polymorphically typed languages. The most important motivation in this area is to eliminate the tags indicating pointer or non-pointer, which incur overheads either at every arithmetic operation or at every pointer operation. In our description, this kind of tag is not necessary. The annotation typings attached to the values of objects do not correspond to such tags and do not involve run-time overheads. Several researchers have studied type inference GC. Goldberg and Gloger [5] discovered, in their research on tag-free GC, that semantic garbage can be detected by dynamic type inference. (Strictly speaking, we may not regard their scheme as type inference GC, because their original goal was to traverse all reachable objects; because of this goal, they use more unification than necessary.) Because their description of the GC algorithm and its correctness argument were informal, its soundness was not clear. Fradet [4] also presented a formulation of type inference GC and a proof of its correctness using Reynolds' abstraction/parametricity
theorem [13]. However, his framework was so abstract that there remains a gap between the model and an implementation. Specifically, his formulation did not treat sharing, did not specify precise conditions on static type information, and did not prove that GC-time unification succeeds. Morrisett et al. [12] also discussed type inference GC; they specified the conditions on the heap that a GC should find, but did not give an algorithm. Neither Fradet [4] nor Morrisett et al. [12] dealt with polymorphism. Several papers have proposed another scheme for realizing tag-free GC in a polymorphically typed language [1,16,11]. In this scheme, a program is transformed into a second-order program in which polymorphic functions are passed, as extra arguments, the actual types for the type variables in their polytypes. Using these types, the GC reconstructs the types of all reachable objects and traverses them all. While this scheme does not collect any reachable garbage, these papers report that it can be implemented at reasonably low cost. There would therefore be a trade-off between memory saving and speed, possibly depending on the application; this should be investigated experimentally in the future. Many techniques have been proposed to statically estimate the lifetimes of objects [2,8,6,15,3]. Techniques using sharing analysis [2,8,6] cannot collect more garbage than type inference GCs can, since they use reachability-based liveness of objects. Techniques using region inference [15,3] can collect some semantic garbage. Since we have not studied these schemes to the extent needed for a precise comparison with ours, we just give examples below showing that objects collected by one scheme may not be collected by the other. In an expression fst ⟨e1, e2⟩, our scheme can collect the second element of the tuple just after the tuple is allocated, while their scheme cannot. On the other hand, in the expression

(let y = ⟨1, 2⟩ in λx.(fst ⟨x, λw.(fst y)⟩) end) e

their scheme can collect ⟨1, 2⟩ just after the evaluation of the let expression, while our scheme cannot.
9 Conclusion and Future Work
This paper has given a formal description of type inference GC with fully proved correctness. Our framework is sufficiently detailed yet reasonably abstract. It can therefore serve as a specification for implementing a type inference GC and as a foundation for discussing and extending the GC scheme. We conjecture, in addition to soundness, optimality, in the sense that the described algorithm collects more garbage than any other algorithm using the same type system does. We have already proved this optimality for a monomorphic type system by requiring principality of annotation typings. We expect the same can be done for polymorphic type systems, though it may be more complicated. From our experimental results, we expect that type inference GC benefits particularly programs with the phase structure. In order to claim more, we
need to implement the GC scheme in a lower-level language, on some existing ML system. We should then measure the space and time costs of the GC scheme. We should also investigate larger applications for which the GC scheme is effective; we expect that one such application would be a compiler, which would contain many phase structures. Even for non-benefiting applications, the GC should at least be no worse. Optimization techniques that reduce the extra GC-time memory by using statically available information as much as possible would be effective. We could even use a conventional GC or a type-passing tag-free GC most of the time, and the type inference GC very infrequently.
Acknowledgments

We express our warmest thanks to Kenjiro Taura for encouraging us in this research and for giving us precious advice on clarifying the motivation of our work. Comments from Benjamin Pierce, Naoki Kobayashi, Atsushi Igarashi, and the TIC referees were very helpful in improving the presentation of the paper. We also thank the members of Yonezawa's group and the members of the Programming Language Seminar at Indiana University for many discussions.
References

1. S. Aditya, C. Flood, and J. Hicks. Garbage collection for strongly-typed languages using run-time type reconstruction. In Proceedings of the Conference on LISP and Functional Programming, pages 12–23, 1994.
2. H. G. Baker. Unify and conquer (garbage, updating, aliasing, ...) in functional languages. In Proceedings of the Conference on LISP and Functional Programming, pages 218–226, 1990.
3. L. Birkedal, M. Tofte, and M. Vejlstrup. From region inference to von Neumann machines via region representation inference. In Conference Record of the Symposium on Principles of Programming Languages, pages 171–183, 1996.
4. P. Fradet. Collecting more garbage. In Proceedings of the Conference on LISP and Functional Programming, pages 24–33, 1994.
5. B. Goldberg and M. Gloger. Polymorphic type reconstruction for garbage collection without tags. In Proceedings of the Conference on LISP and Functional Programming, pages 53–65, 1992.
6. K. Inoue, H. Seki, and H. Yagi. Analysis of functional programs to detect run-time garbage cells. ACM Transactions on Programming Languages and Systems, 10(4):555–579, 1988.
7. R. Jones. Tail recursion without space leaks. Journal of Functional Programming, 2(1):73–79, 1992.
8. S. B. Jones and D. Le Métayer. Compile-time garbage collection by sharing analysis. In Conference Proceedings of Functional Programming Languages and Computer Architecture, pages 54–74, Imperial College, London, September 1989.
9. S. L. Peyton Jones. The Implementation of Functional Programming Languages. Prentice-Hall, 1987.
10. R. Milner. A theory of type polymorphism in programming. Journal of Computer and System Sciences, 17:348–375, 1978.
11. G. Morrisett. Compiling with Types. PhD thesis, School of Computer Science, Carnegie Mellon University, 1995.
12. G. Morrisett, M. Felleisen, and R. Harper. Abstract models of memory management. In Proceedings of Functional Programming Languages and Computer Architecture, pages 66–76, 1995.
13. J. Reynolds. Types, abstraction, and parametric polymorphism. In Information Processing, volume 83, pages 513–523, 1983.
14. J. A. Robinson. A machine-oriented logic based on the resolution principle. Journal of the ACM, 12(1):23–41, 1965.
15. M. Tofte and J.-P. Talpin. Implementation of the typed call-by-value λ-calculus using a stack of regions. In Conference Record of the Symposium on Principles of Programming Languages, pages 188–201, 1994.
16. A. Tolmach. Tag-free garbage collection using explicit type parameters. In Proceedings of the Conference on LISP and Functional Programming, pages 1–11, 1994.
17. P. Wadler. Fixing some space leaks with a garbage collector. Software Practice and Experience, 17(9):595–608, September 1987.
Appendix A: Proof of Theorem 4.1
Lemma A.1. If ⟨Γ, τ⟩ ≺≺ ⟨Γ′, τ′⟩ and Γ ⊢ ⟨Γ″,τ″⟩e : τ, then Γ′ ⊢ ⟨Γ″,τ″⟩e : τ′.
Proof. Trivial.

Lemma A.2 (Type Preservation). If ⊢ P : τ and P −→ P′, then ⊢ P′ : τ.

Proof. By induction on the derivation of P −→ P′. We have the following four cases.

(stack) P = letrec H in F[σ]; C −→ letrec H′ in F[σ]; C′ = P′ with P1 = letrec H in C −→ letrec H′ in C′ = P1′. From ⊢ P : τ, ⊢ P1 : τ′. By the induction hypothesis, ⊢ P1′ : τ′, which implies ∅ ⊢ H′ : Γ′ and Γ′ ⊢ C′ : τ with Γ′ ⊇ Γ. Using Lemma A.1, we can derive ⊢ P′ : τ′.

(alloc) P = letrec H in ⟨⟨V, c⟩⟩ with c = ⟨Γ′,τ′⟩let x : σ = v in c′. Let σ = ∀t̄.τ″. From ⊢ P : τ, Γ0 ⊢ ⟨⟨V, c⟩⟩ : τ, which implies, for some Γ0,

Γ0 ◦ V ≥ Γ   (1)
⟨Γ′, τ′⟩ ≺S ⟨Γ, τ⟩   (2)
Γ′ ⊢ (let x : σ = v in c′) : τ′   (3)

We can assume that Dom(S) ∩ t̄ = ∅ by an appropriate renaming of t̄ in σ. Therefore Unq(Sσ) = Sτ″. From (2), ⟨Γ′, τ″⟩ ≺S ⟨Γ, Sτ″⟩, and from (3), Γ′ ⊢ v : τ″. Applying Lemma A.1 to these, we conclude that Γ ⊢ v : Sτ″, which implies Γ0 ⊢ ⟨⟨V, v⟩⟩ : Sτ″ together with (1). We also know ∅ ⊢ H[∆] : Γ0 from ⊢ P : τ. Easily, ∅ ⊢ H ⊎ {a = ⟨⟨V, v⟩⟩[σ]} : Γ0 ⊎ {a : Sσ}. Therefore it is sufficient to show Γ0 ⊎ {a : Sσ} ⊢ ⟨⟨V ⊎ {x = a}, c′⟩⟩ : τ. It follows from Lemma A.1 applied to Γ′ ⊎ {x : σ} ⊢ c′ : τ′ (by (3)), (Γ0 ⊎ {a : Sσ}) ◦ (V ⊎ {x = a}) ≥ Γ ⊎ {x : Sσ} (by (1)), and ⟨Γ′ ⊎ {x : σ}, τ′⟩ ≺S ⟨Γ ⊎ {x : Sσ}, τ⟩ (by (2)).
(app) P = letrec H[∆] in ⟨⟨V, c⟩⟩ with c = ⟨Γ′,τ′⟩let w : σ = (x y) in c′. Let σ = ∀t̄.τ″. From ⊢ P : τ, Γ0 ⊢ ⟨⟨V, c⟩⟩ : τ, which implies, for some Γ0,

Γ0 ◦ V ≥ Γ   (4)
⟨Γ′, τ′⟩ ≺S ⟨Γ, τ⟩   (5)
Γ′ ⊢ (let w : σ = (x y) in c′) : τ′   (6)

We can assume that Dom(S) ∩ t̄ = ∅ by an appropriate renaming of t̄ in σ. Therefore Unq(Sσ) = Sτ″. We obtain (Γ0 ⊎ {# : Sσ}) ◦ (V ⊎ {w = #}) ≥ Γ ⊎ {w : Sσ} from (4), ⟨Γ′ ⊎ {w : σ}, τ′⟩ ≺S ⟨Γ ⊎ {w : Sσ}, τ⟩ from (5), and Γ′ ⊎ {w : σ} ⊢ c′ : τ′ from (6). Applying Lemma A.1 to these, we conclude that (Γ0 ⊎ {# : Sσ}) ◦ (V ⊎ {w = #}) ⊢ c′ : τ, which implies Γ0 ⊎ {# : Sσ} ⊢ ⟨⟨V ⊎ {w = #}, c′⟩⟩ : τ. Then it is sufficient to show Γ0 ⊢ ⟨⟨V″ ⊎ {z = V(y)}, c″⟩⟩ : Sτ″. Let H(V(x)) = ⟨⟨V″, ⟨Γ″,τ1→τ2⟩λz.c″⟩⟩, σx = Γ0(V(x)), and σy = Γ0(V(y)). Since ∅ ⊢ H[∆] : Γ0 from ⊢ P : τ, we have Γ0 ⊢ H(V(x)) : Unq(σx), which implies

Γ0 ◦ V″ ≥ Γ‴   (7)
⟨Γ″, τ1 → τ2⟩ ≺S′ ⟨Γ‴, Unq(σx)⟩   (8)
Γ″ ⊎ {z : τ1} ⊢ c″ : τ2   (9)

From (6), we have Γ′(x) ≥ τ1″ → τ″ and Γ′(y) ≥ τ1″ for some τ1″. Therefore, together with (4) and (5), we obtain σx = Γ0(V(x)) ≥ Γ(x) = SΓ′(x) ≥ S(τ1″ → τ″) and σy = Γ0(V(y)) ≥ Γ(y) = SΓ′(y) ≥ Sτ1″. From σx ≥ S(τ1″ → τ″) and (8), we have S″S′(τ1 → τ2) = S(τ1″ → τ″) where FTV(Γ‴, τ1 → τ2) ∩ Dom(S″) = ∅. Therefore S″S′τ1 = Sτ1″, implying σy ≥ S″S′τ1, and S″S′τ2 = Sτ″. Consequently, we have (Γ0 ◦ V″) ⊎ {z : σy} ≥ Γ‴ ⊎ {z : S″S′τ1} and ⟨Γ″ ⊎ {z : τ1}, τ2⟩ ≺S′ ⟨Γ‴ ⊎ {z : S′τ1}, S′τ2⟩ ≺S″ ⟨Γ‴ ⊎ {z : S″S′τ1}, Sτ″⟩. Applying Lemma A.1 to these and (9), we conclude that Γ0 ⊢ ⟨⟨V″ ⊎ {z = V(y)}, c″⟩⟩ : Sτ″.

(ret) P = letrec H[∆] in F[σ]; C with F = ⟨⟨V ⊎ {w = #}, c⟩⟩ and C = ⟨⟨V′, ⟨Γ′,τ′⟩x⟩⟩. From ⊢ P : τ, Γ0 ⊢ F[σ]; C : τ for some Γ0, which implies, for some σ′,

Γ0 ⊎ {# : σ′} ⊢ F : τ   (10)
Γ0 ⊢ C : τ″   (11)
σ′ = ∀t̄.τ″ and t̄ ∩ FTV(Γ0) = ∅   (12)

From (11), Γ0 ◦ V′ ≥ Γ, ⟨Γ′, τ′⟩ ≺S ⟨Γ, τ″⟩, and Γ′ ⊢ x : τ′, for some Γ and S. Therefore we can derive Γ0(V′(x)) ≥ Γ(x) = SΓ′(x) ≥ Sτ′ = τ″. From (12), Γ0(V′(x)) ≥ σ′. Then, from (10), we have (Γ0 ◦ V) ⊎ {w : σ′} ≥ Γ″ ⊎ {w : σ″} for some σ″. Thus (Γ0 ◦ V) ⊎ {w : Γ0(V′(x))} ≥ Γ″ ⊎ {w : σ″}. We also know, from (10), Γ″ ⊎ {w : σ″} ⊢ c : τ. Hence we conclude Γ0 ⊢ ⟨⟨V ⊎ {w = V′(x)}, c⟩⟩ : τ.
Theorem A.1 (Type Soundness). If ⊢ P : τ, then either P is an answer or else there exists P′ s.t. P −→ P′ and ⊢ P′ : τ.

Proof. We can show that either P is an answer or else there exists P′ s.t. P −→ P′ by induction on the structure of the stack of P, with case analysis on its top frame. Lemma A.2 then suffices to prove the theorem.
Appendix B: Proof of Theorem 5.1
Lemma B.1 (Unification). There exists an algorithm Unify s.t. Unify(E) computes the most general unifier of E, for any set E of unifiable equations of monotypes.

Lemma B.2. Suppose FTV(Γ1, τ1) ∩ FTV(Γ2, τ2) = ∅ and Dom(Γ1) = Dom(Γ2). If ⟨Γ1, τ1⟩ ≺S1 ⟨Γ0, τ0⟩ and ⟨Γ2, τ2⟩ ≺≺S2 ⟨Γ0, τ0⟩, then PolyUni(⟨Γ1, τ1⟩, ⟨Γ2, τ2⟩) succeeds, producing S such that SΓ1 ≥ SΓ2, Sτ1 = Sτ2, and S1 = S′ ◦ (S|FTV(Γ1,τ1)) for some S′.

Proof. For x ∈ Dom(Γ1), let Γi(x) = ∀t̄i^(x).τi^(x) (i = 0, 1, 2) and Unique(Γi(x)) = Si^(x)τi^(x) where Dom(Si^(x)) = t̄i^(x) (i = 1, 2). We assume t̄0^(x) ∩ t̄0^(x′) = ∅ if x ≠ x′. From ⟨Γ2, τ2⟩ ≺≺S2 ⟨Γ0, τ0⟩, there is S3^(x) s.t. Dom(S3^(x)) = t̄0^(x) and S3^(x)τ0^(x) = S2τ2^(x). From ⟨Γ1, τ1⟩ ≺S1 ⟨Γ0, τ0⟩, τ0^(x) = S1τ1^(x), implying S3^(x)τ0^(x) = S3^(x)S1τ1^(x). Therefore, defining S3 = ⋃x∈Dom(Γ1) S3^(x)S1S2^(x), we conclude that S3 unifies E. Hence Unify(E) succeeds by Lemma B.1. By Lemma B.1, we have Dom(S) = FTV(S1^(x)τ1^(x), S2^(x)τ2^(x), τ1, τ2), SS1^(x)τ1^(x) = SS2^(x)τ2^(x), and Sτ1 = Sτ2. Since SΓ1(x) ≥ SS1^(x)τ1^(x) and t̄2′^(x) ∩ FTV(SΓ1(x)) = ∅ where t̄2′^(x) = S2^(x)t̄2^(x), we can derive SΓ1(x) ≥ ∀t̄2′^(x).SS2^(x)τ2^(x) = S(∀t̄2′^(x).S2^(x)τ2^(x)) = SΓ2(x). Since we already know S3S1^(x)τ1^(x) = S3S2^(x)τ2^(x) and S3τ1 = S3τ2, by Lemma B.1 we conclude that there is S″ s.t. S3 = S″ ◦ S. From S3|FTV(Γ1,τ1) = S1, we obtain S1 = S″ ◦ S|FTV(Γ1,τ1).

Lemma B.3. If Γ0 ⊢ C : τ0, ⟨∆, τ⟩ ≺S0 ⟨Γ0, τ0⟩, and Dom(S0) ⊆ FTV(∆, τ), then S = Trace(⟨∆, τ⟩, C) is defined, S∆ ⊢ C : Sτ, and S0 = S1 ◦ S for some S1.
Proof. We first show that the lemma holds for C = ⟨⟨V, ⟨Γ,τ′⟩e⟩⟩. From Γ0 ⊢ C : τ0, ⟨Γ, τ′⟩ ≺≺ ⟨Γ0 ◦ V, τ0⟩. From the hypothesis, ⟨∆ ◦ V, τ⟩ ≺S0 ⟨Γ0 ◦ V, τ0⟩. By Lemma B.2, the Unify(E) in Trace succeeds with ⟨Γ, τ′⟩ ≺≺ ⟨S∆ ◦ V, Sτ⟩ and S0 = S1 ◦ (S|FTV(∆,τ)) for some S1. By Lemma A.1, S∆ ⊢ C : Sτ. We then show the lemma by induction on the structure of C. C is either F or F[σ]; C′. We have already shown the former case above. For the latter case, from Γ0 ⊢ C : τ0, Γ0 ⊎ {# : σ′} ⊢ F : τ0 with σ ≽ σ′. Let σ″ = LIT(σ). Then σ″ ≺ σ′. By applying our first argument, we conclude S(∆ ⊎ {# : σ″}) ⊢ F : Sτ and S0 = S1 ◦ S for some S1. From Γ0 ⊢ C : τ0,
Γ0 ⊢ C′ : τ′ with σ′ = ∀t̄.τ′ and t̄ ∩ FTV(Γ0) = ∅. Since σ″ ≺ σ′, we can let σ″ = ∀t̄.τ″. From S0t̄ = t̄, Sτ″ ≺S1 τ′. Thus, we obtain ⟨S∆, Sτ″⟩ ≺S1 ⟨SΓ0, τ′⟩ (with Dom(S1) ⊆ Dom(S0) ∪ FTV(Rng(S)) ⊆ FTV(S∆, Sτ″)). Applying the induction hypothesis, we conclude S″∆ ⊢ C′ : S″τ″ with S″ = S′ ◦ S, and S1 = S2 ◦ S′ for some S2. Therefore S0 = S1 ◦ S = S2 ◦ S″. Further, S2S″t̄ = S0t̄ = t̄. Let t̄′ = S″t̄. If there is t ∈ t̄′ ∩ FTV(S″∆), we obtain S2t ∈ S2t̄′ = t̄ and S2t ∈ FTV(S2S″∆) = FTV(Γ0), which contradicts t̄ ∩ FTV(Γ0) = ∅. Therefore t̄′ ∩ FTV(S″∆) = ∅. Consequently, since we have S″∆ ⊎ {# : ∀t̄′.S″τ″} ⊢ F : S″τ (obtained from S(∆ ⊎ {# : σ″}) ⊢ F : Sτ), S″∆ ⊢ C′ : S″τ″, and σ ≽ ∀t̄′.S″τ″, we conclude that S″∆ ⊢ F[σ]; C′ : S″τ.

The following well-formedness is a basic property of GC states.

Definition B.1 (Well-formedness). Let ∅ ⊢ H0 : Γ0 and Γ0 ⊢ C : τ0. ⟨Hf, Ht, ⟨∆, τ⟩⟩ is well-formed w.r.t. letrec H0 in C and ⟨Γ0, τ0⟩ iff
1. H0 = Hf ⊎ Ht;
2. ∆ ⊢ C : τ;
3. ∆ ⊢ Ht(a) : Unq(∆(a)) for a ∈ Dom(Ht); and
4. ⟨∆, τ⟩ ≺ ⟨Γ0, τ0⟩.
A GC state always proceeds by =⇒, preserving well-formedness, as long as there remain live objects in the from-heap.

Lemma B.4 (GC Progress). If ⟨Hf, Ht, ⟨∆, τ⟩⟩ is well-formed w.r.t. P and ⟨Γ0, τ0⟩, then
1. ∆(a) ∈ TVar for any a ∈ Dom(Hf); or else
2. ⟨Hf, Ht, ⟨∆, τ⟩⟩ =⇒ ⟨Hf′, Ht′, ⟨∆′, τ′⟩⟩ for some Hf′, Ht′, ∆′, and τ′, and ⟨Hf′, Ht′, ⟨∆′, τ′⟩⟩ is well-formed w.r.t. P and ⟨Γ0, τ0⟩.

Proof. Suppose (1) does not hold. Then there is some a ∈ Dom(Hf) s.t. ∆(a) ∉ TVar. Let τ″ = Unq(∆(a)), Hf′ = Hf \ {a = h}, and Ht′ = Ht ⊎ {a = h}. From the well-formedness, we have Γ0 ⊢ h : τ1 where τ1 = Unq(Γ0(a)), and ⟨∆, τ″⟩ ≺S0 ⟨Γ0, τ1⟩ (Dom(S0) ⊆ FTV(∆, τ)). By Lemma B.3, we obtain S∆ ⊢ h : Sτ″ and S0 = S1 ◦ S for some S1. Hence ⟨Hf, Ht, ⟨∆, τ⟩⟩ =⇒ ⟨Hf′, Ht′, ⟨S∆, Sτ⟩⟩. We show each condition of the well-formedness of ⟨Hf′, Ht′, ⟨S∆, Sτ⟩⟩:
1. Trivial.
2. S∆ ⊢ C : Sτ, from ∆ ⊢ C : τ in the well-formedness.
3. We have ∆ ⊢ Ht(y) : Unq(∆(y)) for y ∈ Dom(Ht) from the well-formedness, and we already know S∆ ⊢ h : Sτ″. Thus, noticing Dom(S) ⊆ Dom(S0) ⊆ FTV(∆, τ), we conclude S∆ ⊢ Ht′(y) : Unq(S∆(y)) for y ∈ Dom(Ht′).
4. From S0 = S1 ◦ S, we obtain ⟨S∆, Sτ⟩ ≺S1 ⟨Γ0, τ0⟩.

We now introduce the notion of isolation, defined as a well-formed GC state in which the objects in Hf have type variables as their types. (We call Hf the isolated heap of H0.)

Definition B.2 (Isolation). ⟨Hf, Ht, ⟨∆, τ⟩⟩ is an isolation w.r.t. P and ⟨Γ0, τ0⟩ iff
1. ⟨Hf, Ht, ⟨∆, τ⟩⟩ is well-formed w.r.t. P and ⟨Γ0, τ0⟩; and
2. for any a ∈ Dom(Hf), ∆(a) ∈ TVar.

Once a heap is isolated for a program, evaluation preserves its isolation.

Lemma B.5 (Isolation Preservation). If ⟨Hf, Ht, ⟨∆, τ⟩⟩ is an isolation w.r.t. P and ⟨Γ0, τ0⟩, and P −→ P′, then there is an isolation ⟨Hf, Ht′, ⟨∆′, τ⟩⟩ w.r.t. P′ and ⟨Γ0′, τ0⟩.

Proof. Analogous to the proof of Lemma A.2.

Lemma B.6. If ⟨Hf, Ht, ⟨∆, int⟩⟩ is an isolation w.r.t. P and ⟨Γ0, int⟩, then P does not access any x ∈ Dom(Hf).

Proof. Suppose that P accesses x and x ∈ Dom(Hf). Then P is either letrec H0 in ⟨⟨V, ⟨Γ,τ⟩z⟩⟩ or letrec H0 in F[σ]; . . . ; ⟨⟨V, ⟨Γ,τ⟩let w : σ = (z y) in c⟩⟩ where V(z) = x. In the former case, we have ⟨Γ, τ⟩ ≺≺S ⟨∆ ◦ V, int⟩. Since we know from x ∈ Dom(Hf) that ∆(V(z)) is a type variable t, Γ(z) must also be a type variable t′ with S(t′) = t. Since we can derive Γ ⊢ z : τ, we have t′ ≥ τ, implying t′ = τ. However, this contradicts S(τ) = int. In the latter case, we have ⟨Γ, τ⟩ ≺≺S ⟨∆ ◦ V, Sτ⟩, ∆(V(z)) = t, and Γ(z) = t′ with S(t′) = t. Since we can derive Γ ⊢ z : τ1 → τ2 for some τ1, τ2, we have t′ ≥ τ1 → τ2, which is impossible.

Using these lemmas, we can show that an isolated heap is semantic garbage.

Theorem B.1 (Isolation Garbage). If ⟨Hf, Ht, ⟨∆, int⟩⟩ is an isolation w.r.t. P and ⟨Γ0, int⟩, then Hf is semantic garbage for P.

Proof. This follows from Lemma B.5 and Lemma B.6.

The proof of the correctness of GC is completed by the following theorem, stating that our algorithm finds an isolation.

Theorem B.2 (GC Correctness). Let ∅ ⊢ H0 : Γ0, Γ0 ⊢ C : int, and ∆init be as defined in Section 5.4. Then there exists ⟨Hf, Ht, ⟨∆, int⟩⟩ such that ⟨H0, ∅, ⟨∆init, int⟩⟩ =⇒* ⟨Hf, Ht, ⟨∆, int⟩⟩ and ⟨Hf, Ht, ⟨∆, int⟩⟩ is an isolation w.r.t. letrec H0 in C and ⟨Γ0, int⟩.

Proof. By Lemma B.4, it suffices to show that ⟨H0, ∅, ⟨∆init, int⟩⟩ is well-formed w.r.t. letrec H0 in C and ⟨Γ0, int⟩. Let ∆0 be as defined in Section 5.4. From ∅ ⊢ H0 : Γ0, we have ⟨∆0, int⟩ ≺ ⟨Γ0, int⟩. By Lemma B.3, we obtain ∆init ⊢ C : int and ⟨∆init, int⟩ ≺ ⟨Γ0, int⟩, which are sufficient for the well-formedness.
Strong Normalization by Type-Directed Partial Evaluation and Run-Time Code Generation

Vincent Balat¹ and Olivier Danvy²
¹ Département d'Informatique, École Normale Supérieure de Cachan, 61, avenue du Président Wilson, F-94230 Cachan Cedex, France. [email protected]
² BRICS, Department of Computer Science, University of Aarhus, Building 540, Ny Munkegade, DK-8000 Aarhus C, Denmark. [email protected]
Abstract. We investigate the synergy between type-directed partial evaluation and run-time code generation for the Caml dialect of ML. Type-directed partial evaluation maps simply typed, closed Caml values to a representation of their long βη-normal form. Caml uses a virtual machine and has the capability to load byte code at run time. Representing the long βη-normal forms as byte code gives us the ability to strongly normalize higher-order values (i.e., weak head normal forms in ML), to compile the resulting strong normal forms into byte code, and to load this byte code all in one go, at run time. We conclude this note with a preview of our current work on scaling up strong normalization by run-time code generation to the Caml module language.
1 Introduction

1.1 Motivation
Strong normalization: Suppose one is given a strongly normalizable (closed) λ-term. How does one normalize this term? Typically one parses it into an abstract-syntax tree, writes a strong normalizer over abstract-syntax trees, and translates (unparses) the resulting normal form into whichever format is desired (e.g., LaTeX).

A solution in ML: ML, like all functional languages, provides a convenient format for representing λ-terms: as an ML expression. Suppose thus that we are given a strongly normalizable ML expression. How do we normalize it? Type-directed partial evaluation [8] offers an efficient alternative to writing a parser to represent this ML expression as an ML data structure representing its abstract-syntax tree, writing a strong normalizer operating over this abstract-syntax tree, and
unparsing the resulting normal form into an ML expression. Instead, the ML evaluator maps this ML expression into an ML value, and the type-directed partial evaluator maps this ML value into the abstract-syntax tree of its normal form. We can then either evaluate this abstract-syntax tree (for immediate use) or unparse it (for later use).

Motivation: Type-directed partial evaluation entrusts the underlying programming language with all the mechanisms of binding and substitution that are associated with normalization. Higher-order abstract syntax [24] shares the same motivation, albeit in a Logical Framework instead of in a functional setting.

Goal: Type-directed partial evaluation, as it is, maps an ML value into the text of its normal form. We want instead to map it into the corresponding ML value — and we want to do so in a lighter way than by invoking either an interpreter or the whole compiler after normalization.

An integrated solution in Objective Caml: Objective Caml [22] is a byte-code implementation of a dialect of ML. This suggests representing normal forms as byte code and loading this byte code at run time, for both immediate and later use.
1.2 Contribution
We report on our experiment in integrating type-directed partial evaluation within Caml, which in effect yields strong normalization by run-time code generation. We list below what we had to do to achieve this integration:
– we wrote several type-directed partial evaluators in Caml, in various styles and with various properties (listed below);
– we wrote a dedicated translator from normal forms to byte code;
– this required us to find the necessary (hidden) resources in the Caml implementation and to recompile the system to make them available, in effect obtaining a more open implementation. These resources are mainly the representation of types, the representation of byte code, and the ability to load byte code at run time.
1.3 Non-contribution
Even though it is primarily inspired by theory, our work is experimental. Indeed, neither the OCaml compiler nor the OCaml virtual machine is formalized. We therefore have not formalized our byte-code translator either. As for type-directed partial evaluation, only its call-by-name version has been formalized so far [1,2,7]. In that sense our work is experimental: we want to investigate the synergy between type-directed partial evaluation and run-time code generation for OCaml.
module ChurchNumbers = struct
  let cz s z = z
  let cs n s z = n s (s z)
  let rec n2cn n = if n = 0 then cz else cs (n2cn (n - 1))
  let cn2n n = n (fun i -> i + 1) 0
end
Fig. 1. Church numbers
1.4 An Example: Church numbers
Let us illustrate strong normalization by run-time code generation by optimizing a computation over Church numbers, which we define in Figure 1. The module ChurchNumbers defines zero (cz), the successor function (cs), and two conversion functions to and from ML numbers and Church numbers. For example, we can convert the ML number 5 to a Church number, increment it, and convert the result back to ML as follows:

# ChurchNumbers.cn2n (ChurchNumbers.cs (ChurchNumbers.n2cn 5));;
- : int = 6
#
Thus equipped, let us define the function incrementing its argument by 1000:

# let cs1000 m = ChurchNumbers.n2cn 1000 ChurchNumbers.cs m;;
val cs1000 : (('a -> 'a) -> 'a -> 'b) -> ('a -> 'a) -> 'a -> 'b = <fun>
# ChurchNumbers.cn2n (cs1000 ChurchNumbers.cz);;
- : int = 1000
#
If it were not for ML's weak-normalization strategy, 1000 β-reductions could be realized at definition time. We strongly normalize the value denoted by cs1000 by invoking our function nip (for “Normalize In Place”) on the name of the identifier cs1000:

# nip "cs1000";;
- : unit = ()
#
Now cs1000 denotes the strongly normalized value, as reflected by its execution time: applying cs1000 to the Church number 0 is 4800 times faster now. Depending on the version of the type-directed partial evaluator, normalization takes between 0.1 and 18 seconds. In this example, cs1000 then needs to be applied between 5 and 1000 times to amortize the cost of normalization.
1.5 Overview
The rest of this article is organized as follows. We first review type-directed partial evaluation (Section 2), independently of run-time code generation, with two simple examples: the Hilbert combinators and Church numbers. We then describe run-time code generation in OCaml (Section 3). Putting them together, we report the measurements we have collected (Section 4) and we assess the overall system (Section 5). The Caml implementation of modules suggests a very simple extension of our system to handle both first-order and higher-order modules, and we describe this extension in Section 6. After reviewing related work (Section 7), we conclude.
2 Type-Directed Partial Evaluation
Type-directed partial evaluation strongly normalizes closed values of parametric type, by two-level η-expansion [8,14]. Let us take two concrete examples, a simple one first, and a second one with Church numbers. We represent residual lambda-terms with the data type of Figure 2.
type exp = Var of string | Lam of string * exp | App of exp * exp
Fig. 2. Abstract syntax of the λ-calculus
module type SK_sig = sig
  val cS : ('a -> 'b -> 'c) -> ('a -> 'b) -> 'a -> 'c
  val cK : 'a -> 'b -> 'a
end
module SK : SK_sig = struct
  let cS f g x = f x (g x)
  let cK a b = a
end
Fig. 3. Hilbert’s Combinatory Logic basis
2.1 The Hilbert Combinators
As is well known, the identity combinator I can be defined with the Hilbert combinators S and K. This is often illustrated in ML with the functions cS and cK defined in Figure 3:

# let cI x = SK.cS SK.cK SK.cK x;;
val cI : 'a -> 'a = <fun>
# cI 42;;
- : int = 42
The point of type-directed partial evaluation is that one can visualize the text of cI by two-level η-expansion. In the present case, all we need is to η-expand cI with a dynamic introduction rule (the construction of a residual lambda-abstraction) and a static elimination rule (an ML application):

# let ee_id f = Lam ("x", f (Var "x"));;
val ee_id : (exp -> exp) -> exp = <fun>
# ee_id (SK.cS SK.cK SK.cK);;
- : exp = Lam ("x", Var "x")
#
where in the definition of ee_id, x is fresh. The result of applying ee_id to the ML identity function is its text in normal form.
2.2 Church Numbers
Let us play the same game with Church numbers. The type of a Church number is ('a -> 'a) -> 'a -> 'a
Since it is composed with three arrows, we need to η-expand it three times. Since the two outer arrows occur positively, we η-expand a Church number cn with two dynamic introduction rules and two static elimination rules: Lam("s", Lam("z", cn (...(Var "s")...) (Var "z")))
where s and z are fresh. Since the inner arrow occurs negatively, we η-expand the corresponding variable s with one static introduction rule (an ML abstraction) and one dynamic elimination rule (the construction of a residual application): fun v -> App(Var "s", v)
The result reads as follows:

# let ee_cn cn = Lam ("s", Lam ("z", cn (fun v -> App (Var "s", v)) (Var "z")));;
val ee_cn : ((exp -> exp) -> exp -> exp) -> exp = <fun>
#
We are now equipped to visualize the normal form of a Church number, e.g., 2:

# ee_cn (ChurchNumbers.n2cn 2);;
- : exp = Lam ("s", Lam ("z", App (Var "s", App (Var "s", Var "z"))))
#
The result of applying ee_cn to the ML Church number 2 is the text of this Church number in normal form.
2.3 Summary and Conclusion
We have illustrated type-directed partial evaluation in ML with two very simple examples: the Hilbert combinators and Church numbers. We defined them in ML and constructed the text of their normal forms by two-level η-expansion. Type-directed partial evaluation directly constructs two-level η-redexes, given a representation of the type of the value to normalize. It also handles more types, such as base types (in restricted positions), and can interpret function types as having a computational effect (in which case it inserts a residual let expression, using continuations). Figure 4 displays our grammar of admissible types.
⟨type⟩ ::= ⟨covariant-type⟩
⟨covariant-type⟩ ::= ⟨base-type⟩ | variable | ⟨contravariant-type⟩ "->" ⟨covariant-type⟩ | ⟨covariant-type⟩ * ... * ⟨covariant-type⟩
⟨contravariant-type⟩ ::= bool | variable | ⟨covariant-type⟩ "->" ⟨contravariant-type⟩ | ⟨contravariant-type⟩ * ... * ⟨contravariant-type⟩
⟨base-type⟩ ::= unit | int | float | bool | string
Fig. 4. Abstract syntax of types
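One direct way to represent this grammar of admissible types as an OCaml datatype is the following; the constructor names are ours, and polarity (covariant vs. contravariant) would be enforced by the functions consuming the representation rather than by the datatype itself.

(* Figure 4's type grammar as a datatype; names are illustrative. *)
type base = Unit | Int | Float | Bool | String
type tyexp =
  | Base of base
  | Tyvar of string
  | Arrow of tyexp * tyexp
  | Product of tyexp list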
We therefore implemented several type-directed partial evaluators:
– inserting or not inserting let expressions; and
– in a purely functional way, i.e., implementing two-level eta-expansion directly in ML, using Andrzej Filinski and Zhe Yang's strategy,¹ or with an explicit representation of two-level terms as the abstract-syntax tree of an ML expression (which is then compiled).
¹ Personal communications to the second author, spring 1995 and spring 1996 [27].
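For the purely functional variant, the two-level η-expansion can be written once and for all, indexed by a runtime representation of types. The sketch below is ours, in the spirit of Filinski and Yang's encoding but using modern OCaml GADTs (which postdate this paper); exp is the datatype of Figure 2, and gensym is a fresh-name generator we supply.

(* A minimal reify/reflect pair for pure function types; not the 1998 code. *)
type exp = Var of string | Lam of string * exp | App of exp * exp  (* Figure 2 *)

let gensym =
  let n = ref 0 in
  fun base -> incr n; base ^ string_of_int !n

type _ rr =
  | Atom : exp rr                          (* residual expressions *)
  | Func : 'a rr * 'b rr -> ('a -> 'b) rr

let rec reify : type a. a rr -> a -> exp = fun t v ->
  match t with
  | Atom -> v
  | Func (t1, t2) ->
      let x = gensym "x" in
      Lam (x, reify t2 (v (reflect t1 (Var x))))

and reflect : type a. a rr -> exp -> a = fun t e ->
  match t with
  | Atom -> e
  | Func (t1, t2) -> fun v -> reflect t2 (App (e, reify t1 v))

For instance, reify (Func (Atom, Atom)) (fun x -> x) evaluates to Lam ("x1", Var "x1"), the normal form of the identity, mirroring ee_id in Section 2.1.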
In the following section, instead of constructing a normal form as an abstract-syntax tree, we construct byte code and load it in place, thereby obtaining the effect of strong normalization by type-directed partial evaluation and run-time code generation.
3 Run-Time Code Generation
We have therefore written a translator mapping a term in long βη-normal form into equivalent byte code for the OCaml virtual machine. We then load this byte code and update, in place, the value we have normalized.
3.1 Generating Byte Code
We do not generate byte code by calling the Caml compiler on the text of the normal forms. The language of normal forms is a tiny subset of ML, and therefore we represent it with a dedicated abstract syntax. Since normal forms are well typed, we also shortcut the type-checking phase of the compiler. Finally, we choose not to use the resident byte-code generator: instead, we use our own translator from normal forms to byte code.
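To give an idea of how small such a translator can be, here is a compiler from Figure 2's normal forms into a toy stack code; this instruction set is invented for the sketch and is not the OCaml virtual machine's.

(* A toy target, not OCaml's actual byte code; exp is Figure 2's type. *)
type instr =
  | Access of int            (* push the nth environment slot *)
  | Closure of instr list    (* push a closure over the current environment *)
  | Apply
  | Return

let rec compile env = function
  | Var x ->
      (* de Bruijn index of x in the compile-time environment *)
      let rec index i = function
        | [] -> failwith ("unbound variable " ^ x)
        | y :: ys -> if x = y then i else index (i + 1) ys
      in
      [Access (index 0 env)]
  | Lam (x, body) -> [Closure (compile (x :: env) body @ [Return])]
  | App (f, a) -> compile env f @ compile env a @ [Apply]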
3.2 Loading Byte Code
For this we need to access OCaml’s byte-code loader, which required us to open its implementation. We have thus added more entry points in some of the modules that are available at the user level (i.e., Caml’s toplevel). We have also made several interfaces available, by copying them in the OCaml libraries. We essentially needed access to functions for loading byte code, and access to the current environment and its associated access functions. As a side benefit, our user does not need to specify the type of the value to optimize, since we can retrieve this information in the environment.
3.3 Updating in situ
Finally, being given the name of a variable holding a value to optimize, and being able to find its type in the environment, nothing prevents us from updating the binding of this variable with the optimized value, which we do. We illustrated the whole process in Section 1.4, by
– defining a variable cs1000 denoting 1000 compositions of Church's successor function, and
– normalizing it in place with our function nip.
4 Applications
We have tested our system with traditional partial-evaluation examples, the biggest of which are definitional interpreters for programming languages. The results are consistent with the traditional results reported in the partial-evaluation literature [20]: the user's mileage may vary, depending (in the present case) on how much strong normalization is hindered by ML's weak-normalization strategy. The definitional interpreters we have considered are traditional in partial evaluation: they range from a simple while language [5] to an Algol-like language with subtyping and recursion [16]. Our interpreters are written in Caml. Some use continuation-passing style (CPS), and the others direct style. In the definitional interpreters, iteration and recursion are handled with fixed-point operators. All our examples clearly exhibit a speedup after normalization. The specialized version of an interpreter with respect to a program, for example, is typically 2.5 times faster after normalization. On some other examples (e.g., Section 1.4), the residual programs are several thousand times faster than the (unnormalized) source program. The computational resources mobilized by type-directed partial evaluation vary wildly, depending on the source program. For example, specializing a direct-style interpreter with respect to a 10,000-line program takes 45 seconds and requires about 170 runs to be amortized. Specializing a CPS interpreter with respect to a 500-line program, on the other hand, takes 20 minutes. We believe that this low performance is due to an inefficient handling of CPS in OCaml. Essentially the same implementation takes a handful of seconds in Chez Scheme for a 1000-line program, with less than 0.5 seconds for type-directed partial evaluation proper, and with a fairly small difference whether the interpreter is in direct style or in CPS. We also experimented with the resident OCaml byte-code generator, which is slower by a factor of at least 3 than our dedicated byte-code generator. This difference demonstrates that using a special-purpose byte-code generator for normal forms is a worthwhile optimization.
5 Assessment
Although so far we are its only users, we believe that our system works reasonably well. In fact, we are in the process of writing a user's manual. Our main problem at this point is the same as for any other partial evaluator: speedups are completely problem-dependent. In contrast with most other partial evaluators, however, we can quantify this statement: because (at least in its pure form) type-directed partial evaluation strongly normalizes its argument, we can state that it provides all the (strong) normalization steps that are hindered by ML's weak-normalization strategy. Our secondary problem is efficiency: because OCaml is a byte-code implementation, it is inherently slower than a native code implementation such as
Chez Scheme [18], which is our reference implementation. Therefore our benchmarks in OCaml are typically measured in dozens of seconds whereas they are measured in very few seconds in Chez Scheme.² Efficiency becomes even more of a problem for the continuation-based version of the type-directed partial evaluator: whereas Chez Scheme represents continuations very efficiently [19], that is not the case at all for OCaml. On the other hand, the continuation-based partial evaluator yields perceptibly better residual programs (e.g., without code duplication, because of let insertion). Caveat: If our system is given a diverging source program, it diverges as well. In that sense, it is resource-unbounded [13,17].
6 Towards Modular Type-Directed Partial Evaluation
In a certain sense, ML's higher-order modules are essentially the simply typed lambda-calculus laid on top of first-order modules (“structures”) [23]. Looking under the hood, that is precisely how they are implemented. This suggests extending our implementation to part of the Caml module language.

Enabling technology: After type-checking, first-order modules (“structures”) are handled as tuples and higher-order modules (“functors”) are handled as higher-order functions (pictured in the sketch after the enumeration below). Besides, enough typing information is held in the environment to be able to reconstruct their type. Put together, these two observations make it possible for us to reuse most of our existing implementation.

Achievements and limitations: We handle a subset of the Caml module language, excluding polymorphism and sharing constraints.

An example: typed Combinatory Logic. Let us build on the example of Section 2.1. We have located the definition of the Hilbert combinators in a module defining our standard Combinatory Logic basis (see Figure 3). We then define an alternative basis in another module, in terms of the first one (see Figure 5). Because of ML's weak-normalization strategy, using the alternative basis incurs an overhead. We can eliminate this overhead by normalizing the alternative basis in place:

# nip_module "BCWK";;
- : unit = ()
#
What happens here is that the identifier BCWK denotes a tuple with four entries, each of which we already know how to process. Given the name of this identifier, the implementation
² For comparison, an interpreter-based and optimized implementation of type-directed partial evaluation in ML consistently performs between 1000 and 10000 times slower than the implementation in Chez Scheme [25]. The point here is not byte code vs. native code, but interpreted code vs. compiled code.
module type BCWK_sig = sig
  val cB : ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b
  val cC : ('a -> 'b -> 'c) -> 'b -> 'a -> 'c
  val cW : ('a -> 'a -> 'b) -> 'a -> 'b
  val cK : 'a -> 'b -> 'a
end
module BCWK : BCWK_sig = struct
  open SK
  let cB f g x = cS (cK cS) cK f g x
  let cC f x y = cS (cS (cK (cS (cK cS) cK)) cS) (cK cK) f x y
  let cW f x = cS cS (cK (cS cK cK)) f x
  let cK = cK
end
Fig. 5. A Combinatory Logic basis of regular combinators
1. locates it in the Caml environment;
2. accesses its type;
3. constructs the simple type of a tuple of four elements;
4. strongly normalizes it, using type-directed partial evaluation;
5. translates it into byte code, and loads it;
6. updates in place the environment to make the identifier BCWK denote the generated code.
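The enabling observation mentioned above (structures as tuples, functors as functions) can be pictured in miniature as follows; the modules here are made up for this sketch, not taken from the paper.

(* A functor and the plain function the compiler effectively manipulates
   after type-checking; POINT and Mirror are illustrative. *)
module type POINT = sig val x : int val y : int end

module Mirror (P : POINT) = struct
  let x = P.y
  let y = P.x
end

(* Under the hood, the structure is a tuple of its components and the
   functor is a function on such tuples: *)
let mirror (x, y) = (y, x)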
7 Related Work
Partial evaluation is traditionally defined as a source-to-source program transformation [6,20]. Type-directed partial evaluation departs from that tradition in that it is a compiled-to-source program transformation. Run-time code generation completes the picture by providing a source-to-compiled transformation at run time. It is thus a natural idea to compose both, and this has been done in two settings, using offline partial-evaluation techniques:

For imperative languages: the Compose research group at Rennes is doing run-time code generation for stock languages such as C, C++, and Java [3].

For functional languages: Sperber and Thiemann have paired a traditional, syntax-directed partial evaluator with a run-time code generator for a byte-code implementation of Scheme [26].

Both settings use binding-time analysis. Sperber and Thiemann's work is the most closely related to ours, even though their partial evaluator is syntax-directed instead of type-directed and though they consider an untyped and module-less language (Scheme) instead of a typed and modular one (ML). A remarkable aspect of their work, and one our implementation so far has failed to
achieve, is that they deforest the intermediate representation of the specialized program, i.e., their partial evaluator directly generates byte code. Alternative approaches to partial evaluation and run-time code generation include Leone and Lee's Fabius system [21], which only handles “staged” first-order ML programs but generates actual assembly code very efficiently.
8 Conclusion and Issues
We have obtained strong normalization in ML by pairing type-directed partial evaluation with run-time code generation. We have implemented a system in Objective Caml, whose byte code makes it possible to remain portable. The system can be used in any situation where strong normalization could be of benefit. Besides the examples mentioned above, we have applied it to type specialization [9], lambda-lifting and lambda-dropping [10], formatting strings [11], higher-order abstract syntax [12], and deforestation [15]. We are also considering applying it to cut elimination in formal proofs, in a proof assistant. We are in the process of extending our implementation to a subset of the Caml module language. This extension relies on the run-time treatment of structures and of functors, which are represented as tuples and as higher-order functions. Therefore, in a pre-pass, we assemble type information about the module to normalize (be it first order or higher order), we coerce it into simply typed tuple and function constructions, and we then reuse our earlier implementation. The practical limitations are the same as for offline type-directed partial evaluation, i.e., source programs must be explicitly factored prior to specialization. The module language, however, appears to be a pleasant support for expressing this factorization.
Acknowledgements This work is supported by BRICS (Basic Research in Computer Science, Centre of the Danish National Research Foundation; http://www.brics.dk). It was carried out at BRICS during the summer of 1997. We are grateful to Xavier Leroy for supplying us with a version of call/cc for OCaml, and to the anonymous reviewers for comments.
References 1. Ulrich Berger. Program extraction from normalization proofs. In M. Bezem and J. F. Groote, editors, Typed Lambda Calculi and Applications, number 664 in Lecture Notes in Computer Science, pages 91–106, Utrecht, The Netherlands, March 1993. 2. Ulrich Berger and Helmut Schwichtenberg. An inverse of the evaluation functional for typed λ-calculus. In Proceedings of the Sixth Annual IEEE Symposium on Logic in Computer Science, pages 203–211, Amsterdam, The Netherlands, July 1991. IEEE Computer Society Press.
3. The COMPOSE Project. Effective partial evaluation: Principles and applications. Technical report, IRISA (http://www.irisa.fr), Campus Universitaire de Beaulieu, Rennes, France, January 1996 – May 1998. A selection of representative publications.
4. Charles Consel, editor. ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation, Amsterdam, The Netherlands, June 1997. ACM Press.
5. Charles Consel and Olivier Danvy. Static and dynamic semantics processing. In Robert (Corky) Cartwright, editor, Proceedings of the Eighteenth Annual ACM Symposium on Principles of Programming Languages, pages 14–24, Orlando, Florida, January 1991. ACM Press.
6. Charles Consel and Olivier Danvy. Tutorial notes on partial evaluation. In Susan L. Graham, editor, Proceedings of the Twentieth Annual ACM Symposium on Principles of Programming Languages, pages 493–501, Charleston, South Carolina, January 1993. ACM Press.
7. Catarina Coquand. From semantics to rules: A machine assisted analysis. In Egon Börger, Yuri Gurevich, and Karl Meinke, editors, Proceedings of CSL'93, number 832 in Lecture Notes in Computer Science. Springer-Verlag, 1993.
8. Olivier Danvy. Type-directed partial evaluation. In Guy L. Steele Jr., editor, Proceedings of the Twenty-Third Annual ACM Symposium on Principles of Programming Languages, pages 242–257, St. Petersburg Beach, Florida, January 1996. ACM Press.
9. Olivier Danvy. A simple solution to type specialization. Technical Report BRICS RS-98-1, Department of Computer Science, University of Aarhus, Aarhus, Denmark, January 1998. To appear in the proceedings of ICALP'98.
10. Olivier Danvy. An extensional characterization of lambda-lifting and lambda-dropping. Technical Report BRICS RS-98-2, Department of Computer Science, University of Aarhus, Aarhus, Denmark, January 1998.
11. Olivier Danvy. Formatting strings in ML (preliminary version). Technical Report BRICS RS-98-5, Department of Computer Science, University of Aarhus, Aarhus, Denmark, March 1998. To appear in the Journal of Functional Programming.
12. Olivier Danvy. The mechanical evaluation of higher-order expressions. In Preliminary proceedings of the 14th Conference on Mathematical Foundations of Programming Semantics, London, UK, May 1998.
13. Olivier Danvy, Nevin C. Heintze, and Karoline Malmkjær. Resource-bounded partial evaluation. ACM Computing Surveys, 28(2):329–332, June 1996.
14. Olivier Danvy, Karoline Malmkjær, and Jens Palsberg. The essence of eta-expansion in partial evaluation. LISP and Symbolic Computation, 8(3):209–227, 1995. An earlier version appeared in the proceedings of the 1994 ACM SIGPLAN Workshop on Partial Evaluation and Semantics-Based Program Manipulation.
15. Olivier Danvy and Kristoffer Høgsbro Rose. Deforestation by strong normalization. Technical report, BRICS, University of Aarhus and LIP, ENS Lyon, April 1998. To appear.
16. Olivier Danvy and René Vestergaard. Semantics-based compiling: A case study in type-directed partial evaluation. In Herbert Kuchen and Doaitse Swierstra, editors, Eighth International Symposium on Programming Language Implementation and Logic Programming, number 1140 in Lecture Notes in Computer Science, pages 182–197, Aachen, Germany, September 1996. Extended version available as the technical report BRICS-RS-96-13.
17. Saumya Debray. Resource-bounded partial evaluation. In Consel [4], pages 179–192.
252
Vincent Balat and Olivier Danvy
18. R. Kent Dybvig. The Scheme Programming Language. Prentice-Hall, 1987. 19. Robert Hieb, R. Kent Dybvig, and Carl Bruggeman. Representing control in the presence of first-class continuations. In Bernard Lang, editor, Proceedings of the ACM SIGPLAN’90 Conference on Programming Languages Design and Implementation, SIGPLAN Notices, Vol. 25, No 6, pages 66–77, White Plains, New York, June 1990. ACM Press. 20. Neil D. Jones, Carsten K. Gomard, and Peter Sestoft. Partial Evaluation and Automatic Program Generation. Prentice Hall International Series in Computer Science. Prentice-Hall, 1993. 21. Mark Leone and Peter Lee. Lightweight run-time code generation. In Peter Sestoft and Harald Søndergaard, editors, Proceedings of the ACM SIGPLAN Workshop on Partial Evaluation and Semantics-Based Program Manipulation, Technical Report 94/9, University of Melbourne, Australia, pages 97–106, Orlando, Florida, June 1994. 22. Xavier Leroy. The Objective Caml system, release 1.05. INRIA, Rocquencourt, France, 1997. 23. David B. MacQueen. Modules for Standard ML. In Guy L. Steele Jr., editor, Conference Record of the 1984 ACM Symposium on Lisp and Functional Programming, pages 198–207, Austin, Texas, August 1984. 24. Frank Pfenning and Conal Elliott. Higher-order abstract syntax. In Mayer D. Schwartz, editor, Proceedings of the ACM SIGPLAN’88 Conference on Programming Languages Design and Implementation, SIGPLAN Notices, Vol. 23, No 7, pages 199–208, Atlanta, Georgia, June 1988. ACM Press. 25. Tim Sheard. A type-directed, on-line, partial evaluator for a polymorphic language. In Consel [4], pages 22–35. 26. Michael Sperber and Peter Thiemann. Two for the price of one: composing partial evaluation and compilation. In Ron K. Cytron, editor, Proceedings of the ACM SIGPLAN’97 Conference on Programming Languages Design and Implementation, SIGPLAN Notices, Vol. 32, No 5, pages 215–225, Las Vegas, Nevada, June 1997. ACM Press. 27. Zhe Yang. Encoding types in ML-like languages (preliminary version). Technical Report BRICS RS-98-9, Department of Computer Science, University of Aarhus, Aarhus, Denmark, April 1998.
Determination of Dynamic Method Dispatches Using Run-Time Code Generation

Nobuhisa Fujinami
Sony Computer Science Laboratory Inc.
Abstract. Run-time code generation (RTCG) enables program optimizations specific to values that are unknown until run time, and improves performance. This paper shows that RTCG can be used to determine dynamic method dispatches. It can produce a better result than conventional method dispatch prediction mechanisms because other run-time optimizations help the determination. Further, the determined functions can be inlined, and this may lead to other optimizations. These optimizations are implemented in the author's RTCG system. The evaluation results showed a good performance improvement.
1 Introduction
Run-time code generation (RTCG) is partial evaluation [1] performed at run time. It generates machine code specific to values which are unknown until run time and enhances the speed of a program, while preserving its generality. RTCG itself is becoming a mature technique. Description languages and systems for run-time code generators have been proposed. Systems that automatically generate run-time code generators from source programs have also been proposed.

Also, much effort has been made to improve the performance of object-oriented languages. Recent research papers have focused on using run-time type feedback or static type inference to optimize dynamic method dispatch.

This paper describes an RTCG system that can optimize dynamic method dispatches of an object-oriented language. This system can produce better results than conventional method dispatch prediction mechanisms because other run-time optimizations, such as global constant propagation/folding and complete loop unrolling, help the determination. This paper focuses on these optimizations. The basics of the RTCG system itself are described only briefly in this paper; refer to [2], [3], and [4] for details.

This system focuses on the instance variables of objects and uses the fact that objects can be regarded as closures [5]. If the values of some instance variables are run-time constants, the system generates specialized code generators for the methods that use them. Machine code routines optimized to their values are generated at run time.

The rest of the paper is organized as follows: Section 2 overviews the RTCG system. The optimizations implemented in the system are described in Section 3. Section 4 evaluates the optimizations. Section 5 overviews related research. Finally, Section 6 provides a summary and future plans.
2 System Overview
This section briefly describes the RTCG system for an object-oriented language proposed by the author. As stated in Section 1, RTCG improves the efficiency of programs by generating machine code optimized to values that are unknown until run time, e.g. intermediate results of computation and the user's inputs. If programs operating on these values are written in object-oriented languages, it is natural to define objects with instance variables that represent the values known at run time. For example, to program stream input/output functions, the programmer may assign descriptors of files, sockets, strings, etc., to instance variables of stream objects. Stream objects may have methods for reading or writing streams, which have the descriptors as their run-time constants. Another example is the generation and rendering of a three-dimensional scene. The programmer may represent the scene, which is a run-time constant during rendering, as a scene object with instance variables representing a set of graphics objects, a viewing point, light sources, etc. The scene object's methods for rendering can be optimized through RTCG.

The benefits of focusing on instance variables of objects are as follows:

Automation of the timing of code generation/invalidation: Because of the encapsulation mechanism of object-oriented languages, all the assignments to non-public instance variables (e.g. private data members in C++) can be known, except for indirect accesses through pointers, from the definition of the class and its methods. Since the system knows when to generate/invalidate code, the programmer is freed from annotating programs and from providing suitable parameters to preserve consistency between the values embedded in the code and the actual values.

Automation of the management of generated code: Since generated machine code (a specialized method) can be viewed as a part of the instance, its management can be left to the instance creation/destruction mechanism of an object-oriented language. Management of multiple machine code routines for the same method is trivial. The generated machine code can be automatically invoked instead of the original method. The programmer is freed from managing memory for the code and from rewriting programs to invoke the code.

The system is implemented as a preprocessor of a C++ compiler. The current implementation is for Borland C++ compilers (Version 4.0 or higher) running on 80x86-based computers with a Win32 API. The executable file name is RPCC.EXE. The reasons for choosing C++ as the source language are as follows:

– Since C++ has static type declarations, it is easy to determine the types of values used in the run-time code generator.
– Since C++ is quite an efficient object-oriented language, the system can provide the best possible implementation of a program written in a high-level language.
The programmer directs the system to use RTCG by inserting the keyword runtime before a declaration of a member function.1 The system assumes all the "known" data members (see the next paragraph) used but not changed in that member function to be run-time constants. The programmer can direct the system not to assume particular data members to be run-time constants by putting the keyword dynamic before the definitions of those members.

The "known" data members are detected as follows: In the first step of analyzing the source program, all private, protected, or const2 data members of the class without the keyword dynamic are marked "known". Then, if any of the member functions of the class uses a non-const member in a way that satisfies the following conditions, the mark for the member is cleared:

– The address of the member is taken, e.g. as an operand of the unary & operator or a reference parameter.
– The address is passed, directly or indirectly via casts or the binary + and − operators, to a variable, as a function parameter, or as a return value.
– The type of the destination is not a pointer/reference to a const.

The values of the members still marked "known" are known in the sense that only the functions that explicitly use or modify the members can use or modify them.

Let F be a member function with the keyword runtime, and let X be any data member marked "known". If X is used, but not changed, in F, X is treated as a run-time constant in the code generator for F. If X is a run-time constant in the code generator for F and member function G changes X, code to invalidate the machine code for F is inserted into G. If such G's exist, and F calls other functions, then X may be modified during the execution of the generated code. In this case, a new data member is introduced to count the number of active executions of F. Code to check the counter value is inserted into G. If the value is not zero, the code warns the programmer that the insertion of the keyword runtime is inappropriate.3

Figure 1 shows the overall organization of the system. The upper half illustrates the action at compile time, and the lower half illustrates the program execution. At compile time, C++ preprocessor directives in a source program are processed first (CPP32.EXE in Borland C++). Then RPCC.EXE analyzes the program and generates, if necessary, run-time code generators in C++. The code generators, the code for invoking them, and the code for invoking/invalidating the generated code are embedded into the original source program.

1 Automatic detection of the applicability is possible but not practical, because a too aggressive application of RTCG increases the compilation time and the size of the executable file.
2 It may violate the assumption of the analysis to cast a pointer to const into a pointer to non-const. Such an attempt is considered to be illegal because it is not safe to modify an object through such a pointer.
3 Using the exception handling of C++ may lead to false warnings because the counter may not have been decreased correctly. In this case, catching exceptions in F will solve the problem.
[Figure 1: diagram omitted. Compile time: Source Program (file.cpp) → C++ Preprocessor (CPP32.EXE in Borland C++) → file.i → Preprocessor (RPCC.EXE by the author: translator from source to intermediate representation, optimizer of intermediate representation, generator of run-time code generator) → file.out → C++ Compiler (BCC32.EXE in Borland C++) → file.exe, the compiled program with run-time code generators. Run time: the compiled program calls a run-time code generator with run-time constants, and the generator produces machine code.]

Fig. 1. Organization of the implemented system
The output is compiled into an executable file using a normal C++ compiler (BCC32.EXE in Borland C++). The source program and its intermediate representation are manipulated only at this compile time.

At run time, code generators are invoked with run-time constants as parameters. They generate member functions optimized to the run-time constants, in machine code format. Each code generator is specific to one member function. Since the code is written directly into memory, and since neither the source program nor its intermediate representation is used, code generation is efficient. One code generator may generate multiple machine code routines with different run-time constant values. The generated routines, which are expected to be more efficient than statically compiled ones, are invoked instead of the original member functions.

Figure 2 shows an example of an input to RPCC.EXE. Figure 3 shows the output (comments are added for readability). Preprocessor RPCC.EXE processes member functions with the keyword runtime and generates run-time code generators in C++. Pointers to generated machine code routines are added to the class as its data members; code generators are added as its member functions. The processed member functions are replaced with code fragments to check the validity of the generated code, to invoke the code generators if necessary, and to invoke the generated code. The preprocessor also inserts code for deleting generated machine code in the destructors and in the member functions that modify the data members embedded in the generated machine code.
class A {
private:
  int x;
public:
  A(int i);
  runtime int f(int y);
};

A::A(int i): x(i) {}

int A::f(int y) { return y-x*x; }
Fig. 2. Example of an input to RPCC.EXE
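A hypothetical use of this class, tracing the behavior implied by the output in Figure 3:

A a(3);           // qq_f initially points at the "generate" label of f
int r = a.f(10);  // first call: the code generator runs, emitting code specialized
                  // to x*x == 9, then the generated code executes; r == 10-9 == 1
int s = a.f(5);   // later calls jump directly to the generated code; s == 5-9 == -4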
3 Optimizations
The optimizations of the machine code generated by the run-time code generator are divided into two categories: those detected at compile time and those detected at code generation time (i.e. run time). The former are treated in a way similar to conventional code optimizations. They include constant propagation/folding, copy propagation, strength reduction, reassociation, redundant code elimination, algebraic simplification, jump optimization, delay slot filling, and loop invariant motion.

Since the latter are performed at run time, efficiency is important. The system generates machine code directly: it does not manipulate any source program or intermediate representation of it at run time. The output from RPCC.EXE contains optimization routines specialized to the target member functions. Optimizations performed at run time include local constant propagation/folding, strength reduction, redundant code elimination, and algebraic simplification. Because of the naive implementation of RPCC.EXE, redundant optimization code may be included in the code generator, but most of it is optimized away by the C++ compiler (code generators are generated in C++; see Section 2).

The rest of this section describes non-trivial optimizations performed at code generation time. These optimizations include global run-time constant propagation, complete loop unrolling, and virtual function inlining.

3.1 Intermediate Representation
This subsection describes the intermediate representation used at compile time. RPCC.EXE consists of three phases similar to conventional compilers (see Figure 1):

1. Translator from source to intermediate representation
2. Optimizer of intermediate representation
3. Generator of run-time code generator
#include <...>             // macros and functions for RTCG
class A {
private:
  int x;
public:
  A(int i);
  int f(int y);
  ~A();                    // destructor
  char *qq_f;              // pointer to generated code
  void qq__f() const;      // code generator
  static char *qql_f;      // address of label "generate" in f
  static char *qql__f();   // function to initialize qql_f
};

A::~A() { if(qq_f!=qql_f) delete qq_f; }
A::A(int i): x(i), qq_f(qql_f) {}
int A::f(int ) {
retry:
  asm MOV ECX,this;
  asm JMP DWORD PTR [ECX].qq_f;   // jump to generated code
generate:
  qq__f();                        // invoke code generator
  goto retry;
}
char *A::qql_f=qql__f();
void A::qq__f() const {
  char *qqcode;                   // code address
  // prologue code generator (omitted)
  qqMOVdx(0,5,12);                // MOV EAX,[EBP+12] ; y
  qqSUB_I(0,(int)x*x);            // SUB EAX,x*x
  // epilogue code generator (omitted)
  *(char **)&qq_f=qqcode;         // set code address
}
Fig. 3. Example of an output from RPCC.EXE (Macro qqXX(YY) writes instruction XX with operand(s) YY into memory.)
[Figure 4: flow graph omitted]

Fig. 4. Intermediate representation of { int i,s=0; for(i=0;i<n;i++) s+=f(i); return s; }
The intermediate representation format used in these phases is designed to be suitable for generating run-time code generators. The intermediate representation is a flow graph that represents the meaning of a function. It is generated for each function to be optimized or inlined. The nodes of the graph are basic blocks, represented as sequences of statements. Figure 4 shows an example of the intermediate representation.

A statement is one of assignment, function invocation, selection, switch, virtual, and return. A goto-statement is represented as an edge of the flow graph and has no special node. Some statements have expressions as their operands. Expressions are represented as directed acyclic graphs (DAGs) whose nodes are operators, identifiers, or constants. Expressions with side effects are divided into multiple statements. Conditional operators (&&, ||, and ?:) are expressed using separate basic blocks. Temporary variables may be introduced in this transformation. The reasons for using DAGs instead of a conventional flat format such as three-address code are as follows:

– It is easy to classify nodes into stages (see the next paragraph).
– It is easy to reconstruct C++ expressions that calculate run-time constants, which will be embedded into run-time code generators.
– It is easy to replace variables with expressions. This operation is necessary in the optimization described in Subsection 3.2.

Leaf nodes of DAGs are identifiers or compile-time constants. Each identifier's entry in the symbol table has a flag that tells whether the identifier is a data member treated as a run-time constant. RPCC.EXE classifies nodes into three stages: "compile time", "code generation time", and "dynamic". A simple traversal of the DAGs can classify internal nodes and statements into the three stages. When the run-time code generator is generated, the stage information is used as follows: If the stage of a subgraph is "compile time", its value is calculated at compile time and the result is embedded into the run-time code generator.
[Figure 5: flow graph omitted]

Fig. 5. Intermediate representation before run-time constant propagation

[Figure 6: flow graph omitted]

Fig. 6. Intermediate representation after run-time constant propagation
If its stage is "code generation time", a C++ expression that calculates the run-time constant value is embedded into the code generator. If its stage is "dynamic", a code generation routine for it is embedded.

The system inlines functions during translation to intermediate representation. Functions that perform recursive calls are not inlined, to prevent infinite loops of partial evaluation.
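As a small worked instance of this classification (our reading of Figures 2 and 3, not additional machinery), consider the body of A::f:

return y - x*x;
// x*x     : "code generation time"; the generator computes it as the C++ expression (int)x*x (Figure 3)
// y       : "dynamic"; the generated code fetches it (qqMOVdx emits MOV EAX,[EBP+12])
// y - x*x : "dynamic"; a code generation routine is emitted for it (qqSUB_I)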
3.2 Global Run-time Constant Propagation
This optimization is processed in the second phase (optimizer of intermediate representation). Like conventional compilers, this phase performs dataflow analysis on the flow graph, propagates compile-time constants, eliminates redundant code, etc. A simple extension of compile-time constant propagation allows global run-time constant propagation. In normal constant propagation, only the values of compile-time constant expressions are propagated to the places of their use. If the right operand of an assignment operator is an arithmetic expression consisting of run-time constants, this phase also propagates the expression. Thus, all uses of run-time constants are replaced with expressions that compute their values. In the third phase (generator of run-time code generator), the new expressions are classified as "compile time". The C++ expressions are reconstructed and embedded into the run-time code generator. This enables global run-time constant propagation. For example, if x is a run-time constant in the block:

{ y=x*x; if(y>=p && y<q) return y; ... }
qqMOVdx(0,5,12);   // MOV EAX,[EBP+12] ; p
qqCMP_I(0,x*x);    // CMP EAX,x*x
qqGenJccF(G,2);    // JG L2
qqMOVdx(1,5,16);   // MOV ECX,[EBP+16] ; q
qqCMP_I(1,x*x);    // CMP ECX,x*x
qqGenJccF(G,5);    // JG L5
qqGenLbl(2);       //L2:
//
// code generator for other basic blocks
//
qqGenLbl(5);       //L5:
qqMOV_I(0,x*x);    // MOV EAX,x*x
QQEXIT             // exit code
Fig. 7. Run-time code generator (qqXX is a macro for code generation)

MOV EAX,[EBP+12] ; p
CMP EAX,9
JG L2
MOV ECX,[EBP+16] ; q
CMP ECX,9
JG L5
L2:
;
; code for other basic blocks
;
L5:
MOV EAX,9
; exit code
Fig. 8. Generated machine code if x=3 (translated into Intel mnemonic)
(its intermediate representation is in Figure 5), uses of y are replaced with the run-time constant expression x*x (see Figure 6). In the next phase, the code generation routine shown in Figure 7 is generated. If the run-time constant x=3 is supplied to the code generator at run time, the machine code shown in Figure 8 will be generated.

Since this optimization technique is flow-sensitive, it successfully processes cases in which a variable is a run-time constant at one point in the program, but not at another point. For example, suppose p and q are not run-time constants and an assignment y=q-p; follows the code in Figure 5. In this case, the variable y does not always hold a run-time constant, but the assignment y=x*x; is successfully propagated using dataflow analysis; the assignment y=q-p; does not disturb the optimization in Figure 6.

Using Static Single Assignment form (SSA) and a flat intermediate representation format also allows flow-sensitive stage classification. This method avoids duplicated run-time constant expressions (e.g. x*x in the previous example) but requires stage annotations for all of the temporary variables. The system does not adopt it and leaves the elimination of duplicated run-time constant expressions, if any, to the C++ compiler.
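In code, the flow-sensitive situation just described looks like this (a sketch reusing the running example's names):

{ y=x*x;                      // y holds the run-time constant x*x here
  if(y>=p && y<q) return y;   // uses of y up to here are replaced by x*x
  ...
  y=q-p;                      // from this point on, y is not a run-time constant
}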
int C::g(int p,int q)
{
  int s=0;
  for(i=0;i<n;i++) {
    y=a[i];
    if(y>=p && y<q) s++;
  }
  return s;
}

Fig. 9. Member function g

3.3 Complete Loop Unrolling
Unlike other run-time code generation systems (see Section 5), the author's system can unroll only simple loops, but it automatically decides whether each loop should be fully unrolled or not, depending on the upper limit of the number of iterations of the loop.

In the second phase (optimizer of intermediate representation), loops are detected during dataflow analysis. The phase checks whether one of the exits of the loop is a simple comparison of the control variable with a compile- or run-time constant, and whether the initial value and the update step are compile- or run-time constants. The loop may have other exits. If a loop passes the check, the upper limit of its number of iterations is a compile- or run-time constant. The third phase (generator of run-time code generator) emits a code fragment to unroll the loop based on this information. If the upper limit of the number of iterations is a compile-time constant, either a normal code generation routine or a complete unrolling routine for the loop is generated, depending on the upper limit. If it is a run-time constant, code generators of both versions are generated, and the selection is performed at code generation time.

To support both versions, the run-time constant propagator is extended. If a run-time constant expression contains a loop control variable, the propagated expression (for calculating the value of the run-time constant) is represented as a special node that also holds the original variable. An example is the loop in function g in Figure 9. If a and n are run-time constant data members of class C, the use of y is replaced with a special node that holds both y and a[i]. If the loop is not unrolled, the variable y is used in the if statement. If the loop is completely unrolled, the run-time constant a[i] is used in the if statement.

If the loop is completely unrolled, the stage of the loop control structure is assumed to be "code generation time" in the output code generator. The loop control variable becomes a variable of the run-time code generator and is treated as a run-time constant in the body of the loop (see Figure 10).
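Schematically, the selection between the two versions might look as follows (a sketch, not RPCC.EXE's actual output; LIMIT stands for a hypothetical unrolling threshold):

if (n <= LIMIT) {                      // n: run-time constant upper limit of the iteration count
  for (qq0i = 0; qq0i < n; qq0i++) {
    /* emit one copy of the loop body, treating qq0i as a run-time constant */
  }
} else {
  /* emit the ordinary, rolled loop code */
}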
Determination of Dynamic Method Dispatches qq0i=0; // i=0 for(;;) { qqMOVdx(0,5,12); // MOV EAX,[EBP+12] qqCMP_I(0,a[qq0i]); // CMP EAX,a[qq0i] qqGenJccF(G,5); // JG L5 qqMOVdx(1,5,16); // MOV ECX,[EBP+16] qqCMP_I(1,a[qq0i]); // CMP ECX,a[qq0i] qqGenJccF(LE,5); // JLE L5 qqADD_I(6,1); // INC ESI qqGenLbl(5); //L5: qq0i=qq0i+1; // i++ if(!(qq0i
263
; p ; a[i] ; q ; a[i] ; s
Fig. 10. Code generator for the loop in g (qqXX is a macro for code generation)
[Figure 11: flow graph omitted. A "Virtual" node dispatches on the class of p (C1, C2, C3, ...) to the corresponding call or inlined image I1, I2, I3, ..., each flowing to the next statement.]

Fig. 11. Flow graph of virtual function invocation
3.4 Inlining Virtual Functions
Suppose a run-time constant is a pointer to a class object. If virtual function invocation is used with this pointer, the actually invoked function can be determined at code generation time, for the following reason: since the pointer value is constant, the object's address cannot change, and the only legal way to change the class of the object at that address would be to use a union. A class with virtual functions must have an implicit or an explicit constructor, and a class with any constructor cannot be a member of a union. Thus the actual class of the object cannot be changed. Furthermore, if the invoked function is declared as inline, it can be inlined.4

In the intermediate representation used in RPCC.EXE, a virtual function invocation is represented as a special node "Virtual" (see Figure 11). Each Ik represents a call of a member function of derived class Ck or an inlined image of it. RPCC.EXE embeds code to test the actual class and to generate machine code for Ik if the class is Ck. Operator typeid, which returns run-time type information, is used in the test.

4 Virtual functions of C++ can be declared as inline. Normal compilers inline them only if the invoked functions can be determined at compile time.
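In outline, the embedded test has the following shape (a sketch; p, Ck, and Ik are as in Figure 11):

if      (typeid(*p) == typeid(C1)) { /* generate machine code for I1 */ }
else if (typeid(*p) == typeid(C2)) { /* generate machine code for I2 */ }
else if (typeid(*p) == typeid(C3)) { /* generate machine code for I3 */ }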
class objectTableType {
  int count;                     // graphics object counter
  objectType *table[MAXOBJECT];  // pointers to graphics objects
public:
  objectTableType(): count(0) {}
  int add(objectType *p);        // add graphics object
  runtime const objectType *intersect_all(rayType &, myfloat &);
                                 // return the first object the ray intersects
};
Fig. 12. Definition of class objectTableType

const objectType *
objectTableType::intersect_all(rayType &ray, myfloat &t)
{
  myfloat t1;
  const objectType *obj=0;
  for(int i=0;i<count;i++) {
    if(table[i]->intersect(ray,t1)) {
      if(t1>MIN_DISTANCE && (obj==0 || t1<t)) {
        obj=table[i];
        t=t1;
      }
    }
  }
  return obj;
}
Fig. 13. Member function intersect_all
Other run-time optimizations, such as constant propagation and complete loop unrolling, help this optimization. For example, Figure 12 shows a class that represents a set of graphics objects. Member function intersect_all has the keyword runtime, and its implementation is given in Figure 13. Each element of table can point to any graphics object (derived class of objectType: see Figure 14). If the loop is unrolled, the invocation of virtual function intersect is determined and inlined. If table[0] points to an object of class Plane and table[1] points to an object of class discType, then the unrolled loop looks like: Inlined Rest of Inlined Rest of ...
image of the loop image of the loop
Plane::intersect body disc::intersect body
This is impossible for conventional techniques, such as type inference of variables or variable occurrences in call sites. Inlined member functions are further optimized through run-time constant propagation/folding, algebraic simplification, etc. In the previous example, run-
class objectType {
protected:
  const surfaceType * surface;
  const pointType center;
public:
  inline virtual int intersect(const rayType&, myfloat &t) const = 0;
  objectType(surfaceType * s, const pointType& c);
  virtual const SinfoType * getSinfo(const pointType & pos) const;
  virtual vectorType getNormal(const pointType &) const = 0;
  virtual int flat() const = 0;
};

class Plane: public objectType {
  const vectorType N;
  const myfloat d;
public:
  Plane(surfaceType *s, pointType &pos, vectorType &NN);
  inline int intersect(const rayType &, myfloat &) const;
  vectorType getNormal(const pointType &) const { return N; }
  int flat() const { return 1; }
};

int Plane::intersect(const rayType &ray, myfloat &t) const
{
  myfloat t1=inner(N,ray.v);
  if(t1==0.0) return 0;
  t=(d-inner(N,ray.p))/t1;
  if(t<=MIN_DISTANCE) return 0;
  return 1;
}
Fig. 14. Class objectType, derived class Plane and virtual member function intersect
In the previous example, the run-time constants table[0], table[1], ... are propagated to the this pointers of the inlined member functions Plane::intersect, discType::intersect, ..., and they are specialized with respect to the const data members, if any, of the graphics objects pointed to by table[0], table[1], .... Figure 14 shows the definitions of class Plane and member function intersect. If an instance of Plane represents a plane parallel to the x-y plane, the x and y components of its normal vector N are zero. In this case, the inlined image of function inner (inner product: three multiplications and two additions) is algebraically simplified into a single multiplication.
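Concretely (a sketch; the three-component representation and the component names are assumptions, not shown in Figure 14):

// inner(N, ray.v) = N.x*ray.v.x + N.y*ray.v.y + N.z*ray.v.z   (3 multiplications, 2 additions)
// with N.x == 0.0 and N.y == 0.0 embedded as run-time constants, folding leaves
// inner(N, ray.v) = N.z*ray.v.z                                (a single multiplication)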
Program          execution time (sec)  speed ratio
original (C)     72.5                  1.00   0.59
new (C++)        42.6                  1.70   1.00
optimized (C++)  30.4                  2.38   1.40

Table 1. Evaluation results (ray tracer)

Program          execution time (sec)  speed ratio
original (C)     69.2                  1.00   1.04
original (C++)   72.3                  0.96   1.00
optimized (C++)  36.9                  1.87   1.96
Table 2. Evaluation results (puzzle solver)
4 Evaluation
This section reports the evaluation results of the implementation. The evaluation environment is as follows:

Machine: NEC PC-9821St15/L16 (Pentium Pro 150MHz, RAM: 48MBytes)
Operating System: Microsoft Windows 95
Compiler: Borland C++ Version 5.0J
Compiler Options: -6 -O2 -OS -vi (Pentium Pro, optimize for speed, instruction scheduling, enable inlining)

The first program is a ray tracer. Table 1 shows the results. The program reads a scene file at run time and displays the ray-traced image. It contains class objectTableType of Figures 12 and 13. The keyword runtime is inserted before the declaration of member function intersect_all. The ray-traced scene contains three transparent spheres, seventeen opaque spheres, two mirrors, one disc, one checked square, and one light source. The output is a 512 × 512 24-bit color image. The original program is from [6] and is written in C ("original" in Table 1). The author's part-time assistant rewrote it in C++ ("new" in Table 1). The new program runs about 1.7 times as fast as the original one,5 and the run-time-optimized one is 1.4 times as fast as the new one. Analyzing the generated code shows that the speedup is mostly due to the determination of virtual member function invocations combined with inlining.

The second program is a box-packing puzzle solver. Table 2 shows the results. It reads a puzzle definition file that describes the box and the pieces and prints the solutions. Run-time optimization is applied to member function put of class Piece (see Figure 15). Its instance variables represent the shapes of the pieces and are run-time constants. The goal here is to pack 11 pieces into a size-4 cubic box. There are three solutions. The optimized program runs about twice as fast as the normal one. Here the speedup is mostly due to loop unrolling and constant folding.

5 The original program uses a switch statement to classify graphics objects. Using virtual member functions optimized the classification.
class Piece {
private:
  int index,dirs,num;
  int offset[MaxDir][MaxSize];
public:
  Piece(int i);
  runtime void put(Box *b,int o,int i0,PieceList *l);
private:
  void set(Box* b,int o,int j);
  void reset(Box* b,int o,int j);
};

void Piece::put(Box *b,int o,int i0,PieceList *l)
{
  int j,k;
  for(j=0;j<dirs;j++) {
    ...                       // try to place the piece in direction j; on failure goto next
    l->putok(b,o,index,i0);
    reset(b,o,j);
  next:;
  }
}
Fig. 15. Class Piece
The author wrote the puzzle solver in both C and C++. Notice that the program in C++ is a little bit slower than that in C. In both cases, the optimized program runs faster than the original one and the one written in C: optimized programs in C++ can run faster than their C counterparts.

The cost of code generation is 625 microseconds per 7638 bytes of generated instructions, or about 12 machine cycles per byte, in the case of the ray tracer. This is low compared with code generation systems that manipulate intermediate representations [7], but it is still higher than the result of [8], which emits run-time code generators in machine code. The cause of this cost seems to be that the compiled code generator contains quite a few instructions operating on byte data, which the Pentium Pro processor cannot execute efficiently. Rewriting the generator of code generators will make the cost even lower.
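As a sanity check on these figures: at 150 MHz, 625 microseconds corresponds to 625 × 10^-6 × 150 × 10^6 ≈ 93,750 cycles, and 93,750 / 7638 ≈ 12.3 cycles per generated byte, matching the "about 12 machine cycles" above.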
5 Related Work
There are a number of related research papers on optimization of object-oriented languages [9] [10] [11] [12] [13] [14] [15]. These papers focus on run-time type feedback or static type inference to optimize method dispatch, and the methods are partially evaluated with respect to the inferred type. Since the author focuses on values, more aggressive optimizations can be applied, and run-time value-specific optimizations help to determine dynamic method dispatches. However, the author's system cannot optimize cases where the pointer value is not constant but the type of the pointed object is fixed. Using the results of these research papers will enable optimization of both cases.

A framework named specialization classes [16] focuses on the instance variables of objects and can specialize methods with respect to their values at both compile time and run time. Its prototype was implemented for Java. Since it does not specialize methods with respect to run-time types or to constant objects,6 and since it cannot consider array variables as invariants, inlining methods combined with loop unrolling, run-time constant propagation, and algebraic simplification (see Subsection 3.4) is impossible. There is another value-specific partial evaluator for an object-oriented language [17]; however, objects are not regarded as closures in it.

There are also various systems for run-time code generation. Fabius [18] [8] [19] is a compiler for a subset of ML with automatic run-time light-weight code generation. It requires the programmer to declare functions to take their arguments in curried form. The idea is very similar to that of using objects as closures, and it is very natural in functional programming languages. But in languages with assignments, it is not desirable because the actual values may be different from the values embedded in partially evaluated functions. The author's method is preferable in richer and efficient languages like C++, because the values of instance variables can be changed after instantiation.

Tempo [20] [21] is an online and offline partial evaluator of system programs in C. It automatically generates quite efficient run-time code generators. But the programmer has to invoke the code generator explicitly and has to manage the generated code. It is the programmer's responsibility to maintain consistency between the actual value and the value embedded in the generated code.

DyC and its predecessor [22] [23] [24] implicitly generate machine code for program regions the programmer indicates. The indications include run-time constants for those regions. If the values of some run-time constants change, the corresponding machine code is automatically generated. The system uses a pair of dataflow analyses to identify which variables will be constant at run time. The system, however, is error-prone because it requires the programmer to insert the keyword dynamic at every use of a C pointer that refers to dynamic values. The system can manage multiple machine code routines for each region; the generated code is looked up using the values of run-time constants. The author's method uses a pointer in each object instance and is therefore more efficient. If a number of object instances share the same set of values of the instance variables, the author's method requires larger storage for generated code. Reusing code at code generation time reduces the storage requirement without a significant effect on speed.

6 If an instance variable is of object type, it can specialize methods with respect to the specialization state (declared by specialization class name) of the object.
‘C [25] is a language for run-time code generation. The programmer can control run-time code generation explicitly. It may be efficient, but the programmer has to rewrite the source program using the ‘ and $ operators.
6 Summary and Future Work
This paper showed that RTCG can be used to determine dynamic method dispatches. It can produce a better result than conventional method dispatch prediction mechanisms because other run-time optimizations, such as global run-time constant propagation and complete loop unrolling, help the determination. Further, the determined functions can be inlined, and this may lead to other optimizations. These optimizations are implemented in the author's RTCG system by a simple extension of the intermediate representation optimizer and the generator of run-time code generators. The system is implemented as a preprocessor of a C++ compiler, and time-consuming operations are performed only at compile time. The evaluation results showed a good performance improvement.

The author plans to extend the system to optimize groups of objects. Commonly used data structures, such as linked lists and hash tables, consist of groups of objects. Operations on these data structures can be represented as code fragments held in the objects. This is similar to executable data structures, or Quajects, by Massalin [26] [27], which were implemented using handwritten templates in an assembly language. Object-oriented languages may permit automatic application of this optimization. The author has already proposed basic ideas to optimize groups of objects [28]. The author is also going to release the preprocessor RPCC.EXE as free software (tentative name: C++ Doubler).
Acknowledgments

I would like to express my gratitude to Dr. Mario Tokoro for supervising the research. I also appreciate the many suggestions offered by Dr. Satoshi Matsuoka and Dr. Calton Pu, as well as the work of programming the ray tracer in C++ by Ms. Kayoko Sakai. Finally, I would like to thank the members of Sony CSL for their valuable advice.
References

1. Neil D. Jones. An Introduction to Partial Evaluation. ACM Computing Surveys, Vol. 28, No. 3, pp. 480–503, September 1996.
2. Nobuhisa Fujinami. Run-Time Optimization in Object-Oriented Languages. In Proceedings of 12th Conference of Japan Society for Software Science and Technology, September 1995. In Japanese. Received Takahashi Award.
3. Nobuhisa Fujinami. Automatic Run-Time Code Generation in C++. In Yutaka Ishikawa, Rodney R. Oldehoeft, John V.W. Reynders, and Marydell Tholburn, editors, LNCS 1343: Scientific Computing in Object-Oriented Parallel Environments. Proceedings, December 1997. Also appeared as Technical Report SCSL-TR-97-006 of Sony Computer Science Laboratory Inc.
4. Nobuhisa Fujinami. Automatic and Efficient Run-Time Code Generation Using Object-Oriented Languages. To appear in Computer Software, Japan Society for Software Science and Technology, 1998. Also appeared as Technical Report SCSL-TR-98-001 of Sony Computer Science Laboratory Inc. (In Japanese).
5. Uday S. Reddy. Objects As Closures: Abstract Semantics of Object Oriented Languages. In Proceedings of the ACM Conference on Lisp and Functional Programming. ACM Press, July 1988.
6. Peter Holst Andersen. Partial Evaluation Applied to Ray Tracing. Student Report, DIKU, University of Copenhagen, 1993.
7. Dawson R. Engler and Todd A. Proebsting. DCG: An Efficient, Retargetable Dynamic Code Generation System. In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 263–272. ACM Press, October 1994. Also appeared in SIGPLAN Notices, Vol. 29, No. 10.
8. Peter Lee and Mark Leone. Optimizing ML with Run-Time Code Generation. In Proceedings of the SIGPLAN '96 Conference on Programming Language Design and Implementation, pp. 137–148, May 1996.
9. Jeffrey Dean, Craig Chambers, and David Grove. Identifying Profitable Specialization in Object-Oriented Languages. Technical Report 94-02-05, Department of Computer Science and Engineering, University of Washington, 1994.
10. Jeffrey Dean, David Grove, and Craig Chambers. Optimization of Object-Oriented Programs Using Static Class Hierarchy Analysis. In Walter Olthoff, editor, LNCS 952, Object-Oriented Programming, Proceedings of ECOOP'95, August 1995. Also appeared as Technical Report 94-12-01, Department of Computer Science and Engineering, University of Washington.
11. Jeffrey Dean, Greg DeFouw, David Grove, Vassily Litvinov, and Craig Chambers. Vortex: An Optimizing Compiler for Object-Oriented Languages. In Proceedings of Object-Oriented Programming Systems, Languages and Applications in 1996. ACM Press, October 1996.
12. David F. Bacon and Peter F. Sweeney. Fast Static Analysis of C++ Virtual Function Calls. In Proceedings of Object-Oriented Programming Systems, Languages and Applications in 1996. ACM Press, October 1996.
13. Urs Hölzle and David Ungar. Optimizing Dynamically-Dispatched Calls with Run-Time Type Feedback. In Proceedings of the SIGPLAN '94 Conference on Programming Language Design and Implementation, pp. 326–336, 1994.
14. Gerald Aigner and Urs Hölzle. Eliminating Virtual Function Calls in C++ Programs. In Proceedings of ECOOP'96, June 1996.
15. Jan Vitek, R. Nigel Horspool, and James S. Uhl. Compile-Time Analysis of Object-Oriented Programs. In U. Kastens and P. Pfahler, editors, LNCS 641, Compiler Construction, 4th International Conference, CC '92, pp. 236–250, October 1992.
16. Eugen N. Volanschi, Charles Consel, Gilles Muller, and Crispin Cowan. Declarative Specialization of Object-Oriented Programs. In Proceedings of Object-Oriented Programming Systems, Languages and Applications in 1997. ACM Press, October 1997.
17. Morten Marquard and Bjarne Steensgaard. Partial Evaluation of an Object-Oriented Imperative Language. Master's thesis, University of Copenhagen, April 1992.
18. Mark Leone and Peter Lee. Lightweight Run-Time Code Generation. In Proceedings of the 1994 ACM SIGPLAN Workshop on Partial Evaluation and Semantics-Based Program Manipulation, pp. 97–106. ACM Press, June 1994.
19. Mark Leone and Peter Lee. A Declarative Approach to Run-Time Code Generation. In Workshop Record of WCSSS'96: The Inaugural Workshop on Compiler Support for System Software, pp. 8–17, February 1996.
20. Charles Consel, Luke Hornof, François Noël, and Nicolae Volanschi. A Uniform Approach for Compile-time and Run-time Specialization. Technical Report No. 2775, INRIA, January 1996.
21. Eugen-Nicolae Volanschi, Gilles Muller, Charles Consel, Luke Hornof, Jacques Noyé, and Calton Pu. A Uniform and Automatic Approach to Copy Elimination in System Extensions via Program Specialization. Technical Report No. 1021, IRISA, June 1996.
22. Joel Auslander, Matthai Philipose, Craig Chambers, Susan J. Eggers, and Brian N. Bershad. Fast, Effective Dynamic Compilation. In Proceedings of the SIGPLAN '96 Conference on Programming Language Design and Implementation, pp. 149–159, May 1996.
23. Brian Grant, Markus Mock, Matthai Philipose, Craig Chambers, and Susan J. Eggers. Annotation-Directed Run-Time Specialization in C. In Proceedings of the Workshop on Partial Evaluation and Semantics-Based Program Manipulation (PEPM'97), June 1997.
24. Brian Grant, Markus Mock, Matthai Philipose, Craig Chambers, and Susan J. Eggers. DyC: An Expressive Annotation-Directed Dynamic Compiler for C. Technical Report 97-03-03, Department of Computer Science and Engineering, University of Washington, 1997.
25. Dawson R. Engler, Wilson C. Hsieh, and M. Frans Kaashoek. ‘C: A Language for High-Level, Efficient, and Machine-independent Dynamic Code Generation. In Conference Record of POPL '96: The 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 258–270, January 1996.
26. Calton Pu, Henry Massalin, and John Ioannidis. The Synthesis kernel. Computing Systems, Vol. 1, No. 1, pp. 11–32, Winter 1988.
27. Henry Massalin. Synthesis: An Efficient Implementation of Fundamental Operating System Services. PhD thesis, Graduate School of Arts and Sciences, Columbia University, April 1992.
28. Nobuhisa Fujinami. Run-Time Optimization of Groups of Objects. In Proceedings of 14th Conference of Japan Society for Software Science and Technology, September 1997. Also appeared as Technical Memo SCSL-TM-97-007 of Sony Computer Science Laboratory Inc. (In Japanese).
Type-Based Analysis of Concurrent Programs

Naoki Kobayashi
Department of Information Science, University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
[email protected]
Analysis and compilation of concurrent programs are challenging tasks: since concurrency primitives for thread creation and communication have a much more dynamic nature than sequential primitives like function creation and application, it is difficult to reason about program behavior for both programmers and compilers. For example, unlike in sequential programming languages, it is not easy to know which part of a program is executed first — consider scheduling of a process that tries to receive a value from a communication channel: if a value is available, the process is executed immediately, while, if not, the process is suspended and another process must be scheduled. This kind of dynamic program behavior complicates not just a programmer's debugging but also a compiler's efficient code generation.

In order to deal with the above problems, several type systems and program analyses have been studied through process calculi. In this talk, we focus on Kobayashi, Pierce, and Turner's type system for linear (use-once) channels [3] and its extensions [1,2]. The main idea of those type systems is to augment ordinary types with information on how often and in which order each communication channel can be used. With such extra information, we can ensure that a certain part of a concurrent program is confluent and/or deadlock-free. After giving an overview of the type systems, we show how such type information can be used for reasoning about program behavior and program optimizations.1
References

1. Atsushi Igarashi and Naoki Kobayashi. Type-based analysis of usage of communication channels for concurrent programming languages. In Proceedings of International Static Analysis Symposium (SAS'97), Lecture Notes in Computer Science, Vol. 1302. Springer-Verlag, Berlin Heidelberg New York (1997) 187–201.
2. Naoki Kobayashi. A partially deadlock-free typed process calculus. To appear in ACM Transactions on Programming Languages and Systems, ACM, New York (1998). A preliminary summary appeared in Proceedings of LICS'97, (1997) 128–139.
3. Naoki Kobayashi, Benjamin C. Pierce, and David N. Turner. Linearity and the pi-calculus. In Proceedings of ACM SIGPLAN/SIGACT Symposium on Principles of Programming Languages, ACM, New York (1996) 358–371.
1 An electronic copy of the slides is available through http://www.yl.is.s.u-tokyo.ac.jp/members/koba/publications.html.
A Type-Based Semantics for User-Defined Marshalling in Polymorphic Languages

Dominic Duggan
Department of Computer Science
Stevens Institute of Technology
Castle Point on the Hudson
Hoboken, New Jersey 07030
[email protected]
Abstract. Marshalling is an important aspect of distributed programming, particularly in typed programming languages. A semantics is provided for user-defined marshalling operations in polymorphic languages such as ML. The semantics of this is expressed in an internal language with recursion and dynamic type dispatch at both the term and type levels. User-defined marshalling amounts to reifying dynamic type dispatch to the programmer level in an ML-like language. An "external" language XMLdynΠ is provided with explicit pickle types and operations for building and deconstructing pickles with user-defined marshalling.
1 Introduction

In distributed programming environments, where programs operate in separate address spaces, there must be some way of converting values from their internal format to an external "wire" format for communication to other programs. This conversion process is referred to as marshalling, and its converse as unmarshalling. User-defined marshalling is now widely recognized as essential for monomorphic distributed programming languages. For example, Birrell et al. [5] report:

    It is difficult to provide fully general marshalling code in a satisfactory way. Existing systems fail in one or more of the following ways. Some apply restrictions to the types that can be marshalled, typically prohibiting linked, cyclic or graph-structured values. Some generate elaborate code for almost any data type, but the resulting stub modules are excessively large. Some handle a lot of data types, but the marshalling code is excessively inefficient.

The Modula-3 Pickle module allows user-defined type-specific pickling and unpickling routines (called "specials") to be registered with the pickler. Such a facility is also found in languages designed for distributed programming; for example, the Argus distributed programming language [26], and Concurrent CLU, developed for use in the Cambridge Distributed Computer System [4], allow ADT implementations to export programmer-defined marshalling code, to support efficient remote procedure call. Allowing a user-defined pickling operation essentially provides a mechanism for reflection in distributed programming [18].
Our intent in this paper is to provide a semantics for user-defined marshalling in polymorphic languages, based on the use of run-time type information to guide user-defined marshalling operations that recurse over type descriptions. Recent work has suggested type-based transformations as a general framework for program optimization for polymorphic languages [20, 30]. Our semantics is couched in terms of this framework. We extend this framework with refinement kinds, which allow the exhaustiveness of dynamic type dispatching to be checked statically. Refinement kinds play a crucial rôle in typing our semantics.

Although not essential to our work, we attach our semantics to explicit pickles in the language. Explicit pickle types are found in many distributed languages. For example, the OMG CORBA provides for a type ANY, the type of pickles that can be transmitted between address spaces. Abadi et al. [2] suggested a similar mechanism for adding dynamic typing to statically typed languages, the type dynamic. This mechanism incorporated an operation dynamic for bundling a value with its type, and a typecase construct for examining the value:

fun print (x:dynamic) =
  typecase x of
    int(xi) ⇒ output (toString xi)
  | string(xs) ⇒ output xs

print(if true then dynamic 3 else dynamic "hello")

We attach our semantics for user-defined marshalling to dynamics; however this semantics could just as well be attached to the message-passing operations themselves. The usefulness of a facility such as dynamics for distributed programming has been echoed by practical experience. For example, Krumvieda reports from his implementation of a distributed dialect of Standard ML that:

    The lack of a dynamic type or some other method of implicitly attaching marshalling functions to SML types hampered much of DML's interface development and complicated its signature. Although DML was originally intended to support dynamic types, the necessary work never materialized and group type objects have proliferated and propagated through its implementation and coding examples [23].

We introduce a new construct for dynamics that allows user-definable marshalling routines to be attached to the dynamic construct. Our semantics for dynamics is particularly aimed at polymorphic languages, such as ML. We make use of a new approach to computing with dynamics, based on dynamic type dispatch, that fixes some problems with the use of dynamics in polymorphic languages. User-definable marshalling is based fundamentally on allowing user-specified type-based transformations to be reflected in a semantics based on dynamic type dispatch.

Languages such as Modula-3 allow many parts of the run-time to be implemented in the language itself. For example, most of the threads and garbage collection code, and all of the marshalling code, for Modula-3 is implemented in Modula-3 itself [28]. For a "high-level" language such as ML, it should be possible to define a "safe" subset of ML in which low-level operations such as marshalling can be implemented. As examples of this endeavour, the SML/NJ compiler generates ML code for polymorphic equality
(including reference equality), while Cardelli considers a subset of Quest in which a garbage collector for Quest can be implemented [8]. If there is any intrinsic reason that a safe subset of ML cannot be defined in which marshalling can be implemented, then the semantics presented here should be considered as being for a hypothetical language that is not so deficient.

Sect. 2 reviews our approach to dynamic type dispatch with refinement kinds, reviewing the language XMLdyn originally introduced by Duggan [12]. In this approach, dynamic type dispatch is refined so that the programmer can control where run-time failures happen due to the use of dynamic type dispatch. Sect. 3 introduces our operations for user-definable marshalling, including their static semantics. We call the language introduced in this section XMLdynΠ, since it extends XMLdyn. Sect. 4 gives a translation semantics from XMLdynΠ into XMLdynT, an extension of XMLdyn with iteration at the type level. In Sect. 5 we give an alternative semantics for user-definable marshalling. This uses a simpler version of the static semantics for XMLdynΠ, but requires a somewhat more complicated "internal language" for its operational semantics.
2 Dynamic Type Dispatch With Refinement Kinds

In this section we describe XMLdyn, the kernel language that is at the heart of our approach. XMLdyn combines dynamic type dispatch with "refinement kinds" that ensure the absence of run-time type failures. Type failure is isolated to a particular construct. XMLdyn was originally introduced by Duggan [12]. Types in our approach are stratified into simple types τ and polymorphic types σ:

τ ::= α | t | (τ1 τ2)
σ ::= τ | σ1 → σ2 | ∀α <: τ.σ | ∀α <: χ.σ | ∀κ <: χ.σ

Our type system is based on the two-level stratified type system used by Harper and Mitchell [19] to explain ML's polymorphic type discipline. In this approach we have the usual collection of monomorphic types or monotypes (closed under the → type constructor and any other type constructors), and a second level of polymorphic types or polytypes, based on the closure of the collection of monotypes under the universal type quantifier ∀. Type constructors t denote both base types, such as int and real, as well as type constructors such as list and the product and function type constructors (τ ∗ τ and τ → τ, respectively). Type variables range over both types and type constructors, so type expressions include both list(int) and α(int) (the latter being the application of the type constructor variable α to int). We sometimes use the syntax t(τ1, . . . , τn) to denote (t τ1 . . . τn). Polymorphic types abstract over both type variables α and kind variables κ. Kinds are regular tree expressions denoting (possibly infinite) sets of types. The syntax of kinds is given by:

χ ::= ρ | χ1 → χ2
ρ ::= ⊥ | Ω | κ | t(ρn) | ρ1 ∪ ρ2 | µκ.ρ
where ρ denotes kinds, and κ denotes kind variables. χ is used to denote arities for type variables ranging over type constructors. For example, Ω denotes the arity of all types, while Ω → Ω denotes the arity of unary type constructors (for example, list). Kinds ρ denote refinement kinds, which refine the arity Ω denoting the set of all ground monotypes. The kind operator ∪ denotes union, and µ is the fixed point operator. Kinds intuitively form a lattice of sets of types, with ⊥ and Ω as the bottom and top of the lattice, respectively. Each type constructor t has an associated kind constructor t of the same arity; a kind expression t(ρ1, . . . , ρn) denotes the set of types with outermost type constructor t applied to types τ1 ∈ ρ1, . . . , τn ∈ ρn. Then for example the kind int ∪ real denotes the set of types {int, real}, while the recursive kind µκ.int ∪ (κ list) denotes the infinite set of types {int, int list, int list list, . . .}. We let τ <: ρ denote that τ is contained in the set of types denoted by ρ, and we refer to this as the containment relation. So for example int <: (int ∪ real).

The subset relation between the interpretations of kinds induces a subkinding relationship ρ <: ρ′ between kinds. For example we have

list(list(int)) <: µκ.int ∪ list(κ)

Inclusion between kinds ρ induces a subtype inclusion between the arities of type constructors. For example we have (χ1 → χ2) <: (χ1′ → χ2′) if χ1′ <: χ1 and χ2 <: χ2′. The function arity constructor should not be confused with the kind constructor →, which describes the set of types with outermost type constructor →, and where τ1 → τ2 <: ρ1→ρ2 if τ1 <: ρ1 and τ2 <: ρ2.

Our type system involves constraints of the form τ <: χ (with in particular τ <: ρ denoting that τ is included in the kind ρ), χ1 <: χ2 (with in particular ρ1 <: ρ2 denoting that the set of types described by ρ1 is included in the corresponding set for ρ2), and σ1 <: σ2 (denoting that σ1 is a subtype of σ2). We use γ, δ to denote both type variables α, β and kind variables κ. Furthermore we use ψ to denote both types σ and kinds ρ. Then ψ1 <: ψ2 stands generically for any of the above three forms of constraints. We have the following judgement forms:

Judgement         Meaning
Γ ⊢ χ1 = χ2       Kind equality
Γ ⊢ χ1 <: χ2      Kind containment
Γ ⊢ τ <: χ        Kind membership
Γ ⊢ σ1 <: σ2      Subtyping
where Γ is a context of constraints on type and kind variables:

Γ ::= {} | {κ <: ρ} | {α <: χ} | {α <: τ} | Γ1 ∪ Γ2

We require that kinds are contractive [12]. This allows us to assume that kinds have the following form:

ρ ::= κ | ⊤ | µκ.t1(ρ1) ∪ · · · ∪ tn(ρn)

We furthermore require that kinds are discriminative: in a union kind, the outermost type constructors t1, . . . , tn are required to be distinct.
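Since kinds in this normal form are contractive and discriminative, the containment relation τ <: ρ can be decided by structural recursion on τ, unrolling µ-kinds on demand. The following OCaml fragment is a minimal sketch of such a check; it is ours, not the paper's, and the representation types and all names are hypothetical (we also assume constructor arities always agree).

```ocaml
(* Containment tau <: rho for kinds in the normal form
   rho ::= kappa | T | mu kappa. t1(rho1) U ... U tn(rhon). *)
type ty = TCon of string * ty list                (* t(tau1, ..., taun) *)

type kind =
  | Top                                           (* T: the kind of all types *)
  | KVar of string                                (* kappa *)
  | Mu of string * (string * kind list) list      (* mu kappa. union of cases *)

(* Substitute a kind for a kind variable, used to unroll mu kappa.rho. *)
let rec subst k r = function
  | Top -> Top
  | KVar k' -> if k = k' then r else KVar k'
  | Mu (k', cases) ->
      if k = k' then Mu (k', cases)  (* k is shadowed under this mu *)
      else Mu (k', List.map (fun (t, rs) -> (t, List.map (subst k r) rs)) cases)

let rec contains (TCon (t, args)) kind =
  match kind with
  | Top -> true
  | KVar _ -> false                  (* an open kind contains no ground types *)
  | Mu (k, cases) ->
      (match List.assoc_opt t cases with
       | None -> false               (* discriminative: at most one case per t *)
       | Some rs ->
           (* unroll the mu before descending into the argument kinds *)
           List.for_all2 (fun a r -> contains a (subst k (Mu (k, cases)) r)) args rs)
```

For instance, checking int list list against µκ.int ∪ list(κ) unrolls the µ twice and succeeds, matching the example above.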
The abstract syntax for the core language of XMLdyn is given by:

e ::= x | λx : σ.e | (e1 e2) | let x = e1 in e2 | recσ1→σ2 e | Λγ <: ψ.e | e[ψ]

The construct for dynamic type dispatch in XMLdyn is provided by the typerec construct:

e ::= . . . | typerec f : σ of t1(α1) ⇒ e1 | . . . | tk(αk) ⇒ ek

The novelty of this construct is that it defines a function that recurses over a type rather than over a value. The fixed point of this function is given by the variable f that is introduced by the typerec. Such a function specifies a form of type-safe dynamic type dispatch, wherein a polymorphic function dispatches at run-time on the basis of type arguments. The type rule for this construct is the TYREC rule:

σ = (∀α <: ρ.σ′)    ρ = µκ.t1(ρ1) ∪ · · · ∪ tk(ρk)    ti ≠ tj if i ≠ j
Γ, αi <: {ρ/κ}ρi; A, f : σ ⊢ ei : {ti(αi)/α}σ′   for i = 1, . . . , k
─────────────────────────────────────────────────────────────── (TYREC)
Γ; A ⊢ (typerec f : σ of t1(α1) ⇒ e1 | . . . | tk(αk) ⇒ ek) : σ
The rule demonstrates that the typerec defines a polymorphic function of type ∀α <: ρ.σ′. The kind constraint on α restricts the domain of applicability of this function, to types for which the cases in the dynamic type dispatch are defined. Since the typerec in general defines a recursive function, this domain kind constraint must also be recursive. Besides the obvious computation rules for the other constructs, the typerec has this computation rule:

e [ti(τ)] −→ {τ/αi, e/f}ei

where:

e = (typerec f : σ of t1(α1) ⇒ e1 | . . . | tk(αk) ⇒ ek)
For example, the following defines a function that can be applied to integers, references (no matter their element type), lists of integers, lists of references, lists of lists of integers, and so on:

typerec f : (∀α <: (µκ.int ∪ ref(⊤) ∪ list(κ)).α → α) of
    int     ⇒ λx : int. intPlus(x,x)
  | ref(α)  ⇒ λx : ref(α). x
  | list(α) ⇒ λxs : list(α). map (f [α]) xs

The first clause defines a function of type int → int. The second clause defines a function of type ∀α <: ⊤.ref(α) → ref(α), where the type variable α is unconstrained because no operations are performed on the element type of the reference cell. The third clause defines a function of type ∀α <: (µκ.int ∪ ref(⊤) ∪ list(κ)).list(α) → list(α). In this third clause, the fixed point f of the typerec is applied to the list element type, and hence the
element type α in this clause is constrained by the declared domain kind of f. The domain of the typerec is then int ∪ ref(⊤) ∪ list(µκ.int ∪ ref(⊤) ∪ list(κ)). By the fixed point unrolling rule for kinds (µκ.ρ = {(µκ.ρ)/κ}ρ), this is equal to µκ.int ∪ ref(⊤) ∪ list(κ).

Harper and Morrisett [20] and Morrisett [27] present a calculus, λML_i, that includes a typerec construct for dynamic type dispatch by recursing over type descriptions. The motivation for their framework is in using dynamic type dispatch for type-based compilation based on transformations of data representations. The most important difference between the approaches is our provision of “refinement kinds” that refine the structure of ⊤, the set of all simple monotypes. Harper and Morrisett assume that all uses of dynamic type dispatch are total (defined for all types). Morrisett [27] describes an approach where a “characteristic function” F can be defined for the domain of an operation that uses dynamic type dispatch, using Typerec: F(τ) = void, the empty type, if τ contains any type constructor outside the domain of the operation, and F(τ) = τ otherwise. Beyond the fact that type inference is hard or impossible with the Typerec construct, there is also the problem that this approach does not prevent instantiation of an operator with a type outside its domain kind. For the example above, the function would be instantiated at type void → void when applied to the string type, under the approach described by Morrisett. This is not as precise as preventing the erroneous instantiation of the function in the first place.

Dubois et al. [11] have considered another approach to dynamic type dispatch, with different guarantees of type correctness relative to this and other work. Essentially they use dynamic type dispatch to provide unsafe operations such as a C-like printf function and variable-arity procedures in ML, as an alternative to Haskell-style parametric overloading. Their type system only distinguishes between “static” and “dynamic” type variables, the latter being variables that need to be instantiated with run-time type arguments. They also provide a static check for the exhaustiveness of the dynamic type dispatching code. However this check is not formalized in a type system. Furthermore it requires abstract interpretation of the entire program, and so is inapplicable for separate compilation of reusable software components; type checking of uses of overloaded operations is done at link-time.

Once we have bounded universal types ∀α <: ρ.σ, an obvious next step is to consider bounded existential types ∃α <: ρ.τ. These are useful in the sequel, so we add them here:

e ::= . . . | pack∃α<:ρ.τ(τ′, e) | open e1 as pack∃α<:ρ.τ(α, x) in e2 | narrow∃α<:ρ.α,∃α<:ρ′.α(e)
τ ::= . . . | ∃α <: ρ.τ

Kind inclusion induces a type widening rule for existentials: (∃α <: ρ.τ) <: (∃α <: ρ′.τ) if ρ <: ρ′. The narrow construct allows us to narrow a value of existential type to a more specific type. All type failure in our framework is isolated to the narrow construct.
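The typerec example given earlier has a rough analogue in languages with GADTs, where the type argument is reified as a value-level representation and dispatch recurses over it. The following OCaml sketch is ours, not the paper's; the rep type plays the role of the domain kind µκ.int ∪ ref(⊤) ∪ list(κ), since only those three constructors have representations. One mismatch: in the paper the ref clause leaves its element type unconstrained, whereas Ref here still demands an element representation.

```ocaml
(* Value-level representations of the types in the domain kind. *)
type _ rep =
  | Int  : int rep
  | Ref  : 'a rep -> 'a ref rep
  | List : 'a rep -> 'a list rep

(* The analogue of the typerec: recursion over the type representation.
   The int case doubles, the ref case is the identity, and the list case
   applies the fixed point f to the element representation. *)
let rec f : type a. a rep -> a -> a = fun r x ->
  match r with
  | Int -> x + x
  | Ref _ -> x
  | List r' -> List.map (f r') x
```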
3 Primitives for Marshalling

We now consider how to extend the language introduced in the previous section with primitives for marshalling and unmarshalling data. We name the language introduced in this section XMLdyn_Π. An obvious first choice for the marshalling operations is:

extern : ∀α.α ∗ port → unit
intern : ∀α.port → α

There are some problems with this approach. These operations are not total (for example, in general it is not possible to marshall native-code functions in a heterogeneous environment), but this partiality is not captured by the above types. Invoking marshalling may therefore lead to a run-time type failure that should have been caught at compile-time. One approach to this problem, in a language such as Haskell with parametric overloading, is to define a type class for marshalling:

class Extern(α) where
  extern : α ∗ port → unit
  intern : port → α

This is similar to the approach taken with Java [17], where only objects that implement the Serializable interface can be marshalled. There are several advantages to this approach. Applications of extern to types for which no marshalling operation is available are detected statically in the Haskell type system. This framework allows the programmer to define her own marshalling operations for a type, as instances of this class. Finally the compiler can automatically generate specialized marshalling operations by combining instances of these operations.

Rather than using type classes to restrict the domain of marshalling operations and dispatch the operations, we instead rely on the approach to dynamic type dispatch summarized in the previous section. The precise relationship between this approach and type classes is developed in another paper [14]. We choose this course because the framework of dynamic type dispatch is the basis for an approach to computing with dynamically typed values that overcomes several problems with the traditional approach to computing with dynamically typed values in polymorphic languages. This is explained more fully by Duggan [12]. Nevertheless, if the marshalling primitives are expressed using (an extended version of) type classes, it is possible to adapt our semantics based on dynamic type dispatch to this situation.

Our second reason for deviating from the type class approach is that the Extern class above does not ensure that an instance of the intern operation is in agreement with the corresponding extern operation on what should be the external representation “on the wire” of the type. This is also an issue with the Serializable interface in Java. Our approach is to define marshalling operations as pickling and extraction operations that map to and from an external representation type. Marshalling a data value consists of first transforming it to the corresponding external type, then using the built-in marshaller to pickle the value. Unmarshalling a data value consists of first unpickling a value from external storage, then using the extraction function to obtain a copy of the original pickled value. An attempt to express this in the parlance of type classes is given by:
class Pickle(α) where
  pickle  : α → pickleType(α)
  extract : pickleType(α) → α

extern and intern can then be represented as ordinary (non-overloaded) functions, of types ∀α.Pickle(α) ⇒ α ∗ port → unit and ∀α.Pickle(α) ⇒ port → α. The intention is that pickleType be a function that maps from a type to the corresponding representation type. Each instance of the Pickle class should then specify a case in this type function for transforming the instance type. In general this type function must be applied recursively to the element types of a collection type. Essentially we need a construct analogous to the Typerec construct introduced by Harper and Morrisett [20] and Morrisett [27]. In general, implementing type inference in the presence of Typerec appears difficult or impossible. However, for the special case of user-definable marshalling, we can make use of a construct similar to Typerec internally, while providing type inference in the external language.

In order to introduce our semantics independent of any particular message-passing operations, we make pickles explicit in the language as dynamics [2, 1, 25, 24, 7]. A dynamic is a bundling of a value and a type descriptor for that value, into a single value of type dynamic. A typecase construct allows the tag in a dynamic to be examined, and the bundled value to be extracted in a type-safe way.
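Returning briefly to the Pickle class above: its content can be pictured in dictionary-passing style. The following OCaml sketch is ours and purely illustrative; the second type parameter 'p stands for pickleType(α), and all names are hypothetical.

```ocaml
(* A Pickle "instance" as an explicit dictionary. The intended law is
   extract (pickle x) = x, i.e. extraction agrees with pickling. *)
type ('a, 'p) pickle = {
  pickle  : 'a -> 'p;   (* to the external, on-the-wire representation *)
  extract : 'p -> 'a;   (* back from it *)
}

(* Instances compose: the clause pickleType(list(α)) = list(pickleType(α))
   becomes a function from an element dictionary to a list dictionary. *)
let list_pickle (elem : ('a, 'p) pickle) : ('a list, 'p list) pickle = {
  pickle  = (fun xs -> List.map elem.pickle xs);
  extract = (fun ps -> List.map elem.extract ps);
}
```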
σ = ∀κ <: DOMAIN(Π).∀α <: κ.α → Dynamic(κ)
──────────────────────────────────────────── (DINTRO)
Π; Γ; A ⊢ dynamic : σ

Γ ⊢ ρ <: ρ′
──────────────────────────────── (DWID)
Γ ⊢ Dynamic(ρ) <: Dynamic(ρ′)

Π; Γ; A ⊢ e : Dynamic(ρ)    Γ ⊢ ρ′ <: ρ
──────────────────────────────────────── (DNAR)
Π; Γ; A ⊢ narrowρ,ρ′(e) : Dynamic(ρ′)

Π; Γ; A ⊢ e1 : (∀α <: ρ.α → τ)  (α ∉ FV(τ))    Π; Γ; A ⊢ e2 : Dynamic(ρ)
───────────────────────────────────────────────────────────────────────── (DELIM)
Π; Γ; A ⊢ typecase(e1, e2) : τ

(t1 ↦ t2′) ∉ Π for any t2′
Π; Γ, κ <: ⊤, αn <: κ, β <: (⊤ → ⊤); A, (f : ∀α <: κ.α → β(α)) ⊢ e1 : t1(αn) → t2(β(αn))
Π; Γ, κ <: ⊤, αn <: κ, β <: (⊤ → ⊤); A, (g : ∀α <: κ.β(α) → α) ⊢ e2 : t2(β(αn)) → t1(αn)
Π ∪ {t1 ↦ t2}; Γ; A ⊢ e : τ
───────────────────────────────────────────────────────── (DDEF)
Π; Γ; A ⊢ (defdynamic t1 ⇒ t2 with (f.e1, g.e2) in e) : τ

Fig. 1. Type Rules for Dynamics in XMLdyn
A dynamic is essentially a data algebra inhabiting an existential type ∃α <: ⊤.α. The refinement kinds introduced in the previous section motivate an obvious
generalization of dynamics to safe dynamics, originally introduced by Duggan [12]. A safe dynamic type Dynamic(ρ) exports a refinement kind revealing the structure of the encapsulated type; semantically it is a bounded existential type ∃α <: ρ.α. The constructs for safe dynamics are given by:

τ ::= . . . | Dynamic(ρ)
e ::= . . . | dynamic | narrowρ,ρ′(e) | typecase(e1, e2)

narrow is the only operation where type failure can arise; it allows us to refine the kind of a safe dynamic (for example, from ⊤ to int ∪ real). In the expression typecase(e1, e2), e2 is a safe dynamic of type Dynamic(ρ), while e1 is a polymorphic function of type ∀α <: ρ′.σ. In the semantics of the typecase, the dynamic value e2 is unbundled and the polymorphic function e1 is applied to both the type and value components of the dynamic. Provided ρ <: ρ′ (which can be checked statically), this application of dynamic type dispatch is guaranteed not to encounter run-time type failure.

The type rules for safe dynamics are provided in Fig. 1. The dynamic operation creates a dynamic value. There are several possible typings for this operation:

dynamic : ∀α.α → Dynamic(⊤)
dynamic : ∀α <: ρ.α → Dynamic(⊤)
dynamic : ∀α <: ρ.α → Dynamic(ρ)
dynamic : ∀κ <: ρ.∀α <: κ.α → Dynamic(κ)
The latter three of these types define dynamic as a polymorphic operation whose argument type variable is constrained by a kind describing the set of types for which dynamic is defined. The second type is sufficient if we are not concerned with safe dynamics (only full dynamics). The fourth typing is the most precise if we are interested in using safe dynamics. For example, suppose dynamic is only defined for integers and reals. Then the following type-checks:

(dynamic [int] [int] 3) : Dynamic(int ∪ string)

The following point is worth emphasizing: our semantics for user-definable marshalling can be adapted to work with any of the above possible typings for dynamic, and with or without safe dynamics. Although we use safe dynamics and the fourth typing for dynamic, our semantics for user-definable marshalling can be adapted fairly easily to the following types for the dynamic operations:

dynamicτ : τ → dynamic
typecaseτ : dynamic → τ

Refinement kinds and safe dynamics ensure static checking of the uses of dynamic type dispatch. Using the latter form of dynamic operations amounts to forsaking this static checking in the programming language. However such static checking might still be used internally, in a manner analogous to the use of refinement types and soft types [16, 32].
The construct for attaching user-defined marshalling and unmarshalling operations to the operations for building pickles is given by:

e ::= . . . | defdynamic t1 ⇒ t2 with (f.e1, g.e2) in e

The type rule for this construct is given by the DDEF rule in Fig. 1. This construct has all of the elements of user-defined marshalling operations that were discussed earlier in this section. t2 denotes the external representation for the type t1. A use of this construct must specify a clause in the definition of the external type representation function, pickleType. The following clause is added to the final definition of this function:

pickleType(t1(α1, . . . , αn)) = t2(pickleType(α1), . . . , pickleType(αn))

The static semantics of XMLdyn_Π uses type judgements of the form Π; Γ; A ⊢ e : σ. The environment Π carries information about clauses in the type pickle function that have been contributed by uses of the defdynamic. Π contains pairs of the form t1 ↦ t2, representing clauses in the type pickle function. Π is used to define the domain of the dynamic operation, used in the DINTRO rule:

DOMAIN(Π) = µκ.t1^1(κ) ∪ · · · ∪ t1^k(κ)  where Π = {t1^i ↦ t2^i | i = 1, . . . , k}

e1 is the pickling operation for the t1 type constructor; e1 has type ∀αn <: κ.t1(αn) → t2(pickleType(αn)), where κ is a local rigid “kind” variable in the scope of the definitions of e1 and e2. e2 is the corresponding extraction operation, of type ∀αn <: κ.t2(pickleType(αn)) → t1(αn). In defining e1, the programmer in general will need to transform values of type αi to type pickleType(αi). This is provided by the local variable f introduced by the construct, bound to a function of type ∀α <: κ.α → pickleType(α). Note that within the definition of e1, each αi is constrained from above by κ; this restricts the application of f to values of type α1, . . . , αn. A similar explanation is given for the definition of e2.

f (in the definition of e1) and g (in the definition of e2) represent the final fixed points of the pickling and extraction functions that are built up using the defdynamic construct. In typing e1 and e2, reference must be made to the final fixed point of the type representation function pickleType. This reference is represented by the type constructor variable β that is introduced locally by the defdynamic construct. At the use site for the dynamic operation, the clauses contributed by the defdynamic construct are joined to form the type representation function pickleType. The pickle and extraction operations are also formed by joining the instances contributed by defdynamic, to define operations of types:

∀α <: ρ.α → pickleType(α)
∀α <: ρ.pickleType(α) → α

where ρ is the domain of dynamic. These operations are used to build a pickle, as described in the next section.
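As a rough operational picture, each defdynamic clause is a function abstracted over the eventual fixed points f and g, to be tied together later at the use sites of dynamic. A minimal OCaml sketch of one such clause follows; it is entirely ours, untyped with respect to pickleType, and all names are hypothetical.

```ocaml
(* A universal pickle format, standing in for the built-in marshaller's input. *)
type pickle = Pint of int | Plist of pickle list

(* The clause contributed for the list constructor: it handles the spine
   itself and defers the elements to the eventual fixed point f (resp. g),
   just as defdynamic's f and g do. *)
let pickle_list (f : 'a -> pickle) (xs : 'a list) : pickle =
  Plist (List.map f xs)

let extract_list (g : pickle -> 'a) (p : pickle) : 'a list =
  match p with
  | Plist ps -> List.map g ps
  | _ -> failwith "ill-formed pickle"
```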
4 Semantics for User-Definable Marshalling

In this section we consider the operational semantics for user-definable marshalling. The language XMLdyn_Π introduced in the previous section was an extension of XMLdyn. In this section we introduce another extension of XMLdyn, and then map the extensions of XMLdyn_Π into this latter language. We name the new language XMLdyn_T. This language adds one new construct to XMLdyn, a type-level Typerec construct:

τ ::= . . . | Typerecρ→χ t1(α1) ⇒ τ1 | · · · | tk(αk) ⇒ τk

This is essentially the Typerec introduced by Harper and Morrisett [20]. We use a slightly simpler form of it, supporting iteration rather than recursion (this is sufficient for our purposes). We keep XMLdyn and XMLdyn_T separate because type inference is possible with an implicitly-typed version of XMLdyn, whereas type inference appears difficult or impossible with the Typerec. XMLdyn_T is only intended to be used as an “internal” language, into which source programs (in XMLdyn_Π) are translated. The type rule for the Typerec is given by:

ρ = µκ.t1(ρ1) ∪ · · · ∪ tk(ρk)    Γ, αi <: {ρ/κ}ρi ⊢ τi <: χ  for i = 1, . . . , k
────────────────────────────────────────────────────────────── (TYREC)
Γ ⊢ (Typerecρ→χ t1(α1) ⇒ τ1 | · · · | tk(αk) ⇒ τk) <: (ρ → χ)

The type-level computation rule for the Typerec is given by:

τ′ (ti(τ1, . . . , τni)) −→ {τ′(τ1)/αi1, . . . , τ′(τni)/αini}τi

where:

τ′ = (Typerecρ→χ t1(α1) ⇒ τ1 | · · · | tk(αk) ⇒ τk)
We can define the notions of canonical forms for terms and types (v and υ, respectively), and evaluation contexts for terms and types (E[ ] and T[ ], respectively), in a fairly standard manner. A term e is failed if e is not a (term) value, and e ≡ E[e′] where e′ ≡ narrowρ,ρ′(pack(υ, v)) for some v and υ, and e′ is not a redex. A term e is faulty if e is not a value, e is not failed, and e ≡ E[e′] or e ≡ E[τ] where e′, τ are not redices. A type τ is faulty if τ is not a (type) value, and τ ≡ T[τ′] or τ ≡ T[e] where e, τ′ are not redices. A term or type is closed if it has no free variables. e ⇑ denotes that the evaluation of e loops infinitely, i.e., there is an infinite sequence e −→ · · · −→ ei −→ ei+1 −→ · · ·.

Theorem 1 (Semantic Soundness).
1. If Γ; A ⊢ e : τ, then e ⇑, or e −→∗ e′ where e′ is failed or e′ is some value v.
2. If Γ ⊢ τ <: χ, then τ ⇑, or τ −→∗ υ for some value υ.

To define the translation from XMLdyn_Π to XMLdyn_T, we start with the following translation of types in XMLdyn_Π:
[[Γ]]; [[A]] ∪ ENV(Π) ⊢ e : ∀κ <: ρ.∀α <: κ.α → [[Dynamic(κ)]]   (DINTRO)

where:
τpkl = PICKLE(Π) and ρ = DOMAIN(Π)
σpkl = ∀α <: ρ.α → τpkl(α) and σext = ∀α <: ρ.τpkl(α) → α
epkl = typerec f : σpkl of t1^1(α) ⇒ (f_{t1^1} [ρ] τpkl f [α]) | . . . | t1^k(α) ⇒ (f_{t1^k} [ρ] τpkl f [α])
eext = typerec g : σext of t1^1(α) ⇒ (g_{t1^1} [ρ] τpkl g [α]) | . . . | t1^k(α) ⇒ (g_{t1^k} [ρ] τpkl g [α])
e = Λκ.Λαwit.λx.pack(αwit, pack(τpkl(αwit), (epkl [αwit] (x), eext [αwit])))

[[Γ]]; [[A]] ∪ ENV(Π) ⊢ e1 : ∀α <: ρ.[[σ′]]    [[Γ]]; [[A]] ∪ ENV(Π) ⊢ e2 : [[Dynamic(ρ)]]
───────────────────────────────────────────────────────────────── (DELIM)
[[Γ]]; [[A]] ∪ ENV(Π) ⊢ e : [[τ]]

where: e = (open e2 as pack(αpkl, pack(β, (x, extract))) in e1 [αpkl] (extract x))

[[Γ]], κ <: ⊤, αn <: κ, β <: ⊤ → ⊤; [[A, (f : ∀α <: κ.α → β(α))]] ∪ ENV(Π) ⊢ e1 : t1(αn) → t2(β(αn))
[[Γ]], κ <: ⊤, αn <: κ, β <: ⊤ → ⊤; [[A, (g : ∀α <: κ.β(α) → α)]] ∪ ENV(Π) ⊢ e2 : t2(β(αn)) → t1(αn)
[[Γ]]; [[A]] ∪ ENV(Π ∪ {t1 ↦ t2}) ⊢ e : [[τ]]
────────────────────────────────────────── (DDEF)
[[Γ]]; [[A]] ⊢ e′ : [[τ]]

where: e′ = let f_{t1} = Λκ.Λβ.λf.Λαn.e1 in let g_{t1} = Λκ.Λβ.λg.Λαn.e2 in e

Fig. 2. Translation of Dynamics in XMLdyn_Π
[[α]] = α
[[t(τ1, . . . , τn)]] = t([[τ1]], . . . , [[τn]])
[[σ1 → σ2]] = [[σ1]] → [[σ2]]
[[∀γ <: ψ.σ]] = ∀γ <: [[ψ]].[[σ]]
[[Dynamic(ρ)]] = ∃α <: ρ.∃β <: ⊤.β ∗ (β → α)

The last case in this definition is the real point of this translation: dynamic types are translated as existential types. The data algebra for a dynamic now has two witness types: α is the external type of the value that has been bundled in the dynamic, with the external kind constraint ρ which reveals some of the structure of the type; β is the internal pickle type, encapsulated by the existential type quantifier, and with no structure revealed by the kind witness constraint. The dynamic contains two values: the pickled copy of the value that has been bundled in the dynamic, and an extraction operation for converting from the pickle value back to the original value. This extraction operation is bundled in the dynamic when it is created, and is invoked when the dynamic
is unbundled. As such we refer to these as self-extracting dynamics. The translation of kinds is simply the identity: [[ρ]] = ρ. We also have:

[[Γ]] = {γ <: [[ψ]] | (γ <: ψ) ∈ Γ}
[[A]] = {(x : [[σ]]) | (x : σ) ∈ A}

The interpretation of XMLdyn_Π in XMLdyn_T is defined by induction on type derivations in the former. A type derivation for the judgement Π; Γ; A ⊢ e : σ in XMLdyn_Π is used to construct a program e′ in XMLdyn_T, with correctness given by the following:

Theorem 2. If Π; Γ; A ⊢ e : σ in XMLdyn_Π, with e′ the program in XMLdyn_T constructed based on this type derivation, then [[Γ]]; [[A]] ∪ ENV(Π) ⊢ e′ : [[σ]].

ENV(Π) denotes the types of the pickling and extraction function fragments defined by uses of defdynamic. This metafunction is defined by:

PTYPE(t1, t2) = ∀κ <: ⊤.∀β <: ⊤ → ⊤.(∀α <: κ.α → β(α)) → (∀αn <: κ.t1(αn) → t2(β(αn)))
ETYPE(t1, t2) = ∀κ <: ⊤.∀β <: ⊤ → ⊤.(∀α <: κ.β(α) → α) → (∀αn <: κ.t2(β(αn)) → t1(αn))
ENV(Π) = {f_{t1} : PTYPE(t1, t2), g_{t1} : ETYPE(t1, t2) | (t1 ↦ t2) ∈ Π}

Each use of defdynamic defines two functions, f_{t1} and g_{t1}, that represent clauses in the definitions of the pickling and extraction functions. We maintain these functions in the environment, extracting them from the environment when they are needed. PTYPE(t1, t2) denotes the type of a clause for the pickling function, for the case when values of type t1(τ) are pickled to type pickleType(t1(τ)) = t2(pickleType(τ)). The function f_{t1} abstracts over the domain kind κ of the final pickling function, the final definition β of the pickle type function, and the fixed point of the pickling function itself. A similar explanation can be given for ETYPE(t1, t2). The metafunction ENV(Π) generates a type environment for those instances of these functions that have been defined.

Fig. 2 gives the cases for the translation of the dynamic constructs. The case for the defdynamic, DDEF, is fairly uninteresting: it simply builds the functions f_{t1} and g_{t1}. The main work is done by uses of dynamic, given by the DINTRO rule. At the use site for dynamic, the final definition of the pickle type function is constructed, defined by:

PICKLE(Π) = Typerec t1^1(α) ⇒ t2^1(α) | · · · | t1^k(α) ⇒ t2^k(α)  where Π = {t1^i ↦ t2^i | i = 1, . . . , k}

The pickling function is constructed by using the typerec construct to assemble the various clauses defined using defdynamic. Each such clause, say of type PTYPE(t1, t2), is applied to ρ = DOMAIN(Π) and τpkl = PICKLE(Π), giving a function of type

(f_{t1} [ρ] τpkl) ∈ (∀α <: ρ.α → τpkl(α)) → (∀αn <: ρn.t1(αn) → t2(τpkl(αn)))
Let f be the fixed point of the function defined by the typerec; then we have:

(f_{t1} [ρ] τpkl f) ∈ (∀αn <: ρn.t1(αn) → t2(τpkl(αn)))

This clause of the typerec is defined to be:

(t1(αn) ⇒ f_{t1} [ρ] τpkl f [αn]) ∈ (t1(αn) → t2(τpkl(αn)))

By the definition of τpkl, this latter type is equal to (t1(αn) → τpkl(t1(αn))). Therefore by the TYREC rule, the function epkl has type ∀α <: ρ.α → τpkl(α).

The translation of dynamic is a function that takes a type αwit and a value x of type αwit. The pickling of this value consists of creating the pickled value (epkl [αwit] x), of type τpkl(αwit) (the external representation type). The extraction function eext, of type ∀α <: ρ.τpkl(α) → α, is constructed in a manner similar to epkl. The expression (eext [αwit]) gives the extraction function specialized to the type of x. The resulting pair of type τpkl(αwit) ∗ (τpkl(αwit) → αwit), denoting the pickled value and an operation for extracting the original value from this pickle value, is then encapsulated in the data algebra for the dynamic that is constructed, with existential type [[Dynamic(ρ)]] = ∃α <: ρ.∃β <: ⊤.β ∗ (β → α). We call this value a self-extracting dynamic. The translation of the typecase is reasonably obvious: the data algebra for the dynamic is opened, the bundled extraction function is applied to the pickle value, and the typecase function is then applied to the resulting extracted value.
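A self-extracting dynamic can be pictured concretely in OCaml with GADTs. This sketch is ours, not the paper's construction: the rep witness approximates the kind constraint ρ, 'b plays the role of the hidden pickle type β, and all names are hypothetical.

```ocaml
(* A type witness, standing in for the kind-constrained witness type α <: ρ. *)
type _ rep = Int : int rep | List : 'a rep -> 'a list rep

(* [[Dynamic(ρ)]] = ∃α <: ρ.∃β <: ⊤. β ∗ (β → α): a pickled payload together
   with the extraction function bundled at creation time. *)
type dyn = Dyn : 'b * ('b -> 'a) * 'a rep -> dyn

let make_dyn (r : 'a rep) (pickle : 'a -> 'b) (extract : 'b -> 'a) (x : 'a) : dyn =
  Dyn (pickle x, extract, r)

(* typecase: open the existential, run the bundled extractor, then hand the
   recovered value (with its type witness) to a polymorphic consumer. *)
type 'r handler = { h : 'a. 'a rep -> 'a -> 'r }

let typecase (f : 'r handler) (d : dyn) : 'r =
  match d with Dyn (p, extract, r) -> f.h r (extract p)
```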
5 An Alternative Semantics for User-Definable Marshalling
A(dynamic) = ∀κ <: (µκ.ρ).∀α <: κ.α → Dynamic(κ)
────────────────────────────────────────────────── (DINTRO)
Γ; A ⊢ dynamic : A(dynamic)

t1 ∉ tc(ρ)
Γ, κ <: ⊤, αn <: κ, β <: (⊤ → ⊤); A, (f : ∀α <: κ.α → β(α)) ⊢ e1 : t1(αn) → t2(β(αn))
Γ, κ <: ⊤, αn <: κ, β <: (⊤ → ⊤); A, (g : ∀α <: κ.β(α) → α) ⊢ e2 : t2(β(αn)) → t1(αn)
Γ; A, (dynamic : (∀κ <: (µκ.ρ ∪ t1(κ)).∀α <: κ.α → Dynamic(κ))) ⊢ e : τ
────────────────────────────────────────────────────── (DDEF)
Γ; A ⊢ (defdynamic t1 ⇒ t2 with (f.e1, g.e2) in e) : τ

Fig. 3. Type Rules for Dynamics in XMLdyn_Π−
The approach to user-defined marshalling provided in the previous two sections was facilitated by the Π environment in the static semantics for XMLdyn_Π. The disadvantage of this environment in the semantics is that the clauses of the external representation
type (the pickleType function) are exposed in the environment. In this section we consider the repercussions of abstracting over this external representation type function, and demonstrate how this may be done. With this alternative approach, the semantics no longer requires the Π environment recording external representations. Instead the type of dynamic is recorded as a type of the form ∀κ <: ρ.∀α <: κ.α → Dynamic(κ) in the type environment, with the domain ρ providing the only information about the abstract type representation function. This approach admittedly brings with it some complexity in the internal language. In particular our internal language requires both coproducts and general recursion at the type level in order to type our semantics. To ensure equational consistency, we require that type functions are strict [6].

Fig. 3 gives the type rules of XMLdyn_Π that are modified with this alternative approach. We name this variation XMLdyn_Π−. tc denotes the outermost type constructors of a kind: tc(t1(ρ1) ∪ · · · ∪ tm(ρm)) = {t1, . . . , tm}. We concentrate in the sequel on the translation semantics for this language.

The basis for our semantics is a new language, XML∆. XML∆ is formed by taking XMLdyn as defined in Sect. 2, omitting the typerec construct, and adding the following constructs:

e ::= . . . | abortτ | t(αn : ρn) =⇒ e | e1 ⊕ e2 | cl(e) | tyrec(∀α<:ρ.σ) e
σ ::= . . . | ∆α <: ρ.τ

The type rules for these constructs are provided in Fig. 5 in App. A. The construct t(α1 : ρ1, . . . , αn : ρn) =⇒ e is used to define the individual clauses in the definition of a typecase. The construct e1 ⊕ e2 is used to combine these clauses. The resulting combination is a polymorphic function with non-⊤ domain kind, that uses run-time type discrimination with respect to its single type argument. tyrec(∀α<:ρ.σ) e denotes the fixed point of a recursively defined polymorphic function. This fixed point operator is necessary because the typecase defines a function that computes by recursing over its type argument. The special type ∆α <: ρ.τ is used to type the composition of a collection of clauses that make up a typecase. The operation cl(e) closes up a typecase definition into an ordinary polymorphic function. The fixed point operator for kinds µκ.ρ is used to define recursive domain kinds, while the fixed point operator for polymorphic functions tyrecσ e is used to define recursive dynamic type dispatch.

To see how the typerec can be translated into this language, consider that the typerec construct in XMLdyn has the form:

typerec f : σ of t1(α1) ⇒ e1 | . . . | tk(αk) ⇒ ek

where σ = ∀α <: ρ.σ′ and ρ = µκ.t1(ρ1) ∪ · · · ∪ tk(ρk). Assuming e′i is the translation of ei from XMLdyn_Π− into XML∆, the translation of the clauses of the typerec is given by:

cl((t1(α1 : {ρ/κ}ρ1) =⇒ e′1) ⊕ · · · ⊕ (tk(αk : {ρ/κ}ρk) =⇒ e′k))
  ∈ (∀α <: (t1({ρ/κ}ρ1) ∪ · · · ∪ tk({ρ/κ}ρk)).[[σ′]]) = (∀α <: ρ.[[σ′]])

Abstracting over the fixed point, and then using the fixed point operator for polymorphic functions, gives the translation of the typerec:

tyrec(λf. cl((t1(α1 : {ρ/κ}ρ1) =⇒ e′1) ⊕ · · · ⊕ (tk(αk : {ρ/κ}ρk) =⇒ e′k)))
XMLdyn incorporates a monolithic typerec construct that is the basis for defining dynamic type dispatch: all clauses in a typerec are defined at once. In XML∆, by contrast, the clauses in a function using dynamic type dispatch are defined as independent program fragments, of the form t(α : ρ) =⇒ e. The ⊕ operation combines these clauses, and the cl(e) operation forms this collection of clauses into a polymorphic function. The reason for taking this approach is that the clauses of the pickling and extraction operations in the implementation of the dynamic operation are contributed by independent uses of the defdynamic construct. There needs to be some way of combining these clauses at the use sites for the dynamic operation. The approach pursued in the previous section was to carry the individual clauses as polymorphic functions in the environment, and then use the typerec construct to combine them at the use site.

The problem with this approach, with the static semantics of XMLdyn_Π−, is that it does not help us with the combination of the clauses of the pickle type function, which are also contributed by uses of the defdynamic construct. The approach we adopt is to extend the approach for dynamic type dispatch in the term language of XML∆ to the type level. In other words, we now allow type functions at the type level that discriminate based on their type argument, giving a form of typecase for types. We also add a recursion operator at the type level to define the fixed points of such recursive type functions. We extend the syntax of types and kinds in XML∆ with:

e ::= . . . | (e1, e2) | π1(e) | π2(e)
τ ::= . . . | λα : χ.τ | (τ1 τ2) | (t(α : ρ) =⇒ τ) | τ1 ⊕ τ2 | abort | cl(τ) | tyrecχ1→χ2(τ)
σ ::= . . . | σ ∗ σ | ∃α <: ρ.σ
χ ::= . . . | ∆α <: ρ.χ

Figure 6 in App. A provides the kind rules for functions, recursion and dynamic type dispatch at the type level. These essentially repeat the corresponding type rules for programs. The TYABS and TYCASE rules include the proviso that the variables introduced by λ-abstraction and type-casing occur in the body of the type operator (so type operators are a variation of the λI-calculus). We provide the kind rules as congruence rules for the equality relation on type operators, omitting the obvious reflexivity, symmetry and transitivity rules. Figure 6 in App. A also provides the conversion rules for type operators. The TYCASECOP rule characterizes union kinds as coproducts, and allows reasoning by cases when verifying the equality of type operators defined over types of union kind. The TYFIXBETA rule allows folding and unfolding of fixed points at the type level. These rules are necessary in Theorem 3 when verifying that the translation of programs in XMLdyn_Π− preserves well-typedness in XML∆. The λI-calculus restriction in the TYABS and TYCASE rules is necessary in order to preserve the equational consistency of the type system [6].

Proposition 1. The equality theory for types in XML∆ is consistent.

Proof sketch: We give an interpretation for types using Scott domains, where union kinds are interpreted as separated sums and type operators are interpreted as strict continuous maps [6].
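The term-level clause combinators have a straightforward operational reading. The following OCaml sketch is ours and deliberately untyped with respect to kinds: a clause set maps outermost type constructor names to case bodies, join models ⊕ together with TMJOIN's disjointness side condition tc(ρ1) ∩ tc(ρ2) = {}, and cl closes a clause set into a dispatching function. All names are hypothetical.

```ocaml
type ty = TCon of string * ty list          (* a ground monotype t(τ1, ..., τn) *)

(* A clause set: one case body per outermost type constructor. *)
type 'v clauses = (string * (ty list -> 'v)) list

let join (c1 : 'v clauses) (c2 : 'v clauses) : 'v clauses =
  (* TMJOIN requires the joined clause sets to cover disjoint constructors *)
  if List.exists (fun (t, _) -> List.mem_assoc t c2) c1
  then invalid_arg "overlapping clauses"
  else c1 @ c2

(* cl: from a clause set to a function dispatching on its type argument.
   Raises Not_found on a constructor outside the domain kind. *)
let cl (cs : 'v clauses) : ty -> 'v =
  fun (TCon (t, args)) -> (List.assoc t cs) args
```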
The computation rules for terms include the following:

tyrec(∀α<:ρ.σ) e −→ e (Λα <: ρ.(tyrec(∀α<:ρ.σ) e) [α])
cl(. . . ⊕ (t(α1 : ρ1, . . . , αn : ρn) =⇒ e) ⊕ . . .) [t(τ1, . . . , τn)] −→ {τ1/α1, . . . , τn/αn}e

The reduction rules for types are obtained by orienting the TYFUNBETA, TYCASEBETA and TYFIXBETA rules in Fig. 6 from left to right as rewrite rules. We do not include the extensionality rules TYFUNETA and TYCASECOP in this rewrite system. In recent work on rewrite systems, these extensionality rules are oriented from right to left, as expansion rules rather than as reduction rules [9, 10]. Expansion rules, and their complications, do not appear appropriate in a run-time evaluator for a programming language.

The translation of types σ, kinds ρ and kind environments Γ of XMLdyn_Π− into XML∆ is similar to the translation of XMLdyn_Π into XMLdyn_T. For type environments we have:

[[A]] = {(x : [[σ]]) | (x : σ) ∈ A, x ≠ dynamic}
     ∪ {(fdyn : (∀κ.∃αpkl <: ρpkl.τin ∗ τout)) | (dynamic : σdyn) ∈ A}

where σdyn = ∀κ <: (µκ.ρ).∀α <: κ.α → Dynamic(κ), and

ρpkl = (κ → ⊤) → ∆α <: ρ.⊤
τin  = ∀β <: κ → ⊤.(∀α <: κ.α → β(α)) → (∆α <: ρ.α → cl(αpkl β)(α))
τout = ∀β <: κ → ⊤.(∀α <: κ.β(α) → α) → (∆α <: ρ.cl(αpkl β)(α) → α)

Essentially the dynamic operation consists of three parts: a type function αpkl mapping from external types to internal pickle types, a pickling function pickle mapping from values of an external type to the corresponding internal pickle type, and an extraction function extract mapping from values of an internal pickle type to values of the corresponding external type. pickle constructs a pickled value, while extract recovers the original value from a pickled value. Therefore the dynamic operation is represented in the environment as a data algebra inhabiting an existential type. This existential type is parameterized by κ, the fixed point of the kind constraining the domain of the pickling operation. The pickle type function is parameterized by its fixed point (of kind κ → ⊤). The pickling functions are parameterized by the fixed point of the pickle type function, and the fixed point of the pickling operation; similarly for the extraction operation. These fixed points are left open to further extension by the defdynamic construct.

Theorem 3. If Γ; A ⊢ e : σ, then Γ; [[A]] ⊢ e′ : [[σ]], where e′ is the translated program extracted using the algorithm in Fig. 4.

Proof. By induction on the translation of the program e, which in turn is defined by induction on type derivations in XMLdyn_Π−. The cases for DINTRO and DDEF are nontrivial, because the type pickle function is encapsulated in the existential type for the
A(dynamic) = ∀κ <: ρ.∀α <: κ.α → Dynamic(κ)
τpkl = TYCLOS(αpkl)
epkl = CLOS(pickle [τpkl]) [αwit]
eext = CLOS(extract [τpkl]) [αwit]
edyn = pack(αwit, pack(τpkl(αwit), (epkl(x), eext)))
─────────────────────────────────────────────── (DINTRO)
[[Γ]]; [[A]] ⊢ e : [[A(dynamic)]]

where: e ≡ (Λκ.Λαwit.λx.open fdyn [µκ.ρ] as pack(αpkl, (pickle, extract)) in edyn)

[[Γ]]; [[A]] ⊢ e1 : ∀α <: ρ.[[σ′]]    [[Γ]]; [[A]] ⊢ e2 : [[Dynamic(ρ)]]
──────────────────────────────────────────────────────────────── (DELIM)
[[Γ]]; [[A]] ⊢ (open e2 as pack(αpkl, pack(β, (x, extract))) in e1 [αpkl] (extract x)) : [[τ]]

A(dynamic) = ∀κ <: (µκ.ρ).∀α <: κ.α → Dynamic(κ)
[[Γ]], κ <: ⊤, αn <: κ, β <: ⊤ → ⊤; [[A, (f : ∀α <: κ.α → β(α))]] ⊢ e1 : t1(αn) → t2(β(αn))
[[Γ]], κ <: ⊤, αn <: κ, β <: ⊤ → ⊤; [[A, (g : ∀α <: κ.β(α) → α)]] ⊢ e2 : t2(β(αn)) → t1(αn)
epkl = Λβ.λf.(pickle [β] f) ⊕ (t1(αn) =⇒ e1)
eext = Λβ.λg.(extract [β] g) ⊕ (t2(αn) =⇒ e2)
τpkl = λβ.(t(α) =⇒ cl(αpkl β)(t(α))) ⊕ (t1(αn) =⇒ t2(β(αn)))
edyn = pack(τpkl, (epkl, eext))
[[Γ]]; [[A, (dynamic : (∀κ <: (µκ.ρ ∪ t1(κ)).∀α <: κ.α → Dynamic(κ)))]] ⊢ e : [[τ]]
────────────────────────── (DDEF)
[[Γ]]; [[A]] ⊢ e′ : [[τ]]

where: e′ ≡ (let fdyn = (Λκ.open fdyn [κ] as pack(αpkl, (pickle, extract)) in edyn) in e)

Fig. 4. Translation of Dynamics in XMLdyn_Π−
dynamic implementation. We therefore rely on certain equality rules, the TYCASECOP and TYFIXBETA rules, in order to reason about the encapsulated type transformation.

We consider the case for DINTRO first of all. The translation for the dynamic operation constructs a polymorphic function from the data algebra for dynamic in the environment. This function abstracts over κ and αwit, the kind and type arguments in any application of dynamic. The following metafunctions are used to build the body of the polymorphic function:

TYCLOS(τ) = tyrec(λα.cl(τ α))
CLOS(e) = tyrec(λf.cl(e f))

The witness type αpkl in the data algebra for the implementation of dynamics has kind ρpkl = ((κ → ⊤) → ∆α <: ρ.⊤). Instantiating κ with the domain kind µκ.ρ, and closing the (type-level) typecase in this type function, produces a functional
λα.cl(αpkl α) of type ((µκ.ρ) → ⊤) → ((µκ.ρ) → ⊤), and the fixed point operator for types, applied to this, gives the transformation function τpkl that maps external types (in the domain of the dynamic operation) to internal pickle types. αwit is the type of the value being bundled (a use-site type argument to dynamic). Then τpkl(αwit) denotes the type of the argument to dynamic once it is pickled.

The data algebra also contains a pickling operation pickle of type τin. This polymorphic function is parameterized by the fixed point β of the witness type function αpkl. Instantiating κ with µκ.ρ, instantiating β with τpkl, and closing up the typecase for the body of the pickling function, produces a functional λf.cl(pickle [τpkl] f) of type:

(∀α <: (µκ.ρ).α → τpkl(α)) → (∀α <: (µκ.ρ).α → cl(αpkl τpkl)(α))

Using the TYFIXBETA rule we have the equivalence:

(cl(αpkl (tyrec(λβ.cl(αpkl β))))) ←→ (tyrec(λβ.cl(αpkl β)))

Then using the congruence rules in Fig. 6 in App. A, we have that λf.cl(pickle [τpkl] f) has type:

(∀α <: (µκ.ρ).α → τpkl(α)) → (∀α <: (µκ.ρ).α → τpkl(α))

Taking the fixed point of this functional gives a function epkl with type:

∀α <: (µκ.ρ).α → τpkl(α)

which is the expected type of the pickling operation at the use sites for the dynamic operation. The data algebra also contains an extraction operation extract of type τout. Applying the same specializations as with the pickling operation, we obtain a function eext of type:

∀α <: (µκ.ρ).τpkl(α) → α

which is the expected type of the extraction operation at the use sites for the dynamic operation.

We now consider the case for the DDEF rule. We need to verify that the extension of a dynamic implementation is well-typed. In the context of the definition of the extended type function, we have

αpkl <: ((κ → ⊤) → ∆α <: ρ.⊤), β <: κ → ⊤

Therefore we have:

(t(α) =⇒ cl(αpkl β)(t(α))) <: (∆α <: ρ.⊤)
(t1(αn) =⇒ t2(β(αn))) <: (∆α <: t1(κ).⊤)
Therefore using TYJOIN and TYABS we have:

τpkl = λβ.((t(α) =⇒ cl(αpkl β)(t(α))) ⊕ (t1(αn) =⇒ t2(β(αn))))
τpkl <: ((κ → ⊤) → ∆α <: ρ ∪ t1(κ).⊤)

Now consider the well-typedness of the definition of the extended pickling function. The environment contains the constraints and types:

αpkl <: ((κ → ⊤) → ∆α <: ρ.⊤), β <: κ → ⊤
pickle : τin, f : (∀α <: κ.α → β(α))

Then we have:

(pickle [β] f) : (∆α <: ρ.α → cl(αpkl β)(α))

We have cl(τpkl β) <: (ρ′ → ⊤) where ρ′ = ρ ∪ t1(κ). By the TYFUNBETA and TYCASEBETA rules, we have:

cl(τpkl β)(t(α)) ←→ cl(αpkl β)(t(α)) <: ⊤

for all t ∈ tc(ρ). By congruence we have:

cl(t(α) =⇒ cl(τpkl β)(t(α)))(α) ←→ cl(t(α) =⇒ cl(αpkl β)(t(α)))(α)

for α <: ρ. Therefore by TYCASECOP we have:

cl(τpkl β)(α) ←→ cl(αpkl β)(α)

Thus we have:

(pickle [β] f) : ∆α <: ρ.α → cl(τpkl β)(α)

We also have the series of type judgements:

e1 : t1(αn) → t2(β(αn))
e1 : t1(αn) → cl(τpkl β)(t1(αn))
(t1(αn) =⇒ e1) : ∆α <: t1(κ).α → cl(τpkl β)(α)

where the step from the first to the second judgement follows from the definition of τpkl, using TYCASEBETA. Then combining these with TMJOIN, we have:

((pickle [β] f) ⊕ (t1(αn) =⇒ e1)) : ∆α <: ρ ∪ t1(κ).α → cl(τpkl β)(α)

The new pickling function is then formed by abstracting over f and β. The well-typedness of the extension of the extraction operation is similar.
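Operationally, each defdynamic shadows fdyn with an implementation extended by one clause, leaving the fixed points open for later extensions. A minimal untyped OCaml model of this shadowing, ours and purely illustrative:

```ocaml
(* An "implementation" is a clause set keyed by type constructor name. *)
type 'v clauses = (string * 'v) list

(* defdynamic t1 => t2 with ... in e: extend the current implementation,
   rejecting redefinition, as in DDEF's side condition t1 ∉ tc(ρ). *)
let extend (impl : 'v clauses) (t1 : string) (clause : 'v) : 'v clauses =
  if List.mem_assoc t1 impl then invalid_arg "constructor already marshallable"
  else (t1, clause) :: impl
```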
The major point of difference between XML∆ and the framework of Harper and Morrisett [20] is the formulation of the λ-calculus at the type level. The latter use a simply typed λ-calculus extended with bounded recursion over the free algebra generated by the type constructors. They are able to verify strong normalization and confluence for their calculus, and are then able to perform type-checking on the intermediate language of their compiler. Strong normalization also provides a definitional notion of equality for their calculus. By contrast, and as already noted, the abstraction of the pickle type function in our semantics means that we require a stronger equality theory for types in order to type our semantics: specifically, the TYFIXBETA rule for the DINTRO rule, and the TYCASECOP rule for the DDEF rule. Because our type calculus combines fixed points and coproducts, we add the λI-restriction (requiring that all type operators are strict) to ensure equational consistency.

Without the fixed point operator, Dougherty [10] has verified strong normalization for a λ-calculus similar to our calculus of types, with the TYCASECOP and TYFUNETA rules oriented as expansion rules. Dougherty also verifies confluence for base types, and these properties still hold with bounded recursion. These results suggest that, if desired, we can at least expect to have a type-checking algorithm for our internal language that is practically useful, even if theoretically incomplete. The main problem is with the unrolling rule for fixed points, TYFIXBETA. However there are only two places in the translation of programs into XML∆ where we expect to use this unrolling rule: first, where a type transformation is applied to a type (but all type transformations are restricted by the defdynamic to be primitive recursive); and second, in the typing of the translation of the dynamic construct (but in this case the recursion in the type is regular, and there are by now well-known algorithms for checking the equivalence of regular recursive types [3]).
6 Related Work

Dynamics have received some attention in the literature [2, 24, 25, 1]. More recently Duggan [12] has considered an approach to computing with dynamics, based on dynamic type dispatch, that overcomes some problems with the traditional typecase construct in polymorphic languages. Duggan [12] introduces safe dynamics, which are superficially related to partial dynamics [7], although in fact quite different. The difference is that safe dynamics incorporate recursion at the kind level, not the type level (as with some extensions of partial dynamics). As an example of the difference, the following can be type-checked with recursion at the type level:

fun Ap (f,x) =
  typecase (f, x) of
    ((α → β)(f), α(x)) ⇒ dynamic(f x)

However type-checking is undecidable with safe dynamics extended with this form of multi-parameter typecase patterns [15]. It remains to be seen whether a restricted version of these patterns could be added to safe dynamics.

Herlihy and Liskov [21] propose an approach to adding user-definable marshalling code to a distributed programming language. In this case, the language is CLU, and
the approach is based on defining overloaded marshall and unmarshall operations that are exported by a CLU cluster. For parameterized types, Herlihy and Liskov use Ada-style constrained genericity to parameterize the overloaded marshalling operations by marshalling operations for the element types. Herlihy and Liskov do not consider a formal semantics for their approach. They also do not consider type inference. Although we have omitted details from the current presentation for lack of space, our semantics can be used in an implicitly typed language with type inference [13].

There is a relationship between our type system and parametric overloading, explored in more detail in other papers [12, 14]. Essentially our approach is based on a “closed world assumption,” as opposed to the “open world assumption” underlying parametric overloading. This allows us to make direct use of dynamic type dispatch for our semantics, whereas the semantics of parametric overloading is based on call-site closure construction [14].

Knabe [22] has considered the problem of preventing the marshalling of native-code-implemented functions, in the Facile distributed programming environment [31]. Knabe introduces xfun and xfn constructs, analogous to the fun and fn constructs in Standard ML, for building transmissible closures (referred to by him as “potentially transmissible functions”). Potentially transmissible functions require two representations, one as a transmissible representation, the other as machine code. When a transmissible function is received at a remote site, it is typed as an ordinary function, with the run-time system implicitly compiling it to native code as it is used. However Knabe does not formalize potential transmissibility in the type system. Instead the compiler attempts to marshall potentially transmissible functions at compile-time, with the marshaller raising an exception if there is an attempt to marshall a function without a transmissible representation [22, Page 62]. Our type system can be extended to distinguish transmissible closures from other forms of closures: dynamic can be given the domain kind µκ.(. . . ∪ (⊤ →^t ⊤) ∪ . . .), where values of type τ1 →^t τ2 are transmissible closures.

Ohori and Kato [29] give a semantics for marshalling in polymorphic languages. In their approach functions are not transmitted; instead proxy functions are transmitted that invoke the function at the sender site when invoked at the receiver site. Ohori and Kato do not consider the issue of providing user-specified marshalling code in polymorphic languages.
7 Conclusions

We have considered a semantics for dynamic typing for distributed programming in polymorphic languages that allows the addition of user-defined marshalling operations. The semantics is expressed in an internal language with recursion and dynamic type dispatch at both the term and type levels. User-defined marshalling amounts to reifying dynamic type dispatch at the programmer level in an ML-like language.

In practice there is an obvious inefficiency in having the marshaller make a copy of a data structure before transmitting it. This is somewhat orthogonal to the concerns of the current paper; it should be possible to apply “deforestation” optimizations to build transformed data structures “on the wire.” This remains an important topic for further work.
References
[1] Martin Abadi, Luca Cardelli, Benjamin Pierce, and Didier Rémy. Dynamic typing in polymorphic languages. In Peter Lee, editor, Proceedings of the ACM SIGPLAN Workshop on ML and its Applications, San Francisco, California, 1992. Carnegie Mellon University Technical Report CMU-CS-93-105.
[2] Martin Abadi, Luca Cardelli, Benjamin Pierce, and Gordon Plotkin. Dynamic typing in a statically typed language. ACM Transactions on Programming Languages and Systems, 13(2):237–268, 1991.
[3] Roberto Amadio and Luca Cardelli. Subtyping recursive types. ACM Transactions on Programming Languages and Systems, 15(4):575–631, 1993.
[4] J. Bacon and K. G. Hamilton. Distributed computing with RPC: The Cambridge approach. In Proceedings of IFIP Conference on Distributed Computing, Amsterdam, 1987. North-Holland.
[5] A. Birrell, G. Nelson, S. Owicki, and E. Wobber. Network objects. Technical report, DEC Systems Research Center, Palo Alto, California, 1993.
[6] Val Breazu-Tannen, Thierry Coquand, Carl Gunter, and Andre Scedrov. Inheritance as implicit coercion. Information and Computation, 93(1):172–221, 1991.
[7] Peter Buneman and Atsushi Ohori. Polymorphism and type inference in database programming. ACM Transactions on Database Systems, 1996. To appear.
[8] Luca Cardelli. Typeful programming. Technical report, DEC Systems Research Center, 1989.
[9] Roberto Di Cosmo and Delia Kesner. A confluent reduction for the extensional typed λ-calculus with pairs, sums, recursion and terminal object. In Proceedings of the International Conference on Automata, Languages and Programming, volume 700 of Lecture Notes in Computer Science, pages 645–656. Springer-Verlag, 1993.
[10] Daniel Dougherty. Some lambda calculi with categorical sums and products. In Rewriting Techniques and Applications, Lecture Notes in Computer Science. Springer-Verlag, 1993.
[11] Catherine Dubois, Francois Rouaix, and Pierre Weis. Extensional polymorphism. In Proceedings of ACM Symposium on Principles of Programming Languages, San Francisco, California, 1995. ACM Press.
[12] Dominic Duggan. Dynamic typing for distributed programming in polymorphic languages. To appear in ACM Transactions on Programming Languages and Systems, 1998.
[13] Dominic Duggan. Finite subtype inference with explicit polymorphism. Submitted for publication, 1998.
[14] Dominic Duggan and John Ophel. Scoped parametric overloading. Submitted for publication, 1997.
[15] Dominic Duggan and John Ophel. Type-checking multi-parameter type classes. Submitted for publication, 1997.
[16] Tim Freeman and Frank Pfenning. Refinement types for ML. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 268–277. ACM Press, 1991.
[17] James Gosling, Bill Joy, and Guy Steele. The Java Language Specification. The Java Series. Addison-Wesley, 1997.
[18] Graham Hamilton, Michael L. Powell, and James J. Mitchell. Subcontract: A flexible base for distributed programming. In Symposium on Operating Systems Principles, pages 69–79. ACM Press, 1993.
[19] Robert Harper and John C. Mitchell. On the type structure of Standard ML. ACM Transactions on Programming Languages and Systems, 15(2):211–252, 1993.
[20] Robert Harper and Gregory Morrisett. Compiling polymorphism using intensional type analysis. In Proceedings of ACM Symposium on Principles of Programming Languages, San Francisco, California, 1995. ACM Press.
[21] Maurice Herlihy and Barbara Liskov. A value transmission method for abstract data types. ACM Transactions on Programming Languages and Systems, 4(4):527–551, 1982.
[22] Frederick Knabe. Language Support for Mobile Agents. PhD thesis, Carnegie Mellon University, 1995.
[23] Clifford Krumvieda. Distributed ML: Abstraction for Efficient and Fault-Tolerant Programming. PhD thesis, Cornell University, Ithaca, New York, 1993.
[24] Xavier Leroy and Michel Mauny. Dynamics in ML. Journal of Functional Programming, 3(4):431–463, 1993.
[25] Xavier Leroy and Pierre Weis. Dynamics in ML. In Proceedings of ACM Symposium on Functional Programming and Computer Architecture, 1991.
[26] Barbara Liskov. Distributed programming in Argus. Communications of the ACM, 31(3), 1988.
[27] J. Gregory Morrisett. Compiling With Types. PhD thesis, Carnegie Mellon University, 1995.
[28] Greg Nelson. Systems Programming in Modula-3. Prentice-Hall Series in Innovative Technology. Prentice-Hall, 1991.
[29] Atsushi Ohori and Kazuhiko Kato. Semantics for communication primitives in a polymorphic language. In Proceedings of ACM Symposium on Principles of Programming Languages, pages 99–112. ACM Press, 1993.
[30] David Tarditi, Greg Morrisett, Perry Cheng, Christopher Stone, Robert Harper, and Peter Lee. TIL: A type-directed optimizing compiler for ML. In Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, Philadelphia, Pennsylvania, 1996. ACM Press.
[31] Bent Thomsen, Lone Leth, Sanjiva Prasad, Tsung-Min Kuo, Andre Kramer, Fritz Knabe, and Alessandro Giacalone. Facile Antigua release programming guide. Technical Report ECRC-93-20, European Computer-Industry Research Centre, Munich, Germany, 1993.
[32] Andrew Wright and Robert Cartwright. A practical soft type system for Scheme. In Proceedings of ACM Symposium on Lisp and Functional Programming, pages 250–262, Orlando, Florida, 1994. ACM Press.
A  Kind Rules for XML∆

Γ, αn <: ρn; A ⊢ e : {t(αn)/α}τ
────────────────────────────────────────────────────────── (TMCASE)
Γ; A ⊢ (t(α1 : ρ1, . . . , αn : ρn) =⇒ e) : (∆α <: t(ρ1, . . . , ρn).τ)

──────────────────────────────── (TMABORT)
Γ; A ⊢ abortτ : (∆α <: ⊥.τ)

Γ; A ⊢ e1 : (∆α <: ρ1.τ)    Γ; A ⊢ e2 : (∆α <: ρ2.τ)    tc(ρ1) ∩ tc(ρ2) = {}
──────────────────────────────────────────── (TMJOIN)
Γ; A ⊢ e1 ⊕ e2 : (∆α <: ρ1 ∪ ρ2.τ)

Γ; A ⊢ e : (∆α <: ρ.τ)
───────────────────────────── (TMCLOS)
Γ; A ⊢ cl(e) : (∀α <: ρ.τ)

Fig. 5. Type Rules for Dynamic Type Dispatch with XML∆

t has arity n
────────────────────────────────────────────────── (TYCON)
Γ ⊢ t ←→ t <: ρ1 → · · · → ρn → t(ρ1, . . . , ρn)

Γ, α <: χ1 ⊢ τ ←→ τ′ <: χ2    (α ∈ FV(τ) ∩ FV(τ′))
──────────────────────────────────────── (TYABS)
Γ ⊢ (λα.τ) ←→ (λα.τ′) <: χ1 → χ2

Γ, αn <: ρn ⊢ τ ←→ τ′ <: χ    ({αn} ⊆ FV(τ) ∩ FV(τ′))
──────────────────────────────────────────────────────────────────── (TYCASE)
Γ ⊢ (t(α1, . . . , αn) =⇒ τ) ←→ (t(α1, . . . , αn) =⇒ τ′) <: (∆α <: t(ρ1, . . . , ρn).χ)

Γ ⊢ τ1 ←→ τ1′ <: (∆α <: ρ1.χ)    Γ ⊢ τ2 ←→ τ2′ <: (∆α <: ρ2.χ)    tc(ρ1) ∩ tc(ρ2) = {}
──────────────────────────────────────────── (TYJOIN)
Γ ⊢ τ1 ⊕ τ2 ←→ τ1′ ⊕ τ2′ <: (∆α <: ρ1 ∪ ρ2.χ)

Γ ⊢ τ ←→ τ′ <: (∆α <: ρ.χ)
──────────────────────────────── (TYCLOS)
Γ ⊢ cl(τ) ←→ cl(τ′) <: ρ → χ

Γ, α <: χ′ ⊢ τ <: χ    Γ ⊢ τ′ <: χ′
──────────────────────────────────── (TYFUNBETA)
Γ ⊢ ((λα.τ) τ′) ←→ {τ′/α}τ <: χ

Γ ⊢ τ <: χ → χ′
──────────────────────────────── (TYFUNETA)
Γ ⊢ (λα.τ α) ←→ τ <: χ → χ′

Γ, αi <: ρi ⊢ τi <: χ for i = 1, . . . , n    Γ ⊢ τ′ <: ρj
────────────────────────────────────────────────── (TYCASEBETA)
Γ ⊢ (cl(tn(αn) =⇒ τ) (tj(τ′))) ←→ {τ′/αj}τj <: χ

Γ ⊢ τ <: ρ → χ    Γ ⊢ τ′ <: ρ    ρ = t1(ρ1) ∪ · · · ∪ tm(ρm)
──────────────────────────────────────────── (TYCASECOP)
Γ ⊢ (τ τ′) ←→ cl(t(α) =⇒ τ(t(α)))(τ′) <: χ

Γ ⊢ τ <: (χ1 → χ2) → (χ1 → χ2)
────────────────────────────────────────────────── (TYFIXBETA)
Γ ⊢ tyrec(τ) ←→ (τ (λα.tyrec(τ)(α))) <: (χ1 → χ2)

Fig. 6. Kind and Conversion Rules for Type Operators in XML∆